Loop Id: 561 | Module: libqmcwfs.so | Source: MultiBsplineRef.hpp:252-270 | Coverage: 8.79% |
---|
0x50af8 VMOVAPD (%R12,%RAX,1),%ZMM13 [9] |
0x50aff VMOVAPD (%RDI,%RAX,1),%ZMM11 [2] |
0x50b06 MOV -0x178(%RBP),%RCX [13] |
0x50b0d VMOVAPD (%R9,%RAX,1),%ZMM14 [8] |
0x50b14 VMULPD %ZMM28,%ZMM11,%ZMM0 |
0x50b1a VMULPD %ZMM26,%ZMM13,%ZMM1 |
0x50b20 VMOVAPD (%RCX,%RAX,1),%ZMM5 [1] |
0x50b27 MOV -0x170(%RBP),%RCX [13] |
0x50b2e VMULPD %ZMM24,%ZMM11,%ZMM30 |
0x50b34 VMULPD %ZMM20,%ZMM11,%ZMM11 |
0x50b3a VFMADD231PD %ZMM29,%ZMM14,%ZMM0 |
0x50b40 VFMADD231PD %ZMM27,%ZMM5,%ZMM1 |
0x50b46 VFMADD231PD %ZMM25,%ZMM14,%ZMM30 |
0x50b4c VFMADD231PD %ZMM21,%ZMM14,%ZMM11 |
0x50b52 VMULPD %ZMM18,%ZMM13,%ZMM14 |
0x50b58 VADDPD %ZMM1,%ZMM0,%ZMM0 |
0x50b5e VMULPD %ZMM22,%ZMM13,%ZMM1 |
0x50b64 VMOVAPD %ZMM0,%ZMM13 |
0x50b6a VFMADD213PD (%R15,%RAX,1),%ZMM17,%ZMM13 [15] |
0x50b71 VFMADD231PD %ZMM23,%ZMM5,%ZMM1 |
0x50b77 VFMADD132PD %ZMM19,%ZMM14,%ZMM5 |
0x50b7d VMOVUPD %ZMM13,(%R15,%RAX,1) [15] |
0x50b84 VMOVAPD %ZMM0,%ZMM13 |
0x50b8a VADDPD %ZMM5,%ZMM11,%ZMM5 |
0x50b90 VMOVAPD %ZMM0,%ZMM11 |
0x50b96 VADDPD %ZMM1,%ZMM30,%ZMM1 |
0x50b9c VFMADD213PD (%RDX,%RAX,1),%ZMM16,%ZMM11 [7] |
0x50ba3 VMOVAPD %ZMM1,%ZMM14 |
0x50ba9 VMOVUPD %ZMM11,(%RDX,%RAX,1) [7] |
0x50bb0 VMOVAPD %ZMM1,%ZMM11 |
0x50bb6 VFMADD213PD (%R13,%RAX,1),%ZMM10,%ZMM14 [11] |
0x50bbe VMOVUPD %ZMM14,(%R13,%RAX,1) [11] |
0x50bc6 VMOVAPD %ZMM0,%ZMM14 |
0x50bcc VFMADD213PD (%R14,%RAX,1),%ZMM15,%ZMM13 [5] |
0x50bd3 VMOVUPD %ZMM13,(%R14,%RAX,1) [5] |
0x50bda VFMADD213PD (%RSI,%RAX,1),%ZMM9,%ZMM11 [4] |
0x50be1 VMOVUPD %ZMM11,(%RSI,%RAX,1) [4] |
0x50be8 VFMADD213PD (%R8,%RAX,1),%ZMM7,%ZMM5 [3] |
0x50bef VMOVUPD %ZMM5,(%R8,%RAX,1) [3] |
0x50bf6 VMOVAPD %ZMM0,%ZMM5 |
0x50bfc VFMADD213PD (%R10,%RAX,1),%ZMM7,%ZMM0 [10] |
0x50c03 VFMADD213PD (%RBX,%RAX,1),%ZMM10,%ZMM5 [12] |
0x50c0a VMOVUPD %ZMM0,(%R10,%RAX,1) [10] |
0x50c11 VMOVUPD %ZMM5,(%RBX,%RAX,1) [12] |
0x50c18 VFMADD213PD (%R11,%RAX,1),%ZMM9,%ZMM14 [6] |
0x50c1f VMOVUPD %ZMM14,(%R11,%RAX,1) [6] |
0x50c26 VFMADD213PD (%RCX,%RAX,1),%ZMM7,%ZMM1 [14] |
0x50c2d VMOVUPD %ZMM1,(%RCX,%RAX,1) [14] |
0x50c34 ADD $0x40,%RAX |
0x50c38 CMP %RAX,-0x180(%RBP) [13] |
0x50c3f JNE 50af8 |
/home/kcamus/qaas_runs/169-390-4082/intel/miniqmc/build/miniqmc/src/Numerics/Spline2/MultiBsplineRef.hpp: 252 - 270 |
-------------------------------------------------------------------------------- |
252: T coefsv = coefs[n]; |
253: T coefsvzs = coefszs[n]; |
254: T coefsv2zs = coefs2zs[n]; |
255: T coefsv3zs = coefs3zs[n]; |
256: |
257: T sum0 = c[0] * coefsv + c[1] * coefsvzs + c[2] * coefsv2zs + c[3] * coefsv3zs; |
258: T sum1 = dc[0] * coefsv + dc[1] * coefsvzs + dc[2] * coefsv2zs + dc[3] * coefsv3zs; |
259: T sum2 = d2c[0] * coefsv + d2c[1] * coefsvzs + d2c[2] * coefsv2zs + d2c[3] * coefsv3zs; |
260: |
261: hxx[n] += pre20 * sum0; |
262: hxy[n] += pre11 * sum0; |
263: hxz[n] += pre10 * sum1; |
264: hyy[n] += pre02 * sum0; |
265: hyz[n] += pre01 * sum1; |
266: hzz[n] += pre00 * sum2; |
267: gx[n] += pre10 * sum0; |
268: gy[n] += pre01 * sum0; |
269: gz[n] += pre00 * sum1; |
270: vals[n] += pre00 * sum0; |
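For readers who want to experiment with this kernel outside miniQMC, the loop body listed above can be wrapped as a standalone function. This is a minimal sketch, not the code in MultiBsplineRef.hpp: the function name `accumulate_vgh_line`, the `num_splines` bound, the `for` header, and the parameter order are assumptions made for illustration. Only the statements inside the loop are taken from the listing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper around source lines 252-270. The name, the loop
// header, and the parameter order are assumptions, not the miniQMC API.
template <typename T>
void accumulate_vgh_line(std::size_t num_splines,
                         const T* coefs, const T* coefszs,
                         const T* coefs2zs, const T* coefs3zs,
                         const T* c, const T* dc, const T* d2c,
                         T pre00, T pre01, T pre02,
                         T pre10, T pre11, T pre20,
                         T* vals, T* gx, T* gy, T* gz,
                         T* hxx, T* hxy, T* hxz,
                         T* hyy, T* hyz, T* hzz)
{
  assert(coefs && coefszs && coefs2zs && coefs3zs);
  for (std::size_t n = 0; n < num_splines; ++n) {
    // Four coefficient streams, one load each per iteration.
    const T coefsv    = coefs[n];
    const T coefsvzs  = coefszs[n];
    const T coefsv2zs = coefs2zs[n];
    const T coefsv3zs = coefs3zs[n];

    // Value, first and second derivative contributions along z.
    const T sum0 = c[0] * coefsv + c[1] * coefsvzs + c[2] * coefsv2zs + c[3] * coefsv3zs;
    const T sum1 = dc[0] * coefsv + dc[1] * coefsvzs + dc[2] * coefsv2zs + dc[3] * coefsv3zs;
    const T sum2 = d2c[0] * coefsv + d2c[1] * coefsvzs + d2c[2] * coefsv2zs + d2c[3] * coefsv3zs;

    // Ten read-modify-write output streams (Hessian, gradient, value).
    hxx[n]  += pre20 * sum0;
    hxy[n]  += pre11 * sum0;
    hxz[n]  += pre10 * sum1;
    hyy[n]  += pre02 * sum0;
    hyz[n]  += pre01 * sum1;
    hzz[n]  += pre00 * sum2;
    gx[n]   += pre10 * sum0;
    gy[n]   += pre01 * sum0;
    gz[n]   += pre00 * sum1;
    vals[n] += pre00 * sum0;
  }
}
```

The four input streams and ten accumulated output streams in this sketch line up with the 4 `VMOVAPD` loads and the 10 `VFMADD213PD`/`VMOVUPD` pairs in the assembly listing.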
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►50.00+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:100 | libqmcwfs.so |
○ | qmcplusplus::WaveFunction::rat[...] | WaveFunction.cpp:202 | libqmcwfs.so |
○ | main._omp_fn.1 | stl_vector.h:1123 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►37.50+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:100 | libqmcwfs.so |
○ | qmcplusplus::WaveFunction::rat[...] | WaveFunction.cpp:202 | libqmcwfs.so |
○ | main._omp_fn.1 | stl_vector.h:1123 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►8.33+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | libqmcwfs.so |
○ | qmcplusplus::SPOSet::evaluate_[...] | OhmmsVector.h:144 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:263 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:238 | libqmcwfs.so |
○ | qmcplusplus::WaveFunction::eva[...] | WaveFunction.cpp:178 | libqmcwfs.so |
○ | main._omp_fn.0 | miniqmc.cpp:390 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►4.17+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | libqmcwfs.so |
○ | qmcplusplus::SPOSet::evaluate_[...] | OhmmsVector.h:144 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:263 | libqmcwfs.so |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:238 | libqmcwfs.so |
○ | qmcplusplus::WaveFunction::eva[...] | WaveFunction.cpp:177 | libqmcwfs.so |
○ | main._omp_fn.0 | miniqmc.cpp:390 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.03 |
CQA speedup if FP arith vectorized | 1.00 |
CQA speedup if fully vectorized | 1.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.20 |
Bottlenecks | micro-operation queue |
Function | void miniqmcreference::MultiBsplineEvalRef::evaluate_vgh |
Source | MultiBsplineRef.hpp:252-270 |
Source loop unroll info | not unrolled or unrolled with no peel/tail loop |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | 15.00 |
CQA cycles if no scalar integer | 14.50 |
CQA cycles if FP arith vectorized | 15.00 |
CQA cycles if fully vectorized | 15.00 |
Front-end cycles | 15.00 |
DIV/SQRT cycles | 0.00 |
P0 cycles | 12.50 |
P1 cycles | 12.50 |
P2 cycles | 9.17 |
P3 cycles | 8.83 |
P4 cycles | 10.00 |
P5 cycles | 12.50 |
P6 cycles | 1.00 |
P7 cycles | 9.00 |
Inter-iter dependencies cycles | 1 |
FE+BE cycles (UFS) | 15.33 |
Stall cycles (UFS) | 0.00 |
Nb insns | 51.00 |
Nb uops | 50.00 |
Nb loads | 17.00 |
Nb stores | 10.00 |
Nb stack references | 3.00 |
FLOP/cycle | 21.87 |
Nb FLOP add-sub | 24.00 |
Nb FLOP mul | 48.00 |
Nb FLOP fma | 128.00 |
Nb FLOP div | 0.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 0.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 104.00 |
Bytes prefetched | 0.00 |
Bytes loaded | 920.00 |
Bytes stored | 640.00 |
Stride 0 | 1.00 |
Stride 1 | 12.00 |
Stride n | 0.00 |
Stride unknown | 1.00 |
Stride indirect | 0.00 |
Vectorization ratio all | 100.00 |
Vectorization ratio load | 100.00 |
Vectorization ratio store | 100.00 |
Vectorization ratio mul | 100.00 |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | 100.00 |
Vector-efficiency ratio all | 100.00 |
Vector-efficiency ratio load | 100.00 |
Vector-efficiency ratio store | 100.00 |
Vector-efficiency ratio mul | 100.00 |
Vector-efficiency ratio add_sub | 100.00 |
Vector-efficiency ratio fma | 100.00 |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | 100.00 |
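The FLOP/cycle and Bytes/cycle rows in the table above can be reproduced from the raw counts in the same table. The sketch below assumes that each entry counted under "Nb FLOP fma" contributes two floating-point operations (one multiply, one add). That weighting is an inference that happens to match the reported 21.87, not a documented formula of the tool. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for the per-iteration counts reported in the table.
struct LoopCounts {
  double flop_add_sub;  // "Nb FLOP add-sub"
  double flop_mul;      // "Nb FLOP mul"
  double flop_fma;      // "Nb FLOP fma"
  double bytes_loaded;  // "Bytes loaded"
  double bytes_stored;  // "Bytes stored"
  double cycles;        // "CQA cycles"
};

// Assumption: every FMA entry is worth two FLOPs (multiply + add).
double flop_per_cycle(const LoopCounts& k) {
  assert(k.cycles > 0.0);
  return (k.flop_add_sub + k.flop_mul + 2.0 * k.flop_fma) / k.cycles;
}

// Loaded and stored bytes are summed before dividing by the cycle count.
double bytes_per_cycle(const LoopCounts& k) {
  assert(k.cycles > 0.0);
  return (k.bytes_loaded + k.bytes_stored) / k.cycles;
}
```

With the values from the table (24, 48, 128, 920, 640, 15.00), the first function gives 328 / 15 and the second gives 1560 / 15, which agree with the reported 21.87 and 104.00 after rounding.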
Function | void miniqmcreference::MultiBsplineEvalRef::evaluate_vgh |
Source file and lines | MultiBsplineRef.hpp:252-270 |
Module | libqmcwfs.so |
nb instructions | 51 |
nb uops | 50 |
loop length | 333 |
used x86 registers | 15 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 25 |
nb stack references | 3 |
ADD-SUB / MUL ratio | 0.50 |
micro-operation queue | 15.00 cycles |
front end | 15.00 cycles |
Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
---|---|---|---|---|---|---|---|---|
uops | 12.50 | 1.00 | 9.17 | 8.83 | 10.00 | 12.50 | 1.00 | 9.00 |
cycles | 12.50 | 12.50 | 9.17 | 8.83 | 10.00 | 12.50 | 1.00 | 9.00 |
Cycles executing div or sqrt instructions | NA |
Longest recurrence chain latency (RecMII) | 1.00 |
FE+BE cycles | 15.33 |
Stall cycles | 0.00 |
Front-end | 15.00 |
Dispatch | 12.50 |
Data deps. | 1.00 |
Overall L1 | 15.00 |
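Vectorization ratio all | 100% |
Vectorization ratio load | 100% |
Vectorization ratio store | 100% |
Vectorization ratio mul | 100% |
Vectorization ratio add-sub | 100% |
Vectorization ratio fma | 100% |
Vectorization ratio div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
Vectorization ratio other | 100% |
Vector-efficiency ratio all | 100% |
Vector-efficiency ratio load | 100% |
Vector-efficiency ratio store | 100% |
Vector-efficiency ratio mul | 100% |
Vector-efficiency ratio add-sub | 100% |
Vector-efficiency ratio fma | 100% |
Vector-efficiency ratio div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
Vector-efficiency ratio other | 100% |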
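The Front-end, Dispatch, and Data deps. rows above are consistent with reading "Overall L1" as the largest of the three bounds, and with reading the reported "speedup if next bottleneck killed" of 1.20 as the ratio between the largest and the second-largest bound. The sketch below encodes that reading. It is an interpretation that matches the numbers in this report, not the analyzer's documented algorithm, and the names `Bounds`, `rank_bounds`, and `speedup_if_bottleneck_removed` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>

// Largest and second-largest cycle bounds for one loop iteration.
struct Bounds {
  double overall;  // limiting bound, e.g. the front end
  double next;     // bound that takes over once the first is removed
};

// Sort the three per-iteration bounds (front end, dispatch, data
// dependencies) in descending order and keep the top two.
Bounds rank_bounds(std::array<double, 3> cycles) {
  std::sort(cycles.begin(), cycles.end(), std::greater<double>());
  return {cycles[0], cycles[1]};
}

// Assumed reading of the speedup metric: ratio of the two largest bounds.
double speedup_if_bottleneck_removed(const Bounds& b) {
  assert(b.next > 0.0);
  return b.overall / b.next;
}
```

Feeding in 15.00, 12.50, and 1.00 yields an overall bound of 15.00 and a ratio of 15.00 / 12.50 = 1.20, which is why the front end (micro-operation queue) is flagged while the FP ports at 12.50 cycles are the next limit.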
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|
VMOVAPD (%R12,%RAX,1),%ZMM13 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMOVAPD (%RDI,%RAX,1),%ZMM11 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
MOV -0x178(%RBP),%RCX | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMOVAPD (%R9,%RAX,1),%ZMM14 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMULPD %ZMM28,%ZMM11,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM26,%ZMM13,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD (%RCX,%RAX,1),%ZMM5 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
MOV -0x170(%RBP),%RCX | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMULPD %ZMM24,%ZMM11,%ZMM30 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM20,%ZMM11,%ZMM11 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM29,%ZMM14,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM27,%ZMM5,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM25,%ZMM14,%ZMM30 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM21,%ZMM14,%ZMM11 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM18,%ZMM13,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VADDPD %ZMM1,%ZMM0,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM22,%ZMM13,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM0,%ZMM13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%R15,%RAX,1),%ZMM17,%ZMM13 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM23,%ZMM5,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD132PD %ZMM19,%ZMM14,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM13,(%R15,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM0,%ZMM13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VADDPD %ZMM5,%ZMM11,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM0,%ZMM11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VADDPD %ZMM1,%ZMM30,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD213PD (%RDX,%RAX,1),%ZMM16,%ZMM11 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM1,%ZMM14 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VMOVUPD %ZMM11,(%RDX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM1,%ZMM11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%R13,%RAX,1),%ZMM10,%ZMM14 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM14,(%R13,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM0,%ZMM14 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%R14,%RAX,1),%ZMM15,%ZMM13 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM13,(%R14,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%RSI,%RAX,1),%ZMM9,%ZMM11 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM11,(%RSI,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%R8,%RAX,1),%ZMM7,%ZMM5 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM5,(%R8,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM0,%ZMM5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%R10,%RAX,1),%ZMM7,%ZMM0 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD213PD (%RBX,%RAX,1),%ZMM10,%ZMM5 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM0,(%R10,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVUPD %ZMM5,(%RBX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%R11,%RAX,1),%ZMM9,%ZMM14 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM14,(%R11,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%RCX,%RAX,1),%ZMM7,%ZMM1 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM1,(%RCX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
ADD $0x40,%RAX | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
CMP %RAX,-0x180(%RBP) | 1 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0.25 | 0.25 | 0 | 1 | 0.50 |
JNE 50af8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
Metric | run_0 |
---|---|
Coverage (% app. time) | 8.79 |
Time (s) | 0.12 |
Instance Count | 73728 |
Iteration Count - min | 48 |
Iteration Count - avg | 48 |
Iteration Count - max | 48 |
Cycles per Iteration - min | 29.42 |
Cycles per Iteration - avg | 72.49 |
Cycles per Iteration - max | 16891.79 |
Metric | Value |
---|---|
Bucket Coverage (% loop time) | 75.02 |
Instance Count | 73728 |
ORIG CPI:min | 83.04 |
ORIG CPI:med | 105.63 |
ORIG CPI:max | 123.54 |
DL1 CPI:min | 18.92 |
DL1 CPI:med | 19.92 |
DL1 CPI:max | 21.96 |
ORIG (min) / DL1 (min) | 4.39 |
ORIG (med) / DL1 (med) | 5.30 |
ORIG (max) / DL1 (max) | 5.63 |
Nb Iteration:min | 48 |
Nb Iteration:med | 48.00 |
Nb Iteration:max | 48 |
ORIG: min (cycles) | 3986 |
ORIG: med (cycles) | 5070.00 |
ORIG: max (cycles) | 5930 |
DL1:min (cycles) | 908 |
DL1:med (cycles) | 956.00 |
DL1:max (cycles) | 1054 |
Metric | Value |
---|---|
Bucket Coverage (% loop time) | 24.25 |
Instance Count | 73728 |
ORIG CPI:min | 51.58 |
ORIG CPI:med | 64.33 |
ORIG CPI:max | 85.42 |
DL1 CPI:min | 19.75 |
DL1 CPI:med | 21.13 |
DL1 CPI:max | 22.42 |
ORIG (min) / DL1 (min) | 2.61 |
ORIG (med) / DL1 (med) | 3.05 |
ORIG (max) / DL1 (max) | 3.81 |
Nb Iteration:min | 48 |
Nb Iteration:med | 48.00 |
Nb Iteration:max | 48 |
ORIG: min (cycles) | 2476 |
ORIG: med (cycles) | 3088.00 |
ORIG: max (cycles) | 4100 |
DL1:min (cycles) | 948 |
DL1:med (cycles) | 1014.00 |
DL1:max (cycles) | 1076 |
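The CPI and ORIG / DL1 rows in the two bucket tables above follow from the cycle counts and the iteration count in the same tables. The sketch below shows that arithmetic; the function names are hypothetical. Interpreting a large ORIG / DL1 ratio as a sign that the loop spends most of its time waiting on data that does not sit in L1 is an assumption about what the DL1 variant measures, based on the variant listing in this report where every memory operand is redirected to a fixed RIP-relative address.

```cpp
#include <cassert>
#include <cmath>

// Cycles per loop iteration for one measured instance.
double cycles_per_iteration(double total_cycles, double iterations) {
  assert(iterations > 0.0);
  return total_cycles / iterations;
}

// Slowdown of the original loop relative to the DL1 variant.
double orig_over_dl1(double orig_cycles, double dl1_cycles) {
  assert(dl1_cycles > 0.0);
  return orig_cycles / dl1_cycles;
}
```

For the first bucket, the median values give 5070 / 48 for the ORIG CPI and 5070 / 956 for the ratio, matching the reported 105.63 and 5.30 after rounding. For the second bucket, 3088 / 1014 matches the reported 3.05.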
Metric (average per iteration except for Time and Iteration Count) | ORIG | | | | | | | DL1 | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Min (Thread) | Med (Thread) | Avg (Thread) | Max (Thread) | Min (Instances) | Med (Instances) | Max (Instances) | Min (Thread) | Med (Thread) | Avg (Thread) | Max (Thread) | Min (Instances) | Med (Instances) | Max (Instances) |
Time | 5070.00 | 5070.00 | 5070.00 | 5070.00 | 3986.00 | 5070.00 | 5930.00 | 956.00 | 956.00 | 956.00 | 956.00 | 908.00 | 956.00 | 1054.00 |
CPI MIN | 83.04 | | | | | | | 18.92 | | | | | | |
CPI MED | 105.63 | 105.63 | 105.63 | 105.63 | 83.04 | 105.63 | 123.54 | 19.92 | 19.92 | 19.92 | 19.92 | 18.92 | 19.92 | 21.96 |
CPI AVG | 106.33 | | | | | | | 20.27 | | | | | | |
CPI MAX | 123.54 | | | | | | | 21.96 | | | | | | |
Iteration Count | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 |
Metric (average per iteration except for Time and Iteration Count) | ORIG | | | | | | | DL1 | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Min (Thread) | Med (Thread) | Avg (Thread) | Max (Thread) | Min (Instances) | Med (Instances) | Max (Instances) | Min (Thread) | Med (Thread) | Avg (Thread) | Max (Thread) | Min (Instances) | Med (Instances) | Max (Instances) |
Time | 3088.00 | 3088.00 | 3088.00 | 3088.00 | 2476.00 | 3088.00 | 4100.00 | 1014.00 | 1014.00 | 1014.00 | 1014.00 | 948.00 | 1014.00 | 1076.00 |
CPI MIN | 51.58 | | | | | | | 19.75 | | | | | | |
CPI MED | 64.33 | 64.33 | 64.33 | 64.33 | 51.58 | 64.33 | 85.42 | 21.13 | 21.13 | 21.13 | 21.13 | 19.75 | 21.13 | 22.42 |
CPI AVG | 65.77 | | | | | | | 20.91 | | | | | | |
CPI MAX | 85.42 | | | | | | | 22.42 | | | | | | |
Iteration Count | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 | 48.00 |
ORIG | DL1 | Original Code |
---|---|---|
0x75753 ADDQ $0x1,-0x461b(%RIP) 0x7575b VMOVAPD (%R12,%RAX,1),%ZMM13 | 0x75db8 VMOVAPD -0x58c2(%RIP),%ZMM13 | 0x50af8 VMOVAPD (%R12,%RAX,1),%ZMM13 |
0x75762 VMOVAPD (%RDI,%RAX,1),%ZMM11 | 0x75dc2 VMOVAPD -0x58cc(%RIP),%ZMM11 | 0x50aff VMOVAPD (%RDI,%RAX,1),%ZMM11 |
0x75769 MOV -0x178(%RBP),%RCX | 0x75dcc MOV -0x5993(%RIP),%RCX | 0x50b06 MOV -0x178(%RBP),%RCX |
0x75770 VMOVAPD (%R9,%RAX,1),%ZMM14 | 0x75dd3 VMOVAPD -0x58dd(%RIP),%ZMM14 | 0x50b0d VMOVAPD (%R9,%RAX,1),%ZMM14 |
0x75777 VMULPD %ZMM28,%ZMM11,%ZMM0 | 0x75ddd VMULPD %ZMM28,%ZMM11,%ZMM0 | 0x50b14 VMULPD %ZMM28,%ZMM11,%ZMM0 |
0x7577d VMULPD %ZMM26,%ZMM13,%ZMM1 | 0x75de3 VMULPD %ZMM26,%ZMM13,%ZMM1 | 0x50b1a VMULPD %ZMM26,%ZMM13,%ZMM1 |
0x75783 VMOVAPD (%RCX,%RAX,1),%ZMM5 | 0x75de9 VMOVAPD -0x58f3(%RIP),%ZMM5 | 0x50b20 VMOVAPD (%RCX,%RAX,1),%ZMM5 |
0x7578a MOV -0x170(%RBP),%RCX | 0x75df3 MOV -0x597a(%RIP),%RCX | 0x50b27 MOV -0x170(%RBP),%RCX |
0x75791 VMULPD %ZMM24,%ZMM11,%ZMM30 | 0x75dfa VMULPD %ZMM24,%ZMM11,%ZMM30 | 0x50b2e VMULPD %ZMM24,%ZMM11,%ZMM30 |
0x75797 VMULPD %ZMM20,%ZMM11,%ZMM11 | 0x75e00 VMULPD %ZMM20,%ZMM11,%ZMM11 | 0x50b34 VMULPD %ZMM20,%ZMM11,%ZMM11 |
0x7579d VFMADD231PD %ZMM29,%ZMM14,%ZMM0 | 0x75e06 VFMADD231PD %ZMM29,%ZMM14,%ZMM0 | 0x50b3a VFMADD231PD %ZMM29,%ZMM14,%ZMM0 |
0x757a3 VFMADD231PD %ZMM27,%ZMM5,%ZMM1 | 0x75e0c VFMADD231PD %ZMM27,%ZMM5,%ZMM1 | 0x50b40 VFMADD231PD %ZMM27,%ZMM5,%ZMM1 |
0x757a9 VFMADD231PD %ZMM25,%ZMM14,%ZMM30 | 0x75e12 VFMADD231PD %ZMM25,%ZMM14,%ZMM30 | 0x50b46 VFMADD231PD %ZMM25,%ZMM14,%ZMM30 |
0x757af VFMADD231PD %ZMM21,%ZMM14,%ZMM11 | 0x75e18 VFMADD231PD %ZMM21,%ZMM14,%ZMM11 | 0x50b4c VFMADD231PD %ZMM21,%ZMM14,%ZMM11 |
0x757b5 VMULPD %ZMM18,%ZMM13,%ZMM14 | 0x75e1e VMULPD %ZMM18,%ZMM13,%ZMM14 | 0x50b52 VMULPD %ZMM18,%ZMM13,%ZMM14 |
0x757bb VADDPD %ZMM1,%ZMM0,%ZMM0 | 0x75e24 VADDPD %ZMM1,%ZMM0,%ZMM0 | 0x50b58 VADDPD %ZMM1,%ZMM0,%ZMM0 |
0x757c1 VMULPD %ZMM22,%ZMM13,%ZMM1 | 0x75e2a VMULPD %ZMM22,%ZMM13,%ZMM1 | 0x50b5e VMULPD %ZMM22,%ZMM13,%ZMM1 |
0x757c7 VMOVAPD %ZMM0,%ZMM13 | 0x75e30 VMOVAPD %ZMM0,%ZMM13 | 0x50b64 VMOVAPD %ZMM0,%ZMM13 |
0x757cd VFMADD213PD (%R15,%RAX,1),%ZMM17,%ZMM13 | 0x75e36 VFMADD213PD -0x5940(%RIP),%ZMM17,%ZMM13 0x75e40 NOP | 0x50b6a VFMADD213PD (%R15,%RAX,1),%ZMM17,%ZMM13 |
0x757d4 VFMADD231PD %ZMM23,%ZMM5,%ZMM1 | 0x75e41 VFMADD231PD %ZMM23,%ZMM5,%ZMM1 | 0x50b71 VFMADD231PD %ZMM23,%ZMM5,%ZMM1 |
0x757da VFMADD132PD %ZMM19,%ZMM14,%ZMM5 | 0x75e47 VFMADD132PD %ZMM19,%ZMM14,%ZMM5 | 0x50b77 VFMADD132PD %ZMM19,%ZMM14,%ZMM5 |
0x757e0 VMOVUPD %ZMM13,(%R15,%RAX,1) | 0x75e4d VMOVUPD %ZMM13,-0x5917(%RIP) 0x75e57 NOP | 0x50b7d VMOVUPD %ZMM13,(%R15,%RAX,1) |
0x757e7 VMOVAPD %ZMM0,%ZMM13 | 0x75e58 VMOVAPD %ZMM0,%ZMM13 | 0x50b84 VMOVAPD %ZMM0,%ZMM13 |
0x757ed VADDPD %ZMM5,%ZMM11,%ZMM5 | 0x75e5e VADDPD %ZMM5,%ZMM11,%ZMM5 | 0x50b8a VADDPD %ZMM5,%ZMM11,%ZMM5 |
0x757f3 VMOVAPD %ZMM0,%ZMM11 | 0x75e64 VMOVAPD %ZMM0,%ZMM11 | 0x50b90 VMOVAPD %ZMM0,%ZMM11 |
0x757f9 VADDPD %ZMM1,%ZMM30,%ZMM1 | 0x75e6a VADDPD %ZMM1,%ZMM30,%ZMM1 | 0x50b96 VADDPD %ZMM1,%ZMM30,%ZMM1 |
0x757ff VFMADD213PD (%RDX,%RAX,1),%ZMM16,%ZMM11 | 0x75e70 VFMADD213PD -0x597a(%RIP),%ZMM16,%ZMM11 0x75e7a NOP | 0x50b9c VFMADD213PD (%RDX,%RAX,1),%ZMM16,%ZMM11 |
0x75806 VMOVAPD %ZMM1,%ZMM14 | 0x75e7b VMOVAPD %ZMM1,%ZMM14 | 0x50ba3 VMOVAPD %ZMM1,%ZMM14 |
0x7580c VMOVUPD %ZMM11,(%RDX,%RAX,1) | 0x75e81 VMOVUPD %ZMM11,-0x590b(%RIP) 0x75e8b NOP | 0x50ba9 VMOVUPD %ZMM11,(%RDX,%RAX,1) |
0x75813 VMOVAPD %ZMM1,%ZMM11 | 0x75e8c VMOVAPD %ZMM1,%ZMM11 | 0x50bb0 VMOVAPD %ZMM1,%ZMM11 |
0x75819 VFMADD213PD (%R13,%RAX,1),%ZMM10,%ZMM14 | 0x75e92 VFMADD213PD -0x599c(%RIP),%ZMM10,%ZMM14 0x75e9c NOP | 0x50bb6 VFMADD213PD (%R13,%RAX,1),%ZMM10,%ZMM14 |
0x75821 VMOVUPD %ZMM14,(%R13,%RAX,1) | 0x75e9d VMOVUPD %ZMM14,-0x58e7(%RIP) 0x75ea7 NOP | 0x50bbe VMOVUPD %ZMM14,(%R13,%RAX,1) |
0x75829 VMOVAPD %ZMM0,%ZMM14 | 0x75ea8 VMOVAPD %ZMM0,%ZMM14 | 0x50bc6 VMOVAPD %ZMM0,%ZMM14 |
0x7582f VFMADD213PD (%R14,%RAX,1),%ZMM15,%ZMM13 | 0x75eae VFMADD213PD -0x59b8(%RIP),%ZMM15,%ZMM13 0x75eb8 NOP | 0x50bcc VFMADD213PD (%R14,%RAX,1),%ZMM15,%ZMM13 |
0x75836 VMOVUPD %ZMM13,(%R14,%RAX,1) | 0x75eb9 VMOVUPD %ZMM13,-0x58c3(%RIP) 0x75ec3 NOP | 0x50bd3 VMOVUPD %ZMM13,(%R14,%RAX,1) |
0x7583d VFMADD213PD (%RSI,%RAX,1),%ZMM9,%ZMM11 | 0x75ec4 VFMADD213PD -0x59ce(%RIP),%ZMM9,%ZMM11 0x75ece NOP | 0x50bda VFMADD213PD (%RSI,%RAX,1),%ZMM9,%ZMM11 |
0x75844 VMOVUPD %ZMM11,(%RSI,%RAX,1) | 0x75ecf VMOVUPD %ZMM11,-0x5899(%RIP) 0x75ed9 NOP | 0x50be1 VMOVUPD %ZMM11,(%RSI,%RAX,1) |
0x7584b VFMADD213PD (%R8,%RAX,1),%ZMM7,%ZMM5 | 0x75eda VFMADD213PD -0x59e4(%RIP),%ZMM7,%ZMM5 0x75ee4 NOP | 0x50be8 VFMADD213PD (%R8,%RAX,1),%ZMM7,%ZMM5 |
0x75852 VMOVUPD %ZMM5,(%R8,%RAX,1) | 0x75ee5 VMOVUPD %ZMM5,-0x586f(%RIP) 0x75eef NOP | 0x50bef VMOVUPD %ZMM5,(%R8,%RAX,1) |
0x75859 VMOVAPD %ZMM0,%ZMM5 | 0x75ef0 VMOVAPD %ZMM0,%ZMM5 | 0x50bf6 VMOVAPD %ZMM0,%ZMM5 |
0x7585f VFMADD213PD (%R10,%RAX,1),%ZMM7,%ZMM0 | 0x75ef6 VFMADD213PD -0x5a00(%RIP),%ZMM7,%ZMM0 0x75f00 NOP | 0x50bfc VFMADD213PD (%R10,%RAX,1),%ZMM7,%ZMM0 |
0x75866 VFMADD213PD (%RBX,%RAX,1),%ZMM10,%ZMM5 | 0x75f01 VFMADD213PD -0x5a0b(%RIP),%ZMM10,%ZMM5 0x75f0b NOP | 0x50c03 VFMADD213PD (%RBX,%RAX,1),%ZMM10,%ZMM5 |
0x7586d VMOVUPD %ZMM0,(%R10,%RAX,1) | 0x75f0c VMOVUPD %ZMM0,-0x5856(%RIP) 0x75f16 NOP | 0x50c0a VMOVUPD %ZMM0,(%R10,%RAX,1) |
0x75874 VMOVUPD %ZMM5,(%RBX,%RAX,1) | 0x75f17 VMOVUPD %ZMM5,-0x5821(%RIP) 0x75f21 NOP | 0x50c11 VMOVUPD %ZMM5,(%RBX,%RAX,1) |
0x7587b VFMADD213PD (%R11,%RAX,1),%ZMM9,%ZMM14 | 0x75f22 VFMADD213PD -0x5a2c(%RIP),%ZMM9,%ZMM14 0x75f2c NOP | 0x50c18 VFMADD213PD (%R11,%RAX,1),%ZMM9,%ZMM14 |
0x75882 VMOVUPD %ZMM14,(%R11,%RAX,1) | 0x75f2d VMOVUPD %ZMM14,-0x57f7(%RIP) 0x75f37 NOP | 0x50c1f VMOVUPD %ZMM14,(%R11,%RAX,1) |
0x75889 VFMADD213PD (%RCX,%RAX,1),%ZMM7,%ZMM1 | 0x75f38 VFMADD213PD -0x5a42(%RIP),%ZMM7,%ZMM1 0x75f42 NOP | 0x50c26 VFMADD213PD (%RCX,%RAX,1),%ZMM7,%ZMM1 |
0x75890 VMOVUPD %ZMM1,(%RCX,%RAX,1) | 0x75f43 VMOVUPD %ZMM1,-0x57cd(%RIP) 0x75f4d NOP | 0x50c2d VMOVUPD %ZMM1,(%RCX,%RAX,1) |
0x75897 ADD $0x40,%RAX | 0x75f4e ADD $0x40,%RAX | 0x50c34 ADD $0x40,%RAX |
0x7589b CMP %RAX,-0x180(%RBP) | 0x75f52 CMP %RAX,-0x5a99(%RIP) | 0x50c38 CMP %RAX,-0x180(%RBP) |
0x758a2 JNE 75753 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0x253e3> | 0x75f59 JNE 75db8 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0x25a48> | 0x50c3f JNE 50af8 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0x788> |
Metric | ORIG | DL1 | Original |
---|---|---|---|
FP operations per cycle L1 | 21.16 | 18.74 | 21.87 |
cycles L1 CQA | 15.50 | 17.50 | 15.00 |
cycles UFS | 15.82 | 17.74 | 15.33 |
bytes loaded | 928.00 | 920.00 | 920.00 |
bytes stored | 648.00 | 640.00 | 640.00 |
nb loads | 18.00 | 17.00 | 17.00 |
nb stores | 11.00 | 10.00 | 10.00 |
cycles dispatch | 12.50 | 12.50 | 12.50 |
cycles front end | 15.50 | 17.50 | 15.00 |
cycles P0 | 12.50 | 12.50 | 12.50 |
cycles P1 | 12.50 | 12.50 | 12.50 |
cycles P2 | 9.67 | 9.17 | 9.17 |
cycles P3 | 9.67 | 8.83 | 8.83 |
cycles P4 | 11.00 | 10.00 | 10.00 |
cycles P5 | 12.50 | 12.50 | 12.50 |
cycles P6 | 1.50 | 1.00 | 1.00 |
cycles P7 | 9.67 | 9.00 | 9.00 |
stall cycles | 0.00 | 0.00 | 0.00 |
LB full | 0.00 | 0.00 | 0.00 |
LM full | 0.00 | 0.00 | 0.00 |
PRF full | 0.00 | 0.00 | 0.00 |
PRF_FLOAT full | 0.00 | 0.00 | 0.00 |
PRF_INT full | 0.00 | 0.00 | 0.00 |
ROB full | 0.00 | 0.00 | 0.00 |
RS full | 0.00 | 0.00 | 0.00 |
SB full | 0.00 | 0.00 | 0.00 |
nb uops | 52.00 | 70.00 | 50.00 |
uops P0 | 12.50 | 12.50 | 12.50 |
uops P1 | 1.50 | 1.00 | 1.00 |
uops P2 | 9.67 | 9.17 | 9.17 |
uops P3 | 9.67 | 8.83 | 8.83 |
uops P4 | 11.00 | 10.00 | 10.00 |
uops P5 | 12.50 | 12.50 | 12.50 |
uops P6 | 1.50 | 1.00 | 1.00 |
uops P7 | 9.67 | 9.00 | 9.00 |
ID | 568 | 570 | 561 |