Loop Id: 739 | Module: exec | Source: MultiBsplineRef.hpp:252-270 | Coverage: 8.67% |
---|
0x45bf50 MOV 0x248(%RSP),%R9 [14] |
0x45bf58 VMOVAPD (%R11,%RAX,1),%ZMM11 [11] |
0x45bf5f VMOVAPD (%R14,%RAX,1),%ZMM13 [8] |
0x45bf66 VMOVAPD (%R13,%RAX,1),%ZMM6 [2] |
0x45bf6e VMOVAPD (%R9,%RAX,1),%ZMM8 [13] |
0x45bf75 VMULPD %ZMM26,%ZMM11,%ZMM0 |
0x45bf7b MOV 0x250(%RSP),%R9 [14] |
0x45bf83 VMULPD %ZMM22,%ZMM11,%ZMM5 |
0x45bf89 VMULPD %ZMM28,%ZMM8,%ZMM14 |
0x45bf8f VFMADD231PD %ZMM27,%ZMM6,%ZMM0 |
0x45bf95 VFMADD231PD %ZMM23,%ZMM6,%ZMM5 |
0x45bf9b VFMADD231PD %ZMM29,%ZMM13,%ZMM14 |
0x45bfa1 VADDPD %ZMM0,%ZMM14,%ZMM0 |
0x45bfa7 VMULPD %ZMM24,%ZMM8,%ZMM14 |
0x45bfad VMULPD %ZMM20,%ZMM8,%ZMM8 |
0x45bfb3 VFMADD231PD %ZMM25,%ZMM13,%ZMM14 |
0x45bfb9 VFMADD231PD %ZMM21,%ZMM13,%ZMM8 |
0x45bfbf VMULPD %ZMM18,%ZMM11,%ZMM13 |
0x45bfc5 VMOVAPD %ZMM0,%ZMM11 |
0x45bfcb VFMADD213PD (%R12,%RAX,1),%ZMM17,%ZMM11 [3] |
0x45bfd2 VADDPD %ZMM5,%ZMM14,%ZMM14 |
0x45bfd8 VMOVAPD %ZMM0,%ZMM5 |
0x45bfde VFMADD132PD %ZMM19,%ZMM13,%ZMM6 |
0x45bfe4 VMOVAPD %ZMM0,%ZMM13 |
0x45bfea VMOVUPD %ZMM11,(%R12,%RAX,1) [3] |
0x45bff1 VMOVAPD %ZMM14,%ZMM11 |
0x45bff7 VFMADD213PD (%RBX,%RAX,1),%ZMM16,%ZMM5 [10] |
0x45bffe VADDPD %ZMM6,%ZMM8,%ZMM6 |
0x45c004 VMOVAPD %ZMM14,%ZMM8 |
0x45c00a VMOVUPD %ZMM5,(%RBX,%RAX,1) [10] |
0x45c011 VMOVAPD %ZMM0,%ZMM5 |
0x45c017 VFMADD213PD (%RDX,%RAX,1),%ZMM10,%ZMM8 [4] |
0x45c01e VMOVUPD %ZMM8,(%RDX,%RAX,1) [4] |
0x45c025 VFMADD213PD (%R15,%RAX,1),%ZMM15,%ZMM13 [5] |
0x45c02c VMOVUPD %ZMM13,(%R15,%RAX,1) [5] |
0x45c033 VFMADD213PD (%RCX,%RAX,1),%ZMM9,%ZMM11 [15] |
0x45c03a VMOVUPD %ZMM11,(%RCX,%RAX,1) [15] |
0x45c041 VFMADD213PD (%RDI,%RAX,1),%ZMM7,%ZMM6 [1] |
0x45c048 VMOVUPD %ZMM6,(%RDI,%RAX,1) [1] |
0x45c04f VMOVAPD %ZMM0,%ZMM6 |
0x45c055 VFMADD213PD (%RSI,%RAX,1),%ZMM7,%ZMM0 [9] |
0x45c05c VFMADD213PD (%R8,%RAX,1),%ZMM10,%ZMM6 [12] |
0x45c063 VMOVUPD %ZMM0,(%RSI,%RAX,1) [9] |
0x45c06a VMOVUPD %ZMM6,(%R8,%RAX,1) [12] |
0x45c071 VFMADD213PD (%R10,%RAX,1),%ZMM9,%ZMM5 [6] |
0x45c078 VMOVUPD %ZMM5,(%R10,%RAX,1) [6] |
0x45c07f VFMADD213PD (%R9,%RAX,1),%ZMM7,%ZMM14 [7] |
0x45c086 VMOVUPD %ZMM14,(%R9,%RAX,1) [7] |
0x45c08d ADD $0x40,%RAX |
0x45c091 CMP %RAX,0x240(%RSP) [14] |
0x45c099 JNE 45bf50 |
/home/kcamus/qaas_runs/169-451-1869/intel/miniqmc/build/miniqmc/src/Numerics/Spline2/MultiBsplineRef.hpp: 252 - 270 |
-------------------------------------------------------------------------------- |
252: T coefsv = coefs[n]; |
253: T coefsvzs = coefszs[n]; |
254: T coefsv2zs = coefs2zs[n]; |
255: T coefsv3zs = coefs3zs[n]; |
256: |
257: T sum0 = c[0] * coefsv + c[1] * coefsvzs + c[2] * coefsv2zs + c[3] * coefsv3zs; |
258: T sum1 = dc[0] * coefsv + dc[1] * coefsvzs + dc[2] * coefsv2zs + dc[3] * coefsv3zs; |
259: T sum2 = d2c[0] * coefsv + d2c[1] * coefsvzs + d2c[2] * coefsv2zs + d2c[3] * coefsv3zs; |
260: |
261: hxx[n] += pre20 * sum0; |
262: hxy[n] += pre11 * sum0; |
263: hxz[n] += pre10 * sum1; |
264: hyy[n] += pre02 * sum0; |
265: hyz[n] += pre01 * sum1; |
266: hzz[n] += pre00 * sum2; |
267: gx[n] += pre10 * sum0; |
268: gy[n] += pre01 * sum0; |
269: gz[n] += pre00 * sum1; |
270: vals[n] += pre00 * sum0; |
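The excerpt above shows only the loop body. A minimal, self-contained sketch of the enclosing loop follows, assuming a hypothetical free function `accumulate_vgh`: its name, signature, and the `num_splines` bound are illustrative and not taken from MultiBsplineRef.hpp; only the body mirrors lines 252-270. The sketch makes the memory traffic explicit: four coefficient streams are read and ten output streams are read, updated, and written back for each `n`, which is in line with the 10 stores and 17 loads (3 of them stack references) that the report counts per vectorized iteration.

```cpp
#include <cstddef>

// Sketch only: the function name, signature, and num_splines bound are
// assumptions; the loop body follows the listing for lines 252-270.
template <typename T>
void accumulate_vgh(std::size_t num_splines,
                    const T* coefs, const T* coefszs,
                    const T* coefs2zs, const T* coefs3zs,
                    const T (&c)[4], const T (&dc)[4], const T (&d2c)[4],
                    T pre00, T pre01, T pre02, T pre10, T pre11, T pre20,
                    T* vals, T* gx, T* gy, T* gz,
                    T* hxx, T* hxy, T* hxz, T* hyy, T* hyz, T* hzz)
{
  for (std::size_t n = 0; n < num_splines; ++n) {
    // Four contiguous coefficient loads (stride 1 in n).
    const T coefsv    = coefs[n];
    const T coefsvzs  = coefszs[n];
    const T coefsv2zs = coefs2zs[n];
    const T coefsv3zs = coefs3zs[n];

    // Sums weighted by the basis values (c), first derivatives (dc),
    // and second derivatives (d2c).
    const T sum0 = c[0] * coefsv + c[1] * coefsvzs + c[2] * coefsv2zs + c[3] * coefsv3zs;
    const T sum1 = dc[0] * coefsv + dc[1] * coefsvzs + dc[2] * coefsv2zs + dc[3] * coefsv3zs;
    const T sum2 = d2c[0] * coefsv + d2c[1] * coefsvzs + d2c[2] * coefsv2zs + d2c[3] * coefsv3zs;

    // Ten read-modify-write updates: one load and one store per output stream.
    hxx[n]  += pre20 * sum0;
    hxy[n]  += pre11 * sum0;
    hxz[n]  += pre10 * sum1;
    hyy[n]  += pre02 * sum0;
    hyz[n]  += pre01 * sum1;
    hzz[n]  += pre00 * sum2;
    gx[n]   += pre10 * sum0;
    gy[n]   += pre01 * sum0;
    gz[n]   += pre00 * sum1;
    vals[n] += pre00 * sum0;
  }
}
```

Instantiating it as `accumulate_vgh<double>(...)` corresponds to the `evaluate_vgh<double>` instance analyzed here, where each 512-bit vector iteration covers eight values of `n`.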
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
►42.49+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | exec |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:100 | exec |
○ | qmcplusplus::WaveFunction::rat[...] | WaveFunction.cpp:202 | exec |
○ | main._omp_fn.1 | miniqmc.cpp:438 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►41.61+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | exec |
○ | miniqmcreference::DiracDetermi[...] | DiracDeterminantRef.cpp:100 | exec |
○ | qmcplusplus::WaveFunction::rat[...] | WaveFunction.cpp:202 | exec |
○ | main._omp_fn.1 | miniqmc.cpp:438 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►8.10+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | exec |
○ | miniqmcreference::DiracDetermi[...] | OhmmsVector.h:144 | exec |
○ | qmcplusplus::WaveFunction::eva[...] | WaveFunction.cpp:178 | exec |
○ | main._omp_fn.0 | miniqmc.cpp:390 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
►7.80+ | miniqmcreference::einspline_sp[...] | einspline_spo_ref.hpp:206 | exec |
○ | miniqmcreference::DiracDetermi[...] | OhmmsVector.h:144 | exec |
○ | qmcplusplus::WaveFunction::eva[...] | WaveFunction.cpp:177 | exec |
○ | main._omp_fn.0 | miniqmc.cpp:390 | exec |
○ | GOMP_parallel | libgomp.h:985 | libgomp.so.1.0.0 |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.03 |
CQA speedup if FP arith vectorized | 1.00 |
CQA speedup if fully vectorized | 1.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.20 |
Bottlenecks | micro-operation queue |
Function | void miniqmcreference::MultiBsplineEvalRef::evaluate_vgh |
Source | MultiBsplineRef.hpp:252-270 |
Source loop unroll info | not unrolled or unrolled with no peel/tail loop |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | 15.00 |
CQA cycles if no scalar integer | 14.50 |
CQA cycles if FP arith vectorized | 15.00 |
CQA cycles if fully vectorized | 15.00 |
Front-end cycles | 15.00 |
DIV/SQRT cycles | 0.00 |
P0 cycles | 12.50 |
P1 cycles | 12.50 |
P2 cycles | 9.17 |
P3 cycles | 8.83 |
P4 cycles | 10.00 |
P5 cycles | 12.50 |
P6 cycles | 1.00 |
P7 cycles | 9.00 |
Inter-iter dependencies cycles | 1 |
FE+BE cycles (UFS) | 15.33 |
Stall cycles (UFS) | 0.00 |
Nb insns | 51.00 |
Nb uops | 50.00 |
Nb loads | 17.00 |
Nb stores | 10.00 |
Nb stack references | 3.00 |
FLOP/cycle | 21.87 |
Nb FLOP add-sub | 24.00 |
Nb FLOP mul | 48.00 |
Nb FLOP fma | 128.00 |
Nb FLOP div | 0.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 0.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 104.00 |
Bytes prefetched | 0.00 |
Bytes loaded | 920.00 |
Bytes stored | 640.00 |
Stride 0 | 1.00 |
Stride 1 | 12.00 |
Stride n | 0.00 |
Stride unknown | 1.00 |
Stride indirect | 0.00 |
Vectorization ratio all | 100.00 |
Vectorization ratio load | 100.00 |
Vectorization ratio store | 100.00 |
Vectorization ratio mul | 100.00 |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | 100.00 |
Vector-efficiency ratio all | 100.00 |
Vector-efficiency ratio load | 100.00 |
Vector-efficiency ratio store | 100.00 |
Vector-efficiency ratio mul | 100.00 |
Vector-efficiency ratio add_sub | 100.00 |
Vector-efficiency ratio fma | 100.00 |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | 100.00 |
Function | void miniqmcreference::MultiBsplineEvalRef::evaluate_vgh |
Source file and lines | MultiBsplineRef.hpp:252-270 |
Module | exec |
nb instructions | 51 |
nb uops | 50 |
loop length | 335 |
used x86 registers | 15 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 25 |
nb stack references | 3 |
ADD-SUB / MUL ratio | 0.50 |
micro-operation queue | 15.00 cycles |
front end | 15.00 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
---|---|---|---|---|---|---|---|---|
uops | 12.50 | 1.00 | 9.17 | 8.83 | 10.00 | 12.50 | 1.00 | 9.00 |
cycles | 12.50 | 12.50 | 9.17 | 8.83 | 10.00 | 12.50 | 1.00 | 9.00 |
Cycles executing div or sqrt instructions | NA |
Longest recurrence chain latency (RecMII) | 1.00 |
FE+BE cycles | 15.33 |
Stall cycles | 0.00 |
Front-end | 15.00 |
Dispatch | 12.50 |
Data deps. | 1.00 |
Overall L1 | 15.00 |
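Vectorization ratio all | 100% |
Vectorization ratio load | 100% |
Vectorization ratio store | 100% |
Vectorization ratio mul | 100% |
Vectorization ratio add-sub | 100% |
Vectorization ratio fma | 100% |
Vectorization ratio div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
Vectorization ratio other | 100% |
Vector-efficiency ratio all | 100% |
Vector-efficiency ratio load | 100% |
Vector-efficiency ratio store | 100% |
Vector-efficiency ratio mul | 100% |
Vector-efficiency ratio add-sub | 100% |
Vector-efficiency ratio fma | 100% |
Vector-efficiency ratio div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
Vector-efficiency ratio other | 100% |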
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|
MOV 0x248(%RSP),%R9 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMOVAPD (%R11,%RAX,1),%ZMM11 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMOVAPD (%R14,%RAX,1),%ZMM13 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMOVAPD (%R13,%RAX,1),%ZMM6 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMOVAPD (%R9,%RAX,1),%ZMM8 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 5-6 | 0.50 |
VMULPD %ZMM26,%ZMM11,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
MOV 0x250(%RSP),%R9 | 1 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4-5 | 0.50 |
VMULPD %ZMM22,%ZMM11,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM28,%ZMM8,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM27,%ZMM6,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM23,%ZMM6,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM29,%ZMM13,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VADDPD %ZMM0,%ZMM14,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM24,%ZMM8,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM20,%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM25,%ZMM13,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD231PD %ZMM21,%ZMM13,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMULPD %ZMM18,%ZMM11,%ZMM13 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM0,%ZMM11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%R12,%RAX,1),%ZMM17,%ZMM11 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VADDPD %ZMM5,%ZMM14,%ZMM14 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM0,%ZMM5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD132PD %ZMM19,%ZMM13,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM0,%ZMM13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VMOVUPD %ZMM11,(%R12,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM14,%ZMM11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%RBX,%RAX,1),%ZMM16,%ZMM5 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VADDPD %ZMM6,%ZMM8,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVAPD %ZMM14,%ZMM8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VMOVUPD %ZMM5,(%RBX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM0,%ZMM5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%RDX,%RAX,1),%ZMM10,%ZMM8 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM8,(%RDX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%R15,%RAX,1),%ZMM15,%ZMM13 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM13,(%R15,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%RCX,%RAX,1),%ZMM9,%ZMM11 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM11,(%RCX,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%RDI,%RAX,1),%ZMM7,%ZMM6 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM6,(%RDI,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVAPD %ZMM0,%ZMM6 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
VFMADD213PD (%RSI,%RAX,1),%ZMM7,%ZMM0 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VFMADD213PD (%R8,%RAX,1),%ZMM10,%ZMM6 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM0,(%RSI,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VMOVUPD %ZMM6,(%R8,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%R10,%RAX,1),%ZMM9,%ZMM5 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM5,(%R10,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
VFMADD213PD (%R9,%RAX,1),%ZMM7,%ZMM14 | 1 | 0.50 | 0 | 0.50 | 0.50 | 0 | 0.50 | 0 | 0 | 4 | 0.50 |
VMOVUPD %ZMM14,(%R9,%RAX,1) | 1 | 0 | 0 | 0.33 | 0.33 | 1 | 0 | 0 | 0.33 | 3 | 1 |
ADD $0x40,%RAX | 1 | 0.25 | 0.25 | 0 | 0 | 0 | 0.25 | 0.25 | 0 | 1 | 0.25 |
CMP %RAX,0x240(%RSP) | 1 | 0.25 | 0.25 | 0.50 | 0.50 | 0 | 0.25 | 0.25 | 0 | 1 | 0.50 |
JNE 45bf50 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0.50-1 |
Metric | run_0 |
---|---|
Coverage (% app. time) | 8.67 |
Time (s) | 8.46 |
Instance Count | 589824 |
Iteration Count - min | 384 |
Iteration Count - avg | 384 |
Iteration Count - max | 384 |
Cycles per Iteration - min | 66.7 |
Cycles per Iteration - avg | 86.79 |
Cycles per Iteration - max | 2132.02 |
Metric | Value |
---|---|
Bucket Coverage (% loop time) | 99.36 |
Instance Count | 589824 |
ORIG CPI:min | 82.55 |
ORIG CPI:med | 84.09 |
ORIG CPI:max | 126.64 |
DL1 CPI:min | 17.94 |
DL1 CPI:med | 18.18 |
DL1 CPI:max | 18.29 |
ORIG (min) / DL1 (min) | 4.60 |
ORIG (med) / DL1 (med) | 4.62 |
ORIG (max) / DL1 (max) | 6.92 |
Nb Iteration:min | 384 |
Nb Iteration:med | 384.00 |
Nb Iteration:max | 384 |
ORIG: min (cycles) | 31698 |
ORIG: med (cycles) | 32290.00 |
ORIG: max (cycles) | 48630 |
DL1:min (cycles) | 6888 |
DL1:med (cycles) | 6982.00 |
DL1:max (cycles) | 7024 |
Metric (average per iteration except for Time and Iteration Count) | ORIG Min (Thread) | ORIG Med (Thread) | ORIG Avg (Thread) | ORIG Max (Thread) | ORIG Min (Instances) | ORIG Med (Instances) | ORIG Max (Instances) | DL1 Min (Thread) | DL1 Med (Thread) | DL1 Avg (Thread) | DL1 Max (Thread) | DL1 Min (Instances) | DL1 Med (Instances) | DL1 Max (Instances) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Time (cycles) | 32290.00 | 32290.00 | 32290.00 | 32290.00 | 31698.00 | 32290.00 | 48630.00 | 6982.00 | 6982.00 | 6982.00 | 6982.00 | 6888.00 | 6982.00 | 7024.00 |
CPI MIN | 82.55 | | | | | | | 17.94 | | | | | | |
CPI MED | 84.09 | 84.09 | 84.09 | 84.09 | 82.55 | 84.09 | 126.64 | 18.18 | 18.18 | 18.18 | 18.18 | 17.94 | 18.18 | 18.29 |
CPI AVG | 86.09 | | | | | | | 18.12 | | | | | | |
CPI MAX | 126.64 | | | | | | | 18.29 | | | | | | |
Iteration Count | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 | 384.00 |
ORIG | DL1 | Original Code |
---|---|---|
0x4e6cc9 ADDQ $0x1,-0x2c51(%RIP) 0x4e6cd1 MOV 0x248(%RSP),%R9 | 0x4e7333 MOV -0x4aba(%RIP),%R9 | 0x45bf50 MOV 0x248(%RSP),%R9 |
0x4e6cd9 VMOVAPD (%R11,%RAX,1),%ZMM11 | 0x4e733a VMOVAPD -0x4a04(%RIP),%ZMM11 | 0x45bf58 VMOVAPD (%R11,%RAX,1),%ZMM11 |
0x4e6ce0 VMOVAPD (%R14,%RAX,1),%ZMM13 | 0x4e7344 VMOVAPD -0x4a0e(%RIP),%ZMM13 | 0x45bf5f VMOVAPD (%R14,%RAX,1),%ZMM13 |
0x4e6ce7 VMOVAPD (%R13,%RAX,1),%ZMM6 | 0x4e734e VMOVAPD -0x4a18(%RIP),%ZMM6 | 0x45bf66 VMOVAPD (%R13,%RAX,1),%ZMM6 |
0x4e6cef VMOVAPD (%R9,%RAX,1),%ZMM8 | 0x4e7358 VMOVAPD -0x4a22(%RIP),%ZMM8 | 0x45bf6e VMOVAPD (%R9,%RAX,1),%ZMM8 |
0x4e6cf6 VMULPD %ZMM26,%ZMM11,%ZMM0 | 0x4e7362 VMULPD %ZMM26,%ZMM11,%ZMM0 | 0x45bf75 VMULPD %ZMM26,%ZMM11,%ZMM0 |
0x4e6cfc MOV 0x250(%RSP),%R9 | 0x4e7368 MOV -0x4aaf(%RIP),%R9 | 0x45bf7b MOV 0x250(%RSP),%R9 |
0x4e6d04 VMULPD %ZMM22,%ZMM11,%ZMM5 | 0x4e736f VMULPD %ZMM22,%ZMM11,%ZMM5 | 0x45bf83 VMULPD %ZMM22,%ZMM11,%ZMM5 |
0x4e6d0a VMULPD %ZMM28,%ZMM8,%ZMM14 | 0x4e7375 VMULPD %ZMM28,%ZMM8,%ZMM14 | 0x45bf89 VMULPD %ZMM28,%ZMM8,%ZMM14 |
0x4e6d10 VFMADD231PD %ZMM27,%ZMM6,%ZMM0 | 0x4e737b VFMADD231PD %ZMM27,%ZMM6,%ZMM0 | 0x45bf8f VFMADD231PD %ZMM27,%ZMM6,%ZMM0 |
0x4e6d16 VFMADD231PD %ZMM23,%ZMM6,%ZMM5 | 0x4e7381 VFMADD231PD %ZMM23,%ZMM6,%ZMM5 | 0x45bf95 VFMADD231PD %ZMM23,%ZMM6,%ZMM5 |
0x4e6d1c VFMADD231PD %ZMM29,%ZMM13,%ZMM14 | 0x4e7387 VFMADD231PD %ZMM29,%ZMM13,%ZMM14 | 0x45bf9b VFMADD231PD %ZMM29,%ZMM13,%ZMM14 |
0x4e6d22 VADDPD %ZMM0,%ZMM14,%ZMM0 | 0x4e738d VADDPD %ZMM0,%ZMM14,%ZMM0 | 0x45bfa1 VADDPD %ZMM0,%ZMM14,%ZMM0 |
0x4e6d28 VMULPD %ZMM24,%ZMM8,%ZMM14 | 0x4e7393 VMULPD %ZMM24,%ZMM8,%ZMM14 | 0x45bfa7 VMULPD %ZMM24,%ZMM8,%ZMM14 |
0x4e6d2e VMULPD %ZMM20,%ZMM8,%ZMM8 | 0x4e7399 VMULPD %ZMM20,%ZMM8,%ZMM8 | 0x45bfad VMULPD %ZMM20,%ZMM8,%ZMM8 |
0x4e6d34 VFMADD231PD %ZMM25,%ZMM13,%ZMM14 | 0x4e739f VFMADD231PD %ZMM25,%ZMM13,%ZMM14 | 0x45bfb3 VFMADD231PD %ZMM25,%ZMM13,%ZMM14 |
0x4e6d3a VFMADD231PD %ZMM21,%ZMM13,%ZMM8 | 0x4e73a5 VFMADD231PD %ZMM21,%ZMM13,%ZMM8 | 0x45bfb9 VFMADD231PD %ZMM21,%ZMM13,%ZMM8 |
0x4e6d40 VMULPD %ZMM18,%ZMM11,%ZMM13 | 0x4e73ab VMULPD %ZMM18,%ZMM11,%ZMM13 | 0x45bfbf VMULPD %ZMM18,%ZMM11,%ZMM13 |
0x4e6d46 VMOVAPD %ZMM0,%ZMM11 | 0x4e73b1 VMOVAPD %ZMM0,%ZMM11 | 0x45bfc5 VMOVAPD %ZMM0,%ZMM11 |
0x4e6d4c VFMADD213PD (%R12,%RAX,1),%ZMM17,%ZMM11 | 0x4e73b7 VFMADD213PD -0x4a81(%RIP),%ZMM17,%ZMM11 0x4e73c1 NOP | 0x45bfcb VFMADD213PD (%R12,%RAX,1),%ZMM17,%ZMM11 |
0x4e6d53 VADDPD %ZMM5,%ZMM14,%ZMM14 | 0x4e73c2 VADDPD %ZMM5,%ZMM14,%ZMM14 | 0x45bfd2 VADDPD %ZMM5,%ZMM14,%ZMM14 |
0x4e6d59 VMOVAPD %ZMM0,%ZMM5 | 0x4e73c8 VMOVAPD %ZMM0,%ZMM5 | 0x45bfd8 VMOVAPD %ZMM0,%ZMM5 |
0x4e6d5f VFMADD132PD %ZMM19,%ZMM13,%ZMM6 | 0x4e73ce VFMADD132PD %ZMM19,%ZMM13,%ZMM6 | 0x45bfde VFMADD132PD %ZMM19,%ZMM13,%ZMM6 |
0x4e6d65 VMOVAPD %ZMM0,%ZMM13 | 0x4e73d4 VMOVAPD %ZMM0,%ZMM13 | 0x45bfe4 VMOVAPD %ZMM0,%ZMM13 |
0x4e6d6b VMOVUPD %ZMM11,(%R12,%RAX,1) | 0x4e73da VMOVUPD %ZMM11,-0x4a64(%RIP) 0x4e73e4 NOP | 0x45bfea VMOVUPD %ZMM11,(%R12,%RAX,1) |
0x4e6d72 VMOVAPD %ZMM14,%ZMM11 | 0x4e73e5 VMOVAPD %ZMM14,%ZMM11 | 0x45bff1 VMOVAPD %ZMM14,%ZMM11 |
0x4e6d78 VFMADD213PD (%RBX,%RAX,1),%ZMM16,%ZMM5 | 0x4e73eb VFMADD213PD -0x4ab5(%RIP),%ZMM16,%ZMM5 0x4e73f5 NOP | 0x45bff7 VFMADD213PD (%RBX,%RAX,1),%ZMM16,%ZMM5 |
0x4e6d7f VADDPD %ZMM6,%ZMM8,%ZMM6 | 0x4e73f6 VADDPD %ZMM6,%ZMM8,%ZMM6 | 0x45bffe VADDPD %ZMM6,%ZMM8,%ZMM6 |
0x4e6d85 VMOVAPD %ZMM14,%ZMM8 | 0x4e73fc VMOVAPD %ZMM14,%ZMM8 | 0x45c004 VMOVAPD %ZMM14,%ZMM8 |
0x4e6d8b VMOVUPD %ZMM5,(%RBX,%RAX,1) | 0x4e7402 VMOVUPD %ZMM5,-0x4a4c(%RIP) 0x4e740c NOP | 0x45c00a VMOVUPD %ZMM5,(%RBX,%RAX,1) |
0x4e6d92 VMOVAPD %ZMM0,%ZMM5 | 0x4e740d VMOVAPD %ZMM0,%ZMM5 | 0x45c011 VMOVAPD %ZMM0,%ZMM5 |
0x4e6d98 VFMADD213PD (%RDX,%RAX,1),%ZMM10,%ZMM8 | 0x4e7413 VFMADD213PD -0x4add(%RIP),%ZMM10,%ZMM8 0x4e741d NOP | 0x45c017 VFMADD213PD (%RDX,%RAX,1),%ZMM10,%ZMM8 |
0x4e6d9f VMOVUPD %ZMM8,(%RDX,%RAX,1) | 0x4e741e VMOVUPD %ZMM8,-0x4a28(%RIP) 0x4e7428 NOP | 0x45c01e VMOVUPD %ZMM8,(%RDX,%RAX,1) |
0x4e6da6 VFMADD213PD (%R15,%RAX,1),%ZMM15,%ZMM13 | 0x4e7429 VFMADD213PD -0x4af3(%RIP),%ZMM15,%ZMM13 0x4e7433 NOP | 0x45c025 VFMADD213PD (%R15,%RAX,1),%ZMM15,%ZMM13 |
0x4e6dad VMOVUPD %ZMM13,(%R15,%RAX,1) | 0x4e7434 VMOVUPD %ZMM13,-0x49fe(%RIP) 0x4e743e NOP | 0x45c02c VMOVUPD %ZMM13,(%R15,%RAX,1) |
0x4e6db4 VFMADD213PD (%RCX,%RAX,1),%ZMM9,%ZMM11 | 0x4e743f VFMADD213PD -0x4b09(%RIP),%ZMM9,%ZMM11 0x4e7449 NOP | 0x45c033 VFMADD213PD (%RCX,%RAX,1),%ZMM9,%ZMM11 |
0x4e6dbb VMOVUPD %ZMM11,(%RCX,%RAX,1) | 0x4e744a VMOVUPD %ZMM11,-0x49d4(%RIP) 0x4e7454 NOP | 0x45c03a VMOVUPD %ZMM11,(%RCX,%RAX,1) |
0x4e6dc2 VFMADD213PD (%RDI,%RAX,1),%ZMM7,%ZMM6 | 0x4e7455 VFMADD213PD -0x4b1f(%RIP),%ZMM7,%ZMM6 0x4e745f NOP | 0x45c041 VFMADD213PD (%RDI,%RAX,1),%ZMM7,%ZMM6 |
0x4e6dc9 VMOVUPD %ZMM6,(%RDI,%RAX,1) | 0x4e7460 VMOVUPD %ZMM6,-0x49aa(%RIP) 0x4e746a NOP | 0x45c048 VMOVUPD %ZMM6,(%RDI,%RAX,1) |
0x4e6dd0 VMOVAPD %ZMM0,%ZMM6 | 0x4e746b VMOVAPD %ZMM0,%ZMM6 | 0x45c04f VMOVAPD %ZMM0,%ZMM6 |
0x4e6dd6 VFMADD213PD (%RSI,%RAX,1),%ZMM7,%ZMM0 | 0x4e7471 VFMADD213PD -0x4b3b(%RIP),%ZMM7,%ZMM0 0x4e747b NOP | 0x45c055 VFMADD213PD (%RSI,%RAX,1),%ZMM7,%ZMM0 |
0x4e6ddd VFMADD213PD (%R8,%RAX,1),%ZMM10,%ZMM6 | 0x4e747c VFMADD213PD -0x4b46(%RIP),%ZMM10,%ZMM6 0x4e7486 NOP | 0x45c05c VFMADD213PD (%R8,%RAX,1),%ZMM10,%ZMM6 |
0x4e6de4 VMOVUPD %ZMM0,(%RSI,%RAX,1) | 0x4e7487 VMOVUPD %ZMM0,-0x4991(%RIP) 0x4e7491 NOP | 0x45c063 VMOVUPD %ZMM0,(%RSI,%RAX,1) |
0x4e6deb VMOVUPD %ZMM6,(%R8,%RAX,1) | 0x4e7492 VMOVUPD %ZMM6,-0x495c(%RIP) 0x4e749c NOP | 0x45c06a VMOVUPD %ZMM6,(%R8,%RAX,1) |
0x4e6df2 VFMADD213PD (%R10,%RAX,1),%ZMM9,%ZMM5 | 0x4e749d VFMADD213PD -0x4b67(%RIP),%ZMM9,%ZMM5 0x4e74a7 NOP | 0x45c071 VFMADD213PD (%R10,%RAX,1),%ZMM9,%ZMM5 |
0x4e6df9 VMOVUPD %ZMM5,(%R10,%RAX,1) | 0x4e74a8 VMOVUPD %ZMM5,-0x4932(%RIP) 0x4e74b2 NOP | 0x45c078 VMOVUPD %ZMM5,(%R10,%RAX,1) |
0x4e6e00 VFMADD213PD (%R9,%RAX,1),%ZMM7,%ZMM14 | 0x4e74b3 VFMADD213PD -0x4b7d(%RIP),%ZMM7,%ZMM14 0x4e74bd NOP | 0x45c07f VFMADD213PD (%R9,%RAX,1),%ZMM7,%ZMM14 |
0x4e6e07 VMOVUPD %ZMM14,(%R9,%RAX,1) | 0x4e74be VMOVUPD %ZMM14,-0x4908(%RIP) 0x4e74c8 NOP | 0x45c086 VMOVUPD %ZMM14,(%R9,%RAX,1) |
0x4e6e0e ADD $0x40,%RAX | 0x4e74c9 ADD $0x40,%RAX | 0x45c08d ADD $0x40,%RAX |
0x4e6e12 CMP %RAX,0x240(%RSP) | 0x4e74cd CMP %RAX,-0x4bd4(%RIP) | 0x45c091 CMP %RAX,0x240(%RSP) |
0x4e6e1a JNE 4e6cc9 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0x8b799> | 0x4e74d4 JNE 4e7333 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0x8be03> | 0x45c099 JNE 45bf50 <_ZN16miniqmcreference19MultiBsplineEvalRef12evaluate_vghIdEEvPKN11qmcplusplus14bspline_traitsIT_Lj3EE10SplineTypeES4_S4_S4_PS4_S9_S9_m+0xa20> |
Metric | ORIG | DL1 | Original |
---|---|---|---|
FP operations per cycle L1 | 21.16 | 18.74 | 21.87 |
cycles L1 CQA | 15.50 | 17.50 | 15.00 |
cycles UFS | 15.83 | 17.80 | 15.33 |
bytes loaded | 928.00 | 920.00 | 920.00 |
bytes stored | 648.00 | 640.00 | 640.00 |
nb loads | 18.00 | 17.00 | 17.00 |
nb stores | 11.00 | 10.00 | 10.00 |
cycles dispatch | 12.50 | 12.50 | 12.50 |
cycles front end | 15.50 | 17.50 | 15.00 |
cycles P0 | 12.50 | 12.50 | 12.50 |
cycles P1 | 12.50 | 12.50 | 12.50 |
cycles P2 | 9.67 | 9.17 | 9.17 |
cycles P3 | 9.67 | 8.83 | 8.83 |
cycles P4 | 11.00 | 10.00 | 10.00 |
cycles P5 | 12.50 | 12.50 | 12.50 |
cycles P6 | 1.50 | 1.00 | 1.00 |
cycles P7 | 9.67 | 9.00 | 9.00 |
stall cycles | 0.00 | 0.00 | 0.00 |
LB full | 0.00 | 0.00 | 0.00 |
LM full | 0.00 | 0.00 | 0.00 |
PRF full | 0.00 | 0.00 | 0.00 |
PRF_FLOAT full | 0.00 | 0.00 | 0.00 |
PRF_INT full | 0.00 | 0.00 | 0.00 |
ROB full | 0.00 | 0.00 | 0.00 |
RS full | 0.00 | 0.00 | 0.00 |
SB full | 0.00 | 0.00 | 0.00 |
nb uops | 52.00 | 70.00 | 50.00 |
uops P0 | 12.50 | 12.50 | 12.50 |
uops P1 | 1.50 | 1.00 | 1.00 |
uops P2 | 9.67 | 9.17 | 9.17 |
uops P3 | 9.67 | 8.83 | 8.83 |
uops P4 | 11.00 | 10.00 | 10.00 |
uops P5 | 12.50 | 12.50 | 12.50 |
uops P6 | 1.50 | 1.00 | 1.00 |
uops P7 | 9.67 | 9.00 | 9.00 |
ID | 742 | 744 | 739 |