ID | Module | Source Location | Source Function | Level | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Coverage (% app. time) | Speedup if no scalar integer | Speedup if FP arith vectorized | Speedup if fully vectorized | Speedup if FP only | Number of paths | Vectorization Ratio (%) | Vector Length Use (%) | ORIG / DL1 | DL1/CQA(DL1) | ORIG (cycles per iteration) | STA (ORIG) | DL1 (cycles per iteration) | STA (DL1) | CQA cycles | CQA cycles if no scalar integer | CQA cycles if FP arith vectorized | CQA cycles if fully vectorized | CQA cycles if FP only | Instance Count | min (Iteration count) | avg (Iteration count) | max (Iteration count) | min (Cycles per Iteration) | avg (Cycles per Iteration) | max (Cycles per Iteration) | Nb FP_ADD / CPI | Nb FP_MUL / CPI | CAP(FP) | BW(FP) | SAT(FP) | CAP(L1R) | BW(L1R) | SAT(L1R) | CAP(L1W) | BW(L1W) | SAT(L1W) | CAP(L2) | BW(L2) | SAT(L2) | CAP(L3) | BW(L3) | SAT(L3) | CAP(RAM_R) | CAP(RAM_W) |
▼Loop 672– | exec | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | Innermost | 0.22 | 0.22 | 33.09 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | 7.45 | 2.02 | 52.79 | 0.34 | 7.08 | 0.52 | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | 252672 | 48 | 48 | 48 | 5.67 | 35.46 | 1806.54 | 0.00 | 0.15 | 1.36 | 16 | 8.52 | 6.06 | 64 | 9.47 | 1.21 | 32 | 3.79 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 6 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 75.37 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | 7.45 | 2.02 | 52.79 | 0.34 | 7.08 | 0.52 | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | 0.00 | 0.15 | 1.36 | 16 | 8.52 | 6.06 | 64 | 9.47 | 1.21 | 32 | 3.79 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 5 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 18.93 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | 4.63 | 2.05 | 33.21 | 1.22 | 7.17 | 0.52 | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | 0.00 | 0.24 | 2.17 | 16 | 13.55 | 9.64 | 64 | 15.06 | 1.93 | 32 | 6.02 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 7 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 2.84 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | 7.49 | 2.07 | 54.29 | 0.44 | 7.25 | 0.54 | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | 0.00 | 0.15 | 1.33 | 16 | 8.29 | 5.89 | 64 | 9.21 | 1.18 | 32 | 3.68 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 4 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 1.89 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | 2.14 | 2.02 | 15.13 | 0.57 | 7.08 | 0.50 | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | 0.00 | 0.53 | 4.76 | 16 | 29.75 | 21.16 | 64 | 33.06 | 4.23 | 32 | 13.22 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 3 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 0.70 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 9 | | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | | | | 0.19 | 1.00 | 1.00 | 1.00 | 1.30 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 3.25 | 3.25 | 3.25 | 3.25 | 2.50 | | | | | | | | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
▼Loop 679– | exec | TinyVectorOps.h:59-59,MultiBsplineData.hpp:71-71,MultiBsplineRef.hpp:249-270 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | Innermost | 0.16 | 0.16 | 23.53 | 1.10 | 1.00 | 1.00 | 1.17 | 1 | 100.00 | 100.00 | 5.10 | 1.20 | 97.83 | 0.26 | 19.17 | 0.08 | 13.50 | 12.25 | 13.50 | 13.50 | 11.50 | 73728 | 48 | 48 | 48 | 28.08 | 71.46 | 787.88 | 0.00 | 0.33 | 3.43 | 16 | 21.47 | 9.24 | 64 | 14.44 | 6.54 | 32 | 20.44 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 7 | | TinyVectorOps.h:59-59,MultiBsplineData.hpp:71-71,MultiBsplineRef.hpp:249-270 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | | | | 72.31 | 1.10 | 1.00 | 1.00 | 1.17 | 1 | 100.00 | 100.00 | 5.10 | 1.20 | 97.83 | 0.26 | 19.17 | 0.08 | 13.50 | 12.25 | 13.50 | 13.50 | 11.50 | | | | | | | | 0.00 | 0.33 | 3.43 | 16 | 21.47 | 9.24 | 64 | 14.44 | 6.54 | 32 | 20.44 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 6 | | TinyVectorOps.h:59-59,MultiBsplineData.hpp:71-71,MultiBsplineRef.hpp:249-270 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | | | | 27.33 | 1.10 | 1.00 | 1.00 | 1.17 | 1 | 100.00 | 100.00 | 3.24 | 1.20 | 61.96 | 0.20 | 19.13 | 0.05 | 13.50 | 12.25 | 13.50 | 13.50 | 11.50 | | | | | | | | 0.00 | 0.52 | 5.42 | 16 | 33.90 | 14.59 | 64 | 22.80 | 10.33 | 32 | 32.28 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 9 | | TinyVectorOps.h:59-59,MultiBsplineData.hpp:71-71,MultiBsplineRef.hpp:249-270 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | | | | 0.23 | 1.10 | 1.00 | 1.00 | 1.17 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 13.50 | 12.25 | 13.50 | 13.50 | 11.50 | | | | | | | | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
▼Loop 971– | exec | ParticleBConds.h:185-217 | void qmcplusplus::DTD_BConds::computeDistances, qmcplusplus::VectorSoAContainer >, qmcplusplus::VectorSoAContainer > >(qmcplusplus::TinyVector const&, qmcplusplus::VectorSoAContainer > const&, double*, qmcplusplus::VectorSoAContainer >&, int, int, int) const | Single | 0.07 | 0.07 | 11.03 | 1.03 | 1.00 | 1.01 | 1.44 | 1 | 90.91 | 89.22 | 1.10 | 1.59 | 92.44 | 0.02 | 84.15 | 0.02 | 56.00 | 54.50 | 56.00 | 55.50 | 39.00 | 47712 | 8 | 52.71 | 96 | 84.5 | 89.42 | 1615.5 | 2.34 | 1.47 | 9.26 | 16 | 57.89 | 18.35 | 64 | 28.67 | 2.77 | 32 | 8.65 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 7 | | ParticleBConds.h:185-217 | void qmcplusplus::DTD_BConds::computeDistances, qmcplusplus::VectorSoAContainer >, qmcplusplus::VectorSoAContainer > >(qmcplusplus::TinyVector const&, qmcplusplus::VectorSoAContainer > const&, double*, qmcplusplus::VectorSoAContainer >&, int, int, int) const | | | | 98.69 | 1.03 | 1.00 | 1.01 | 1.44 | 1 | 90.91 | 89.22 | 1.10 | 1.59 | 92.44 | 0.02 | 84.15 | 0.02 | 56.00 | 54.50 | 56.00 | 55.50 | 39.00 | | | | | | | | 2.34 | 1.47 | 9.26 | 16 | 57.89 | 18.35 | 64 | 28.67 | 2.77 | 32 | 8.65 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 8 | | ParticleBConds.h:185-217 | void qmcplusplus::DTD_BConds::computeDistances, qmcplusplus::VectorSoAContainer >, qmcplusplus::VectorSoAContainer > >(qmcplusplus::TinyVector const&, qmcplusplus::VectorSoAContainer > const&, double*, qmcplusplus::VectorSoAContainer >&, int, int, int) const | | | | 1.07 | 1.03 | 1.00 | 1.01 | 1.44 | 1 | 90.91 | 89.22 | 1.75 | 1.75 | 108.00 | 0.17 | 92.50 | 0.11 | 56.00 | 54.50 | 56.00 | 55.50 | 39.00 | | | | | | | | 2.00 | 1.26 | 7.93 | 16 | 49.55 | 15.70 | 64 | 24.54 | 2.37 | 32 | 7.41 | NA | 32 | NA | NA | 15 | NA | NA | NA |
▼Loop 654– | exec | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | Innermost | 0.02 | 0.02 | 2.94 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 11337.47 | 2.14 | 36374.38 | 0.06 | 3.21 | 0.57 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | 64000 | 48 | 48 | 48 | 3.08 | 62.86 | 37258.17 | 0.00 | 0.00 | 0.00 | 16 | 0.00 | 0.00 | 64 | 0.00 | 0.00 | 32 | 0.01 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 12 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 80.92 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 11337.47 | 2.14 | 36374.38 | 0.06 | 3.21 | 0.57 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | 0.00 | 0.00 | 0.00 | 16 | 0.00 | 0.00 | 64 | 0.00 | 0.00 | 32 | 0.01 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 4 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 8.72 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 2.53 | 3.14 | 11.92 | 0.46 | 4.71 | 0.40 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | 0.00 | 0.67 | 0.67 | 16 | 4.20 | 5.37 | 64 | 8.39 | 5.37 | 32 | 16.78 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 5 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 4.84 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 5.89 | 3.03 | 26.75 | 1.66 | 4.54 | 0.35 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | 0.00 | 0.30 | 0.30 | 16 | 1.87 | 2.39 | 64 | 3.74 | 2.39 | 32 | 7.48 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 3 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 3.49 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 1.89 | 2.97 | 8.42 | 0.05 | 4.46 | 0.32 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | 0.00 | 0.95 | 0.95 | 16 | 5.94 | 7.60 | 64 | 11.88 | 7.60 | 32 | 23.76 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 8 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 1.44 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | 40.57 | 2.97 | 180.88 | 0.04 | 4.46 | 0.32 | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | 0.00 | 0.04 | 0.04 | 16 | 0.28 | 0.35 | 64 | 0.55 | 0.35 | 32 | 1.11 | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 6 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 0.15 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Bucket 2 | | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | | | | 0.02 | 1.00 | 1.00 | 1.00 | 1.25 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 1.25 | 1.25 | 1.25 | 1.25 | 1.00 | | | | | | | | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 230 | exec | BsplineFunctor.h:236-241 | qmcplusplus::BsplineFunctor::evaluateV(int, int, int, double const*, double*) const | Single | 0.01 | 0.01 | 2.21 | 1.15 | 1.00 | 1.53 | 11.50 | 2 | 92.68 | 71.67 | NA | NA | NA | NA | NA | NA | 11.50 | 10.00 | 11.50 | 7.53 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 780 | exec | OperatorTags.h:63-63,inner_product.hpp:81-82,DiracDeterminantRef.cpp:157-157 | miniqmcreference::DiracDeterminantRef >::evaluateGL(qmcplusplus::ParticleSet&, qmcplusplus::ParticleAttrib, std::allocator > >&, qmcplusplus::ParticleAttrib >&, bool) | Innermost | 0.01 | 0.01 | 1.47 | 1.00 | 2.00 | 6.86 | 1.00 | 1 | 25.00 | 15.63 | NA | NA | NA | NA | NA | NA | 8.00 | 8.00 | 4.00 | 1.17 | 8.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 228 | exec | BsplineFunctor.h:246-260 | qmcplusplus::BsplineFunctor::evaluateV(int, int, int, double const*, double*) const | Single | 0.01 | 0.01 | 1.47 | 1.03 | 1.00 | 1.00 | 1.06 | 1 | 100.00 | 89.39 | NA | NA | NA | NA | NA | NA | 17.00 | 16.50 | 17.00 | 17.00 | 16.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 991 | exec | ostream:667-667,Tensor.h:213-213,OperatorTags.h:43-183,char_traits.h:409-409,ParticleIOUtility.h:70-91,OhmmsVector.h:223-223,TinyVectorTensorOps.h:150-152,InfoStream.h:37-37 | void qmcplusplus::expandSuperCell(qmcplusplus::ParticleSet&, qmcplusplus::Tensor const&) | Innermost | 0 | 0 | 0.74 | 1.58 | 1.69 | 7.72 | 2.74 | 8 | 42.62 | 17.32 | NA | NA | NA | NA | NA | NA | 26.00 | 16.50 | 15.38 | 3.37 | 9.50 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1263 | exec | | __intel_avx_rep_memcpy | Single | 0 | 0 | 0.74 | 1.00 | 1.00 | 2.00 | 8.00 | 1 | 100.00 | 50.00 | NA | NA | NA | NA | NA | NA | 8.00 | 8.00 | 8.00 | 4.00 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 241 | exec | TwoBodyJastrowRef.h:153-154 | miniqmcreference::TwoBodyJastrowRef >::ratioGrad(qmcplusplus::ParticleSet&, int, qmcplusplus::TinyVector&) | Innermost | 0 | 0 | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1264 | exec | | __intel_avx_rep_memset | Single | 0 | 0 | 0.74 | 1.00 | 1.00 | 2.00 | 8.00 | 1 | 100.00 | 50.00 | NA | NA | NA | NA | NA | NA | 8.00 | 8.00 | 8.00 | 4.00 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 778 | exec | inner_product.hpp:81-82 | miniqmcreference::DiracDeterminantRef >::ratio(qmcplusplus::ParticleSet&, int) | Single | 0 | 0 | 0.74 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 100.00 | 100.00 | NA | NA | NA | NA | NA | NA | 4.00 | 4.00 | 4.00 | 4.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
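
The four "Speedup if ..." columns in the table above appear to equal the baseline "CQA cycles" value divided by the matching "CQA cycles if ..." value. This reading is inferred from the numbers, not taken from the tool's documentation. The Python sketch below checks it against a few rows copied from the table; the names `ROWS`, `TOLERANCE`, `derived_speedups`, and `max_deviation` are hypothetical, and the tolerance is an assumed allowance for the two-decimal rounding of the reported values.

```python
# Rows copied from the table above: (loop, reported speedups, CQA cycles, what-if cycles).
# Tuple order: no scalar integer, FP arith vectorized, fully vectorized, FP only.
ROWS = [
    ("Loop 672", (1.00, 1.00, 1.00, 1.30), 3.25, (3.25, 3.25, 3.25, 2.50)),
    ("Loop 679", (1.10, 1.00, 1.00, 1.17), 13.50, (12.25, 13.50, 13.50, 11.50)),
    ("Loop 971", (1.03, 1.00, 1.01, 1.44), 56.00, (54.50, 56.00, 55.50, 39.00)),
    ("Loop 230", (1.15, 1.00, 1.53, 11.50), 11.50, (10.00, 11.50, 7.53, 1.00)),
    ("Loop 780", (1.00, 2.00, 6.86, 1.00), 8.00, (8.00, 4.00, 1.17, 8.00)),
    ("Loop 991", (1.58, 1.69, 7.72, 2.74), 26.00, (16.50, 15.38, 3.37, 9.50)),
]

# Assumed tolerance: the table rounds both the cycles and the speedups to two decimals.
TOLERANCE = 0.03


def derived_speedups(base_cycles, what_if_cycles):
    """Return base_cycles / what-if cycles for each what-if scenario."""
    return tuple(base_cycles / cycles for cycles in what_if_cycles)


def max_deviation(rows):
    """Largest absolute gap between a reported speedup and the derived ratio."""
    worst = 0.0
    for _loop, reported, base, what_if in rows:
        for shown, derived in zip(reported, derived_speedups(base, what_if)):
            worst = max(worst, abs(shown - derived))
    return worst


if __name__ == "__main__":
    for loop, reported, base, what_if in ROWS:
        derived = ", ".join(f"{s:.2f}" for s in derived_speedups(base, what_if))
        print(f"{loop}: reported {reported}, derived ({derived})")
    print("max deviation:", round(max_deviation(ROWS), 4))
```

If the relation holds, a reported speedup of 1.00 means that what-if scenario leaves the estimated cycle count unchanged for that loop.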