ID | Module | Source Location | Source Function | Level | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Coverage (% app. time) | Speedup if no scalar integer | Speedup if FP arith vectorized | Speedup if fully vectorized | Speedup if FP only | Number of paths | Vectorization Ratio (%) | Vector Length Use (%) | CQA cycles | CQA cycles if no scalar integer | CQA cycles if FP arith vectorized | CQA cycles if fully vectorized | CQA cycles if FP only | Instance Count | min (Iteration count) | avg (Iteration count) | max (Iteration count) | min (Cycles per Iteration) | avg (Cycles per Iteration) | max (Cycles per Iteration) | CAP(FP) | BW(FP) | SAT(FP) | CAP(L1R) | BW(L1R) | SAT(L1R) | CAP(L1W) | BW(L1W) | SAT(L1W) | CAP(L2) | BW(L2) | SAT(L2) | CAP(L3) | BW(L3) | SAT(L3) | CAP(RAM_R) | CAP(RAM_W) |
○Loop 831 | exec | MultiBsplineRef.hpp:70-73 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&) | Innermost | 0.29 | 0.29 | 30.21 | 1.00 | 1.20 | 2.00 | 1.20 | 1 | 100.00 | 50.00 | 6.00 | 6.00 | 5.00 | 3.00 | 5.00 | 252672 | 48 | 48 | 48 | 9.88 | 96.11 | 582759.33 | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1167 | exec | ParticleBConds.h:185-217 | void qmcplusplus::DTD_BConds::computeDistances, qmcplusplus::VectorSoAContainer >, qmcplusplus::VectorSoAContainer > >(qmcplusplus::TinyVector const&, qmcplusplus::VectorSoAContainer > const&, double*, qmcplusplus::VectorSoAContainer >&, int, int, int) const | Single | 0.22 | 0.22 | 23.44 | 1.04 | 1.70 | 2.07 | 1.28 | 1 | 92.54 | 46.22 | 48.50 | 46.50 | 28.50 | 23.44 | 38.00 | 47712 | 16 | 105.42 | 192 | 78 | 170.55 | 1747606.12 | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 840 | exec | TinyVectorOps.h:59-59,MultiBsplineData.hpp:71-71,MultiBsplineRef.hpp:249-270 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | Innermost | 0.19 | 0.19 | 20.31 | 1.05 | 1.29 | 2.00 | 1.12 | 1 | 100.00 | 50.00 | 25.75 | 24.50 | 20.00 | 12.88 | 23.00 | 73728 | 48 | 48 | 48 | 36.12 | 178.73 | 583159.04 | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 812 | exec | BsplineAllocator.hpp:179-180 | qmcplusplus::BsplineAllocator >::setCoefficientsForOrbitals(int, int, Array&, multi_UBspline_3d_d*) [clone .extracted] | Innermost | 0.01 | 0.01 | 1.56 | 1.00 | 1.25 | 2.00 | 1.25 | 1 | 100.00 | 50.00 | 1.25 | 1.25 | 1.00 | 0.63 | 1.00 | 64000 | 96 | 96 | 96 | 5.17 | 40.83 | 306568.6 | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1469 | exec | | __intel_avx_rep_memset | Single | 0.01 | 0.01 | 1.56 | 1.00 | 1.00 | 2.00 | 8.00 | 1 | 100.00 | 50.00 | 8.00 | 8.00 | 8.00 | 4.00 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1246 | exec | NewTimer.h:119-121,stl_tree.h:782-1952 | std::map, double, std::less >, std::allocator const, double> > >::operator[](qmcplusplus::StackKeyParam<2> const&) | Single | 0.01 | 0.01 | 1.56 | 1.00 | 1.00 | 8.00 | 2.00 | 5 | 0.00 | 12.50 | 5.00 | 5.00 | 5.00 | 0.63 | 2.50 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 302 | exec | BsplineFunctor.h:236-241 | qmcplusplus::BsplineFunctor::evaluateV(int, int, int, double const*, double*) const | Single | 0.01 | 0.01 | 1.04 | 1.22 | 1.00 | 3.08 | 2.00 | 1 | 87.88 | 38.92 | 22.00 | 18.00 | 22.00 | 7.14 | 11.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 930 | exec | inner_product.hpp:210-211 | qmcplusplus::DiracMatrix::invert_transpose(qmcplusplus::Matrix > const&, qmcplusplus::Matrix >&, double&, double&) | Innermost | 0 | 0 | 0.52 | 1.07 | 1.00 | 2.90 | 3.75 | 1 | 85.71 | 41.07 | 3.75 | 3.50 | 3.75 | 1.29 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 240 | exec | OneBodyJastrowRef.h:196-197 | miniqmcreference::OneBodyJastrowRef >::ratioGrad(qmcplusplus::ParticleSet&, int, qmcplusplus::TinyVector&) | Single | 0 | 0 | 0.52 | 1.00 | 1.54 | 2.00 | 1.00 | 1 | 100.00 | 50.00 | 4.00 | 4.00 | 2.60 | 2.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1187 | exec | ostream:667-667,Tensor.h:213-213,OperatorTags.h:43-183,char_traits.h:409-409,ParticleIOUtility.h:70-91,OhmmsVector.h:223-223,TinyVectorTensorOps.h:150-152,InfoStream.h:37-37 | void qmcplusplus::expandSuperCell(qmcplusplus::ParticleSet&, qmcplusplus::Tensor const&) | Innermost | 0 | 0 | 0.52 | 1.58 | 1.69 | 7.72 | 2.74 | 8 | 42.62 | 17.42 | 26.00 | 16.50 | 15.38 | 3.37 | 9.50 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 836 | exec | MultiBsplineRef.hpp:284-295 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | Innermost | 0 | 0 | 0.52 | 1.00 | 1.00 | 2.00 | 2.00 | 1 | 100.00 | 50.00 | 18.00 | 18.00 | 18.00 | 9.00 | 9.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1173 | exec | DistanceTableBA.h:99-99,ParticleBConds.h:249-278 | qmcplusplus::DistanceTableBA::evaluate(qmcplusplus::ParticleSet&) | Innermost | 0 | 0 | 0.52 | 1.02 | 1.44 | 5.19 | 1.33 | 1 | 36.84 | 18.88 | 30.50 | 30.00 | 21.13 | 5.88 | 23.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 834 | exec | MultiBsplineEvalHelper.hpp:47-49,TinyVectorOps.h:59-59,VectorSoAContainer.h:237-237,MultiBsplineData.hpp:68-79,MultiBsplineRef.hpp:193-295,einspline_spo_ref.hpp:206-208,stl_algobase.h:238-931,stl_vector.h:1126-1258 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | Outermost | 0 | 0 | 0.52 | 1.52 | 1.40 | 6.85 | 2.84 | 32 | 35.91 | 17.88 | 102.25 | 67.25 | 73.25 | 14.94 | 36.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 325 | exec | TwoBodyJastrowRef.h:340-345 | miniqmcreference::TwoBodyJastrowRef >::acceptMove(qmcplusplus::ParticleSet&, int) | Single | 0 | 0 | 0.52 | 1.00 | 1.33 | 2.00 | 1.00 | 1 | 100.00 | 50.00 | 4.00 | 4.00 | 3.00 | 2.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1468 | exec | | __intel_avx_rep_memcpy | Single | 0 | 0 | 0.52 | 1.00 | 1.00 | 2.00 | 8.00 | 1 | 100.00 | 50.00 | 8.00 | 8.00 | 8.00 | 4.00 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 957 | exec | OperatorTags.h:63-63,inner_product.hpp:81-82,DiracDeterminantRef.cpp:157-157 | miniqmcreference::DiracDeterminantRef >::evaluateGL(qmcplusplus::ParticleSet&, qmcplusplus::ParticleAttrib, std::allocator > >&, qmcplusplus::ParticleAttrib >&, bool) | Innermost | 0 | 0 | 0.52 | 1.00 | 2.00 | 6.86 | 1.00 | 1 | 25.00 | 15.63 | 8.00 | 8.00 | 4.00 | 1.17 | 8.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 955 | exec | inner_product.hpp:81-82 | miniqmcreference::DiracDeterminantRef >::ratio(qmcplusplus::ParticleSet&, int) | Single | 0 | 0 | 0.52 | 1.00 | 1.54 | 2.00 | 1.00 | 1 | 100.00 | 50.00 | 4.00 | 4.00 | 2.60 | 2.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 833 | exec | TinyVector.h:146-146,OperatorTags.h:183-183,einspline_spo_ref.hpp:223-227 | miniqmcreference::einspline_spo_ref::evaluate(qmcplusplus::ParticleSet const&, int, qmcplusplus::Vector >&, qmcplusplus::Vector, std::allocator > >&, qmcplusplus::Vector >&) | Innermost | 0 | 0 | 0.52 | 1.00 | 1.00 | 8.00 | 5.00 | 1 | 0.00 | 12.50 | 5.00 | 5.00 | 5.00 | 0.63 | 1.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 1244 | exec | NewTimer.cpp:99-100 | qmcplusplus::TimerType::stop() | Single | 0 | 0 | 0.52 | 1.00 | 1.00 | 16.00 | 1.00 | 1 | 0.00 | 6.25 | 2.00 | 2.00 | 2.00 | 0.13 | 2.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
○Loop 323 | exec | TwoBodyJastrowRef.h:340-345 | miniqmcreference::TwoBodyJastrowRef >::acceptMove(qmcplusplus::ParticleSet&, int) | Single | 0 | 0 | 0.52 | 1.00 | 1.33 | 2.00 | 1.00 | 1 | 100.00 | 50.00 | 4.00 | 4.00 | 3.00 | 2.00 | 4.00 | NA | NA | NA | NA | NA | NA | NA | NA | 16 | NA | NA | 64 | NA | NA | 32 | NA | NA | 32 | NA | NA | 15 | NA | NA | NA |
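
Note on reading the speedup columns: in the rows above, each "Speedup if ..." value appears to equal the baseline "CQA cycles" divided by the matching "CQA cycles if ..." value. For example, Loop 831 has 6.00 / 3.00 = 2.00 for the fully vectorized case. This relation is inferred from the data in this table, not taken from the tool's documentation. The sketch below checks it for three loops; the names `ROWS` and `max_deviation` are hypothetical, introduced only for illustration.

```python
# Values copied from the table for three loops: baseline CQA cycles, then
# (projected CQA cycles, reported speedup) for each what-if scenario.
ROWS = {
    "Loop 831": (6.00, {
        "no scalar integer": (6.00, 1.00),
        "FP arith vectorized": (5.00, 1.20),
        "fully vectorized": (3.00, 2.00),
        "FP only": (5.00, 1.20),
    }),
    "Loop 1167": (48.50, {
        "no scalar integer": (46.50, 1.04),
        "FP arith vectorized": (28.50, 1.70),
        "fully vectorized": (23.44, 2.07),
        "FP only": (38.00, 1.28),
    }),
    "Loop 840": (25.75, {
        "no scalar integer": (24.50, 1.05),
        "FP arith vectorized": (20.00, 1.29),
        "fully vectorized": (12.88, 2.00),
        "FP only": (23.00, 1.12),
    }),
}


def max_deviation(base_cycles, scenarios):
    """Largest |base / projected - reported speedup| over all scenarios.

    Assumption (inferred from the table, not from the tool's docs):
    speedup = baseline CQA cycles / projected CQA cycles.
    """
    return max(
        abs(base_cycles / cycles - reported)
        for cycles, reported in scenarios.values()
    )


if __name__ == "__main__":
    for name, (base, scenarios) in ROWS.items():
        # A deviation below 0.01 is consistent with two-decimal rounding.
        print(name, f"{max_deviation(base, scenarios):.4f}")
```

For rows whose projected cycle counts are very small, the two-decimal rounding in the table dominates. Loop 1244, for instance, lists 2.00 and 0.13 cycles against a reported speedup of 16.00, so recomputing from the rounded figures gives a visibly different ratio.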