Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution   1.21   1.21   1.20
No Scalar Integer      Potential Speedup                      1.43   1.05   1.09
                       Nb Loops to get 80%                    6      1      1
FP Vectorised          Potential Speedup                      2.71   2.41   2.42
                       Nb Loops to get 80%                    7      6      6
Fully Vectorised       Potential Speedup                      3.21   3.04   3.08
                       Nb Loops to get 80%                    7      7      7
Only FP Arithmetic     Potential Speedup                      1.60   1.25   1.25
                       Nb Loops to get 80%                    6      1      1
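The "Nb Loops to get 80%" figures above can be read as the number of loops that must be optimized, taken in order of decreasing gain, before the cumulated speedup reaches 80% of the reported potential speedup. The sketch below illustrates that reading only; the report does not give the tool's exact formula, so the definition is an assumption, and the function names and per-loop time savings are hypothetical. Under this reading, any potential speedup of 1.25 or less gives a target of at most 1.0, which is consistent with the entries above that report a single loop.

```python
def cumulated_speedups(total_time, loop_savings):
    """Return the speedup after optimizing the k most profitable loops, k = 1..n.

    total_time   -- application run time (hypothetical units)
    loop_savings -- time saved by optimizing each loop (hypothetical values)
    """
    if sum(loop_savings) >= total_time:
        raise ValueError("savings must be smaller than the total run time")
    speedups = []
    remaining = total_time
    # Optimize the loops with the largest savings first.
    for saved in sorted(loop_savings, reverse=True):
        remaining -= saved
        speedups.append(total_time / remaining)
    return speedups


def loops_to_reach(total_time, loop_savings, fraction=0.8):
    """Smallest number of loops whose cumulated speedup reaches
    `fraction` of the full potential speedup (assumed definition)."""
    speedups = cumulated_speedups(total_time, loop_savings)
    if not speedups:
        return 0
    target = fraction * speedups[-1]
    for k, speedup in enumerate(speedups, start=1):
        if speedup >= target:
            return k
    return len(speedups)


# Hypothetical example: 100 time units, five loops with decreasing savings.
# Full potential speedup is 100 / 60, so the 80% target is about 1.33.
print(loops_to_reach(100.0, [20.0, 10.0, 5.0, 3.0, 2.0]))  # prints 2
```

A greedy ordering by decreasing savings is used here because it reaches any target with the fewest loops; a tool could rank loops differently.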
[Charts: Cumulated Speedup If No Scalar Integer; Cumulated Speedup If FP Vectorized; Cumulated Speedup If Fully Vectorized; Cumulated Speedup If Only FP Arithmetic]
Loop Based Profiles
Innermost / Single Loops
Inbetween Loops
Outermost Loops
Cumulated Coverage With All Loops
Innermost Loop Based Profiles
Coverage
Count
Application Categorization
Time
Coverage
Compilation Options
md-gcc-Ofast
  Source Object     Issue
  simulation.cpp    (none listed)

md-clang-O3-ffast-math
  Source Object     Issue
  simulation.cpp    -g is missing for some functions (possibly ones added by the compiler), but debug locations are available. Some analysis may be inaccurate. Try to complement -g with -grecord-gcc-switches or -frecord-command-line.
                    -O2, -O3 or -Ofast is missing.
                    -march=(target) is missing.

md-icpx-Ofast
  Source Object     Issue
  random.h          (none listed)
  simulation.cpp    (none listed)
Path Count Profiles
Coverage
Count
Low Iteration Count Profiles
Coverage
Count
Average Number of Active Threads
Run 1 - Skylake GCC Ofast
Run 2 - Skylake Clang O3-ffast-math
Run 3 - Skylake ICPX Ofast
Experiment Summaries
r0
r1
r2
Experiment Name
MD scalability Skylake 2-52 threads runs | Version : gcc-Ofast