Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution  1.21  1.65  1.21  1.65  1.20  1.64
No Scalar Integer
  Potential Speedup                                         1.43  1.17  1.05  1.07  1.09  1.09
  Nb Loops to get 80%                                       6     6     1     2     1     2
FP Vectorised
  Potential Speedup                                         2.71  1.59  2.41  1.64  2.42  1.64
  Nb Loops to get 80%                                       7     6     6     5     6     5
Fully Vectorised
  Potential Speedup                                         3.21  2.00  3.04  1.91  3.08  1.91
  Nb Loops to get 80%                                       7     6     7     5     7     5
Only FP Arithmetic
  Potential Speedup                                         1.60  1.35  1.25  1.21  1.25  1.22
  Nb Loops to get 80%                                       6     5     1     2     1     2
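The "Potential Speedup" and "Nb Loops to get 80%" rows can be read as an Amdahl-style aggregation over loops. The sketch below is one plausible reading, not the profiler's documented formula. It assumes each loop has a coverage (fraction of run time) and a local speedup under the given scenario, and that "80%" refers to 80% of the total time saved. The names `time_saved`, `potential_speedup`, `loops_to_reach` and the sample loop data are hypothetical and do not come from the report.

```python
from typing import List, Tuple

# (coverage as a fraction of total run time, local speedup if the loop is optimised)
# Both fields are illustrative assumptions, not values taken from the report.
Loop = Tuple[float, float]


def time_saved(loop: Loop) -> float:
    """Fraction of total run time removed if this loop reaches its local speedup."""
    coverage, local_speedup = loop
    return coverage * (1.0 - 1.0 / local_speedup)


def potential_speedup(loops: List[Loop]) -> float:
    """Amdahl-style whole-application speedup if every listed loop is optimised."""
    return 1.0 / (1.0 - sum(time_saved(loop) for loop in loops))


def loops_to_reach(loops: List[Loop], share: float = 0.80) -> int:
    """Smallest number of loops, taken by decreasing time saved,
    whose cumulated saving reaches `share` of the total saving (assumed definition)."""
    savings = sorted((time_saved(loop) for loop in loops), reverse=True)
    target = share * sum(savings)
    running = 0.0
    for count, saving in enumerate(savings, start=1):
        running += saving
        if running >= target - 1e-12:  # small tolerance for float rounding
            return count
    return len(savings)


# Hypothetical loop profile used only to exercise the functions.
example_loops: List[Loop] = [(0.40, 2.0), (0.20, 4.0), (0.10, 1.25), (0.05, 2.0)]

if __name__ == "__main__":
    print(f"potential speedup: {potential_speedup(example_loops):.2f}")
    print(f"loops to reach 80% of the saving: {loops_to_reach(example_loops)}")
```

Under this reading, a low loop count next to a low potential speedup (for example 1 loop for 1.05) would mean that the small available gain is concentrated in a single loop, while a count of 6 or 7 would mean the gain is spread over several loops.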
Cumulated Speedup If No Scalar Integer
Cumulated Speedup If FP Vectorized
Cumulated Speedup If Fully Vectorized
Cumulated Speedup If Only FP Arithmetic
Loop Based Profiles
Innermost / Single Loops
Inbetween Loops
Outermost Loops
Cumulated Coverage With All Loops
Innermost Loop Based Profiles
Coverage
Count
Application Categorization
Time
Coverage
Compilation Options
Source Object
Issue
md-gcc-Ofast
  simulation.cpp
    ○

md-gcc-Ofast
  simulation.cpp
    ○ -funroll-loops is missing.

md-clang-O3-ffast-math
  simulation.cpp
    ○ -g is missing for some functions (possibly ones added by the compiler), but debug locations are available. Some analysis may be inaccurate. Try to complement -g with -grecord-gcc-switches or -frecord-command-line.
    ○ -O2, -O3 or -Ofast is missing.
    ○ -march=(target) is missing.

md-clang-O3-ffast-math
  simulation.cpp
    ○ -g is missing for some functions (possibly ones added by the compiler), but debug locations are available. Some analysis may be inaccurate. Try to complement -g with -grecord-gcc-switches or -frecord-command-line.
    ○ -O2, -O3 or -Ofast is missing.
    ○ -march=(target) is missing.

md-icpx-Ofast
  random.h
    ○
  simulation.cpp
    ○

md-icpx-Ofast
  simulation.cpp
    ○
Path Count Profiles
Coverage
Count
Low Iteration Count Profiles
Coverage
Count
Average Number of Active Threads
Run 1 - Skylake GCC Ofast - Base
Run 2 - Skylake GCC Ofast - Naive Compute Forces version
Run 3 - Skylake Clang O3-ffast-math - Base
Run 4 - Skylake Clang O3-ffast-math - Naive Compute Forces version
Run 5 - Skylake ICPX Ofast - Base
Run 6 - Skylake ICPX Ofast - Naive Compute Forces version
Experiment Summaries
r0  r1  r2  r3  r4  r5
Experiment Name
MD scalability Skylake 2-52 threads runs | Version : gcc-Ofast
MD scalability Skylake GCC Ofast 2-52 threads runs, naive compute_forces() with code opts and alignment | Version : gcc-Ofast