Runs: r0 = OMP1, r1 = OMP2, r2 = OMP4, r3 = OMP8, r4 = OMP16, r5 = OMP32
Metric                               |      r0 |     r1 |     r2 |     r3 |     r4 |     r5
-------------------------------------|---------|--------|--------|--------|--------|-------
Total Time (s)                       |  1.16E3 | 660.97 | 403.38 | 277.98 | 216.25 | 190.32
Max (Thread Active Time) (s)         |  1.09E3 | 621.67 | 374.46 | 253.64 | 194.44 | 170.90
Average Active Time (s)              |  1.09E3 | 575.45 | 313.90 | 185.90 | 122.87 |  96.44
Activity Ratio (%)                   |    93.9 |   87.1 |   77.9 |   67.0 |   57.0 |   50.8
Average number of active threads     |   5.635 | 10.447 | 18.676 | 32.099 | 54.548 | 97.291
Affinity Stability (%)               |    65.9 |   71.3 |   87.0 |   91.1 |   92.9 |   91.9
GFLOPS                               | 577.119 | 1.01E3 | 1.66E3 | 2.41E3 | 3.10E3 | 3.52E3
Time in analyzed loops (%)           |    90.0 |   85.8 |   80.5 |   71.5 |   60.1 |   51.5
Time in analyzed innermost loops (%) |    88.9 |   84.7 |   79.4 |   70.5 |   59.2 |   50.8
Time in user code (%)                |    7.76 |   7.36 |   6.69 |   5.65 |   4.29 |   2.77
Compilation Options Score (%)        |     100 |    100 |    100 |    100 |    100 |    100
Array Access Efficiency (%)          |    51.0 |   50.8 |   50.7 |   50.7 |   50.7 |   50.7
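Two derived rows in the global metrics above can be cross-checked from the raw timings. In this transcription, Average number of active threads reproduces as threads × (Average Active Time / Total Time), and Activity Ratio agrees with Average Active Time / Total Time to within about 0.2 points (the residual presumably comes from rounding in the exported figures). A minimal sketch, with the E-notation values expanded by hand:

```python
# Values transcribed from the metrics table (1.16E3 -> 1160.0, etc.).
threads    = [6, 12, 24, 48, 96, 192]                    # threads observed, r0..r5
total_time = [1160.0, 660.97, 403.38, 277.98, 216.25, 190.32]
avg_active = [1090.0, 575.45, 313.90, 185.90, 122.87, 96.44]

# Activity Ratio (%) ~ Average Active Time / Total Time
activity_ratio = [100.0 * a / t for a, t in zip(avg_active, total_time)]

# Average number of active threads = observed threads x activity ratio
avg_active_threads = [n * r / 100.0 for n, r in zip(threads, activity_ratio)]

# activity_ratio lands within ~0.2 points of the table's 93.9 .. 50.8;
# avg_active_threads matches the table row (5.635 .. 97.291) almost exactly.
print([round(x, 3) for x in avg_active_threads])
```

The near-exact match on the active-thread row suggests both rows are computed from the same per-thread activity data.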
Potential Speedups                                         |   r0 |   r1 |   r2 |   r3 |   r4 |   r5
-----------------------------------------------------------|------|------|------|------|------|-----
Perfect Flow Complexity                                    | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Perfect OpenMP + MPI + Pthread                             | 1.04 | 1.10 | 1.16 | 1.23 | 1.32 | 1.35
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.05 | 1.19 | 1.40 | 1.81 | 2.51 | 3.32
Scalability - Gap                                          | 1.00 | 1.14 | 1.39 | 1.92 | 2.98 | 5.25
No Scalar Integer - Potential Speedup                      | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
No Scalar Integer - Nb Loops to get 80%                    |    2 |    2 |    2 |    2 |    2 |    2
FP Vectorised - Potential Speedup                          | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
FP Vectorised - Nb Loops to get 80%                        |    1 |    1 |    1 |    1 |    1 |    1
Fully Vectorised - Potential Speedup                       | 1.03 | 1.02 | 1.02 | 1.02 | 1.01 | 1.01
Fully Vectorised - Nb Loops to get 80%                     |    4 |    3 |    3 |    3 |    3 |    3
Only FP Arithmetic - Potential Speedup                     | 1.04 | 1.03 | 1.03 | 1.03 | 1.02 | 1.02
Only FP Arithmetic - Nb Loops to get 80%                   |    5 |    5 |    5 |    5 |    5 |    5
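The Scalability - Gap row is consistent with a simple reading (a cross-check against the transcribed numbers, not MAQAO's documented formula): each run's Total Time divided by the time r0 would take under ideal linear scaling to that run's thread count.

```python
# Total Time per run, transcribed from the metrics table (1.16E3 -> 1160.0).
threads    = [6, 12, 24, 48, 96, 192]
total_time = [1160.0, 660.97, 403.38, 277.98, 216.25, 190.32]

t0, n0 = total_time[0], threads[0]
# gap = actual time / ideally scaled time (r0's time shrinking linearly with threads)
gap = [t / (t0 * n0 / n) for t, n in zip(total_time, threads)]

print([round(g, 2) for g in gap])  # [1.0, 1.14, 1.39, 1.92, 2.98, 5.25]
```

These match the table row, i.e. at 192 threads the run is 5.25x slower than perfect linear scaling from the 6-thread baseline would predict.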
Source Object / Issue listing for xhpl: HPL_lmul.c, HPL_rand.c, HPL_dlaswp02N.c, HPL_bcast.c, HPL_dlaswp04N.c, HPL_1ring.c, HPL_setran.c, HPL_ladd.c, HPL_dlaswp03N.c, HPL_pdgesv0.c, HPL_pdlange.c, HPL_dlaswp01N.c (Issue column empty in this export).
Experiment summary (r0-r5; values identical across all runs unless noted)

Experiment Name: (blank in report)
Application: ./hpl-2.3/bin/Linux_Intel64/xhpl
Timestamp: 2025-06-23 09:35:51
Experiment Type: MPI (r0); MPI + OpenMP (r1-r5)
Machine: isix06.benchmarkcenter.megware.com
Architecture: x86_64
Micro Architecture: GRANITE_RAPIDS
Model Name: Intel(R) Xeon(R) 6972P
Cache Size: 491520 KB
Number of Cores: 96
Maximal Frequency: 3.9 GHz
OS Version: Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 7 17:08:27 EST 2025
Architecture used during static analysis: x86_64
Micro Architecture used during static analysis: GRANITE_RAPIDS
Compilation Options (xhpl): clang-based Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008): --intel -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include/Linux_Intel64 -I /cluster/intel/oneapi/2025.0.0/mkl/2025.0/mkl/include -I /cluster/intel/oneapi/2025.0.0/mpi/2021.14/include -o HPL_lmul.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -O3 -g -x Host -mprefer-vector-width=512 -Wall -fstrict-aliasing ../HPL_lmul.c -fveclib=SVML -fheinous-gnu-extensions
Number of processes observed: 6
Number of threads observed: 6 (r0), 12 (r1), 24 (r2), 48 (r3), 96 (r4), 192 (r5)
Frequency Driver: intel_pstate
Frequency Governor: powersave
Huge Pages: always
Hyperthreading: on
Number of sockets: 2
Number of cores per socket: 96
MAQAO version: 2025.1.0
MAQAO build: b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805
Comments: HPL benchmark compiled with Intel oneAPI 2025.0, using Intel MPI and MKL. Matrix order: 100K, block size 384. Run on Intel GNR with 6 NUMA nodes and 32 cores per NUMA node.
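Since all six runs use 6 MPI ranks (one per NUMA node, per the comments) while total observed threads double from 6 to 192, the run labels OMP1..OMP32 appear to denote OpenMP threads per rank. A hedged sketch deriving that mapping and the GFLOPS scaling efficiency relative to r0 (GFLOPS values expanded from the table's E-notation, so figures are approximate):

```python
ranks = 6                                       # "Number of processes observed"
threads_total = [6, 12, 24, 48, 96, 192]        # "Number of threads observed", r0..r5
gflops = [577.119, 1010.0, 1660.0, 2410.0, 3100.0, 3520.0]  # 1.01E3 -> 1010.0, etc.

# OpenMP threads per MPI rank, matching the run labels OMP1..OMP32
threads_per_rank = [t // ranks for t in threads_total]

# Throughput speedup over r0, and efficiency vs. linear scaling in thread count
speedup     = [g / gflops[0] for g in gflops]
scaling_eff = [100.0 * s * threads_total[0] / t
               for s, t in zip(speedup, threads_total)]

print(threads_per_rank)  # [1, 2, 4, 8, 16, 32]
```

At 32 threads per rank the throughput speedup is about 6.1x for 32x the threads, i.e. roughly 19% scaling efficiency, consistent with the falling Activity Ratio and the 5.25 Scalability Gap above.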