Metric | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
---|---|---|---|---|---|---|---|---|---|
Total Time (s) | 54.06 | 49.38 | 47.97 | 47.38 | 48.72 | 48.83 | 49.97 | 38.57 | |
Max (Thread Active Time) (s) | 36.18 | 31.71 | 30.35 | 29.91 | 30.96 | 30.78 | 31.99 | 21.27 | |
Average Active Time (s) | 34.34 | 31.20 | 29.88 | 29.47 | 30.54 | 30.39 | 31.59 | 21.01 | |
Activity Ratio (%) | 85.8 | 92.9 | 94.7 | 94.6 | 93.5 | 94.4 | 93.8 | 95.3 | |
Average number of active threads | 3.811 | 45.483 | 59.783 | 74.640 | 80.246 | 89.617 | 106.197 | 104.578 | |
Affinity Stability (%) | 92.7 | 98.6 | 98.7 | 98.5 | 98.7 | 98.7 | 98.7 | 98.4 | |
GFLOPS | 53.537 | 61.449 | 64.316 | 65.399 | 62.763 | 63.141 | 61.377 | 92.468 | |
Time in analyzed loops (%) | 11.2 | 3.44 | 3.46 | 3.06 | 2.92 | 2.58 | 2.34 | 2.43 | |
Time in analyzed innermost loops (%) | 9.93 | 3.28 | 3.34 | 2.96 | 2.83 | 2.49 | 2.27 | 2.35 | |
Time in user code (%) | 60.0 | 14.2 | 13.8 | 13.2 | 11.6 | 11.4 | 11.1 | 12.1 | |
Compilation Options Score (%) | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 100.0 | 99.9 | |
Array Access Efficiency (%) | 85.7 | 94.3 | 93.1 | 91.1 | 89.3 | 90.5 | 90.9 | 88.9 | |
Potential Speedups | | | | | | | | | |
Perfect Flow Complexity | 1.02 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | |
Perfect OpenMP/MPI/Pthread/TBB | 1.51 | 4.52 | 4.45 | 4.50 | 5.24 | 5.43 | 5.84 | 4.95 | |
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution | 1.66 | 6.63 | 6.66 | 6.95 | 7.85 | 7.96 | 8.22 | 7.72 | |
Scalability - Gap | 1.00 | 10.96 | 14.20 | 17.53 | 19.22 | 21.68 | 25.88 | 22.83 | |
No Scalar Integer: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
No Scalar Integer: Nb Loops to get 80% | 3 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | |
FP Vectorised: Potential Speedup | 1.01 | 1.01 | 1.01 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
FP Vectorised: Nb Loops to get 80% | 4 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | |
Fully Vectorised: Potential Speedup | 1.04 | 1.02 | 1.02 | 1.01 | 1.01 | 1.01 | 1.01 | 1.01 | |
Fully Vectorised: Nb Loops to get 80% | 7 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | |
Only FP Arithmetic: Potential Speedup | 1.03 | 1.01 | 1.02 | 1.02 | 1.02 | 1.01 | 1.01 | 1.01 | |
Only FP Arithmetic: Nb Loops to get 80% | 5 | 4 | 3 | 2 | 1 | 2 | 2 | 2 | |
OpenMP perfectly balanced: Potential Speedup | 1.16 | 1.47 | 1.50 | 1.48 | 1.50 | 1.53 | 1.51 | 1.56 | |
OpenMP perfectly balanced: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
Source Object | Issue |
---|---|
▼libllama.so | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
▼libggml-cpu.so | |
▼binary-ops.cpp | |
○ | |
▼vec.cpp | |
○ | |
▼sgemm.cpp | |
○ | |
▼mmq.cpp | |
○ | |
▼ops.cpp | |
○ | |
▼common.h | |
○ | |
▼ggml-cpu.c | |
○ | |
▼quants.c | |
○ | |
▼libggml-blas.so | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
▼exec | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
▼libggml-base.so | |
▼ | |
○ | -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target) |
○ | -O2, -O3 or -Ofast is missing. |
○ | -march=(target) is missing. |
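
The issue table above reports three kinds of missing compiler flags: debug information (`-g`), an optimization level (`-O2`, `-O3` or `-Ofast`) and a target architecture (`-march=`). As a minimal sketch of that kind of check, the Python snippet below scans a compile command string for those flags. The function name `missing_recommended_flags` is hypothetical, and the matching rules are an assumption made for illustration; this is not a description of how MAQAO itself detects flags.

```python
import re
import shlex


def missing_recommended_flags(command):
    """Return the recommended flag groups that are absent from a compile command.

    Assumption: a flag counts as present if it appears anywhere in the command.
    The "last flag wins" rule that compilers apply to repeated -O options is
    deliberately ignored in this sketch.
    """
    tokens = shlex.split(command)
    missing = []

    # -g, -g1..-g3, -ggdb, -ggdb1..-ggdb3 enable debug info; -g0 and
    # unrelated options such as -grecord-command-line do not match.
    if not any(re.fullmatch(r"-g([1-3]|gdb[1-3]?)?", t) for t in tokens):
        missing.append("-g")

    if not any(t in ("-O2", "-O3", "-Ofast") for t in tokens):
        missing.append("-O2/-O3/-Ofast")

    if not any(t.startswith("-march=") for t in tokens):
        missing.append("-march=<target>")

    return missing


# A shortened form of the libggml-cpu.so command line recorded in this report.
good = "clang-17 -O3 -march=graniterapids -g -grecord-command-line -c mmq.cpp"
# A hypothetical command with none of the recommended flags.
bare = "clang-17 -grecord-command-line -c foo.c"

print(missing_recommended_flags(good))
print(missing_recommended_flags(bare))
```

The first command passes all three checks, while the second one is reported as missing all three groups, which mirrors the three issue lines listed for `exec`, `libllama.so`, `libggml-base.so` and `libggml-blas.so`.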
Parameter | r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 |
---|---|---|---|---|---|---|---|---|
Experiment Name | | | | | | | | |
Application | /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/run/binaries/aocc_5/exec | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Timestamp | 2025-10-16 18:16:28 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Experiment Type | MPI; OpenMP; | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Machine | isix06.benchmarkcenter.megware.com | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture | GRANITE_RAPIDS | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Model Name | Intel(R) Xeon(R) 6972P | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Cache Size | 491520 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of Cores | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Maximal Frequency | 3.9 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
OS Version | Linux 5.14.0-570.39.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 4 05:08:52 EDT 2025 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Micro Architecture used during static analysis | GRANITE_RAPIDS | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Compilation Options | exec: N/A libggml-base.so: N/A libggml-blas.so: N/A libggml-cpu.so: AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) /home/eoseret/aocc-compiler-5.0.0/bin/clang-17 --driver-mode=g++ -D GGML_BACKEND_BUILD -D GGML_BACKEND_SHARED -D GGML_SCHED_MAX_COPIES=4 -D GGML_SHARED -D GGML_USE_CPU_REPACK -D GGML_USE_LLAMAFILE -D GGML_USE_OPENMP -D _GNU_SOURCE -D _XOPEN_SOURCE=600 -D ggml_cpu_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/.. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/../include -O3 -O3 -march=graniterapids -fno-vectorize -fno-slp-vectorize -fno-openmp-simd -ffast-math -g -fno-omit-frame-pointer -fcf-protection=none -nopie -grecord-command-line -fno-finite-math-only -O3 -D NDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -fopenmp=libomp -MD -MT ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp libllama.so: N/A | same as r0 | same as r0 | libggml-base.so: N/A libggml-cpu.so: AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) /home/eoseret/aocc-compiler-5.0.0/bin/clang-17 --driver-mode=g++ -D GGML_BACKEND_BUILD -D GGML_BACKEND_SHARED -D GGML_SCHED_MAX_COPIES=4 -D GGML_SHARED -D GGML_USE_CPU_REPACK -D GGML_USE_LLAMAFILE -D GGML_USE_OPENMP -D _GNU_SOURCE -D _XOPEN_SOURCE=600 -D ggml_cpu_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/.. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/../include -O3 -O3 -march=graniterapids -fno-vectorize -fno-slp-vectorize -fno-openmp-simd -ffast-math -g -fno-omit-frame-pointer -fcf-protection=none -nopie -grecord-command-line -fno-finite-math-only -O3 -D NDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -fopenmp=libomp -MD -MT ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp libllama.so: N/A | libggml-base.so: N/A libggml-blas.so: N/A libggml-cpu.so: AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) /home/eoseret/aocc-compiler-5.0.0/bin/clang-17 --driver-mode=g++ -D GGML_BACKEND_BUILD -D GGML_BACKEND_SHARED -D GGML_SCHED_MAX_COPIES=4 -D GGML_SHARED -D GGML_USE_CPU_REPACK -D GGML_USE_LLAMAFILE -D GGML_USE_OPENMP -D _GNU_SOURCE -D _XOPEN_SOURCE=600 -D ggml_cpu_EXPORTS -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/.. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/. -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu -I /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/../include -O3 -O3 -march=graniterapids -fno-vectorize -fno-slp-vectorize -fno-openmp-simd -ffast-math -g -fno-omit-frame-pointer -fcf-protection=none -nopie -grecord-command-line -fno-finite-math-only -O3 -D NDEBUG -std=gnu++17 -fPIC -Wmissing-declarations -Wmissing-noreturn -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi -fopenmp=libomp -MD -MT ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -MF ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o.d -o ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o -c /beegfs/hackathon/users/eoseret/qaas_runs_test/176-060-7658/intel/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp libllama.so: N/A | same as r3 | same as r4 | same as r3 |
Number of processes observed | 1 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of threads observed | 6 | 72 | 96 | 120 | 128 | 144 | 168 | 192 |
Frequency Driver | intel_pstate | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Frequency Governor | performance | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Hyperthreading | on | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Number of cores per socket | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO version | 2025.1.2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
MAQAO build | bacb6037f9058e0e37353b68076bc49d02aaaee9::20251016-172501 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
Comments | OV scalability run using aocc_5 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
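
The "Total Time (s)" row of the metrics table and the "Number of threads observed" row above can be combined into a rough scaling summary. The sketch below computes, for each run, the speedup relative to r0 and the resulting parallel efficiency, taking r0 (6 threads) as the baseline. These are the textbook definitions, used here as an assumption; they are not claimed to match how MAQAO derives its own "Scalability - Gap" or "Potential Speedup" figures, and the helper name `scaling_table` is hypothetical.

```python
# Values copied from the "Total Time (s)" and "Number of threads observed" rows.
runs = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]
total_time_s = [54.06, 49.38, 47.97, 47.38, 48.72, 48.83, 49.97, 38.57]
threads = [6, 72, 96, 120, 128, 144, 168, 192]


def scaling_table(times, thread_counts):
    """Return (speedup, efficiency) per run, relative to the first run.

    speedup    = baseline time / run time
    efficiency = speedup / (run threads / baseline threads)
    """
    base_time, base_threads = times[0], thread_counts[0]
    rows = []
    for t, n in zip(times, thread_counts):
        speedup = base_time / t
        ideal_speedup = n / base_threads
        rows.append((speedup, speedup / ideal_speedup))
    return rows


table = scaling_table(total_time_s, threads)
for name, n, (speedup, efficiency) in zip(runs, threads, table):
    print(f"{name}: {n:3d} threads  speedup {speedup:.2f}  efficiency {efficiency:.1%}")
```

With these inputs, r7 comes out roughly 1.40 times faster than r0 while using 32 times as many threads, which puts its efficiency under 5% by this definition. Total time includes serial and runtime phases, so this number describes the whole run rather than the parallel regions alone.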