OV - - Loops

40 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is lower than the threshold set by object_coverage_threshold (0.1%). Together they represent about 0.05% of the application. To include them, change the value of object_coverage_threshold in the experiment directory configuration file, then rerun the command with the additional parameter --force-static-analysis.
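The filtering rule in the notice above can be sketched as follows. This is an illustration of the stated formula only, not the tool's actual implementation. The function names and the 30.0 s thread active time are hypothetical, chosen for the example.

```python
def coverage_ratio(max_inclusive_time_s: float, max_thread_active_time_s: float) -> float:
    """Return (Max Inclusive Time Over Threads * 100) / Max Thread Active Time, in percent."""
    if max_thread_active_time_s <= 0:
        raise ValueError("max thread active time must be positive")
    return max_inclusive_time_s * 100.0 / max_thread_active_time_s


def is_reported(max_inclusive_time_s: float,
                max_thread_active_time_s: float,
                threshold_pct: float = 0.1) -> bool:
    """A loop is discarded when its ratio is lower than the threshold, kept otherwise."""
    return coverage_ratio(max_inclusive_time_s, max_thread_active_time_s) >= threshold_pct


# Hypothetical value: the report does not state the max thread active time.
ASSUMED_ACTIVE_TIME_S = 30.0

for inclusive_s in (0.54, 0.02):
    ratio = coverage_ratio(inclusive_s, ASSUMED_ACTIVE_TIME_S)
    status = "kept" if is_reported(inclusive_s, ASSUMED_ACTIVE_TIME_S) else "discarded"
    print(f"inclusive={inclusive_s:.2f}s ratio={ratio:.3f}% -> {status}")
```

Under these assumed numbers, lowering `threshold_pct` is what would bring the second loop back into the listing, which mirrors the effect of lowering object_coverage_threshold.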

Loop id	Source Location	Source Function	Level	Max Thread Time / Walltime aocc_5 (%)	Exclusive Coverage aocc_5 (%)	Inclusive Coverage aocc_5 (%)	Max Exclusive Time Over Threads aocc_5 (s)	Max Inclusive Time Over Threads aocc_5 (s)	Exclusive Time w.r.t. Wall Time aocc_5 (s)	Inclusive Time w.r.t. Wall Time aocc_5 (s)	Nb Threads aocc_5	GFLOPS aocc_5	Vectorization Ratio (%)	Vector Length Use (%)	Speedup If No Scalar Integer	Speedup If FP Vectorized	Speedup If Fully Vectorized	Speedup If Perfect Load Balancing aocc_5	Stride 0	Stride 1	Stride n	Stride Unknown	Stride Indirect	Array Access Efficiency
97	libggml-cpu.so - ggml-cpu.c:1291-1297	ggml_compute_forward_mul_mat	Innermost	1.79	1.65	1.65	0.54	0.54	0.24	0.24	189	1.80	0	11.38	1	1	2.46	2.23	NA	NA	NA	NA	NA	0.00
3065	libggml-cpu.so - sgemm.cpp:144-464 [...]	void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)	Innermost	0.40	0.50	0.50	0.12	0.12	0.07	0.07	192	3018.37	NA	NA	NA	NA	NA	1.66	NA	NA	NA	NA	NA	0.00
4	libggml-cpu.so - ggml-impl.h:346-404 [...]	ggml_cpu_fp32_to_fp16	Single	0.30	0.18	0.18	0.09	0.09	0.03	0.03	160	2.21	9.09	8.52	1.46	3.75	13.88	2.95	0	2	0	0	0	100.00
2078	libggml-cpu.so - vec.h:89-89	ggml_compute_forward_soft_max	Innermost	0.23	0.17	0.17	0.07	0.07	0.02	0.02	192	15.13	100	50	1	1	2	2.85	0	2	0	0	0	100.00
386	libggml-cpu.so - mmq.cpp:303-1392 [...]	void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int,...	Innermost	0.18	0.13	0.13	0.05	0.05	0.02	0.02	169	0.00	90.91	38.76	1.47	1	1.41	2.64	23	0	0	9	0	85.94
1299	libggml-cpu.so - vec.h:1084-1115 [...]	ggml_vec_swiglu_f32	Single	0.17	0.10	0.10	0.05	0.05	0.01	0.01	171	1336.35	98	98.13	1	1	1	3.04	0.5	0	0	3	0	56.25
