OV - - Loops

MAQAO

Loops Index

41 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is lower than the threshold set by object_coverage_threshold (0.1%). Together they represent about 0.09% of the application. To include them, change the value of object_coverage_threshold in the experiment directory configuration file, then rerun the command with the additional parameter --force-static-analysis.
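
The discard rule quoted above can be sketched in a few lines of Python. This is only an illustration of the stated ratio, not MAQAO's actual implementation: the function names and the 30.0 s active time are hypothetical, since the report does not list Max Thread Active Time.

```python
def coverage_ratio(max_inclusive_time_s: float, max_thread_active_time_s: float) -> float:
    """Ratio in percent: (Max Inclusive Time Over Threads * 100) / Max Thread Active Time."""
    if max_thread_active_time_s <= 0:
        raise ValueError("max_thread_active_time_s must be positive")
    return max_inclusive_time_s * 100.0 / max_thread_active_time_s


def is_discarded(max_inclusive_time_s: float,
                 max_thread_active_time_s: float,
                 threshold_pct: float = 0.1) -> bool:
    """True when the loop's ratio falls below the threshold (0.1% in this report)."""
    return coverage_ratio(max_inclusive_time_s, max_thread_active_time_s) < threshold_pct


# Hypothetical Max Thread Active Time; the report does not state this value.
ASSUMED_ACTIVE_TIME_S = 30.0

# A loop with 0.41 s max inclusive time stays in the report under this assumption.
print(is_discarded(0.41, ASSUMED_ACTIVE_TIME_S))
# A loop with 0.02 s max inclusive time falls below 0.1% and would be dropped.
print(is_discarded(0.02, ASSUMED_ACTIVE_TIME_S))
```

Lowering the threshold makes the filter keep more loops, which is the effect of editing object_coverage_threshold as described in the notice.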

Loop id	Source Location	Source Function	Level	Max Thread Time / Walltime aocc_0 (%)	Exclusive Coverage aocc_0 (%)	Inclusive Coverage aocc_0 (%)	Max Exclusive Time Over Threads aocc_0 (s)	Max Inclusive Time Over Threads aocc_0 (s)	Exclusive Time w.r.t. Wall Time aocc_0 (s)	Inclusive Time w.r.t. Wall Time aocc_0 (s)	Nb Threads aocc_0	GFLOPS aocc_0	Vectorization Ratio (%)	Vector Length Use (%)	Speedup If No Scalar Integer	Speedup If FP Vectorized	Speedup If Fully Vectorized	Speedup If Perfect Load Balancing aocc_0	Stride 0	Stride 1	Stride n	Stride Unknown	Stride Indirect	Array Access Efficiency
97	libggml-cpu.so - ggml-cpu.c:1291-1297	ggml_compute_forward_mul_mat	Innermost	1.30	1.42	1.42	0.41	0.41	0.21	0.21	186	3.22	0	11.38	1	1	2.46	1.93	NA	NA	NA	NA	NA	0.00
3047	libggml-cpu.so - sgemm.cpp:144-464 [...]	void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)	Innermost	0.41	0.50	0.50	0.13	0.13	0.07	0.07	192	2974.14	NA	NA	NA	NA	NA	1.8	NA	NA	NA	NA	NA	0.00
4	libggml-cpu.so - ggml-impl.h:346-404 [...]	ggml_cpu_fp32_to_fp16	Single	0.29	0.19	0.19	0.09	0.09	0.03	0.03	161	2.29	7.69	8.17	1.37	4.39	15.48	2.78	0	2	0	0	0	100.00
2015	libggml-cpu.so - vec.h:89-89	ggml_compute_forward_soft_max	Innermost	0.19	0.18	0.18	0.06	0.06	0.03	0.03	191	16.80	100	50	1	1	2	2.31	0	2	0	0	0	100.00
2113	libggml-cpu.so - ops.cpp:6220-6245 [...]	ggml_compute_forward_rope_f32(ggml_compute_params const, ggml_tensor, bool)	Innermost	0.16	0.14	0.14	0.05	0.05	0.02	0.02	192	678.54	1.96	6.62	1.01	1.14	4.06	2.52	1	2	0	0	0	100.00
384	libggml-cpu.so - mmq.cpp:303-1392 [...]	void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int,...	Innermost	0.14	0.12	0.12	0.05	0.05	0.02	0.02	176	0.00	90.91	38.76	1.47	1	1.41	2.46	23	0	0	9	0	85.94
1236	libggml-cpu.so - vec.h:1084-1116 [...]	ggml_vec_swiglu_f32	Single	0.16	0.11	0.11	0.05	0.05	0.02	0.02	171	1184.97	98	98.13	1.02	1	1	2.69	0.5	0	0	3	0	56.25
1758	libggml-cpu.so - ops.cpp:4325-4326	ggml_compute_forward_rms_norm	Innermost	0.13	0.11	0.11	0.04	0.04	0.02	0.02	192	63.08	0	7.81	1	1.98	13.02	2.56	0	1	0	0	0	100.00
