OV - - Loops

MAQAO

Loops Index

27 loops have been discarded from the report because their ratio ((Max Inclusive Time Over Threads * 100) / Max Thread Active Time) is lower than the threshold set by object_coverage_threshold (0.1%). Together they represent about 0.03% of the application. To include them, change the value of object_coverage_threshold in the experiment directory configuration file, then rerun the command with the additional parameter --force-static-analysis.
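
The discard rule in the notice above can be sketched in a few lines of Python. This is only an illustration and not MAQAO's implementation: the function names and the sample timings are hypothetical, and the strict "lower than" comparison is an assumption based on the wording of the notice. Only the ratio formula and the 0.1% threshold value come from the report.

```python
def coverage_ratio(max_inclusive_time_s, max_thread_active_time_s):
    """Return the ratio in percent, using the formula quoted in the notice:
    (Max Inclusive Time Over Threads * 100) / Max Thread Active Time."""
    if max_thread_active_time_s <= 0:
        raise ValueError("max thread active time must be positive")
    return (max_inclusive_time_s * 100.0) / max_thread_active_time_s


def is_discarded(max_inclusive_time_s, max_thread_active_time_s, threshold_pct=0.1):
    """Return True when the loop's ratio is below the threshold.

    threshold_pct mirrors the object_coverage_threshold value (0.1%)
    shown in the notice; the strict comparison is an assumption.
    """
    ratio = coverage_ratio(max_inclusive_time_s, max_thread_active_time_s)
    return ratio < threshold_pct


# Hypothetical timings in seconds, chosen only to show both outcomes.
print(is_discarded(0.01, 20.0))  # ratio is 0.05%, below the 0.1% threshold
print(is_discarded(0.12, 20.0))  # ratio is 0.6%, above the 0.1% threshold
```

Lowering the threshold in such a check keeps more loops in the listing, which matches the effect the notice describes for a smaller object_coverage_threshold.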

Loop id	Source Location	Source Function	Level	Max Thread Time / Walltime icx_10 (%)	Exclusive Coverage icx_10 (%)	Inclusive Coverage icx_10 (%)	Max Exclusive Time Over Threads icx_10 (s)	Max Inclusive Time Over Threads icx_10 (s)	Exclusive Time w.r.t. Wall Time icx_10 (s)	Inclusive Time w.r.t. Wall Time icx_10 (s)	Nb Threads icx_10	GFLOPS icx_10	Vectorization Ratio (%)	Vector Length Use (%)	Speedup If No Scalar Integer	Speedup If FP Vectorized	Speedup If Fully Vectorized	Speedup If Perfect Load Balancing icx_10	Stride 0	Stride 1	Stride n	Stride Unknown	Stride Indirect	Array Access Efficiency
2394	libggml-cpu.so - sgemm.cpp:144-399 [...]	void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)	Innermost	0.37	0.48	0.48	0.12	0.12	0.07	0.07	192	3214.07	NA	NA	NA	NA	NA	1.74	NA	NA	NA	NA	NA	0.00
1644	libggml-cpu.so - ops.cpp:5552-5563	ggml_compute_forward_set_rows	Innermost	0.29	0.30	0.30	0.09	0.09	0.04	0.04	192	4.11	0	12.5	1	1	8	2.15	1	0	0	4	0	60.00
540	libggml-cpu.so - mmq.cpp:303-1392 [...]	void parallel_for<(anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int, int)::{lambda(int, int)#1}>(int, (anonymous namespace)::convert_B_packed_format<block_q8_0, 32>(void, block_q8_0 const, int,...	Innermost	0.19	0.12	0.12	0.06	0.06	0.02	0.02	178	0.00	100	41.39	1.42	1	1.34	3.09	15	0	0	5	0	87.50
2393	libggml-cpu.so - sgemm.cpp:144-464 [...]	void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, unsigned short, float>::gemm<4, 6, 2>(long, long, long)	InBetween	0.16	0.12	0.60	0.05	0.15	0.02	0.09	192	3552.90	79.34	29.86	1.01	1.53	2.7	2.79	NA	NA	NA	NA	NA	50.00
595	exec -	__intel_avx_rep_memcpy	Single	0.16	0.12	0.12	0.05	0.05	0.02	0.02	187	0.90	100	50	1	1	2	2.78	0	2	0	0	0	100.00
1724	libggml-cpu.so - ops.cpp:6220-6245 [...]	ggml_compute_forward_rope_f32(ggml_compute_params const, ggml_tensor, bool)	Innermost	0.14	0.10	0.10	0.04	0.04	0.02	0.02	192	629.74	2.17	6.66	1	1.8	5.33	3.03	1	1	0	0	0	100.00
