- r_1 - engine_NEON1M11-0001_o2_m26_ifx_full_30loops/ - 30 analyzed loop(s)
 - Loop 255174 - engine_linux64_intel_ifx_impi
 - Loop 18566 - engine_linux64_intel_ifx_impi
 - Loop 39475 - engine_linux64_intel_ifx_impi
 - Loop 37916 - engine_linux64_intel_ifx_impi
 - Loop 256165 - engine_linux64_intel_ifx_impi
 - Loop 19710 - engine_linux64_intel_ifx_impi
 - Loop 38054 - engine_linux64_intel_ifx_impi
 - Loop 121792 - engine_linux64_intel_ifx_impi
 - Loop 19918 - engine_linux64_intel_ifx_impi
 - Loop 167135 - engine_linux64_intel_ifx_impi
 - Loop 39361 - engine_linux64_intel_ifx_impi
 - Loop 16649 - engine_linux64_intel_ifx_impi
 - Loop 129166 - engine_linux64_intel_ifx_impi
 - Loop 16652 - engine_linux64_intel_ifx_impi
 - Loop 38910 - engine_linux64_intel_ifx_impi
 - Loop 19924 - engine_linux64_intel_ifx_impi
 - Loop 38255 - engine_linux64_intel_ifx_impi
 - Loop 129964 - engine_linux64_intel_ifx_impi
 - Loop 121790 - engine_linux64_intel_ifx_impi
 - Loop 24558 - engine_linux64_intel_ifx_impi
 - Loop 167621 - engine_linux64_intel_ifx_impi
 - Loop 39025 - engine_linux64_intel_ifx_impi
 - Loop 256166 - engine_linux64_intel_ifx_impi
 - Loop 38251 - engine_linux64_intel_ifx_impi
 - Loop 18565 - engine_linux64_intel_ifx_impi
 - Loop 167340 - engine_linux64_intel_ifx_impi
 - Loop 37997 - engine_linux64_intel_ifx_impi
 - Loop 121788 - engine_linux64_intel_ifx_impi
 - Loop 129946 - engine_linux64_intel_ifx_impi
 - Loop 38932 - engine_linux64_intel_ifx_impi
 
- r_2 - engine_NEON1M11-0001_o2_m26_ifort_full_30loops/ - 30 analyzed loop(s)
 - Loop 15282 - engine_linux64_intel_impi
 - Loop 193162 - engine_linux64_intel_impi
 - Loop 30046 - engine_linux64_intel_impi
 - Loop 28971 - engine_linux64_intel_impi
 - Loop 97970 - engine_linux64_intel_impi
 - Loop 15758 - engine_linux64_intel_impi
 - Loop 97971 - engine_linux64_intel_impi
 - Loop 98506 - engine_linux64_intel_impi
 - Loop 92421 - engine_linux64_intel_impi
 - Loop 15966 - engine_linux64_intel_impi
 - Loop 29120 - engine_linux64_intel_impi
 - Loop 97948 - engine_linux64_intel_impi
 - Loop 129229 - engine_linux64_intel_impi
 - Loop 97950 - engine_linux64_intel_impi
 - Loop 14003 - engine_linux64_intel_impi
 - Loop 15970 - engine_linux64_intel_impi
 - Loop 14012 - engine_linux64_intel_impi
 - Loop 29898 - engine_linux64_intel_impi
 - Loop 29269 - engine_linux64_intel_impi
 - Loop 92419 - engine_linux64_intel_impi
 - Loop 29752 - engine_linux64_intel_impi
 - Loop 19322 - engine_linux64_intel_impi
 - Loop 98529 - engine_linux64_intel_impi
 - Loop 29265 - engine_linux64_intel_impi
 - Loop 129922 - engine_linux64_intel_impi
 - Loop 92417 - engine_linux64_intel_impi
 - Loop 194156 - engine_linux64_intel_impi
 - Loop 129257 - engine_linux64_intel_impi
 - Loop 129681 - engine_linux64_intel_impi
 - Loop 29660 - engine_linux64_intel_impi
 
 
| Category | Analysis | Count | Percentage (%) | Weighted Count |
|---|---|---|---|---|
| Loop Computation Issues | | 85 | | |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 48 | 80.00 | 0.69 |
| | Presence of a large number of scalar integer instructions | 17 | 28.33 | 0.25 |
| | Presence of expensive FP instructions | 16 | 26.67 | 0.16 |
| | Large loop body over microp cache size | 2 | 3.33 | 0.03 |
| | Bottleneck in the front-end | 2 | 3.33 | 0.03 |
| Control Flow Issues | | 26 | | |
| | Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| | Presence of calls | 6 | 10.00 | 0.24 |
| | Non-innermost loop | 3 | 5.00 | 0.03 |
| | Presence of more than 4 paths | 2 | 3.33 | 0.03 |
| Data Access Issues | | 97 | | |
| | More than 10% of the vector loads instructions are unaligned | 30 | 50.00 | 0.31 |
| | More than 20% of the loads are accessing the stack | 23 | 38.33 | 0.45 |
| | Presence of indirect access | 15 | 25.00 | 0.21 |
| | Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| | Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| | Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
| Vectorization Roadblocks | | 62 | | |
| | Presence of indirect access | 15 | 25.00 | 0.21 |
| | Presence of 2 to 4 paths | 15 | 25.00 | 0.18 |
| | Presence of constant non-unit stride data access | 12 | 20.00 | 0.19 |
| | Presence of calls | 6 | 10.00 | 0.24 |
| | Presence of more than 4 paths | 6 | 10.00 | 0.24 |
| | ERROR | 5 | 8.33 | 0.23 |
| | Non-innermost loop | 3 | 5.00 | 0.03 |
| Inefficient Vectorization | | 27 | | |
| | Use of masked instructions | 10 | 16.67 | 0.10 |
| | Presence of special instructions executing on a single port | 9 | 15.00 | 0.12 |
| | Presence of expensive instructions: scatter/gather | 8 | 13.33 | 0.09 |
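
The percentage values in the table above appear to be each count divided by the 60 analyzed loops (30 in r_1 plus 30 in r_2). The report does not state this; it is inferred from the numbers. A minimal sketch that reproduces a few rows under that assumption (`TOTAL_LOOPS` and `percentage` are illustrative names, not part of the tool; the Weighted Count column is not reproduced because its definition is not given here):

```python
# Assumption: percentages are relative to all analyzed loops across both runs.
TOTAL_LOOPS = 30 + 30  # r_1 + r_2, taken from the loop lists above

# (count, reported percentage) pairs copied from the summary table.
rows = {
    "Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA": (48, 80.00),
    "Presence of a large number of scalar integer instructions": (17, 28.33),
    "More than 10% of the vector loads instructions are unaligned": (30, 50.00),
    "Use of masked instructions": (10, 16.67),
}


def percentage(count, total=TOTAL_LOOPS):
    """Share of analyzed loops flagged by an analysis, in percent (2 decimals)."""
    return round(100.0 * count / total, 2)


for name, (count, reported) in rows.items():
    print(f"{name}: computed {percentage(count):.2f}, reported {reported:.2f}")
```

Every row checked this way matches the reported value to two decimals, which is consistent with the assumption but does not prove it.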
 
 
| Category | Analysis | r_1 | r_2 |
|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 8 | 8 |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 24 | 24 |
| | Large loop body over microp cache size | 1 | 1 |
| | Presence of a large number of scalar integer instructions | 6 | 11 |
| | Bottleneck in the front-end | 1 | 1 |
| Control Flow Issues | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 1 | 1 |
| | Non-innermost loop | 2 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | More than 10% of the vector loads instructions are unaligned | 17 | 13 |
| | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | More than 20% of the loads are accessing the stack | 9 | 14 |
| Vectorization Roadblocks | Presence of calls | 2 | 4 |
| | Presence of 2 to 4 paths | 8 | 7 |
| | Presence of more than 4 paths | 3 | 3 |
| | Non-innermost loop | 2 | 1 |
| | Presence of constant non-unit stride data access | 5 | 7 |
| | Presence of indirect access | 7 | 8 |
| | ERROR | 3 | 2 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 3 | 5 |
| | Presence of special instructions executing on a single port | 8 | 1 |
| | Use of masked instructions | 5 | 5 |
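
As a cross-check, the per-run counts should add up to the combined counts in the summary table, and the difference between runs shows where the ifx build (r_1) and the ifort build (r_2) diverge most. A small sketch over a few rows copied from the tables (`per_run` and `compare` are illustrative names, not tool output):

```python
# (r_1 count, r_2 count, combined count) copied from the two tables above.
per_run = {
    "Presence of expensive FP instructions": (8, 8, 16),
    "Presence of a large number of scalar integer instructions": (6, 11, 17),
    "More than 20% of the loads are accessing the stack": (9, 14, 23),
    "Presence of special instructions executing on a single port": (8, 1, 9),
}


def compare(r1, r2, combined):
    """Return (sums_match, r2 - r1) for one analysis row."""
    return (r1 + r2 == combined, r2 - r1)


for name, (r1, r2, combined) in per_run.items():
    ok, delta = compare(r1, r2, combined)
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: r_1={r1}, r_2={r2}, sum {status}, r_2 - r_1 = {delta:+d}")
```

A positive difference means the issue was flagged on more loops in r_2 than in r_1. Because the two runs analyze different loop IDs, the counts compare the runs' loop sets as a whole, not individual loops.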