Performance Optimization Suggestions
Optimization Priority: High, Medium, Low
Overall
Overall Summary
Description
Details
| Performance Index | Duration (ms) | Duration Ratio |
|---|---|---|
| Computing Time | 40144.912 | 69.09% |
| -- Flash Attention | 0.000 | 0.00% |
| -- Conv | 0.000 | 0.00% |
| -- Matmul | 0.000 | 0.00% |
| -- Vector | 0.000 | 0.00% |
| -- SDMA(Tensor Move) | 0.000 | 0.00% |
| -- Other Cube | 0.000 | 0.00% |
| Uncovered Communication Time | 16564.843 | 28.51% |
| -- Wait | 5573.454 | 9.59% |
| -- Transmit | 10991.389 | 18.92% |
| Free Time | 1399.231 | 2.41% |
| -- SDMA | 0.000 | 0.00% |
| -- Free | 1399.231 | 2.41% |
| E2E Time | 58108.987 | 100.00% |
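The Duration Ratio column appears to be each duration divided by the E2E time. The following minimal Python sketch recomputes the non-zero rows under that assumption; the dictionary and the helper name `duration_ratio` are illustrative and not part of the profiling tool.

```python
# Durations in milliseconds, copied from the table above.
E2E_MS = 58108.987
durations_ms = {
    "Computing Time": 40144.912,
    "Uncovered Communication Time": 16564.843,
    "-- Wait": 5573.454,
    "-- Transmit": 10991.389,
    "Free Time": 1399.231,
}


def duration_ratio(duration_ms: float, e2e_ms: float = E2E_MS) -> float:
    """Return the share of end-to-end time as a percentage, rounded to two decimals."""
    return round(100.0 * duration_ms / e2e_ms, 2)


# Assumption: ratio = duration / E2E time (hypothetical helper, not the tool's code).
ratios = {name: duration_ratio(ms) for name, ms in durations_ms.items()}

for name, pct in ratios.items():
    print(f"{name}: {pct:.2f}%")
```

Under this assumption the recomputed values match the table, and the three top-level rows (Computing Time, Uncovered Communication Time, Free Time) add up to the E2E time to within rounding.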
Performance Problem Analysis
Schedule
Operator Dispatch Issues
| Description | Suggestion |
|---|---|
| Found 142 operator compile issues. | Place the following code at the entry point of the Python script to disable JIT compilation: `torch_npu.npu.set_compile_mode(jit_compile=False); torch_npu.npu.config.allow_internal_format = False` |
| Issue | Count | Elapsed Time (us) |
|---|---|---|
| aclopCompileAndExecute | 142 | 6496.6575 |
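As a rough illustration of where the suggested settings might go, the sketch below wraps the two statements from the suggestion in a helper that is called before any other work in the script. The helper name `disable_jit_compile` and the `ImportError` guard are assumptions added for illustration; only the two `torch_npu` statements come from the report.

```python
def disable_jit_compile() -> bool:
    """Apply the suggested settings; return False if torch_npu is not installed."""
    try:
        import torch  # noqa: F401  (imported first because torch_npu extends torch)
        import torch_npu
    except ImportError:
        # Assumption: on machines without torch_npu we simply skip the settings.
        return False
    # Both statements are taken verbatim from the suggestion above.
    torch_npu.npu.set_compile_mode(jit_compile=False)
    torch_npu.npu.config.allow_internal_format = False
    return True


if __name__ == "__main__":
    # Call this at the very start of the script, before building the model.
    applied = disable_jit_compile()
    print(f"JIT compile disabled: {applied}")
```

Calling the helper first keeps the settings in one place, so they take effect before any operator is dispatched.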