Performance Optimization Suggestions

Optimization Priority: High, Medium, Low

overall

overall summary

Performance Index                 Duration (ms)   Duration Ratio
Computing Time                        40144.912           69.09%
-- Flash Attention                        0.000            0.00%
-- Conv                                   0.000            0.00%
-- Matmul                                 0.000            0.00%
-- Vector                                 0.000            0.00%
-- SDMA(Tensor Move)                      0.000            0.00%
-- Other Cube                             0.000            0.00%
Uncovered Communication Time          16564.843           28.51%
-- Wait                                5573.454            9.59%
-- Transmit                           10991.389           18.92%
Free Time                              1399.231            2.41%
-- SDMA                                   0.000            0.00%
-- Free                                1399.231            2.41%
E2E Time                              58108.987          100.00%
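
The Duration Ratio column appears to be each duration divided by the E2E time. The following is a minimal sketch that checks this reading against the figures in the table; the variable and function names are illustrative and are not part of the profiling tool.

```python
# Durations in milliseconds, copied from the summary table.
top_level_ms = {
    "Computing Time": 40144.912,
    "Uncovered Communication Time": 16564.843,
    "Free Time": 1399.231,
}
communication_ms = {"Wait": 5573.454, "Transmit": 10991.389}
e2e_ms = 58108.987


def ratio_percent(part_ms, total_ms):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part_ms / total_ms, 2)


# Ratio of each top-level category relative to the E2E time.
ratios = {name: ratio_percent(ms, e2e_ms) for name, ms in top_level_ms.items()}

# The three top-level categories should add up to the E2E time
# (up to rounding in the report).
residual_ms = e2e_ms - sum(top_level_ms.values())

# Wait and Transmit should add up to Uncovered Communication Time.
comm_gap_ms = top_level_ms["Uncovered Communication Time"] - sum(communication_ms.values())

for name, pct in ratios.items():
    print(f"{name}: {pct:.2f}%")
print(f"Residual vs. E2E: {residual_ms:.3f} ms")
```

Under this reading, uncovered communication accounts for a little over a quarter of the end-to-end time, with Transmit roughly twice as large as Wait.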

performance problem analysis

schedule

Operator Dispatch Issues

Description: Found 142 operator compile issues.
Suggestion: Place the following code at the entry point of the Python script to disable JIT compilation. Code: `torch_npu.npu.set_compile_mode(jit_compile=False); torch_npu.npu.config.allow_internal_format = False`

Issue                     Counts   Elapsed Time (us)
aclopCompileAndExecute       142           6496.6575
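
The suggestion above can be sketched as a small helper called at the very start of the training script. The two `torch_npu` settings are taken verbatim from the report; the helper name `disable_jit_compile` and the import guard are assumptions added for illustration, so the snippet degrades gracefully on machines without `torch_npu` installed.

```python
def disable_jit_compile():
    """Apply the two settings suggested by the report.

    Returns True if the settings were applied, False if torch_npu
    is not available in this environment.
    """
    try:
        import torch  # noqa: F401  (imported before torch_npu)
        import torch_npu
    except ImportError:
        # Assumption: outside an Ascend environment we simply skip the settings.
        return False

    # Settings copied from the report's suggestion.
    torch_npu.npu.set_compile_mode(jit_compile=False)
    torch_npu.npu.config.allow_internal_format = False
    return True


# Call this before building the model or running any operators.
applied = disable_jit_compile()
print("JIT compile disabled:", applied)
```

Re-profiling after this change would show whether the 142 `aclopCompileAndExecute` occurrences go away.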