Ascend NPU Known Issue and Solution
Only greedy sampling is supported in transformers inference

When running inference through transformers without any code changes, only greedy search is supported. If you enable do_sample in generate(), you may encounter an ACL error like the following:

    RuntimeError: ACL stream synchronize failed, error code:507018
    [W compiler_depend.ts:409] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.371.588 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeUsedDevices)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.373.863 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.375.369 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.394.159 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.395.596 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.397.006 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.398.441 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.399.848 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    [W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
    [Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
    EH9999: Inner Error!
    rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
    EH9999: [PID: 37944] 2025-03-19-06:12:26.401.257 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
    TraceBack (most recent call last):
    (function npuSynchronizeDevice)
    /root/miniconda3/envs/MindIE_1.0.T65/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp8tky0pbq'>
    _warnings.warn(warn_message, ResourceWarning)
    [ERROR] 2025-03-19-06:12:27 (PID:37944, Device:1, RankID:-1) ERR99999 UNKNOWN application exception

This happens because the Ascend NPU (910A) does not properly support float('Inf') as the filter_value for torch.Tensor.masked_fill. As a result, logits_processor writes 'nan' into next_token_scores.
...
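
The failure mode described above can be reproduced in miniature without an NPU. The sketch below is a plain-Python illustration, not transformers or torch code: `masked_fill`, `top_k_filter`, `softmax` and `has_nan` are hypothetical stand-ins written for this example, and the faulty device behaviour is simulated by passing `nan` as the fill value. It only shows why a single 'nan' in the filtered scores makes the whole sampling distribution unusable, and how a nan check on the scores could help confirm the diagnosis.

```python
import math


def masked_fill(scores, mask, filter_value):
    """Plain-Python stand-in for a 1-D masked fill (hypothetical helper)."""
    return [filter_value if m else s for s, m in zip(scores, mask)]


def top_k_filter(scores, k, filter_value=-float("inf")):
    """Keep the k largest scores and overwrite the rest with filter_value."""
    threshold = sorted(scores, reverse=True)[k - 1]
    mask = [s < threshold for s in scores]
    return masked_fill(scores, mask, filter_value)


def softmax(scores):
    """Softmax with max-subtraction; exp(-inf) evaluates to 0.0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def has_nan(values):
    """Return True if any element is nan."""
    return any(math.isnan(v) for v in values)


logits = [2.0, 1.0, 0.5, -1.0]

# Expected behaviour: filtered positions become -inf and get probability 0.
filtered = top_k_filter(logits, k=2)
probs = softmax(filtered)

# Simulated fault (assumption): the fill writes nan instead of -inf.
broken = top_k_filter(logits, k=2, filter_value=float("nan"))
broken_probs = softmax(broken)

print("healthy probs contain nan:", has_nan(probs))
print("broken probs contain nan:", has_nan(broken_probs))
```

With a correct fill, the two filtered tokens receive zero probability and the remaining probabilities sum to one. With the simulated fault, the nan entries propagate through the normalising sum, so every probability becomes nan and there is no valid distribution left to sample from.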