Only greedy sampling is supported in transformers inference
While inferring through transformers, you will find that it can only support greedy search without any code change. Once you tried to enable the do_sample feature for generate(), you might encounter ACL issue as below.
RuntimeError: ACL stream synchronize failed, error code:507018
[W compiler_depend.ts:409] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.371.588 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeUsedDevices)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.373.863 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.375.369 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.394.159 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.395.596 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.397.006 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.398.441 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.399.848 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
[W compiler_depend.ts:392] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 37944] 2025-03-19-06:12:26.401.257 wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last):
(function npuSynchronizeDevice)
/root/miniconda3/envs/MindIE_1.0.T65/lib/python3.10/tempfile.py:869: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp8tky0pbq'>
_warnings.warn(warn_message, ResourceWarning)
[ERROR] 2025-03-19-06:12:27 (PID:37944, Device:1, RankID:-1) ERR99999 UNKNOWN application exception
This is due to an issue that Ascend NPU(910A) does not properly support using float(‘Inf’) as the filter_value for torch.Tensor.masked_fill. In this case, logits_processor will put ’nan’ into next_token_scores.
# src/transformers/generation/utils.py _sample() function
next_token_scores = logits_processor(input_ids, next_token_logits)
As official documentation, NPUs do not support the input or output of the inf/nan data of the float16 type. But according to a simple verification I did, NPU(910A) can support masked_fill with float(‘Inf’) correctly unless JIT compile is disabled by torch_npu.npu.set_compile_mode(jit_compile=False).
However, when I used transformers for inference without torch_npu.npu.set_compile_mode(jit_compile=False), the program got stuck and showed that it kept waiting.
I have reported this issue to the Ascend NPU support team.
A workaround is to modify the filter_value to a large number such as float(6.5e4) for function masked_fill in logits_process.py.
Please let me know via mail if anything is missing or any better solution.