
fix npu compatibility #13465

Open

HsiaWinter wants to merge 3 commits into huggingface:main from HsiaWinter:add_npu_compatibility

Conversation

@HsiaWinter
Contributor

What does this PR do?

Fix attention_mask broadcasting for NPU compatibility

@github-actions github-actions bot added models size/S PR with diff < 50 LOC labels Apr 14, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


Thanks for the PR, I left one question.


```python
if attention_mask.ndim == 4:
    # NPU does not support automatic broadcasting for this type; the mask must be expanded.
    if attention_mask.device.type == 'npu' and attention_mask.shape[1:3] == (1, 1):
```
Collaborator

@yiyixuxu yiyixuxu Apr 14, 2026


Can we verify that if we explicitly set the backend to npu, this would also work?

`def _native_npu_attention(`


When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to "get unsupported atten_mask shape, the shape is [B, 1, 1, S]"; only shapes like [B, N, S, S], [B, 1, S, S], [1, 1, S, S], or [S, S] are accepted.

The _native_npu_attention function operates correctly as it leverages _maybe_modify_attn_mask_npu to reshape the attention mask from [batch_size, seq_len_k] to [batch_size, 1, seq_len_q, seq_len_k]. This reshaped format is compatible with the NPU backend.

Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md
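For illustration, the reshape described above can be sketched in NumPy (the helper name `expand_mask_for_npu` and the concrete shapes are assumptions for this sketch; the actual diffusers code operates on torch tensors on an NPU device):

```python
import numpy as np

def expand_mask_for_npu(mask, seq_len_q):
    # Hypothetical NumPy sketch of the shape logic only; the PR itself works
    # on torch tensors. The NPU fusion attention kernel reportedly rejects
    # [B, 1, 1, S] masks, so the singleton query dimension is broadcast up to
    # the full query length, producing the accepted [B, 1, S_q, S_k] layout.
    if mask.ndim == 4 and mask.shape[1:3] == (1, 1):
        b, _, _, s_k = mask.shape
        mask = np.broadcast_to(mask, (b, 1, seq_len_q, s_k)).copy()
    return mask

padding_mask = np.zeros((2, 1, 1, 8))  # [batch, 1, 1, seq_len_k]
expanded = expand_mask_for_npu(padding_mask, seq_len_q=8)
print(expanded.shape)  # (2, 1, 8, 8)
```

Masks already in an accepted layout (e.g. [B, N, S, S]) pass through unchanged, which mirrors the `shape[1:3] == (1, 1)` guard in the diff above.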

Collaborator


> When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error

Just want to make sure we're on the same page, could you share a code example that would produce this error on NPU? Specifically, I'd like to know whether you are running the default attention backend, i.e. without wrapping your model call inside `with attention_backend("_native_npu")`.


@chang-zhijie chang-zhijie Apr 15, 2026


Yes, this code fixes the issue with the "native" backend. After the fix, it runs correctly with the "_native_attention" backend. Here's an example:

```python
import torch
import torch_npu
from diffusers import ErnieImagePipeline
from diffusers.utils import load_image

pipe = ErnieImagePipeline.from_pretrained("/model_dir/ERNIE-Image", torch_dtype=torch.bfloat16)
pipe = pipe.to("npu")
generator = torch.Generator(device="npu")

prompt = "A black and white Chinese rural dog"
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
    use_pe=True,
).images
images[0].save("ernie-image-output.png")
```

However, I've found that when using `_native_npu_attention` as the backend, there are still some issues with mask handling. I've pushed an additional commit to the previous PR: #13451.
To enable the `_native_npu` backend, add the following line to the example: `pipe.transformer.set_attention_backend("_native_npu")`
Note: when processing masks, we need to perform expansion validation when 4D masks are passed in, and apply mask inversion to meet the NPU interface requirements.
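The expand-and-invert step mentioned above can be sketched like this (`to_npu_mask` is a hypothetical helper, not diffusers code; the "True means masked out" convention for the NPU kernel is an assumption based on this thread, and the real code would use torch rather than NumPy):

```python
import numpy as np

def to_npu_mask(keep_mask, seq_len_q):
    # Sketch: convert a boolean "keep" mask of shape [B, S_k]
    # (True = attend, the SDPA convention) into the layout the NPU
    # kernel reportedly expects: [B, 1, S_q, S_k] with True = masked out.
    b, s_k = keep_mask.shape
    inverted = ~keep_mask  # invert: True now means "mask this position out"
    return np.broadcast_to(inverted[:, None, None, :], (b, 1, seq_len_q, s_k)).copy()

keep = np.array([[True, True, False]])  # last token is padding
npu_mask = to_npu_mask(keep, seq_len_q=2)
print(npu_mask.shape)     # (1, 1, 2, 3)
print(npu_mask[0, 0, 0])  # True only for the padded position
```

Validating the incoming mask's ndim before expanding (as the commit reportedly does for 4D inputs) would slot in before the broadcast.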

@yiyixuxu
Collaborator

Thank you both for your inputs.

I looked into this a bit more. I wonder if we can build the mask directly in 2D instead of expanding it to 4D? https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_ernie_image.py#L398-L400

This way, I think it would work out of the box with all of our attention backends that support masks. For the npu device, it would work with the "_native_npu" backend but not the default naive backend - but that's the case with all other models currently.

@chang-zhijie, can you help confirm whether a 2D mask would work with the "_native_npu" backend?

If that's the case, it would be our preferred direction, but if the Baidu team prefers an implementation that works out of the box with the default backend on npu too, we're happy to support that as well. Let us know @HsiaWinter
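As a sketch of that direction (`build_2d_padding_mask` is a hypothetical helper, not code from the linked file), the padding mask could be produced directly as [batch, seq_len] booleans and left for each attention backend to broadcast as it needs:

```python
import numpy as np

def build_2d_padding_mask(lengths, max_len):
    # Hypothetical sketch: build the mask as [batch, seq_len] booleans
    # (True = valid token) instead of pre-expanding to 4D, so backends
    # that accept masks can broadcast or reshape it themselves.
    idx = np.arange(max_len)[None, :]            # [1, seq_len]
    return idx < np.asarray(lengths)[:, None]    # [batch, seq_len]

print(build_2d_padding_mask([3, 1], max_len=4).astype(int))
# [[1 1 1 0]
#  [1 0 0 0]]
```

Keeping the mask 2D defers the layout decision to the backend, which is why it would compose with any backend that supports masks rather than only with ones that accept a pre-expanded 4D shape.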
