Modify remote_function decorators in multi_lora_transformers#173

Merged
tastelikefeet merged 1 commit into modelscope:main from xichengpro:main
Apr 21, 2026
Conversation

@xichengpro
Contributor

Updated remote_function decorators to specify collection methods.

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

When I use self-host mode for LoRA SFT training, the following error occurs during the eval phase while executing this code:

    for batch in dataloader:
        model.forward_only(inputs=batch)
        model.calculate_loss()

Traceback (most recent call last):
  File "/data/dubingnan/dbn-ceph/exp/coder/taas/sft_rslora_lf_aligned_lr.py", line 457, in <module>
    train()
  File "/data/dubingnan/dbn-ceph/exp/coder/taas/sft_rslora_lf_aligned_lr.py", line 437, in train
    eval_metrics = evaluate(model, eval_dataloader, global_step)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/dubingnan/dbn-ceph/exp/coder/taas/sft_rslora_lf_aligned_lr.py", line 329, in evaluate
    model.calculate_loss()
  File "/data/dubingnan/dbn-ceph/twinkle/src/twinkle_client/model/multi_lora_transformers.py", line 76, in calculate_loss
    response = http_post(
               ^^^^^^^^^^
  File "/data/dubingnan/dbn-ceph/twinkle/src/twinkle_client/http/http_utils.py", line 157, in http_post
    return _handle_response(response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/dubingnan/dbn-ceph/twinkle/src/twinkle_client/http/http_utils.py", line 85, in _handle_response
    raise requests.HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: 500 Error for url: http://10.178.165.81:8000/api/v1/model/Qwen/Qwen2.5-Coder-7B-Instruct/twinkle/calculate_loss
Server detail:
Internal Server Error

I found that the calculate_loss method in MultiLoraTransformersModel does not preserve the base class's distributed semantics, which leads to incorrect loss calculations under multi-GPU data-parallel (DP) training. This PR updates the remote_function decorators to specify explicit collection methods.
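To illustrate the idea behind the fix, here is a minimal sketch of what a `remote_function` decorator with an explicit collection strategy might look like. This is a hypothetical toy implementation, not the actual twinkle API: the `collect` parameter, the rank simulation, and the function names below are assumptions made for illustration, mirroring the 'mean' strategy for loss aggregation and the 'first' strategy for state retrieval described in this PR.

```python
# Hypothetical sketch of a remote_function decorator with a `collect`
# strategy; names and signatures are illustrative, not the twinkle API.
from functools import wraps


def remote_function(collect="all"):
    """Run the wrapped function once per simulated rank, then reduce."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(per_rank_inputs):
            # A real framework would dispatch to remote DP workers; here we
            # simply call the function once per simulated rank's input.
            results = [fn(x) for x in per_rank_inputs]
            if collect == "mean":
                # Aggregate losses across DP ranks by averaging.
                return sum(results) / len(results)
            if collect == "first":
                # Result is identical on all ranks (e.g. a state dict),
                # so fetching it from rank 0 alone is sufficient.
                return results[0]
            return results  # default: return the per-rank results as-is
        return wrapper
    return decorator


@remote_function(collect="mean")
def calculate_loss(per_rank_loss):
    return per_rank_loss


@remote_function(collect="first")
def get_state_dict(rank_state):
    return rank_state


print(calculate_loss([1.0, 3.0]))                      # 2.0
print(get_state_dict(["rank0_state", "rank1_state"]))  # rank0_state
```

Without an explicit strategy, a decorator like this would return the raw per-rank list, and a caller expecting a single scalar loss would compute the wrong value under multi-GPU DP, which matches the failure mode described above.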


Updated remote_function decorators to specify collection methods.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the MultiLoraTransformers class by adding collection strategies to remote function decorators. Specifically, it configures calculate_loss to use a 'mean' collection strategy for aggregating losses across ranks and get_state_dict to use a 'first' collection strategy for efficient state retrieval. I have no feedback to provide as the review comments were explanatory in nature.

@tastelikefeet tastelikefeet merged commit 95ec7d8 into modelscope:main Apr 21, 2026
1 of 3 checks passed
