[WIP] Multi-LoRA SFT support FSDP2 #155
Conversation
Code Review
This pull request implements FSDP2 support for MultiLoraTransformersModel by integrating it into the shared strategy and lazy-wrap lifecycle and introducing sharding-aware parameter access helpers. Review feedback identifies critical bugs in the distributed tensor handling: _write_param_tensor may incorrectly double-shard local data, set_state_dict risks shape mismatches when applying global state to local shards, and get_state_dict returns sharded tensors that could lead to corrupt checkpoints. Furthermore, the model's initialization should be refactored to properly use the parent class, and internal imports should be moved to the module level.
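To illustrate the sharding concerns raised above, here is a minimal, hypothetical sketch of what sharding-aware `get_state_dict` / `set_state_dict` helpers could look like. The function names mirror the helpers mentioned in the review, but this is an assumption about how the fix might be shaped, not the PR's actual code: `get_state_dict` gathers each `DTensor` into a full tensor before it reaches a checkpoint, and `set_state_dict` redistributes a global tensor to the local placement instead of copying it over a smaller shard.

```python
import torch

try:
    # Public module in recent PyTorch releases.
    from torch.distributed.tensor import DTensor, distribute_tensor
except ImportError:
    # Older releases exposed it under a private name.
    from torch.distributed._tensor import DTensor, distribute_tensor


def get_state_dict(model: torch.nn.Module) -> dict:
    """Return a state dict of full (unsharded) tensors.

    Gathering with full_tensor() avoids writing per-rank shards into a
    checkpoint, which would otherwise produce a corrupt checkpoint.
    """
    state = {}
    for name, tensor in model.state_dict().items():
        if isinstance(tensor, DTensor):
            # Materialize the global tensor before saving.
            tensor = tensor.full_tensor()
        state[name] = tensor
    return state


def set_state_dict(model: torch.nn.Module, state: dict) -> None:
    """Load a global state dict onto a (possibly sharded) model.

    A full tensor is re-sharded to match each parameter's placement;
    copying it directly onto the local shard would raise a shape mismatch.
    """
    for name, tensor in model.state_dict().items():
        if name not in state:
            continue
        src = state[name]
        if isinstance(tensor, DTensor) and not isinstance(src, DTensor):
            # Re-shard the global tensor to this parameter's placement.
            src = distribute_tensor(src, tensor.device_mesh, tensor.placements)
        tensor.copy_(src)
```

On a non-distributed model the `DTensor` branches are skipped and the helpers degrade to a plain copy, so the same code path can serve both sharded and unsharded runs.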
I'd love to have this feature! Just curious: why was this PR changed to draft? Any other plans in the works?
PR type
PR information
Write the detailed information that belongs to this PR.
Experiment results
Paste your experiment results here (if needed).