Wan2.2 LoRA Fine-Tuning Issues: Anyone Else Seeing This?

by Rajiv Sharma

Hey everyone!

I'm super excited about Wan2.2 and the fantastic work the team has done in open-sourcing it. I've been diving into LoRA fine-tuning and hit a bit of a snag, so I wanted to share my experience and see if anyone else has run into something similar, or if you have any insights to offer. Let's get into it!

The Experiment: Fine-Tuning Wan2.2 for Clothing/Makeup Transformation

So, here’s what I did: I fine-tuned the Wan2.2 A14B model using LoRA for a specific task – clothing and makeup transformations. It's an interesting task because it pushes the model to learn fairly nuanced changes. I used the official training script, figuring that sticking to the official tools would give me a solid baseline.

I fine-tuned both the low noise and high noise models, because I wanted to see how each would handle the task. To keep the training focused, I set the timestep boundary to 0.875 – as I understand it, this determines which part of the diffusion trajectory each expert is trained on, with the high noise model covering the noisier timesteps at or above the boundary and the low noise model covering the rest. The training itself went smoothly – no errors or crashes, which is always a relief! Everything seemed to be going according to plan.
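For anyone unfamiliar with the two-expert setup, here's a minimal sketch of how I picture the boundary splitting training samples between the two models. The constants, names, and exact comparison are my assumptions rather than the official implementation, so treat it purely as illustration:

```python
import torch

# Minimal sketch (assumed, not the official code) of how a 0.875 boundary
# might route sampled diffusion timesteps between the two A14B experts.
NUM_TRAIN_TIMESTEPS = 1000
BOUNDARY = 0.875  # normalized split between the high noise and low noise experts

def pick_expert(timestep: int) -> str:
    """Route a sampled timestep to the expert that should be trained on it."""
    if timestep >= BOUNDARY * NUM_TRAIN_TIMESTEPS:
        return "high_noise"  # early, very noisy part of the denoising trajectory
    return "low_noise"       # later, low-noise refinement part

# Quick check: sample a few timesteps and see where they land.
timesteps = torch.randint(0, NUM_TRAIN_TIMESTEPS, (8,)).tolist()
print([(t, pick_expert(t)) for t in timesteps])
```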

But this is where things got a little… weird. After the fine-tuning wrapped up, the results were, well, not what I was hoping for. In fact, they were pretty poor. I was expecting the model to nail the clothing and makeup transfers, but it just didn’t seem to learn them. This was a head-scratcher, especially since the training process itself hadn't raised any red flags.

Testing the Fine-Tuned Model: Official vs. KJ’s Inference Workflows

To really put the fine-tuned model through its paces, I tested it with two different inference workflows. First, I used the official inference workflow, because, you know, gotta start with the official setup. Then I also tried KJ’s inference workflow, which I've found to be quite robust in the past. The goal was to cover my bases and make sure the problem wasn't just a quirk of one particular testing setup.

Unfortunately, the results were consistently disappointing across both workflows. The model just didn’t seem to grasp the clothing or makeup transfer at all – it was as if the training had run, but nothing had actually stuck. Since both the official and KJ’s workflows gave similar results, the issue probably isn't with the inference process itself, but with what the model learned (or didn't learn) during fine-tuning. I started to wonder if there's something fundamentally different about Wan2.2 that's causing this.
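One diagnostic I'm planning to run (and would suggest to anyone hitting the same wall) is to check whether the trained LoRA actually contains a meaningful weight update before blaming the model. Here's a rough sketch of the idea – the file name, the key naming, and the alpha/rank values are all assumptions you'd need to adapt to your own checkpoint:

```python
from safetensors.torch import load_file

# Rough sanity check: does the trained LoRA contain a meaningful update at all?
# For each A/B pair the effective weight delta is (alpha / rank) * (B @ A), so
# if every delta norm is near zero, the adapter never really learned anything.
# The path, key names ("lora_down" / "lora_up"), and alpha/rank values below
# are assumptions -- adjust them to match your own checkpoint and config.
state = load_file("wan2.2_high_noise_lora.safetensors")  # hypothetical filename
ALPHA, RANK = 16.0, 16

for key, down in state.items():
    if "lora_down" not in key:
        continue
    up = state[key.replace("lora_down", "lora_up")]
    delta = (ALPHA / RANK) * (up.float() @ down.float())
    print(f"{key}: delta norm = {delta.norm().item():.4f}")
```

If the norms come out tiny, the problem is probably on the training side (learning rate, rank, which modules were targeted); if they look healthy, I'd suspect something about how the two experts and the boundary interact instead.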

The Wan2.1 Comparison: A Stark Contrast

Now, here’s where it gets even more interesting. To get a better handle on what was going on, I ran a comparison: same dataset, same task – clothing and makeup transformations – but fine-tuning Wan2.1 instead. That gave me a baseline to compare against; if Wan2.1 worked well, it would strongly suggest the issue is specific to Wan2.2.

And guess what? The results with Wan2.1 were much better. The LoRA successfully learned the desired transformations, and the difference was night and day. This was a huge clue! It pointed squarely at something unique about Wan2.2 that was causing the fine-tuning to underperform. Maybe it’s the architecture, the training data, or some other subtle factor. Whatever it is, it’s clear that Wan2.2 behaves differently than Wan2.1 in this context. This made me even more curious and determined to figure out what was happening.

This experience really highlighted the importance of having a solid baseline for comparison. Without it, I might have just assumed that the dataset or task was inherently difficult. But by comparing the results with Wan2.1, I was able to isolate the issue and focus my investigation on the specific differences between the two models.

The Question: Has Anyone Else Seen This?

So, here I am, scratching my head and wondering if anyone else has experienced something similar with Wan2.2. Have you guys tried fine-tuning it, and if so, what were your results like? Did you notice any quirks or unexpected behavior? I’m really curious to hear about your experiences. Maybe we can piece together what’s going on and find a solution together. Or, at the very least, we can raise awareness of this issue so others don’t spend time banging their heads against the wall, like I did!

I’m particularly interested in hearing from people who have tried fine-tuning Wan2.2 for similar tasks, like clothing or makeup transformations. But any insights are welcome! Even if you’ve worked on different tasks, your experience might shed light on the underlying issue.

Potential Insights and Discussion Points

Here are a few things I’ve been pondering, and I’d love to hear your thoughts on them:

  • Model Architecture Differences: Could there be fundamental differences between Wan2.1 and Wan2.2 that affect LoRA fine-tuning? The A14B model's split into separate high noise and low noise experts is the obvious candidate – maybe a LoRA needs to be trained or applied differently when the denoising process is divided between two models.
  • Training Data: Is it possible that the training data used for Wan2.2 is different in some way that impacts fine-tuning performance? Perhaps there are subtle differences in the data distribution or the way the data was preprocessed.
  • Timestep Boundary: I used a timestep boundary of 0.875. Is it possible that this value isn’t optimal for Wan2.2? Maybe a different boundary would yield better results. I’m open to experimenting with this, but I’d love to hear if anyone has any experience with different timestep boundaries.
  • LoRA Hyperparameters: Could the LoRA hyperparameters be the culprit? I used the official training script, so I assumed the defaults would be reasonable, but maybe Wan2.2 needs different settings (rank, alpha, learning rate, target modules) to learn well. This is another area I'm considering exploring – see the sketch after this list for one possible starting point.
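To make that last point concrete, here's one hedged starting point for a hyperparameter sweep, written as a peft-style LoraConfig purely for illustration. The official Wan2.2 training script may configure LoRA differently, and the target module names are my guesses for a DiT-style block, not the model's actual parameter names:

```python
from peft import LoraConfig

# Illustrative sweep starting point only -- the official training script may
# not use peft at all, and target_modules below are assumed names for a
# DiT-style attention block, not Wan2.2's actual parameter names.
lora_config = LoraConfig(
    r=32,            # try 16 / 32 / 64; higher rank gives the adapter more capacity
    lora_alpha=32,   # keeping alpha equal to r keeps the effective scale at 1.0
    lora_dropout=0.0,
    target_modules=["to_q", "to_k", "to_v", "to_out"],  # assumed names
)
```

If anyone knows which modules the official script actually targets, or which rank and learning rate have worked for them on Wan2.2, I'd love to compare notes.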

I know this is a bit of a long post, but I wanted to give you guys all the details so we can have a productive discussion. I’m really hoping we can crack this nut together! Let me know your thoughts, experiences, and any ideas you might have. Thanks in advance for your help!

Conclusion

In conclusion, my experience fine-tuning Wan2.2 with LoRA for clothing and makeup transformations has been less than ideal. The model didn’t seem to learn as effectively as Wan2.1, and I’m curious to hear if others have encountered similar issues. Let’s share our experiences and insights to help each other out and better understand this fantastic new model!