Motivation

Application requirements

their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images.

High storage demands, a complex fine-tuning process, and the need for multiple reference images.

they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity.

Fine-tuning requires updating a large number of parameters.

Lack of compatibility with community open-source models.

Lack of high face fidelity.

Key challenges

generating customized images that accurately preserve the intricate identity details of human subjects.

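The goal is to generate customized images while preserving the intricate features of the human subject. The difficulty is that facial identity features are subtle and complex: a general object can be described coarsely (shape, color), whereas a face depends on fine-grained texture.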

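Prior work (ControlNet, T2I-Adapter) can indeed achieve conditional control using inputs such as depth maps, sketches, and body poses, but the generated result retains only the spatial reference information. The features of the reference image therefore still need to be preserved at a fine-grained level.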

In short, existing methods either have low face fidelity (a quality problem) or require complex fine-tuning (an efficiency problem).

Technical approaches

Subject control is generally done in one of two ways:

[1] Fine-tuning: a resource-intensive approach that requires extensive setup and training.

[2] No fine-tuning at inference time: an adapter learns to extract domain-specific image features.
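The second path can be sketched as a frozen base model plus a small trainable projection. The following is only a toy illustration under stated assumptions, not the method of any specific paper: the dimensions, the names `base_weights`, `adapter_weights`, and `generate`, and the choice to add the projected image feature to the text conditioning are all hypothetical.

```python
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def init_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def count_params(matrix):
    """Number of scalar weights in a matrix."""
    return sum(len(row) for row in matrix)


rng = random.Random(0)
TEXT_DIM, IMAGE_DIM = 8, 4  # hypothetical sizes, chosen only for illustration

# Stand-in for the pre-trained generator: frozen, never updated in path [2].
base_weights = init_matrix(TEXT_DIM, TEXT_DIM, rng)

# The adapter: the only weights that would be trained in path [2].
# It projects a reference-image feature into the conditioning space.
adapter_weights = init_matrix(TEXT_DIM, IMAGE_DIM, rng)


def generate(text_cond, image_feat=None):
    """Run the frozen base model on a conditioning vector.

    If an image feature is given, the adapter's projection is added to the
    text conditioning (an assumed, simplified way of injecting it).
    """
    cond = list(text_cond)
    if image_feat is not None:
        extra = linear(adapter_weights, image_feat)
        cond = [c + e for c, e in zip(cond, extra)]
    return linear(base_weights, cond)


text_cond = [1.0] * TEXT_DIM
image_feat = [0.5] * IMAGE_DIM
output = generate(text_cond, image_feat)
```

In this sketch, path [1] would update all of `base_weights` for each new subject, whereas path [2] leaves them untouched and trains only `adapter_weights` once; a new subject then needs only a new `image_feat` at inference time.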