Abstract:
This project focuses on leveraging consistency models for downstream tasks involving translation and mapping between different modalities, such as converting visible images to their corresponding infrared representations. By training on paired data, the model learns a robust mapping that preserves essential features across modalities. The ultimate goal is a model that generates accurate outputs in the target domain (e.g., infrared) from inputs in the source domain (e.g., visible), enabling practical applications in imaging, vision enhancement, and modality transformation, and showcasing the potential of consistency models for cross-domain learning tasks.