ColPali: Efficient Document Retrieval with Vision Language Models https://arxiv.org/abs/2407.01449 https://huggingface.co/learn/cookbook/multimodalra
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents (arXiv preprint 2024) https://arxiv.org/pdf/2410.10594 (released 241014) pre
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training (2024 arXi