
BAI, Jinze, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023.

WANG, Peng, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024.

CHU, Yunfei, et al. Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models. arXiv preprint arXiv:2311.07919, 2023.

WU, Chenfei, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.