Chenyang Lyu 吕晨阳

Staff Researcher at Alibaba and PhD from ML-Labs, Dublin City University
Glasnevin, Dublin
Ireland

Email: lyuchenyang.dcu [at] gmail [dot] com
Google Scholar Twitter DBLP LinkedIn

About

I am currently a Staff Researcher/Tech Lead at Alibaba ATH, where I lead the Speech and Omni LLM Applied Research team. Previously, I was a researcher at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), focusing on multilingual and multimodal large language models. I earned my Ph.D. in Machine Learning from Dublin City University's ML-Labs in 2023, following a Bachelor of Engineering from Northeastern University in China in 2018. My research lies primarily in natural language processing, with a focus on LLMs across multilingual, multimodal, and speech settings, as well as LLM agents. My work has received 2,223 citations on Google Scholar, with an h-index of 23 and an i10-index of 32. Recent work includes CoQuIR, which received an SAC Highlights Award at ACL 2026, and papers accepted at COLM 2026, CVPR 2026, and ICASSP 2026. Other recent work spans foundational LLMs, multilingual, multimodal, and speech LLMs, and LLM agents, including Marco-MoE, Trust No Tool, and Crayotter. GPT4Video was nominated for the Best Paper Award at ACM-MM 2024. My open-source projects and contributions have collectively earned 4k+ GitHub stars, and I won two championships and two runner-up prizes in the IWSLT 2025 speech translation competition. Before my current role, I gained extensive research experience in large language models as a research assistant and visiting scholar at Tencent AI Lab, Japan's National Institute of Informatics (NII), and IBM Research-China. My work has also been recognized with the German DAAD AInet Fellowship, the 2023 Irish AI Young Talent of the Year Award, and an SFI PhD Scholarship. Additionally, my research has been featured by media outlets such as the Irish national broadcaster RTÉ, Slator, and Irish Tech News, as well as on Irish podcasts.

News

[2026/07] Our paper CoQuIR received an SAC Highlights Award at ACL 2026.
[2026/07] Marco-MoE has been accepted to COLM 2026.
[2026/02] One paper, ElasticFormer, has been accepted to CVPR 2026.
[2026/01] Three papers, Marco-Voice, LongSpeech, and MECap-R1, have been accepted to ICASSP 2026.
[2025/12] Invited Industry Expert Talk on Marco Models at ACM Multimedia Asia 2025, hosted by Dr. Jianquan Liu (NEC).
[2025/11] Invited Talk on LLM Hallucination Detection at Southern University of Science and Technology (SUSTech), hosted by Dr. Linyi Yang.
[2025/07] Our team secured two championships and two runner-up places at the IWSLT 2025 International Speech Translation Competition.
[2025/05] Four papers, on topics including multilingual LLMs and hallucination detection, have been accepted to ACL 2025.
[2024/10] Marco-LLM was featured by global media including Bloomberg, CNBC, and South China Morning Post (SCMP).
[2023/06] We released Macaw-LLM, our multimodal large language model.

Educational Background

Sep 2019 - Jul 2023, Doctor of Philosophy in Computer Science, ML-Labs, Dublin City University
Oct 2014 - Jun 2018, Bachelor of Engineering in Computer Software Engineering, Northeastern University

Research Experience

Research Intern, Huawei Noah's Ark Lab, May 2020 - May 2021
Research Intern, IBM Research-China, July 2018 - December 2018

Selected Publications

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval
Jiahui Geng, Fengyu Cai, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Alexander Pretschner, Heinz Koeppl, Fakhri Karray
In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026) (SAC Highlights Award)
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
Fan Jiang, Yu Zhao, Chenyang Lyu, Tianqi Shi, Yichao Du, Feihu Jiang, Longyue Wang, Weihua Luo
In Conference on Language Modeling (COLM 2026)
ElasticFormer: Detecting Objects in HRW Shots via Elastic Computing Vision Transformer
Xiang Li, Wenxi Li, Yuetong Wang, Chenyang Lyu, Haozhe Lin, Guiguang Ding, Yuchen Guo
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026)
LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech
Fei Yang, Xuanfan Ni, Renyi Yang, Jiahui Geng, Qing Li, Chenyang Lyu*, Yichao Du, Longyue Wang, Weihua Luo, Kaifu Zhang
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
MECap-R1: Emotion-aware Policy with Reinforcement Learning for Multimodal Emotion Captioning
Haoqin Sun, Chenyang Lyu*, Xiangyu Kong, Shiwan Zhao, Jiaming Zhou, Hui Wang, Aobo Kong, Jinghua Zhao, Longyue Wang, Weihua Luo, Kaifu Zhang, Yong Qin
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
Marco-Voice Technical Report
Fengping Tian, Chenyang Lyu*, Xuanfan Ni, Haoqin Sun, Qingjuan Li, Zhiqiang Qian, Haijun Li, Longyue Wang, Zhao Xu, Weihua Luo, Kaifu Zhang
In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback
Lecheng Yan, Ruizhe Li, Xicheng Han, Wenxi Li, Binwu Wang, Longyue Wang, Chenyang Lyu, Guanhua Chen
Preprint 2026, agent safety
Crayotter: Traceable Multi-Agent Workflows for Long-Form Video Editing
Lecheng Yan, Yichong Zhang, Ben Pan, Xiaoyu Zheng, Jiawei Qian, Anqi Wu, Wenxi Li, Chenyang Lyu
Preprint 2026, multi-agent workflows
CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks
Peiqin Lin, Chenyang Lyu, Wenjiang Luo, Haotian Ye, Md Mehrab Hossain, Chunlan Ma, Shaoxiong Ji, Younes Samih, Bo Zeng, Fan Jiang, Yuanbin Cao, Dilda Duisenbek, Adrian Neo Sau Xun, Daria Pozdniakova, Liubou Misevich, Nevena Marinkovic, Ngoc Gia Linh Nguyen, Thi Khanh Linh Do, Sarakmatak Sophy, Baotian Hu, Guanhua Chen, Gongbo Tang, Alham Fikri Aji, Longyue Wang, Weihua Luo
Preprint 2026, multilingual benchmark
ReFreeKV: Towards Threshold-Free KV Cache Compression
Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li
Findings of ACL 2026
SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures
Nedjma Ousidhoum, Junho Myung, Carla Pérez-Almendros, Jiho Jin, Amr Keleg, Meriem Beloucif, Yi Zhou, Rodrigo Agerri, Vladimir Araujo, Naomi Baes, James Barry, Joanne Boisson, Nancy F. Chen, Christine de Kock, Aleksandra Edwards, Joseba Fernandez de Landa, Mohamed Fazli Imam, Huda Hakami, Shu-Kai Hsieh, Joseph Marvin Imperial, Roy Ka-Wei Lee, Zhengyuan Liu, Chenyang Lyu, Younes Samih, Johan Sjons, Bryan Tan, Asahi Ushio, Weihua Zheng, Alice Oh, José Camacho-Collados
In Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026), ACL 2026 (Shared Task Organizer)
Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models
Wenxi Li, Jingchen Huang, Chenyang Lyu*, Mo-Ran Liu, Haozhe Lin, Guiguang Ding, Yuchen Guo
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs
Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video Editing
Bingshuai Liu, Ante Wang, Zijun Min, Chenyang Lyu, Longyue Wang, Zhihao Wang, Xu Han, Peng Li, Jinsong Su
In Proceedings of the 33rd ACM International Conference on Multimedia (ACM-MM 2025)
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu
In Proceedings of the 32nd ACM International Conference on Multimedia (ACM-MM 2024) (Best Paper Award Nomination)
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero*, Chenyang Lyu*, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Rada Mihalcea, Thamar Solorio, Alham Fikri Aji
In Advances in Neural Information Processing Systems, Datasets and Benchmarks Track (NeurIPS 2024) (Oral presentation)
Benchmarking and Improving Long-Text Translation with Large Language Models
Longyue Wang, Zefeng Du, Wenxiang Jiao, Chenyang Lyu, Jianhui Pang, Leyang Cui, Kaiqiang Song, Derek F. Wong, Shuming Shi, Zhaopeng Tu
In Findings of the Association for Computational Linguistics: ACL 2024
A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models
Chenyang Lyu, Zefeng Du, Jitao Xu, Yitao Duan, Minghao Wu, Teresa Lynn, Alham Fikri Aji, Derek F. Wong, Longyue Wang
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Large Language Models as Code Executors: An Exploratory Study
Chenyang Lyu, Lecheng Yan, Rui Xing, Wenxi Li, Younes Samih, Tianbo Ji, Longyue Wang
Preprint 2024

Teaching Experience

CA-146/297, Introduction to Programming, 2020, Dublin City University, Teaching Assistant
CA-271, Machine Learning, 2022, Dublin City University, Guest Lecturer
CA-168, Digital World, 2022, Dublin City University, Guest Lecturer

Professional Activities

PC Member
- The 45th European Conference on Information Retrieval, ECIR 2023
- The 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
- The 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
- The 3rd Workshop on Financial Technology on the Web in conjunction with The Web Conference 2023, FinWeb 2023
Conference Reviewer
- The 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
- The 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
- The 29th International Conference on Computational Linguistics, COLING 2022
- The 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Workshop Reviewer
- EvoNLP: The First Workshop on Ever Evolving NLP, EMNLP 2022
- FinNLP: The Fourth Workshop on Financial Technology and Natural Language Processing, EMNLP 2022
- FinWeb: The 3rd Workshop on Financial Technology on the Web, WWW 2023
Journal Reviewer
- IEEE Transactions on Multimedia, IEEE
- Social Network Analysis and Mining, Springer
- Connection Science, Taylor & Francis
Regular Reviewer
- ACL Rolling Review