Hi, I am a second-year PhD student in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology. I am fortunate to be advised by Prof. Junxian He. Before that, I received my bachelor's degree in Computer Science from Shanghai Jiao Tong University in 2023.

Research Interests

My research focuses primarily on large language models, particularly on advancing their reasoning capabilities and multimodal understanding. Specifically, my research interests lie in:

  • Enhancing reasoning and planning abilities through self-improvement and reinforcement learning (RL) techniques. (B-STaR, simpleRL)
  • Developing reliable evaluation methods for language models. (C-Eval, LLM-Compression-Intelligence)
  • Improving the architecture and training methods of multimodal models to strengthen their understanding across multiple modalities.

I am open to collaborations 🤗

Publications

Most recent publications on Google Scholar.
* denotes co-first authors

7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
Weihao Zeng *, Yuzhen Huang *, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He
Notion blog. [notion] [github] [Hugging Face]

  • Trains a 7B model with only 8K MATH examples, achieving strong performance on complex mathematical reasoning.
  • Demonstrates that a 7B model develops long CoT and self-reflection through RL with a simple recipe.
  • Outperforms methods that use over 50× more data and complex architectures.

Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum *, Yuzhen Huang *, Hongjian Zou, Ding Qi, Yixuan Liao, Xiaoxin Chen, Qian Liu, Junxian He
arXiv 2025. [arxiv] [github] [dataset]

  • Leverages compression efficiency to identify high-quality pretraining data that enhances downstream performance.
  • Introduces PRESELECT, a lightweight data selection method based on predictive strength (a toy sketch of the idea follows this list).
  • Demonstrates a 10× reduction in compute requirements together with significant performance improvements.
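
Below is a toy sketch of the predictive-strength idea (my own illustration, not the released PRESELECT pipeline; the source names, losses, and benchmark scores are hypothetical): a data source is considered valuable when reference models' compression on it tracks their downstream benchmark scores.

```python
# Toy illustration of "predictive strength" with hypothetical numbers.
# Idea: a data source is useful for pretraining if models' compression on it
# predicts their downstream benchmark scores across a set of reference models.
import numpy as np

# Bits-per-byte of four reference models on each candidate source (hypothetical).
bpb = {
    "web_source_a":  [0.92, 0.85, 0.78, 0.70],
    "web_source_b":  [1.10, 1.12, 1.07, 1.11],
    "code_source_c": [0.60, 0.55, 0.48, 0.41],
}
# Downstream benchmark accuracy of the same four reference models (hypothetical).
benchmark_acc = np.array([0.35, 0.45, 0.58, 0.70])

def rank(x):
    """Ranks of x (ties ignored in this toy example)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def predictive_strength(losses, scores):
    """Spearman correlation between compression and downstream scores.
    Lower loss should align with higher scores, so the losses are negated."""
    return float(np.corrcoef(rank(-np.asarray(losses)), rank(scores))[0, 1])

strengths = {src: predictive_strength(v, benchmark_acc) for src, v in bpb.items()}
for src, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(f"{src:15s} predictive strength = {s:+.2f}")
# Sources whose compression tracks downstream ability most closely would be
# up-weighted when selecting pretraining data.
```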

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng *, Yuzhen Huang *, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
ICLR 2025. [arxiv] [github]

  • Quantitatively analyzes the dynamics of exploration and exploitation during self-improvement (a toy monitoring sketch follows this list).
  • Introduces B-STaR, a Self-Taught Reasoning framework that autonomously adjusts its configurations across iterations.
  • Balances exploration and exploitation during training, leading to superior performance.
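
As a concrete example of what monitoring these dynamics can look like, here is a toy sketch that uses pass@k as an exploration proxy and average per-sample accuracy as a rough exploitation signal; these are illustrative stand-ins, not necessarily the exact metrics defined in the paper, and the correctness flags below are made up.

```python
# Toy sketch: tracking exploration/exploitation-style statistics for sampled solutions.
# pass@k (any of k samples correct) serves as an exploration proxy and average
# per-sample accuracy as a rough exploitation signal; both are illustrative
# stand-ins, not the exact metrics from the B-STaR paper.
from typing import List

def pass_at_k(correct_flags: List[List[bool]], k: int) -> float:
    """Fraction of queries with at least one correct answer among the first k samples."""
    return sum(any(flags[:k]) for flags in correct_flags) / len(correct_flags)

def avg_accuracy(correct_flags: List[List[bool]]) -> float:
    """Average per-sample accuracy across queries."""
    return sum(sum(flags) / len(flags) for flags in correct_flags) / len(correct_flags)

# Hypothetical correctness of 4 samples drawn for 3 queries at one training iteration.
flags = [[False, True, False, False],
         [False, False, False, False],
         [True, True, False, True]]
print("pass@4 (exploration proxy):", round(pass_at_k(flags, 4), 3))
print("avg accuracy (exploitation proxy):", round(avg_accuracy(flags), 3))
# A B-STaR-style loop would track such statistics across iterations and adjust
# sampling configurations (e.g., temperature, reward threshold) to keep them balanced.
```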

Compression Represents Intelligence Linearly
Yuzhen Huang *, Jinghan Zhang *, Zifei Shan, Junxian He
COLM 2024. [arxiv] [github] [dataset]

  • Investigates the linear correlation between compression and intelligence in LLMs.
  • Provides evidence for the belief that superior compression is indicative of greater intelligence.
  • Proposes compression efficiency as an unsupervised and reliable metric for assessing LLMs' abilities (a minimal sketch follows this list).
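
The metric itself is easy to compute; below is a minimal sketch of measuring compression efficiency as bits per character (BPC) with a Hugging Face causal LM. The model name and text are placeholders, and a real evaluation would chunk and batch a large corpus.

```python
# Minimal sketch: bits-per-character (BPC) of a causal LM on a text snippet.
# Placeholders throughout; a real run would stream a large corpus in chunks.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "An example passage from an evaluation corpus."  # placeholder text
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

# out.loss is the mean cross-entropy (in nats) over the predicted tokens.
num_predicted = enc["input_ids"].shape[1] - 1
total_bits = out.loss.item() * num_predicted / math.log(2)
bpc = total_bits / len(text)  # bits per character: lower = better compression
print(f"BPC = {bpc:.3f}")
```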

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang *, Yuzhuo Bai *, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
NeurIPS 2023 (Datasets and Benchmarks track). [arxiv] [github] [website] [dataset]

  • The first comprehensive Chinese evaluation suite for LLMs.
  • Conducts a thorough evaluation of the most advanced LLMs.
  • Over 9.8M downloads on Hugging Face and more than 100 models on the leaderboard.

Experiences

Academia

  • 2024.02 - present PhD student, Department of CSE, HKUST, Hong Kong SAR, China.
  • 2019.09 - 2023.06 Undergraduate, Computer Science, Shanghai Jiao Tong University, Shanghai, China.

Industry

  • 2023.11 - 2024.01 Research Intern, WeChat, Tencent.

Service

Reviewer: NeurIPS 2024, ICLR 2025, ICML 2025

Invited Talks

  • Mar 2025, Georgia Tech PAIR, Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • Feb 2025, Apple AIML, Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • May 2024, BAAI, Compression Represents Intelligence Linearly. [video]