Hi, I am a second-year PhD student in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology. I am fortunate to be advised by Prof. Junxian He. Before that, I received my bachelor's degree in Computer Science from Shanghai Jiao Tong University in 2023.

Research Interests

My research focuses primarily on large language models, particularly on advancing their reasoning capabilities and multimodal understanding. Specifically, my research interests lie in:

  • Enhancing reasoning and planning abilities through self-improvement and reinforcement learning (RL) techniques. (B-STaR, simpleRL)
  • Developing reliable evaluation methods for language models. (C-Eval, LLM-Compression-Intelligence)
  • Improving the architecture and training methods of multimodal models to strengthen their understanding across multiple modalities.

I am open to collaborations 🤗

Publications

Most recent publications on Google Scholar.
* denotes co-first authors

7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
Weihao Zeng *, Yuzhen Huang *, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He
Notion blog. [notion] [github] [Hugging Face]

  • Trains a 7B model with only 8K MATH examples, achieving strong performance on complex mathematical reasoning.
  • Demonstrates that a 7B model develops long CoT and self-reflection through RL with a simple recipe.
  • Outperforms methods that use over 50× more data and complex architectures.

Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Kashun Shum *, Yuzhen Huang *, Hongjian Zou, Ding Qi, Yixuan Liao, Xiaoxin Chen, Qian Liu, Junxian He
arXiv 2025. [arxiv] [github] [dataset]

  • Leverages compression efficiency to identify high-quality pretraining data that enhances downstream performance.
  • Introduces PRESELECT, a lightweight data selection method based on predictive strength (a toy sketch of the idea follows this list).
  • Demonstrates a 10× reduction in compute requirements together with significant performance improvements.
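
Below is a toy sketch of the predictive-strength idea (my own illustration, not the released PRESELECT pipeline; the source names, losses, and benchmark scores are hypothetical): a data source is considered valuable when reference models' compression on it tracks their downstream benchmark scores.

```python
# Toy illustration of "predictive strength" with hypothetical numbers.
# Idea: a data source is useful for pretraining if models' compression on it
# predicts their downstream benchmark scores across a set of reference models.
import numpy as np

# Bits-per-byte of four reference models on each candidate source (hypothetical).
bpb = {
    "web_source_a":  [0.92, 0.85, 0.78, 0.70],
    "web_source_b":  [1.10, 1.12, 1.07, 1.11],
    "code_source_c": [0.60, 0.55, 0.48, 0.41],
}
# Downstream benchmark accuracy of the same four reference models (hypothetical).
benchmark_acc = np.array([0.35, 0.45, 0.58, 0.70])

def rank(x):
    """Ranks of x (ties ignored in this toy example)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def predictive_strength(losses, scores):
    """Spearman correlation between compression and downstream scores.
    Lower loss should align with higher scores, so the losses are negated."""
    return float(np.corrcoef(rank(-np.asarray(losses)), rank(scores))[0, 1])

strengths = {src: predictive_strength(v, benchmark_acc) for src, v in bpb.items()}
for src, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(f"{src:15s} predictive strength = {s:+.2f}")
# Sources whose compression tracks downstream ability most closely would be
# up-weighted when selecting pretraining data.
```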

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng *, Yuzhen Huang *, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
ICLR 2025. [arxiv] [github]

  • Quantitatively analyzes the dynamics of exploration and exploitation during self-improvement (a toy monitoring sketch follows this list).
  • Introduces B-STaR, a Self-Taught Reasoning framework that autonomously adjusts its configurations across iterations.
  • Balances exploration and exploitation during training, leading to superior performance.
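
As a concrete example of what monitoring these dynamics can look like, here is a toy sketch that uses pass@k as an exploration proxy and average per-sample accuracy as a rough exploitation signal; these are illustrative stand-ins, not necessarily the exact metrics defined in the paper, and the correctness flags below are made up.

```python
# Toy sketch: tracking exploration/exploitation-style statistics for sampled solutions.
# pass@k (any of k samples correct) serves as an exploration proxy and average
# per-sample accuracy as a rough exploitation signal; both are illustrative
# stand-ins, not the exact metrics from the B-STaR paper.
from typing import List

def pass_at_k(correct_flags: List[List[bool]], k: int) -> float:
    """Fraction of queries with at least one correct answer among the first k samples."""
    return sum(any(flags[:k]) for flags in correct_flags) / len(correct_flags)

def avg_accuracy(correct_flags: List[List[bool]]) -> float:
    """Average per-sample accuracy across queries."""
    return sum(sum(flags) / len(flags) for flags in correct_flags) / len(correct_flags)

# Hypothetical correctness of 4 samples drawn for 3 queries at one training iteration.
flags = [[False, True, False, False],
         [False, False, False, False],
         [True, True, False, True]]
print("pass@4 (exploration proxy):", round(pass_at_k(flags, 4), 3))
print("avg accuracy (exploitation proxy):", round(avg_accuracy(flags), 3))
# A B-STaR-style loop would track such statistics across iterations and adjust
# sampling configurations (e.g., temperature, reward threshold) to keep them balanced.
```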

Compression Represents Intelligence Linearly
Yuzhen Huang *, Jinghan Zhang *, Zifei Shan, Junxian He
COLM 2024. [arxiv] [github] [dataset]

  • Investigates the linear correlation between compression and intelligence in LLMs.
  • Provides evidence for the belief that superior compression is indicative of greater intelligence.
  • Proposes compression efficiency as an unsupervised and reliable metric for assessing LLMs' abilities (a minimal sketch follows this list).
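
The metric itself is easy to compute; below is a minimal sketch of measuring compression efficiency as bits per character (BPC) with a Hugging Face causal LM. The model name and text are placeholders, and a real evaluation would chunk and batch a large corpus.

```python
# Minimal sketch: bits-per-character (BPC) of a causal LM on a text snippet.
# Placeholders throughout; a real run would stream a large corpus in chunks.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "An example passage from an evaluation corpus."  # placeholder text
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**enc, labels=enc["input_ids"])

# out.loss is the mean cross-entropy (in nats) over the predicted tokens.
num_predicted = enc["input_ids"].shape[1] - 1
total_bits = out.loss.item() * num_predicted / math.log(2)
bpc = total_bits / len(text)  # bits per character: lower = better compression
print(f"BPC = {bpc:.3f}")
```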

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Yuzhen Huang *, Yuzhuo Bai *, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
NeurIPS 2023 (Datasets and Benchmarks track). [arxiv] [github] [website] [dataset]

  • The first comprehensive Chinese evaluation suite for LLMs.
  • Conducts a thorough evaluation of the most advanced LLMs.
  • Over 9.8M downloads on Hugging Face and more than 100 models on the leaderboard.

Experiences

Academia

  • 2024.02 - present PhD student, Department of CSE, HKUST, Hong Kong SAR, China.
  • 2019.09 - 2023.06 Undergraduate, Computer Science, Shanghai Jiao Tong University, Shanghai, China.

Industry

  • 2023.11 - 2024.01 Research Intern, WeChat, Tencent.

Service

Reviewer: NeurIPS 2024, ICLR 2025, ICML 2025

Invited Talks

  • Mar 2025, Georgia Tech PAIR, Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • Feb 2025, Apple AIML, Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • May 2024, BAAI, Compression Represents Intelligence Linearly. [video]