Ph.D. student in Computer Science
shijiexia [AT] sjtu.edu.cn
GitHub | Twitter | Google Scholar | CV
I am a second-year Ph.D. student at Shanghai Jiao Tong University, advised by Prof. Pengfei Liu. Prior to that, I received my B.Eng. degree in Intelligence Science (Honors Program) from Fudan University in 2024.
Starting in April 2026, I am a research intern at the Qwen Team, contributing to the development of foundation models.
My research aims to develop agentic foundation models. To achieve this goal, I focus on the following topics:
(1) Agentic Pretraining: How to evaluate the model's agentic capability at the pretraining stage?
(2) Agentic RL: How to build simulated environments that reflect the real world?
(3) Agentic Application: How to build a good harness that enables agents to learn continuously at test time?
Diagnosing and Mitigating Context Rot in Long-horizon Search
Shijie Xia, Yikun Wang, Zhen Huang, Pengfei Liu
arXiv preprint, 2026
Summary: We reveal a prevalent but unnoticed phenomenon of context rot in long-horizon search and explore its mitigation through context management and rejection sampling.
SR-Scientist: Scientific Equation Discovery With Agentic AI
Shijie Xia, Yuhan Sun, Pengfei Liu
ICLR 2026
Summary: We present SR-Scientist, a framework with a corresponding RL training strategy, in which an autonomous agent discovers scientific equations through long-horizon, tool-driven data analysis and equation evaluation.
A Survey of Test Time Scaling for Reasoning
Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu
arXiv preprint, 2025
Summary: We organize and analyze a broad range of work on test-time scaling.
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu
AAAI 2025 (Oral)
Summary: We propose ReasonEval, a suite comprising a new evaluation methodology with defined metrics for assessing mathematical reasoning quality and corresponding LLM-based evaluators for automated calculation.
InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research
Yunze Wu※, Dayuan Fu※, Weiye Si, Zhen Huang, Mohan Jiang, Keyu Li, Shijie Xia, Jie Sun, Tianze Xu, Xiangkun Hu, Pengrui Lu, Xiaojie Cai, Lyumanshan Ye, Wenhong Zhu, Yang Xiao, Pengfei Liu
ICLR 2026
LIMO: Less is More for Reasoning
Yixin Ye※, Zhen Huang※, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu
COLM 2025
Evaluating Safety with Critique
Yixiu Liu※, Yuxiang Zheng※, Shijie Xia, Yuan Guo, Jiajun Li, Yi Tu, Chaoling Song, Pengfei Liu
Findings of EMNLP 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
NeurIPS 2024
Outstanding Graduate of Fudan University, 2024
Shanghai City Scholarship, 2022
Fudan University Academic Scholarship, 2020-2024
Program Committee/Reviewer
AAAI (2025-2026), ICLR (2026), EMNLP (2025)