Shijie Xia (夏世杰)

Publications

Selected

Diagnosing and Mitigating Context Rot in Long-horizon Search

Shijie Xia, Yikun Wang, Zhen Huang, Pengfei Liu

arXiv preprint, 2026

Paper

Summary: We identify context rot, a prevalent but previously overlooked phenomenon in long-horizon search, and explore how to mitigate it through context management and rejection sampling.

SR-Scientist: Scientific Equation Discovery With Agentic AI

Shijie Xia, Yuhan Sun, Pengfei Liu

ICLR 2026

Paper Code Model Data

Summary: We present SR-Scientist, a framework with a corresponding RL training strategy, in which an autonomous agent discovers scientific equations through long-horizon, tool-driven data analysis and equation evaluation.

A Survey of Test Time Scaling for Reasoning

Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu

arXiv preprint, 2025

Paper Code

Summary: We organize and analyze a broad range of work on test-time scaling.

Evaluating Mathematical Reasoning Beyond Accuracy

Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu

AAAI 2025 (oral presentation)

Paper Code Model Poster

Summary: We propose ReasonEval, a suite comprising a new evaluation methodology, with defined metrics for assessing the quality of mathematical reasoning, and corresponding LLM-based evaluators that compute these metrics automatically.

All

Diagnosing and Mitigating Context Rot in Long-horizon Search

Shijie Xia, Yikun Wang, Zhen Huang, Pengfei Liu

arXiv preprint, 2026

Paper

Summary: We identify context rot, a prevalent but previously overlooked phenomenon in long-horizon search, and explore how to mitigate it through context management and rejection sampling.

SR-Scientist: Scientific Equation Discovery With Agentic AI

Shijie Xia, Yuhan Sun, Pengfei Liu

ICLR 2026

Paper Code Model Data

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research

Yunze Wu※, Dayuan Fu※, Weiye Si, Zhen Huang, Mohan Jiang, Keyu Li, Shijie Xia, Jie Sun, Tianze Xu, Xiangkun Hu, Pengrui Lu, Xiaojie Cai, Lyumanshan Ye, Wenhong Zhu, Yang Xiao, Pengfei Liu

ICLR 2026

Paper Code

A Survey of Test Time Scaling for Reasoning

Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu

arXiv preprint, 2025

Paper Code

Summary: We organize and analyze a broad range of work on test-time scaling.

LIMO: Less is More for Reasoning

Yixin Ye※, Zhen Huang※, Yang Xiao, Ethan Chern, Shijie Xia, Pengfei Liu

COLM 2025

Paper Code

Evaluating Safety with Critique

Yixiu Liu※, Yuxiang Zheng※, Shijie Xia, Yuan Guo, Jiajun Li, Yi Tu, Chaoling Song, Pengfei Liu

Findings of EMNLP 2024

Paper Code

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu

NeurIPS 2024

Paper Code

Evaluating Mathematical Reasoning Beyond Accuracy

Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu

AAAI 2025 (oral presentation)

Paper Code Model Poster