Ph.D. student in Computer Science
shijiexia [AT] sjtu.edu.cn
Github | Twitter | Google Scholar
“If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute resolving it.” — Albert Einstein
Shijie is a Ph.D. student at Shanghai Jiao Tong University, advised by Prof. Pengfei Liu. He received a B.Eng. degree in intelligence science from Fudan University in 2024, and was a research intern at Shanghai AI Laboratory from 2023 to 2024.
His research focuses mainly on Large Language Models, including reliable evaluation and data synthesis.
Evaluating Safety with Critique
Yixiu Liu, Yuxiang Zheng, Shijie Xia, Yuan Guo, Jiajun Li, Yi Tu, Chaoling Song, Pengfei Liu
EMNLP 2024, Findings
Summary: We introduce SAFETY-J, a bilingual generative safety evaluator for English and Chinese with critique-based judgment.
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
NeurIPS 2024
Summary: We introduce OlympicArena, a benchmark for evaluating cognitive reasoning abilities of LLMs and LMMs.
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu
arXiv preprint, 2024
Summary: We propose ReasonEval, a suite comprising a new evaluation methodology with defined metrics for assessing mathematical reasoning quality and corresponding LLM-based evaluators for automated calculation.
Outstanding Graduates of Fudan University, 2024
Shanghai City Scholarship, 2022
Fudan University Academic Scholarship, 2020-2024
Cognition Engineering: The Inevitable Path to AGI, Nov. 2024, CIPS-LMG 2024 [slides]
Program Committee/Reviewer
AAAI: 2025