Crowdsourcing and Reinventing the Next Generation of Dynamic and Scalable Math Benchmarks

The Project

The project seeks to create a community-driven platform that can produce, validate, and iterate on benchmark mathematical problems at scale, with a view to streamlining model evaluation and data procurement in AI for Math. By leveraging semi-automated workflows, the platform will crowdsource and verify complex math problems, ensuring a regularly updated and contamination-free benchmark. Key features of this project include streamlined problem submission, automated verification, expert review, and LLM evaluation, all contributing to a dynamic and collaborative research and development environment.

The Team

James Zou is an associate professor of Biomedical Data Science, CS and EE at Stanford University. He works on advancing the foundations of AI and cutting-edge scientific and medical applications. He has received a Sloan Fellowship, the Overton Prize, an NSF CAREER Award, two Chan-Zuckerberg Investigator Awards, a Top Ten Clinical Achievement Award, several best paper awards, and faculty awards from Google, Amazon, Adobe and Apple.

Huaxiu Yao is an Assistant Professor in the Department of Computer Science, with a joint appointment in the School of Data Science and Society, at the University of North Carolina at Chapel Hill. He was previously a Postdoctoral Scholar in Computer Science at Stanford University. His current research focuses on developing agentic, multimodal foundation models that are widely generalizable and well-aligned with human preferences. His research has been recognized with several honors, including a TMLR Outstanding Paper Award, the KDD 2024 Best Paper Award, an Amazon Research Award, a Cisco Faculty Award, AAAI 2024 New Faculty Highlights, the PharmAlliance Early Career Researcher Award, and the UNC Junior Faculty Development Award.

Linjun Zhang is an Associate Professor in the Department of Statistics at Rutgers University. He obtained his Ph.D. in Statistics from the Wharton School of the University of Pennsylvania in 2019 and, upon graduation, received the J. Parker Bursk Memorial Prize and the Donald S. Murray Prize for excellence in research and teaching, respectively. He also received the NSF CAREER Award and the Rutgers Presidential Teaching Award in 2024. His current research interests include algorithmic fairness, privacy-preserving data analysis, deep learning theory, and high-dimensional statistics.

David Pennock is Director of DIMACS and Professor of Computer Science at Rutgers University, where he designs human-AI market mechanisms. He pioneered both combinatorial prediction markets and truthful wagering, and his early work in recommender systems, web analysis, and sponsored search earned three Test-of-Time distinctions. Holding a Ph.D. in AI, he has spent two decades shaping the economics-and-computation community (co-founding two research areas, three workshops, an ACM journal, and three corporate research labs) while also leading Microsoft Research NYC as Assistant Managing Director. He has published 100+ papers, filed 20+ patents, been featured in major media, and delivered 50+ invited talks.

Pan Lu is a postdoctoral researcher at Stanford University. He received his Ph.D. in Computer Science from UCLA in 2024. His research focuses on developing AI methods and systems to advance complex reasoning, mathematical intelligence, and scientific discovery. He has served as Senior Program Chair for NENLP 2025, Program Chair for SoCal NLP 2023, and Co-Chair of the MATHAI workshops at NeurIPS (2021–2024). He is a recipient of several awards, including two Most Influential Paper Awards (NeurIPS 2022, ICLR 2024), a Best Paper Honorable Mention at ACL 2023, the Best Paper Award at the KnowledgeNLP Workshop 2025, and Ph.D. Fellowships supported by Amazon, Bloomberg, and Qualcomm.