Kevin Buzzard and Alex Kontorovich on the Future of Formal Mathematics: A Mathlib Initiative Interview

Renaissance Philanthropy is launching the Mathlib Initiative to professionalize and scale the world's largest library of computer-verified mathematics.

The Mathlib community has created an open-source library containing nearly two million lines of computer-verified mathematics, and it is beginning to reach the limits of what is possible under a purely volunteer model. This initiative provides the professional infrastructure needed to scale Mathlib's impact while preserving its open, collaborative spirit.

With a donation from Alex Gerko (Founder and CEO of XTX Markets) and leadership from Johan Commelin, Oliver Nash, and Adam Topaz, the Mathlib Initiative will help mathematics rise to meet the demands of a world where AI accelerates discovery, ensuring that our most fundamental knowledge remains rigorous, reliable, and accessible as code.

In September 2025, Oliver Nash separately interviewed Prof. Kevin Buzzard (Imperial College London) and Prof. Alex Kontorovich (Rutgers University) about Mathlib and the Mathlib Initiative, with both interviews loosely framed around the same questions; a lively discussion ensued! We share some highlights below.


Nash: Lean's Mathlib library had its busiest ever month in August 2025: over 1,100 community contributions were reviewed and added to the library. Nevertheless, about 2,000 community contributions still await review. A major aim of the Mathlib Initiative is to increase review bandwidth. How do you think this would impact your work, in particular the FLT (Fermat's Last Theorem) or PNT+ (Prime Number Theorem) project?

Buzzard: One thing I've noticed is that we have a backlog of definitions. Let me step back a bit. I see contributors who are reluctant to PR to Mathlib because the process is slow, an explicit example being some attempt by one of my PhD students to define quadratic algebras. So if things speed up that would be nice. I myself have noticed that it's just easier not to have to deal with the hassle of going through the review process. But there's a gigantic Mathlib directory which myself and others wrote in the FLT project; there's no reason to not start PRing that. I regard 2,000 open PRs as a problem. You could regard it as a gigantic success story or also say we're a victim of our own success. Overall I think it's really good that the review process is so thorough, but it does make it slow.


Nash: Large projects like FLT / PNT+ have a symbiotic relationship with Mathlib: they depend on it, but also produce a wealth of mathematics that could enrich it. The Initiative aims to strengthen this two-way connection, making it smoother for projects to stay current with Mathlib and for their results to be integrated back. From your perspective, what are the biggest friction points in this process, and what improvements would be most transformative for your project's workflow?

Kontorovich: For me, it was, and still is, really fun to work on the prime number theorem project because I was free of "is this the perfect generality?" and "is this the right way to do something?". All I have to do is write the math. I would very much like all of that PNT+ work to make its way eventually into Mathlib. There's almost three skills. There's the skill of knowing the mathematics in the first place to write out a blueprint. There's a skill of coordinating people or doing it yourself or using LLMs or whatever to actually get it done. And then there's a really separate skill which is "okay, now it's done, look back over this large 30,000-line project and figure out how to take little bits and pieces of it and get them into Mathlib". How any good thing happens is people voluntarily take on the responsibility that they're good at and specialise and get really good at it. Maybe the Initiative can say "hey, we're actually going to pay somebody" and "you're really good at this, how about we pay you for it?". There are people who are really good at refactors [they] go through a project with 30,000 lines and start pulling things out into Mathlib because they've already been proven. One of the big questions "does this definition actually work in practice?" [can be answered with a] link to the project showing this has worked in that project. The dream is whatever we've done up to this point with the medium prime number theorem, that should just not even exist anymore [because it has moved to Mathlib]. All that should be left at the end is the blueprint as an educational document.

Nash: Beyond research, Mathlib is increasingly used in education. The Initiative plans to invest in better documentation and learning resources. Drawing from your experiences teaching with Lean, what have been the most surprising successes or persistent challenges in bringing students into the world of formal mathematics? Are there aspects of creating these educational materials that seem particularly difficult to sustain with a purely volunteer-driven model?

Buzzard: At the beginning I had an unnaturally large success because the library was so small that first-year undergraduates would come along and say "I need a project" and I'd say "great, do matrices, we don't have them". I mean this actually happened! So it was kind of easy at the beginning to bring students in because the material was sort of on par with their undergraduate lecture notes but now it's a very different situation. It's still the case that some people can see the software and fall in love with it but for first year undergraduates, it's difficult to find things for them to do. What to do with the fact that all over the world there are undergraduates who could potentially turn into useful contributors? We have to find the route in, and of course one issue is that the learning curve is ridiculous. We [at Imperial College London] have a third-year undergraduate course which really guides people through the library [answering questions like] "Here's a random statement from third-year algebra about principal ideal domains or unique factorisation domains: how do you even say that in Lean?". My course is a very useful resource: it's currently being translated into Chinese by one of my PhD students because the Academy of Mathematics and Systems Science in China want something. Mathematics In Lean [the canonical textbook] is great but I think there's a lot more room in that space. I think ways of just explaining to people "here are standard undergraduate-level statements, and here's how to say them in Lean" [would be very useful]. It should cover lots of pure maths and ideally a bunch of probability, statistics, the beginnings of applied maths as well. Just explaining "here's 50 statements, choose one that you understand", and now "here's how you actually say it in Lean because this is not obvious".
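To illustrate the kind of "here's a standard statement, here's how to say it in Lean" resource Buzzard describes, here is a sketch of one of the third-year algebra statements he mentions, expressed in (what we believe is) current Mathlib vocabulary; exact class names can shift between Mathlib versions:

```lean
import Mathlib

-- A standard third-year algebra statement: every principal ideal
-- domain is a unique factorization domain. Saying it in Lean means
-- knowing Mathlib's names for these structures.
example (R : Type*) [CommRing R] [IsDomain R] [IsPrincipalIdealRing R] :
    UniqueFactorizationMonoid R :=
  inferInstance  -- Mathlib already knows this implication
```

The non-obvious part, as Buzzard notes, is not the proof (Mathlib supplies it) but learning that "PID" is spelled as the conjunction of `CommRing`, `IsDomain`, and `IsPrincipalIdealRing`, and "UFD" as `UniqueFactorizationMonoid`.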

Kontorovich: For the Big Proof conference, I really tried to think about the four distinct things in this ecosystem. One is Lean itself, just the core software which of course the FRO is developing. Then there's Mathlib, which is this amazing engineering project as far as I see it. And then there's just doing research on top of Mathlib and then there's teaching. Teaching sits somewhere between Lean and Mathlib and also off to the side because you don't jump straight to the Lebesgue integral. You do Riemann integrals [just as] you do epsilon-delta before you do filters. So there's a space for using some bits of Mathlib but also going straight from core and making your own definitions for pedagogical purposes. Right now I'm working on this real analysis game, because I'm teaching real analysis. My dream is that eventually every undergraduate subject is just a game and a 12-year-old can learn the entire undergraduate curriculum formally by playing the natural number game, and then the real number game, and then the real analysis game, and then the topology game ...

Nash: There's growing interest from the AI community in formal mathematics, partly because Lean's perfect verification is an antidote to the "hallucination" problem in large language models. Corporate AI labs such as Google DeepMind, Harmonic, and ByteDance have demonstrated that a true synergy exists. What is your perspective on this intersection of AI and formal proof? Do you see it as a promising new frontier, a potential distraction, or something else entirely?

Buzzard: AI is doing the IMO and I imagine it's not going to stop there. I imagine that in a few years' time AI is going to be attacking the Putnam and claiming to have an undergraduate degree. This is something that I'm surprised hasn't happened already. Maybe it will be Lean-related, maybe it won't be Lean-related. In my opinion it should be Lean-related precisely for the reason in the question: my understanding is that solving hallucinations is not something that's going to happen anytime soon. I'm still very much open-minded about whether these things are actually going to start proving theorems that we didn't know. You look at all these fabulous things that happened in the last five years and then you can ask one question: did an AI ever prove a theorem that humans didn't know how to prove? An interesting theorem that humans wanted to prove but couldn't prove. And the answer is no. But I'm just coming to the conclusion now, that autoformalisation actually might be a much more viable idea. "Here's a basic paper or here's a graduate text in mathematics. Translate all this into Lean." It sounds kind of easy: we can translate from English into Python or English into Chinese so surely we should be able to translate from LaTeX into Lean but I think it's much harder than that, especially because of definitions. But if people manage to get autoformalisation then suddenly it's opening the door to tools which will referee papers which I think would be a great breakthrough. You give it the system and it reads the paper and it says "I can put 90% of this together, but here are the 20 places where I couldn't follow, where you should read more carefully". So that might be actually a viable place where AI could help mathematicians. But when it comes to proving theorems by themselves, I think the jury is still out really as far as I'm concerned.

Nash: Various people, including Terry Tao, have spoken of a "formalisation coefficient": the ratio of effort between writing a proof in Lean versus LaTeX. Tao observes that if this coefficient drops below one, we should see a phase transition where formalisation becomes the default. Can you foresee a future in which this occurs? What obstacles to this do you see today?

Kontorovich: The second that goes below one, everyone switches automatically. I go straight to the TeX analogy. In '79 or '80 or whenever it is, Knuth comes out with TeX and there's two people on planet Earth that use it. And then throughout the 80s, Spivak is really pushing AMS-TeX, and still nobody; there's like a dozen people using it. But by the early 90s everybody except my three main collaborators know how to use TeX. What do I have to do to teach a freshman how to TeX? You make a free Overleaf account and you open it up. You don't download anything. You don't need a tutorial, anything. You just open Overleaf and start saying stuff. I mean, these days, you handwrite something, you take a picture on your phone, the LLM TeXs it for you and puts it in Overleaf. You can absolutely handwrite something: just give it to Claude and say "Give me the LaTeX version of this". It writes out the TeX, you copy-paste into Overleaf and it works. So my collaborators who never learned how to TeX can now skip that part. They don't have to know what the symbols mean. They just compile it, they see on Overleaf the PDF, and the PDF they can read and say, "yes, that's what I meant to say".

Buzzard: I can't see the future. I can imagine that that will happen one day. I'm not going to dare putting a ... it's not going to be in the next couple of years as far as I can see. I think a lot will depend on what AI can do. For me that's the big unknown. There's something that's clearly incredibly powerful. Right now it's not solving this question at all, but it's moving very fast and will it move very fast in this direction or will it realise that it actually can't get to that direction because some of the problems are too [hard]. It's not inconceivable to see a future where it does actually work and mathematicians just pipe in their LaTeX and their [computer] fans run super hot and every now and then you get something underlined in red, and the computer saying "I don't follow why, can you just write another sentence with a clue for how this works?". And at the end of the day you get to the end of your LaTeX paper and there's no red underlining and there's a Lean formalisation, written for you by the machine. There's the fantasy and that's not completely out of the question, but it's certainly not happening now and it might never happen.

Nash: One of the quirks of Mathlib is that it exhibits varying levels of coverage by mathematical area, partly driven by the tastes of early adopters. Things have levelled off as the community has grown but there might still be a case for a targeted development to kick-start certain areas. Do you see a case for a more strategic, top-down approach to seed development in specific, under-resourced fields? What are the potential benefits and drawbacks of such targeted initiatives?

Kontorovich: The only way I can see top-down seeding working is people have to want to do the thing. It's pretty hard to say, "hey, what do I have to pay you to get you to write a massive blueprint on covering spaces or whatever?". No one has ever come to me and said "I would really love a theorem about this. Could you write a paper on that so that I could use it?". We are all just completely driven by what we want to do. I think whatever we do, we should be very cognizant of how much we're asking: we want to get those 85% research mathematicians on board, and the less technological savvy we can require for using Lean in general, the better.

Buzzard: Yes absolutely. For example, I think it's disgraceful that we don't have anything about PDEs [in Mathlib]. I don't see any drawbacks to targeted initiatives which attempt to develop fields as long as the right people are involved. I don't see it as being any different to the way there have been targeted initiatives before. It's just they weren't sort of branded like that. There were just a few people, reviewing each other's PRs, and who knew the right way to do something.

Nash: Computational mathematics has been essential for certain landmark results, like Helfgott's proof of the Weak Goldbach Conjecture, which included significant, carefully-checked computations. The Initiative could support a platform for _verified_ computation, where the code that performs the calculation is itself formally proven correct. How do you think you or your colleagues or your students might use such a platform?

Kontorovich: Yeah, if there's a theorem that requires, say, interval arithmetic. My kind of knee-jerk reaction is that I have all kinds of things that I want to happen and if I see someone who actually wants to do it and just needs a little bit of encouragement, then they go and do it and that's great. I'm wondering what support means, maybe put out a call. Hey, does someone want to spearhead this kind of a thing? And if someone picks it up, fantastic.

Buzzard: I've heard these horror stories, about people attempting to compute integrals by just piping them into a machine and it confidently prints out an answer to 10 decimal places. You kind of think, "great, the manual says it can do it and it's given me an answer, so, this must be right". And then it turns out the answer is quite inaccurate; it should never have been reported to so many decimal places, but it was anyway. I think the Rocq community is doing very well at turning their system into something that can be used as a computational tool. I do wonder whether verified computation is going to be something which opens the door to far more people. A big question there is can we actually get it to compete against a non-verified system which is multiplying floats together using functions on chipsets.
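The interval arithmetic Kontorovich mentions is the standard remedy for exactly the failure mode Buzzard describes: instead of a single float of unknown accuracy, you compute a pair of bounds guaranteed to contain the true value. A minimal illustrative sketch in Python (not taken from any verified library; a real verified platform would prove these bound properties formally):

```python
from fractions import Fraction

class Interval:
    """A closed interval [lo, hi] guaranteed to contain the true value.

    Exact rationals are used so the bounds themselves incur no
    rounding error; real implementations use outward-rounded floats.
    """
    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)

    def __add__(self, other):
        # Sum of anything in [a,b] and anything in [c,d] lies in [a+c, b+d].
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product's extremes occur at endpoint combinations.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Enclose pi loosely in [3.14, 3.15], then bound pi^2 + pi.
pi = Interval(Fraction(314, 100), Fraction(315, 100))
result = pi * pi + pi
# The true value of pi^2 + pi (about 13.0112) provably lies in `result`,
# however loose the bounds; a bare float answer carries no such guarantee.
```

The point is the contrast with the integral horror story: an interval answer is allowed to be wide, but it is never silently wrong about how many digits it knows.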

Nash: Defining success for a project like the Mathlib Initiative is challenging. One could track internal metrics like the growth of the library, or aim for external impact, like the percentage of new mathematics that is within reach of formalisation. If you had to choose a single North-Star metric to gauge the health and ultimate success of the formal mathematics enterprise, what would it be and why?

Buzzard: I think defining success is really important. You might say it's challenging but I say it's important. Growth of the library is not really something you can sell. Tackling modern research is the single most important thing. At the beginning that was a fantasy. At the beginning, the metric was how much of an undergraduate degree we can do. And I still think we should occasionally mention that as a metric, but for me "how much research mathematics can the system understand?" [is a key metric]. So I think Mathlib needs to have all standard modern definitions. I normally have a few missing definitions in mind at any given time. The list is updated continuously: Riemannian manifolds were recently added to Mathlib and so were dropped from the list. The examples I currently have are "de Rham cohomology, symplectic manifolds, and Heegaard Floer homology". So there's nought out of three on that metric.

Kontorovich: I think a really good one is what fraction of papers posted to the arXiv in a given year in pure math come with an appendix that's a link to the formalisation. So it's 0.0001 and then it goes to 0.001 and that's a big improvement.

Nash: What do you think Leibniz or Gauss or Hilbert would make of Mathlib and the Mathlib Initiative?

Buzzard: I get the impression that Gauss might be surprised that some mathematical proofs are so long! But Hilbert would have seen class field theory being developed, and that was a very long proof. I would imagine that Hilbert would just take one look at it and say, "if you've got machines that can do this now, then obviously this is the right thing to do". That's my instinct as to what Hilbert would think.

Kontorovich: Isn't Mathlib Hilbert's literal dream? He was just 125 years too early! I don't know that Euler or Leibniz would appreciate it. Gauss was on the cusp. The whole point is there were all these crises in the 19th century that forced us to get more serious about rigor and even if you go back and you read Newton, Newton knows that he's supposed to be doing things rigorously. So he's concerned about this, Leibniz apparently is not. So Hilbert would give it 10/10, Leibniz would give it 1/10 and Gauss somewhere in between.

Find out more about the Mathlib Initiative here.
