Mid-scale science

Progress in science depends on new techniques, new discoveries, and new ideas, probably in that order.
— Sydney Brenner

The Challenge

Traditional research models are not designed for the mid-scale investments needed to accelerate the pace of science and innovation.

The development of new techniques, tools, and datasets sometimes requires mid-scale investment: focused support of tens of millions of dollars that falls between grants to individual principal investigators (PIs) and “big science,” like the Large Hadron Collider or James Webb Space Telescope. These investments have the potential to move an entire field forward, solve a range of important scientific and societal challenges, and reduce the time and cost associated with an end-to-end innovation process, such as developing new materials or engineering microorganisms to create a circular economy.

Currently, government-funded mid-scale grants often support “centers” or “hubs” with 20 or more PIs from multiple universities. Every PI receives funding for roughly one postdoctoral researcher or graduate student. These centers enable loose collaboration, which can be powerful for open-ended discovery in an area but is poorly suited to projects that require unity of effort and tightly organized collaboration, such as the development of a new scientific instrument or a large, high-quality dataset. University pay scales and structures also make it difficult for such centers to recruit and retain professional engineers, such as chip designers or machine learning engineers.

Mid-scale projects may be growing in importance because of the potential of AI to accelerate the pace of scientific discovery. Unlocking the potential of AI for science will require investments such as foundation models for science; large, diverse, high-quality datasets used to train AI models; platform technologies that lower the cost of generating the data; self-driving labs that allow for rapid iteration between computational and experimental approaches; and modern, production-quality software that integrates AI, simulation, and design.

The Play

Given the potential flexibility of their funding, philanthropists and foundations can work with the research community not only to identify high-impact mid-scale projects, but also to co-design the right mechanisms to fund, organize, and incentivize the R&D.

There are many types of mid-scale science projects. In some cases, it might make sense to create and sustain them as “public goods,” such as open datasets or broadly available user facilities. In other instances, the research may lead to a startup or a new commercial product or service. Although these are described below as distinct types of projects, many initiatives may involve combinations of these approaches.

1. Datasets and benchmarks.

Open-access datasets and benchmarks are allowing researchers to train powerful AI models. For example, the Protein Data Bank has enabled Nobel Prize-winning advances in both protein structure prediction (AlphaFold) and protein design (RFdiffusion). EvE Bio’s “pharmome” is mapping the unintended targets of small-molecule pharmaceutical drugs. This open dataset could help researchers predict the negative side effects of drugs before they reach patients, and identify opportunities to repurpose existing drugs to treat a broader range of diseases. Align to Innovate is working with the research community to define and create datasets that improve “structure to function” prediction using AI. You can read more about the importance of datasets and benchmarks in our playbook on the Common Task Method.

2. New platform technologies for both imaging/characterization/measurement and synthesis/fabrication/perturbation.

In many instances, our ability to understand a complex phenomenon (e.g., the architecture of the human brain with respect to memory, perception, and problem-solving) is limited by existing research tools. Investing in the development of new and improved tools, or lowering the cost of existing tools, can have a transformational impact on a field. Examples include the reduction in the cost of sequencing the human genome from $100 million to $100, and the impact that electron microscopes and the ability to see individual atoms have had on materials science.

3. New foundation models for science.

Although large language models have been trained on text, researchers are beginning to train foundation models on scientific data. For example, Tatta Bio is developing genomic language models that are trained on trillions of base pairs from metagenomic datasets and that can predict novel types of protein-protein interaction. It may be difficult for academic researchers to train these foundation models in the absence of philanthropic support if they require large datasets, expensive GPU clusters, and professional ML and software engineers.

4. Automation for science.

Although not all experiments can be automated, scientific automation has a number of potential benefits. It can allow researchers (particularly graduate students and postdocs) to spend more time on the design of experiments and the interpretation of results, as opposed to repetitive manual tasks. Automation can also increase the reproducibility of research results, and make it easier for researchers to incrementally increase the size of a dataset if it is particularly valuable for training an AI model. “Cloud labs” (remote access to automated scientific equipment and reagents) can expand the number of experiments that any single researcher can run. “Self-driving labs” (integrated combinations of AI, automated equipment, and software for orchestrating scientific workflows) can accelerate the pace of scientific discovery. AI can create a system of closed-loop experimentation by identifying the most valuable experiment to do next. In many fields, human judgment and intuition still play an important role, so developers of self-driving labs are working on designing the right forms of human-machine interaction.

5. Modern scientific software.

Although university researchers develop academic prototypes of scientific software, they often lack the incentive, talent, and funding to develop and maintain “production-quality” code. In many instances, researchers also lack the resources to rewrite legacy code so that it is written in a modern language, optimized for modern computer architectures such as GPUs, and designed to be integrated with AI. For example, some researchers are producing scientific software so that key parameters in a model can be “learned” and therefore improved over time.

6. Foundries or shared facilities.

Some resources needed for scientific research are expensive or require access to specialized expertise. For example, in the 1980s, DARPA accelerated progress in the field of microelectronics with a program called MOSIS, which gave academics and small businesses the ability to prototype new chip designs.

7. Sector-specific public goods.

One example is the Fusion Prototypic Neutron Source (FPNS), which would allow researchers to discover radiation-tolerant materials for fusion reactors. Investment in an FPNS would benefit many fusion companies, in the same way that federal investment in wind tunnels in the 1930s strengthened U.S. leadership in aeronautics.

FROs
An example of mid-scale science in practice is the Focused Research Organization (FRO) model. FROs build a startup-inspired, non-academic team around a transformative goal with defined deliverables, rather than merely a theme or set of questions. Convergent Research is a non-profit studio and parent organization for FROs and defines them as follows:

Focused
FROs pursue pre-specified, quantifiable technical milestones rather than open-ended, basic research. Teams are funded to aggressively pursue these milestones within a finite time – usually 5 to 7 years – to avoid mission creep and preserve focus.

Research
FROs undertake research with technical risk to produce high-impact public goods in key areas of science and technology. They can result in products like massive datasets, next-generation analytical devices, and open-source experimental protocols. FRO projects are technically ambitious and often engineering-heavy.

Organizations
FROs are led by a full-time founding team, and typically consist of 10 to 30 scientists, engineers, and operational staff. FROs execute like the best deep-tech startups: tight-knit, fast-moving, and mission-driven.

FROs focus on challenges that academia, government, and industry are ill-equipped to solve, pursuing transformational capability development that can unlock and un-bottleneck follow-on investigation. FROs often attempt to unlock fundamental capabilities for a field, leading to wide-ranging downstream effects and applications in both for- and non-profit projects.

Because FROs are time-bound and focused, teams must execute at pace. As they near completion, FROs actively disseminate and deploy the public goods they create into the real world, whether by open-sourcing data, spinning out or transitioning into one or more follow-on nonprofits or startups, or partnering with larger institutions or projects.

Case Studies

By supporting mid-scale science, philanthropists are making scientific discovery more open, accessible, and efficient.

E11 Bio

E11 Bio is an FRO developing a scalable technology platform to lower the cost of mapping the structural and molecular architecture of the brain, to unlock insights into how the brain functions and what causes brain diseases.

In its pilot study, E11 Bio is mapping part of the hippocampus region of a mouse brain. The ability to map out the brain of a mouse, which has a human-like brain structure, could unlock greater understanding of the human brain and result in the development of next-generation therapeutics for neurodegenerative disorders as well as new approaches in AI.

After mapping out the mouse brain, researchers will need to chart out millions of connections between individual cells in the brain. Previous research did so with the brains of flies, but manually scaling past findings would take a significant amount of time, money, and other resources. E11 Bio’s solution to this problem is PRISM, a technology that can map out brain cell wiring and connections, whose beta version is anticipated to be available early next year.

E11 Bio’s methods for highly multiplexed imaging in expansion microscopy found success through the FRO model | E11 Bio

Cultivarium

Only a small number of microbes, called model organisms, are currently available for use in synthetic biology, biomanufacturing, and scientific study. Cultivarium is an FRO building open-source tools to make it easier to access novel microorganisms. This project has the potential to broaden the frontiers of biotechnology with downstream applications in climate, food, sustainable chemical production, disease research, and other areas.

“We are a new type of nonprofit startup for science research and development: a Focused Research Organization (FRO),” writes Nili Ostrov. “That means we are a non-profit organization that thinks and works like a startup. This unique structure helps us effectively invest time and resources into solving the foundational problem in the field of non-model organisms and to release open-source data, protocols, and tools on our portal for the benefit of the scientific community.”

The tools Cultivarium creates support researchers with recipes to culture cells, parameters to penetrate cell walls, methods for genome engineering, and more for 170,000+ species of organisms. Traditionally, selecting organisms for research is a trial-and-error journey that often lacks funding and can span multiple years. A researcher who wishes to start studying a new organism must first tackle challenges like establishing a culture protocol and understanding the organism’s genetics.

Cultivarium gives researchers access to the data and information they need to minimize these pain points, and provides a platform for scientists to offer feedback and insights on each other's work.

Cultivarium’s portal is meant to address a foundational problem in the non-model organism field, making scientific discovery in this space more accessible for researchers | Cultivarium

Align to Innovate

Align to Innovate is a research non-profit that is defining and creating “massive, open, and living” datasets. Their initial focus is on datasets and tournaments that can result in predictive models for biology – e.g., given the sequence of a protein, what function will it have?

The Open Datasets Initiative is accelerating the use of automated labs to curate high-fidelity, AI-ready biological datasets. It does so by:

Establishing collaborative teams of scientists, machine learning specialists, and automation experts to pinpoint critical datasets that should be collected in life science.
Developing peer-reviewed proposals for high-throughput measurement techniques to best collect data.
Funding the collection of open datasets and providing project management.

The Protein Sequence to Function dataset aligns dataset creation so that this data can be used to advance the life sciences field | Align to Innovate

The Tournament is a protein engineering competition providing an open platform to foster collaboration, evaluate progress, and establish benchmarks for computational scientists. The tournament results in open-source, dynamic datasets and standardized, automated protocols that the community can continue to benchmark in the future.

How we can help

Renaissance Philanthropy is interested in working with philanthropists and scientists to identify transformative opportunities for mid-scale science. This could involve:

Supporting researchers who have an idea for a mid-scale project, giving them the time they need to develop a technical roadmap and seed funding to develop early prototypes;
Sponsoring a white paper competition to surface ideas in a given field;
Orchestrating multi-donor funding for projects that can move an entire field forward, help solve important problems, and reduce the time and cost of some critical end-to-end innovation process such as drug discovery or materials discovery; and
Making the case for impact investing in startups developing platform technologies (traditionally, most VCs are more interested in supporting drug companies that use new platforms, as opposed to companies developing new platform technologies and making them broadly available).

Resources

About Focused Research Organizations (Convergent Research)
Architecting Discovery (Ed Boyden and Adam Marblestone)
Focused Research Organizations: A New Model for Scientific Research (Federation of American Scientists)
The Automated Lab of Tomorrow (Proceedings of the National Academy of Sciences)
What Can Biology Learn from Physics (Erika Alden DeBenedictis)