Foundation Models for Neuroscience

Apr 19, 2025

Neural foundation models are coming (some are already here). But are we ready for what they actually require—and what they really deliver?

Foundation what?

Unless you’ve been living under a rock, you’ve probably heard of foundation models. They’re large AI systems trained on massive datasets that generalize to a wide range of tasks. You train them once, and then fine-tune or prompt them to do just about anything. ChatGPT is one of the most familiar examples: a chatbot built on top of a foundation model (GPT-3, GPT-4, etc.) trained to predict the next token in a sequence of text. This approach embodies Rich Sutton’s “Bitter Lesson”: a remarkably simple objective (predict the next token), combined with an enormous amount of data and a really impressive engineering stack to process it, leads to something, well, incredible.
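
To make that concrete, here is a minimal sketch of the next-token objective in PyTorch. The toy “model” (an embedding plus a linear layer) and the random token ids are placeholders I made up for illustration; a real system swaps in a transformer and a web-scale corpus, but the loss is the same:

```python
import torch
import torch.nn.functional as F

# Toy version of the GPT-style objective: score every vocabulary item at each
# position, and penalize with cross-entropy against the token that came next.
vocab_size, seq_len, batch = 1000, 32, 8
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # pretend this is real text

model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)  # stand-in for a transformer

logits = model(tokens[:, :-1])    # one prediction per position (a real model conditions on the whole prefix)
targets = tokens[:, 1:]           # the label is simply the next token
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                   # that gradient is the entire training signal
```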

This approach has delivered some huge wins in AI and science. Half of the 2024 Nobel Prize in Chemistry went to Demis Hassabis and John Jumper for AlphaFold, a foundation model for protein structure that has predicted structures for essentially every known protein (the other half went to David Baker for computational protein design). I’d venture that a huge swath of scientists are spending time figuring out how to apply this approach to their own domains. The key idea is simple: formulate your scientific problem as a loss function you can optimize, throw a ton of data and compute at it, and you’ll get something useful out.

For many neuroscientists, this is already how we think. We’ve been fitting data-driven descriptive models to neurons using numerical optimization for decades. Unsurprisingly, there are broad calls, including a recent article by Dyer and Richards, to follow AlphaFold’s lead and scale this approach up in neuroscience.
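
For anyone who hasn’t fit one of these, here is roughly what that tradition looks like in code: a linear-nonlinear-Poisson (LNP) model fit to a single neuron’s spike counts by maximizing likelihood. Everything below is simulated and the variable names are my own; real pipelines differ mostly in scale and bookkeeping:

```python
import numpy as np
from scipy.optimize import minimize

# Fit a linear-nonlinear-Poisson (LNP) model to simulated spike counts.
rng = np.random.default_rng(0)
n_bins, n_features = 5000, 20
X = rng.normal(size=(n_bins, n_features))        # stimulus features per time bin
true_filter = 0.3 * rng.normal(size=n_features)  # the neuron's "true" receptive field
spikes = rng.poisson(np.exp(X @ true_filter))    # Poisson spike counts

def neg_log_likelihood(w):
    # Poisson negative log-likelihood (up to a constant), exponential nonlinearity
    return np.sum(np.exp(X @ w) - spikes * (X @ w))

fit = minimize(neg_log_likelihood, x0=np.zeros(n_features), method="L-BFGS-B")
print("filter recovery (correlation):", np.corrcoef(fit.x, true_filter)[0, 1])
```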

The responses to the Dyer and Richards article are worth reading, especially the one from David Sussillo. But two questions are worth sitting with: can we build a foundation model for neuroscience? And what would it give us?

The answer to the first question is “probably”, but maybe we should be cautious.

Problem 1: Data

Foundation models are only as good as the data they’re trained on. And neuroscience data is… not great.

Recording from brains is hard. Every tool has major limitations. fMRI? It’s a slow, indirect measure of blood oxygenation, not neural computation. Two-photon imaging? Beautiful spatial detail, but slow and artifact-prone. Electrophysiology? The gold standard, but sample sizes are small, populations are heavily subsampled, and signal quality is often poor.

Even our best open datasets, like those from the International Brain Laboratory or the Allen Institute, are highly specific: one species, one setup, one behavior. There’s no shared protocol, no aligned vocabulary. Foundation models thrive on scale and alignment. Neuroscience has neither.

What foundation models actually give you

Here’s the other thing that often gets missed in the hype: ChatGPT didn’t solve language. It didn’t explain syntax, or how kids learn grammar, or why words mean what they do. But it gave us what solving language would have given us: the ability to generate, translate, summarize, and converse fluently.

It gave us operational understanding—the ability to do useful things without necessarily knowing why they work.

That’s powerful. And if we can pull it off in neuroscience, it would be transformative. Imagine models that can decode intent, predict mental state, guide a neuroprosthetic, detect disorders before symptoms appear. These don’t require scientific understanding—they just require prediction that works.

That’s enough to change lives. And it’s why foundation models are still worth chasing.
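
To put “prediction that works” in concrete terms, here is a minimal sketch of a decoder: ridge regression from binned population spike counts to two-dimensional cursor velocity. The data and numbers are simulated placeholders; the point is that held-out accuracy is the entire objective, with no mechanistic claims attached:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Simulate a population whose spike counts carry a linear velocity signal.
rng = np.random.default_rng(1)
n_bins, n_neurons = 10_000, 120
spike_counts = rng.poisson(2.0, size=(n_bins, n_neurons)).astype(float)
tuning = rng.normal(size=(n_neurons, 2))                   # unknown to the decoder
velocity = 0.1 * spike_counts @ tuning + rng.normal(size=(n_bins, 2))

# Train a linear decoder and report accuracy on held-out time bins.
X_train, X_test, y_train, y_test = train_test_split(spike_counts, velocity,
                                                    random_state=0)
decoder = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", decoder.score(X_test, y_test))      # useful even if opaque
```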

But prediction ≠ understanding

Still, science isn’t just about getting the answer—it’s about understanding the system.

A model that predicts spikes well isn’t necessarily telling you how the brain computes. It might just be picking up on correlations in the data. It could be exploiting quirks of the task, the subject, or the stimulus. And unless we open the box and understand the computation inside, we won’t be any closer to a theory of the brain.

That’s the risk: we end up building black boxes that work, but teach us nothing.

There’s an alternative. Use these large models as experimental systems. Train them to perform the same tasks as animals. Analyze their internal structure. Compare their representations and dynamics to real circuits. Use them to generate hypotheses. In short: reverse-engineer them. That’s where the insight lives.
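
As one small example of that toolkit, here is a sketch of representational similarity analysis (RSA): compare the stimulus-by-stimulus geometry of a model’s hidden activations to recorded neural responses to the same stimuli. The arrays below are random placeholders standing in for real activations and recordings:

```python
import numpy as np

# Two (stimuli x units) matrices: one from a model layer, one from a recording.
rng = np.random.default_rng(2)
n_stimuli = 50
model_acts = rng.normal(size=(n_stimuli, 256))   # e.g., one hidden layer of the model
neural_resp = rng.normal(size=(n_stimuli, 80))   # e.g., trial-averaged firing rates

def rdm(responses):
    # Representational dissimilarity matrix: 1 - correlation between stimulus pairs
    return 1.0 - np.corrcoef(responses)

# Correlate the upper triangles of the two RDMs: a crude score for whether the
# model "sees" the stimuli the way the circuit does (near zero here, by construction).
iu = np.triu_indices(n_stimuli, k=1)
score = np.corrcoef(rdm(model_acts)[iu], rdm(neural_resp)[iu])[0, 1]
print("model-brain RDM correlation:", score)
```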

Learn from the limits of LLMs

If we don’t proceed carefully, we’ll repeat the mistakes of large language models.

LLMs hallucinate. They confidently generate false facts. They overfit quirks in training data. They reflect bias, amplify noise, and often generalize poorly outside their domain.

In neuroscience, the analogs are easy to imagine. A model that predicts spikes accurately, but only because it’s learned the eye movement pattern of a particular animal. A decoder that only works on one task, in one lab. A BCI that breaks down when arousal state changes.

That’s not robustness. That’s fragility at scale.
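
One way to surface that fragility before it matters is to evaluate generalization across held-out sessions or animals, not just held-out time bins. Here is a sketch of the idea on synthetic data where the only “signal” is a session-specific quirk, roughly the spike-predicting-eye-movements scenario above:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

# The behavior and the neural offsets are both tied to session identity, not to
# any relationship that transfers across sessions.
rng = np.random.default_rng(3)
n_sessions, bins_per_session, n_neurons = 8, 400, 60
session = np.repeat(np.arange(n_sessions), bins_per_session)
neural_quirk = 2.0 * rng.normal(size=(n_sessions, n_neurons))   # per-session offsets
behavior_quirk = rng.normal(size=n_sessions)                    # per-session behavior level
spikes = rng.poisson(3.0, size=(session.size, n_neurons)) + neural_quirk[session]
behavior = behavior_quirk[session] + 0.1 * rng.normal(size=session.size)

decoder = Ridge(alpha=1.0)
within = cross_val_score(decoder, spikes, behavior,
                         cv=KFold(5, shuffle=True, random_state=0))
across = cross_val_score(decoder, spikes, behavior,
                         cv=GroupKFold(5), groups=session)
print("within-session R^2: ", within.mean())     # looks strong
print("held-out-session R^2:", across.mean())    # falls apart: the decoder memorized session quirks
```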

Where we go from here

So what’s the path forward?

It’s not to abandon foundation models. It’s to earn them. That means building the conditions that made GPT and AlphaFold possible: big, aligned datasets. Open infrastructure. Shared benchmarks. Teams of engineers and scientists working together.

It also means staying honest about what we’re doing. Operational tools can be transformative. But if we want science—not just applications—we need models we can interrogate, not just deploy.

Foundation models could help us treat disease, decode thought, and restore movement. But if we want to understand the brain—truly understand it—we’ll need more than predictions. We’ll need interpretation, theory, and better data.

It’s not one or the other. It’s both.