What If LLMs Really Could Learn from Just One Example?

Fast.ai recently uncovered a mind-bending pattern in fine-tuning LLMs: training loss curves showed sudden drops after just one pass through the training set — strongly suggesting that the model could memorize inputs almost immediately. This finding challenges long-held assumptions about how much data neural networks truly need.

How Neural Networks Typically Learn

Traditionally, neural networks require large amounts of data and many epochs to begin learning. A typical training loss curve shows slow, steady progress as the model gradually gets better at recognising input-output pairs.

A Very Odd Loss Curve

In this Fast.ai experiment, they fine-tuned an LLM on multiple-choice science questions. The result? Predictable “epoch-end” drops in loss — so consistent they were initially blamed on a bug. But the pattern persisted across different implementations, including custom training loops.

These curves looked nothing like the slow-learning curves typically expected.

Digging Deeper

Fast.ai’s working theory: the model isn’t learning gradually — it’s memorizing after just a couple of examples. The training curves showed large loss drops toward the end of each epoch, tracking almost exactly to where specific examples appeared in the batches.

These patterns make sense if the model memorizes infrequent examples when the learning rate is high, then overly confident predictions skew the validation loss.

How Could Memorization from One Example Be Real?

It might sound wild, but it’s plausible. Pretrained LLMs are likely sitting in parts of the weight space where optimization is extremely smooth, enabling massive parameter steps (aided by the Adam optimizer) from just a single example.

Because LLMs already encode complex abstraction hierarchies, adapting to a new piece of data might just be a tiny tweak — one that yields outsized changes in behavior.

What Comes Next?

If LLMs can memorize so quickly, many core training routines may need revisiting:

Catastrophic Forgetting becomes a bigger threat — seeing one rare example could overwrite more common patterns
Data Augmentation might no longer be effective at preventing overfitting if models memorize regardless of input variety
Strategies like dropout, stochastic depth, or mixing diverse datasets may become essential

This is one of those findings that seems like a curiosity until you realize how much of standard ML intuition it undermines. Worth watching.

References

Originally published on LinkedIn.