New AI architectures keep appearing that could become the next generation of foundation models. I’m going to keep a running list of the promising ones here.
The Dragon Hatchling (BDH)
This architecture was published in September 2025 and claims to be a closer representation of how the brain works. One of its main advantages is effectively unlimited context. Current transformers have a fixed context length that caps how long your input (and output) can be. To predict the next token, the model attends over the entire prompt and all previously generated tokens, explicitly storing the full token history for each new prediction. And because transformers learn the relationships between the latest token and every previous one, prediction quality can degrade at long context lengths if the training data doesn’t include enough long-context examples. The model can get caught in circles, repeating the same steps of a process.
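To make that concrete, here is a toy single-head attention step in NumPy. It is only an illustration of the general transformer pattern described above, not code from any particular model: the function names, shapes, and random weights are all made up. The point is that each prediction re-reads the entire stored history, so memory and per-step cost grow with context length.

```python
import numpy as np

def next_token_logits(history_embeddings, W_q, W_k, W_v, W_out):
    """Toy single-head attention step: the model re-reads the entire
    stored history to predict the next token."""
    x = history_embeddings            # shape (t, d): every token seen so far
    q = x[-1] @ W_q                   # query for the latest position
    K = x @ W_k                       # keys for ALL stored positions
    V = x @ W_v                       # values for ALL stored positions
    scores = K @ q / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over the full history
    context = weights @ V             # mix of everything stored so far
    return context @ W_out            # logits for the next token

# Each generation step appends to the history, so the next step is a
# little more expensive than the last one, forever.
d, vocab = 16, 100
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
W_out = rng.normal(size=(d, vocab))
history = rng.normal(size=(1, d))     # one token embedding to start
for _ in range(5):
    logits = next_token_logits(history, W_q, W_k, W_v, W_out)
    new_embedding = rng.normal(size=(1, d))  # stand-in for embedding the sampled token
    history = np.vstack([history, new_embedding])
```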
However, this new architecture stores context in an evolving state rather than an explicit history that grows as the model keeps generating. Practically, this means the model can continue a task indefinitely, keeping track of where it is using that evolving state. In theory it could still get stuck in loops if the state becomes cyclic, but the memory is designed so that old information relevant to the current task retains a strong signal. I’m guessing this helps prevent the model from getting stuck repeating itself (as long as it’s been well trained on a task close enough to the target task).
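For contrast, here is a minimal sketch of the evolving-state idea in the same toy style. This is a generic recurrent-style update, not BDH’s actual rule: the decay term is just my stand-in for the design goal of keeping relevant old information alive, and every name and weight here is invented for illustration.

```python
import numpy as np

def evolve_state(state, token_embedding, W_state, W_in, decay=0.95):
    """Illustrative fixed-size state update (NOT the actual BDH rule):
    old information fades gradually rather than falling off a context
    window, so signals that keep getting reinforced stay strong."""
    return np.tanh(decay * (state @ W_state) + token_embedding @ W_in)

def next_token_logits(state, W_out):
    # Prediction depends only on the current state, not a stored history.
    return state @ W_out

d, vocab = 16, 100
rng = np.random.default_rng(0)
W_state = rng.normal(size=(d, d)) * 0.1
W_in = rng.normal(size=(d, d)) * 0.1
W_out = rng.normal(size=(d, vocab))
state = np.zeros(d)                       # fixed-size memory, no matter how
for _ in range(10_000):                   # long the model keeps running
    token_embedding = rng.normal(size=d)  # stand-in for the next token's embedding
    state = evolve_state(state, token_embedding, W_state, W_in)
    logits = next_token_logits(state, W_out)
# Memory use and per-step cost stay constant however many tokens go by.
```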
The current version isn’t ready for the stage yet, though: so far its quality has only been benchmarked at around GPT-2 level. But if it can be scaled, it might become a standard architecture for long-running tasks. Keep an eye out.