Over the weekend, I used Claude Code agent teams to rewrite a popular open-source scientific simulation package in optimized Python. The refactor produced roughly a million-fold reduction in wall-clock time and a 15–30% improvement in worst-case numerical accuracy. Collaborators are scaling to multi-year global simulations that would have taken years on a high-performance computing cluster; those runs now take weeks.
I want to walk through the team structure, the collaboration patterns that made it work, and a few additions to the workflow that I’ve found helpful for long-running or overnight sessions.
1. Why the gains are that large
That magnitude of improvement sounds absurd, but it’s common in scientific codebases. These packages are written for readability and scientific clarity, not computational performance. There’s usually a stack of well-known mathematical and algorithmic improvements that nobody applied, because being an expert in both the science and the computer science was too expensive. That constraint is loosening.
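To make “well-known improvements” concrete, here’s an illustrative sketch (my own example, not code from the package in question): the readable nested-loop style these codebases favor, next to a vectorized equivalent that is often orders of magnitude faster before any GPU offload or algorithmic change.

```python
import numpy as np

def pairwise_naive(points):
    """Readable textbook style: an O(n^2) Python-level loop."""
    n = len(points)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((points[i] - points[j]) ** 2))
    return out

def pairwise_vectorized(points):
    """Same result via broadcasting; the loops run in C, not Python."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Stack a few dozen changes like this, then add better algorithms and hardware-aware execution on top, and the multiplicative effect is where the absurd-sounding totals come from.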
Many of us have glimpsed this when pasting responses back and forth between different Claude or Gemini chat windows. Maybe you’ve discussed several niche topics with different agents, pasting between them to iterate on a unified solution. Keeping each chat’s context focused, rather than overwhelming it with distracting content, is powerful. It’s the same principle as a team of experts, each focused on their own domain, collaborating and sharing insights to find the best solution. Claude Code agent teams formalize that pattern.
2. The team structure
Open Claude Code with teammates enabled and set the team lead to Opus 4.6. Have the lead create persistent teammates dedicated to specific jobs:
- Code implementation — the only agent editing the main codebase. This separation matters. When multiple agents edit the same files, you get merge conflicts and regressions. One implementer, clear ownership.
- Testing and validation — writes and runs tests, never touches main code. This agent’s incentives are pure: its only job is to find problems. It doesn’t have the implementer’s attachment to the current approach.
- Documentation and history — writes and updates docs and GitHub issues. This is the team’s historian. Without it, agents lose track of what has been tried before and start repeating failed approaches.
- Hardware expert — optimization and scaling. Knows about memory layout, cache behavior, parallelization strategies, and what’s worth optimizing vs. what’s already fast enough.
- Mathematical expert — algorithmic improvements and numerical methods. Knows the difference between a textbook algorithm and one that’s numerically stable in practice.
- Package expert — lives inside a particular complex dependency (e.g., Python GPU packages like CuPy or PyAMG). Helps other teammates know exactly what’s implemented and troubleshoots when the package throws an unexpected error. This saves enormous time compared to every agent independently reading the same library docs. (A sketch of the kind of pitfall this expert catches follows this list.)
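To illustrate that last role with a contrived sketch (not taken from my actual sessions): a classic issue a CuPy expert resolves on sight is accidental host-to-device traffic, where data is copied to the GPU inside a hot loop instead of once up front.

```python
import numpy as np
import cupy as cp  # the CuPy calls here are standard; the scenario is contrived

x_host = np.random.rand(10_000_000)

# Pitfall: calling cp.asarray() inside the loop pays a host-to-device copy
# on every iteration. Convert once, keep the work on the GPU, copy back once.
x_dev = cp.asarray(x_host)                # one host-to-device transfer
for _ in range(100):
    x_dev = cp.sqrt(x_dev * x_dev + 1.0)  # stays on the GPU
result = cp.asnumpy(x_dev)                # one device-to-host transfer
```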
Not all of these roles apply in every session. What I’ve found valuable is giving the team lead a menu of teammate personas it can activate as needed. Whenever I hit a pain point, I make an expert for it.
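In practice the menu is nothing fancier than a file of role prompts the lead can instantiate. A trimmed, hypothetical version (the wording and structure are mine, not a Claude Code schema):

```python
# Hypothetical persona menu: plain role prompts the team lead can activate.
# None of this is a formal Claude Code API; it's just structured text.
PERSONAS = {
    "implementer": "You are the only agent who edits the main codebase. "
                   "Teammates propose changes; you implement them.",
    "tester": "You write and run tests and never touch the main code. "
              "Your only job is to find problems.",
    "cupy_expert": "You are the CuPy specialist. Know exactly what the "
                   "library implements and troubleshoot its errors.",
}

def activation_prompt(role: str) -> str:
    """The text I (or the team lead) use to spin up a teammate."""
    return f"Create a persistent teammate. Role: {PERSONAS[role]}"
```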
3. The collaboration instruction
The key instruction: tell the team lead to have teammates collaborate. They should stay in their own lanes and respect their duties, but be proactive about asking other teammates for help or for their opinions.
I’ve sat back and watched two expert agents work with the implementer and documentation manager to iterate on a complex problem, test many potential solutions, and arrive at a consensus implementation that was lean and well-documented. The math expert proposes an algorithmic change, the hardware expert evaluates whether it’s worth the complexity on the target architecture, the implementer writes it, the tester validates it, and the documentation agent records what was tried and why.
Pair this with a server or cluster where you can send many parallel test jobs, and the team can be incredibly productive and self-sufficient.
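A minimal sketch of what that fan-out can look like (run_case.py and its flags are placeholders for your own test entry point, and the parameter sweep is invented for illustration):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholder parameter sweep over solver and grid resolution.
CASES = [{"solver": s, "grid": g} for s in ("cg", "gmres") for g in (64, 128, 256)]

def run_case(case):
    cmd = ["python", "run_case.py",
           "--solver", case["solver"], "--grid", str(case["grid"])]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return case, proc.returncode

# Each thread just blocks on a child process, so a thread pool is all we need.
with ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(run_case, c) for c in CASES]
    for fut in as_completed(futures):
        case, rc = fut.result()
        print(case, "OK" if rc == 0 else f"FAILED ({rc})")
```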
4. Powerboost: the /loop command
For long-running or overnight sessions, the /loop command keeps the team lead checking in on a schedule. I use 30 minutes. Here’s the kind of instruction I give:
```
/loop 30min You are the team lead. Keep your context lean by aggressively
delegating to existing or new teammates. Remind teammates of each other's
roles and encourage them to communicate to find the best solution. Teammates
should send you updates when they change approach and ask permission before
significantly changing direction. You are their supervisor, but give them
appropriate autonomy. They should be able to explain and justify any changes
to you. You are responsible for keeping them headed toward the session goal.
```
This turns a collection of agents into something closer to an actual team. The lead isn’t doing the work — it’s steering, delegating, and making sure nobody drifts.
5. Context management through teammates
When important details are clearly being forgotten in a long-running session, creating a teammate to manage that specific domain keeps the information accessible to the team without forcing everyone to re-read docs over and over (which pushes other important details out of context). For example, you can put a teammate in charge of the pipeline’s design while keeping it out of the code, the error-chasing, and the documentation; freed of those duties, that teammate becomes an effective strategist who keeps the team on track. This export-context pattern addresses many kinds of context issues.
By being deliberate about what tasks in a session need highly persistent memory and creating a dedicated teammate for that, you free the other agents to focus on what they’re good at. The implementer doesn’t need to remember the three approaches that failed last hour. The documenter does that.
6. Supercharging experts with pre-extracted resources
I like to give expert agents pre-extracted information from my resource library. I have an agentic system that takes textbooks and research papers and distills them into markdown files with pointers back to the raw source. For example, I’ve extracted a GPU programming textbook and several matrix-algorithms textbooks this way.
I’ve built an extraction guide for the agent system so it can set up parallel subagents and teammates to do in-depth extraction. The resulting markdown files go to my planning agents and local expert teammates to make them more effective. A math expert with the relevant chapters of a numerical methods textbook in its context is dramatically more useful than one working from training data alone.
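For a sense of how the extracted material gets consumed, here’s a hypothetical loader; the directory layout and file naming are my illustration, not a fixed format of the extraction system:

```python
from pathlib import Path

# Hypothetical layout: one markdown file per extracted chapter, with the
# chapter title on the first line.
LIBRARY = Path("extracted/numerical_methods_textbook")

def load_chapters(keywords):
    """Collect extracted chapters whose titles match any keyword."""
    matches = []
    for md in sorted(LIBRARY.glob("*.md")):
        text = md.read_text()
        title = text.splitlines()[0].lower() if text else ""
        if any(k.lower() in title for k in keywords):
            matches.append(text)
    return "\n\n---\n\n".join(matches)

# Paste the result into the math expert's context at session start.
context_blob = load_chapters(["stability", "preconditioner", "sparse"])
```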
7. Planning with browser agents
I often start in Claude or Gemini in the browser when I’m in the design phase of a project or major feature. The browser chat agents are better at searching for resources than Claude Code, since they have bulk-search and website-extraction tools that Claude Code’s research agent lacks.
Throughout the planning process, I ask: “Are there textbooks or research papers you don’t have access to that I can extract content from for you? In addition to general information extraction, do you have any focus areas you want the extraction agent to keep in mind?”
I get those resources, use my extraction team to generate the markdown files, and have the planning agent generate a query for the extraction agent — keeping the context of my planning agent lean while giving it access to exactly the information it needs.
8. Productivity gains
A big productivity gain came from reducing my own context-switching between implementation details. I spend most of my time on questions like “what is the highest-priority task right now” and “what can we complete today that has the largest marginal impact for collaborators.”
The team handles execution and troubleshooting. I handle direction.
9. What I’m still figuring out
The team structure is something I keep iterating on. Some sessions need a hardware expert and no math expert. Some need two package experts for different libraries. The right configuration depends on where the hard problems are, and I don’t always know that upfront.
If you’re experimenting with multi-agent setups for technical work, I’d be interested to hear what team structures you’ve landed on. What roles ended up essential, and which ones turned out to be noise?