LLM Tips and Tools

Large language models (LLMs) use a neural network architecture (the transformer) to learn the relationships between words and to predict the next word in a sentence (multimodal models generalize beyond words, but words and sentences are a fine mental model to have). Below are some resources and tips I am collecting to better use LLMs in my workflow.

Contents:

  • Quick Tips
  • Sequence of Operations of an LLM Interface
  • Sites and Software Tools
  • Using LLMs to help with Programming
  • Ongoing Random LLM Related Links


Quick Tips

Unless I describe a specific model (e.g. Google’s Gemini), you can assume the tips I list in this post apply generally to most LLMs and are gathered from various sources.

  • Prompts matter (see LLM Prompts). There is a very high return on your time spent iterating on a prompt with an LLM to make it better.
  • LLMs are good at role playing — tell them exactly who they are, what their task is, and what form the output should take.
  • Context matters (see Programming with LLMs (not just generating code with a chatbot) and Retrieval Augmented Generation (RAG)). LLMs have a lot of loose knowledge baked into the weights of the model, but you can often get a large increase in performance by giving the model more specific information about what it is working on.

Sequence of Operations of an LLM Interface

The stylized model of an LLM (or most generative models) is this:

  • The model operates in a high dimensional latent topic space, where tokens, phrases, and ideas live and can be related to each other through vector addition.
  • Your prompt is converted into a sequence of token vectors. This sequence is the “context” and is essentially a series of vector additions that charts a path through the model’s latent topic space.
  • The model takes your context vector sequence as a starting pathway through the latent space and predicts where the next vector should point (predicts the next token vector).
  • The predicted token vector is appended to your context sequence, extending the original path and pointing the model in the direction of the new combined meaning of the updated context.
  • The model resets to its initial state and the new appended context is used as fresh input. New token vectors are iteratively predicted and appended to the context path, charting a pathway through the latent topic space.

We can think of the starting prompt as the first, predefined leg of a journey, and the model is predicting each new step of the journey. This mindset is important when designing prompts that involve intermediate steps — we are asking the model to take small pit stops in the latent topic space in order to help future model steps point in a more specific or valid direction.
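
To build intuition for this “vectors in a latent topic space” picture, here is a toy Python illustration. The four 3-dimensional vectors below are hand-made stand-ins for real embedding vectors (which are learned rather than hand-picked and have thousands of dimensions); the point is only that meanings can be related to each other through vector addition:

    import numpy as np

    # Hand-made 3-d stand-ins for embedding vectors, chosen so the
    # classic analogy below works out. Real embeddings are learned.
    tokens = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.9, 0.1, 0.8]),
        "man":   np.array([0.1, 0.9, 0.1]),
        "woman": np.array([0.1, 0.1, 0.9]),
    }

    def closest(vec):
        """Return the token whose vector points most nearly the same way."""
        cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return max(tokens, key=lambda t: cos(tokens[t], vec))

    # Combining meanings through vector addition:
    # "king" - "man" + "woman" lands closest to "queen" in this toy space.
    print(closest(tokens["king"] - tokens["man"] + tokens["woman"]))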

When you interact with an LLM in a typical interface (e.g., the ChatGPT app, the Gemini website), there are many pre- and post-processing steps taking place outside of the main LLM model described above. Here is a general outline of those steps.

  1. Prompt: Your prompt is entered as a string.
  2. Additional Context: If you are using RAG (see below), then the RAG procedure retrieves additional text chunks that are related to your query. For example, many models can search the internet for additional content or you can upload documents. The additional text is appended to your prompt string.
  3. Tokenization: The full prompt string is tokenized — partitioned into smaller sets of characters or words like “izer” or “science” or “?”. Using a token dictionary, these tokens are converted to preset token vectors (the model’s learned embedding vectors). These preset token vectors make up the basis vectors of the token vector space (they do not technically qualify as a basis because the vectors can overlap with each other, but conceptually, that is the role they play). Your full prompt is now a sequence of token vectors.
  4. Next-Token-Vector Prediction: The LLM uses the sequence of input token vectors to predict the next token vector in the sequence. This vector lives somewhere in the full space of token vectors — a linear combination of the basis token vectors.
  5. Token Selection/Sampling: The predicted token vector is projected onto the set of basis token vectors to calculate how “close” the prediction is to different tokens in the vocabulary of the model. This projection is converted to a probability distribution over the set of preset tokens. The probability distribution is sampled once to select one of the preset token vectors. A user can set the “temperature” of this sampling to make it more likely to sample outside of the mode of the distribution.
  6. Sequential Prediction: The selected prediction token vector is appended to the list of input token vectors, and the LLM predicts the next token vector in the sequence. This is repeated until the model predicts a stop token vector or the output approaches the maximum output length (or the context window fills up).
  7. Detokenization: The sequence of output token vectors is converted back to string tokens using the token dictionary. All tokens are concatenated together and given as output. If you are in the browser, you are probably seeing each predicted token appear in sequence (streaming); in the API, you get the entire sequence back at once.
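
To make steps 3 through 7 concrete, here is a toy, self-contained Python sketch of the loop. Everything in it is made up for illustration: the five-token vocabulary, the random embedding vectors, and the fake_model stand-in for the trained network. The structure, though, mirrors the tokenize, predict, sample (with temperature), append, detokenize cycle described above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Step 3: a toy "token dictionary" of preset token vectors (5 tokens, 4 dims).
    vocab = ["science", "izer", "?", "the", "<stop>"]
    token_vectors = rng.normal(size=(len(vocab), 4))

    # A fixed random linear map standing in for the trained LLM.
    W = rng.normal(size=(4, 4))

    def fake_model(context_vectors):
        """Step 4: 'predict' the next token vector from the context sequence."""
        return W @ context_vectors.mean(axis=0)

    def sample_token(predicted_vector, temperature=1.0):
        """Step 5: project onto the preset token vectors, turn the similarities
        into a probability distribution, and sample one token from it."""
        logits = token_vectors @ predicted_vector
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return rng.choice(len(vocab), p=probs)

    # Step 6: append each sampled token vector and repeat until <stop> or a cap.
    context = token_vectors[[0, 3]]  # a pretend two-token prompt: "science the"
    output = []
    for _ in range(10):
        idx = sample_token(fake_model(context), temperature=0.8)
        if vocab[idx] == "<stop>":
            break
        output.append(vocab[idx])
        context = np.vstack([context, token_vectors[idx]])

    # Step 7: "detokenize" by joining the selected string tokens back together.
    print(" ".join(output))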

If you want to learn more about the mechanics and architecture of the LLM, I highly recommend 3Blue1Brown’s YouTube series. The description of LLM “attention” is quite interesting.

Sites and Software Tools

Learning about AI tools
Agents / Agentic Workflows
AI Chat Bot Sites and LLM Tools

I have recently been focusing on Google’s LLM tools. There are other tools made by OpenAI (ChatGPT) and Anthropic (Claude), but I’ll focus here on what I’m most familiar with.

Google

Gemini

  • Gemini 2.5 currently has the largest context window of publicly available LLMs, and the browser-based chat interface allows for “thinking” or reflection. This means you can put a lot of documents/text into the prompt and get a lot of text out, which can be very powerful. However, without careful crafting of the prompt and input sources, this model (and other models) can get “lost in the middle” and not process the input sources as well as you would like. Taking time to iterate on a very clear, precise, and detailed prompt can give you much better results. See the Prompts section for tips on creating prompts.
  • If you have a .edu email account, you can get Gemini Pro and other features for free through June 2026. You can verify your .edu status for a different email address, in case you already have a dedicated Google account.
  • Deep Research Mode: This feature searches the web and creates a detailed report on your query. I find it really helpful for quickly going deep on a topic, and it provides direct links to and quotes from sources. Its power is that it iteratively runs deeper searches based on gaps it identifies in its research. You can have up to three deep research queries running at a time, so I often start reading a report, realize there’s a deeper area to investigate, and start a new deep research query on that while I read the first report. My experience is that it can often uncover details (news articles, government reports, local scandals) that I almost surely would not have found. Prompting is important here to direct its effort.

Google’s AI Studio

This site has a few benefits beyond the plain Gemini website, mostly around testing a model’s abilities before committing to programmatic access.

  • First, you can select from any of the Google models to test them out for free. On the Gemini website, you can only select from a few of the most recent Gemini models, and you cannot test their cheapest ones. There are even some small models (Gemma) that you can test on the studio and use for free on the API. This is helpful if you plan to use them via API but want to test different models’ capabilities/accuracy before you commit or make expensive API calls.
  • Second, you can control aspects of the input/output like “temperature,” the “top-P” setting, and the maximum output token length. These settings control how random or deterministic the output is (see the API sketch after this list).
  • Third, you can use the Stream feature to share your screen with Gemini and discuss a problem without having to spend a lot of time writing up the necessary information. You can even chat over audio.
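
Once you have settled on settings in AI Studio, the same controls carry over to programmatic access. As a minimal sketch using the google-generativeai Python package (the model name and API key below are placeholders; check the current model list before relying on them):

    # pip install google-generativeai
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key

    model = genai.GenerativeModel(
        "gemini-1.5-flash",  # example model name; pick one from AI Studio
        generation_config=genai.GenerationConfig(
            temperature=0.2,        # lower = more deterministic output
            top_p=0.9,              # nucleus (top-P) sampling cutoff
            max_output_tokens=512,  # cap on the length of the response
        ),
    )

    response = model.generate_content("Summarize the attention mechanism in two sentences.")
    print(response.text)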

Notebook LM

NotebookLM allows the user to upload lots of documents and submit queries directly on those documents. Under the hood, this uses a vector database to digest and store all the uploaded documents, and then compares your query against those docs (see RAG for more details on vector databases). If you sign up for Gemini Pro (free for .edu folks, see above), then you get a lot of space in each notebook (300 docs, up to 5000 pages each). As with all queries, you should spend some time on the prompt.
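
For intuition about what that vector database is doing, here is a toy Python sketch of the retrieval step. The bag-of-words embed() function below is a made-up stand-in (real systems use a trained embedding model), but the compare-the-query-to-the-chunks logic is the core idea:

    import numpy as np

    def tokenize(text):
        return [w.strip(".,?!").lower() for w in text.split()]

    # Pretend document chunks pulled from uploaded sources.
    chunks = [
        "Gemini Pro is free for edu accounts through June 2026.",
        "NotebookLM notebooks can hold up to 300 documents.",
        "Temperature controls the randomness of token sampling.",
    ]

    # Toy "embedding": a normalized word-count vector over the chunk vocabulary.
    vocab = sorted({w for c in chunks for w in tokenize(c)})

    def embed(text):
        vec = np.array([tokenize(text).count(w) for w in vocab], dtype=float)
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    chunk_vectors = np.array([embed(c) for c in chunks])

    # Retrieval: embed the query and return the closest chunk (cosine similarity).
    query = "How many documents can a notebook hold?"
    scores = chunk_vectors @ embed(query)
    print(chunks[int(np.argmax(scores))])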

Tip: long prompts won’t fit into the prompt in NotebookLM, but you can save a long prompt as a source (add a note, save that as source) and then submit something like
Please read the source named "Notebook Query" and follow the prompt in there. It will require thinking.

You can generate a “podcast” of some topic that is covered in your sources. Again, the prompt is super important here.

Anthropic

Claude has some great knowledge management features. Project knowledge, the system prompt, and the project prompts are great ways to add context to all your prompts in a given project without having to append all the same information each time. The project knowledge feature uses a vector database similar to Google’s NotebookLM. So a Claude project is similar to a NotebookLM notebook, except you can use better models in a Claude project, whereas in NotebookLM, you are stuck with whatever LLM is currently implemented (generally one generation behind the current public frontier model). I would suggest using the desktop client version of Claude because you can give it access directly to project files on your computer.

Claude Code can be activated directly in the terminal and is great for starting projects from scratch and doing challenging file parsing/moving tasks. Claude Code can “see” your whole project directory and can create and execute code. It can create, edit, move, rename, and delete files a bit more easily than using Claude in Agent mode in the Cursor IDE. For anything but the simplest tasks, I would suggest writing a detailed prompt in a text file and asking Claude Code to “follow the instructions in X.txt”.

OpenAI

ChatGPT. We all know this one. Nothing particularly different from the base Gemini and Claude models, since they are all getting better every few weeks and are all integrating each other’s features. So now they all have internet search and artifacts. I would say that I lean on Gemini more for code now because I have had issues with ChatGPT and Claude trying to edit an existing code artifact and accidentally deleting important parts of it.


Using LLMs to help with Programming

  • Cursor IDE has integrated LLM access, with the ability to edit your code, create new files, run code, see errors, and iterate on a problem. Cursor is built on top of VS Code. There are other AI editors (Windsurf, for example). Cursor has access to all the usual LLMs, so you can choose which LLM you would like to use for a specific task/edit.
  • Codex is OpenAI’s coding partner and a competitor to Cursor. I haven’t used it much because the Cursor IDE had more features when I started (and I have some amount of lock-in in the short term).

Ongoing Random LLM Related Links

Overviews

Substack Series on LLMs: Explore everything about LLMs—from their foundations to their future

Prompts

See the Prompt Links at the end of my LLM Prompts page.

How do Transformers work / Transformer Architecture

3Blue1Brown’s YouTube series

My ongoing page on the Transformer Architecture

The Illustrated Transformer (and their related book and free course)

deeplearning.ai Courses (free while they are beta, might not last forever)

Retrieval Methods (RAG)

My ongoing page on Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (by Prompt Engineering Guide)

Agentic Programming

My ongoing page on Programming with LLMs (not just generating code with a chatbot)

12-Factor Agents – Principles for building reliable LLM applications (agentic coding from someone deep in the agent-building industry)

Pre-trained LLMs for Specific Topics
Miscellaneous

Another random blog post with some good tips for LLMs and AI programming

Top AI Research Tools (reddit)
