
AI FOUNDATIONS

AI & LLM Glossary: Terms Every Developer Should Know

A plain-English field guide to AI and LLM terminology - definitions, analogies, and context for terms like RAG, RLHF, embeddings, and prompt engineering, without needing a CS degree.

Anton K.

AI Foundations · @anton

May 15, 2026

Introduction

If you've been around tech lately, you've seen the alphabet soup: AI, ML, LLM, RAG, RLHF. It sounds like someone hit random keys on a keyboard. But beneath the buzzwords are real concepts - and once you understand them, the whole field clicks into place.

This guide is for anyone who wants to understand AI terminology without a computer science degree. Each term comes with a plain-English definition, a real-world analogy, and enough context to actually use the word in conversation. Think of it as your field guide to the AI forest.

Let's start from the ground up.

Foundations

AI (Artificial Intelligence)

Definition: The broad field of making machines that can perform tasks that normally require human intelligence - things like recognizing images, understanding language, or making decisions.

Analogy: AI is like "transportation." It's the umbrella term that covers everything underneath it. Cars, trains, planes - they're all transportation. Similarly, chatbots, recommendation systems, and self-driving cars are all AI.

Example: Netflix recommending a show is AI. So is a spam filter. So is ChatGPT. They're very different systems, but they all fall under the AI umbrella.

ML (Machine Learning)

Definition: A subset of AI where machines learn patterns from data instead of following explicit, hand-written rules.

Analogy: If traditional programming is giving someone a recipe to follow step by step, machine learning is showing them a hundred dishes and letting them figure out the recipes themselves.

Example: Instead of writing rules like "if an email contains the word 'FREE' and has three exclamation marks, mark it as spam," you feed a model thousands of emails labeled "spam" or "not spam" and let it discover the patterns on its own.
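To make that concrete, here's a minimal sketch of the spam example using scikit-learn (the library choice and the tiny dataset are just for illustration): notice that we never write a "FREE means spam" rule anywhere - the model infers the signal from labeled examples.

```python
# A toy spam classifier: the model learns patterns from labeled examples
# instead of following hand-written rules. (Illustrative only - real spam
# filters use far more data and features.)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "FREE money!!! Click now",
    "Win a FREE cruise today",
    "Lunch at noon tomorrow?",
    "Here are the meeting notes",
]
labels = ["spam", "spam", "not spam", "not spam"]

# Turn raw text into word-count features the model can learn from.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# "Training": the classifier discovers which words correlate with spam.
model = MultinomialNB()
model.fit(X, labels)

# "Inference": classify an email the model has never seen before.
print(model.predict(vectorizer.transform(["FREE prize waiting for you"])))
# -> ['spam']
```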

DL (Deep Learning)

Definition: A subset of machine learning that uses neural networks with many layers - hence "deep." It's particularly good at tasks like image recognition, speech processing, and language understanding.

Analogy: If ML is "learning from examples," DL is learning from examples through a multi-stage filter system. Each layer picks out something more specific: the first layer notices edges in an image, the next notices shapes, the next notices faces, and so on.

Example: When your phone unlocks by recognizing your face, that's deep learning. The model processes your face through dozens of layers, each extracting more detailed features, until it can confidently say "yes, that's you."

Neural Network

Definition: A computing system inspired by biological brains, made of layers of interconnected nodes (neurons). Each connection has a weight, and the network learns by adjusting these weights.

Analogy: Imagine a factory assembly line. Raw materials (input data) enter at one end. At each station, a worker examines the product, makes a small decision, and passes it along. By the time it reaches the end, a final decision has been made - "this is a cat" or "this email is spam." The workers get better over time by learning which signals matter most.

Example: A simple neural network for detecting fraudulent credit card transactions might look at: transaction amount → location → time of day → purchase history. Each layer combines these signals to produce a fraud probability.
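As a rough sketch of what "layers of weighted connections" means in practice, here's a single forward pass through a tiny two-layer network for the fraud example. The weights below are made up for illustration; in a real system they would be learned during training.

```python
# A tiny two-layer neural network, forward pass only.
# Weights are invented for illustration, not trained.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Input features: [transaction amount (normalized), distance from home,
#                  hour of day (normalized), deviation from usual spending]
x = np.array([0.9, 0.7, 0.95, 0.8])

# Layer 1: 4 inputs -> 3 hidden "neurons", each a weighted combination.
W1 = np.array([[0.5, -0.2, 0.8, 0.1],
               [0.3, 0.9, -0.5, 0.4],
               [-0.1, 0.6, 0.7, 0.2]])
b1 = np.array([0.1, -0.2, 0.05])
hidden = sigmoid(W1 @ x + b1)

# Layer 2: 3 hidden values -> 1 output (fraud probability).
W2 = np.array([0.7, 0.5, 0.9])
b2 = -0.3
fraud_probability = sigmoid(W2 @ hidden + b2)

print(f"Fraud probability: {fraud_probability:.2f}")
```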

Model

Definition: A trained neural network - or any ML system - that can make predictions or generate outputs. The model is the "frozen" result of training: the architecture plus all the learned weights.

Analogy: Think of a model as a chef who has finished culinary school. The architecture is the kitchen. The training is years of practice. The model is the chef themselves - ready to cook (make predictions) on their own.

Example: GPT-4, Claude, and Llama are all models. Each started as an architecture, went through training, and emerged as a system that can answer questions, write code, and hold conversations.

Training

Definition: The process of teaching a model by showing it data and adjusting its internal weights to minimize errors. Training is computationally intensive and can take days or weeks on thousands of GPUs.

Analogy: A student solving practice problems. They attempt an answer, check if it's right, and adjust their understanding. After thousands of problems, they've internalized the patterns.

Example: Training a language model involves feeding it trillions of words from the internet, books, and articles. At each step, the model tries to predict the next word, gets feedback on whether it was right, and adjusts. Over billions of iterations, it learns grammar, facts, reasoning, and style.

Inference

Definition: Using a trained model to make predictions or generate outputs on new, unseen data. Inference is what happens when you actually use the model - it's typically much faster and cheaper than training.

Analogy: If training is studying for an exam, inference is taking the exam. The student (model) has already learned the material; now they're applying that knowledge to new questions.

Example: When you type a question into ChatGPT and get an answer, that's inference. The model was already trained - it's now just running its learned patterns on your specific input. Every API call to an AI model is inference.

LLM Core

LLM (Large Language Model)

Definition: A neural network trained on massive amounts of text data to understand and generate human language. "Large" refers to the number of parameters - typically billions or trillions.

Analogy: An LLM is like someone who has read a significant portion of the internet and developed an intuitive sense for how language works. They haven't memorized everything, but they've absorbed patterns of grammar, facts, reasoning styles, and even humor.

Example: GPT-4, Claude, Llama, and Gemini are all LLMs. They can write essays, answer questions, translate languages, write code, and more - all from the same underlying model.

Token

Definition: The basic unit of text that an LLM processes. A token can be a word, part of a word, or even a single character, depending on the language and the tokenizer.

Analogy: Think of tokens like LEGO bricks. Some words are a single brick ("cat"). Others need several bricks ("un-believ-able"). The model doesn't see whole words - it sees the bricks they're built from.

Example: The sentence "I can't believe it!" might be tokenized as: ["I", " can", "'t", " believe", " it", "!"] - 6 tokens. Roughly speaking, 1 token ≈ 4 characters or ¾ of a word in English.

Tokenization

Definition: The process of breaking text into tokens that the model can process. Different models use different tokenization strategies.

Analogy: Tokenization is like a translator converting a sentence into a code the model understands. Instead of "Hello world," the model sees [15496, 2159] - numbers that represent those token fragments.

Example: The word "running" might be split into ["run", "ning"] because the model learned that "-ning" is a common suffix. But "the" stays as one token because it appears so frequently.
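If you want to see this in practice, OpenAI's tiktoken library (one tokenizer among many - other model families use their own) lets you inspect exactly how a piece of text gets split:

```python
# Inspect how a GPT-style tokenizer splits text. Other models
# (Llama, Claude, etc.) use different tokenizers with different splits.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "I can't believe it!"
token_ids = enc.encode(text)

print(token_ids)                              # the integer ids the model actually sees
print([enc.decode([t]) for t in token_ids])   # the text fragment behind each id
print(f"{len(token_ids)} tokens for {len(text)} characters")
```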

Context Window

Definition: The maximum amount of text (measured in tokens) that an LLM can process at once - including both your input and its output. It's the model's working memory.

Analogy: Your context window is like a desk. A small desk (4K tokens) means you can only have a few documents open at once. A huge desk (200K tokens) means you can spread out an entire book, your notes, and reference materials simultaneously. But once something falls off the edge of the desk, the model forgets it.

Example: If a model has a 128K context window, you could paste an entire 300-page book into it and ask questions about any part. A model with only a 4K context window would only handle a few pages at a time.

Parameters

Definition: The internal variables (weights and biases) of a neural network that are adjusted during training. The number of parameters is a rough measure of a model's capacity and complexity.

Analogy: Parameters are like the connections in a brain. More connections mean more capacity to store knowledge and make nuanced decisions. A 7-billion-parameter model is like a brain with 7 billion adjustable knobs, each tuned during training to produce better outputs.

Example: GPT-3 has 175 billion parameters. Llama 3 comes in sizes from 8 billion to 70 billion. Generally, more parameters = more capable, but also more expensive to run. The relationship isn't perfectly linear - architecture and training quality matter enormously.

Transformer

Definition: The neural network architecture that powers modern LLMs. Introduced in the 2017 paper "Attention Is All You Need," transformers use a mechanism called "attention" to weigh the importance of different parts of the input simultaneously.

Analogy: Before transformers, models read text like a person reading a book - one word at a time, left to right, trying to remember everything. Transformers read the entire text at once and use "attention" to figure out which words matter most for each prediction. It's like having a highlighter that automatically marks the most relevant parts of every page.

Example: In the sentence "I deposited the check at the bank, then ate lunch on the river bank," a transformer's attention mechanism helps figure out that the first "bank" is a financial institution and the second is the edge of a river - by looking at all the surrounding words simultaneously, not just the ones that came before.

Prompt Engineering

Prompt

Definition: The input text you give to an LLM to get it to generate a response. A prompt can be a question, an instruction, a partial sentence, or any combination of these.

Analogy: A prompt is like a question you ask a knowledgeable friend. The better and more specific your question, the better their answer. "Tell me about dogs" gets a vague response. "What's the difference between a golden retriever and a labrador for a family with small kids?" gets something much more useful.

Example: "Write a poem" is a prompt. "Write a haiku about debugging production code at 2 AM" is a much better prompt - specific, contextual, and constrained.

System Prompt

Definition: A hidden instruction that sets the model's behavior, personality, or constraints for the entire conversation. It's like giving the model a role to play before the conversation starts.

Analogy: If a regular prompt is a question, the system prompt is the job description. It tells the model: "You are a helpful coding assistant who always explains things with analogies and never writes code without comments." Everything the model says after that is filtered through this lens.

Example: ChatGPT's system prompt includes instructions like "You are a helpful assistant." Developers can customize this: "You are a senior code reviewer. Be direct, point out bugs, and suggest improvements with code examples."
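In most chat APIs, the system prompt is just the first message in the conversation, marked with a special role. Here's a minimal sketch in the OpenAI-style chat format (field names vary slightly between providers):

```python
# The system message sets the model's role before the user ever speaks.
# Field names follow the OpenAI-style chat format; other providers differ slightly.
messages = [
    {
        "role": "system",
        "content": (
            "You are a senior code reviewer. Be direct, point out bugs, "
            "and suggest improvements with code examples."
        ),
    },
    {"role": "user", "content": "Can you review this function for me?"},
]
# `messages` is then sent with the chat completion request; every reply
# the model gives is filtered through that system instruction.
```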

Zero-shot

Definition: Asking the model to do something without providing any examples. You're relying entirely on the model's pre-existing knowledge.

Analogy: It's like telling someone "write a cover letter" without showing them any cover letters. They'll do it based on what they've seen in the world, but the style might not match what you want.

Example: "Translate this to French: Hello, how are you?" - zero-shot translation. The model has never seen your specific style preferences but draws on its training.

One-shot

Definition: Giving the model exactly one example of what you want before asking it to perform the task.

Analogy: Like showing someone one sample cover letter and saying "write another one like this." They pick up on the format, tone, and structure from that single example.

Example: "Convert these to emoji summaries: 'I had a great day' → '😊🌟'. Now convert: 'The meeting was boring' → ?"

Few-shot

Definition: Giving the model several examples (typically 2-10) before asking it to perform the task. This dramatically improves output quality and consistency.

Analogy: Like giving a new employee a style guide with multiple examples before they start writing. After seeing the pattern a few times, they internalize it.

Example: "Classify the sentiment: 'I love this product' → Positive. 'This is terrible' → Negative. 'It works I guess' → Neutral. 'Absolutely amazing experience' → ?"
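In code, few-shot prompting is usually just string assembly: you prepend a handful of labeled examples to the real input. A minimal sketch (the task and wording are illustrative):

```python
# Few-shot prompting: show the model the pattern a few times,
# then ask it to continue that pattern for a new input.
examples = [
    ("I love this product", "Positive"),
    ("This is terrible", "Negative"),
    ("It works I guess", "Neutral"),
]

def build_prompt(new_text: str) -> str:
    lines = ["Classify the sentiment of each review."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {new_text}\nSentiment:")
    return "\n\n".join(lines)

print(build_prompt("Absolutely amazing experience"))
# Send the resulting prompt to the model; it should continue with "Positive".
```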

Temperature

Definition: A setting (usually 0.0 to 2.0) that controls how creative or deterministic the model's output is. Low temperature = predictable and focused. High temperature = varied and creative.

Analogy: Temperature is like the difference between playing classical music from sheet music (temperature 0.1 - precise, repeatable) and playing jazz (temperature 1.5 - improvisational, surprising). Same musician, different mode.

Example: For code generation, you'd use temperature 0.1-0.3 - you want correct, deterministic output. For creative writing or brainstorming, temperature 0.8-1.2 gives more variety and unexpected ideas.

Top-p (Nucleus Sampling)

Definition: An alternative to temperature for controlling output randomness. The model only considers the smallest set of most likely tokens whose combined probability exceeds p. Lower top-p = more focused output.

Analogy: Imagine the model is choosing its next word from a ranked list. Top-p says "only consider options that together make up 90% of the probability." It cuts off the weird, unlikely tail of options while keeping the reasonable choices.

Example: If top-p is 0.9, the model might consider "the," "a," "my" as next words but ignore "pineapple" or "quantum" - even if those have non-zero probability. It's a cleaner way to control randomness than temperature alone.
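In practice, both temperature and top-p are just parameters on the request. A sketch using the OpenAI Python client - the client and model name here are purely examples, and other providers expose nearly identical knobs:

```python
# Sampling parameters are set per request: low temperature for deterministic
# tasks like code generation, higher temperature (or a looser top_p) for
# brainstorming. Client and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Suggest a name for a CLI tool that tidies CSV files."}
    ],
    temperature=0.2,   # focused, repeatable output
    top_p=0.9,         # only sample from the top 90% of probability mass
)
print(response.choices[0].message.content)
```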

Fine-tuning & Training

Pre-training

Definition: The initial, massive training phase where a model learns general language patterns from a huge corpus of text. This is where the model learns grammar, facts, reasoning, and world knowledge.

Analogy: Pre-training is like getting a general education - going through school, reading widely, learning how the world works. You're not specialized yet, but you have a broad foundation.

Example: GPT-4 was pre-trained on trillions of tokens from the internet, books, and other sources. This cost millions of dollars in compute and took months. The result was a general-purpose model that could do many things reasonably well.

Fine-tuning

Definition: Taking a pre-trained model and training it further on a smaller, specialized dataset to improve its performance on specific tasks or domains.

Analogy: If pre-training is general education, fine-tuning is medical school. You already know how to read, write, and think - now you're specializing in one field.

Example: Take a general LLM and fine-tune it on legal documents to create a legal assistant. Or fine-tune it on your company's documentation so it answers questions about your specific products accurately.

RLHF (Reinforcement Learning from Human Feedback)

Definition: A training technique where humans rank different model outputs, and the model learns to prefer the responses that humans rated higher. This is how models learn to be helpful, honest, and harmless.

Analogy: Imagine a student writing essays and a teacher grading them: "This answer is better than that one - here's why." Over time, the student internalizes what makes a good answer. RLHF is that feedback loop, scaled up with thousands of human raters.

Example: Without RLHF, an LLM might answer "How do I make a bomb?" with detailed instructions. With RLHF, human raters have taught the model that helpful answers also need to be safe and ethical, so it declines instead.

LoRA (Low-Rank Adaptation)

Definition: A technique for fine-tuning large models efficiently by training only a small subset of parameters (low-rank matrices) instead of the entire model. It dramatically reduces compute and memory requirements.

Analogy: Instead of rebuilding an entire car engine to improve performance, LoRA is like adding a turbocharger - a small, targeted modification that makes a big difference without touching the rest of the system.

Example: Fine-tuning a 70-billion-parameter model normally requires dozens of GPUs. With LoRA, you can do it on a single GPU by training only 0.1% of the parameters. The model keeps its general knowledge while learning your specific task.
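With Hugging Face's peft library (one common implementation of LoRA - and the model name below is just an example), wrapping a pre-trained model with adapters takes only a few lines:

```python
# Wrap a pre-trained model with small LoRA adapters; only the adapter
# weights are trained, the base model stays frozen. Model name is illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # which layers receive adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Reports a tiny fraction of parameters as trainable - the rest stay frozen.
```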

Embeddings

Definition: A numerical representation of text (or other data) as a vector - a list of numbers - that captures its meaning. Similar concepts have similar vectors, enabling mathematical operations on meaning.

Analogy: Embeddings are like GPS coordinates for concepts. "King" and "queen" are close to each other on the map. "Apple" (fruit) and "orange" are nearby. "Apple" (fruit) and "Apple" (company) are farther apart. You can even do math: king - man + woman ≈ queen.

Example: When you search for "how to fix a broken screen," embeddings help find articles about "repairing cracked displays" even though they don't share the exact same words - because the meanings are close in vector space.
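Here's a quick sketch of comparing meanings numerically, using the sentence-transformers library (the library and model name are one convenient choice, not the only option):

```python
# Turn sentences into vectors and compare their meanings numerically.
# Library and model name are one common choice among many.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "how to fix a broken screen",
    "repairing a cracked display",
    "best pizza toppings",
]
vectors = model.encode(sentences)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors[0], vectors[1]))  # high - similar meaning, different words
print(cosine_similarity(vectors[0], vectors[2]))  # low  - unrelated topics
```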

RAG & Tools

RAG (Retrieval-Augmented Generation)

Definition: A technique where the model first searches an external knowledge base for relevant information, then uses that information to generate its response. It combines the model's language skills with up-to-date or domain-specific data.

Analogy: RAG is the difference between taking an exam closed-book (the model relies only on its training memory) and open-book (the model can look up information first). The model is smarter when it can reference materials.

Example: A customer support chatbot using RAG doesn't rely on its training data (which might be months old). Instead, it searches your current product documentation, finds the relevant section, and generates an answer based on that - ensuring accuracy and freshness.
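Here's a deliberately tiny sketch of the retrieval half of RAG: embed the documents, embed the question, pull the closest match, and stuff it into the prompt. Everything here - the library, the documents, the prompt wording - is illustrative; production systems add chunking, a vector database, and reranking.

```python
# Minimal retrieval-augmented generation: find the most relevant document,
# then hand it to the model alongside the question. Illustrative sketch only.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Premium plans include priority support and a 99.9% uptime SLA.",
]
doc_vectors = embedder.encode(documents)

question = "How long do I have to return a product?"
q_vector = embedder.encode([question])[0]

# Retrieve: pick the document whose embedding is closest to the question's.
scores = doc_vectors @ q_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vector)
)
best_doc = documents[int(np.argmax(scores))]

# Augment: build a prompt that grounds the model in the retrieved text.
prompt = (
    f"Answer the question using only the context below.\n\n"
    f"Context: {best_doc}\n\nQuestion: {question}"
)
print(prompt)  # send this to the LLM of your choice for the "generation" step
```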

Vector Database

Definition: A database optimized for storing and searching embeddings (vectors). It enables fast similarity searches across millions or billions of documents.

Analogy: A regular database is like a filing cabinet organized by name or date. A vector database is like a librarian who understands what each document is about and can instantly find the ones most similar to your query - even if they don't share the same words.

Example: Pinecone, Weaviate, and Milvus are popular vector databases. You'd store embeddings of all your company documents in one, then when a user asks a question, you convert the question to an embedding and find the most similar documents.

Semantic Search

Definition: Search that understands the meaning and intent behind a query, not just keyword matching. Powered by embeddings, it finds results that are conceptually related even without exact word overlap.

Analogy: Keyword search is like looking up a word in a dictionary's index - you only find exact matches. Semantic search is like asking a librarian who understands what you actually need and points you to the right section, even if you described it imperfectly.

Example: Searching for "affordable vacation spots" with keyword search might miss an article titled "Budget-friendly destinations for your next trip." Semantic search catches it because the meanings align, even though the words differ.

Agents

Definition: An AI system that can plan, use tools, and take multi-step actions autonomously. Unlike a chatbot that just responds to prompts, an agent can break down a goal into steps, execute them, and adapt based on results.

Analogy: A regular LLM is like a consultant who gives advice. An agent is like a consultant who also has access to your computer, can send emails, run scripts, check databases, and actually get things done.

Example: An AI agent tasked with "research competitors and write a comparison report" might: (1) search the web for competitor info, (2) scrape their websites, (3) analyze pricing data, (4) draft a report, and (5) save it to your Google Drive - all autonomously.
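Stripped to its core, an agent is a loop: the model proposes an action, your system executes it, and the observation is fed back in until the model decides it's done. Here's a bare-bones sketch of that loop, with a scripted stand-in for the model so the structure is visible without an API key:

```python
# The skeleton of an agent loop: the model decides, the system acts, and the
# observation goes back to the model. The "model" here is a scripted stand-in.
def call_model(history):
    # In a real agent this would be an LLM call that returns the next action.
    scripted = [
        {"action": "search_web", "args": {"query": "competitor pricing"}},
        {"action": "finish", "args": {"report": "Competitor A charges $20/mo..."}},
    ]
    return scripted[len([m for m in history if m["role"] == "tool"])]

def search_web(query):
    return f"(pretend search results for '{query}')"

TOOLS = {"search_web": search_web}

history = [{"role": "user", "content": "Research competitors and summarize pricing."}]
while True:
    step = call_model(history)
    if step["action"] == "finish":
        print("Agent result:", step["args"]["report"])
        break
    result = TOOLS[step["action"]](**step["args"])       # execute the chosen tool
    history.append({"role": "tool", "content": result})  # feed the observation back
```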

Function Calling (Tool Use)

Definition: The ability of an LLM to request that the application call a specific function (API, database query, calculator, etc.) with specific arguments. The model doesn't execute the function itself - it outputs a structured request, and the system runs it and feeds the result back.

Analogy: The model is like a manager who can't use the tools directly but knows exactly which tool is needed and how to specify the job. "Hey, run this database query for me and bring me the results so I can analyze them."

Example: A user asks "What's the weather in Tokyo?" The model outputs a function call: get_weather(city="Tokyo"). Your system calls the weather API, gets the result, and feeds it back to the model, which then generates a natural language response.
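A minimal sketch of that round trip is below. The schema loosely follows the OpenAI-style tools convention, and the weather function is a stand-in for a real API call:

```python
# Function calling round trip: the model picks the tool and the arguments,
# but your code actually runs it. The weather lookup here is a stand-in.
import json

# 1. Describe the tool to the model (sent along with the user's message).
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# 2. The model responds not with prose but with a structured call request.
model_output = {"name": "get_weather", "arguments": json.dumps({"city": "Tokyo"})}

# 3. Your application executes the function the model asked for...
def get_weather(city: str) -> str:
    return f"18°C and cloudy in {city}"   # a real version would call a weather API

args = json.loads(model_output["arguments"])
tool_result = get_weather(**args)

# 4. ...and feeds the result back so the model can phrase the final answer.
print(tool_result)
```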

Evaluation & Safety

Hallucination

Definition: When an LLM generates information that sounds plausible but is factually incorrect or entirely made up. The model is confident in its answer, but the answer is wrong.

Analogy: Hallucination is like a student who didn't study but is really good at sounding confident during an oral exam. They'll give you a detailed, well-structured answer that's completely invented. It sounds right, but it isn't.

Example: Ask an LLM "What year did the first person walk on the surface of Proxima Centauri b?" and it might confidently answer "2041" - because it's generating plausible-sounding text, not because it knows this hasn't happened. Hallucination is the #1 challenge in production AI systems.

Benchmark

Definition: A standardized test used to measure and compare model performance on specific tasks. Benchmarks provide objective metrics for evaluating how good a model is at reasoning, coding, math, or general knowledge.

Analogy: Benchmarks are like standardized tests for models - SAT, GRE, or professional certifications. They don't capture everything about a model's ability, but they give a common yardstick for comparison.

Example: MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects from law to physics. HumanEval measures coding ability. GSM8K tests grade-school math. When a company says "our model scores 86.4% on MMLU," that's a benchmark result.

Guardrails

Definition: Rules, filters, or systems put in place to prevent an AI model from generating harmful, inappropriate, or off-topic content. Guardrails act as a safety layer between the model and the user.

Analogy: Guardrails are like the bumpers in bowling - they keep the ball (model output) from going into the gutter (harmful territory). They don't change how the ball is thrown, but they prevent the worst outcomes.

Example: A customer-facing AI might have guardrails that: block responses containing personal data, redirect political questions, prevent the model from making medical diagnoses, and flag potentially harmful requests for human review.
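Guardrails range from simple pattern checks to dedicated classifier models. As a toy illustration of the simplest kind - a real system would use far more robust detection than a couple of regexes:

```python
# A toy output guardrail: scan the model's response before it reaches the
# user. Real guardrails combine classifiers, PII detectors, and human review.
import re

BLOCKED_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",          # something that looks like a US SSN
    r"(?i)\byour diagnosis is\b",      # the assistant should not diagnose
]

def apply_guardrails(model_output: str) -> str:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, model_output):
            return "I can't share that. Let me connect you with a human agent."
    return model_output

print(apply_guardrails("Your diagnosis is likely the flu."))   # blocked
print(apply_guardrails("Our store opens at 9am on weekdays.")) # passes through
```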

Alignment

Definition: The research area focused on ensuring AI systems behave in ways that match human values, intentions, and expectations. An aligned model does what you want it to do, not just what you literally asked it to do.

Analogy: Alignment is the difference between a genie who grants your exact wish (with disastrous unintended consequences) and a wise advisor who understands what you actually mean and helps you get there safely.

Example: If you ask an unaligned model "How do I make my competitor's website go down?" it might give you instructions for a DDoS attack. An aligned model recognizes the harmful intent behind the question and responds ethically - perhaps by suggesting legitimate competitive strategies instead.

Where to Go From Here

Understanding these terms is the first step. The field moves fast, but the fundamentals stay relatively stable. Here's how to keep building on this foundation:

Try it yourself. Open any LLM and experiment with temperature settings, few-shot prompts, and system prompts. The best way to internalize these concepts is to see them in action.

Follow the research. Papers on arXiv, blog posts from Anthropic and OpenAI, and the Hugging Face community are great places to see how these concepts evolve in practice.

Build something. A simple RAG pipeline with a vector database, a fine-tuned model with LoRA, or an agent with function calling - even small projects cement these ideas faster than any glossary.

The alphabet soup isn't so intimidating once you know what each letter stands for. Welcome to the conversation.

Written By

Anton K.

Engineer at letscodeit.dev. Writes about code, design and shipping software.