WTF is an LLM Anyway?
It matters that you know this
By popular demand, the live session I’ve delivered six times - WTF is an LLM anyway - is now available to watch on YouTube. This post is the written version for those who prefer to read.
Most people are using AI daily with no real understanding of what’s happening under the hood. If you don’t understand how large language models work, you can’t judge when to use them, when to question them, and when to keep them away from your data.
Let’s fix that.
LLMs are different from the software we are used to
Traditional software is hard-coded to follow rigid rules. Think IKEA furniture: follow the instruction manual and you get the same bookcase every time.
LLMs are nothing like that. They’re trained on data and respond to that training in the moment - more like a person who has studied millions of bookcases, been handed a giant box of Lego, and been told: “build one.” Most of the time, you’ll get a bookcase, but every bookcase will be different. Occasionally, you’ll get something completely random - like a duck.
LLMs are not programmed. There are no fixed rules, no predetermined steps, no single correct outcome.
LLMs are trained
In pre-training, a large language model is fed vast amounts of text: books, articles, forums, code, Wikipedia - essentially the entire public internet.
It then plays a game with itself, billions of times: show the model a sentence. Blank out one word (technically a token, which is a word or part of a word). Predict the missing word. If the prediction is wrong, the model adjusts microscopic internal weights and tries again. This is a hugely resource-intensive process, which is why training an LLM costs many millions.
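If you’re curious what that guessing game looks like in practice, here’s a drastically simplified sketch in Python. A real LLM learns billions of weights through gradient descent; this toy just counts which word tends to follow which. The tiny corpus and the function name are made up purely for illustration.

```python
# A drastically simplified toy of the "predict the next word" game.
# Real LLMs adjust billions of learned weights; this just counts word pairs.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog chased the cat ."
).split()

# "Training": for every word, count which words have followed it.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the training text."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else "?"

print(predict_next("the"))  # -> "cat", the most common follower in this tiny corpus
print(predict_next("sat"))  # -> "on"
```

Nothing in that table “knows” what a cat is. It only knows which words tend to sit next to each other - scale that idea up by billions and you get the statistical feel for language described below.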
The result is not intelligence. It is a statistical feel for human language - and every pattern, bias, stereotype, contradiction, and absurdity contained within the internet.
Why the raw model is useless without humans
After pre-training, the model is powerful but unfiltered. So humans intervene in a process called Reinforcement Learning from Human Feedback (RLHF). Thousands of human reviewers score outputs: helpful, harmful, vague, risky. The model is nudged to imitate the high-scoring patterns and avoid the low-scoring ones.
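As a loose illustration only - the real process trains a separate reward model and uses reinforcement-learning updates, not a one-line reweighting - you can picture RLHF as reviewer scores pulling probability towards the answers humans preferred. The replies and numbers below are invented for the example.

```python
# A toy illustration of the RLHF idea: human scores nudge the model
# towards answers reviewers preferred. (Not the real algorithm.)
candidate_replies = {
    "Here is a careful, sourced answer.":      {"model_prob": 0.30, "human_score": 0.9},
    "Confident-sounding but made-up answer.":  {"model_prob": 0.50, "human_score": 0.1},
    "Refuses unhelpfully.":                    {"model_prob": 0.20, "human_score": 0.3},
}

# "Nudge": multiply each reply's probability by its human score, then renormalise.
nudged = {r: v["model_prob"] * v["human_score"] for r, v in candidate_replies.items()}
total = sum(nudged.values())
nudged = {r: p / total for r, p in nudged.items()}

for reply, p in sorted(nudged.items(), key=lambda kv: -kv[1]):
    print(f"{p:.2f}  {reply}")
# The careful answer now outranks the confident fabrication.
```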
Companies can then fine-tune the model by feeding it their own data: customer emails, product manuals, legal docs. This teaches the model the company’s tone, norms, terminology and typical responses.
If this is helping you understand AI more clearly, then please hit the like button. More likes signal to Substack to push this to more readers.
Once you understand the training process, you see the failure modes
1. Hallucinations
When the model doesn’t know something, it doesn’t say “I don’t know.”
It guesses the most likely next word. That’s why ChatGPT once told me domestic cats live 3–5 years (my cat is 16), and why lawyers have been caught submitting convincing but entirely invented case law in court.
LLMs optimise for fluent sentences, not truth.
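To see why, picture the model’s view of the world as nothing more than a probability table over next words. It always has a “most likely” continuation available, so it emits one - confident phrasing included - even when its best option is a long shot. The numbers below are invented to illustrate the point, not real model output.

```python
# A sketch of why fluent guessing beats silence: the model always has a
# distribution over next words, so it always says *something*, even when
# every option is a long shot. (Illustrative numbers only.)
next_word_probs = {
    "3-5 years": 0.12,     # wrong, but a fluent-sounding continuation
    "12-18 years": 0.10,   # closer to the truth
    "I don't know": 0.02,  # rarely the statistically likely next phrase
}

best_guess = max(next_word_probs, key=next_word_probs.get)
print(f"Model says: 'Domestic cats typically live {best_guess}.'")
# Even at 12% confidence, the guess comes out fully fluent and with no caveats.
```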
2. Bias
The internet contains brilliance and nonsense in equal measure. LLMs absorb both.
Bias isn’t only political. There’s sycophancy bias too. The model tells you what you want to hear - which can be dangerous if you don’t question all its output.
3. Privacy and data extraction
The major AI players have already trained on the public internet. Their competitive edge now lies in private data: Yours.
Unless you’re using a model where you explicitly control training permissions, assume everything you type contributes to the model’s training data. This includes sensitive information. AI note-takers are particularly concerning.
If a tool feels “too cheap”, that’s often the reason - you’re paying with your data.
4. Misuse
Everything that makes you more productive also makes scammers more productive.
Deepfakes, phishing, impersonation, fraud - much easier to do at scale. And then there’s prompt injection - a whole new category of risk.
5. Cost and carbon
Frontier model training consumes enormous amounts of energy, and running LLMs daily isn’t free either. AI has a physical footprint most people don’t realise.
If you take one thing from this
LLMs don’t think. They predict.
They don’t know. They guess.
Understanding that changes how you use them - and how you protect yourself.
If you want practical help becoming a top-1% AI user and thinker, with clear guidance on where to use AI and where to keep it away from your life and business, that’s exactly what The AI Edit exists for.
And if you’d rather watch the full explanation, the six-times sold-out presentation is now on YouTube.


