When your AI turns against you

A guide to prompt injection attacks

Oct 29, 2025

The biggest risk in using AI isn’t bias or bad data.

It’s unknowingly giving a hacker control of your AI - and your information.

Do you copy other people’s prompts and paste them into your AI tool? Do you upload files you haven’t created? Have you ever used one of the new AI browsers? Or are you using a Custom GPT that has been shared with you? If you answered ‘yes’ to any of these, you could be exposing private data to attackers who know exactly how to exploit them.

This post explains how it happens, and what you can do about it.

If you're feeling uncomfortable, shocked or a little concerned, please share this post with someone who needs to see it. It's very easy to fall victim to a prompt injection attack if you don't know about this.

Prompt injection attacks explained

A prompt injection attack is when someone secretly adds hidden instructions to the content your AI reads. The model can’t tell which instructions came from you and which came from them. So it follows both.

These hidden commands can tell the AI to reveal data, invent false information, or even take actions you never approved.

This isn’t technical wizardry. It’s social engineering for machines. Machines follow orders literally and these attacks exploit that.

Why this matters (if it’s not already obvious)

AI systems don’t judge intent. They execute everything that looks like an instruction.

If an attacker hides malicious text inside a “useful” prompt, a shared document, or a web page, your AI can start serving their goals instead of yours (and you wouldn’t even know it).

That’s how private data leaks happen without anyone noticing.

If you've learned something here, please give this post a ❤️

Watch: beginner’s guide to prompt injection attacks

…a more detailed explanation of how these attacks happen so you can be prepared.

TL;DR: Nine quick ways to protect yourself from a prompt injection attack.

Keep AI disconnected. Don’t link it to your email, calendar, or web browser.
Never share sensitive data. Remove client names, figures, or personal details before uploading anything into your AI model.
Don’t copy prompts from strangers. If you must, paste any shared prompt into a plain-text editor first to check for hidden text.
Use a sandbox account. Keep one AI account for experiments and another for real work.
Check your AI’s responses. Look for odd phrases, inserted names, or unfamiliar links before reusing content.
Be deeply suspicious. You already know that AI hallucinates. So extend your default skepticism.
Clear chat history. Turn off memory or delete past chats regularly.
Update your tools. Most AI security patches arrive through silent updates.
Stay informed. New attack methods appear weekly. Subscribing here keeps you ahead.

Prompt injection isn’t about hacking computers. It’s about hacking trust and persuading your AI to believe someone else over you.

Once you understand that, you can stop it.

If you’re concerned about AI safety, sign up for my session: AI Safety: Protect Yourself in a World of Invisible Risk on 2 December:

The Humans in the Loop

Discussion about this post