Not Quite There: Exploring the Flaws of Using AI in Writing
Updated February 2026 by Dr. Katharina Grimm
Dr. Katharina Grimm is a UX Writer, educator, and founder of The UX Writing School, with 8+ years of industry experience and a PhD in Technology Management and Communications.
While working on my Voice & Tone Masterclass, I came across various bold claims about what AI, and especially LLMs like ChatGPT, can do for you regarding voice and tone.
Upload a couple of sample texts (newsletters, social media captions, press releases) and let it derive voice & tone guidelines from them. Teach it your voice and tone and let it create style-consistent copy for you. Brief the tool so that non-writers can handle writing tasks and writers don't become their teams' bottleneck.
All those things should be possible with tools like ChatGPT. You just need to brief and train it correctly, right? Actually, that's right. Broken down, it really is that simple. But it's not easy.
The Root of AI Scepticism
The reason why so many professional writers will tell you that it still takes a trained expert to write high-quality copy with AI-based tools like ChatGPT is simple: these tools are flawed in ways non-writers often don't even recognize.
For many users, the process looks like entering a well-thought-out prompt, maybe uploading a voice and tone style guide, and watching ChatGPT produce a text that sounds pretty good. As a non-writer, it might not be obvious that the text doesn't actually align with the overall brand voice, sounds AI-ish, or simply feels… off.
To raise awareness, this blog post summarizes the most significant flaws of LLMs that naturally compromise the quality of your copy. We will look at the specific case of ChatGPT, as it is currently still the most widespread and popular model.
The Flaws of LLMs in 2026
Many of the inherent flaws of LLMs like ChatGPT stem from the fact that these tools do not operate based on critical thinking, but rather on training with language data. Therefore, the quality of ChatGPT's output is not only determined by the prompt you give it, but also by the training data and the way it is processed — both of which are complete black boxes. Let's look at the most consequential flaws that follow from this reality.
1. Adding a WEIRD Bias
Psychology research has long criticized "WEIRD" sampling studies that overrepresent participants from Western, Educated, Industrialized, Rich, Democratic countries. The issue is clear: if you only study a small, biased slice of humanity, your results can't be generalized to everyone.
The same problem applies to LLMs. Research shows that models like ChatGPT are trained disproportionately on internet text — written mostly in English and heavily shaped by WEIRD cultural norms. A 2024 study by Cornell University researchers found that responses from five GPT models consistently aligned with the values of English-speaking and Protestant European countries, even when prompted to represent other cultures. A separate Nature analysis confirmed that AI models continue to be geared toward the needs of English-speaking people in high-income countries.
For voice and tone in UX Writing, this bias poses a real risk: it pulls text output toward a "default" communication style that reflects the statistical center of the training data, not a brand's unique identity or the preferences of your specific target audience, even if that audience sits in a particular global region.
2. Bringing in Its Own Tone
Even if you upload a carefully designed voice and tone guide, ChatGPT often brings in its own tone. You can feel the output being pulled toward the central, default style its training produced. This happens partly because many voice and tone guides are vague or incomplete, and ChatGPT fills the gaps with its own defaults. These defaults include characteristic syntax patterns like the typical binary constructs ("This is not a. This is b." or "This is not only a. This is b.") that feel formulaic precisely because they are.
And there's a compounding effect: the more AI-generated content people publish without editing, the more of that tone ends up in future training data, which means future outputs sound even more AI-ish. The result is a slow drift toward a monoculture of sameness in content.
“The more AI-generated content gets published without expert editing, the more of that AI-ish tone feeds back into future training data. We’re looking at a compounding monoculture of sameness — and voice and tone are the first casualties.”
3. Being Unreliable
You uploaded your voice and tone guide and ChatGPT produced some decent copy last Tuesday? Sometimes it works remarkably well. Other times, it's completely off — and there's no reliable way to predict which one you'll get.
For non-writers, this means potentially relying on something fundamentally inconsistent without realizing it, because the nuances that make copy sound "off" are often very subtle. Every single output therefore needs expert review. Otherwise, you risk publishing text that undermines your brand voice rather than reinforcing it.
For companies hoping that "anyone can create professional copy with AI," this is probably the hardest truth to accept: without someone trained specifically in voice and tone reviewing and adjusting the output, you're not working with a reliable system. You're buying lottery tickets.
4. Having Weak Memory
Closely related to unreliability is another issue: ChatGPT doesn't reliably retain training or instructions across sessions. Even in Projects mode, corrections and feedback are frequently forgotten after a few hours, and rules get applied inconsistently. Custom GPTs and standard chat sessions don't maintain any memory across sessions by default, so you have to manually reinsert all instructions every time you want consistent results.
Training ChatGPT is therefore nothing like training a team member. It's more like writing reminders on sticky notes and hoping the right one catches your eye at the right moment. Consistent quality is always at risk.
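In practice, teams that automate copy generation work around this weak memory by re-sending the full voice and tone briefing with every single request. Here is a minimal sketch of that pattern; the helper name, the guide text, and the wording of the system message are illustrative assumptions, not part of any official API.

```python
# Because the model keeps no memory between sessions, every request has
# to carry the full voice-and-tone briefing again. This helper simply
# prepends the guide as a system message so nothing depends on the model
# "remembering" earlier instructions.
def build_briefed_messages(style_guide: str, user_prompt: str) -> list[dict]:
    return [
        {"role": "system",
         "content": f"Follow this voice and tone guide:\n{style_guide}"},
        {"role": "user", "content": user_prompt},
    ]

# Usage with the OpenAI Python SDK (sketch; requires an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_briefed_messages(GUIDE, "Write a welcome email subject line."),
# )
```

The point of the sketch is the workflow, not the code: the briefing lives in your version-controlled guide, and the tool receives it fresh every time, because relying on session memory is exactly what makes output drift.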
5. Knowing Little About Accessibility and Inclusive Language
There are also more structurally serious flaws to consider. AI is not reliably trained on accessibility standards or inclusive language guidelines. Those topics may appear somewhere in the training data, but the majority of what ChatGPT learned from is simply what exists on the internet — and a significant portion of that content is flawed when it comes to accessibility and inclusivity.
This can result in copy that doesn't work for screen readers, language that unintentionally excludes or alienates part of the audience, a tone that fails to meet trauma-informed or culturally sensitive standards, or humor that's inappropriate for the context.
The risk here is not just editorial. It is also legal, ethical, and reputational.
6. The Risk of Superficial Expertise
When you combine all of the above, you arrive at a deeper danger: the illusion of competence. Because ChatGPT produces text, many people assume the task is complete. The gap between "some text" and "good text", however, is significant.
Every day, LinkedIn fills with posts that are clearly generated by an LLM. They feel copy-pasted and generic because they share the same voice and tone patterns, including:
Anaphora: "It's edgy. It's groundbreaking. It's the new standard."
Binary constructs: "It's not only edgy. It's groundbreaking."
Strongly shortened syntax: "The idea? Groundbreaking."
Lack of punctuation diversity: Heavy reliance on full stops and em-dashes, minimal variation.
Overuse of gerunds: "Groundbreaking ideas have real impact on your business, increasing your revenue one step at a time."
Overly figurative language: "That's why mediocre ideas won't cut it."
Unnatural idioms: "But here's the kicker:"
The default tone ChatGPT uses for marketing copy is designed to feel bold, confident, and attention-grabbing. But how attention-grabbing can a text actually be when it reads like 60% of everything else out there? An author who isn't familiar with these patterns might not even notice. And while the same is true for some readers, awareness is growing — particularly in marketing and communications contexts where people read a lot.
“LLMs like ChatGPT produce copy with recognizable stylistic fingerprints: anaphora, binary constructs, gerund-heavy sentences, and a default boldness that reduces brand distinctiveness at scale.”
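Some of the patterns listed above are mechanical enough to flag automatically. The toy heuristic below sketches that idea; the regexes and the pattern names are illustrative assumptions, not a validated detector, and real AI-text detection is far harder than simple pattern matching.

```python
import re

# Toy regexes for a few of the stylistic fingerprints listed above.
# These are illustrative heuristics only, with plenty of false
# positives and negatives; they are not a real AI-text detector.
AI_ISH_PATTERNS = {
    # Binary constructs: "It's not (only) a. It's b."
    "binary_construct": re.compile(
        r"\b(?:It'?s|This is) not(?: only| just)?\b[^.!?]*[.!?]\s*(?:It'?s|This is)\b",
        re.IGNORECASE,
    ),
    # Strongly shortened syntax: "The idea? Groundbreaking."
    "fragment_question": re.compile(r"\b\w+\?\s+[A-Z]\w+\."),
    # Stock idioms that LLM marketing copy overuses.
    "stock_idiom": re.compile(r"here'?s the kicker", re.IGNORECASE),
}

def flag_ai_ish(text: str) -> list[str]:
    """Return the names of the patterns that match the given text."""
    return [name for name, pat in AI_ISH_PATTERNS.items() if pat.search(text)]
```

A hit from a heuristic like this doesn't prove a text is AI-generated, and a clean result doesn't prove it's human. It simply shows how recognizable and machine-matchable these default patterns have become, which is exactly the distinctiveness problem.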
Why This Still Matters in 2026
The research landscape has continued to develop. A 2025 PNAS study found that value-aligned LLMs still exhibit widespread stereotype biases across race, gender, religion, and health categories — even when those models pass explicit bias benchmarks. A separate study published in PNAS in 2025 found that LLMs actively favor content written by other LLMs over content written by humans, raising new concerns about what this means for human writers competing in AI-influenced environments.
These findings matter for brand voice and tone. If an LLM subtly favors certain cultural registers, certain sentence structures, and certain types of confidence — and if human audiences are increasingly primed by AI-generated content to expect those patterns — the case for expert human oversight in writing becomes stronger, not weaker.
What AI Can and Cannot Do for Writing
Does all of this mean AI has no place in writing work? No. It can support expert writers, enrich their workflows, and open up new ways of working with language. What it doesn't mean is that the process automatically becomes faster — because effective use of AI tools for writing requires intense briefing, training, prompting, and reviewing. But used well, it can genuinely improve both the writing experience and the final output.
Having a professional writer in the loop, however, remains a non-negotiable. Even after reading this post, a non-writer will not be able to identify all the flaws an AI-generated text carries, nor correct them — especially for more complex aspects like the right type and amount of humor, cultural sensitivity, inclusivity, or subtle brand alignment.
The calculation is simple: having non-writers produce copy with AI tools might save some time and budget in the short term. But the risk to brand voice, audience trust, and long-term reputation is real.
Key Takeaways
LLMs like ChatGPT are trained on biased, WEIRD-skewed data that shapes their default voice and tone in ways most users don't recognize.
AI tools are inconsistent: a prompt that works today may produce off-brand results tomorrow, with no warning.
ChatGPT doesn't retain training across sessions, making reliable brand voice consistency structurally difficult.
AI is not reliably trained on accessibility or inclusive language standards.
AI-generated content carries recognizable stylistic patterns that, at scale, reduce brand distinctiveness.
Research from 2025 confirms that bias in LLMs is structural, not incidental — and is unlikely to disappear through prompt engineering alone.
Frequently Asked Questions
Can AI tools like ChatGPT accurately replicate a brand's voice and tone?
Not reliably. While ChatGPT can produce stylistically consistent copy in isolated cases, it lacks the memory and contextual judgment needed to maintain a brand's voice and tone consistently across time, channels, and audiences. Expert human review is always necessary.
Why do AI-generated texts often sound similar to each other?
LLMs are trained on large datasets that center certain communication patterns — particularly in marketing copy. These patterns (anaphora, binary constructs, heavy em-dash usage) get amplified because they appear frequently in training data. The result is a recognizable default tone that reduces brand distinctiveness.
Is AI bias in writing actually a problem for UX Writing?
Yes. Studies show that LLMs are trained on WEIRD-biased data, meaning their default outputs favor communication styles rooted in Western, English-speaking cultural norms. For UX Writers working with global or diverse audiences, this creates a real risk of misalignment between the AI's output and the target audience's expectations.
Can you "train away" the AI's default tone with a good voice and tone guide?
Partially. A well-structured voice and tone guide can improve output quality, but it cannot override the model's training-level tendencies. Gaps in the guide are filled by the model's defaults, and those defaults tend to resurface — especially in longer or more complex writing tasks.
Do professional writers still need to be involved when using AI for copy?
Yes. The subtleties of brand voice, cultural sensitivity, accessibility, and inclusive language require trained human judgment. AI can support the writing process, but it cannot replace the expertise needed to evaluate and refine copy at a professional level.
Are the flaws of LLMs in writing getting better over time?
Some aspects are improving, but structural issues like training bias and inconsistent memory are not fully resolved. Research from 2025 confirms that bias in LLMs persists even in value-aligned models. Users should approach AI writing tools with realistic expectations and maintain expert oversight.
Learn more at writewithdrkat.com | The UX Writing School | YouTube