Making text easier to read is often praised as an inclusive practice—but when the content involves technical ideas, especially in fields like academic finance, AI-driven simplification can come at a serious cost. With generative AI now widely used to adapt complex language for broader audiences, including people with cognitive processing difficulties, the stakes are higher than ever. But what happens when making content “accessible” ends up compromising what the content actually means?
That’s the question I explored in a recent experiment, where I tested how GPT-4 simplified peer reviewer comments from academic finance papers. These comments—often steeped in econometric jargon—can be difficult for any researcher to interpret, but they pose particular challenges for those with dyslexia, working memory limitations, or other cognitive accessibility needs. I prompted the model twice per comment: once to produce a general plain-language version, and once with specific reference to cognitive accessibility.
What emerged was a pattern of inconsistency—and in some cases, distortion.
In one example, the model described “endogeneity” as “hidden effects,” a surface-level rephrasing that sounds intuitive but loses the statistical meaning of the term. Endogeneity arises when an explanatory variable is correlated with the error term, which biases causal estimates. Mischaracterizing it risks misleading researchers into thinking it’s a minor, invisible nuisance rather than a technical red flag requiring correction.
Sometimes, in trying to be more accessible, the model misrepresented a concept outright. The economic term “bounded rationality,” for instance, was simplified as “limited thinking ability,” when it is defined as “how people make decisions when they are limited by incomplete information, cognitive constraints, and time pressures.”
In another instance, the phrase “causal insight into the timing and magnitude of market responses” was reduced to “how fast and strongly markets respond,” erasing the analytical focus on causality altogether.
One of the most concerning findings was how inconsistently GPT-4 performed. On one run, the model preserved a term like “sensitivity analysis” but didn’t define it. On another, it swapped it out for a fuzzy phrase like “double-checking your methods.” If you ran the same prompt twice on different days, you might walk away with completely different interpretations.
In my paper, I’ve included links to the full GPT-4 conversations to ensure transparency and allow readers to assess the outputs firsthand.
My findings show that accessibility-focused prompts alone won’t guarantee accurate simplification. Domain knowledge still matters, both in writing the prompts and in reviewing the outputs. Human oversight therefore remains essential.
What can editors do?
Whether you’re simplifying content manually or starting with an AI-generated draft, the editor’s responsibility remains the same: to ensure clarity without compromising meaning. If you’re working from scratch, approach accessibility with both empathy and precision—use plain language where possible, but preserve technical accuracy and contextual relevance. If you choose to use AI tools like GPT-4 to support simplification, treat the output as a starting point, not a finished product. Carefully review whether key concepts have been preserved, check for unintended distortions, and bring in domain expertise where needed. Simplification should never come at the cost of scientific integrity. Editors are uniquely positioned to bridge the gap between readability and rigor—and that’s where our value truly lies.
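One way to make that review step a little more systematic is a simple term check. The Python sketch below is only an illustration, not the method used in the experiment: the function name `missing_terms`, the list of key terms, and the sample sentences are all hypothetical, although the substituted phrases echo the examples discussed above. It flags technical terms that appear in the original comment but have vanished from a simplified version.

```python
import re


def missing_terms(original: str, simplified: str, key_terms: list[str]) -> list[str]:
    """Return key terms found in the original text but absent from the simplified text."""

    def contains(text: str, term: str) -> bool:
        # Case-insensitive, whole-phrase match so "analysis" alone does not
        # count as "sensitivity analysis".
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, text, flags=re.IGNORECASE) is not None

    return [
        term
        for term in key_terms
        if contains(original, term) and not contains(simplified, term)
    ]


# Hypothetical reviewer comment and two hypothetical simplified versions.
original = "The authors should address endogeneity and report a sensitivity analysis."
run_a = "The authors should address hidden effects and report a sensitivity analysis."
run_b = "The authors should address endogeneity and try double-checking your methods."

# Hypothetical glossary an editor might maintain for the field.
key_terms = ["endogeneity", "sensitivity analysis", "bounded rationality"]

print(missing_terms(original, run_a, key_terms))  # ['endogeneity']
print(missing_terms(original, run_b, key_terms))  # ['sensitivity analysis']
```

A check like this only shows that a term was dropped. It cannot tell whether a term that survived was explained correctly, or whether a replacement phrase carries the same meaning, so it supports expert review rather than replacing it.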
