AI & SOCIETY

https://doi.org/10.1007/s00146-023-01707-z

OPEN FORUM

The galloping editor


Gabriel Lanyi¹

Received: 15 April 2023 / Accepted: 30 May 2023


© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023

Abstract
Classical natural language processing endeavored to understand the language of native speakers. When this proved to lie
beyond the horizon, a scaled-down version settled for text analysis and processing but retained the old name and acronym.
But text ≠ language. Any combination of signs and symbols qualifies as text. Language presupposes meaning, which is
what connects it to real life. Failing to distinguish between the two results in confusing humanoids (machines thinking like
humans) with machinoids (humans thinking like machines). As scientific English (SciEng) has become the lingua franca of
science, it has acquired all the traits of a machine language: reduced vocabulary, where fewer and fewer words have taken
on more and more meanings; prescribed use of pronouns; depersonalized, rigid syntactic forms and rules of composition.
Compliance with SciEng standards can be automatically verified, which means that SciEng can be automatically imitated,
a practice referred to as AI writing (ChatGPT). The article discusses an attempt to automatically correct deviations from the
rules by what is touted as AI editing.

Keywords Natural language processing · Scientific English · SciEng · Meaning · Machinoids · Humanoids

* Gabriel Lanyi
gabilanyi@gmail.com

¹ e-Doc, Ltd., Mevaseret, Israel

1 Introduction

In the days when artificial intelligence referred to the intelligence of humanoids (machines thinking like humans), not machinoids (humans thinking like machines), natural language processing envisioned a prospective ability of software to understand the language of native speakers. In practical terms, the goal of NLP developers was to feed a text to the program, ask it questions of the type that appear on reading comprehension tests, and have the program answer them correctly, proving that it understood the text with all its nuances. To do so, the program would need to know much of what the writer or speaker knows. This is what reading comprehension tests test: how much of what a writer knows we know. (To fully understand what Dickens wrote, we need to know what he knew as he was writing it. Every place we fall short, we need a footnote to help us out. When fashionable Lady Dedlock of the flawless hair comes up to London “with all her perfections on her head,” the untutored reader may wonder briefly at the uncommon expression if it does not bring to mind the ghost of Hamlet’s father, who was dispatched “with all my imperfections on my head.” There is irony in the parallel between the imperfection of the ghost’s conscience and the perfection of Lady Dedlock’s hair, one hardly more alive than the other, and that irony is lost without this knowledge.)

In the late 1970s, Janet Kolodner’s program CYRUS (Computerized Yale Retrieval and Updating System) was fed stories about the life of Cyrus Vance, then US Secretary of State, to get it to know what Vance knew. When CYRUS was asked whether Vance had met Moshe Dayan, it answered in the affirmative, although there was no direct mention of such a meeting in the texts it had been fed. CYRUS could even surmise things Vance did not know but would have surmised himself. (A second database contained information about Vance’s successor, Edmund Muskie.) CYRUS was intended to model human memory and intelligent information retrieval. What made CYRUS a humanoid endeavor was that it sought to process units of meaning (conceptual categories), not units of text. Admittedly, it operated in a limited universe, that of Cyrus Vance and Edmund Muskie, but 45 years ago data storage was in the Stone Age and big data was 20 years in the future. Since then, natural language processing has been downgraded to the analysis and manipulation of mega quantities of text, and has pretty much given up on units of meaning.

Meaning is what connects language to real life. Consider this sentence: “Mindful of popular sanction, Richard reevaluated his chances to succeed.” When the sentence is treated as a text string, there are many ways of processing this sequence of words. It can be rendered in another language (translated) in more than one way, and it can be edited in English to eliminate some ambiguity. Is sanction approval or censure? Is the noun form of succeed success or succession? In real life, the ambiguities are resolved because meaning, ingrained in the fabric of the context, is clear to speakers, listeners, and readers. The context is the knowledge that the speaker and the listener share. Sanction, succeed, and the sentence as a whole mean different things if the subject is Richard I of England, Richard III, or Richard Nixon. And it would not be enough to tell the algorithm which Richard we are talking about. It would need to know quite a bit about the three Richards, and possibly about many more. Its heuristics would need to be based on knowledge of life.

2 Main discussion

I am a writer, editor, and translator, so I naturally check from time to time the tools that are out there to determine how long I have left before thinking machines put me out of business. Recently, I came across InstaText, an application that has the uncanny ability to turn any combination of almost random words into syntactically correct English sentences. We live in an era that brooks no delay, and personified texts are as impatient as their authors. Hence, InstaText.

InstaText, “a company active in artificial intelligence and natural language processing,” makes some flamboyant claims. It announces itself as an “AI-powered writing assistant. Improves styling and word choice, corrects grammatical errors, and enriches your content.” Enriches your content… This is one tall claim. In the academic papers I edit, I occasionally point out elements missing in the methodology, flag circular logic and contradictory statements in the text, and propose alternative modes of presentation, but I would never presume to enrich the content of the manuscript. Doing so would mean that in an article on constitutional law or physiology I would suggest a new interpretation of a Supreme Court ruling or a different biomarker, mediation model, or statistical analysis. A modicum of modesty would inspire greater confidence in a product that offers its services as your trusty assistant. “Make your writing stand out,” enjoins InstaText. By eliminating grammatical errors?… “Takes your text and makes it better” (if one of my clients wrote this, John Lennon notwithstanding, I would improve it to “improves your text”). “AI algorithms help you overcome language barriers and write like a native speaker.” InstaText purports to improve not only the text but the author as well. (Anyway, who says native speakers can write?)

On the InstaText homepage, an animated demo shows how InstaText improves the text. In the left pane is the original text. After the user clicks “Improve text,” the edited text appears in the right pane, with changes tracked, as in Word (deleted text in red and struck out, inserted text highlighted and underlined in green). The original text reads:

InstaText works as a rewriter that can that can rephrase, paraphrase or correct my sentence, paragraph or even entire article.

The improved text appears in the right pane:

InstaText works like rewriter that can rephrase, paraphrase or correct my sentence, paragraph or even an entire article.

This is a poor sentence by any account, starting with the awful word “rewriter” and ending with the two concatenated triplets. I like “works as a rewriter” much better than “works like rewriter,” even if I overlook the missing article, which makes it sound positively Russo-Japanese.

I decided to set all this promo aside and check out InstaText more closely on some of the manuscripts I was working on, especially because of the possibility of combining it with Google Translate (GT). GT does not purport to deliver syntactically correct text, but what if I drop the GT output into InstaText and end up with syntactically correct English text? Presto! Instant translation. Not so fast…

It is difficult to overstate the utility of GT and not to marvel at the clever way in which it disentangles complicated sentences, often containing errors of syntax in the original, and renders them accurately in translation. This is all the more amazing given that GT provides ample evidence of having no inkling of the meaning of the text. But GT makes no inflated claims about what it does. It does not purport to teach you how to translate or to turn you into a better translator. It does not promise to boost your career or to wow your dean.

Google refers to its method as neural machine translation, with a hat tip to artificial intelligence, neurons being the thinking cells in our brains. But one does not have to be Douglas Hofstadter to show that GT is processing text, not meaning. Several years ago, in a dazzling display involving translations to or from French, German, and Chinese, Hofstadter put GT through its paces in The Atlantic (Hofstadter 2018). For the exercise in Chinese, he turned to literary language, where real language resides, to make his point. Naturally, GT made a complete hash of much of the text. This is not surprising: there was no published translation of the text chosen by Hofstadter in the target language. What surprised me was that even when there are such translations, GT ignores them.

I dropped these two lines into GT to see what it comes up with in French:

But since she pricked thee out for women’s pleasure,
Mine be thy love and thy love’s use their treasure.

GT is looking at billions of pages to do its job, so I expected it to say immediately, “Aha! The infamous Sonnet 20,” and show me a published French translation, perhaps even a choice of several. Here is one possibility:

Mais puisqu’il est formé pour le désir des femmes,
Ton amour est pour moi, le plaisir est pour elles.

Not great (“il” refers to “ton corps,” or your body, although Shakespeare makes no mention of body), but passable. Instead, GT goes for it using its neural machine and produces:

Mais depuis qu’elle t’a piqué pour le plaisir des femmes,
Mon amour est à moi et tes amours utilisent leur trésor.

Out of perverse curiosity, I clicked the “Swap languages” arrows and GT translated it back into English. (A hundred-some years ago, Karinthy Frigyes, the Hungarian Mark Twain, translated a poem by Ady Endre, “I Come from the Banks of the Ganges,” into German, making all the wrong decisions, translated it back into Hungarian making a few more wrong choices, then back into German again and once more back into Hungarian, by which time the poem was not only unrecognizably and hilariously zany, but all the wrong choices were methodically accounted for, which made the exercise even funnier.) Instead of reverting to the original, GT, like Karinthy, translated the translation back into English:

But since she stole you for the pleasure of women,
My love is mine and your loves use their treasure.

To come full circle, InstaText improved this to “… your loved ones use her treasure.” InstaText claims that “Improvements are suggested based on the broader context of the topic.” Ignoring for a moment the hint at natural language processing, I wondered what in the broader context of the topic made it replace “your loves” with “your loved ones” and “their” with “her,” both widely off the mark.

Despite its ability to whip often incoherent sentences into correct English syntax, InstaText is hampered by severe personality problems, possibly inherited from its developers and promoters. The worst of these is an obsessive attempt to show how hard it works for you as your writing assistant. No matter what text you drop into the left pane of InstaText, you are sure to end up with an extensively edited and colorful output in the right pane. Even if it chooses to add only an s at the end of a long word, InstaText will replace all of “environment” with “environments.” But this is merely a cosmetic issue, however telling.

Much worse is that it performs a huge number of unnecessary substitutions of words with their near synonyms. The first rule of any editor, machine or human, is not to do this without understanding what the text says. Changing “coercive bureaucratic structure” to “compelling bureaucratic structure” is plain wrong, even if in some contexts “coercive” and “compelling” may be interchangeable and even if statistically “compelling” appears more frequently in the company of “structure.” One should never attempt such a
substitution without understanding the text. Some substitutions were completely off the wall: enabling → supportive; accounts for → increases; commitment → engagement (not the same thing); recent → current; Abstract → Summary (right below the title of an article); Figures → Numbers (in an academic article). Again, even if statistical analysis shows that “current” appears in a given context more frequently than “recent,” a higher degree of confidence is required to make the change, because the damage done by an infelicitous substitution is greater than the benefit of a felicitous one. If you cannot query the author, and InstaText cannot, the fundamental rule should be: when in doubt, do nothing. It is much less embarrassing to miss a possible correction than to break a correct sentence or phrase. The same goes for completely meaningless and pointless value-neutral substitutions like nevertheless → nonetheless.

Another baffling feature of InstaText is that it edits text inside quotation marks. Admittedly, some quoted text bears editing, but it is a fair assumption that you do not want the passages you quote changed. At the very least, users should have the option to exclude text within quotation marks from improvement.

Although InstaText makes much of the broader context of the text it is editing, it limits the text you can drop into its left pane to 10K characters (about 1500 words). This limitation prevents it from gaining a global view of the text. How many times is “for instance” used in ten pages? Is it justified or a mannerism?

Finally, InstaText clearly suffers from jet lag or some other transatlantic bug, because it consistently replaces “analyses” (the noun) with “analyzes.”

I wondered whether InstaText would recognize a page by Dickens, but I tried something simpler and dropped in the masterful opening lines of Winnie-the-Pooh, which would be difficult to improve upon.

Here is Edward Bear, coming downstairs now, bump, bump, bump, on the back of his head, behind Christopher Robin. It is, as far as he knows, the only way of coming downstairs, but sometimes he feels that there really is another way, if only he could stop bumping for a moment and think of it. And then he feels that perhaps there is not. Anyhow, here he is at the bottom, and ready to be introduced to you. Winnie-the-Pooh.
When I first heard his name, I said, just as you are going to say, “But I thought he was a boy?”
“So did I,” said Christopher Robin.
“Then you can’t call him Winnie?”
“I don’t.”
“But you said——”
“He’s Winnie-ther-Pooh. Don’t you know what ‘ther’ means?”
“Ah, yes, now I do,” I said quickly; and I hope you do too, because it is all the explanation you are going to get.

I cannot think of a more delightful way of starting a book. This is what InstaText produced after clicking the “Improve text” button:

Here is Edward Bear coming down the stairs now, bump, bump, bump, on the back of the head, behind Christopher Robin. That’s the only way, as far as he knows, to get down the stairs, but sometimes he feels like there really is another way, if he could just stop bumping for a moment and think about it. And then he feels like maybe there is not. Anyway, here he is below and ready to be introduced to you. Winnie-the-Pooh.
When I first heard his name, I said, as you will say now, “But I thought he was a boy?”
“I thought so too,” said Christopher Robin.
“Then you cannot call him Winnie?”
“I do not.”
“But you said...”
“He’s Winnie-ther-Pooh. Do not you know what ‘ther’ means?”
“Ah, yes, now I know,” I said quickly; and I hope you know, too, because that is the only explanation you are going to get.

Aside from a few outright mistakes, like “the head” instead of “his head” and “below” instead of “at the bottom,” plus a few weird substitutions, the result is ho-hum and tone-deaf. All charm and magic are gone. A measure of modesty would go a long way in establishing credibility and buying goodwill for a writing assistant. A more appropriate output would be something like: “We are busy improving on A. A. Milne and will get back to you as soon as we have something to show.”

Among the masses of writing samples they consult, GT and InstaText should also refer to the canonical texts that define a language. Natural language is alive in good fiction, not in the official communiqués, newspaper articles, reports, and academic papers overflowing with boilerplate, jargon, and clichés.

3 Conclusions

InstaText may have been designed to edit not English but scientific English (SciEng) text, the new idiom that the Internet has crowned as the lingua franca of science. SciEng has been ground into such a fine dust by the publishing mill, most notably the big science journals (BSJs), that it has become quite indistinguishable from a machine language. Like ChatGPT, SciEng has the fluency of an Ishiguro
artificial friend attending an AP English composition class; it is similarly overpredictable and underoriginal, exuding the self-confidence of borrowed authority and the self-awareness of Alexa.

For a moment, it looked as though Dickens’s prophecy that “the English tongue was somehow the mother tongue of the whole world, only the people were too stupid to know it” had been proven right. But SciEng should not be confused with the language of Alice Munro, Salman Rushdie, and Ian McEwan, although it uses some of the same vocabulary. That language remains over the horizon (OtH) of machines. Nor should it be confused with the language of Hobbes, Locke, Macaulay, Mill, Adam Smith, Darwin, and William James, who with a straight face (SF) used English to state hypotheses, report observations, and formulate theories, and did so without compromising storytelling, which is at the bottom of all good writing. With English-speaking countries scattered around the globe for centuries, there has never been an academy of the English language to issue citations for violations, and no central authority to impose standards. Even usage dictionaries pontificate tongue-in-cheek (TiC). English is therefore fundamentally unruly, that is, it used to be… until it was made to comply with the rules of the House of Springer, the Court of Elsevier, and the Diet of Sage, which clipped its wings, reined in its disorderly habits, and homogenized, pasteurized, formalized, standardized, and globalized it in the image of SciEng, the language that can be naturally processed by AI algorithms without risk of harming the author’s individual voice in the absence thereof.

SciEng is the fuel in the professors’ Audis and BMWs, as well as a few collectors’ VW bugs. Spread by self-provisioned publications (SPPs), it has permeated the inhabitable globe, and no one is safe from it between Tonga and Timbuktu. It is the first step toward autonomous academic writing (AAW), which will place academic careers on a self-sustaining path. And it is only natural that instantaneous careers should require instantaneous text. AI tools will save authors the inconvenience of having to put their thoughts into words, just as Amazon Turks (ATs) save them the burdensome task of assembling undisciplined samples and conducting tedious in-depth semi-structured (IDSS) interviews. Until recently, AAW was carried out only TiC, and SF AAW was OtH. But now, NLP can be trained to interpret AT answers to questionnaires and select excerpts from IDSS interviews to autonomously generate SciEng text for SPPs in BSJs.

Evolution did not shape our brains to store billions of credit card numbers and their transactions. Machines outprocess us in most areas of life, and texts are no exception. Whether this makes them intelligent depends on how we define intelligence. Certainly, the algorithms trained to detect patterns in big data do not mimic our brains. For some time now, robots have been outperforming us in so many fields of activity, and their humanity has become such a common topic of debate, that academic articles are seriously considering the traits that robots would need for the US Supreme Court to expand the definition of marriage to include unions between humans and machines. For example, “Tying the knot with a robot: Legal and philosophical foundations for human-artificial intelligence matrimony” was published last year in the journal AI & Society. But there are still some small, trivial activities that robots are having a hell of a time mastering and that require monumental machinery and computing infrastructure to attempt, yet which every 5-year-old takes in stride. Ironically, tying a knot is one of them. I think the Supreme Court should not approve humans and robots tying the knot until they are both equally proficient at tying a knot.

Declarations

Conflict of interest: The author declares no conflict of interest.

Reference

Hofstadter, D. (2018). “The shallowness of Google Translate,” The Atlantic, January 2018, https://www.theatlantic.com/technology/archive/2018/01/the-shallowness-of-google-translate/551570/.

Publisher’s Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.