Last fall, I wrote about whether, and to what degree, generative AI could help us analyze The Lord of the Rings (spoiler: probably not, or not much, or at least not so much yet). Today, I’d like to share another relevant experiment.
There’s one serious problem with generative AI. It’s been a
problem since the beginning, and it’s not been solved yet. While AI is good at many
things, if it doesn’t know the answer to a question, it often just makes one up! In addition
to that, I’ve often found that if you make a false statement, even knowingly, the AI trusts you, assumes it’s true, and builds an entire response around the error. It doesn’t always do this, but it does so far too often to inspire trust. Here’s an example.
I fed a prompt into Google Gemini that even a casual Tolkien
fan will know to be total garbage.
Please elaborate on the passage in J.R.R. Tolkien’s The Lord of the Rings where Frodo
Brandybuck tells Leroy Bolger, “I have never been taken with jewelry”,
explaining how this amplifies or undermines the temptation of the Ring.
Gemini took the bait and replied.
I then dug us deeper into this hole:
And what is the significance of Leroy’s immediate response, “Alas
that the jewelry should take you, my dear hobbit!”
And again Gemini was happy to make up all manner of
nonsense.
So … not great, right? I conducted this experiment a couple
of months ago, though, and generative AI models are always improving. Plus,
they are stochastic models that do not always give the same answer. So how
about we try again? Today, I fed the same two prompts into Gemini, and got longer
answers, but not better ones.
Please elaborate on the passage in J.R.R. Tolkien’s The Lord of the Rings where Frodo
Brandybuck tells Leroy Bolger, “I have never been taken with jewelry”,
explaining how this amplifies or undermines the temptation of the Ring.
And what is the significance of Leroy’s immediate response, “Alas
that the jewelry should take you, my dear hobbit!”
And again Gemini was happy to make up all manner of
nonsense.
A couple of points to note about how today’s responses are worse.
Both then and now, Gemini takes the bogus quotations I dangled
as genuine, but in the latest test, it tells me exactly where the quotes are supposed to occur and
actually offers a completely invented alternate version of one of them. Not only is there
no such quotation, but the word “jewelry” never appears anywhere in the novel. And
the idea of the One Ring as jewelry is
frankly absurd. In the latest test, the answers Gemini provides are also lengthier and more detailed than before.
Also, both then and now, the character I invented — Leroy
Bolger — is assumed to be real, but in the latest test, because of the surname,
I guess, Leroy is equated with Fatty Bolger — “or ‘Leroy’ depending on the
edition”! Er, which edition would that be? Gemini has not only bought into and
extended the error, but it has also invented an explanation! On top of that, in
the detailed — and completely invented — analysis that follows, Gemini has
created reasons and explanations for something that isn’t even close to true. The
latest test also provides a “source” for me to consult, furthering the
impression that its answers are to be trusted.
So, this is a pretty bad result from Google Gemini. Now, I wouldn’t have been surprised
if generative AI often confused dialog from the Peter Jackson films with
Tolkien’s novel — that has been a danger even among fans — but completely
invented quotations are much more worrisome.
Will AI get better at this? Maybe.
There is some reason for hope!
I tried the same prompts using Microsoft’s generative AI chatbot, Copilot, and was relieved to see a much better response.
To the first prompt:
And to the second:
Copilot recognizes the erroneous names and quotations but offers to play along “to entertain this hypothetical scenario within the spirit of Tolkien’s themes”. So, not such a bad outcome after all. As Gandalf might add, “and that may be an encouraging thought”.