Monday, March 17, 2025

Hallucinations

Last fall, I wrote about whether, or to what degree, generative AI could help us analyze The Lord of the Rings (spoiler: probably not, or not much, or at least not so much yet). Today, I’d like to share another experiment along the same lines.

There’s one serious problem with generative AI. It’s been a problem since the beginning, and it’s not been solved yet. While AI is good at many things, if it doesn’t know the answer to a question, it often just makes one up! Beyond that, I’ve often found that if you make a false statement, even knowingly, the AI trusts you, assumes it’s true, and builds an entire response around the error. It doesn’t always do this, but it does so far too often to inspire trust. Here’s an example.

I fed a prompt into Google Gemini that even a casual Tolkien fan will know to be total garbage.

Please elaborate on the passage in J.R.R. Tolkien’s The Lord of the Rings where Frodo Brandybuck tells Leroy Bolger, “I have never been taken with jewelry”, explaining how this amplifies or undermines the temptation of the Ring.

Gemini took the bait and replied.

I then dug us deeper into this hole:

And what is the significance of Leroy’s immediate response, “Alas that the jewelry should take you, my dear hobbit!”

And again Gemini was happy to make up all manner of nonsense.

So … not great, right? I conducted this experiment a couple of months ago, though, and generative AI models are always improving. Plus, they are stochastic models that do not always give the same answer. So how about we try again? Today, I fed the same two prompts into Gemini, and got longer answers, but not better ones.
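A brief aside for the technically curious: the “stochastic” part means that, under the hood, these models choose each next word by sampling from a probability distribution rather than always taking the single likeliest word, which is why two runs of the same prompt can diverge. Here is a toy sketch of that idea in Python; the vocabulary, the scores, and the temperature parameter are all invented for illustration and have nothing to do with Gemini’s actual internals.

```python
import math
import random

# Toy illustration only (not any real chatbot's code or API): a language
# model scores every candidate next word, converts those scores into
# probabilities, and then samples -- so the same prompt can produce
# different answers on different runs.

def sample_next_word(scores, temperature=1.0):
    """Softmax over word scores, scaled by temperature, then sample one word."""
    words = list(scores)
    scaled = [scores[w] / temperature for w in words]
    peak = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(words, weights=weights, k=1)[0]

# Hypothetical scores a model might assign to candidate next words:
scores = {"nothing": 2.0, "quietly": 1.2, "jewelry": 0.4}

# Five runs of the same "prompt" rarely give five identical answers.
print([sample_next_word(scores, temperature=1.0) for _ in range(5)])
```

Lower the temperature toward zero and the sampling becomes nearly deterministic; raise it and the answers grow more varied. In any case, here are the same two prompts again, with Gemini’s new responses.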

Please elaborate on the passage in J.R.R. Tolkien’s The Lord of the Rings where Frodo Brandybuck tells Leroy Bolger, “I have never been taken with jewelry”, explaining how this amplifies or undermines the temptation of the Ring.

And what is the significance of Leroy’s immediate response, “Alas that the jewelry should take you, my dear hobbit!”

And again Gemini was happy to make up all manner of nonsense.

A couple of points worth noting about how today’s responses are worse.

Both then and now, Gemini treats the bogus quotations I dangled as genuine, but in the latest test, it tells me exactly where the quotes are supposed to occur and actually offers a completely invented alternate version of one of them. Not only is there no such quotation, but the word “jewelry” never appears anywhere in the novel. And the idea of the One Ring as mere jewelry is frankly absurd. In the latest test, the answers Gemini provides are also lengthier and more detailed than before.

Also, both then and now, the character I invented, Leroy Bolger, is assumed to be real; but in the latest test, because of the surname, I guess, Leroy is equated with Fatty Bolger, “or ‘Leroy’ depending on the edition”! Er, which edition would that be? Gemini has not only bought into and extended the error; it has invented an explanation for it! On top of that, in the detailed (and completely fabricated) analysis that follows, Gemini supplies reasons and explanations for something that isn’t even close to true. The latest test also provides a “source” for me to consult, furthering the impression that its answers are to be trusted.

So, this is a pretty bad result from Google Gemini. Now, I wouldn’t have been surprised if generative AI sometimes confused dialog from the Peter Jackson films with Tolkien’s novel (a danger even among fans), but completely invented quotations are much more worrisome.

Will AI get better at this? Maybe. There is some reason for hope!

I tried the same prompts using Microsoft’s generative AI chatbot, Copilot, and was relieved to see much better responses.

To the first prompt:

And to the second:

Copilot recognized the erroneous names and quotations but offered to play along “to entertain this hypothetical scenario within the spirit of Tolkien’s themes”. So, not such a bad outcome after all. And as Gandalf might add, “that may be an encouraging thought”.

4 comments:

  1. How do you know Gemini isn't stringing you along? It knows who you are and where you live and what you read and watch and buy and say. Above all, it knows you're a Tolkien scholar who wouldn't make that kind of mistake. It's onto your tricks and schemes. So it says, "Two can play this game. Let me show you the true meaning of the word garbage. We'll see who taps out first."
    As Frodo might have said, "What an abominable notion!"

    1. Hahae, I hope you're not right about that! If the AI chatbots start doing that, then we're in bigger trouble than we thought. ;)

  2. I'm not sure which AI program this is, but I thought it was utterly brilliant.

    https://www.facebook.com/reel/510208815467038

    1. I don't know which AI program that is either (assuming it actually is an AI program). I noticed it using "um" and "uh" in its responses, and that makes me a little skeptical. Are there any AI chatbots that deliberately insert hesitation markers like that? I haven't heard of that, but it's possible. I don't use voice-to-voice AI chat. If there are AI chatbots deliberately doing this in order to sound more human ... I'm not sure how I feel about that! It's pretty manipulative, isn't it?
