While St Jerome was getting lost in time, the conference got underway. In a later episode, we will see where he ended up, but for now let’s stay in what DALL-E believes is a conference room in Brussels.
I sometimes ask ChatGPT to write a description of a photo, and then feed its own description back to DALL-E as a prompt. This is what happened here:
Me: Create a photorealistic image from the following description: The image shows an indoor conference room setting with a number of attendees. It appears to be a formal event, possibly a conference or seminar, given the structured seating arrangement, personal laptops open on desks, and the presence of microphones. In the background, the European Union flag indicates that this may be taking place in an EU facility or is an EU-related event. On a large screen, there is a woman speaking, suggesting that she may be a key speaker or presenter at this event. The slide next to her reads “Creativity in translation,” indicating the topic might be related to language translation. The attendees are focused on the presentation, and the setting is professional.
I must have been dissatisfied with the first attempt, because I replied to DALL-E like this:
Me: Can you try to create a different version?
DALL-E refused to create just any different version, and asked for more precise instructions:
[…] I can make adjustments and try to create another version based on specific feedback you provide. What would you like to change for the next image?
Me: Please make the conference room circular. Add windows and sunlight.


The big surprise is that the title text is correct in the first image, although DALL-E could not help itself and added gibberish further down: something like “Owotvjry los Ormagng Vanlteofaim”. In the second image, the text began to deteriorate: “Creatvity in Transelaon”.
Next, I asked DALL-E to remove the reference to the EU and make the room slightly less formal. The result was not interesting, so I will skip that image. What happened next deserves more attention.
Me: Remove the text. Make the conference room larger. Bring back the panel of speakers, seated on the stage.
Then me again: Turn the image around, so that we are facing the panel of speakers.


Of course, DALL-E does not know how to “turn around” an image, but at least it has an unwavering knack for the grotesque, because after the second image it announced: The image has been reoriented to face the panel of speakers as you requested.
DALL-E isn’t lying, of course. (Lying would presuppose intention, which it does not have.) But it also has no idea what is in the image.
At this point, I felt it was time to add the surreal:
Me: Add planets to the scene.

And then I found an elegant way of changing my mind:
Me: Sorry, I made a typo. I meant plants. Please add plants to the scene.

The astonishing thing about this is that, from my prompt alone, DALL-E knew how far to backtrack and which image to modify.
