AGEofLLMs.com

AI's Stairway Stumble


So visual AIs apparently have a bit of a problem depicting people walking up stairs, especially spiral ones, when seen from above.

I noticed this when I was trying to create a visual of a person, seen from above, ascending a spiral staircase. Several AI image generators (ChatGPT's DALL-E, Kling AI, Flux and even Midjourney) had trouble depicting this accurately. Either the staircase went down instead of up, the person was stepping oddly in the middle of the curve where there is no support, or they were walking down instead of up.

Midjourney's spiral staircase
DALL-E's generation for this prompt

Kling AI produced nice cinematic images, but if you look closely, the design of the curving stairs was also incorrect.

Kling's curving stairway

Furthermore, Kling's AI video generation engine could not figure out how to animate either my image or a text-to-video prompt of a man coming up the stairs.

Kling Image-to-Video of Man on Winding Stairs

The result was kind of funny, because it resembles the "stuck animation" or "stuck in place" effect from video games, where a character can't get past some obstacle, limbs still moving tirelessly in vain as he keeps trying to get through.

Here was my initial overly ambitious prompt:

A 26-year-old man walking up an old spiral staircase in a dark mansion, his body and face look sweaty, facial expression is angry and terrified, his clothes unclean, he's wearing a grey t-shirt, torn near the collar, dark pair of jeans with a tear near his right knee , his arm has a bit of dark red paint on it, shot from above focusing on man's face, with the camera positioned slightly above and behind him, looking downwards to capture the entire staircase below, cinematic, dramatic lighting, casting shadows that emphasize the curves of the staircase and the intensity of the scene, depth of field, the background and surroundings should be dim, drawing the eye towards the man and the spiral staircase beneath him, sense of unease and danger.

Trying Spiral Staircase Only

It makes sense to check how well these models understand a spiral staircase on its own, right? I should've started with this instead of assuming it was well covered in the training data, but I sometimes overestimate the abilities of current image-generation AI.

Flux Schnell's result for: an old spiral staircase in an old mansion shot from above looking downwards to capture the entire staircase below, cinematic

If you look closely, although the general idea of winding stairs is there, the structure is not physically accurate.

Here's DALL-E 3:

Circular staircase by Dall-E

There is a basic understanding of the concept, but, much like with human hands and fingers, the model still struggles to generate stairs correctly.

Trying Normal Stairway

Is it the spiraling? Let's see how AI does with straight stairs. Simple prompt:

person walking up the stairway is in focus shot from above, cinematic

Person appears to be hitting the wall or stepping on the floor

Ok, so the problem appears to be a lack of training data showing a person walking up the stairs from an overhead perspective. Most pictures of people on staircases show them from below or from the side.

There is no issue producing this kind of view, in response to the prompt:

person walking up the stairs ascending, cinematic

Midjourney got 1 out of 4 variants right for my original complex prompt, modified to drop the spiral staircase part.

One result for the straight stairway was correct in Midjourney

Ok, by now it seems pretty obvious, but let's try one more prompt variant just in case. This one tries to make clear that while the camera is at the top looking down the stairs, the people are ascending them, coming toward it.

I am standing at the top of the stairway, looking down as other people come up the stairs toward me.

Flux prompt rendering

Nobody is walking up the stairs facing the camera. I tried ChatGPT next:

DALL-E rendering of the view from the top of the stairs

Again, not one person is actually stepping up the stairs while facing the camera. They either face the camera while walking on a flat floor, or they're turned away from the viewer.

AI's Stairway Stumble Conclusion

I shared my suspicions with Claude, and here's what it had to say on the matter:

  1. Perspective complexity: The view from above a staircase, whether spiral or straight, presents a unique challenge. It requires the AI to understand and render depth, foreshortening, and complex spatial relationships accurately.
  2. Conflicting visual cues: When a person is walking up stairs while facing the camera, there's a potential conflict between the expected body posture for ascending stairs and the desire to show the person's face. This may be confusing for AI systems trying to generate a coherent image.
  3. Unusual viewpoint: Images of people on stairs taken from significantly above are less common than eye-level or slightly elevated views. This could mean the AI has fewer relevant examples in its training data to draw from.
  4. Motion interpretation: The AI might struggle to correctly interpret and depict the simultaneous upward movement and forward-facing orientation, especially from an overhead perspective.
  5. Stair visibility: Showing stairs properly below a person facing the camera while they're walking up is geometrically challenging, as it requires rendering partially obscured steps in correct perspective.
  6. Training data bias: There might be a bias in the training data towards images of people walking down stairs when viewed from above, as this is a more common photographic perspective.

Your observation that the AI often defaults to showing people walking down or facing away from the camera when asked to generate these scenes is particularly telling. It suggests that the systems are "falling back" on more familiar or easier-to-generate compositions when faced with this challenging scenario.

This limitation highlights the complexity of spatial reasoning and perspective rendering in AI image generation.

So there you have it, folks! Anyone trying this fresh-out-of-the-oven tech should keep in mind that not all of their crazy ideas can be implemented just yet. You gotta work with what you have. Ok then, I'm gonna put my angry man on a flat surface, give him an improvised weapon and set him loose to avenge my frustration!

Pissed off man walking down the corridor

Run!