Kling AI Prompt Guide
Kling AI is a capable tool for turning text or images into short AI-generated videos, and knowing how to write effective prompts is the key to getting the most out of it.
Kling has just published its first English-language guidelines, so in this post we'll simplify the Kling AI prompt structure, show you some examples, and explore Kling camera movements to help you create eye-catching AI-generated videos.
Kling AI Prompt Structure
Kling's recommended text-to-video prompt is built from the following components, in this order:
- Subject: The main focus of your video (e.g., people, animals, objects)
- Subject Description: Details about the subject's appearance and posture
- Subject Movement: How the subject moves in the 5-10 second video
- Scene: The environment where the action takes place
- Scene Description: Details about the setting
- Optional elements:
  - Camera Language: How the camera captures the scene
  - Lighting: The type and quality of light in the scene
  - Atmosphere: The overall mood or feeling of the video
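If you assemble prompts in a script, the component order above can be sketched in Python. This is only an illustration under my own assumptions: the `TextToVideoPrompt` class, its field names, and the choice to put descriptions in parentheses are hypothetical and are not part of any Kling API.

```python
from dataclasses import dataclass


@dataclass
class TextToVideoPrompt:
    """Hypothetical container for the prompt components (not a Kling API)."""

    subject: str
    subject_description: str = ""
    subject_movement: str = ""
    scene: str = ""
    scene_description: str = ""
    # Optional elements
    camera_language: str = ""
    lighting: str = ""
    atmosphere: str = ""

    def render(self) -> str:
        # Descriptions go in parentheses right after the thing they describe.
        subject = self.subject
        if self.subject_description:
            subject = f"{subject} ({self.subject_description})"
        scene = self.scene
        if scene and self.scene_description:
            scene = f"{scene} ({self.scene_description})"

        # Core sentence: subject, then movement, then scene; empty parts are skipped.
        core = " ".join(part for part in (subject, self.subject_movement, scene) if part)
        sentences = [core + "."]

        # Optional elements are appended as one comma-separated sentence.
        extras = [e for e in (self.camera_language, self.lighting, self.atmosphere) if e]
        if extras:
            sentences.append(", ".join(extras) + ".")
        return " ".join(sentences)


prompt = TextToVideoPrompt(
    subject="A giant panda",
    subject_description="wearing black-rimmed glasses",
    subject_movement="is reading a book",
    scene="in a café",
    scene_description="a steaming cup of coffee on the table by the window",
    camera_language="Medium shot, blurred background",
    lighting="atmospheric lighting",
    atmosphere="movie-level color palette",
).render()
print(prompt)
```

Keeping each component in its own field makes it easy to vary one element at a time, for example changing only the camera language between generations.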
Prompt Examples
Let's look at some examples to see how this structure works in practice:
| Basic Prompt | Enhanced Prompt | Advanced Prompt |
| --- | --- | --- |
| "A giant panda is reading a book in a café." | "A giant panda, wearing black-rimmed glasses, is reading a book in a café. The book rests on a table where a steaming cup of coffee sits beside it, next to the café's window." | "Shot in medium range, with a blurred background and atmospheric lighting, a giant panda, adorned with black-rimmed glasses, is seen reading a book in a café. The book lies on a table, accompanied by a steaming cup of coffee, next to the cafe windows. Movie-level color palette." |
The basic prompt structure above isn't the only acceptable one, though. Kling goes on to list several high-quality examples where the camera shot and camera movement come at the beginning, and the subject is mentioned in the middle or at the end:
The camera zooms into a beacon tower on the Great Wall, first-person perspective, high-speed flight, symmetrical composition, motion blur, and atmospheric lighting.
This prompt focuses heavily on camera movement and atmosphere, which are part of the optional elements in the prompt structure. It lacks explicit subject description and movement. It doesn't fully adhere to the recommended structure.
A circling camera shot captures a handsome young man dressed in ancient clothing, wearing white, seated by the pond with his eyes closed, meditating.
This prompt aligns more closely with the recommended structure. It includes a subject with description, implies subject movement (or lack thereof), and describes the scene. The camera movement is also specified, even though it's placed at the start rather than the end.
But if we were to follow Kling's order of elements in the prompt, we'd have phrased it as:
A handsome young man (dressed in flowing white ancient clothing, serene expression) sits cross-legged by a tranquil pond. He remains still, eyes closed in deep meditation. The pond (surrounded by lush greenery) reflects the sky. Camera: slow circling shot around the subject.
So take the guidance on prompt structure into account, but learn from real-life examples as well. That's my verdict for now.
Tips for Crafting Effective Kling AI Prompts
- Use simple words and sentence structures
- Keep visual content simple, suitable for a 5-10 second video
- Avoid relying on specific numbers, as the AI may struggle with consistency (e.g. 5 trees, 6 puppies)
- For split-screen effects, specify the number of camera angles (e.g. "4 camera angles, representing spring, summer, autumn, and winter")
- Be cautious with complex physical movements, as they can be challenging for the AI (e.g. the bouncing of a ball or the trajectory of a high-altitude throw)
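The tip about specific numbers is easy to check automatically before you spend credits. Below is a rough sketch of such a check; the function name and the regular expression are my own assumptions, it only catches counts from 2 to 99 and a handful of number words, and it deliberately lets "N camera angles" through because of the split-screen tip.

```python
import re
from typing import List

NUMBER_WORDS = "two|three|four|five|six|seven|eight|nine|ten|eleven|twelve"

# A small count (digits or a number word) followed by a noun,
# except "N camera angles", which the split-screen tip recommends.
COUNT_PATTERN = re.compile(
    rf"\b(\d{{1,2}}|{NUMBER_WORDS})\s+(?!camera\s+angles?\b)([a-z-]+)",
    re.IGNORECASE,
)


def find_specific_counts(prompt: str) -> List[str]:
    """Return phrases like '5 trees' that ask for an exact number of things."""
    return [f"{m.group(1)} {m.group(2)}" for m in COUNT_PATTERN.finditer(prompt)]


print(find_specific_counts("5 trees and six puppies in a garden"))
print(find_specific_counts("4 camera angles, representing spring, summer, autumn, and winter"))
```

Anything the function returns is a candidate for rewording, for example "a few trees" instead of "5 trees".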
Prompts for Image-to-Video
In addition to generating videos from text, Kling AI offers an impressive Image-to-Video feature. I personally love starting with images, because you pick the visual you want up front and don't have to worry that some important detail will be misrepresented. I usually generate a few images from text, pick the one I like, then go and animate it. But you still need a text prompt to guide the AI.
For Image-to-Video generation, controlling the motion of the subject within the image is the core aspect. Here's the formula for Kling prompts:
Subject + Movement, Background + Movement ···
- Subject: The main focus in the video (people, animals, plants, objects, etc.)
- Movement: Descriptions of the subject's movement status
- Background: Background of the scene
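The formula above maps naturally onto a list of (element, movement) pairs. Here is a minimal sketch, assuming you keep the pairs in order with the subject first; `build_image_to_video_prompt` is a hypothetical helper name, not something Kling provides.

```python
from typing import List, Tuple


def build_image_to_video_prompt(elements: List[Tuple[str, str]]) -> str:
    """Join (element, movement) pairs in order: subject first, then background."""
    clauses = []
    for name, movement in elements:
        name, movement = name.strip(), movement.strip()
        if name and movement:  # skip incomplete pairs
            clauses.append(f"{name} {movement}")
    return ", ".join(clauses)


prompt = build_image_to_video_prompt([
    ("Cat", "walking forward on the alien landscape"),
    ("meteor shower", "fills the sky"),
])
print(prompt)
```

Pairing every element with a movement is the point of the exercise: an element with no movement attached is skipped rather than sent as a bare noun.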
Cat astronaut video from image example
Cat walking forward on the alien landscape, his tail swaying gently. Vibrant meteor shower fills the sky, with meteors streaking across.
The essential elements of the image-based prompt structure are the subject and the movement. Unlike Text-to-Video, which requires a scene description, Image-to-Video already has a scene provided by the input image. So you only need to name the subjects and write what's happening to them.
Static shot of zebra jumping
Flux-generated image of an anthropomorphic zebra, animated with Kling image-to-video to make the zebra jump. Prompt: man in zebra costume jumps in the air, seen mid jump his feet off the ground, positioned towards the camera, static shot
The same applies when animating between two images (a start frame and an end frame).
I've noticed that Kling tends to make people smile in the middle of singing. If you need a more serious performance, either put 'smiling' in the negative prompt field or specify what kind of singing it is, as in "singing a sad song" or "singing a solemn song".
Tips for Effective Image-to-Video Prompts
- Be specific about subject and movement: Instead of just saying "wear sunglasses," use "Mona Lisa puts on sunglasses with her hand."
- For multiple subjects, list movements sequentially: "Mona Lisa puts on sunglasses with her hand, and a ray of light appears in the background."
- Help the model understand the context: If you're working with a painting or photo, be clear about the desired animation to avoid static video generation.
Example:
- Poor prompt: "wear sunglasses"
- Better prompt: "Mona Lisa puts on sunglasses with her hand"
- Best prompt: "Mona Lisa slowly raises her right hand, grasping a pair of modern sunglasses, and gently places them on her face. A soft ray of golden light gradually appears in the background, illuminating her enigmatic smile."
You might find, like I have, that animating a single human or animal subject is easier than two. That's true not just in Kling but in many visual AIs. There is no problem making multiple subjects do the same thing: two people talking, dancing, or eating. But if you want one of them drinking while another is singing and a third is playing guitar, you might find that the AI tries to make them all take part in the same activity: they all sing or all eat (well, probably not all play guitar). There is what I call a 'contagion effect' going on. I'll expand on this on some other occasion. OK, I've run a text-to-image with such a prompt for you.
Camera Movements in Kling AI
Kling AI supports various camera movements to add dynamism to your videos. Here are the basic movements available:
- Horizontal: Move left or Move right
- Vertical: Move up or Move down
- Zoom: In or Out
- Pan: Up or Down
- Tilt: Left or Right
- Roll: Left or Right
Additionally, Kling AI offers four "Master Shots":
- Move Left and Zoom In
- Move Right and Zoom In
- Move Forward and Zoom Up
- Move Down and Zoom Out
To incorporate camera movements into your prompt, use the interface to select your preferred camera movement from the dropdown menu when available. For image-based prompts, Kling currently states that camera motion cannot be adjusted. You can try adding the desired movement at the end of your description, for example:
"A giant panda is eating grapes by the lake. Camera: move left."
But in practice this seems to be largely ignored, so you need to stick to picking pre-defined options from the menu below the text prompt.
Not on the list: Full Rotation
Kling AI is pretty good at generating full rotation videos, where the camera moves around the subject to show it from all sides.
Any of '360-degree rotation', '360 rotation' or '360 spin' works to achieve this. This camera movement isn't on the drop-down list, so you have to type it into your text prompt, like so:
Anthropomorphic fat red cat sitting on the chair at the outside table made of wood eating dumplings using his hands, bowl with dumplings on the table, 360 spin around the cat, photorealistic, cinematic
Note that a 5-second video won't be long enough to cover the full spin, so go with 10.
A Note on Panning in Kling AI
It's interesting to note that Kling AI's use of the term 'pan' differs from its traditional meaning in cinematography:
- Traditional panning: In standard film terminology, panning refers to a horizontal camera movement where the camera rotates left or right while remaining in a fixed position.
- Kling AI panning: In Kling AI, 'pan' is classified as a vertical movement (up or down), which is what is traditionally called a 'tilt'.
When crafting prompts for Kling AI, keep this distinction in mind. If you want a horizontal camera movement, use 'tilt left' instead of 'pan left'. But if you're in Luma, remember to go back to 'pan left' and 'tilt up'. Brr... As if generative video wasn't confusing enough already.
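If you switch between tools a lot, a tiny lookup table can save you from second-guessing the terms. The sketch below only encodes the swap described above (traditional pan is Kling's tilt, traditional tilt is Kling's pan); the dictionary and function names are made up for illustration.

```python
# Traditional cinematography term -> the label Kling uses for the same move,
# based on the pan/tilt swap described above.
TRADITIONAL_TO_KLING = {
    "pan left": "tilt left",
    "pan right": "tilt right",
    "tilt up": "pan up",
    "tilt down": "pan down",
}


def to_kling_term(traditional: str) -> str:
    """Translate a traditional camera term; other terms pass through unchanged."""
    key = " ".join(traditional.lower().split())
    return TRADITIONAL_TO_KLING.get(key, key)


print(to_kling_term("Pan Left"))
print(to_kling_term("zoom in"))
```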
Kling's Motion Brush
Kling's latest motion brush feature allows users to draw a path and move objects within an image.
This feature enables creative video editing, such as making a cat jump over an object by selecting a cat in the image and literally drawing a path as an arrow, pointing where the cat should go.
The ability to select and move objects within a video frame by frame offers more creative freedom. But it isn't a magic fix for everything. For example, it won't override the AI's guidelines and limitations, such as not hurting living creatures or, in most cases, not breaking the laws of physics. While you can draw a dramatic trajectory for this howling wolf to leap into the sea, it won't do that. The bird, on the other hand, will have no issues flying back to the ground:
This tool is only available for image-to-video. If you want professional quality, you can pick either Kling 1.0 or Kling 1.5 (latest), but if you want standard quality you have to opt for the older Kling 1.0 mode.
Using Negative Prompts
Kling AI provides a separate form field for listing any things you don't want to see in your video.
It's a neat function to have in theory, but in my experience so far it doesn't get rid of an unwanted feature. It either replaces one undesired effect with another similarly undesired one, or ignores your request altogether.
For instance, yesterday when making my zebra video, I suddenly saw a car appearing on my empty street, so in my next generation I put 'cars' in the negative prompt field. That had no effect whatsoever, and in other cases it seemed only to trigger more cars. The original image didn't have anything in that spot, nothing resembling a vehicle:
The negative prompt has never worked for me to combat 'disfigurement', despite this being listed as an example on Kling's platform. If the AI can avoid disfigurement it probably will; if not, this little 'spell' won't help.
Why won't negative prompts work?
Typically, image generators work best on 'what is' rather than 'what is not'. Maybe this will change one day, but so far, by and large, you're better off trying a positive equivalent and being more specific where the AI is producing an unwanted result. For example, instead of saying 'no cars', say 'empty street'. In my case, what helped was describing what IS behind the zebra, if not the cars: 'Behind him is a long street with a brick wall at the end, blurred.'
Sometimes AI gotta do what AI gotta do. We as users can't see what's going on behind the scenes, what training data is lacking or conflicting at that moment. But sometimes it is what it is: your prompt wording is clear and unambiguous, your image doesn't have any odd elements that could be misconstrued, but the resulting generation keeps being wrong. I first noticed that in Midjourney when prompting for an elephant in the room. No rephrasing, no negative operators, nothing helps when the AI can't fulfil your particular vision at this time. It's just a program after all; it has limitations and bugs.
How to Fix “Generation failed, try another prompt”
Sometimes Kling won't accept your prompt at all. This happens because of the censorship layer. First, your prompt is analyzed by a simple 'banned words' filter. If it finds a match, your whole prompt is rejected, and it won't tell you which word poisoned the well.
Words I've come across that appear to be banned include 'pig', 'exorcism' and 'filthy'. Once you replace 'pig' with 'piglet' and 'filthy' with 'unclean', the text is accepted.
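If you keep hitting the same words, you can swap them out before submitting. The sketch below uses only the two replacements mentioned above; the helper name is hypothetical, and whole-word matching is my assumption, since how Kling's filter actually matches words isn't documented here.

```python
import re
from typing import Dict

# Replacements taken from the examples above; extend with your own findings.
REPLACEMENTS = {"pig": "piglet", "filthy": "unclean"}


def replace_suspect_words(prompt: str, replacements: Dict[str, str] = REPLACEMENTS) -> str:
    """Swap whole-word matches for their substitutes, keeping a leading capital."""
    if not replacements:
        return prompt
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in replacements) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        substitute = replacements[word.lower()]
        return substitute.capitalize() if word[0].isupper() else substitute

    return pattern.sub(swap, prompt)


print(replace_suspect_words("The filthy pig rolls in the mud"))
```

Because the match is on whole words, a prompt that already says "piglet" is left alone.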
1. Think of any words you might have that can possibly have a naughty or offensive connotation. Well, to be honest I'm not sure where 'exorcism' falls...
2. Use Kling's image generator to try out different prompt options without the suspect words. It uses the same filter, and a try there costs only a fraction of a video generation. It's actually not a bad idea to try your prompts there first anyway, just to preview.
You can also avoid this by making image-to-video generations. Even though the word 'pig' is banned, Kling makes awesome videos of them from images.
You can generate your image in Flux, ChatGPT, Leonardo or Midjourney. They have very different filters, so your prompt is likely to go through. You might have to modify your video prompt a little to better suit an image, but oftentimes just using the same text prompt for the image works.
3. Ask your preferred LLM chatbot to reduce your prompt so you can try narrowing down which words might trigger the ban. Here is the instruction you might give to the chatbot:
I'm going to provide you my long prompt, which is not being accepted by a model due to having some banned words in it. Since I don't know which words, I need you to generate me 4 versions of this prompt, ranging from the most basic possible, to then slightly more detailed, even more detailed and one almost as detailed as my original one. Please also highlight in bold any new details you're adding to prompts 2-4. Are you ready for my prompt?
Then take your shortest prompt to Kling's image generator and try it, then try the next longer one, and so on until it refuses to generate. Then see which words the rejected version added and try replacing the most likely suspects with synonyms.
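The narrowing-down procedure above can be sketched in a few lines of Python. Everything here is illustrative: `find_suspect_words` is a hypothetical helper, and `is_accepted` stands in for you trying a version by hand in Kling's image generator (the fake filter in the demo only simulates that step).

```python
import re
from typing import Callable, List, Optional, Tuple


def word_set(text: str) -> set:
    """Lowercased words in the text, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def find_suspect_words(
    versions: List[str], is_accepted: Callable[[str], bool]
) -> Optional[Tuple[int, List[str]]]:
    """Try versions from shortest to longest and stop at the first rejection.

    Returns (index of the rejected version, words it adds over the accepted
    versions before it), or None if every version is accepted.
    """
    seen = set()
    for index, version in enumerate(versions):
        current = word_set(version)
        if not is_accepted(version):
            return index, sorted(current - seen)
        seen |= current
    return None


# Stand-in for trying each version by hand in Kling's image generator.
# The banned set here only simulates the filter for the demo.
def fake_filter(prompt: str) -> bool:
    return not (word_set(prompt) & {"pig", "filthy"})


versions = [
    "A farm at dawn",
    "A farm at dawn with a barn",
    "A farm at dawn with a barn and a filthy pig",
]
result = find_suspect_words(versions, fake_filter)
print(result)
```

The returned word list is the shortlist of suspects: only words that first appear in the rejected version can have triggered the ban.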
Last modified 26 September 2024 at 20:48
Published: Aug 10, 2024 at 11:29 PM