Dream Machine Text-to-Video First Try

Luma Labs has recently launched Dream Machine, an AI video generation model that is available for public access and testing. Yay!

Dream Machine tries to create realistic videos from text or image inputs, focusing on features like smooth motion, cinematography, and character consistency.

Luma Labs describes it this way:

"Dream Machine is an AI model that makes high quality, realistic videos fast from text and images.

It is a highly scalable and efficient transformer model trained directly on videos making it capable of generating physically accurate, consistent and eventful shots. Dream Machine is our first step towards building a universal imagination engine and it is available to everyone now!"

Their website showcases stunning videos, and the promotional video gave many viewers high hopes.

So naturally I had to go test it for myself. I did so a couple of days later, because on the first day there were so many users that each video generation sat in the queue for hours, so there was no point. Now then, what was my experience?

I decided to give it a short and simple text prompt first. I just wanted to see a capybara chilling in an infinity pool. I love capybaras, and I have a favorite video of them soaking in hot tubs saved on my computer:

Real Chil Baras Video

This is only a short excerpt. It's super relaxing and adorable, right? Likely taken somewhere in Japan. So I had something like that in mind when I entered my prompt.

The website put my request in a queue...

The result came in a couple of minutes later.

Text-to-Video Capybara

This was the underwhelming result of my floating capybara prompt. The pool is not an infinity pool, but that would have been a minor mistake had the animal actually looked realistic rather than like some kind of plastic decoy.

To test the second capability, image-to-video, I used this actual photo of a chilling capybara.

Capybara Photo by Brian McGowan on Unsplash

'Maybe this will work better?' I thought. But bummer! I got an error.

Being quite familiar with tech and bugs, I guessed that the image's size might be the problem (it was over 3 MB), so I got a smaller version of the same image, less than 400 KB in size. And it worked! So I began waiting impatiently while Dream Machine was 'dreaming' it up:

Video From Photo

Here is the video generated from the photo of a real-life capybara in water. Lots of blur, misshapen eyes, later on something that looks like a tumour under the animal's head, and finally a complete mess in place of its face as it attempts to turn.

My Impressions + User Feedback

I mean, I want to give the new tech a break. Developing is hard, I know, but such a 'product' is hardly worth releasing into the world just yet, much less charging for.

The product needs a lot more testing behind closed doors, IMHO. But of course it's a free market and the company can do as it pleases. Perhaps they will even find buyers at this stage, who knows? Here's what some people have been sharing on Reddit when discussing the use case for this:

The main point of the discussion was that current AI image generation models are still quite limited and can only be seen as toys or novelties. They lack essential features like reusable characters and scenes and the ability to edit the final product via text input. These features are crucial for practical use in filmmaking or professional content creation.

While these models have the potential to improve and become more useful in the future, they are not yet on the right path for real-world applications. Making these models suitable for actual filmmaking or professional content creation will require significant advancements and could be very expensive. The examples and outputs we see from these models are likely the best-case scenarios, and their performance isn't consistent enough for professional work.

However, one user showed interest in using these models for generating custom stock footage or B-roll on demand. This could be useful for creating YouTube essays or similar content, even if the technology isn't ready for full-scale movie production.

Luma Labs is Hiring

Luma Labs has currently posted a list of open positions.

So let's hope they will onboard a bunch of smart people who can get this product into a much better state.

Second Try

I went to check on Dream Machine again today, and I'm finding they keep improving the platform: straight away you can see that an 'Extend' option for clips and feedback buttons have appeared.

hummingbird

I extended the clip with the hummingbird (it still only amounted to 10 seconds). Then I generated 2 more clips with similarly colored prompts and threw them all together to create a short video, to which I added an AI-generated music clip. You can view it below:

Purple Fantasies

Some samples of my prompt results from Luma Labs Dream Machine (animations) and Udio (audio track)

Not too bad, especially the teddy bear. Though I'm not sure why the unicorn is talking or chewing; that was never in my prompt. I only asked for:

a unicorn colored apple-green is floating in the purple-pink-blue sky

For something more realistic, I prompted:

two middle-aged women giggling as they're sitting on a bench in the park in autumn, blurred background

LOL, not sure why, but this seems to be the case with Bing's DALL-E version too: whenever I request middle-aged, I get grey-haired, elderly people. Otherwise the machine likes to use 19-year-olds by default. What an ageist! No equity! You're either a teen or you're an old fart in AI's books. (No offence, old farts!) Anyway, it generated the two lovely ladies in the park for me, almost perfect, I'd say. I used this opportunity to test ElevenLabs sound effects generation, and it was lightning fast and realistic sounding. It refused to produce any 'snorting' though, haha.

Giggling Women on the Bench

Two middle-aged women giggling as they're sitting on a bench in the park in autumn, blurred background - Dream Machine prompt + voice generated in ElevenLabs, prompt 'two old ladies giggling and snorting softly'

Maybe one day we'll be looking back on these days, thinking: '... remember how we were living before text-to-video was possible? You actually had to get off your couch and shoot some footage!'

Last modified 25 June 2024 at 17:51

Published: Jun 17, 2024 at 9:57 AM