Pyramid Flow: The New Open-Source AI Video Generator
The wait is over for the AI community: Pyramid Flow is here, bringing high-quality open-source video generation to the table. Developed by researchers from Peking University and Kuaishou Technology, this new tool is set to make waves in the AI world.
Demo Video: Tokyo
Example prompt shared by the Pyramid Flow team: "Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes."

What is Pyramid Flow?
Pyramid Flow is an AI video generation model that uses a pyramidal flow matching method to keep computation costs low while delivering high-quality visuals. The video is created in a series of "pyramid" stages, with only the last stage operating at full resolution.
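To make the stage idea concrete, here is a minimal Python sketch of a three-stage schedule. The equal-length time windows, the factor-of-two resolution step between stages, the 64 x 96 latent size, and the names `Stage` and `pyramid_schedule` are all illustrative assumptions, not Pyramid Flow's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    index: int
    t_start: float  # where this stage starts on the flow time axis [0, 1]
    t_end: float    # where this stage ends on the flow time axis
    height: int     # latent height processed during this stage
    width: int      # latent width processed during this stage


def pyramid_schedule(full_height: int, full_width: int, num_stages: int = 3) -> list[Stage]:
    """Assign each stage an equal slice of [0, 1] and a resolution.

    Assumption (illustrative only): stage k runs at the full latent size divided
    by 2 ** (num_stages - 1 - k), so only the last stage runs at full resolution.
    """
    stages = []
    for k in range(num_stages):
        scale = 2 ** (num_stages - 1 - k)  # 4, 2, 1 for three stages
        stages.append(
            Stage(
                index=k,
                t_start=k / num_stages,
                t_end=(k + 1) / num_stages,
                height=full_height // scale,
                width=full_width // scale,
            )
        )
    return stages


for stage in pyramid_schedule(64, 96):
    print(stage)
```

Under these assumptions, each stage covers one third of the generation trajectory, and the first two stages work at a quarter and a half of the full side length, which is where the compute savings come from.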
Pyramid Flow uses two main techniques: Spatial Pyramid for image generation and Temporal Pyramid for videos. Based on the diffusion Transformer architecture, it reduces the token count by a factor of four, making training more efficient.
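A factor of four is consistent with simple token arithmetic: a Transformer over latents typically sees one token per patch, so halving both the height and width of the latent quarters the token count. The sketch below assumes a 2 x 2 patch size and a hypothetical 96 x 160 latent; `token_count` is an illustrative helper, not part of the Pyramid Flow code.

```python
def token_count(latent_h: int, latent_w: int, frames: int = 1, patch: int = 2) -> int:
    """Tokens a patch-based Transformer sees: one per patch x patch square, per frame."""
    if latent_h % patch or latent_w % patch:
        raise ValueError("latent height and width must be divisible by the patch size")
    return (latent_h // patch) * (latent_w // patch) * frames


full = token_count(96, 160)  # hypothetical full-resolution latent
half = token_count(48, 80)   # same latent at half the height and width
print(full, half, full // half)  # 3840 960 4
```

Because attention cost grows faster than linearly with token count, running early stages on a quarter of the tokens saves more than a quarter of the compute in those stages.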
The model was trained on open-source datasets: image datasets such as LAION-5B, CC-12M, and SA-1B, and video datasets such as WebVid-10M and OpenVid-1M. The video data totals around 10 million single-shot videos.
Demo and Code Availability
The model is available for download on Hugging Face and GitHub. The Hugging Face demo page lets users test the model with text prompts or images. However, heavy traffic can make the demo slow or temporarily unavailable.
Observed Deficiencies
Pyramid Flow can generate a variety of video types. You can see more examples on the official project page.
While Pyramid Flow demonstrates impressive capabilities, it is not without its deficiencies. Here are some notable issues observed in the early video examples:
- Deformations and Hallucinations:
  - In the snowy Tokyo city scene, there are hallucinations and deformations, especially with the red signs at the start and the edges of pedestrians warping over time.
  - The tsunami video shows a car at the end that doesn't look realistic and warps in shape.
- Inconsistencies in Objects:
  - In the highway scene, the car's rearview mirror reflection and the sunset are well-rendered, but there are inconsistencies in the lighting and shadows.
  - The boat sailing video shows inconsistencies in the bars of the Eiffel Tower.
- Lack of Detail in Complex Scenes:
  - In the historic church video, while the overall scene is impressive, the people walking on the street show some deformations.
  - The drone view of waves crashing shows realistic waves, but the rocky shore has some inconsistencies.
- Missing Elements:
  - In the cat waking up video, the owner is not present, which is a flaw in the prompt execution, and the cat's face is hardly cute tbh.
  - The train in the steam train video is not consistent across the entire scene, and there's a mini cart in the middle that doesn't look realistic.
- Text and Sign Legibility:
  - In the snowy Tokyo city video, the signs on the buildings aren't legible, which is a common flaw in many video generators.
- Overall Temporal Consistency:
  - While Pyramid Flow shows significant improvement in temporal consistency compared to previous open-source models, there are still instances where objects split apart or disappear and reappear.
Benchmarking and Comparisons
On the VBench benchmark, Pyramid Flow scored 81.72 overall, just below Kling (81.85) and Runway Gen-3 Alpha (82.32). It had the highest quality score among the models compared.

Accessibility and Limitations
- Originally, hardware requirements were as follows:
  - 384p resolution: requires 26 GB VRAM.
  - 768p resolution: requires 40 GB VRAM.
With only 24 GB of VRAM, even a high-end consumer GPU like the Nvidia RTX 4090 falls short of both figures.
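A quick way to read these numbers is to compare them with a specific card. The Python sketch below encodes the two original requirements quoted above and checks them against the RTX 4090's 24 GB. The helper names are hypothetical, and the check ignores offloading and other memory-saving options.

```python
# Original VRAM requirements quoted above, in GB (no offloading).
VRAM_REQUIRED_GB = {"384p": 26, "768p": 40}


def resolutions_that_fit(available_gb: float) -> list[str]:
    """Resolutions whose quoted requirement fits within available_gb."""
    return [res for res, need in VRAM_REQUIRED_GB.items() if need <= available_gb]


def shortfall_gb(available_gb: float, resolution: str) -> float:
    """How many GB short a card is for a resolution (0.0 if it fits)."""
    return max(0.0, float(VRAM_REQUIRED_GB[resolution]) - available_gb)


RTX_4090_GB = 24
print(resolutions_that_fit(RTX_4090_GB))  # []
print(shortfall_gb(RTX_4090_GB, "384p"))  # 2.0
print(shortfall_gb(RTX_4090_GB, "768p"))  # 16.0
```

Even the smaller 384p model misses by 2 GB on a 24 GB card, which is why memory-saving tools matter so much for consumer hardware.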
However, there is good news: kijai has released ComfyUI wrapper nodes for Pyramid-Flow, available on GitHub at https://github.com/kijai/ComfyUI-PyramidFlowWrapper.
- Licensing: Released under the MIT License, making it flexible for commercial use, modifications, and redistribution.
Pyramid Flow over FLUX
cocktail peanut tweeted this exciting news: now that it has been retrained on FLUX, Pyramid Flow is getting seriously impressive. It handles both text-to-video (txt2vid) and image-to-video (img2vid) tasks and runs smoothly on all platforms, including Macs!
It even supports Apple Silicon Macs with M1 and M2 chips, such as an M1 Max with 64 GB of RAM. It might be the first high-quality video generator with solid Mac support.
For GPU setups, it uses CPU offloading, so it needs less than 8 GB of VRAM to run.
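Why offloading cuts VRAM so sharply can be shown with a toy model: if each pipeline component is copied to the GPU only while it runs, peak GPU memory is set by the largest single component rather than by the sum of all of them. This Python sketch is a simplification under that assumption; `OffloadSimulator`, the component names, and the sizes are hypothetical illustrations, not measurements of Pyramid Flow or a description of how its offloading is implemented.

```python
from contextlib import contextmanager


class OffloadSimulator:
    """Toy model of component-level CPU offloading.

    Weights stay in CPU RAM; a component occupies GPU memory only while it runs.
    """

    def __init__(self, component_sizes_gb: dict[str, float]):
        self.sizes = component_sizes_gb
        self.on_gpu: set[str] = set()
        self.peak_gb = 0.0

    def gpu_usage_gb(self) -> float:
        return sum(self.sizes[name] for name in self.on_gpu)

    @contextmanager
    def on_device(self, name: str):
        self.on_gpu.add(name)  # simulate copying the weights to the GPU
        self.peak_gb = max(self.peak_gb, self.gpu_usage_gb())
        try:
            yield
        finally:
            self.on_gpu.discard(name)  # simulate moving them back to CPU RAM


def run_pipeline(sim: OffloadSimulator, order: list[str]) -> float:
    """Run components one after another and return the simulated peak VRAM."""
    for name in order:
        with sim.on_device(name):
            pass  # the component's forward pass would run here
    return sim.peak_gb


# Hypothetical sizes, chosen only for illustration.
sizes = {"text_encoder": 5.0, "transformer": 7.0, "vae": 1.0}
sim = OffloadSimulator(sizes)
print("all on GPU:", sum(sizes.values()), "GB")                 # 13.0 GB
print("offloaded peak:", run_pipeline(sim, list(sizes)), "GB")  # 7.0 GB
```

The trade-off is speed: weights have to be copied between CPU RAM and the GPU every time a component runs, so offloaded generation is slower than keeping everything resident.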
The app is easy to use with the 1-click Gradio Launcher. You can grab it at http://pinokio.computer.
They've also released a 768p model (in addition to the original 384p one), which can generate up to 10 seconds of video, and you can pick these options right in the UI. Both txt2vid and img2vid features are ready to go!
For more details, check out the Pyramid Flow Miniflux model on Hugging Face or its GitHub repo.
I've only just installed it, so I will need time to test the results myself.
Last modified 21 November 2024 at 18:04
Published: Oct 11, 2024 at 8:25 PM