# NVIDIA Cosmos - A Video AI…For Free!

## Metadata

- **Channel:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=QhA2CH6Z-v4
- **Date:** 07.01.2025
- **Duration:** 6:56
- **Views:** 89,549

## Description

❤️ Check out Lambda here and sign up for their GPU Cloud: https://lambdalabs.com/papers

Cosmos platform:
https://www.nvidia.com/en-us/ai/cosmos/
Hugging Face models: https://huggingface.co/collections/nvidia/cosmos-6751e884dc10e013a0a0d8e6
More: https://github.com/NVIDIA/Cosmos

📝 The paper "Cosmos World Foundation Model Platform for Physical AI" is available here:
https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD 

Or this is the orig. Nature Physics link with clickable citations:
https://www.nature.com/articles/s41567-022-01788-5

🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Alex Balfanz, Alex Haro, B Shang, Benji Rabhan, Gaston Ingaramo, Gordon Child, John Le, Juan Benet, Kyle Davis, Loyal Alchemist, Lukas Biewald, Martin, Michael Albrecht, Michael Tedder, Owen Skarpness, Richard Sundvall, Taras Bobrovytsky, Thomas Krcmar, Tybie Fitzhugh, Ueli Gallizzi.
If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers

My research: https://cg.tuwien.ac.at/~zsolnai/
X/Twitter: https://twitter.com/twominutepapers
Thumbnail design: Felícia Zsolnai-Fehér - http://felicia.hu

#nvidia

## Contents

### [0:00](https://www.youtube.com/watch?v=QhA2CH6Z-v4) Segment 1 (00:00 - 05:00)

This is an exclusive look at a huge new 76-page AI research paper on creating the future. And not just the future, but many thousands of futures. Sort of. You see, this AI system has multiple models: one where we take an input image and a text prompt, and bam, it continues this image into the future as a video. Or these here are text-to-world results. No input image was used here, just a piece of text, and a video of this quality comes out. And here comes the best part: finally, this is not one of those fully closed systems. We all get this and can run it at home for free. And then you will see me run my own prompts before this technique was released, so these are results that you cannot see anywhere else, just here.

I would normally say that the visual quality is not as good as OpenAI Sora; however, this was made for a different purpose, and it is really good at that. So you already see that these videos kind of have a theme. So what is all this about? Well, this work is about helping self-driving cars and humanoid, industrial, or warehouse-type robots learn about the world. The concept is brilliant. Here's why.

You see, self-driving cars have a long-tail problem. This means that many normal scenarios have tons and tons of videos for the AI to learn how to behave. However, there are some crazy corner cases which have very few videos, or none at all. For instance, traffic lights can safely be assumed to be stationary. They don't move. Of course, except when you see a truck transferring them, in which case they move, and thus the AI gets really confused. It thinks: what is going on here? This is one of those rare problems that are trivial to understand for us humans, but we need these crazy videos to teach an AI to understand them too. How can we deal with cases like that?

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, we can deal with this by being able to create thousands and thousands of continuations to help the AI get a detailed understanding of the situation. Or, if you wish to teach a robot how to pick up an apple, you will need one video that starts with a robot picking up an apple, and you cannot teach this concept from just one video to a neural network. No sir, you need lots and lots of variants for that, and that is exactly what this technique is capable of. And I've got to say, if I am pixel peeping, I can see that these are not real videos, but there are cases which are really convincing at first sight.

So here is some incredible news: this is not like OpenAI Sora. This is a model that is now available for all of us for free, even for commercial use. The goal is that every AI has different kinds of hardware and cameras, so they need different kinds of training data, and this system was done in a way that is easy to fine-tune. They open-sourced the code for that, so this enables us to create our own variants for our own use cases. And the whole research paper is available for free too, which describes how all this was done.

Now, I had an early look at this paper and ran my own prompts to get these exclusive results to you Fellow Scholars. But we have two rules. Rule one in these cases is that I always, always talk about the limitations of the paper to let you know about those too. And two, no one can change what I say here.

So the good news here is that the size of these models is not that huge, around 7 to 14 billion parameters, so they can run on a beefier laptop. However, the bad news is that generation times are not fast. A consumer graphics card has enough memory to run these models, but you may have to wait for about 5 minutes or longer for a few seconds' worth of video footage. And when you are done, the quality of the results is not perfect. Not even close. It was meant to be a world foundation model for accurately simulating the future of a simulation, but there are cases where the apples don't fall correctly, sometimes people can fly, or grow six fingers on a hand, or both at the same time. And also, object permanence is not guaranteed: things may disappear over time. There is an autoregressive version of the technique that is much faster, but in return, you have to make some more compromises in terms of visual quality. So there is plenty to improve for the next paper. I mean, if it is meant to be an accurate simulator of the future, it

### [5:00](https://www.youtube.com/watch?v=QhA2CH6Z-v4&t=300s) Segment 2 (05:00 - 06:00)

has to accurately simulate the future. However, I think this is a good time to invoke the First Law of Papers, which says that research is a process. Do not look at where we are, look at where we will be two more papers down the line. And based on historical data, I would not be surprised if two or more papers down the line, this becomes at least 10x, if not 100x, faster and more accurate at the same time. So cool!

Please also note that we do not have a business relationship with NVIDIA, and this video was not sponsored by them. We have our own sponsors. And this will likely be one of the puzzle pieces to make incredible new AI systems that will finally fold my laundry, and finally not get lost in the same room after having cleaned it 100 times before. All free to use and modify for all of us. What a time to be alive!

The 76-page paper contains a ton of goodies, including user study results to see what humans think about the quality of the results. Spoiler alert: they favor it compared to previous techniques. This paper was a Herculean effort, so I would like to send a huge thank you to all of the scientists who contributed to it to bring this to all of us for free. I put the links to it in the description. Let me know in the comments what you Fellow Scholars would use this for.

To run your own experiments on an NVIDIA GPU, check out Lambda. I use it myself regularly for these videos. Huh, look at that: you can generate high-quality images in less than a second per image. I did a ton more of them and paid less than a dollar for all this. Crazy! Seriously, try it out now at lambdalabs.com/papers, or click the link in the description.
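As a rough check on the hardware remark in the transcript (models of around 7 to 14 billion parameters fitting on a consumer graphics card), here is a minimal back-of-the-envelope sketch of how much memory the weights alone would occupy. The numeric precisions listed are assumptions for illustration, not something stated in the video, and the estimate ignores activations and any other components loaded alongside the model, so real usage would be higher.

```python
# Bytes needed to store one parameter at a few common precisions.
# These precisions are illustrative assumptions, not from the video.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def memory_table(param_counts):
    """Map each parameter count to its weight memory per precision."""
    return {
        n: {name: weight_memory_gb(n, b) for name, b in BYTES_PER_PARAM.items()}
        for n in param_counts
    }


if __name__ == "__main__":
    # 7 and 14 billion parameters, the range mentioned in the transcript.
    for n, row in memory_table([7e9, 14e9]).items():
        cells = ", ".join(f"{name}: {gb:.0f} GB" for name, gb in row.items())
        print(f"{n / 1e9:.0f}B parameters -> {cells}")
```

Under these assumptions, the smaller model is the more comfortable fit for typical consumer cards, while the larger one would likely need reduced precision or offloading.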

---
*Source: https://ekstraktznaniy.ru/video/17200*