[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act


Yannic Kilcher · 26.03.2024 · 34,058 views · 1,499 likes


Video description
OUTLINE:
0:00 - Intro
0:15 - XAI releases Grok-1
2:00 - Nvidia GTC
4:45 - Comment of the Week
5:35 - Brute-forcing OpenAI model names
7:30 - Inflection AI gets eaten by Microsoft
9:25 - EU AI Act moving forward
11:45 - Advances in Robotics
14:00 - India retracts controversial advisory
14:30 - OpenSora
15:20 - Improved Gemma fine-tuning
16:20 - Decoding encrypted LLM traffic
17:45 - Varia

References:
https://x.ai/blog/grok-os
https://github.com/xai-org/grok-1
https://finance.yahoo.com/news/nvidia-debuts-next-generation-blackwell-ai-chip-at-gtc-2024-205825161.html?guccounter=1&guce_referrer=aHR0cHM6Ly9uZXdzLmdvb2dsZS5jb20v&guce_referrer_sig=AQAAAHYRVePPrDnH3HxPV8smDzUiia_ztWttteAmHKxy-x_Z75lqq2trR4Exwq2sFyjNQojO_95xWvqQFHkV3NI_IKmw9W8XZ7d52qBsdvqaDRkdNzBSzQhnskzUE_E-nDo6OFG0LmrM0ygvjqLgJyhMDnraaGHrUsb98kknjn7-83MJ
https://spectrum.ieee.org/nvidia-gr00t-ros
https://twitter.com/anshelsag/status/1769989302552031473?t=DYAFhri4cu55LMwJV4V99A&s=09
https://twitter.com/ibab_ml/status/1769770983924142475
https://twitter.com/arthurmensch/status/1769842867621581299?t=sYPy011kN9KxzdnA11M4yQ&s=09
https://twitter.com/arithmoquine/status/1770136393563378082?t=FgH3-TABR73QVUQuP5wq2g&s=09
https://files.catbox.moe/od9pyb.txt
https://techcrunch.com/2024/03/19/after-raising-1-3b-inflection-got-eaten-alive-by-its-biggest-investor-microsoft/
https://archive.ph/p4W1N#selection-2463.23-2463.114
https://www.instagram.com/reel/C4df3DZg1wj/?igsh=MWQ1ZGUxMzBkMA%3D%3D
https://techcrunch.com/2024/03/15/mercedes-begins-piloting-apptronik-humanoid-robots/
https://www.axios.com/2024/03/14/humanoid-robot-army-agility-digit-amazon-warehouse
https://techcrunch.com/2024/03/15/india-drops-plan-to-require-approval-for-ai-model-launches/
https://github.com/hpcaitech/Open-Sora
https://www.reddit.com/r/LocalLLaMA/comments/1bd18y8/gemma_finetuning_should_be_much_better_now/
https://twitter.com/felix_red_panda/status/1769363356094230837?t=JMMb3OldqfhhCH8X5e7ljA&s=09
https://twitter.com/imaurer/status/1768386949201408103
https://twitter.com/ollama/status/1768415114724819060?t=Q7opDnL4_anatuoXzATBng&s=09
https://arxiv.org/pdf/2403.09611.pdf
https://github.com/lavague-ai/LaVague
https://blog.research.google/2024/03/chain-of-table-evolving-tables-in.html
https://www.cnbc.com/2024/03/18/apple-in-talks-to-license-googles-gemini-for-generative-ai-bloomberg.html
https://blog.google/products/search/google-search-update-march-2024/
https://stability.ai/news/introducing-stable-video-3d
https://twitter.com/Nils_Reimers/status/1769809006762037368?t=XmqKGm1ycjvsz2HAWA6WzQ&s=09

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Table of contents (13 segments)

Intro

Hello, it is another beautiful Monday, wherever you are; it's Monday now, and we're talking about news in the ML space. Welcome! Lots of stuff happening this week. First and foremost: the open release of Grok-1.

XAI releases Grok-1

Elon has done it, the madlad: openly releasing the Grok model that powers xAI, or Twitter AI, or however you want to call that. It's a 314-billion-parameter model, so of course it's probably going to require 69 GPUs to run, but this is a trained large language model, and if you have tried Grok, you know it's a bit more quippy, a bit more sarcastic in tone and nature than other models, in line with the sort of free-speech approach Elon announced when taking over Twitter. This is a huge model, bigger than GPT-3, and they have very competent people training it. People have looked into the code base, and what they've done does seem quite reasonable. I know Grok hasn't been a success from a commercial perspective, but the model so far seems quite legit. Whatever you think of Elon, this is a really cool model. Weights and code are available under an Apache 2.0 license: fully usable, fully open source. Yeah, crazy. You can check out the code, by the way; it's a flat GitHub repository, there is model.py, and model.py is just 1,400 lines of code. Very cool. It's written in JAX, and I think it's excellent. So whatever you think of Elon, of X and so on, this is a step in the right direction: not just an open release of such a large model, but explicitly releasing it under a true open-source license, and I think that deserves a lot of compliments.

Nvidia GTC

Next news: Nvidia held their GTC conference this year, and they have announced really big GPUs, I guess. They have new chips, and these new chips are called Blackwell; this is from Yahoo Finance, and you can see them in the background right here. A few new things: notably, they are about twice as fast as the previous generation, and they do FP4. They have FP4 tensor cores, floating-point numbers with four bits. I'm not sure how that's going to work; usually, if people quantize down beyond eight bits, they go to integer quantization. FP4 and FP6, it's going to be interesting to see what's being done. I'm sure Nvidia has done the required tests in scaling models up to know this is going to be useful in the future, otherwise they wouldn't bet an entire generation of GPUs on these things, but it's going to be very cool to see. You think back to old computers, how they had 8 kilobytes of RAM, and you think all those people were crazy; maybe we're going to think back to FP64 and FP32 times and be like, all these people were crazy, they just had way too much precision on these calculations, all that's actually needed is like half a bit you're flipping on, and that's all a large language model requires. So yeah, Nvidia GTC, lots of announcements there. Also GR00T, a foundation model for humanoids: Nvidia is going very much into humanoid robotics. They announced how they envision the future: there will be the GR00T foundation model, which is sort of a pre-trained model to handle a variety of humanoid robot interactions. The GR00T model will take in sensory data like vision, language and so on, and then be able to translate that into actions a humanoid robot can execute. They support the same thing with what they call Omniverse, which is sort of a VR environment where they envision a lot of the training of these things happening: humanoid robots interacting with different terrain, with the world and so on. They also have on-device compute with their Jetson line, very power-efficient local accelerators that can actually be put onto a robot. They've also announced general support for ROS, the Robot Operating System, which is also pretty cool because that's widely used in robotics and is a common standard. And lastly, Anshel Sag tweeting out: "The scariest, most terrifying thing I've seen at Nvidia GTC." Jeez, that has some Jurassic Park vibes.
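To get an intuition for how little resolution four bits buys, here is a minimal fake-quantization sketch. I'm using the e2m1 value set from the OCP microscaling spec as a stand-in; whether Blackwell's FP4 tensor cores use exactly this format is an assumption on my part, but the resolution trade-off looks the same either way.

```python
import numpy as np

# Representable magnitudes of one 4-bit float format (e2m1, as in the
# OCP microscaling spec). The sign bit doubles this to 16 values total.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Fake-quantize x to signed FP4 with a single per-tensor scale."""
    scale = np.abs(x).max() / FP4_VALUES[-1]  # map the largest magnitude to 6.0
    magnitudes = np.abs(x) / scale
    # snap every magnitude to the nearest representable FP4 value
    idx = np.abs(magnitudes[..., None] - FP4_VALUES).argmin(axis=-1)
    return np.sign(x) * FP4_VALUES[idx] * scale

weights = np.array([0.9, -0.03, 0.4, -1.2])
print(quantize_fp4(weights))  # every weight lands on one of 16 grid points
```

Real implementations quantize per small block with a shared scale rather than per tensor, which is what makes such brutal rounding survivable.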

Comment of the Week

All right, this is a new section called Comment of the Week, and it's either a comment on one of my videos or a general comment I found online. This week, Igor Babuschkin tweeting out: "The grok-1 repo is getting very popular. I'll be responding to pull requests and issues, feel free to continue." This is one of the maintainers of the Grok repository, and you can see Mistral, the yellow line: oh, not so many stars over time. LLaMA: oh, many stars, especially with new releases. And then Grok: whoa, the stars, the breakthrough. Arthur Mensch, the CEO of Mistral, quoted this, saying: "Congratulations! Interesting how GitHub stars seem to correlate with superfluous parameters."

Brute-forcing OpenAI model names

In other cool, big news: people are brute-forcing the OpenAI API to figure out model names that aren't openly advertised by OpenAI but are still accessible via the API if you know their name. There's been a long list released, and I'll link that; also follow this person on Twitter for more updates. This person, in turn, has the complete list from a post on 4chan. So, I know, these are just names in a text file, so it's not guaranteed that all of them are real, but various people have confirmed they can actually reach these models via the API. You can see there's a long, long list of models. Among others, there are interesting ones like Jane Street, and others like "superhuman", which unfortunately isn't a superhuman AI; as this person points out, it's rather a reference to an email client. Very interesting how OpenAI seems to stage new models, or stage models for certain companies. We don't know if these are the same models just named differently in order to track usage, or if they are test models prompted in a certain way or fine-tuned; we don't know. GPT-4 Duolingo: we don't know if it's trained on Duolingo data or if the company Duolingo is actually involved, we don't know any of that. I also expect this loophole will be patched within the next three seconds or so, now that it's widely being distributed (I guess I'm contributing to that), but for as long as it's still there, you can absolutely reach these models via the OpenAI API. You see, this list is huge. "Alpha internal", that sounds interesting.
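For illustration, a sketch of how such probing presumably works: send a minimal request with a guessed model name and look at the error code. At the time, a nonexistent name came back with a "model_not_found" error code, so any other response suggested the name exists in some form. The candidate names below are hypothetical, and the exact error behavior is my assumption, not something documented for this purpose.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def is_model_not_found(body: dict) -> bool:
    """True if the API error says the model name doesn't exist at all."""
    return body.get("error", {}).get("code") == "model_not_found"

def model_exists(name: str, api_key: str) -> bool:
    """Probe one candidate model name against the chat completions endpoint."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({
            "model": name,
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,
        }).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    try:
        urllib.request.urlopen(req, timeout=30)
        return True  # a successful completion means the name is reachable
    except urllib.error.HTTPError as e:
        # any error other than "model_not_found" hints the name is real
        return not is_model_not_found(json.loads(e.read()))

# candidates = ["alpha-internal", "superhuman"]  # hypothetical guesses
# hits = [m for m in candidates if model_exists(m, my_api_key)]
```

Rate limits would make brute-forcing a long list slow, which is presumably why the leaked list circulated as a file rather than everyone probing independently.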

Inflection AI gets eaten by Microsoft

TechCrunch writes: "After raising $1.3B, Inflection got eaten alive by its biggest investor, Microsoft." This is a huge amount of money, right? And Inflection has been raising tons of it to build a personal AI assistant you could talk to in a natural way. This TechCrunch article goes a bit into the history of the company and details how they never truly reached that goal of making a super-personal AI so different from, or ahead of, the other assistant models that people would want to use it over ChatGPT or the like. So in that sense they kind of failed, or just didn't achieve the breakthrough they needed while being valued at multiple billions of dollars. But also, Microsoft has been the lead investor, and Microsoft seems to have this strategy of investing, investing, and then just kind of taking over the startups they invest in. The announcement is that two of the three co-founders have now gone from Inflection to Microsoft and are starting what's called Microsoft AI, a new division within Microsoft. So within Microsoft there's now Microsoft AI, forming a team that, from what I can guess, is probably going to do LLM research and things like this. In any case, this is quite astounding, because Inflection was on a trajectory of growth, growth, growth, placing themselves as the alternative, the more personal assistant and so on. But once you raise at such a high valuation, there's almost no way out, so it's a tricky situation for them. In any case, I imagine their other investors are going to be super happy that they put in a ton of money and now the bulk of the team just moves over to Microsoft. CNBC writes: "World's first major act to regulate AI passed by European lawmakers."

EU AI Act moving forward

The AI Act is being pushed forward, has passed another major hurdle, and is expected to enter into force at the end of the legislature in May, after passing final checks and receiving endorsement from the European Council. So in all likelihood, the AI Act as it stands now is its final form, and that's how it will be enforced and implemented in the various countries. This is being hailed: I guess Europe is now a global standard-setter in AI. "Artificial intelligence is already very much part of our daily lives; now, it will be part of our legislation too." I know the AI Act has been changed over the years, and people are saying, at least, that it's not as draconian as originally intended, especially towards research and towards open-source models. But still, consider the stark difference: over here, the open release of Grok-1 from a US company, another US company absolutely dominating the global chip market and making even better chips, billion-dollar investments into companies, all of these news stories. And what does Europe have to contribute? We make it so that, next to the cookie banner, there is now also an AI banner that informs people that some parts of the website may be generated by an LLM. Come on. Come on, Europe. I know the AI Act hasn't turned out as bad as it could have, but is this a victory? Shouldn't we put our efforts towards making cool stuff? I don't know, you tell your parliament members. Are you happy with the cookie banners so far? Do you think they've majorly contributed to the well-being of society? Or are companies just collecting even more data, because now most people actively agree to accept all cookies, so the companies are even justified in collecting that data? Which of the two is it? Figure AI has slapped ChatGPT into a humanoid robot.

Advances in Robotics

They have uploaded interesting demonstrations of that. I don't know; it's obviously cool, and I know that large language models can interact with robots, and it seems to fit together well: a control system that can do tasks, a large language model that has some kind of world knowledge, and then connecting the two. But here they're like, "oh wow, it understood me": well, the person asked for some food, and there was one single apple and nothing else in front of the robot. I do believe in the future of robotics and the combination of robotics with world models, generative AI, text models and so on, but it seems like it's more a competition over who can make the most Hollywood-esque demonstration than over actual capabilities. Also, the fact that these robots are humanoid: why? Relatedly, TechCrunch writes: "Mercedes begins piloting Apptronik humanoid robots" in its factories, starting to add these to low-skill work. I mean, yes, I guess that could help, and okay, I want to retract my earlier statement: maybe the humanoid form makes sense, because places like factories are already built for humans to move in and do stuff. So maybe that makes a little bit of sense, but I'm not sure; there's got to be a better way. In any case, in factories this could make a lot of sense, and we see major industrial players now deploying more and more of these robots, first as experiments, but then also rolling them out generally. Also, this article from Axios details new advancements of a company called Agility Robotics that delivers, again, humanoid robots to companies like Amazon and BMW. So there seems to be something to the humanoid form; it's not just me complaining. Last week we saw that India had made a so-far non-binding but strong recommendation for new AI deployments to be government-approved or government-reviewed or something like this, which sparked a loud outcry across the world.

India retracts controversial advisory

That has now been retracted. TechCrunch writes: "India drops plan to require approval for AI model launches", walking back a recent AI advisory after receiving criticism from many local and global entrepreneurs and investors. Hey, Europe, look: it's possible to overreach with legislation and then say, "ah, maybe that was a bad idea, let's not do it."

OpenSora

OpenSora on GitHub has already almost 10,000 stars and is pushing for open models that can do what Sora does; you can see a few examples right here. As always, these things are going to be at a level somewhat behind the commercial models, but I think the foray into open-source large language models, open-source language-vision models, open-source diffusion models and so on has brought many good things. So the foray into open-source text-to-video models, I'm quite convinced, will yield similar results, to the point where, much as something like Stable Diffusion already covers many use cases for many people who then don't need to go to a commercial vendor, it will enable a lot of research as well.

Improved Gemma fine-tuning

Daniel Han on Reddit writes: "Gemma fine-tuning should be much better now." As we discussed last week, multiple bugs and inconsistencies have been found in various implementations of Gemma, which affected not only its inference but also its fine-tuning. People have had quite poor results fine-tuning Gemma, and a lot of that is due to the various bugs that snuck in. For example, computing y * (1/x) versus y / x seems to have an influence on the position embeddings, stuff like this. It takes dedication to figure these out; they are super silent bugs that you'll almost never find unless you have hundreds of eyes looking at a code base and figuring out exactly where the differences are. But people are making progress, and in this thread there's also a Colab linked for full fine-tuning with all the bugs fixed. Yeah, great success.
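If you want to see this class of bug for yourself, here's a tiny demo. The constants are my own toy values, not Gemma's actual ones, but the effect is the point: in float32, dividing by x and multiplying by a precomputed 1/x are not the same operation, because 1/x is itself rounded before the multiply ever happens.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000).astype(np.float32)
x = np.float32(10000.0)  # stand-in for a position-embedding scale factor

direct = y / x                              # one rounding step
via_reciprocal = y * (np.float32(1.0) / x)  # two rounding steps

mismatches = int(np.count_nonzero(direct != via_reciprocal))
print(f"{mismatches} of {y.size} results differ in the last bit")
```

Each individual difference is one unit in the last place, which is exactly why such bugs stay silent until someone compares implementations activation by activation.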

Decoding encrypted LLM traffic

I found this paper interesting; apparently it's not on arXiv, or not on arXiv yet. This is research into how watching encrypted traffic from a large language model like ChatGPT can actually give insight into what the content is. That's because if you stream token by token, the size of each encrypted message gives away the length of the token, or at least the encoded length of the token. From that, you can reverse-engineer or do heuristic decoding: okay, there is a two-letter word, followed by a five-letter word, followed by a three-letter word, and so on. Conveniently, you can use a trained language model to take in those length indications and output a guess at the decoded text. Obviously, the more specific your length inference is, the better you're able to decode the text. Very cool attack, because previously we didn't really stream things token by token at such a large scale in any major application; now that mode of communication has become more pervasive, and it seems the classical security considerations are, to a degree, vulnerable to this new method.
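The core of the side channel can be sketched in a few lines. The overhead constant and observed record sizes below are made up for illustration, not taken from the paper; the idea is just that ciphertext length tracks plaintext length, so per-token streaming leaks each token's length.

```python
# Hypothetical fixed per-record overhead in bytes (framing, MAC, headers);
# in practice you'd estimate this from the traffic itself.
OVERHEAD = 21

def token_lengths(record_sizes: list[int]) -> list[int]:
    """Infer each streamed token's byte length from observed record sizes."""
    return [size - OVERHEAD for size in record_sizes]

observed = [26, 24, 28, 23]    # what an eavesdropper would see on the wire
print(token_lengths(observed))  # -> [5, 3, 7, 2]
```

A language model trained on such length sequences can then guess the underlying words, which is exactly the trick described above; batching several tokens per packet, or padding, breaks the channel.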

Varia

Ian Maurer releases FuzzTypes, a library to autocorrect data that comes from LLMs. Let's say you expect some sort of datetime to come back from your LLM: this thing will not only parse the datetime, but if it's a bit fuzzy, it will correct it for you. That's the idea of the library; very cool, you can check it out and probably extend it. Ollama supports AMD graphics cards now. Ollama is a library for running inference of not only LLaMA models (everything's called llama now that deals with LLMs); it's fast inference for generative language models, and it now supports AMD graphics cards. Excellent. Apple releases MM1, an investigation into scaling and training multimodal large language models. They say they've trained a family of multimodal models, including dense variants up to 30 billion parameters and mixture-of-experts variants up to 64 billion parameters, that are state-of-the-art in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. The interesting part is that they did a lot of ablations and investigations into what actually makes multimodal training successful, and they show that the image encoder, together with image resolution and the image token count, has a substantial impact, while the vision-language connector design is of comparatively negligible importance. Further, they demonstrate that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks. So the data mix, and how you construct that feed, seems to be one of the most important parts of training multimodal models. More insight into this is absolutely welcome, as that is probably one place where the open-source community can benefit most: general training recipes for these kinds of models, because the community just doesn't have as much capacity to do large ablation grid searches as a company like Apple does.
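Going back to FuzzTypes for a second: the autocorrect idea can be sketched roughly like this. This is my own toy version, not the library's actual API; FuzzTypes itself builds on Pydantic-style validation.

```python
from datetime import datetime

# Formats an LLM might plausibly emit; extend as needed for your use case.
FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%B %d, %Y", "%Y/%m/%d"]

def fuzzy_date(text: str) -> datetime:
    """Parse a date, tolerating stray whitespace and a trailing period."""
    cleaned = text.strip().rstrip(".")
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {text!r}")

print(fuzzy_date(" March 26, 2024. "))  # -> 2024-03-26 00:00:00
```

The value of doing this at the type layer is that every field of a structured LLM response gets the same "parse leniently, then validate strictly" treatment without per-field glue code.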
So, very cool. LaVague connects internet browsing to large language models: agent-like interactions with websites driven by an LLM. It continuously asks you what to do, or you can input prompts, and it will interact with the website the way you want. It's like sitting next to someone and instructing them how to do a certain task on a website, except the model interacts with the website for you. I think this is pretty cool and is probably one of the main building blocks of the autonomous agents people imagined even ten years ago: "make a trip for me to the Bahamas", and it navigates and books flights and so on. Being able to interact with a website like a human would is one of the core pieces of achieving that. Google Research releases Chain-of-Table, which is an iteration on chain-of-thought, in a way. The idea: if you have tabular data and want to infer something from it, sometimes it's not enough to just write an SQL query. For example, in this table on the left-hand side, you can see that, unfortunately, the country code is embedded in the name column, and therefore it's not as easy to build a query. So what Chain-of-Table does is iteratively construct additional columns for the table, computed from other columns: it will add columns, add headers, as you can see right here, do new columns. Essentially, it constructs intermediate tables on its way to the goal and can therefore give better answers. Rather than this being just about tables, I think the inference is that if you let, or guide, such a language model to do intermediate steps (we've already seen this with "think step by step", chain-of-thought and so on), it will more easily achieve its goal, simply because it requires less latent planning from the language model.
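A toy version of the Chain-of-Table move, derived columns instead of one-shot answers, might look like this. This is my own schematic, not Google's implementation; in the real method, the model itself chooses which transform to emit at each step.

```python
# A table where the information we need (country) is buried in another column.
rows = [
    {"name": "Alice (US)", "medals": 3},
    {"name": "Bo (CN)", "medals": 5},
]

def add_column(table, name, fn):
    """One transform step: derive a new column from each existing row."""
    return [{**row, name: fn(row)} for row in table]

# A step the model might emit: pull the country code out of "name".
step1 = add_column(rows, "country", lambda r: r["name"].split("(")[1].rstrip(")"))
print([r["country"] for r in step1])  # -> ['US', 'CN']
```

Once the intermediate table has a clean "country" column, the final question ("which country has the most medals?") becomes a trivial lookup, which is exactly the point: the chain of table transforms externalizes the planning the model would otherwise have to do latently.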
Just by prompting or training it to say things step by step, or to build extra columns in tables, you are guiding it in the strategy of how to achieve a goal. Thinking about how to think is probably one of the things these language models aren't good at by themselves, so giving them that structure, so they just need to fill in the content, is of much advantage. CNBC writes: "Alphabet shares up 4% on report Apple is in talks to license Gemini AI for iPhones." Now, someone said that someone said: according to a Bloomberg report, Apple is in talks with Alphabet, or Google, to let the iPhone maker license and build its Gemini AI engine into the iPhone, per "people familiar with the matter". You know how people say 4chan is just one website run by a guy called Anonymous? I'm pretty sure all the news organizations just have one phone number labeled "people familiar with the matter", and there's just one guy sending various tips to all of them and making stuff up. I don't know; supposedly these news orgs check the background of the people they get the info from, but they're also really eager to get the next story out and scoop everyone else. So every time I read "people familiar with the matter", I think of that meme on the internet: nobody knows you're a dog. Google has released a blog post titled "New ways we're tackling spammy, low-quality content on Search". It goes into the details of reducing spammy search results and low-quality links: "we do stuff in order to make stuff better", "we're tuning our ranking systems to reduce unhelpful, unoriginal content on Search and keep it at very low levels". Yeah, that's what a search engine is supposed to do. I don't know why they released this blog post; probably somebody got some money for successfully completing a project and then making some press out of it, but it's devoid of any information.
Stability introduces Stable Video 3D, which is based on Stable Video Diffusion. This is a model that can take a single picture and make a sort of orbital view around it. Very cool; as I said, the foray into more video-based models, text-to-video, image-to-video, all this kind of stuff, is excellent. And lastly, Cohere announces Cohere Embed v3, another instance of an embedding model. Cohere is going very much into embedding models, and obviously having Nils Reimers on board is a big asset there. The new thing is that they support int8 and binary embeddings, which lets you save a lot on the storage and memory required to hold these embeddings. Traditionally, if you do fp32 embeddings, then, well, they claim here you require like 2.8 TB of memory for a certain search quality, but with one-bit embeddings you are using, oh look at that, only 30 GB of memory, and you're getting better search quality. Now, I don't doubt Cohere's ability to train good embeddings, but some of these trade-offs are chosen very carefully. For example, here you have roughly 3,000-dimensional embeddings at float32; in practice, people would probably quantize that anyway and do some sort of dimensionality reduction, which would bring the memory needed down significantly, and we don't exactly know how much of a hit that would mean for search quality, or what keeping the one-bit embeddings at the same dimension would do, and so on. Carefully chosen comparisons; they're absolutely true from what I can tell, or at least I believe them. In any case, Cohere does release these things, as far as I know, for free for research, and if you want to use them commercially, you'll have to give them some money. All right, that was it. Monday's over. Keep hydrated, stay drinking, goodbye.
