OpenAssistant is Completed
11:49


Yannic Kilcher 24.10.2023 39,475 views 1,941 likes


Video description
#OpenAssistant LAION's OpenEmpathic: https://laion.ai/blog/open-empathic/ Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Contents (3 segments)

Segment 1 (00:00 - 05:00)

Hello there. Today I want to give you a bit of an update on what's going on with OpenAssistant. About 10 months ago we launched this initiative to reproduce ChatGPT in the open and to collect all the data necessary, and it's been a great 10 months so far. So where are we, and what's going on? I'm not really good at building up tension or anything like this, so I'll give you the TL;DR up front and then we'll go into the details. The TL;DR is: we're going to stop OpenAssistant. We're going to put a line under it and say we're done. Our mission is accomplished, we've done what we set out to do, and now it's time for other people to continue.

There are a bunch of reasons for this, and I know people will disagree with it. Some people will be upset; people wanted us to be the open AI, the actual "open AI", and so on. For us, it's really good where we are right now, and we don't want this to become a zombie project that just kind of carries on forever as a brand that at some point achieved something great. We want to be remembered for the thing we achieved, and to say: now it's time for someone else. We'll go into the individual reasons a little bit, but this is the short of the story.

Now, why have we decided on that? When we started, there was nothing. There was ChatGPT, and there was a paper from OpenAI that everyone thought could be how ChatGPT was created, but there was extremely little in terms of data, especially open data and models. There was no knowledge around of how to train these models well, and there was no infrastructure in place to do any of that. We came into that and, I want to say, we paved the way in many ways. We have built up a data collection infrastructure and collected data that will live forever. We have the most ethical dataset on the planet: every single data point was contributed with full consent and full intent by the contributors, and that's awesome. The fact that we have achieved that is absolutely mind-bogglingly amazing.

We've also trained models, for sure. We've also done some RL experiments, supervised fine-tuning, model mixing, and even mixing with synthetic data, trying out all of this stuff. Other people are going to train better models; they already have. People are going to train bigger models and better models, and people are going to make prettier chat interfaces than ours. So all of this was fleeting, was experimentation, which was cool, but other people are good at it as well. What only we could do was collect the data. So I'm very proud to say that we made the absolute best of the momentum that existed at that time. We tried not to waste a single ounce of human effort in the data collection, and I believe that's what we have achieved, and that's what we want to be remembered for.

What people don't realize is that efforts like this require a lot of work. We are so grateful for the contributors, but we're also extremely grateful for the volunteer human moderators who go after every flagged message, every cluster of downvotes, and so on. This is a giant amount of effort, and over time people move on. Not only the moderators move on; the programmers move on and the contributors move on. We just have to recognize that it was a special time when everyone was super excited and we could all make time, to some degree, to come together and achieve something that could only be achieved at that very moment. Now, frankly, we're overloaded. Many people are moving on, many people don't have time anymore, and to all the people who complain that things aren't moving fast enough: at this point, you don't have capacity either. So we'd rather say: okay, we've done our job. We've collected the data, that data can be used from now until forever, and we have various confirmations from big institutions that the data is really high quality. So that dataset will live forever, and we'd rather say that's our work product and leave it at that. Now we're obviously

Segment 2 (05:00 - 10:00)

going to release the data we've gotten so far, and we're also going to try to release a lot of the chat data, especially the chats with annotations, because we know people have put work into that and, again, we don't want to waste any human work that we don't need to waste. So we will release all of these things, hopefully soon. We'll have to do the cleaning and everything else we did for the first dataset. We're also going to send some goodies, hopefully, to the people who are high on the leaderboard, because these are the real heroes. To everyone who contributed code, data, moderation, and whatnot: we want to thank the top contributors a little bit. So if we can find you, if we can contact you through the data we have on the website, we'll reach out to you if you happen to be at the top of the leaderboard.

Also, our paper got accepted to the Datasets and Benchmarks track of NeurIPS, and that's also very cool. If you have contributed data and you want to be acknowledged, I invite you to go to the account section. Here you can click on your name, go to your account, and there is a page where you can register your name to be acknowledged. We'll keep that up to date in the coming time, so go there, tick the checkbox, and say what you want to be acknowledged as.

Lastly, we've turned off the chat interface. You might have seen that, and it's probably what most people noticed first: the chats don't work anymore. The chat and, by the way, also the data collection platform are distributed systems that are a nightmare to keep running. These are real systems that run on hardware that is faulty. People send all kinds of inputs, people bot the things, people abuse it, hardware fails, the backend workers on the GPUs fail, and so on, and all of this has to be kept up and running. Redis will overflow, all kinds of things; backups need to be made. This all requires active human effort, and as I said, we are overloaded and no longer have the capacity to do that.

The chat system is fully open source; you can use it. When we started there was nothing; now there is tons of stuff. There are hundreds of places where you can try out chat models, even our chat models: HuggingChat, for example, will give you the opportunity to chat with these things. So it's more accessible than ever. We don't feel it's necessarily our purpose to provide yet another chat interface, especially if we don't have the capacity to keep it up. On top of that, the interface was partially sponsored by corporate sponsors, but also partially maintained by contributors to the OpenAssistant project putting their personal money into it. It was really good to offer these models when they first came out, because people wanted to try them out, but now there's so much opportunity everywhere that it's not necessary anymore. At least, that is our feeling.

So we wanted to start a revolution, I think that's what our front page said, and we did. We got the ball rolling in the open source space, or at least we had a small hand in it. We collected data that no one else could collect, and that's available now. The ecosystem is sprawling and more alive than ever, and we want to hand it off to other people. In the open source space there are no longer just a few hackers and a few people doing things: there are giant tech corporations like Meta meddling in the open source space, and there are even universities from the Arab world that are backed by entire nation states. So trying to compete with that in the model training space is ridiculous, right?

As far as the data collection platform goes, as I said, this also requires a lot of human labor to keep it up, running, and moderated. We're currently looking into maybe giving it up for adoption or something like this, but we'd rather have people take the code and whatnot and pull up their own thing, rather, you know,

Segment 3 (10:00 - 11:00)

because we think there are so many things to do on that front. LAION has a few follow-up projects. One is called OpenEmpathic, which deals with data collection for empathic conversations. So if you want to contribute to something now that OpenAssistant is shutting down, that would be a very good place.

So, in total, I know we have to say it with one tear in the eye, but I feel: mission accomplished. We set out to collect the data necessary to train an open source ChatGPT, we set out to train these models, to make progress, and to push a revolution in open-source instruction models, and that's exactly what we've done. The space is more alive than ever, so we feel we've done our part, with the help of all of you, for which we're extremely thankful. I hope everyone can sort of pat themselves on the back for this. Let's do new things, let's do the next cool thing, let's keep pushing, and let's drive open source forward.

I hope this was at least a bit informative. If you're upset, I understand; I hope you can understand a bit from our side. I will be hosting an AMA, probably next week, on the YouTube channel for everyone to come and ask their questions about this and hopefully figure out how to go forward. That's it. Thank you for contributing, and I'll see you around. Bye-bye.
