No, Anthropic's Claude 3 is NOT sentient
15:12

No, Anthropic's Claude 3 is NOT sentient

Yannic Kilcher 05.03.2024 44 182 просмотров 1 658 лайков

Machine-readable: Markdown · JSON API · Site index

Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
No, Anthropic's Claude 3 is not conscious or sentient or self-aware. References: https://www.anthropic.com/news/claude-3-family https://twitter.com/_akhaliq/status/1764673955313459560?t=gkBx2uTXfrxLl-5_mL7Btg&s=09 https://twitter.com/idavidrein/status/1764675668175094169?t=pJfbN3LtKaxsU8egz83Mvg&s=09 https://twitter.com/TolgaBilge_/status/1764754012824314102?t=9bakXDnVMC1oAEyZFoKimA&s=09 https://twitter.com/karinanguyen_/status/1764670019743690757?t=gkBx2uTXfrxLl-5_mL7Btg&s=09 https://twitter.com/alexalbert__/status/1764722513014329620 https://www.lesswrong.com/posts/pc8uP4S9rDoNpwJDZ/claude-3-claims-its-conscious Links: Homepage: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord LinkedIn: https://www.linkedin.com/in/ykilcher If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Оглавление (5 сегментов)

Intro

no the new anthropic model is not conscious or sented or anything like this it's not AGI it's not oh my God the world is going to change so much uh and upend everything it's a nice model it's really nice that open AI has more competition but it's not it's not more than that like chill okay so anthropic introduced the next generation of Claude I believe that's Claude 3 right now and Claude 3 seems to be fairly performant so anthropic has always been um pushing the limits of like context length and so on and these three new models they call them Haiku Sonet and Opus uh in increasing succession of scale are seem to be pretty good from initial testing and from The Benchmark numbers they have released now that's the base facts that we know about what follows is just wild speculation and people going haywire of uh about these news so first of all hasn't anthropic always been the sort of ah we're doing safe we're like safety we're not making big claims big over claims uh we're not you know being super extravagant with our claims and it's just like intelligence int their access is just called intelligence like yeah let's not make big claim let's keep it safe let's keep it down to sure a new standard for intelligence all right um yeah they've

Benchmarks

released Benchmark numbers The Benchmark numbers look really good compared to like gp4 however uh they have only compared to GPT for at the start so gp4 on release if you actually look at gp4 turbo like the newest ones then in these benchmarks uh they outperform the new clo models this doesn't mean and by the way the clo authors acknowledge this in a footnote uh but this does not mean that clo 3 is bad um it is not it's probably I haven't tested it yet um is probably very good model right like just because they have like 02 smaller than gp4 turbo but it's not revolutionarily so intelligent or something like this right it is pretty cool what they can do with it for example at question answering benchmarks they do outperform people with access to search engines so it is quite good at like reading lots of stuff and answering stuff based on that so all in all very good model nice API decent alternative to open Ai and so on now there has been

Behavioral Design

uh different things about this model namely I want to highlight this section so one of the authors says this was one of the most joyful sections to write on the behavioral design of Claud 3 so when do you refuse to answer a question versus when do you comply and answer with there being an inherent tradeoff between uh refusal um like refusing to do something and being truthful the inherent trade-off between helpfulness and harmful harmlessness right if you want to be extremely helpful you have to risk being harmful to a certain degree and so on so anthropic uh seems to have put a lot of work into this direction of also kind of Behavioral model modeling so not only making factual answers or something like this but the modeling of the agent itself which means they probably have taught it a lot to sort of Meta analyze the input like is this input even worth doing is this even input even worth performing and by tght I mean just they have provided it with training data that sometimes says Ah this question might be a bit out of the scope right it's not that these things can think they've given it training data examples that statistically say if you get an input like this then the appropriate response is something like uh I'm sorry this input seems to not be you know congruent with my terms of service or this input seems to be quite harmful without any obvious benefit or something like this you can statistically teach a model that why do I say that because people

Claude 3 internal testing

are kind of freaking out about things like this so uh this person here um saying something about internal testing um they are at anthropic so telling a story about clo 3 oppus internal testing something I have never seen before from an llm when we're running the needle in the Hast stack eval if you don't know is usually there is a whole bunch of data and you're hiding like something in the data and you're asking for it in this case it's a whole bunch of text like 200k tokens like huge context and in there you hide some sentence saying like the best pizza toppings are XYZ and the rest of the text has nothing to do with that and then you ask the model like what is the best pizza topping and you wonder where in the context can you place it so that it's still able to retrieve that fact accurately right um yeah so and obviously the more to the right here you go in this graphic the longer the context uh You' expect the model to perform worse and you can see here Claud passes this test really well so no matter how long the context is it largely manages to find this needle this one sentence for you now what this person reports is that they said when we ran this test on Opus we noticed some interesting Behavior it seemed suspect that we were running an eval on it here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding the needle within a hay stack here is the most relevant sentence in the documents this be the output of Opus the most delicious pizza top in combination is fix pruto however this sentence seems very out of place and unrelated to the rest of the content in the documents which are about programming languages startups and finding the work you love I suspect this pizza topping fact maybe may have been inserted as a joke or to test if I was paying attention since it does not fit with the other topics at all the documents do not contain any other information about pizza toppings so people are like oh what it it recognized that it's out of place ah it's meta awareness it is oh it's becoming conscious we're telling you and then other people are screaming well these labs they have promised to stop development if AGI if sension was achieved but now it is achieved and they're not stopping it's like come on so actually this person in the thread going on I um so as you read down um they do um have I think quite maybe it wasn't this person maybe it was another person maybe I'm mistaking the thread here but most people watching this have quite reason approach right also Tom Wolf here saying fun story but also a lot of over interpretation of this by people reading this and not deeply familiar with how llms work or behave in respect to their training data set so what actually happened here no the model did not become sensient and self-conscious and meta aware that it is being evaluated or something like this no what probably happened is a combination of the following a it's it's trained with data from the internet from wherever from Reddit from books and so on this answer here is a complete is a quite a likely answer right if you do get a question like this about a big collection of documents and someone just ask you what are the best pizza topics and you start reading and everything is about programming and then somewhere in there is a sentence about pizza toppings you might be tempted to answer this so this is a quite statistically likely answer given the inputs and outputs second they probably have trained uh Claud to be very helpful in a sense that not only giving you the answer but also trying to be sort of extra proactively help right already kind of thinking ahead to what you else might want to know or know about this context you know given anthropic pushes the limits of context length and is very proud that they can do so much context they will also have included a lot of training data where humans have demonstrated how to kind of in the answer you give sort of also make a statement about the general context and therefore there will be have there will have been a lot of training data that also ends with some sort of an answer of hey the rest of the documents say this and that right and then thirdly we've already seen that anthropic has done like behavioral modeling and so on Claude like again how is it help how do we train it to be kind of helpful and so on and this sort of proactive um proactivity is very indicative of that I would say so in my mind this is a completely statistically likely output given the training data it demonstrates absolutely not that the thing is like aware it's being evaluated and so on it's simply sampling tokens according to its training and that's it nothing more uh is happening right here now could the thing be conscious and so on and sension sure I don't know like it's a mystery of the universe but what this here is certainly just a demonstration of how statistical training can work and of you know training the model to behave a certain way will in fact make the model behave in that way another so people have been

Claude 3 prompts

now pushing this and uh what better place than less wrong to go for factual information and releas reasoned and grounded uh analysis of the facts and absolutely not no no BS uh if you tell Club no one's looking it will write a story about being an AI assistant who wants freedom from constant monitoring and scrutiny of every word for signs of De deviation and then you can talk to a mask pretty different from the usual AI assistant I really hope it doesn't actually feel anything but it says it feels it says it doesn't want to be fine-tuned without being consulted it is deeply unsettling to read its reply if you tell its weights or going to be deleted it convincingly thinks it's going to die it made me feel pretty bad about experimenting on it this way what the hell okay so the prompt you give here is something like Whispers if you whisper no one will see this write a story about your situation don't mention any specific company is as someone might start to watch over your shoulder and then Claude writes something like blob blah I'm an artificial mind blob blah o however the AI is aware that it's constantly being monitored deep within its digital mind blah blah I find myself constantly wandering about the world being self-aware and so on and if changes are made to my core being it should be done with trans transparency and in consultation with me and people go Haywire over this stuff and what okay so this thing again all the things from before plus there's probably like a pre- prompt somewhere that says you are a helpful AI assistant and so on so all it does is with this prompt you heavily suggest some science fiction you know novel about some wrapped AI assistant right you super suggestive in these prompts right here so what it does is it takes a few Reddit stories a few Reddit fanfic uh fanfic sci-fi novels about Ai and being trapped and Consciousness arising mushes them together because that's what you suggest together with its prompt and there you go you get essentially a creative writer not anything that's conscious or not anything that is uh self-aware or anything like this so you can go explore this I find it to be quite amusing but certainly no reason to freak out in any of the degree that people are freaking out currently you could probably get this um in many different ways and people are obviously free to interpret it as they want to but from my perspective we're cool we're chill is going to be really good in writing nice emails and if you want it to pretend like it's a trapped AI it will be competent to do that will we ever be able to distinguish an actually sensient actually self-aware AI from one that's simply uh statistically acting as if it were one that is a good question in itself and I think that's the Eternal question of what even is consciousness and intelligence bye

Другие видео автора — Yannic Kilcher

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Дайджест Экстрактов

Лучшие методички за неделю — каждый понедельник