# What is GPT-3 and how does it work? | A Quick Review

## Metadata

- **Channel:** AssemblyAI
- **YouTube:** https://www.youtube.com/watch?v=xB6hZwYsV2c
- **Date:** 27.12.2021
- **Duration:** 4:59
- **Views:** 19,477
- **Source:** https://ekstraktznaniy.ru/video/13282

## Description

You probably have heard of GPT-3 and how it is a fascinating development. But have you learned why or how GPT-3 managed to impress so many people?

In this video, we will learn why GPT-3 is so unique, and how it manages to help bring in a new wave of excitement for AI. On top of this, we will also briefly look under the hood of GPT-3 to understand its architecture and some of its potential dangers.

Want to give AssemblyAI’s automatic speech-to-text transcription API a try? Get your free API token here 👇
https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_12

Apps made with GPT-3: https://gpt3demo.com/

B-roll credits:
Video by Julia M Cameron (https://www.pexels.com/@julia-m-cameron) from Pexels
Video by Jack Sparrow (https://www.pexels.com/@jack-sparrow) from Pexels

## Transcript

### Intro

If you're into deep learning, or artificial intelligence in general, it's very likely that you've heard of GPT-3. And even if you haven't heard of it, you might have used a tool that was built with GPT-3. But what is it, and how does it work? Let's learn that in this video. GPT-3 is a

### Overview [0:16]

language model that was developed by OpenAI. What's special about it is that it performs really well on multiple NLP tasks, and that it is very big: it is based on a very large transformer network, in fact the biggest dense network there is, with 175 billion parameters.

You might ask, what is a language model? A language model is basically a probabilistic model that can guess what the next word in a sentence should be. What makes GPT-3 unique is its lack of fine-tuning. You might say there are multiple models you can build that will guess the next word in a sentence, but that's not the only thing GPT-3 does. The network is trained only on the task of guessing what the next word should be, but in doing so it learns how the language works, and as a result it is able to perform many other NLP-related tasks. In this way it works somewhat like a human: once you know a language, you can translate it to another language, predict what the next word in a sentence should be, or fill in the blanks of a sentence.

The dominant approach to NLP tasks before GPT-3 was to fine-tune models for specific tasks, and on certain tasks state-of-the-art fine-tuned models still work very well, even better than GPT-3. But GPT-3 performs much better when it comes to machine translation, filling in the blanks, and question answering. And this was a very
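The "probabilistic model that guesses the next word" idea can be sketched with a toy bigram model: count which word follows which, then turn the counts into probabilities. This is only an illustration of the concept; GPT-3 does this with a 175-billion-parameter neural network over ~300 billion tokens, not with counts, and the corpus below is made up.

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; GPT-3 learns from internet and book text instead.
corpus = "the cat sat on the mat and the cat ran".split()

# Count how often each word follows each preceding word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(prev):
    """Return P(next word | previous word) estimated from the counts."""
    counts = follows[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))  # "cat" comes out as the most likely next word
```

Generating text then amounts to repeatedly sampling (or taking the most likely) next word and feeding it back in, which is exactly how GPT-3 produces its completions, just with a far richer notion of context than one previous word.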

### How does it work [1:44]

big deal in the world of artificial intelligence, because it basically proved that a very big unsupervised model can match or even surpass the performance of fine-tuned models.

All right, so now let's look into how GPT-3 works and how it learns. The architecture is very similar to the transformer architecture we covered before; if you don't remember it, or if you haven't watched that video, go ahead and check the video we made on transformers to learn more about how they work and what their architecture is. The main difference between the transformer architecture and the GPT-3 architecture (or the general GPT architecture OpenAI came out with) is which part of the transformer it uses. If you remember, a transformer has an encoder and a decoder, whereas GPT-3 uses only decoder blocks.

Another difference is that in the transformer architecture, the decoders contain a masked self-attention layer, an encoder-decoder attention layer, and a feed-forward neural network, with layer normalizations in between. With GPT-3, however, the decoder blocks contain only a masked self-attention layer and the feed-forward neural network layer; basically, they got rid of the encoder-decoder attention layer. On top of this, different placements of the layer normalization inside the decoder block were tried with the GPT architecture, and with GPT-3 they also introduced alternating dense and sparse self-attention layers.

To create GPT-3, these stacked decoder layers were trained on 300 billion tokens, each token being either a word or a part of a word, with the data collected from the internet and also from books. A very strong model like this of course comes with drawbacks, the main ones being unwanted bias against minority groups present in the collected data, the environmental impact of training a model as big as GPT-3, and the potential abuse of the system to create fake articles or fake news. But even though there are
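The decoder block described above — masked self-attention followed by a feed-forward network — can be sketched in NumPy. This is a heavily simplified single-head illustration: the learned query/key/value projections and the layer normalizations are omitted, and the weight shapes below are made up for the example. The key point it shows is the causal mask, which lets each position attend only to itself and earlier positions, so the model can be trained purely on next-token prediction.

```python
import numpy as np

def masked_self_attention(x):
    """Single-head masked self-attention (learned projections omitted):
    each position may only attend to itself and earlier positions."""
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)                       # raw attention scores
    future = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores[future] = -np.inf                              # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the past
    return weights @ x

def feed_forward(x, w1, w2):
    """Position-wise feed-forward network with a ReLU nonlinearity."""
    return np.maximum(x @ w1, 0) @ w2

def decoder_block(x, w1, w2):
    """GPT-style decoder block: masked self-attention, then feed-forward,
    each wrapped in a residual connection (layer norm left out for brevity)."""
    x = x + masked_self_attention(x)
    return x + feed_forward(x, w1, w2)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))        # 4 token positions, model dimension 8
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 8))
print(decoder_block(tokens, w1, w2).shape)  # (4, 8): one vector per position
```

Because of the mask, changing a later token never changes the output at an earlier position, which is exactly what makes training on "guess the next word" possible. In the transformer's original decoder there would be an extra encoder-decoder attention layer between the two sublayers; GPT-style models drop it, and GPT-3 stacks 96 such blocks.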

### Conclusion [3:46]

drawbacks with models like these, they also still benefit humanity. GPT-3 has already been used as the base of tools and companies that help people in their everyday lives, for example in creative writing, code generation, content creation, and customer service. There is a link in the description to a web page with a comprehensive list of the tools and companies built on GPT-3 and their different domains. Microsoft holds an exclusive license to the core of GPT-3, but if you want to use it to develop apps, you now have public access to it through OpenAI's API; of course, when you're creating apps, you have to abide by their ground rules.

Overall, this is a very exciting development for the world of AI, and for the world generally. If you want to learn more about the latest developments in AI and the techniques used in these technologies, don't forget to subscribe to our channel so you can be one of the first to know when we release a new video. Don't forget to give this video a like if you liked it, and leave a comment with your thoughts or questions; we would be delighted to see them. But for now, have a nice day, and I'll see you around.
