# Recurrent Neural Network Writes Sentences About Images | Two Minute Papers #23

## Метаданные

- **Канал:** Two Minute Papers
- **YouTube:** https://www.youtube.com/watch?v=e-WB4lfg30M
- **Дата:** 07.11.2015
- **Длительность:** 2:41
- **Просмотры:** 26,662
- **Источник:** https://ekstraktznaniy.ru/video/14924

## Описание

This technique is a combination of two powerful machine learning algorithms:
- convolutional neural networks are excellent at image classification, i.e., finding out what is seen on an input image,
- recurrent neural networks that are capable of processing a sequence of inputs and outputs, therefore it can create sentences of what is seen on the image.

Combining these two techniques makes it possible for a computer to describe in a sentence what is seen on an input image. 

_____________________

The paper "Deep Visual-Semantic Alignments for Generating Image Descriptions" is available here:
http://cs.stanford.edu/people/karpathy/deepimagesent/

A gallery with more results with the same algorithm:
http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/

You can train your own convolutional neural network here:
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html

The source code for the project is now available here:
https://github.com/karpathy/neuraltalk2



## Транскрипт

### Segment 1 (00:00 - 02:00) []

dear fellow Scholars this is two-minute papers with Caro neural networks can be used to learn a variety of things for instance to classify images which means that we'd like to find out what breed the dog is that we see on the image this work uses a combination of two techniques a neural network variant that is more adapted to the visual mechanisms of humans and is therefore very suitable for processing and classifying images this variant we call a convolutional neural network here's is a great web application where you can interactively train your own network and see how it improves at recognizing different things this is a data set where the algorithm tries to guess which class these smudgy images are from if trained for long enough it can achieve a classification accuracy of around 80% the current state-of-the-art in research is about 90% which is just 4% off of humans who have performed the same classification this is already Insanity we could be done right here but let's put this on steroids as you remember from an earlier episode sentences are not one thing but they are a sequence of words therefore they can be created by recurrent neural networks now I hope you see where this is going we have images as an input and sentences as an output this means that we have an algorithm that is able to look at any image and summarize what is being seen on the image buckle up because you're going to see some Wicked results it can not only recognize the construction worker it knows that he's in a safety vest and is currently working on the road it can also recognize that a man is in the act of throwing a ball a black and white dog jumps over a bar it is not at all trivial for an algorithm to know what over and under means because it is only looking at a 2D image that is the representation of the 3D world around us and there are of course hilarious failure cases well a baseball bat well close enough there's a very entertaining web demo with the algorithm and all kinds of goodies that are linked in the description box check them out the bottom line is that what we thought was science fiction 5 years ago is now reality in machine learning research and based on how fast this field is advancing we know that we're still only scratching the surface thanks for watching and I'll see you next time oh and before you go you can now be a part of two minute papers and support the series on patreon a video with more details is coming soon until then just click on the link on the screen if you're interested thank you
