# Intro to Machine Learning: Images and Text

## Metadata

- **Channel:** Orange Data Mining
- **YouTube:** https://www.youtube.com/watch?v=PL3X92ffnJ0
- **Source:** https://ekstraktznaniy.ru/video/29390

## Transcript

### Segment 1 (00:00 - 05:00)

Do you remember the pictures of the Tibetan dogs from our first video in the Introduction to Machine Learning series?

Chapter 1: Images

Here they are! Aren't they adorable? Here you can see a Spaniel, a Terrier, and a large Mastiff. Among the Tibetan breeds, I also have Pekingese - just for comparison - as they are often mistaken for Spaniels, or perhaps it's the other way round. I wonder if machine learning, like humans, makes this mistake as well?

To work with these images, we first need to represent them with numbers. But not just any numbers - we need numbers that capture the semantics, that reflect what's actually in the image. Fortunately, machine learning researchers have developed deep neural networks, very large and complex models trained on millions of images. We can use these pre-trained networks to represent dog images as numbers, and then use these representations for tasks like classification and clustering. Let me show you how.

Turning images into numbers is officially called embedding - specifically, embedding in a vector space. Here I'm using a large convolutional neural network called Inception version 3, developed by Google some time ago. Let's check the output. I now have a data table with dogs in rows and their numerical representation spread over 2,000 columns. That's a lot of numbers! But at this stage, I don't have to worry about what these numbers mean. What's important is that I've turned the images into data tables, and now I can apply machine learning to them, just as I've done in all my previous videos.

For clustering, remember that the first step is to estimate the pairwise distances between the images. Here's the result of clustering. All terriers are grouped together in one cluster. However, the other clusters have a mix of breeds. At the bottom, there's a cluster of Pekingese and Spaniels. Let's have a look at their images. Well, no surprise - they look very similar!
Remember t-SNE, the dimensionality reduction technique that maps high-dimensional data into two dimensions? Here's a t-SNE map of the Tibetan dogs. In this visualization, each image in the dataset is represented by a dot. Notice how the Mastiffs form their own distinctive cluster of red dots? The terriers are also clustered together. But look, the Spaniels and Pekingese are mixed again, and so are some Lhasa Apsos and Shih Tzus. Let's have a look.

I can use this dataset to train a classifier, and then apply the resulting model to predict the breed of dogs from a new set of images. First, let me load the images and inspect them. I've got a long-haired dog, two dogs whose breeds I've tried to guess myself, and a recent photo of my colleague. I'll also need to embed these images in the vector space. Then I'll use logistic regression to develop a classifier based on our previous dog dataset, where the dogs are already labeled with their breeds. The model goes into the crystal ball, along with my new images - or rather, their vector representations.

Here are the results. I correctly guessed the breeds of the first two dogs: they are indeed a Shih Tzu and a Terrier. The long-haired one is a Pekingese. And my colleague is holding Beia, a Tibetan Spaniel. However, our machine learning model is a little uncertain, because it gave Beia a 25% chance of being a Pekingese. Which she's not - and I'm sure Beia would agree! Cute, right?

We can use machine learning on all kinds of images. Consider images of plants for identifying species, medical images for diagnosing diseases, or even satellite images for tracking environmental changes. The possibilities are endless! And the same goes for text. As with images, we need to convert text into numbers. Once again, we'll rely on pre-trained neural networks, in this case language representation models, to handle the transformation.

Chapter 2: Text

Consider a dataset of articles published by the Guardian website in 2023 - nearly 4,000 in total.
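The dog-breed classification step described above can be sketched the same way: fit logistic regression on labelled embedding vectors, then ask for class probabilities on a new image's embedding. The breed names come from the video, but the vectors below are random stand-ins, not real Inception v3 output:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-ins for labelled dog embeddings: two breeds,
# 20 example vectors each, 64 dimensions for brevity.
X_train = np.vstack([
    rng.normal(loc=1.0, scale=0.2, size=(20, 64)),   # "Spaniel"-like vectors
    rng.normal(loc=-1.0, scale=0.2, size=(20, 64)),  # "Terrier"-like vectors
])
y_train = ["Spaniel"] * 20 + ["Terrier"] * 20

# The classifier that goes "into the crystal ball".
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A new image's embedding vector -> probability for each breed.
x_new = rng.normal(loc=1.0, scale=0.2, size=(1, 64))
probabilities = clf.predict_proba(x_new)[0]
for breed, p in zip(clf.classes_, probabilities):
    print(f"{breed}: {p:.2f}")
```

The `predict_proba` output is what lets the model express the kind of uncertainty seen with Beia - a 25% chance of one breed alongside a higher probability for another.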
These articles include, for example, those on asylum seekers, Prime Minister Sunak's helicopter trips, and microplastics. As with images

### Segment 2 (05:00 - 06:00)

we need to turn these articles into numbers in order to proceed with machine learning. For embedding, we'll use sBERT, an open text representation model. Here's the output: each article is now represented by 385 numbers.

Let's use these embeddings to create an article map. Here's a t-SNE map. Notice how sports has its own cluster, while politics sometimes mixes with education, environment, and business.

Now, let's find articles related to specific topics of interest. I've created a small corpus of sentences about relaxing summer holidays in nature, cycling, and technology. I'll embed this corpus in the vector space, compare the profiles with those of the articles in *The Guardian*, and find the nearest neighbors in the vector space. Here are the results: articles about "relaxing summer" include ones about spa retreats, idyllic campsites, and summer travel apps. Those about cycling report on cycling heroes like Remco Evenepoel and Matej Mohorič. The Guardian also seems to have written a lot about AI!

There are so many things we can do with images, text, and all kinds of other data. The machine-learning community has recently developed foundation models for sounds, music, medical images, chemical structures, and just about anything else. These models allow us to turn anything into numbers. And, as we now know, these numbers are the starting points for machine learning - whether it's clustering, building predictive models, or other methods that help us understand the world around us and create useful things.
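The nearest-neighbor search described above boils down to cosine similarity between a query embedding and every article embedding. This sketch uses random 384-dimensional vectors as stand-ins for real sBERT output (the article index is illustrative), so only the search mechanics are real:

```python
import numpy as np

def top_k_neighbours(query, corpus, k=3):
    # Cosine similarity between one query vector and every corpus row,
    # then the indices of the k most similar rows, best first.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    similarities = c @ q
    return np.argsort(similarities)[::-1][:k]

rng = np.random.default_rng(2)

# Stand-ins for sentence-embedding vectors of 100 articles
# (384 dimensions is typical of small sentence-transformer models).
articles = rng.normal(size=(100, 384))

# A "query sentence" embedding placed deliberately close to article 42.
query = articles[42] + rng.normal(scale=0.01, size=384)

nearest = top_k_neighbours(query, articles, k=3)
print(nearest)
```

With real sBERT vectors, the query would be the embedded topic sentence ("relaxing summer holidays in nature", say) and the top-k indices would point at the matching Guardian articles.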
