How Top 1% Data Scientists Are Using AI
10:54

Thu Vu · 05.05.2026 · 4,312 views · 222 likes

Video description
Recommended AI courses to help you level up as top 1% data professionals:
DataCamp AI Engineer for Data Scientists Track - https://datacamp.pxf.io/7XJOYg?sharedID=yout_int_aieng-track19_apr26
DataCamp AI Engineer for Developers Track - https://datacamp.pxf.io/ZVyKgX?sharedID=yout_int_aieng-track19_apr26
📩 Get my FREE weekly AI & data insights 👉 https://thu-vu.ck.page/49c5ee08f6
🌟 Master Python and Build Awesome AI Projects 👉 https://python-course-earlybird.framer.website/

🔑 TIMESTAMPS
0:00 - How DS is changing
1:07 - PandasAI for EDA
2:48 - Data Formulator for EDA
4:01 - AI Engineer Tracks (DataCamp)
5:00 - Creating synthetic data
5:57 - Using foundation models
7:25 - Automating workflows with MCP
8:34 - Building with AI
10:12 - Conclusions - What really matters longterm

#datacience #ai #ThuVu

Table of contents (9 segments)

How DS is changing

Data science jobs are not dead, but they are changing fast, and the top 1% of data scientists are the ones who know how to adapt. Data scientist is the top job title referencing generative AI. That means companies don't just need data talent; they need data talent that knows AI. So what does "knowing AI" actually mean? Well, I found that it's actually twofold. First, you know how to leverage AI to speed up your existing workflow. Not just by prompting ChatGPT to write Python code, but by rethinking the way you get things done, from generating or collecting data to exploratory data analysis, modeling, reporting, and communication. Second, you know how to build AI systems: for example, creating and deploying tools and agents that automate part or all of a process or workflow for your team.

Now, let's first talk about different ways to leverage AI to speed up a data science workflow. I'll talk about a few specific tools in this video, but I want to be clear that the tools aren't the point, because tools can change. What matters is knowing what's now possible so you can think about how to adapt and rethink the way you're

PandasAI for EDA

working. So, the first major way data scientists can leverage AI is in EDA and data preprocessing. If you've worked on a real data science project before, you've probably already experienced this: too much time spent preparing data, not enough time analyzing it. You got the job thinking you'd spend all day finding patterns in data and coming up with hypotheses and interesting analyses. In reality, you spend 60 to 70% of your time cleaning data and writing repetitive code to check data types, handle missing values, or create standard visualizations. Here are two open-source tools I've come across that can make your work a little bit easier and more fun.

The first one is PandasAI. It's a Python library that basically lets you ask questions of your data frame in natural language. For example: what's the total sales per coffee type? Depending on your question, it can return different kinds of responses, like a string, a data frame, a chart, or a number. It's totally open source and you can hook it up to any LLM you like. I'm connecting PandasAI to an LLM from OpenAI here.

Now, if you're wondering whether your whole data set gets sent to the LLM, the answer is no. When you ask PandasAI a question, it sends just enough context: your question along with the column names, a few sample rows, and some metadata. The LLM reads that context and writes a small Python script to answer your question. PandasAI then checks that the script is safe to run and runs it locally on your actual data using pandas. So if you want to get away from writing pandas and Matplotlib code yourself, you can use this library for quick analysis, data cleaning, and data manipulation. But for more complex tasks, like joining data frames or creating a correlation matrix, you should still use pandas. The second
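
The "just enough context" idea from this segment (the question plus column names, a few sample rows, and some metadata) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not PandasAI's actual internals or API: `build_llm_context` and its field names are hypothetical, and plain Python records stand in for a data frame so the sketch has no dependencies.

```python
import json


def build_llm_context(question, rows, n_samples=3):
    """Bundle the question with schema info and a few sample rows.

    Hypothetical helper for illustration; not PandasAI's actual API.
    """
    columns = list(rows[0].keys()) if rows else []
    # Infer a simple type name per column from the first row.
    dtypes = {col: type(rows[0][col]).__name__ for col in columns}
    return {
        "question": question,
        "columns": columns,
        "dtypes": dtypes,
        "row_count": len(rows),
        # Only the first few rows are included, not the full data set.
        "sample_rows": rows[:n_samples],
    }


# Plain dict records stand in for a data frame to keep the sketch dependency-free.
sales = [
    {"coffee_type": "latte", "units": 3, "price": 4.5},
    {"coffee_type": "espresso", "units": 5, "price": 3.0},
    {"coffee_type": "latte", "units": 2, "price": 4.5},
    {"coffee_type": "mocha", "units": 1, "price": 5.0},
    {"coffee_type": "espresso", "units": 4, "price": 3.0},
]

context = build_llm_context("What's the total sales per coffee type?", sales)
print(json.dumps(context, indent=2))
```

The point of the sketch is the shape of the payload: the model sees the schema and a handful of rows, while the generated script would later run locally against all rows.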

Data Formulator for EDA

tool that I think is pretty cool is Data Formulator. This one comes out of Microsoft Research, and the easiest way to think about it is as a combination of Excel formulas, Power Query, and modern AI. It also lets you interact with your data in natural language. But the nice part is that the tool shows you exactly how the data is being transformed and visualized, step by step. Nothing is hidden, which means the workflow stays transparent and reusable. Give it a broad prompt like "show me interesting trends in this data set" and it will start transforming the data, generating visualizations, and suggesting directions you can explore further. This tool is also open source and you can install it locally. So if you happen to spend a lot of time in the Microsoft ecosystem, with Excel, Power Query, and Power BI, it's definitely worth experimenting with this tool. Now, with these kinds of tools, you still need to understand your data. But they can drastically speed up the exploration phase, where you're just trying to understand what you're working with before you get into building the actual pipeline and the real modeling work. Now, as a top 1% data scientist, you don't just limit yourself to tools that are already available. You also know how to build AI tools and systems to support

AI Engineer Tracks (DataCamp)

your team. So, I want to give a quick shout-out to DataCamp, who sponsors this part of the video. I used DataCamp back when I was learning Python for work. What I really like is that it's hands-on. You actually write code, solve exercises, and build projects right in your browser. They're offering some solid tracks for data scientists to level up their skills. The first one I'd recommend is the Associate AI Engineer for Data Scientists track, which covers scikit-learn, PyTorch, Hugging Face, building apps with LangChain, and MLOps basics to help you take models into production. If you're more on the application-building side, their Associate AI Engineer for Developers track covers the more practical stack: the OpenAI API, prompt engineering, embeddings, Pinecone, and LangChain. They also offer an AI Engineer for Data Scientists certification if you want a credential you can show on LinkedIn. If you want to level up your skills in AI with these tracks, check the links in the description. Okay. The second way data scientists can leverage AI is creating synthetic data sets for

Creating synthetic data

prototyping. Getting access to real, high-quality data is a problem every data scientist has faced. Perhaps the data is locked behind privacy constraints, doesn't exist yet, or you just want to test an idea quickly. So instead of waiting, many data scientists use AI to create synthetic data sets, whether structured or unstructured. LLMs have become so good that they can generate artificial data that mimics real-world patterns. In fact, synthetic data is widely used in privacy-sensitive industries like healthcare and finance, where real data can't always be shared freely. For example, I used Claude to generate this whole e-commerce data set of 5,000 rows with customers, products, and order items. You just describe what you need, the structure, and the number of rows, and it gives you something you can actually work with in minutes. Similarly, you can also generate unstructured text data for testing a model training pipeline. Now, here's
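
"Describe what you need, the structure, and the number of rows" can be made concrete with a small sketch like the one below. It only builds the text of the request and checks a result for broken references; it does not call any model. The table and column names, and both helper functions, are assumptions made up for illustration.

```python
def build_synthetic_data_prompt(tables, n_rows):
    """Turn a schema description into a plain-text request for an LLM."""
    lines = [
        f"Generate a synthetic e-commerce data set with about {n_rows} rows in total.",
        "Return one CSV per table with these columns:",
    ]
    for table, columns in tables.items():
        lines.append(f"- {table}: {', '.join(columns)}")
    lines.append("Every foreign key must reference an existing row.")
    return "\n".join(lines)


def find_orphans(child_rows, key, parent_ids):
    """Return child rows whose foreign key has no matching parent."""
    return [row for row in child_rows if row[key] not in parent_ids]


# Hypothetical schema; adjust the tables and columns to your own use case.
schema = {
    "customers": ["customer_id", "name", "country"],
    "products": ["product_id", "name", "price"],
    "order_items": ["order_id", "customer_id", "product_id", "quantity"],
}
prompt = build_synthetic_data_prompt(schema, n_rows=5000)
print(prompt)

# Sanity-check whatever comes back before using it (tiny hand-written sample here).
customer_ids = {1, 2}
order_items = [
    {"order_id": 10, "customer_id": 1, "product_id": 7, "quantity": 2},
    {"order_id": 11, "customer_id": 3, "product_id": 7, "quantity": 1},
]
orphans = find_orphans(order_items, "customer_id", customer_ids)
print(orphans)
```

A check like `find_orphans` is worth running on generated tables, since a model can produce order items that point at customers or products it never created.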

Using foundation models

another interesting way you can use AI that a lot of data scientists are sleeping on: using foundation models. Foundation models are models trained on massive and varied data sets. They are different from traditional machine learning models, which are custom-built for specific tasks. Today, foundation models are no longer just for chatbots and image generation. They're starting to show up in core data science work like time series forecasting, tabular prediction, and even recommendation systems. Let's say you're building a time series forecasting model. You usually collect your data, clean it, engineer your features, then pick a model, train it, fine-tune it, and evaluate it. Now the problem is, if your company wants forecasts across 200 product lines, you would normally have to repeat that process over and over. But nowadays you can use foundation models like Chronos by Amazon and TimeGPT. So here's an example where I tried one of those models. You just feed in your data and get a forecast. No feature engineering, no model selection, no training loop. This approach is really changing the whole modeling workflow. Netflix recently replaced a whole stack of separate recommendation models with one foundation model trained on billions of user interactions, and now every team just fine-tunes on top of it. The data scientists pulling ahead right now are the ones who know when to build from scratch and when to build on top of something that already works. Next
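
The workflow change described in this segment, one shared pretrained model applied to every product line instead of a separate train-and-tune cycle per series, has roughly the following shape. This sketch does not use Chronos or TimeGPT: `placeholder_model` is a deliberately trivial stand-in (it repeats the mean of the last two points) so that the loop is runnable, and all names here are assumptions.

```python
def placeholder_model(history, horizon):
    """Stand-in for a pretrained forecaster: repeats the mean of the last two points."""
    recent = history[-2:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def forecast_all(series_by_product, model, horizon):
    """Apply one shared model to every series; no per-series training step."""
    return {
        product: model(history, horizon)
        for product, history in series_by_product.items()
    }


# Hypothetical weekly sales for two product lines.
weekly_sales = {
    "product_a": [10, 12, 11, 13],
    "product_b": [5, 5, 6, 6],
}
forecasts = forecast_all(weekly_sales, placeholder_model, horizon=3)
print(forecasts)
```

With a real foundation model, `placeholder_model` would be replaced by a call into that model's own prediction interface; the surrounding loop is the part that stays the same across 2 or 200 product lines.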

Automating workflows with MCP

let's talk about how a data scientist can automate workflows with MCP, the Model Context Protocol. Think about your typical day. You're jumping between five different apps. You're querying databases, pushing code to GitHub, sharing updates on Slack, and pulling some data that your colleague shared on Google Drive. That constant back and forth burns your energy. The idea of MCP is that it lets you connect all of those tools to an AI assistant like Claude. So instead of switching between apps, you're doing everything from one place. For example, I connected Claude Desktop to my PostgreSQL database through MCP, and now instead of opening a SQL client, writing a query, running it, and copying the results, I just ask Claude, for example, "show me the top 10 customers by revenue this month," and I get my answer right there. Now, a quick heads-up: not every employer allows tools like Claude Desktop or Claude Code, due to security policies. But even if you can't use it at work yet, I think it's useful to explore it personally, because sooner or later these kinds of integrated tools will become the standard, since they remove a lot of friction from day-to-day work. Now
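
For a request like "show me the top 10 customers by revenue this month," the assistant ultimately has to run a query along the lines of the one below. This is only a sketch of that query, not of MCP itself: an in-memory SQLite database stands in for PostgreSQL, the `orders` table and its columns are hypothetical, and "this month" is pinned to fixed dates so the output is reproducible.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 120.0, "2026-05-02"),
        ("bob", 80.0, "2026-05-03"),
        ("alice", 30.0, "2026-05-10"),
        ("carol", 200.0, "2026-04-28"),
        ("bob", 90.0, "2026-05-11"),
    ],
)

TOP_CUSTOMERS_SQL = """
SELECT customer, SUM(amount) AS revenue
FROM orders
WHERE order_date >= ? AND order_date < ?
GROUP BY customer
ORDER BY revenue DESC
LIMIT 10
"""

# "This month" is pinned to May 2026 so the result is reproducible.
top_customers = conn.execute(
    TOP_CUSTOMERS_SQL, ("2026-05-01", "2026-06-01")
).fetchall()
print(top_customers)
```

Knowing what the generated SQL should look like is still useful when an assistant writes it for you, because it lets you spot a wrong date window or a missing aggregation in the answer.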

Building with AI

beyond just using AI tools to speed up your work, the data scientists who are really standing out right now are the ones who can actually build things with AI: systems that solve real business problems for their teams and companies. From my conversations with my data scientist friends, their jobs are less and less about prototyping a model in a notebook and more and more about delivering something people can actually use. For example, imagine your finance team processes hundreds of invoices every week manually. You could build an AI system that reads those invoices, extracts the key information, and saves it straight to your database. Or say your company wants to understand patterns across thousands of customer interactions or research documents. You could build a knowledge graph that maps out how everything connects. That way, you uncover insights that are impossible to find with traditional analysis methods. I also have a tutorial on this, which I'll link here somewhere on the screen. These are the kinds of projects that make you invaluable at a company these days.

Also, there's a real gap that most data scientists need to close right now, myself included. A lot of data scientists can build a great prototype, but they can't get it into production. They can't deploy it, monitor it, or make it reliable enough for other people to depend on. To close that gap, you need to start learning things like how to containerize your application with Docker so it runs anywhere, how to deploy and serve your models on cloud platforms, and how to monitor and maintain AI systems once they are live. These are more engineering skills, AI engineering specifically, but they're becoming essential for data scientists as well if you want to help your team build AI systems. Now, we've
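
The invoice example in this segment (read the invoice, extract the key fields, save them to the database) usually needs a validation step between the model and the database. The sketch below shows only that step and the save, under stated assumptions: the model call is replaced by a hard-coded JSON string, the field names are hypothetical, and SQLite stands in for whatever database the team actually uses.

```python
import json
import sqlite3

# Hypothetical required fields and their expected types.
REQUIRED_FIELDS = {"invoice_number": str, "vendor": str, "total": float}


def parse_invoice(raw_json):
    """Validate the model's JSON output before it touches the database."""
    data = json.loads(raw_json)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field}")
    return {field: data[field] for field in REQUIRED_FIELDS}


def save_invoice(conn, invoice):
    """Insert one validated invoice row."""
    conn.execute(
        "INSERT INTO invoices (invoice_number, vendor, total) VALUES (?, ?, ?)",
        (invoice["invoice_number"], invoice["vendor"], invoice["total"]),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_number TEXT, vendor TEXT, total REAL)")

# Hard-coded stand-in for what an extraction model might return for one invoice.
model_output = '{"invoice_number": "INV-001", "vendor": "Acme", "total": 1250.5}'
save_invoice(conn, parse_invoice(model_output))
saved = conn.execute("SELECT * FROM invoices").fetchall()
print(saved)
```

Rejecting malformed output at this boundary is one small piece of the reliability work mentioned above; it keeps a bad extraction from silently ending up in the finance team's records.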

Conclusions - What really matters longterm

talked about a lot of different tools and hard skills, but I think that in the end, a data professional's work is increasingly about judgment and influence: understanding how the business works, how your company makes money, domain knowledge, stakeholder management, trust building, communication, and data storytelling. In the long term, these soft skills are becoming the core of what you do. Right now, I think one thing that's going to pay dividends for you is to stay open to trying out new tools: learning them, using them, questioning them, and adapting them to your own needs. I run a free newsletter where I share my latest insights and experiments in data science and AI. So, if you're interested, check it out in the description below. Thank you for watching. Bye-bye.
