Master Data Analysis with ChatGPT (in just 12 minutes)
11:54


Jeff Su · 15.07.2025 · 329,191 views · 9,982 likes · updated 18.02.2026
Video description
➡️ Coursera Data Analysis course (40% off for 3 months): https://imp.i384100.net/c/2464514/3102764/14726

Learn how to analyze any dataset in minutes using #ChatGPT and the proven DIG framework. This practical guide shows you how to turn ChatGPT into your personal data analyst without any technical skills required. Perfect for professionals who work with spreadsheets but lack formal data analysis training!

*TIMESTAMPS*
00:00 ChatGPT for Data Analysis
00:45 The DIG Data Analysis Framework
01:49 Step 1: Description
05:31 Step 2: Introspection
09:16 Step 3: Goal Setting
10:55 Bonus Prompt

*RESOURCES MENTIONED*
DIG Framework prompts: https://jeffsu.notion.site/184-data-analysis-resources
Apple TV+ sample dataset: https://jeffsu.notion.site/184-data-analysis-resources
Newsletter: https://www.jeffsu.org/newsletter/?utm_source=youtube&utm_medium=video&utm_campaign=184
ChatGPT Pro Tips video: https://youtu.be/p3840QxlYzc

*BUILD A POWERFUL WORKFLOW*
📈 The Workspace Academy - https://academy.jeffsu.org/workspace-academy?utm_source=youtube&utm_medium=video&utm_campaign=184
✍️ My Notion Command Center - https://www.pressplay.cc/link/s/DE1C4C50

*BE MY FRIEND:*
📧 Subscribe to my newsletter - https://www.jeffsu.org/newsletter/?utm_source=youtube&utm_medium=video&utm_campaign=description
📸 Instagram - https://instagram.com/j.sushie
🤝 LinkedIn - https://www.linkedin.com/in/jsu05/

*MY FAVORITE GEAR*
🎬 My YouTube Gear - https://www.jeffsu.org/yt-gear/
🎒 Everyday Carry - https://www.jeffsu.org/my-edc/

#dataanalysis

Table of contents (6 segments)

  1. 0:00 ChatGPT for Data Analysis (148 words)
  2. 0:45 The DIG Data Analysis Framework (210 words)
  3. 1:49 Step 1: Description (689 words)
  4. 5:31 Step 2: Introspection (710 words)
  5. 9:16 Step 3: Goal Setting (336 words)
  6. 10:55 Bonus Prompt (178 words)
0:00

ChatGPT for Data Analysis

Let's get straight to the point. All of us work with data regardless of our role, but very few of us were taught how to analyze data in a structured way. So in this video, I'll bridge that gap by sharing a simple three-step framework that basically turns ChatGPT into our personal data analyst, with zero technical skills required. Let's get started. First, a bit of context. In all my past roles (management consultant, account manager, product marketing manager), I've had to work with data and present findings. Like most of you, though, I've never received any formal training in data analysis. But I knew AI could help. So after taking the top AI-for-data-analysis course on Coursera, I learned that the key is giving ChatGPT a proven framework to follow and having the AI perform all the hard analytical tasks on our behalf.
0:45

The DIG Data Analysis Framework

Diving right in: the framework I learned is called DIG, for description, introspection, and goal setting. In a nutshell, by using ChatGPT to apply the DIG framework to any dataset, we're able to, number one, understand data we've never seen before in a matter of minutes instead of hours, and number two, extract insights that we as non-data-analysts would have missed. Here's a simple visualization. When you get handed a spreadsheet with no context, you're at 0% understanding. But with every DIG prompt you input into ChatGPT, your understanding of the data increases. And by the end of the DIG framework, you've uncovered insights that would have taken hours to find manually, if you'd found them at all. Two quick things before diving into a real case study. First, I'm using a free Apple TV+ dataset that you can download and follow along with. It's actually pretty cool to analyze real data from popular TV shows and movies like Avatar: The Last Airbender, The Godfather, and Sherlock. Second, the industry-standard framework is actually called EDA, exploratory data analysis. But I'm using DIG in this video because, one, the principles are the same, and two, the professor on Coursera probably used DIG because it's easier to
1:49

Step 1: Description

remember. Step one: description. Picture this hypothetical scenario. Your colleague, let's call him Tim Cookie, just rage-quit for absolutely no reason and left you with a spreadsheet with zero context. At this point, we need ChatGPT to explain or describe what's in the file as quickly and effectively as possible. So let's open up ChatGPT, upload the dataset, select the latest reasoning model, and start with the first description prompt: list all the columns in the attached spreadsheet and show me a sample of data from each column. The reason we start with this prompt is that it forces ChatGPT to actually look at every single column in our dataset and, more importantly, gives us a quick overview of the data we're working with. Looking at this output, I want to point out two things. First, having ChatGPT return just one sample row is much easier for us humans to digest versus having to make sense of an entire spreadsheet we've never seen before. Second, the sample it selected is Forrest Gump, a classic. Nice. And it returned all eight columns from the original spreadsheet. Great. But the release year shows up as 1994.0, and there are two genres separated by a comma, so this might cause issues for ChatGPT down the road. We want to make a note of that. And I'm actually not really sure what IMDb ID means. Does every show and movie have a unique ID? I'm not sure, so I might want to follow up and confirm. Next, description prompt number two (I'll link to all these prompts down below, by the way): take five more random samples of the data for each column to make sure you understand the format and type of information in each column. Why are we asking for more samples? Because that one sample we received might be an outlier and therefore misleading. Multiple samples help us spot inconsistencies. Looking at this output, we see there are TV shows and movies under type, but we knew that already. This TV show has three genres; these movies have one.
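The first description prompt is essentially what an analyst would do by hand in pandas. Here's a minimal sketch using a made-up miniature of the catalog; the column names are my assumption, not necessarily what the real file uses:

```python
import pandas as pd

# Hypothetical miniature of the Apple TV+ catalog (invented rows and column names).
df = pd.DataFrame({
    "title": ["Forrest Gump", "Sherlock", "The Godfather"],
    "type": ["MOVIE", "SHOW", "MOVIE"],
    "releaseYear": [1994.0, 2010.0, 1972.0],  # note the float format quirk
    "genres": ["comedy, drama", "crime, drama, mystery", "crime, drama"],
    "imdbId": ["tt0109830", "tt1475582", "tt0068646"],
})

# Equivalent of "list all the columns and show me a sample from each":
print(list(df.columns))
sample = df.sample(1, random_state=0)  # one random row is easy to digest
print(sample.to_dict("records")[0])
```

One sample row per column, like the prompt asks for, immediately exposes the same quirks the video flags: the `.0` on the year and comma-separated genres.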
All right, and some of these are available in one country, others in multiple countries. Okay, so our understanding of the dataset is increasing. Moving on to description prompt number three: run a data quality check on each column; specifically, look for missing or empty values, unexpected formats or data types, and outliers or suspicious values. This is pretty self-explanatory. We want ChatGPT to explicitly tell us if there's anything weird about the data we should know about before proceeding with our analysis. Okay, there are several tables here. The first one tells us how many values are missing from each column. For the title column, we're missing 589 values, representing 3.1%. Going down, 10% is pretty high. Whoa, okay, 99.7. This number tells us that for the available countries column, we're missing 99.7% of the values. I can double-check really quickly by going into the raw dataset and doing a quick sort. I sort that column, scroll down, and yeah, most of these rows are completely empty for available countries. This means we should not perform any geographical analysis with this dataset, because we're missing that information. At this point, it should be pretty clear that, first, although ChatGPT is not doing 100% of the work for us, it's making our job as the human analyst much easier. And second, remember how the goal of the description step is for us to understand the dataset as effectively as possible? I'm not going to waste your time here, but in real life I would at this point have asked follow-up questions like, "Hey, what does this tt number mean here?" And ChatGPT would have confirmed it's the unique IMDb identifier. By the way, if you use Google Workspace tools at work, you might want to join my newsletter to receive an insanely actionable tip every week. Link down below. Next up: introspection.
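The missing-values table from the quality-check prompt maps onto a pandas one-liner. A sketch with fabricated rows that mimic the gaps described above (column names are assumed):

```python
import pandas as pd

# Invented rows illustrating the kinds of gaps the quality check surfaced:
# a few missing titles, and an availableCountries column that is mostly empty.
df = pd.DataFrame({
    "title": ["Forrest Gump", None, "Sherlock", "Ted Lasso"],
    "availableCountries": [None, None, None, "US"],
})

# Share of missing values per column, like the table ChatGPT produced.
missing = df.isna().mean().round(3)
print(missing)

# Columns this sparse (the real one was 99.7% empty) are unusable for analysis.
unusable = missing[missing > 0.5].index.tolist()
print("skip these columns:", unusable)
```

The same threshold logic backs the video's conclusion: with almost all country values missing, geographical analysis is off the table.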
5:31

Step 2: Introspection

Introspection. The purpose of this step is to have ChatGPT brainstorm questions it could answer with our data. This shows whether ChatGPT truly gets our data, and it often surfaces insights we hadn't considered. Prompt one: tell me 10 interesting questions we could answer with this dataset and explain why each would be valuable. Long story short, good questions mean ChatGPT understands our data; bad questions mean there's a misunderstanding that needs fixing before we proceed. These first three questions are solid. How has Apple TV's yearly output grown since launch? If we're putting out more TV shows and movies year on year, it might mean we're capturing more market share. What share of releases are movies versus series each year? This might tell us about viewer behavior: are we trending more towards TV shows or movies? I love this one: which genres dominate the catalog, and how have they shifted over time? Imagine you were on the Apple content team. You might want to invest more in the most popular genre next year. Or maybe additional analysis tells you the genre is oversaturated, so you want to pull back. Prompt two: for the first three questions, tell me exactly which columns you need to use and whether the current data is sufficient to answer them. This basically forces ChatGPT to show its work and tells us whether or not we can perform these analyses. We see that for question one, yes, we just need to fix 0.3% of non-numeric entries; we can ignore that. For question two, yes, we need some light data cleanup; that's fine, we can tell ChatGPT to do it. For number three, yes, we have all the information we need to perform the analysis. Awesome. Prompt three is my personal favorite for the introspection step: what questions do you think someone would want to ask about this data that we can't answer due to missing information?
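That "fix 0.3% of non-numeric entries" cleanup for question one is a one-line coercion in pandas. A sketch with invented values (the real column and its bad entries may look different):

```python
import pandas as pd

# Hypothetical release-year column with one non-numeric entry,
# standing in for the 0.3% of bad values ChatGPT flagged.
years = pd.Series(["1994", "2010", "unknown", "1972.0"])

# Coerce unparseable values to NaN instead of crashing, then measure the loss.
numeric = pd.to_numeric(years, errors="coerce")
bad_share = numeric.isna().mean()
print(numeric.tolist(), f"non-numeric share: {bad_share:.1%}")
```

If the bad share is tiny, as in the video, those rows can safely be dropped or ignored before the trend analysis.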
Prompt three basically surfaces gaps in our dataset and helps us manage our boss's expectations about what insights we can uncover. Here we see questions like: what's the most-watched genre? We can't answer that because we don't have viewing metrics. Or, from an ROI perspective, which genres deliver the best cost per hour of content? We don't have production budget, revenue, or cost fields. But here's where it gets interesting. What if I had access to some of that data? Let's say my friends at Apple invited me into Apple Park and I hacked their servers. That was obviously a joke. Very unrealistic scenario. I don't really have friends. So I created a fake second dataset, and to be very clear, this is made up. Don't report me to Apple. It has the IMDb ID in column A, total viewership in column B, and the total cost of producing that show or movie in column C. I can now upload this to the same ChatGPT thread and say: I just received this dataset from a colleague. Your task is to explore and explain the relationships between this new dataset and the original one, and how they might be used to join the data together. After running for a bit, ChatGPT confirms we can use the IMDb ID field to join the two datasets together and even gives us suggestions for how to use the newly merged table. For example, we can calculate cost-per-viewer ROI by genre. After instructing ChatGPT to merge the datasets using the IMDb ID field, it gives us a sample output of the newly merged spreadsheet. We see that for Forrest Gump, if I scroll all the way to the right, yes, it now has the total viewership and total cost data in its row along with everything else. And I can even click here to download this merged CSV file. That's pretty awesome. Quick note: I'm focusing on the core DIG prompts in this video to not waste your time. In real life, I would have branched out much sooner.
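The join ChatGPT performs here can be reproduced in pandas. A sketch with invented IDs and numbers (the real datasets' column names are my assumption):

```python
import pandas as pd

# Original catalog and the made-up viewership/cost dataset share the IMDb ID key.
catalog = pd.DataFrame({
    "imdbId": ["tt0109830", "tt1475582"],
    "title": ["Forrest Gump", "Sherlock"],
})
metrics = pd.DataFrame({
    "imdbId": ["tt0109830", "tt1475582"],
    "totalViewership": [5_000_000, 3_000_000],
    "totalCost": [1_000_000, 900_000],
})

# Left join keeps every catalog row even if some titles lack metrics.
merged = catalog.merge(metrics, on="imdbId", how="left")
merged["costPerViewer"] = merged["totalCost"] / merged["totalViewership"]

# Same downloadable artifact ChatGPT offered at the end of the thread.
merged.to_csv("merged.csv", index=False)
```

A left join is the safe default here: titles missing from the second file survive the merge with blank metrics instead of silently disappearing.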
Earlier, for example, when ChatGPT mentioned genre popularity, I'd immediately have asked for that analysis and kept digging based on what it finds. The
9:16

Step 3: Goal Setting

third step: goal setting. This is extremely important to get right, because imagine our manager asks us to analyze the sales data, and after we work hard to create 20 beautiful slides, our manager says, "Wait, I just wanted to know if we should discontinue product X." This is what happens when our manager is an idiot. I mean, uh, when we analyze data without setting clear goals, we get something that's technically correct but ultimately useless. Obviously, the prompts we use in this step depend on the specific goal, so I'll just share one example. Here's the prompt: my goal is to understand (and you specify your goal here) what content Apple TV should invest in next. Given this goal, which aspects of the data should we focus on? This is basically like giving ChatGPT a mission briefing. It helps the AI prioritize what's important and ignore what's not. Okay, this is very useful. ChatGPT first breaks down our options for us. For example, if you're on the Apple content team, you might care about viewership, audience demand, and content supply, so you'd do this. If you're on the finance team, you might want to learn more about unit economics and do that. Scrolling down, we even see a step-by-step roadmap. First, we might want to clean our data. Okay. Then we build a genre scorecard. That could be very interesting. Then we rank our opportunities and layer in trend velocity. I would never have thought to do that. And finally, stress-test with outliers. Makes sense. And here's the type of insight this process would surface: true crime series deliver three times the median views of all series, they cost 18% less per finished hour, and they've climbed from a 4% to a 9% share of total watch time in the last 3 years. Okay, wow, that's really impressive. Pro tip: a final question I always like to ask ChatGPT before any
10:55

Bonus Prompt

presentation is: what are the key questions someone reading my analysis would ask, and how should we proactively address them? This prompt has single-handedly saved my ass multiple times by anticipating "But Jeff, what about this?" questions from managers and overly ambitious peers trying to put me down. Just kidding. Everyone loves me. How could they not? Two things I'd like to leave you with. First, the DIG framework plus ChatGPT levels the playing field for regular, untrained people like us. It's a simple, repeatable process we can all use immediately. Second, although I covered the essentials today, the full Coursera course touches on other important concepts, like how to mitigate hallucinations and debug weird data errors. So if you want to level up your data skills, sign up for Coursera using the link in the description to take advantage of my special offer of 40% off 3 months of Coursera Plus. If you enjoyed this, check out my comprehensive ChatGPT Pro Tips video next. See you all there. And in the meantime, have a great one.
