30 Seconds to 4 Seconds - A Real SQL Server Case Study
18:22

30 Seconds to 4 Seconds - A Real SQL Server Case Study

Database Star 31.05.2026 1 125 просмотров 64 лайков

Machine-readable: Markdown · JSON API · Site index

Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI
Описание видео
📝 Get the AI prompt from this video: https://databasestar.mykajabi.com/lpkj-ai-sqlserver/?video=QybD_ybyyyI 📝 Get 56 bad query patterns & fixes: https://databasestar.mykajabi.com/offers/zPYFUthD/checkout?video=QybD_ybyyyI In this video, we look at a slow query in SQL Server and make a series of changes to try to get it to run faster. Then I ask an AI tool to analyse the query to see what improvements it suggests. TIMESTAMPS 00:00 - Our query 03:35 - Improve the filter 04:19 - A structural change 06:44 - Move calculations 10:13 - Add indexes 12:29 - AI analysis

Оглавление (6 сегментов)

Our query

This query runs in about 3 seconds. It looks simple, it's not throwing any errors, it returns the right data. But 3 seconds is slow when thinking about user experience and loading a page. Your users would already be clicking away. In this video, I'm going to show you the real query, use the execution plan to find exactly where the time is going, and then make a series of targeted changes to speed it up. I'll also run the query through an AI tool at the end, and we'll see what it finds compared to what I find myself. This is a repeatable process you can use on your own slow queries. And if you want the exact AI prompt I use to analyze queries and execution plans, you can download it for free from the link in the description. Let's start in SQL Server Management Studio. Here's the query we're working with today. It's based on the AdventureWorks 2022 database. I've added extra rows to a few tables to make the performance issues more visible, which is something worth knowing before we start. So, what does this query actually do? It gets the customer name, sales ID, and date, and a range of metrics for each order, such as total revenue and number of items. It limits the results to orders between 2011 and 2014. We run it, and the results come back in about 30 seconds. We can see in the bottom right that there are about 90,000 rows. Now, before we change anything, this is our baseline. A baseline is important. If you don't measure before you start, you won't know whether your changes actually helped. The next thing to do is look at the execution plan. The execution plan is a visual that shows you the steps the database takes when it runs your query. It's one of the most useful tools you have for diagnosing slow queries, and it's worth getting comfortable with it. In SSMS, there are two options. There's the estimated execution plan, which shows you what the database thinks it will do. And there's the actual execution plan, which runs the query and then shows you what it actually did, including real row counts and timing. I always prefer the actual execution plan when I can use it. It takes a bit longer because it has to run the query, but the extra information is worth it. We run the query again and switch to the execution plan tab. There's a lot going on here and it can look overwhelming at first, but we just need to focus on a few things. Each icon represents a step the database takes. The arrows show data flowing between the steps. In SSMS, the flow goes from right to left, so we read the plan from the far right and work towards the left. The percentage on each step is its estimated cost as a share of the total query. All the percentages add up to 100. So, if one step is showing 62%, that's where the majority of the work is happening. There's also a number on some steps that end in s. That's the actual run time for that step in seconds. This is useful for finding where the time is really going. We can see a lot of steps in this query. If I move around a little and look at some of the steps, there aren't any that have a large percentage number taking the majority of the cost. There are several steps that have 20% of the cost, which are all accessing the sales order detail table. For now, we have our baseline and we have the execution plan. Before we make any changes, I'll save a screenshot of the execution plan so we can use it in our AI analysis later. Now, let's start improving things. Looking at the query, a quick fix we can do to improve the performance is to change this year function here. Using a function on a column in the where clause means that the database can't use any

Improve the filter

index that may exist on that column. So, if there is an index on the order date column here, the query can't use it. We can update our query so any index can be used by removing this year function. We still want the query to filter on this date range, but we can do this by filtering on the order date being greater than or equal to the start value and less than the end value. We use less than because we don't want to include the entire day here. I'll have a video in the future about this in more detail. So, our query is now ready. Let's run it. We run this and see we have the same number of results and it runs in 36 seconds. So, actually a little slower than the earlier query. It shows that an index was probably not used.

A structural change

Remember earlier when we found out that the most expensive steps in the plan were the sales order detail access steps? Let's look at improving those. These steps say clustered index scan and if we hover over the step, we can see the index being used is the one starting with PK_SalesOrderDetail. So, that's a good sign. But the problem with this query is that this table is being accessed five times. Each of them takes a few seconds each, which add up to most of the overall query time. Why is that? It's because of this. There's a subquery inside the select clause and if we look closely, it's a correlated subquery. There are many of them, but we'll look at the first one here. A correlated subquery is one that references columns from the outer query. In this case, it references the customer ID from the main query, which is from the SOH table. That link means this subquery can't be evaluated once and reused. It has to be rerun for every single row in the main result. How many rows is that? In this case, around 2 million. So, this subquery is running 2 million times. It could be running less if the where clause filters out data that is not needed. The fix here is to remove the correlated subquery entirely. We have five subqueries on the sales order detail column, which happen to match the instances of that table in the execution plan. We also have a subquery on the customer and person tables. We'll start with this first one, the customer and person tables, and then move on to the sales order detail tables. So, what is this subquery trying to do? This subquery gets the customer's full name. It joins the customer table to the person table and filters on the customer ID from the outer query. That's the correlation. The where clause references SOH. CustomerID. There's no reason this needs to be a subquery. It's joining two tables to get a name. We can do that with a regular join in the from clause instead. We add the two joins to the from clause, which is one to customer and one to person, and then select the name directly in the select clause. The subquery is gone. We run it and it's similar at 32 seconds. The same 90,427 rows. That's not surprising. One subquery out of six isn't going to make a dramatic difference on its own. Plus, it didn't change anything about those sales order detail tables, but it's the right direction. Let's keep going. The remaining five subqueries all

Move calculations

read from the same table, sales order detail. And each one does a different calculation, sum, count, average, max, and min. They're all filtered on the same sales order ID. So, we're hitting that table five separate times per row. For 90,000 rows, that's 450,000 individual subquery executions just for the sales order detail lookups. There's a much better approach here. Instead of five separate subqueries, we can do all five calculations in a single pass over the table and join the result in once. We can do this with a subquery in the from clause, which is very different thing to a subquery in the select clause. A subquery in the from clause is evaluated once for the query, not once per row. Let's start by moving the total revenue calculation. We create a subquery in the from clause that groups by sales order ID and calculates the total revenue. Then we join it to the main query on sales order ID. The result is available in the select clause as SOD rev. totalrevenue. Let's run this query. We run this and it's now 36 seconds, but the row count has changed. It's showing 86,859 rows instead of 9427. This is worth looking into as we may have broken something. Why are we getting fewer rows? The difference is 3,568 rows. These are orders that exist in sales order header, but have no corresponding rows in sales order detail. In this database, where I've inserted extra rows to increase the data volume for testing, the header inserts and detail inserts didn't produce a perfect match for every row. In a well-maintained production database, this might not happen, but it's a good reminder that when you change a query structure, you should always verify that the row count still matches what you expect. Fast is meaningless if the results are wrong. In this case, the original query kept all 90,427 rows. So, we need to change the join type to match that. We need to keep selecting the sales order header rows even if they don't have any sales order detail rows from the subquery. To do this, we change the inner join to a left join. Then, we run the query. Back to 90,427 rows. The row count matches the original of 9427. The run time is 40 seconds, which is a little slower than earlier. We've only moved one of the five calculations so far, so that's expected. Let's move the rest. We've moved total revenue into the join. Line count, average unit price, max unit price, and min unit price are still correlated subqueries. Since all of them read from the same sales order detail table, we don't need four more separate joins. We can add all four calculations into the subquery we already created. Let's add line count, average, min, and max to the subquery, then select them from the main query. All six correlated subqueries are now gone. The sales order detail table is read once, grouped once, and joined once. We run this query and see the results. It now runs in 15 seconds, down from 40. It has the same results and the same row count. We went from 40 seconds to 15 seconds without touching the database structure at all, meaning no indexes and no schema changes, just a better query. Let's see the execution plan. We can see there are much fewer steps than the original. The sales order detail table is only being read once, on the right here. This is good to see. The query structure

Add indexes

is now clean. Let's see how much further we can improve it with indexes. When deciding where to add an index, we think about what the query is filtering on and what it needs to retrieve. The where clause filters on order date and total due. The joins use customer ID and sales order ID. Let's create an index on sales order header that covers those columns. The key columns here are order date and total due, which are the ones being filtered. We include customer ID and sales order ID as well. Including extra columns means the database can get everything it needs from the index without going back to the underlying table rows. That's called a covering index. We create this index successfully. Now, let's go back and run the main query again. We do this and the run time is 25 seconds. It's actually increased in time, which is unusual. Perhaps there was a caching thing that happened or some other reason. However, this tells us that the sales order header scan isn't where most of the remaining time is going. The heavier work is done in the sales order detail aggregation. Let's add an index there. In the subquery, we're getting the sales order ID, order quantity, and unit price. So, perhaps a covering index can be created to be used for this query. This index is on sales order ID, which is the column we're joining and grouping on in our subquery. We include order quantity and unit price because those are the only other columns from this table that our query uses. With all three columns in the index, the database can run the entire aggregation, which is the sum, count, average, max, and min from the index alone without reading the underlying table rows at all. We can run the main query again. We see it now completes in 7 seconds. It has the same row count and the same results. 40 seconds to 7 seconds through a combination of restructuring the query first and then adding two targeted indexes. Now, we'll turn off that actual execution plan option because it can add some overhead that we don't need and rerun the query. We can see the run time here is 6 seconds, so a little faster. That's a pretty good result. I'm not sure we could get much faster than that without other strategies that involve moving data, such as temporary tables.

AI analysis

Now, what does AI think? Earlier, I mentioned we'd run this through an AI tool and see how it compares. I'm using Microsoft Copilot for this. If you're working with SQL Server, there's a reasonable chance your organization already has access to Copilot at an enterprise level. One important thing before we go any further, you need to confirm which AI tools you're allowed to use at work, especially around data privacy. You're essentially putting your query and your execution plan into an AI prompt. That could include sensitive business logic. So, make sure you know what's allowed before you do this with production code. For this video, the query and execution plan are from a sample database, so it's fine. I'll paste in the prompt and add the original query. For the execution plan, I couldn't attach the file directly because of a format restriction. For the execution plan, I've added in the screenshots from earlier, one from the left side and one from the right. I'll attach them all and let Copilot it together. Let's see what it found. There's a summary at the top. It says the query is suffering from correlated scalar subqueries executed repeatedly per row. This is something that we covered in this video. There are massive scans, cardinality misestimation, expensive joins, indexes, and aggregations. Let's scroll down to learn more. The first suggestion was correlated subqueries. It says each of these forces SQL Server to rescan SalesOrderDetail once per row of SalesOrderHeader. This is exactly what we found in our analysis and was the biggest cause of the slow performance. Copilot explains why it is bad and what the fix is, which is what we did. The next suggestion is a correlated subquery for customer name, which is the same problem just with lesser impact. The next section is about execution plan red flags, including the huge scan on SalesOrderDetail. The suggested fix is to create a covering index, which is exactly what we did. Next, there is a point about the merge and hash joins and why they are bad. It says the fix is covered by indexes and query rewrite, which is what we did earlier. The next point is something we didn't catch ourselves, cardinality misestimates. This means the database thinks there are a different number of rows in a table than there actually are. In this case, it could be a side effect of the script [clears throat] I ran to add extra rows for testing purposes, which might not have triggered a statistics update. Statistics are what the database uses to estimate row counts when building a query plan. If they're out of date, the estimates are wrong and the database might choose a less efficient approach. Updating statistics is something we could have done and didn't. The fix for this is to update statistics on these two tables. We'll come back to this. We can keep scrolling and see this point about a non-sargable predicate, which is the year function we addressed earlier in the video. Copilot has also provided a rewritten version of the query. It uses a CTE to pre-aggregate the product totals. The approach is similar to ours, just structured a little differently. I won't run this as it should show the same results and performance. We've got a list of recommendations here, then a summary table, which is a nice output. The AI analysis was useful. It confirmed most of what we found, and it surfaced one thing we missed. Let's update those table statistics that Copilot suggested. We can scroll up and see the commands here. I can copy these and paste them into SSMS. Then I can run them. After a minute or so, the commands are completed. Let's run the main query again. We run this and see the results here. It shows the right number of rows as a total, and the run time is 4 seconds. So, it seems this suggestion has helped the run time. Let's look at what we did across these steps and what it tells us. We started with a 30-second query that had six correlated subqueries in the select clause and a non-sargable date filter in the where clause. The first thing we fixed was the year function on order date, not because it gave us an immediate speed improvement, but because it was structurally wrong and would prevent an index from helping later. Then we removed the correlated subqueries. The customer name subquery became a join. The five sales order detail subqueries were consolidated into a single group subquery, joined in once with a left join. After those changes, the query ran in 15 seconds. No index is added, but we had a better structure. Then we added two indexes, one on sales order header covering the filter and join columns, and a covering index on sales order detail covering the aggregation. That brought it down to 6 seconds. Finally, we updated the statistics as suggested by Copilot, which brought the query down to 4 seconds. A few things are worth taking away from this. The first is that query structure matters before indexes do. Indexes run on a query that runs the same subquery 90,000 times are not going to solve the problem. Fix the structure first, then think about indexes. The second is that correlated subqueries in the select clause are worth checking for in any slow query. They look harmless, but they run once per row. You might not notice them on small tables. At 90,000 rows, they cost you a lot of time. And the third is the row count check. When we introduced the join on sales order detail, we accidentally changed the row count by using an inner join when we needed a left join. Catching that required checking the results, not just the run time. You should always verify both. If you want the AI prompt I used in this video, it's available as a free download at the link in the description. You can use it with your own slow queries. Just add your SQL and attach your execution plan, and it'll give you a structured analysis to work from. If you found this useful, you'll want to watch this video next, where I go through the same process on another query and see how it compares. Thanks for watching.

Другие видео автора — Database Star

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Дайджест Экстрактов

Лучшие методички за неделю — каждый понедельник