SQL Tutorial - How to update data using joins in SQL

SQL Tutorial - How to update data using joins in SQL

Machine-readable: Markdown · JSON API · Site index

Поделиться Telegram VK Бот
Транскрипт Скачать .md
Анализ с AI

Оглавление (5 сегментов)

Segment 1 (00:00 - 05:00)

hello and welcome to my channel i am beardeddev in this video we're going to be looking at how to update data using joins in sql we're going to be covering updating data based on a join update in multiple columns in the same statement updating data based on table expressions and in this video i'm going to be using cte so make sure you stay tuned for that and then finally some sanity checks we can carry out before we update data in a live database now if you do appreciate videos on data engineering or data analysis subscribe to the channel and if you do enjoy the video hit that thumbs up button and comment down below your thoughts and if you found this video helpful as always if you would like to follow along with the video the statements are available in the description at the bottom and we're just going to be simulating a simple product price increase process within our database today so we have a product table that contains a few columns along with the price so products we're selling in the pro the current price we're selling those products at so we're going to be into creating that table and inserting data there and then we're going to be like say simulating a process where we load data into a product staging table perhaps on a daily basis depending on how often our prices change and how often they need to be updated and we're going to be using that data to update our data in the pricing table so our product table doesn't store history this is just a sort of back end application database that just stores the product price as it is currently we could look at doing future videos on how to write history to another table perhaps some auditing of what changes we've actually made so let's dive right in okay so we've got a couple of tables now created with data we've got eight products typically we'd probably have a lot more uh we have a code that uniquely identifies a product which is what we're going to be basing the join on we have a name category and our price as it's stored in the product table then we're going to be picking up data from our product staging table and this does include pricing for all products so we'll have a look at that shortly but we can see a simple example of our laptop uh there's going to be a price increase there of 40 pounds so let's start off with having a look at how we can join data so the key is we need an identifier column and in this case we've got a code um so i'm just going to alias our product table as tgt which just shorthand for target common methodology within uh sort of working with them data we have our source and we have our target so our product table is going to be our target and then we're going to inner join our product staging table i'm going to alias that as src for source and like i say we have the identifier between the two tables as code so we're going to join on target equals src code and if we have a look at now combining those tables i'll just comment this query out for now this gives us an idea of when we're looking at the data in the same row we can see that there's going to be some price increases now top tip uh when updating tables we could have hundreds thousands or even up to million billions of rows in this table the amount we can store is really endless with uh with storage these days so the top tip is really just to limit the amount of rows we were updating so if you think about it if we were to write an update statement now we'd update eight rows every time but we only really want to update rows where there is a price difference so we're going to add a where clause to say where target price is simply not equal to a our source price and that's going to limit uh our update statement the amount of transactions where we're carrying out uh also limits the amount of logging that has to be done so we've got our select and then we're simply going to change it to an update key aspect of working with updates based on joins is we must have a good understanding of our aliases so from clause is evaluated first that means we've changed our product table to

Segment 2 (05:00 - 10:00)

our target table and we've changed our product stage into src so if we was to write out our update to say update product set product dot price equals product staging dot price i'm going to be updating multiple columns and within our set we can just put a comma separated list so in this case we're going to be updating a product modified date to current timestamp and if we wish to try and have a look at running that we will get enough uh messages about identifiers that we can't be bound in other words we don't sequel engine doesn't know what table that's coming from uh and typically because we would have column names that exist in the same table we might get even get some ambiguity errors as well so if i was to set like price equals price and if we try and run that we'd get an ambiguous column name so in other words the database engine doesn't understand what we're actually trying to do so in this case we're going to be updating tgt that's what we've aliased uh product table as so we're going to be setting tgt. price equals src. price our tgt dot modified date and notice that isn't actually coming from a product staging table we're just setting this to the timestamp within the database engine of when we actually perform this update so it's quite straightforward to do key thing is we could write out our join as normal but aliases do play a major part within an update so we could go ahead and run that we know we've got four rows where we've got a different price so we can see four rows have been affected and if we now change this and have a look at that product table we know our laptop price was expected to increase to four three nine point nine but the good thing about having this condition in here is where we check that the price is different if we was to simply run the update again we wouldn't expect any rows to change because we haven't made any changes to the product staging table so i've run that again and we can see that zero rows have actually been affected i'll just comment this back out for now hide the results now typically we want to check that things are results are going to be produced as we expect them before we run these kind of queries so i'm just going to reset the tables for now so we will see an actual difference so typically we might see an approach where right we'd write start with our select to see what's going to change so i'll just start with the select all and if we run that we can see these four rows are going to change we're happy with that and then we'd comment out this we'd comment in our update and then we'd run our statement now one approach we could look at is using ctes for this it can be quite cumbersome we need to comment things out comment things in so we could use ctes for that approach so let's have a look at how we would do that so let's remove our update for now and we're going to wrap our select within a cte so let's say with cte as and the key thing here is the columns we've got multiple columns called the same name such as code that exist in both tables and we're going to have to alias those within the cte so i'm just going to simply put tgt underscore the column name followed by source so i'm just going to quickly write those out name you'll notice i have put in square brackets or square parentheses um and that is simply because it is

Segment 3 (10:00 - 15:00)

recognized as a keyword in sql so we've got category we've got our target price so again tgt underscore price we will include date stamps as well because we are using our date stamps to verify that some rows have actually been changed so we could write a query after our update to say select all when modified date is greater than the beginning of today or a few hours ago or whenever we run the query and then we can i include uh source columns as well so again it's just going to be src underscore the column name which in this case is name src underscore name category as src category and lastly the price okay we're going to leave our where closing remove the results grid for now i'm just going to indent within the cte if you've ever watched my videos before you know how pedantic i can be with things within parentheses um indented to make it a bit more clearer so if i have a look at that we've got the same results but what we can actually do now is just perform the update on the cte so we've got our select so if we ever wanted to check we could just run this select and we can have a look at what's going to change but now we can simply actually perform an update on the cte so i'm going to write out update cte i'm going to be setting our tgt price that's going to equal our src price our new column names i'm going to set our tgt modified date equals current timestamp we can see i'm just going to uncomment this to see the results at the end we can see again we're expecting our laptop price to change to four three nine execute the query have a look at the results and again that's been executed and at this point we can simply check in a query and we can see there's actually no results so there's nothing else that needs to be updated that's been completed a very important point to remember though is when working with ctes the cte doesn't actually have any data it doesn't store any data within the database it only exists for the duration of the query so whenever we're updating a cte we'll prefer performing any dml so anything insert update or delete when we're doing that against the cte the database engine is clever enough to understand what we're trying to do and it goes back to make those changes to the underlying table so it's not that it's been updated within the cte it's been updated in the underlying table that the cte is returning data from so that's an important distinction you've not changed the cte you've changed the underlying data in the database and then the cte is pulling that data from there so there's no changes to the cte actually happening even though we're writing out update cte in this row in this case and the last thing i wanted to move on to is some sanity checks regarding um update statements so again updates are permanent once they're committed the changes have happened so it's important to get this right so one of the things we could do is we've talked about our identifier being code so one thing we could put in uh as a check is if we have a look at our product staging table which again stores all of our products um even though there might only be some price changes against some of those so perhaps we could have a look at that process to only send in changes um that could be something we could sort out within that operation now one thing with our identifier is we want to make sure that

Segment 4 (15:00 - 20:00)

we only have one value per code so we could put in a simple check ahead of our update statement to say right select code and then we could group by want to having count greater than one so this is a very simple check to say have we got any duplicates or maybe we've got codes in there that different prices and we would expect no results because although that's a unique identifier within our product table it perhaps won't be within our staging table so we need to verify that is unique and we only have one value but let's take a look if we didn't have that in place if there was the values so again let's update uh original table and into the product staging i'm just going to add another price for our laptop we've been looking at in detail and now if i check if i have my sense check query here and we could even build on this a bit further pass this into a variable have some if logic within the if we have this with installed procedure say to say check this table if this is good doesn't return any rows don't run this sorry do run the update if it's not then we can send in a message back to the caller to say there's an issue here something needs to be resolved so we can see that is actually present in the table more than once now and if we go back to uh select within our cte we can see we've got five rows now if we have a look at the top row and the bottom row they both have the same target code and they're going to be changed to two different prices so what would happen in that scenario well if we keep running if we run the cte so we don't get an error we've got an update and if we look at this code we've got the price of 4 3 99. 99 okay and what would happen if we run that again say so let's imagine this is within the stored procedure that we run once a day and if you would like to see if i'm talking about things you're not sure of please do let me know in the comments and i'll be happy to put some videos up explaining in more detail what i mean if i execute that we can see the price has now changed to 479 if i execute it again imagine this is running daily so we're in a situation where it's going through a loop of changing the price every time and that's because if we look at our where clause we can see every time we run it we're still going to get that one row because it's going to compare the price it's going to do the join and it's going to compare the price and whatever it's been changed to there'll always be a different row now interestingly enough if we take out the where clause and we keep executing it remember before how the price changed so the where clause is in there to limit the amount of rows we're changing so at the moment it's 439. 99 and if we keep running it without the where clause it's not changing it's still staying the same now this is what could be called a false positive so it's where something appears to be working well but is not actually working so whether that showed nine 439. 99 or four seven nine i think the other price was it would be correct it's just that the database engine is working that way to set it the price now we're giving the database engine a choice we're saying choose one of these prices i've set two for it choose one and whichever it chooses will be correct in terms of database processing now it's just coincidence uh execution plans how it's executing that it's consistently getting the same price at the moment now it could be that works for a long time and i have experienced this in the past something happens perhaps a database migration um perhaps a clearing of the cache over the execution plans or something along those lines to cause this to change and then one day it will just appear as 479 and it's very hard to understand and backtrack exactly what's happened and why um it's one of those where it's like

Segment 5 (20:00 - 22:00)

i've spent days on this problem and i can't understand but if we did just check the simple things it would have been straightforward the other thing we can actually have a look at with um updates this is a bit of a bonus within this video so if we were to go back to our simple sort of without cta we could do this within the cte but let's just go back to our original update based on a join so if we comment that out sorry uncomment it and again we'll go back to our starting position is we could actually wrap this within a transaction so we want to have a look at our laptop price changing or we could actually start a transaction and this is something i do occasionally particularly for ad hoc updates so i'm going to start a transaction i'm going to run the update but the data isn't actually being written to the disk until i actually commit this transaction so i'm opening a transaction here with begintran i'm going to perform the update and then i'm going to check my target table to make sure it looks as expected now if i wasn't happy with that if i was getting results that perhaps didn't look right i can then simply roll back that transaction as this as if nothing had happened notice there it's not a way of undoing queries it's just another sense check that we could have in there now the opposite of that is if we were to start a transaction again run that update and then have a look and think we were happy is we can just commit that transaction and committing is writing that to disk so anybody querying this table in future will be able to see that data has been updated but we'll definitely do some more videos on transactions so keep an eye out for those coming to the channel thanks a lot for watching i hope you have enjoyed that video like i say if you have let me know in the comments below really appreciate it really helps me to grow the channel 5

Другие видео автора — BeardedDev

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Дайджест Экстрактов

Лучшие методички за неделю — каждый понедельник