# Custom schema and multipart naming in Microsoft Fabric Lakehouse

## Metadata

- **Channel:** Azure Synapse Analytics
- **YouTube:** https://www.youtube.com/watch?v=gjK-tT7C4ZA
- **Source:** https://ekstraktznaniy.ru/video/44747

## Transcript

### Segment 1 (00:00 - 05:00) [0:00]

**Host:** Hi everyone, welcome back to Fabric Espresso, a new episode about data engineering and data science. Today we have a guest: Ted is joining us, for the fourth or fifth time, to discuss the latest innovations in Lakehouse. Hey, Ted.

**Ted:** Hi everyone.

**Host:** So what has changed recently?

**Ted:** It was a long-anticipated feature, and we finally released it. We announced the release of schemas in a Lakehouse, which basically allows people to organize their tables within a Lakehouse into folders. I'll show you a little later how it works, but there's a bit more to it. In addition, we introduced the ability to query data using multipart naming, and to specify not only the schema in your queries but also the workspace. That enables some cool features, such as cross-workspace queries using Spark SQL.

**Host:** Sounds exciting. Now I'd like to read one question. There is a Fabric community portal, and a few months ago one user asked, and I'm reading: "What are the various ways of creating a new schema in a lakehouse? By default, dbo is available." So does the feature you mentioned answer the question and solve the problem this user came up with?

**Ted:** Yes, absolutely. Actually, that's an on-point question. When you provision a new Lakehouse today, the feature is in public preview, so it's opt-in: you still need to select a checkbox when creating a new Lakehouse, which enables schemas. As you can see, when a new Lakehouse is created, a dbo schema is created by default. The dbo schema cannot be renamed or deleted; it stays there. Now, if you click on Tables and select New schema, that's where you can add custom schemas, which let you organize your data into any kind of area: business areas or something else, for example a marketing schema, a sales schema, etc. In addition, you can now move tables across schemas using drag and drop.

So that's one way of creating schemas. Another way, of course, is using Spark code: you can easily execute your PySpark and create additional schemas. As you know, everything in Fabric is quite open, and that's true in this case too: we haven't closed off any data around schemas. Schemas are basically folders in the OneLake repository that Lakehouse runs on top of. If you go to OneLake Explorer and look at the lakehouse, you will see that within the Tables folder there are other folders, which are the representation of schemas. So if you want to create a schema using OneLake Explorer, you can just create a folder in your Tables folder and it will be recognized as a schema. But the preferred option, of course, is to use something that is a managed service: either Lakehouse Explorer or Spark.

**Host:** Yeah. Since we're on the topic: let's assume I use Spark and play with the lakehouse, and I create a new schema. Can DW read it? Is it synchronized?

**Ted:** Yes. First of all, the SQL endpoint recognizes all of the schemas, so all of the data you have in the lakehouse is represented the same way in the SQL endpoint. Data Warehouse supported schemas before; Lakehouse didn't, so everything in a Lakehouse was represented in the dbo schema. The same thing still happens: everything in the dbo schema will be in dbo in the SQL endpoint, but you will also have the other schemas, with their tables represented in the SQL endpoint as they are in the lakehouse. So all of the data that is visible in the lakehouse will also be visible in the SQL endpoint.

**Host:** And from the scenarios perspective, what are the main customer scenarios? As a customer, when should I consider using it? What is the flow, what's the path?

**Ted:** One of the big scenarios, which was highly anticipated and which customers were asking for, is: how can I bring in multiple Delta tables? Imagine you have a data lake with thousands of Delta tables

### Segment 2 (05:00 - 10:00) [5:00]

and you want to reference them. You don't want to bring that data in by copying it through pipelines or dataflows; you just want to reference it. Previously, the approach was to create an individual shortcut for each table, and when the number grows to a thousand and beyond, that becomes a tedious task. Another thing: if there are any changes happening in your data lake, such as additional tables being added, you would need to go and create additional shortcuts. Now you can create a schema shortcut that points to a folder containing all your Delta tables, and all of those tables will automatically be represented in the lakehouse. You don't need to pick each individual Delta table; you just point to the parent folder, and all of its tables will be represented in your lakehouse. That's one of the scenarios, more related to bringing in data.

In the majority of scenarios, customers use schemas for data organization, as I mentioned: breaking data down either by geography or by business area. An additional thing schemas will enable, with OneLake security, is setting permissions not only on tables but also on a particular schema. For example, you can give certain people read access only to a schema like sales, without giving them access to human resources data, and vice versa. So, more granularity and better access control. I think these are the key things with schemas.

**Host:** Do you see that impacting the design, or how we come up with a medallion architecture in Fabric?

**Ted:** I do not think it will impact the medallion architecture directly, mostly because you wouldn't want to put two layers of your medallion architecture into the same Lakehouse: first, due to access control; second, due to the data movement itself. I would imagine you would still go for the option of creating multiple lakehouses that serve different purposes. The thing with Lakehouse is that when you provision one, you get a full feature set: you get your SQL endpoint and your semantic model, and all of them consume all of the data in the Lakehouse. If you have, for example, a bronze layer, maybe you don't want your semantic model to consume all of that data. As a result, I think mixing layers and schemas is possible, but it depends on the business case.

**Host:** In terms of feature limitations, do you see any, other than that it's in preview? We are encouraging people to evaluate it and test it, but it's still a few weeks or months until it hits the GA stage, and GA means it's designated to work and serve you in production. How do you see it?

**Ted:** There is a list of limitations at the moment, related to workspace naming, to which Spark runtimes work with schemas, etc. You can read more about them in our online documentation. The engineering team is working to remove these limitations even before we reach general availability, but for today there are some limitations, which hopefully will not block most scenarios.

**Host:** As a user, is there anything else I should know about schemas before jumping into using them?

**Ted:** There are some exciting hidden benefits of us introducing multipart naming. One of those, which I briefly mentioned, is the ability to specify your workspace in the namespace. Previously, when you referenced an object, you used a combination of the lakehouse and the table. Now you not only use the lakehouse, schema, and table combination, but we also added the workspace to the reference. That allows you to reference tables located across multiple workspaces, as long as the user has access to them, so you can join data across workspaces. For example, you might have your company's sales data in one workspace and your human resources data in a different workspace; that segregation makes total sense. If you want to run a query to check, for example, which of your employees are also customers of your company, you can run a single query that references the data from the human resources workspace, joins it with your sales data, and gets you a

### Segment 3 (10:00 - 12:00) [10:00]

result. So that's just one example where cross-workspace queries are really beneficial. But yeah, I think multipart naming is a big thing too.

**Host:** The feature is in the preview stage. What's next for this functionality?

**Ted:** We will definitely work on stabilizing the functionality and enhancing support for different runtimes. For example, we now have 3.5, which is an experimental runtime; at the moment, for public preview, it's not supported, but we will definitely include it. Then there's support for different characters: currently there are some limitations on workspace names, and we plan to remove all of those and enable support for almost any character in the naming. I think that's the main focus for this particular feature. Once it's stabilized, our goal in Lakehouse is to continue enhancing the capabilities around data discovery. Schemas are one option that lets you discover your data better through additional organization of the tables, but we're also looking at enhancing the data with metadata, adding the ability to attach custom metadata, and other things. So definitely follow this space, and you'll see many more interesting things coming soon.

**Host:** Awesome. Thanks, Ted, for joining. For those who are watching, please remember to hit the like button and leave a comment with a question, or maybe an idea for future improvements or features for the Lakehouse; we're happy to take them to our roadmap. Also remember to visit ideas.fabric.microsoft.com to submit your feature idea. Thanks for watching, and until next time, happy using schemas and multipart naming in Lakehouse. Thanks.

**Ted:** Thank you. Bye-bye.
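
As a rough illustration of the multipart naming discussed in this episode, the Python sketch below builds two Spark SQL statements as plain strings: one that creates a custom schema and one that joins tables across two workspaces. Every workspace, lakehouse, schema, table, and column name is hypothetical. The `workspace.lakehouse.schema.table` ordering is an assumption based on the description in the conversation, and the backtick quoting follows standard Spark SQL identifier rules. The snippet only builds strings, so it runs without a Spark session.

```python
def quote_ident(part: str) -> str:
    """Quote one identifier part Spark SQL style.

    Wraps the part in backticks and doubles any embedded backtick.
    """
    if not part:
        raise ValueError("identifier part must be non-empty")
    return "`" + part.replace("`", "``") + "`"


def multipart_name(*parts: str) -> str:
    """Join 2 to 4 parts into a dotted name.

    Assumed ordering (not confirmed by the transcript beyond its description):
    workspace.lakehouse.schema.table
    """
    if not 2 <= len(parts) <= 4:
        raise ValueError("expected between 2 and 4 name parts")
    return ".".join(quote_ident(p) for p in parts)


# Hypothetical schema name; assumes the statement runs against the
# notebook's default (schema-enabled) lakehouse.
create_schema_sql = f"CREATE SCHEMA IF NOT EXISTS {quote_ident('marketing')}"

# Hypothetical four-part names in two different workspaces.
employees = multipart_name("hr_workspace", "hr_lakehouse", "dbo", "employees")
customers = multipart_name("sales_workspace", "sales_lakehouse", "sales", "customers")

# "Which employees are also customers?" as a single cross-workspace join.
# The email join key is an illustrative assumption.
cross_workspace_sql = (
    "SELECT e.employee_id, e.email "
    f"FROM {employees} AS e "
    f"JOIN {customers} AS c ON e.email = c.email"
)

print(create_schema_sql)
print(cross_workspace_sql)
```

In a Fabric notebook, each string would typically be passed to `spark.sql(...)`. Because the feature was in preview when this episode was recorded, it is worth checking the current documentation for supported runtimes and workspace naming limitations before relying on this pattern.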