You’ve Been Underusing Dataclasses (These Tricks Are Wild)

ArjanCodes

Contents (6 segments)

Segment 1 (00:00 - 05:00)

Do you see that InitVar thingy there? Do you know what that actually is? It looks like a field, but it's actually something else. I'll explain later on. Today, I'm going to show you seven interesting, sometimes even weird things you can do with Python dataclasses. Dataclasses were designed to make plain data classes a bit easier to deal with. They generate a bunch of boilerplate, like an initializer, a repr, comparison methods, and a few other things as well. But under the hood, they're just normal Python classes, which means you can combine them with class hooks, descriptors, introspection, context managers, and a bunch of other things. If you treat dataclasses purely as some kind of struct, you're missing out on most of that power.

That's a pattern I see all the time in Python, and in software design in general: people learn a feature, but not the deeper design ideas behind it. That's also why I'm working on a new program called Software Design Mastery. This is way more than just another online course. I can't say too much about it yet, but it's going to go deep into design, way beyond what you see in my YouTube videos. It's the most comprehensive thing I've ever created. If you want to be the first to know when the program opens, make sure you join the waiting list at arjan.codes/mastery.

The first thing I want to show you is a dataclass as a singleton-like factory. Let's say you have a config object like this, and you want a different config per environment. That's a pretty standard thing to need. Right now, I've created a dataclass called Config. It has an env field, so I can keep track of which environment it belongs to, and it has the actual config values. In this case there's just a debug flag, but of course you can add any instance variables you want here.
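For reference, here is a minimal sketch of the starting point, assuming env and debug are the only fields (the values below are made up). It shows the boilerplate that @dataclass generates, and why plain construction can't share a config: every call builds a new, independent object.

```python
from dataclasses import dataclass


@dataclass
class Config:
    env: str             # which environment this config belongs to
    debug: bool = False  # the only config value in this sketch


# @dataclass generated __init__, __repr__ and __eq__ for this class.
a = Config("prod", debug=True)
b = Config("prod")
print(a)        # Config(env='prod', debug=True)
print(b.debug)  # False: b is a separate object with the default value
print(a == b)   # False: __eq__ compares field values
```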
Then I create a few of these config objects and simply print the value of the debug flag. When I run this, this is what we get. Pretty simple. But let's say you want to share this config object across different places in your code. You could turn these into global variables and import them wherever you need them, or you could use dependency injection, but then you have to inject the config everywhere. Another option is to build a singleton-like factory, where creating a config object returns the existing object if one has already been created for that environment.

To do that, we have to implement the singleton pattern inside our Config class. The first thing I'm going to do is add a cache that keeps track of the instances. This needs to be a class variable, because we want a single cache shared by all instances of the dataclass, not one per instance. The problem is, if I simply declare the cache as a dictionary mapping the environment name to Self, which is the Config type here (let me also import Self from typing), the dataclass treats it as a regular field, so every Config instance gets its own cache. That's not what we want: we want a single cache where we can store the Config instances. To solve this, you can annotate it as a class variable by importing the ClassVar annotation from typing. Now the cache is a class variable instead of an instance variable, so it's shared between all Config instances, and we can initialize it to an empty dictionary.

Now the only thing left is to provide a way of creating Config instances that takes the cache into account. You could try to use the initializer for that, but then you have very little control over how the values get initialized. For example, I have an A config and a B config here.
When I create the A config, I set debug to True, and I want that to be reflected in the B config when I load it here. Unfortunately, that won't work properly with the initializer, since calling Config(...) always constructs a new object. So instead, I'm going to create a class method; let's call it for_env. It takes cls, the environment, and the debug flag, which defaults to False, and it returns an instance of the class, so Self. If env is not in the cache (since the cache is a class variable, I can access it directly through cls), we need to store it, using the env value as the key. And then we create the instance of the class.

Segment 2 (05:00 - 10:00)

And finally, we return the cached value. Then, instead of creating the config objects directly, we use the class method. So now we've created singleton-like behavior. This isn't ideal, though: every time we add an extra instance variable, we also need to add an extra parameter to for_env, and the default values are duplicated. You could optionally remove the defaults there to simplify that, but then we also have to pass those arguments through again. So this could be improved. One thing you could do is accept generic *args and **kwargs, and that will work just fine. Or you can leave it like this if there aren't too many elements in your configuration.

The nice thing now is that I create my production config here, with debug set to True, and then I create another one here without passing the debug value. Since the production environment config has already been created, this doesn't create a new instance; it simply returns the existing one. So, as you can see when I run this, the debug value of B is also set to True. This means that wherever I need a config, say in some other function, I can just say: okay, I need a config, call Config.for_env with the same environment, and print the config. If we run that, of course nothing happens, because I have to actually call the function. If we now run it, you see that we get our config object with the debug flag set to the value we originally gave it. So this is an alternative way of accessing configuration settings without having to import an actual config object somewhere, and without having to pass a config object as an argument and do dependency injection everywhere. Let me know what you think about this approach and whether you prefer it over the other options.
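Here is a minimal sketch of the singleton-like factory described in this section. The "prod" environment name and the some_function helper are illustrative assumptions, and the return type is annotated as "Config" rather than Self so the cached Config values type-check cleanly; the ClassVar cache and the for_env class method follow the description above.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class Config:
    env: str
    debug: bool = False

    # ClassVar keeps the cache out of the dataclass fields, so one dict is
    # shared by the whole class. (@dataclass rejects a mutable {} default for
    # real fields, but ClassVar attributes are not fields.)
    _cache: ClassVar[dict[str, "Config"]] = {}

    @classmethod
    def for_env(cls, env: str, debug: bool = False) -> "Config":
        # Only the first call for an environment creates an instance;
        # later calls return the cached one and ignore their arguments.
        if env not in cls._cache:
            cls._cache[env] = cls(env, debug)
        return cls._cache[env]


def some_function() -> None:
    # No global import and no injected parameter: just ask for the config.
    config = Config.for_env("prod")
    print(config)


a = Config.for_env("prod", debug=True)
b = Config.for_env("prod")  # debug not passed, but "prod" already exists
print(b.debug)   # True
print(a is b)    # True
some_function()  # Config(env='prod', debug=True)
```

Because later calls ignore their arguments, Config.for_env("prod", debug=False) after the first call still returns the debug=True instance; that is the trade-off of this pattern.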
The second thing I want to show you is dataclasses with automatic registration. Dataclasses don't change how classes behave. You can still implement hooks and dunder methods, and they work exactly the same as with regular classes. Let's say we have these dataclasses here, UserCreated and UserDeleted, and we create an instance of each and print it. It's not very exciting, but as you can see, we get this as a result. Now let's say these are actually events. When you define event classes, it's helpful to have some kind of registry of the event types that are available in the system. Ideally, I'd like a dictionary or a list containing all the different event classes, so that I can add listeners or do other things with them.

One thing you could do is create a registry: a dictionary mapping strings to any type of class, because there's no restriction on the kind of class we're dealing with. Initially, this dictionary is empty. Now that we have this registry, I can add the various classes manually in my main function. So my registry gets the UserCreated class, and I can do the same for UserDeleted and any other event types we need. Ideally, though, this would be done for us, and fortunately we can do that by using decorators smartly. So let me remove this line again; I'm going to solve it in a different way. One option is to create an Event base class that does the registration for you, and then have UserCreated and UserDeleted inherit from it. But we can also make something very similar to the dataclass decorator that actually applies the dataclass decorator itself, which is a slightly fancier way of solving this. So let's create this decorator and call it event.
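For comparison, the base-class alternative mentioned above could be sketched with Python's __init_subclass__ hook, which runs whenever a subclass is defined. This is my choice of mechanism for the "Event base class" idea, not necessarily how the video would implement it; Event, registry and the user_id fields are illustrative names.

```python
from dataclasses import dataclass, is_dataclass
from typing import Any

# Maps event class names to event classes.
registry: dict[str, type[Any]] = {}


class Event:
    """Base class that registers every subclass as it is defined."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        registry[cls.__name__] = cls


# __init_subclass__ runs before @dataclass is applied, but @dataclass (without
# slots=True) modifies the class in place, so the registered object is the
# same class that ends up as the dataclass.
@dataclass
class UserCreated(Event):
    user_id: int


@dataclass
class UserDeleted(Event):
    user_id: int


print(sorted(registry))                       # ['UserCreated', 'UserDeleted']
print(is_dataclass(registry["UserCreated"]))  # True
```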
The event decorator takes a class of some type T; we can use generics here. We pass the class in, and we return something of that same type. Then, in the registry, we use the class name as the key and store the class, and then we simply return the class. And now we have a

Segment 3 (10:00 - 15:00)

decorator to mark something as an event class. I can write it on top of the dataclass decorator, and then we can also print the registry to see what this actually does. As you can see, the registry now contains these two classes, which is nice. But we can do this a bit smarter, because right now we have to type both event and dataclass every time, which isn't so nice. It would be great if we only had to type event, and the dataclass part were done by event for us. That's actually really easy: instead of just storing the class as it is, we call the dataclass decorator directly on the class inside event. Now I can remove these dataclass decorators. There's an issue here that I'll come back to in a minute, but when you run this, you see we still have our classes in the registry, and we can still print UserCreated as a dataclass, because it is a dataclass.

Unfortunately, Pylance, the type checker, doesn't like this. That's why I get these red squiggly lines: when I hover over this, it says expected zero positional arguments. That's because Pylance no longer detects that this is a dataclass, so it doesn't see that these attributes are fields. Fortunately, there's an easy fix. We can import the dataclass_transform decorator from typing (available since Python 3.11; older versions can get it from typing_extensions). It tells the type checker that a function behaves like something that creates a dataclass. We simply put it on top of the event function, and now the warning is gone. When I hover over the UserCreated class, it's still correctly marked as taking a single argument. And of course, this doesn't affect how the code runs; it still runs as expected. Really cool.
We've created a sort of advanced dataclass thingy that also registers itself in a dictionary. That can be very useful for events, plugin systems, message handlers, or anything else in that vein where you'd like to maintain a list of handlers or plugins.

The next thing I want to show you is dataclasses and validation. Dataclasses don't validate fields by default; you can set them to any value you like. If you want validation, the typical thing to use instead of a dataclass is Pydantic. But then you have to install an extra library, and maybe you don't want all those extra dependencies in your code. Instead of using Pydantic, you can build a tiny validation framework yourself with a bit of decorators, introspection, and __post_init__, basically building your own mini Pydantic. I'm not going to create this from scratch, because that would make this video way too long, but I'll quickly show you how you could set it up.

Here's the code I use for that. First, I have a constant that holds the name of the attribute used to mark a function as a validator. If you create some object and want to validate it, it needs functions that validate certain values. To turn a method into such a validation function, I created this validator decorator. It gets a field name, which is the thing it's going to validate, and then it takes a function, the validation function, and adds an extra validation attribute to it, with the field name as its value. That way we know that this particular function is a validator. Then we have a Validatable class, something that can be validated, and it has a __post_init__ method where it validates the fields.
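The video doesn't build this framework from scratch, so the following is only a rough sketch of how the pieces described here could fit together. The marker attribute name __validates_field__, the _validators helper, and the check_age and clean_name methods are hypothetical names chosen for illustration.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass
from typing import Any

# Name of the marker attribute set on validator functions (hypothetical name).
VALIDATES_ATTR = "__validates_field__"


def validator(field_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Mark a method as the validator for field_name."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        setattr(func, VALIDATES_ATTR, field_name)
        return func

    return decorator


class Validatable:
    def __post_init__(self) -> None:
        # Run each validator and store the returned (possibly cleaned) value.
        for field_name, validate in self._validators():
            if not hasattr(self, field_name):
                raise AttributeError(f"no field named {field_name!r}")
            setattr(self, field_name, validate(getattr(self, field_name)))

    def _validators(self) -> Iterator[tuple[str, Callable[[Any], Any]]]:
        # Yield (field name, bound method) for every method carrying the marker.
        for name in dir(self):
            attr = getattr(self, name)
            field_name = getattr(attr, VALIDATES_ATTR, None)
            if callable(attr) and field_name is not None:
                yield field_name, attr


@dataclass
class User(Validatable):
    name: str
    age: int

    @validator("age")
    def check_age(self, value: int) -> int:
        if value < 0:
            raise ValueError("age cannot be negative")
        return value

    @validator("name")
    def clean_name(self, value: str) -> str:
        return value.strip()


print(User("  Alice  ", 30))  # User(name='Alice', age=30)
try:
    User("Bob", -1)
except ValueError as exc:
    print("Error:", exc)  # Error: age cannot be negative
```

The generated dataclass __init__ calls the inherited __post_init__ automatically, which is why User needs no extra code to trigger validation.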
What __post_init__ does is go over the validators defined on this object. That's another internal method, which goes over all the attributes, checks whether they are methods and whether they are validator methods, and if so, yields them. Then, for each validator, it checks that the field name actually exists, because a validator only makes sense for a field that's there. It then calls the validation function on that field's value to check that it's valid. As you can see, the validation function returns a new value, which is stored back as the attribute. That means you can also do things like removing whitespace, or any other cleanup you want as part of the validation process. Now that we have all this boilerplate set up, the rest is pretty easy, because we can create another class that uses it. We create a User, which is a Validatable class and also a dataclass. It has a name and an age, and then I specify my validation functions using this validator decorator. There I have my

Segment 4 (15:00 - 20:00)

check: the age cannot be negative. Or I can validate the name and, for example, clean it up by stripping the surrounding whitespace. So as an example, I can create a user with a bunch of whitespace in the name, and I can also create another user with a negative age, which is not allowed, and catch the ValueError. When I run this, this is what you see: we get the user Alice with the whitespace removed, and we get an error that the age cannot be negative. So it's really a tiny validation framework. And by the way, here I'm combining dataclass with the Validatable base class, which means this works with regular classes as well as with dataclasses. If you want it to only work with dataclasses, you could integrate the two, just like I did in the previous example. You can try that yourself; I'll leave it as an exercise.

The next thing you can do with dataclasses is use them as a simplified SQL schema generator. Dataclasses expose their structure at runtime, because they need that information to create all the instance variables, and that makes them useful as a sort of single source of truth. The idea is that Python fields become SQL columns, and on top of that you can attach metadata to dataclass fields, which could describe constraints like primary keys, indices, or things like that. Here's an example of how you could set that up. I have a UserRow dataclass. It has an id, which is a field, and I'm supplying it with metadata; in this case I'm saying that pk, the primary key flag, is True. The metadata is basically unstructured: it's a mapping where you can put any information you want. Then I have an email, and an optional age, which defaults to None. Then I have a protocol to express that something is a dataclass, which basically means it needs the __dataclass_fields__ attribute. It's purely there as a helper to make sure the types actually work out correctly.
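A minimal sketch of the schema generator, assuming a small hand-written type mapping and a lowercase table name (both hypothetical); DataclassLike mirrors the kind of protocol described above. It only builds table definitions from developer-defined classes, never from user input.

```python
from dataclasses import dataclass, field, fields
from typing import Any, ClassVar, Optional, Protocol, Union, get_args, get_origin


class DataclassLike(Protocol):
    # Every dataclass has this attribute; the protocol only helps type checking.
    __dataclass_fields__: ClassVar[dict[str, Any]]


# Hypothetical Python-to-SQL type mapping; extend as needed.
SQL_TYPES: dict[type, str] = {int: "INTEGER", str: "TEXT", float: "REAL", bool: "BOOLEAN"}


@dataclass
class UserRow:
    id: int = field(metadata={"pk": True})  # metadata can hold anything
    email: str
    age: Optional[int] = None


def to_sql_schema(cls: type[DataclassLike]) -> str:
    columns = []
    for f in fields(cls):
        py_type = f.type
        # Unwrap Optional[X] (Union[X, None]) to X; "X | None" is not handled here.
        if get_origin(py_type) is Union:
            py_type = next(arg for arg in get_args(py_type) if arg is not type(None))
        parts = [f.name, SQL_TYPES.get(py_type, "TEXT")]  # unknown types fall back to TEXT
        if f.metadata.get("pk"):
            parts.append("PRIMARY KEY")
        parts.append("NULL" if f.default is None else "NOT NULL")
        columns.append(" ".join(parts))
    body = ",\n  ".join(columns)
    # Only for trusted, developer-defined classes: never splice user input into SQL.
    return f"CREATE TABLE {cls.__name__.lower()} (\n  {body}\n);"


print(to_sql_schema(UserRow))
# CREATE TABLE userrow (
#   id INTEGER PRIMARY KEY NOT NULL,
#   email TEXT NOT NULL,
#   age INTEGER NULL
# );
```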
You don't need this protocol if you're not as fussy about types as I am. Then we have a to_sql_schema function that takes a class and generates the schema from the field information. You see I have a for loop that goes over the fields of the class; fields comes from the dataclasses module. I print the type just for debugging, and then I extract the information. For example, I check the type and see whether it maps to one of the types I've predefined here, and of course you can extend this to other types if you want to. If pk is in the metadata, I add PRIMARY KEY. If the default is None, I add NULL; otherwise, NOT NULL. Those are the kinds of things you can generate here, and then I build the final SQL statement from all of that information. If I run this, as you can see, we get a table for UserRow with an id, an email, and an age, and id is the primary key. So you can build your own little SQLAlchemy, which is really cool.

Now, it's important to know that generating schemas like this is fine, but if you're building actual SQL queries, especially ones with user-supplied values, don't build them this way, because you want to avoid SQL injection attacks. Always be careful when dealing with SQL statements. Dataclasses are great for describing structures like this, but when you execute queries, you should always parameterize them.

The next thing I want to show you is cached properties. Here I have a dataclass called Endpoint. It's used to represent some sort of API endpoint, and the information in the endpoint is the URL. For example, here I have such an endpoint, which is my Software Design Mastery waitlist signup page.
Want to sign up? Just go there. Because I have this class, I created these properties, and I can use them to print some information about this particular endpoint. I'm using urlparse to get that information: this one gives me the hostname, and this one checks whether it's an HTTPS URL. If I run this, as you can see, we get the host, and we get True: it is HTTPS. The issue is that every time I access one of these properties, it has to parse the URL again. You could ask: why not simply use an attribute where you store the parsed URL? You can set it in __post_init__ or something like that, and it works, but it's kind of finicky, because the dataclass is frozen, so you have to call object.__setattr__ in __post_init__. It's a bit hacky, and your class then also contains redundant information, which is easy to mess up later if you accidentally use that stored data instead of parsing the URL. Instead, for the parsed URL and any other derived value you need, you can use another property, but cache it. For that, functools

Segment 5 (20:00 - 25:00)

has a cached_property that you can use. So what we can do is create a cached property. Notice that the dataclass is frozen: we're not adding an extra field to the instance here, we're simply adding a cached property. Let's call it parsed. It returns the parsed URL, which isn't a string but a ParseResult, so we import ParseResult from urllib.parse. So that's our cached property. Then, instead of calling urlparse in both properties, we can simply use self.parsed.hostname and self.parsed.scheme. Let's print this, and you see we get exactly the same result. The nice thing is that every time we access one of these properties, it uses parsed, but the parsing is computed only once. So the dataclass stays frozen, which is nice, but you still get convenient, efficient computed values. This is great for parsing configs, URLs, IDs, timestamps, anything where you want to avoid recomputing the same value all the time.

The next thing I want to show you is using a dataclass as a sort of self-building CLI argument parser. Dataclasses already define field names, types, and defaults. So instead of defining command-line interface arguments twice, we can build the parser automatically, and even better, put all of that CLI plumbing into a superclass, so that subclasses stay really simple. And by the way, if you're enjoying this video so far, consider liking the video and subscribing. That really helps my channel, because YouTube then pushes out my content more, and even more people see my face on the internet. Anyway, this is what that looks like. I have a superclass here called CLIArgs, and it has a class method from_command_line. As you can see, I'm using the built-in argparse module to parse the arguments. What I do here is simply go over the fields in the class.
Inside that loop, depending on the type of each field, I add an argument to the parser; then I parse the command line and store the resulting values. Once you have this set up, it's really easy to use, because I can simply create a dataclass called Args that inherits from the CLIArgs class. I define my fields, and I can even give them defaults like this. Then I parse them from the command line, and I get a dataclass instance with the argument values, which I print here. If I run this without any arguments whatsoever, as you can see, I get verbose False, the default filename, and the default number of retries: these are the default values in my dataclass. But if I add arguments, like verbose, it's now set to True. So this works really neatly. Or I can set the filename to, let's say, test.txt, and now the filename changes. So this is a really easy way of dealing with command-line arguments, and it's all hidden behind this nice little dataclass. I simply specify the fields like this, put the base class in a separate module, and use it like this. Really neat. Or you can use Typer; it's up to you.

The last thing I want to show you, and then I'll spill the beans on that InitVar thing, is dataclasses as context managers. Yes, dataclasses actually work great as context managers, because you can store all sorts of extra information in them, like metadata belonging to the resource, or the resource itself. Here's an example of how you could do that. Let's say we have a FileResource context manager. It has an __enter__ and an __exit__ method. When we create the FileResource, it basically sets these values, and when we enter the context manager, in this case we open the file based on the path and the mode.
In the same way, we close the file in the __exit__ dunder method if the file is not None. Then, in our main function, we can use this FileResource really easily: we simply create the object, passing the path and the mode we want to use, check that the file is there, and then use the file, accessing it directly from the resource by using the

Segment 6 (25:00 - 27:00)

dot notation. And because we store all this information in the object, we can access it here as well, for example the path and the mode. When I run this, you can see that we open the example.txt file, write to it, and then close it again. So you can manage a resource this way with very little boilerplate. You can store extra data about the resource in the object, and returning the dataclass instance from __enter__ keeps everything together, because then you can use it within the scope of the with block. So it's very clean, explicit, and quite Pythonic, if I may say so.

Finally, back to InitVar, the thing I mentioned at the beginning of the video. I kept you hanging way too long. So what does InitVar actually do? In this example, raw_password is not a field, but it is passed into the initializer. It also shows up in the __post_init__ method, but it's not stored on the instance. Here's how you can use that. Let's say I have a user with a password. Of course, we don't want to store the password itself; that would not be very safe. Instead, we want to store a hash, and that's exactly what's happening here: __post_init__ gets raw_password. That's what InitVar does: it tells the dataclass decorator, "Hey, add this thing to the initializer and pass it on to __post_init__." Then we hash it and store the result in password_hash. The nice thing is that if we create a user with a password and run this, you see that a hash is stored here. Interestingly, if I print the entire user, it only has the email and the password hash. It didn't store the raw password, even though it was declared here sort of as a field. That also means that if I try to print the raw password, like I'm doing here, I get an AttributeError, because it's simply not stored on the instance (and, since it has no default value, there's no class attribute to fall back on either).
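A minimal sketch of the InitVar example. The SHA-256 hashing is a stand-in picked to keep it short, not the video's exact code, and the email and password values are made up. Note that user.raw_password only raises AttributeError because the InitVar has no default; with a default, the class attribute holding that default would still be readable.

```python
import hashlib
from dataclasses import InitVar, dataclass, field


@dataclass
class User:
    email: str
    # InitVar: an __init__ parameter that is handed to __post_init__ but is
    # not a field, so it is never stored on the instance.
    raw_password: InitVar[str]
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str) -> None:
        # Plain SHA-256 keeps the sketch short; real password storage should use
        # a salted, deliberately slow hash such as hashlib.scrypt.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()


user = User("alice@example.com", "s3cret")
print(user)  # shows email and password_hash only
try:
    print(user.raw_password)
except AttributeError as exc:
    print("AttributeError:", exc)
```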
So if you need to pass in information like temporary inputs, passwords, secrets, and so on, and you want to do something with it but don't want it stored on the instance, InitVar is a great way to do that. It's a small feature, but very useful.

So the takeaway: dataclasses don't change Python's object model. They generate a bunch of boilerplate, but in the end a dataclass is still a regular Python class, which means you can build interesting designs with them. But what do you think? Have you used dataclasses like this? What are some patterns and designs you use a lot with dataclasses yourself? Did you find any of this useful? Let me know in the comments. And if you want to learn more about dataclasses and some features that few people know about, watch this video next.
