The promise of Rust

fasterthanlime

Segment 1 (00:00 - 05:00)

This video is sponsored by Google Security. The part that makes Rust scary is the part that makes it unique. It's also what I miss in other programming languages. Let me explain. Rust syntax starts simple: this function prints a number. We've got all the things that we get from C family languages syntactically. We've got parentheses, we've got curly brackets. We've got string interpolation (which is not very C-like, I guess). And this is how we would call that `show` function from a `main` function. And we can even call it twice if we want. We can pass it the same variable `n` two times with no issues whatsoever. However, if we were to change the type of the argument from a number to a `String`, then it wouldn't work anymore.
And I know that decades of poor tooling have taught us collectively to treat the output of compilers as noise that we can safely ignore while searching for the actual problem. But in Rust, the compiler is designed to teach. You actually have to read the output. It even has pretty colors that I captured here.
Here, it's teaching us about the `Copy` trait. `i64` does implement `Copy` because copying an integer from one register to another is very fast. Manipulating numbers like these is something that computers are very good at. That's why we made computers in the first place! `String`, on the other hand, does not implement `Copy`. The `String` type specifically, with a capital S, refers to a valid UTF-8 sequence stored somewhere on the heap. The heap is a memory area managed by an allocator, which has to keep track of what is allocated where. When we create a second copy of a `String`, we first have to ask the allocator to reserve enough space for the copy. Most of the time, the allocator gets to reuse something that was freed recently, but it can end up calling all the way to the kernel, for example, if it needs to map more memory pages. That cost is why copying a `String` has to be spelled out with an explicit call to `.clone()`, and that's what the compiler suggests here in the help section. And indeed, the following program does work.
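The on-screen code is not part of this transcript, so here is a minimal sketch of what the `Copy` versus `.clone()` distinction could look like. The function names `show_number` and `show_string` and the sample values are assumptions made for illustration, not the video's exact code.

```rust
// Hypothetical stand-ins for the `show` function from the video.
fn show_number(n: i64) {
    println!("n = {n}");
}

fn show_string(s: String) {
    println!("s = {s}");
}

fn main() {
    let n: i64 = 42;
    show_number(n); // `i64` implements `Copy`, so the value is copied
    show_number(n); // and `n` is still usable here

    let s = String::from("hello");
    show_string(s.clone()); // explicit clone: a new heap allocation
    show_string(s); // the original is moved into this call
    // show_string(s); // would not compile: `s` was moved on the line above
}
```

The commented-out last call is the kind of line the compiler rejects: once a `String` has been passed by value, the caller no longer owns it.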
The other suggestion was passing the `String` by reference, also called borrowing the `String`. And it works just as well in this case. Note that taking a `&String` is not idiomatic and we would do something different, but it's not relevant for this discussion right now. So I'm focusing my mind very hard to ignore it. The difference between these two suggestions only makes sense if you're used to thinking about memory management. In other words, if you come from non-garbage collected languages, like C or C++. However, it is relevant even if you're coming from languages like JavaScript or Go, where memory safety is a lesser concern. Because, you know, cgo isn't Go, but it still exists, and so do native Node.js add-ons.
JavaScript simply doesn't have the concept of passing something by value or by reference. Primitive types like numbers are passed by value, which means this program passes two different copies of `s` to the `inc` function and ends up printing 0 twice. If we wanted the `inc` function to be able to modify or mutate something that we're passing, we would need to put it in an object first and then pass that object, like so. On the other hand, if we wanted to make sure that `inc` could not mutate something we pass it, even though we're passing it in an object, then we'd have limited options. We could pass `inc` a clone of our actual object, making sure that the original remains untouched. Now, I personally would not ship that clone function in production, but a lot of people have.
Just taking a second to point out that if the code in the video is hard to follow, if it's going too fast, there is a text version of that video available on my blog. Patrons at ten euros per month or above get to read it right now, and it's going to unlock for everyone six months later. I've been doing this a few times now. It's a nice model because eventually everyone gets access to knowledge.
There's that exclusivity window to show patrons that they're paying for my rent, and I really appreciate that. So thank you.
Another option is freezing the object with `Object.freeze`, which would prevent modifying the object's properties, adding new ones, and deleting existing ones. However, the object remains frozen after the call to `inc`. And if you forget to enable strict mode in browsers by slapping `"use strict"` at the beginning of your code (yes, that's really how you turn on strict mode in browsers), any modifications are silently ignored instead of throwing exceptions. I'm still not entirely clear on why `Object.freeze` exists, like what it's useful for. I feel like it's very rarely what you want, but you know, tell me in the comments if you're using it, I guess.
The situation isn't much better in the Go language, another popular option. I just want to be able to tell if a function is going to mess with its parameters, whether it'll be able to mutate them or not. And in Go, as in JavaScript, we're not really able to express that. This program prints 0 twice because integers, like in JavaScript, are passed by value. That one also prints 0 twice because Go structs are also passed by value, unlike JavaScript objects. `inc` gets and modifies a copy of `o`. If we actually want to modify the `o` from `main`, we need to pass its address, changing `inc` to accept a pointer, denoted by the star here. And then it actually prints 0 and 1. But what if all we have is a pointer to

Segment 2 (05:00 - 10:00)

something and we want to pass it to a function, but we want to make sure it doesn't mess with it. We can't rely on reading the body of the function to make sure it doesn't mess with it. The body may change at any time, it may be provided at runtime by some fancy loading mechanism. All we know is the function signature. Now we've seen in JavaScript, we could freeze the object before passing it. Go has a much simpler object model and just doesn't allow that. What's typically done in this case is to clone the object so that some external function operates on that clone rather than the original, again, like we've done in JavaScript. And because structs are implicitly cloned, all we have to do here is to dereference our pointer by using, again, the star operator.
But you know my bad JavaScript deep clone function from earlier? You may have been wondering why I use `JSON.parse` for it. The answer is in the name. We need a deep clone, a clone not only of `o`, but of any objects `o` might contain, and any objects they might contain, and so on and so forth. If we use the object spread operator, three dots inside of curly brackets, that spreads all of `o`'s fields into a new object, then we'd get a shallow clone. And it wouldn't take much to demonstrate that we're still able to mutate parts of `o`. That code prints 0 and 1, because even though we've created a copy of `o`, and we now have two copies of `o`, they both point to the same copy of `s`. Writing a decent deep clone in JavaScript is actually fairly tricky. What do you do with cycles? What if `s` contains `o` which contains `s` which contains `o`? A naive deep clone would call itself recursively, resulting in a bigger and bigger stack until the stack overflows and the program stops completely. My dirty round trip through JSON solution is pretty bad and will choke on things like dates and other custom user types, but at least it detects and rejects cycles. Now, is shallow versus deep cloning a problem in Go?
Of course it is, because dereferencing a struct pointer with the star operator only creates a shallow copy. So we have the exact same problem. That program prints 0 and 1, because even though we've created a copy of the outer struct, we haven't copied the inner struct. There are quality deep clone implementations in Go available, just like for JavaScript, and as of Go 1.18, those clone functions can be generic, so they can return the type you pass in instead of using reflection voodoo and returning an empty interface. A quick read of the README for go-clone will hopefully discourage you from rolling your own. They mention special handling for reference cycles, which are only handled by a separate `clone.Slowly` method. Arenas, an alternative memory allocation strategy introduced in Go 1.20. I actually have zero experience with that. Pointer types that are actually scalar values, like `time.Time`. Enum values, like `elliptic.Curve`. Value types that cannot be copied, like `sync.Mutex`. Atomic pointers. Have we stopped pretending that Go is simple already, or are we still kind of entertaining the fantasy?
But enough about garbage collected languages. What about languages like C and C++ that make you think about memory and have a `const` keyword? If a function takes a reference to `const s`, can you trust it not to mess with `s`? Of course not. That's not what `const` is for. You silly goose. We cannot trust a function signature. `const_cast` will gladly remove any pretense of constness, allowing us to mutate memory that really should not be mutated. There is no such thing as safe C++ and unsafe C++. There's no boundary to protect us from doing clearly nonsensical things like this. Even the best static analyzers let things through because the language is, at its core, tragically permissive. But the situation with the C language is similar. A pointer to `const s` can have its constness cast away, frighteningly easily.
We can add more `const`, but it'll only prevent `s` from being reassigned, which wouldn't change at all whether or not we're allowed to mutate its fields, its contents. C and C++ can never win at this game because they're inherently unsafe. When you start from something lax, trying to make it correct by adding on layer after layer of static analysis is like playing whack-a-mole. No matter how many bugs you squash, there's always another one hiding. I'm not arguing we should throw everything away. The teams that are carefully sifting through the piles of C and C++ we all rely on are absolute troopers and there is tremendous value in doing that. But those languages should be seen as asbestos, definitely banned from new construction and gradually, very carefully removed from current infrastructure by professionals.
After having Rust click for me, I never again want to rely on a human being for catching those bugs. Not me, not someone with 50 years of programming experience, and not the junior developer we just hired. And that's why I love this Rust code sample so much. The fact that the second call to `show` is a compile error makes me genuinely happy every time I see it. And the reason why might be a bit clearer if I make our example a bit more realistic. Now we're not passing around a `String`, we're passing an open database connection. The signature of the `close` function in this code

Segment 3 (10:00 - 15:00)

lets us know that we're giving up whatever we're passing to it. On the first line of the `main` function, we own `conn`, it's ours. On the second line, we give it to `close`. And on the third line, we don't have it anymore. And so that's an error. It's not ours to give. And that's just one of the ways in which Rust lets you encode your intentions, express what a function is and isn't allowed to do with its parameters. There's a lot of vocabulary associated with this. If you study programming language design, you will eventually get into arguments about what this should be called exactly. But in Rust, we would call that ownership and move semantics.
And similar ideas have been around for a while. In 1990, Philip Wadler wrote, "Linear types can change the world." Then he waddled away, waddle waddle... And he was right. We've had Linear Lisp, we've had uniqueness types in the Clean language, we've had Single Assignment C, Hermes, Cyclone, limited types in the Ada language. And there's certainly a whole range of what could have been in regards to C++ and its move semantics. But I will make the careful claim that Rust is the first language to actually take those ideas mainstream. It certainly won't be the last, and I'm looking forward to what's next, of course. For now though, let me try to demonstrate why I like it so much. Sponsor, go!
I'm gonna be straight with you. It's time to harden your sh**. Google Security presents: Patch for cash. Rewarding up to $45k for patches that improve the security of open source projects. Aww yeah. Every Tuesday I need to update my phone and my computer and my TV and my watch with an urgent security patch that stops evil nation-states from remotely detonating my Roomba by sending a carefully crafted picture of a lizard! That's embarrassing. Harden your sh**! Does your program _really_ need all those privileges? No? Then drop them! Can it be sandboxed? Then do that! Can you use a hardened memory allocator, or some sort of sanitizer?
Then yes, please! Harden. Your. Sh**. Hey, I get it. Not everybody's ready for Rust. If you're stuck with C++, then use the Safe Buffers programming model at least. It's 2025. What's that? You are ready for Rust? Then what are you waiting for? You don't have to port everything at once. Maybe you just replace a couple dependencies here and there. Maybe you just do the legwork of making it possible to write Rust components for a project. Build engineering is real! And it _can_ hurt you. All this work... is what Google Security wants to sponsor through the Patch Rewards program, with payouts ranging from $100 to $45,000 if you do high-impact changes to a tier 1 project, scoped as "core infrastructure data parser". On the list we find AVIF, VP9, WebP, and JPEG-XL, my own favorite. There's a little something for everyone. And notice that the list doesn't just include Google projects. They understand that by making foundational software more secure, it benefits everyone. They're putting their money where their mouth is. And for once, you won't have to convince your boss that yes, indeed, it is worth working on. You'll just be able to go for it and then get paid. Go to g.co/prp now to find out more about the program and read the complete rules. Thanks to Google Security for sponsoring not only this video, but also security for everyone. Thank you.
That code is not the only thing we can do in Rust. We're not giving out ownership of something every time we call a function. We can also borrow it, and that allows us to pass a shared reference to the function. That's enough to make a getter for the connection's name, for example. We're not mutating the connection, we're not taking ownership of it, we just want to read some things from it. A shared reference is enough. And you'll notice there's no `const` keyword in that code. We have a `const` keyword in Rust, but it's for globals and compile-time code execution, not immutability. So... exactly like C++.
Despite the absence of a `const` keyword, it is absolutely impossible to modify the connection from that getter. The compiler says we cannot assign to `c.name`, which is behind a shared reference. `c` is a shared reference, so the data it refers to cannot be written to. Even if we add nesting like we did in the Go example, or the JavaScript example, and like we could have done in the C and C++ example, actually, we still cannot assign to the inner field. We can't escape it. The compiler knows everything. It sees through the whole program, all the types, everything is available at compile time. It is able to check what's going on. And it can see that we're trying to mutate something through a shared reference, which is against the rules. There are ways to allow interior mutability in Rust. More on that later. But what if we feel cheeky? What if we want to try to mutate it anyway for a laugh? Like casting away the `const` qualifier in C and C++, like we did. Can we do that? We totally can. Casting a `const` pointer to a `mut` pointer in Rust is allowed, even in safe code. That's totally fine. Using it, however
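To make the ownership and borrowing discussion concrete, here is a small sketch under stated assumptions: the `Connection` struct and its `name` field are hypothetical, and only the function names `close` and `get_conn_name` come from the transcript. It contrasts a function that takes ownership with a getter that only needs a shared reference.

```rust
// Hypothetical connection type; the field is an assumption for illustration.
struct Connection {
    name: String,
}

// Takes ownership: the caller gives up the connection.
fn close(conn: Connection) {
    println!("closing {}", conn.name);
}

// Borrows: a shared reference is enough to read from the connection.
fn get_conn_name(c: &Connection) -> &str {
    // c.name = String::from("other"); // would not compile: `c` is a shared reference
    &c.name
}

fn main() {
    let conn = Connection {
        name: String::from("primary"),
    };
    println!("name: {}", get_conn_name(&conn));
    println!("name: {}", get_conn_name(&conn)); // borrowing twice is fine
    close(conn); // ownership moves into `close`
    // close(conn); // would not compile: `conn` was moved on the line above
}
```

The signatures alone tell the caller what each function may do: `close(conn: Connection)` consumes its argument, while `get_conn_name(c: &Connection)` can only read.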

Segment 4 (15:00 - 20:00)

that's another thing entirely. As soon as we try to assign to one of its fields by dereferencing that `mut` pointer, still with the star operator, the compiler complains. It says raw pointers may be null, point to zero or whatever null is defined to be. They may be dangling. They may point to something that used to be allocated, but is not allocated anymore. Or unaligned, I'll let you look at that one. They can violate aliasing rules and cause data races. All of these are undefined behavior. We can still use them, but we have to acknowledge that we've read the disclaimer and wrap our business in an `unsafe` block. In there, the rules of safe Rust still apply. You still can't magically write past the end of a `Vec`, for example, but you are also allowed some more dangerous things, like playing with raw pointers and of course, calling other unsafe functions.
With our terrible, cheeky idea wrapped in an `unsafe` block, the code compiles and even runs. Or rather, it ran once on a given day, on my machine, on a specific version of Rust, etc. But as soon as I turned on optimizations by adding the `--release` flag to `cargo run`, it broke. And this isn't specifically a Rust error. That message is from the system memory allocator on macOS, which, as we mentioned earlier, keeps track of everything. And it noticed that, well, we passed to its free function something that, in its opinion, was not currently allocated. Maybe it used to be allocated and it's not anymore. Maybe it never was. It was just not currently allocated. To be honest, we're lucky this even ran into an assertion. I was expecting a segmentation fault or a bus error or just silent corruption. And the explanation is simple. We promised we would uphold the invariants. We promised we wouldn't invoke undefined behavior. And yet, we did. So the compiler made some optimizations that would have been perfectly legal if we had held up our part of the bargain. Kaboom. And I'm quoting from the Rustonomicon here.
"Unlike C, undefined behavior is pretty limited in scope in Rust. All the core language cares about is preventing the following things: dereferencing (using the star operator on) dangling or unaligned pointers; breaking the pointer aliasing rules; calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI; causing a data race; executing code compiled with target features that the current thread of execution does not support; and producing invalid values." Just to give you some perspective, there is no complete list of undefined behavior in C. The closest we have is Annex J.2 of the standard, which lists 221 instances of known undefined behavior. So undefined behavior in Rust is more restricted and rarer than in C, which isn't to say that unsafe Rust is easy to write. It still requires being extra careful because all the remaining Rust code relies on that small foundation of unsafe Rust to be correct, and the compiler itself cannot help us with it.
That's where Miri, a tool separate from `rustc`, but still an official Rust project, comes in. If we run it on our program with `cargo miri run`, forcing usage of the nightly toolchain with `+nightly`, it reports that something weird is going on. In fact, it's pointing out exactly what we're doing wrong. We had a shared reference, `&conn`, and we're making an exclusive reference out of it, so we can write to it. That's undefined behavior. Quoting from the reference: "The bytes pointed to by a shared reference, including transitively through other references" (that's important, that takes care of the nesting) "(both shared and mutable) and `Box`es, are immutable." We can fix our code by updating the signature of `get_conn_name` to take an exclusive reference instead, marked by the `mut` keyword, for mutable, which in turn lets us remove one of the casts.
Meanwhile, at the call site in the `main` function, we now have to make the `conn` binding (that's what we call variables) mutable with `let mut`, and we have to borrow it mutably with `&mut` to pass it to `get_conn_name`. This new version of the program runs perfectly well in debug and in release. Even Miri is happy with it. Our code still has an `unsafe` block, but that block is no longer invoking undefined behavior. Through the power of being really careful, and also Miri, our program actually is memory safe. And that's why some people think that the `unsafe` keyword is a misnomer. What's inside of there is not inherently unsafe, it is simply doing things that the compiler cannot check. Something like YOLO, or hold my beer, might have been a better name, but it's a bit late to change it now. It's worth noting that the `unsafe` block is now completely pointless, since we can achieve the exact same thing in safe Rust, like so. It's what exclusive references are made for, mutating things. Unsafe code is harder to write, it's more dangerous, it requires careful human review, verification of it is
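As a rough sketch of the exclusive-reference fix, assuming a hypothetical `Connection` struct with a `name` field and an arbitrary mutation (the video's actual code is not in this transcript), the version with a single remaining cast and an `unsafe` block could look like the first function below. The second function, `get_conn_name_safe`, is a made-up name showing the same effect without `unsafe`.

```rust
// Hypothetical connection type; the field and the mutation are assumptions.
struct Connection {
    name: String,
}

// Takes an exclusive reference, so writing through it is allowed.
fn get_conn_name(c: &mut Connection) -> &str {
    // Only one cast is left: exclusive reference to raw mutable pointer.
    let p: *mut Connection = &mut *c;
    // The pointer comes from a live exclusive reference and nothing else
    // accesses the value while we write, so this block is sound.
    unsafe {
        (*p).name = String::from("renamed");
    }
    &c.name
}

// The same effect in safe Rust: no raw pointer, no `unsafe` block.
fn get_conn_name_safe(c: &mut Connection) -> &str {
    c.name = String::from("renamed");
    &c.name
}

fn main() {
    // The binding must be `mut`, and the borrow must be `&mut`.
    let mut conn = Connection {
        name: String::from("primary"),
    };
    println!("{}", get_conn_name(&mut conn));
    println!("{}", get_conn_name_safe(&mut conn));
}
```

Whether a block like this is free of undefined behavior is exactly the kind of question `cargo +nightly miri run` is meant to help answer.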

Segment 5 (20:00 - 25:00)

still a work in progress. For example, Tokio, the async runtime, does weird stuff. So when it's being verified, it enables different code paths just to let Miri understand what the heck it's doing. In short, if we can write it in safe Rust, then we should. We've seen that taking ownership of some value is helpful to avoid some categories of bugs entirely, like trying to close the same database connection twice. That bug simply cannot happen with this API, no matter how much you try. We've also seen the ideas of borrowing or borrowing mutably, which give us respectively a shared reference or an exclusive reference. There are programs that make sense outside of this constraint, but in Rust, the fundamental idea is aliasing XOR mutability, or AXM. You can either have several references pointing to the same thing, that's aliasing, that's why we call them shared references, or you can mutate the thing, but you can never do both at the same time. This program is silly, it's not doing anything useful, but it's perfectly correct. The compiler has no issues with it. That one, however, is not correct. We cannot borrow `conn` mutably more than once at a time. If there existed multiple mutable references to `conn`, then those could be passed to different threads, which could then result in data races, the same value being modified by multiple threads at the same time, or even written to by one thread and read by another thread at the same time. This could in turn result in memory corruption, which could be exploited by attackers to gain access to sensitive systems. This is not a hypothetical, by the way, just like putting on a safety belt is not superstition. People crash their cars all the time. We know what happens when they do. Similarly, we know how people break into servers. Sometimes it's social engineering, and sometimes it's a buffer overflow. 
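A minimal sketch of aliasing XOR mutability, assuming a hypothetical `Connection` with a `queries` counter and two made-up helpers, `peek` and `bump`:

```rust
// Hypothetical connection type with a counter, for illustration only.
struct Connection {
    queries: u32,
}

// Reading only needs a shared reference.
fn peek(c: &Connection) -> u32 {
    c.queries
}

// Mutating needs an exclusive reference.
fn bump(c: &mut Connection) {
    c.queries += 1;
}

fn main() {
    let mut conn = Connection { queries: 0 };

    // Aliasing: any number of shared references may coexist.
    let a = &conn;
    let b = &conn;
    println!("{} {}", peek(a), peek(b));

    // Mutability: one exclusive reference at a time.
    let m = &mut conn;
    bump(m);
    // let m2 = &mut conn; // together with the next line, this would be rejected:
    // bump(m);            // cannot borrow `conn` as mutable more than once at a time

    println!("{}", peek(&conn));
}
```

The shared references `a` and `b` are no longer used by the time `m` is created, which is why the compiler accepts the sequence.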
You're looking at the infamous November 2022 memo from the National Security Agency about software memory safety, that ends with the NSA advising organizations to move away from C, C++, and assembly to memory safe languages. As you can imagine, this was well received and made no one angry.
What about this program? Is this program okay? That one's a toughie. Although we are technically not borrowing it mutably more than once at a time, the compiler cannot prove that. It simply isn't able to analyze what's going on here. There's not enough information encoded in the source to let the compiler know that actually calling `clear` ends the lifetime of the mutable borrow from earlier. In fact, if it were able to prove that, that means it would have to keep track of all the individual lifetimes of all the elements ever inserted into the `Vec` at compile time, which would severely limit the kind of programs you can write in such a language. That slightly adjusted version would be perfectly fine. If we get the mutable reference back by popping it out of the `Vec`, then we can push it back in as many times as we want. But not the original version that calls `clear`.
The limitations of the borrow checker directly influence the way our code is structured. This is a hard pill to swallow when you come from a language like C++ and you're not used to rejection. A C++ compiler concerns itself with whether the program is well-formed enough to generate machine code from it. Not whether it makes any sense, although that's hard in any language, but also not whether it's memory safe at all. By comparison, the Rust compiler rejects an infinity of programs that are memory safe and useful, but that it simply isn't able to verify as such. There is work being done on the next generation of borrow checking within the Rust project, which I'm excited about.
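The programs discussed here are not in the transcript, so the following is a guess at their shape: a `Vec` of mutable references to a hypothetical `Connection`, with the rejected `clear` variant left in comments and the accepted pop-and-push variant wrapped in a made-up `round_trip` helper.

```rust
// Hypothetical connection type, for illustration only.
struct Connection {
    name: String,
}

// Accepted: take the reference back out of the `Vec`, then push it back in.
fn round_trip(v: &mut Vec<&mut Connection>) {
    if let Some(r) = v.pop() {
        v.push(r);
    }
}

fn main() {
    let mut conn = Connection {
        name: String::from("primary"),
    };
    let mut v: Vec<&mut Connection> = Vec::new();

    v.push(&mut conn);

    // Rejected variant: the compiler cannot tell that `clear` ends the first borrow.
    // v.clear();
    // v.push(&mut conn); // cannot borrow `conn` as mutable more than once at a time

    // Popping and pushing the same reference can be repeated freely.
    round_trip(&mut v);
    round_trip(&mut v);

    println!("{} reference(s) stored, first is {}", v.len(), v[0].name);
}
```

In the accepted variant only one mutable borrow of `conn` is ever created; it just moves between the `Vec` and a local variable.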
But before I leave you, I'd like to point out that when the borrow checker gets in the way, and it does, there are escape hatches without resorting to unsafe Rust. I always see this frickin' myth. Rust developers don't habitually use unsafe Rust to bend the rules in a couple places because that defeats the whole thing. You use unsafe Rust to build low-level primitives and then build safe abstractions on top of that. That's the whole point. That's how it's built. If you need an escape hatch, you can, for example, defer some borrow checking to the runtime through reference-counted cells. In this case, it's an `Rc

Segment 6 (25:00 - 29:00)

For example, the `im` crate provides immutable data structures, and there is a variety of small strings crates that store character data inline whenever possible when the string is small enough. In multi-threaded programs, the `Rc
