# Building a Minimal, Rootless Container in Rust—by Carlo Quick—Seattle Rust User Group, March 2026

## Metadata

- **Channel:** Rust Programming Language
- **YouTube:** https://www.youtube.com/watch?v=scwI-xrawP8
- **Date:** 13 April 2026
- **Duration:** 38:40
- **Views:** 5,333

## Description

Building a Minimal, Rootless Container in Rust

A ground-up walkthrough of Linux container fundamentals in Rust: namespace isolation, rootless execution, and filesystem isolation using the nix crate and the Linux kernel. Inspired by Bento, a from-scratch container runtime in Rust built to understand how container runtimes work under the hood. Built live at the Seattle Rust User Group, March 2026.

Links:
🦀 Talk Code: https://github.com/CarloQuick/container-talk
🍱 Bento: https://github.com/CarloQuick/bento
💼 LinkedIn: https://linkedin.com/in/carloquick
📊 Slides: https://docs.google.com/presentation/d/1_7EwcKPybzNC_Qpz80lYa7p-9LG-ivGlpFoCjNTrVe0

Seattle Rust User Group:
• https://www.meetup.com/Seattle-Rust-Meetup/
• https://discord.gg/4pDnjgaEV6

## Contents

### [0:00](https://www.youtube.com/watch?v=scwI-xrawP8) Segment 1 (00:00 - 05:00)

My name is Carlo Quick. I'm a software engineer in Tokyo, Japan. And today we'll be building a minimal rootless container in Rust. Howdy. Okay. Today's talk is inspired by my personal project, Bento, which is a container runtime in Rust. And even though I'm not talking about my personal project today, by building a minimal rootless container in Rust, I'm giving you a small glimpse into my journey into containerization and the Linux kernel. I'll show a little bit of Bento at the end, but that's for later.

Okay, so the goals I have for today's talk are to cover container fundamentals, give you the Rust tools to build your own, and give you a sense of why Rust makes working with the Linux kernel more approachable. Okay?

So, when we talk about containers, we talk about isolation, right? And to get a feel for what containers are, we have to compare them to something a bit more well-known: virtual machines. Virtual machines are machine-level isolation, right? We have your hardware, then a hypervisor, which is going to chunk up CPU, memory, and other stuff. And then we're going to have entire guest operating systems, right? So, that's great isolation, but it's also a little heavy, because you have multiple guest operating systems on one machine. So, again, virtual machines are machine-level isolation.

Then you have containers. Containers are process-level isolation. You have your hardware, you have your host operating system and kernel, a container runtime, which is going to manage the container life cycles, and you have your applications. As you can tell, there are no guest operating systems. That's because the containers share the same host operating system, which uses the kernel to create and maintain these containers.
So, again, virtual machines are machine-level isolation, and containers are process-level isolation. Whoops. So, how does the Linux kernel allow us to do that? Well, we have namespaces and cgroups, fundamental components of processes. A namespace controls what a process can see. Okay? And cgroups control what it can use. So, think of it this way. Let's zoom out. All right, there we go. A namespace is going to give a process its own view of things like PIDs, hostnames, and file systems, and cgroups are going to keep it from eating all your resources. Okay?

So, in today's talk we're only going to focus on namespace isolation. Cgroups are very, very important, but that's a whole different talk. So, just today, namespace isolation. Each process has its own namespace subdirectory, where we can see namespaces for its PID, network, mount, UTS, IPC, user, and cgroup. And you can tell here from my image, if you can see it on yours, because I can't see it up here: I start a sleep process, I go in and inspect its ns subdirectory, and we see all the available namespaces for that process. We'll be using these to isolate our process so it becomes a container.

Now, we're talking about rootless containers. So, what's the root problem? Well, traditionally, container runtimes will create and run your containers with root-level access. And we are building containers that have a restricted view of the system. So, then we think, okay, what privilege does a process actually have when we create it with these traditional systems? Take for example, I have an image. No, there's no image. All right. Oh, here it is. You can't see it, sorry. So, by default, without any special configuration, Docker Engine, for example, will create your containers using root-level access. And it's really tiny.
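The per-process namespace directory described above can be inspected from Rust with the standard library alone. The following is a minimal sketch, not code from the talk: it assumes the Linux layout `/proc/<pid>/ns`, where each entry is a symlink whose target looks like `pid:[4026531836]`, and the function names `parse_ns_link` and `list_namespaces` are made up for illustration.

```rust
use std::fs;
use std::io;

/// Split a namespace link target such as "pid:[4026531836]" into its
/// kind ("pid") and inode number (4026531836).
fn parse_ns_link(target: &str) -> Option<(String, u64)> {
    let (kind, rest) = target.split_once(":[")?;
    let inode = rest.strip_suffix(']')?.parse::<u64>().ok()?;
    Some((kind.to_string(), inode))
}

/// List the namespaces of a process by reading the symlinks in
/// /proc/<pid>/ns. Pass "self" to inspect the current process.
fn list_namespaces(pid: &str) -> io::Result<Vec<(String, u64)>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(format!("/proc/{pid}/ns"))? {
        let target = fs::read_link(entry?.path())?;
        if let Some(parsed) = parse_ns_link(&target.to_string_lossy()) {
            found.push(parsed);
        }
    }
    Ok(found)
}
```

Calling `list_namespaces("self")` on a Linux machine should return one entry per namespace link. Two processes that report the same inode number for a kind share that namespace, which is a handy way to check whether an `unshare` actually took effect.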
If you have the slides in front of you, you can see it, but I create a container without any configuration, a busybox container. I check within the container: who am I? I

### [5:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=300s) Segment 2 (05:00 - 10:00)

am root. And that's root-level access outside the container, right? That sleep process I have up here is running as root, right? Why is that a big deal? Well, because if this process were to break isolation, then it would be running on my machine as root, which, of course, we don't really want to happen, right?

So, rootless containers. Rootless containers run as an unprivileged user. No root required. So, if your process escapes isolation, it's still you on the host. Okay? Smaller blast radius, safer by default.

Okay, so why Rust? Rust doesn't invent new container primitives. It doesn't replace Linux syscalls. So, why are we talking about this in Rust? In my opinion, Rust gives you compiler-enforced honesty. Fallible operations return a Result, right? Unsafe operations are explicitly marked. The boundary between safe and dangerous is visible in code, right? The language doesn't let you gloss over the difficulties of working with the Linux kernel. Can you zoom in? See that? Okay.

So, we'll be using the fork syscall later in the build. It's going to create a new process, a copy of the current one, and that's called the child process. So, in C, calling fork is just calling fork, right? Nothing special up there. Then you have your switch case, right? And you actually have to explicitly handle the error, the success, or the default, right? What if you forgot as a developer to do the negative one case, right? Could be pretty bad. But in Rust, the nix crate, which you're going to see in a little bit, marks it explicitly unsafe, saying, "Hey, this is risky. You now assume the responsibility of calling fork." And you know from Rust, if you didn't handle your error case, if it requires one, it won't compile. So, Rust is allowing us to work with the Linux kernel and making it way more approachable, right? And it's more readable to me. But, you know, that's just my opinion.
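To make the C-versus-Rust contrast concrete without pulling in any crate, here is a small model of the idea. It is a sketch under stated assumptions, not the nix crate's code: `ForkOutcome`, `ForkFailed`, `classify_fork_return`, and `describe` are hypothetical names. In nix itself, `fork` is an `unsafe fn` that returns a `Result<ForkResult>`; the model below only shows how turning C's raw integer (`-1`, `0`, or a child PID) into a `Result` makes the error case impossible to overlook.

```rust
/// Hypothetical stand-in for the outcome of a successful fork.
#[derive(Debug, PartialEq)]
enum ForkOutcome {
    /// We are the parent; `child` is the new process's PID.
    Parent { child: i32 },
    /// We are the newly created child.
    Child,
}

/// Hypothetical error type: the raw call returned a negative value.
#[derive(Debug, PartialEq)]
struct ForkFailed;

/// Translate the raw integer that C's fork() returns into a Result.
fn classify_fork_return(raw: i32) -> Result<ForkOutcome, ForkFailed> {
    match raw {
        r if r < 0 => Err(ForkFailed),
        0 => Ok(ForkOutcome::Child),
        child => Ok(ForkOutcome::Parent { child }),
    }
}

/// A caller cannot reach the value without going through the Result,
/// and this match must cover the Err arm to compile.
fn describe(raw: i32) -> String {
    match classify_fork_return(raw) {
        Ok(ForkOutcome::Parent { child }) => format!("parent of {child}"),
        Ok(ForkOutcome::Child) => "child".to_string(),
        Err(ForkFailed) => "fork failed".to_string(),
    }
}
```

Removing the `Err(ForkFailed)` arm from `describe` makes the `match` non-exhaustive, so the program is rejected at compile time. That is the "forgot the negative one case" mistake from the C example, caught by the compiler.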
So, today's tools that we're going to work with: just one crate besides the anyhow crate. We're going to work with the nix crate, which is short for Unix. It wraps low-level libc calls into more idiomatic Rust. So, for example, you see up here getting a hostname in libc versus nix. It's more idiomatic. It just returns a Result of an OsString, and it's just better. Right? So, again, the kernel is still the kernel, but the boundary is more visible. And you've got to handle it.

So, finally, let's start building. This is just for the slides. You have the slides already, but I'm also going to give you a GitHub link later with the actual code in it if you want to try it yourself. There are some things you have to do to make sure it works, like AppArmor allowing unprivileged users to manipulate the user namespace. Two dependencies: anyhow and the nix crate. Let's do this.

All right. So, I'm using presenterm to do this. It's actually running rust-script in the back and actually running the code that we have up here on our slides. So, in the back you can see some stuff printing out. That's what it is. Don't think I'm just printing stuff and saying it works. There's actually a print function that's going to print out what we're doing behind the scenes. Just for simplicity, I'm not going to include it, but just know it's there. Print process information.

Here's our road map for today. One, we want a rootless container. Two, we're going to isolate the hostname. Three, we're going to get PID one, so it's going to think it is the machine, that it's the first PID in the entire thing. Then we're going to isolate the

### [10:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=600s) Segment 3 (10:00 - 15:00)

root file system and think it's actually at root.

So, I'm going to go ahead and do a process baseline. Let's run this. All right, cool. So, a small snapshot just on my machine: UID 1000, hostname, current PID, and then current working directory. Right? So, all four of those things should change to make ourselves a rootless container.

Let's try and do something containery. Right? Let's just try and set the hostname. Let's change it to my container. No permissions. Right? I can't. I'm not running sudo. So, how do we fix that? I could run sudo. I could run as root and change the hostname, but then we won't have a rootless container. We're not going to run this as root.

So, the first thing we're going to understand is unshare. Unshare is going to disassociate parts of our process's execution context. It takes a flags argument, the clone flags, and we specify which parts of that process we want to have unshared. Today we'll only be using these four clone flags: a new user namespace, a new UTS namespace, a new PID namespace, and a new mount namespace. Okay. Here we go.

So, first we're going to print our process information, see what it is. We're going to unshare the new UTS namespace and hopefully get the ability to set a new hostname for this process. Still can't do it. We still don't have the privileges to do so.

So, now we're going to go rootless. There's a lot of stuff, but to go rootless we're going to rewrite the process's UID and GID maps. Right? We're going to get our current UID and our current GID, and we're going to map them. I need to back up. Yeah. Pretty much all this is saying is that we're going to let the kernel know that, hey, we're going to run this as root, but here's my UID, so you know who I am when I try and run it. Right?
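The UID and GID mapping step can be sketched with the standard library, since the maps are plain files under `/proc/self`. This is an illustration rather than the talk's code: the function names are hypothetical, the host IDs are passed in as parameters (the talk reads them from the running process), and the sketch assumes the process has already unshared its user namespace, which the talk does through the nix crate's `unshare`.

```rust
use std::fs;
use std::io;

/// One line of a uid_map/gid_map file: "<id inside> <id outside> <count>".
fn id_map_line(inside: u32, outside: u32, count: u32) -> String {
    format!("{inside} {outside} {count}")
}

/// Map root (ID 0) inside the new user namespace to the given host IDs.
/// Only meaningful after the process has unshared its user namespace.
fn write_rootless_maps(host_uid: u32, host_gid: u32) -> io::Result<()> {
    fs::write("/proc/self/uid_map", id_map_line(0, host_uid, 1))?;
    // The kernel rejects an unprivileged gid_map write until
    // setgroups has been set to "deny" for this namespace.
    fs::write("/proc/self/setgroups", "deny")?;
    fs::write("/proc/self/gid_map", id_map_line(0, host_gid, 1))?;
    Ok(())
}
```

For the baseline shown in the talk (UID 1000 on the host), the UID map line is `0 1000 1`: ID 0 inside the namespace corresponds to ID 1000 outside, for a range of exactly one ID. The process then sees itself as root while the host still sees the original user.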
So, we're going to go ahead, we're going to set the new UID map, and we're going to do the GID map. Also, I'm going to write deny to setgroups, which is required by the kernel before it accepts a GID map, to prevent any sort of privilege escalation. So, we're going to go ahead and write that. So, here we go. We're going to unshare our new user namespace, then write our mappings, and then hopefully set that new container name. Yes. We got it. Right?

So, UID zero: I'm root within the container. I set the hostname. So, now instead of into the world, it is called my container. And we still have to worry about the PID and the current working directory. So, we got a little bit of isolation. Let's make sure and check that the hostname on my machine has not changed. Nope, it's the same. Right? So, now we're rootless, and we were able to set the hostname for that process. So, two birds with one stone. Whoa. So, we're rootless, we isolated our namespace. Now we've got to go PID one and isolate our root file system.

Any questions so far, before I go on? That's like the halfway point of the build. Questions? All right. What Linux build? Linux. So, Ubuntu. Which operating system does this work out of? Or server, desktop? Linux kernel. Yeah, please repeat the question. Oh, sorry. So, which operating system would be the Linux? So, we're using the Linux kernel because these things are available only on the Linux kernel. All right. Okay.

So, let's do the same thing. Now we're going to do our PID namespace. So, same stuff as before, but now we're going to unshare our PID namespace. It's not going to work, just a little heads up, because when you unshare a PID

### [15:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=900s) Segment 4 (15:00 - 20:00)

namespace, the next forked process will get that new PID one that we're hoping for. So, we're just going to do it anyway, just to see that it doesn't work. Yep, same PID as before. Right?

So, let's go ahead and revisit that fork from earlier. So, again, we do our fork, it's unsafe, we handle our error, and then we will have our new process that has PID one. So, a few more things we're going to add to our code: a child function, and we're going to move our unshare of the UTS namespace into that child function. And then in main, below unsharing our user namespace and PID namespace, we are going to have our fork. The parent is going to wait for the child to finish before it exits, so main is going to wait for the child. The child is going to unshare and set our hostname. Let's go ahead and run it. Oh, sorry, next one. All right, cool. So, that's all our code so far. Right? Rootless, PID namespace, so the next forked process is going to be PID one. We're going to go ahead and do all that stuff to see how it works. Nice. So, we went from UID 1000 to zero, so we're rootless. Hostname is now my container. PID is one.

The last piece of the puzzle is that current working directory. We need to mount a new root file system. Because so far, by doing this, we've been using my own machine's root file system: ls, my /bin, all that stuff on my actual machine. So, we now want to give this container its own root file system. Right?

So, here I'm just giving you a little extra code. We're actually going to inspect what the child can see. So, I'm going to pass it some arguments, and I'm going to close those arguments. Exec is going to replace the current process image with whatever that is. So, in this case, ls. Is that my mouse there? Yeah, that's good. I'm going to go ahead and run it and see what this process can see. All right, so this process can see everything. We don't want that. Right?
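The exec step above replaces the child's process image with another program. The talk's own code is not shown in the transcript; as a stand-in, this sketch uses the standard library's `CommandExt::exec`, which has the same replace-or-fail behavior. The helper name `exec_replace` is hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Replace the current process image with `program`.
/// On success this never returns; the only thing that can come back
/// is the error explaining why the exec did not happen.
fn exec_replace(program: &str, args: &[&str]) -> io::Error {
    Command::new(program).args(args).exec()
}
```

In the forked child, a call such as `exec_replace("ls", &["-la"])` would turn the child into `ls`, which then runs with whatever namespaces and root file system the child had set up before the call. Because nothing after a successful exec executes, all isolation steps have to come first.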
It can see everything within my presentation, my assets, everything in my build, my entire presentation. It sees everything. But we want it to be containerized. We just want it to see whatever we want this container to see. So, we want to get rid of this. We want to change, of course, the root file system. I have a minimal one here on my machine under talks rootfs. It's a very minimal Linux root file system. It's BusyBox. So, now we're going to tell this container, hey, this is now your root file system. Don't look at mine. Look at this one. Okay?

Now our child function is a little beefier, right? Here is some stuff giving it the new root file system. Right? So, the child is going to do the same thing, except now we're going to actually mount a file system. Right? We have our new UTS namespace. We're going to isolate it. Then we're going to unshare a new mount namespace, and we're going to give it the root file system location for the container. It's going to be a bind mount. And then we're going to change this process's, or this container's, root directory. We're going to say, "Hey, your root directory is now where I say it is." In this container directory, where we are mounting our root file system you saw just a second ago, the BusyBox one. And it doesn't have a current working directory anymore, so we have to give it one. You're at root. Okay?

And then we should have, hopefully, the finish line. Right? We go rootless at the beginning of main. Isolate the PID namespace. Fork, give it ls, set a new hostname, and then unshare the mount namespace. Lovely. We have a container. Yay.
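The last two steps of the child function, changing the root directory and then giving the process a working directory again, can be sketched with the standard library. This is an illustration under assumptions: the transcript does not say which call the talk uses to change the root, so the sketch assumes `chroot`, and the name `enter_new_root` is hypothetical. The bind mount that precedes these steps is not shown here because it needs a crate such as nix.

```rust
use std::env;
use std::io;
use std::os::unix::fs::chroot;
use std::path::Path;

/// Make `new_root` the process's root directory, then move the current
/// working directory to "/" inside it, since the old working directory
/// lies outside the new root.
fn enter_new_root(new_root: &Path) -> io::Result<()> {
    chroot(new_root)?;
    env::set_current_dir("/")
}
```

Both calls return an `io::Result`, so a missing rootfs directory or missing privileges surface as an error rather than a silently unconfined process. Inside the user namespace set up earlier, the process is root from the kernel's point of view for that namespace, which is what lets an unprivileged user get this far.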

### [20:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=1200s) Segment 5 (20:00 - 25:00)

Right? So, let's run this. New hostname, PID one, and it thinks the current working directory is root. And its root has that root file system that we just shared with it from BusyBox.

What? Do you know, can you drop the write permission? Or what you had, your file system, you know, there's like a way to, I know you can drop like sys caps and like. Yeah. Do you drop writes? Oh, can you drop write permissions? So, what you're going to want to do later, when you get more advanced, is an overlay file system, where you give it a lower one that says, "Hey, this is read-only." That's going to be the rootfs that we just saw. And for everything else we can give it a work directory, and we're going to give it a lower directory. We're actually going to mount stuff from the image and stuff like that. So, yeah, we will. We'll have to indicate that.

Did that, how did I say that, you get if that had BusyBox? Yes, exactly. So, the ls that executed, which ls was that? Was that on my machine, like my root file system, or was that the BusyBox one we gave it? That's the BusyBox one. So, when I ran ls, this is what it saw. Is it /bin/ls? Is that where it gets it from? Right? So, yeah. That's the one from the new root file system we just gave it.

So, we have a container. Right? Not the greatest container, but hey, it's a container. It's isolated. Right? Yay. Woo. Hey.

So, we built it in Rust. If you saw the whole thing, it's less than 100 lines of code to do it. A lot of that is just the process printing function I have there. But we did it. We built a container from scratch in Rust. It's rootless. Right? It has an isolated hostname, PID one, and its own isolated root file system. Linux namespaces, fork, and Rust. That's all we need, those three things. Right? So, let's bring you to Bento real fast.
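The overlay file system mentioned in the answer above combines a read-only lower directory with a writable upper directory and a work directory. As a small illustration, and not Bento's code, the sketch below builds the option string that Linux overlayfs expects in its mount data (`lowerdir`, `upperdir`, `workdir`). The function name and the example paths in the note are hypothetical.

```rust
use std::path::Path;

/// Build the mount data string for an overlayfs mount:
/// `lower` is the read-only layer (for example an image rootfs),
/// `upper` receives all writes, and `work` is scratch space that
/// overlayfs requires on the same file system as `upper`.
fn overlay_options(lower: &Path, upper: &Path, work: &Path) -> String {
    format!(
        "lowerdir={},upperdir={},workdir={}",
        lower.display(),
        upper.display(),
        work.display()
    )
}
```

With a lower directory of `/img/rootfs`, an upper of `/c1/upper`, and a work directory of `/c1/work`, the result is `lowerdir=/img/rootfs,upperdir=/c1/upper,workdir=/c1/work`. Writes made inside the container land in the upper directory, so the image's rootfs stays untouched, which is the copy-on-write behavior the question was after.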
Just a shameless plug of my own project. Bento takes OCI spec images from Docker Hub, sets up the namespaces, and implements an overlay file system with copy-on-write layers. Let me go ahead and show it really quickly. It's super fast. We've got to pause it throughout.

So, my commands: create a busybox container with the image busybox. Come on, work for me. All right. So, create the container. Come on. What happened here? Come on, BL seat. Check its status. [clears throat] Hey. Created, no PID yet. I'm going to go ahead and start it behind the scenes. So, behind the scenes, right? Who am I? Me. PID, current working directory. I started it behind the scenes, so my status is now running. Here's the PID. And I'm going to exec into it and give it the command sh. Running as root. Check out what it actually sees. Today? Okay. So, it's root. PID is two. And its current working directory is at the root level. And I can kill containers and remove containers in Bento. Isolated run. I slow it down. Goes too fast. I'm not sure what I'm supposed to do, but it'll do it. Cool. Today. Okay. There we go. It's completed. And that's it. And we did it.

Yes. Any surprises as far as networking, device pass-through, sys, proc file system? Sorry. Deep crash. So, any surprises with network pass-through, sys. Networking. Device pass-through. Sys, etc. Proc

### [25:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=1500s) Segment 6 (25:00 - 30:00)

sys, etc. There were lots of surprises. Lots and lots of them. I'm not sure how to answer that. But I'm just wondering, like, if there were a particular thing I wanted to. Mhm. Are there things I can't do right now, the way your Bento is? So, yeah. You're asking what limitations there are to Bento in its current state. The biggest one is that it only controls namespace isolation, so I haven't gotten to cgroups yet. So, it can kind of run wild on your machine a little bit right now. It's only an educational project. I don't actually have a slide for what I'm going to be implementing. Oops. Next.

So, my future plans are for cgroups. I don't have network namespaces yet. And I want to actually split Bento into two crates. I want to have bentod, which is going to be the daemon, and then Bento itself to be the container runtime. And I don't have them yet, but pseudo-terminals. So, that's the one thing. For example, if I were to create a Bento container with sh as its first command, it kind of has an error because there's no pseudo-terminal. So, I have to actually close that terminal and exec into that container to be able to do anything with it. So, yes, there are limitations. It's only an educational project at this point. But yeah.

Yes. So, I work somewhere and we're using Bento. Something called Bento. Oh, really? Yeah, so it's a reasonably large company with AI. Uh-huh. Buddy, and. Oops. And I'm trying to. Yeah. It's not my Bento, I don't think. I only have 23 stars, so. That'd be cool if it was. So, if anyone, holler at me, but. Okay. I searched that name pretty thoroughly, so maybe someone does have it. I don't know. So, you didn't tell project. That'd be, I don't know. Yeah. I don't know which I put, but. Some guy I wrote it for free or. Okay. I don't know.
But yeah, that's Bento, here's my repo. Yes. Do you know what happened to you? Confusing surely volume out. And then native operating system file system to the container. Mhm. Wrote the new file. What UID would that show up on? Good question. Yeah, so if you were to create a new file, you're saying. Yeah, essentially, what is that mapping from container UID to native operating system? Yeah, so if you were to create a new file under the new UID, like in the container as root, what's that mapping? That's a very good question. Yeah, and then for a volume mount. Mhm. That's a good question. I don't know how that would be handled.

Let me start over again. So, I guess my question was: the interesting problem at work, and I imagine other places run into it, is you have a dev container for your repo, and then you start the container and volume mount the repo in the container. Uh-huh. For particular reasons at my work, we're running the Docker containers with superuser. And then I thought it'd be cool if inside the Docker container it was superuser and outside it was just a normal user. And it'd be cool if there were some sort of mapping that would just make that all transparent. So, outside of it, files are handled by a normal user, and inside of it, files are touched and handled by superuser. Just a thought. Okay. Really cool project here, so thank you.

I do know, for example, like you said, that you can run containers as root. Docker Engine takes just a few super simple configuration steps to make them run as rootless containers

### [30:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=1800s) Segment 7 (30:00 - 35:00)

but it's not the default out of the box. You still have to go in, understand what rootless is, and set it up so you don't have to actually run them with sudo. Yeah, well, to add more to it, maybe it's just a matter of I need to look into this, but the application is just running in minikube and then launched by Helm, and it's this whole stack of things that I'm honestly not an expert at. But anyway, I thought I'd bring it up because I think your project is really cool, because out of the box it's trying to advertise this layer of safety, which I hadn't really thought of in too much depth. Yeah, thank you. Thank you.

Yep, so the just so you need to hit the IDs of users and groups I wrote trust away. Yeah, it's a questionable too. There's another library called PRoot that does the same thing, and so it lets you start as non-root but inside simulate groups and so on for user management, basically anything you can save as a text file, and so when you go back in there it still keeps user IDs. So everything still running as non-root thinks it's running as root. So it still saves those user IDs inside of that.

And also, just so folks know, user IDs and group IDs are manually settable regardless of whether the group or the user ID exists on the system. What we do in CI is we have a specific one, I think it's like 65,000 or something like that, and we just set everything inside the container to be owned by 65,000, and we use that everywhere, and we statically set it in all of our images, so therefore it all just kind of works. So yeah, you don't even have to go through mapping. There is ID mapping that happens in a container, like root in the container is not root in the operating system. There's a map there. If you check out Sysdig, which is free to install, sysdig is the command, it does this whole container jumping between cgroups and namespaces and all that.
But anyway, long story short, you can statically set that somewhere as a first try, and you can CI, and then it'll just be the same in all of your containers. Very cool. Thank you. Thank you, Alex.

Okay. I actually do have a little bit of extra time. I had something else I wanted to do really fast. I have time, correct? Okay. I want to have a quiz. Who's this person? Okay, start that. So, let's see. Cell phones or laptops. Go ahead and start it. There we go. So, either go to kahoot.it, or I think I'm also going to give you a QR code. So, what's at stake here? Big stakes. I'll let you guys do that first before I walk over there. Nice. The number one question answerer gets a shrug sticker, the holographic one, a laptop sticker, the shiny one. And then number two and three get shameful non-shiny shrug stickers. Wow, right? That's what's at stake: laptop stickers. I'm going to go maybe 30 more seconds. Anyone need time at all to get in? Anyone need time? 10 seconds. Six, five, three, two, one. Can I start? All right, here we go. Oh, yeah? All right, here it is. Here we go. We're starting. Which one is it, percent?

All right, first question. Containers provide which type of isolation? Emotional isolation, machine-level, process-level, C-level, sandwich-level isolation. I'm going to put on the two. All right, please pick it. I wonder which

### [35:00](https://www.youtube.com/watch?v=scwI-xrawP8&t=2100s) Segment 8 (35:00 - 38:00)

which answer kids. I'm going for sandwich level. Two. I'm going to give you back this. Nice. I pick that, two for sandwich level. It's opinion-based, I guess. All right, yes, process-level isolation. Containers provide process-level isolation.

Next question. Also, I think Wi-Fi speed is the differentiator here too. Namespaces control what a process can: pancakes, use, hear, or see. It's weird. Yeah, definitely. See. Nice. I was also pretty hungry when I made this quiz. Pancakes the same with this sound. Two be fun. Cool. That's moment go bad go bad.

Next question. Compared to C, how does Rust handle risky syscalls like fork? It prevents you entirely from calling them. It allows you to eat more chicken. It forces explicit error handling via Result and match. It replaces them with safer alternatives. Yeah, again, I was pretty hungry. It forces explicit error handling via Result and match. Boom.

What type does fork return? Result of ForkResult, Option of Pid, i32, unsafe, IDK. Hope you were paying attention to my code up there. Yeah. Boom. Who is this Alex? Alex is cooking.

What makes Rust approachable for Linux kernel work? Safer syscalls than C. Dangerous boundaries are more visible in the type system. It hides unsafe operations behind bars. It replaces the kernel's C interfaces with Rust ones. It makes dangerous boundaries invisible in the type system. Cool. One.

Who gets the shiny? Not shrugs. Oh, you got the non-shiny. Jason, non-shiny. Who gets the shiny one? Alex. Very cool. I'll get you guys during the break, by the way. That's it. Thank you very much. Thank you, Carlo. Round of applause, because, yeah.

Okay, let's take

---
*Source: https://ekstraktznaniy.ru/video/51624*