Welcome to DevOps and Docker Talk. In this episode, I'm taking a clip from the live stream I did with my co-host Nirmal Mehta of AWS, where I bring him up to speed on something I've spent weeks working on with Eric Smalling of Chainguard, the zero-CVE image security company for Docker images. Eric and I have been working for a couple of weeks on a piece of training. I wouldn't call the problem new, because it isn't a new issue, but it's something I'm calling silent rebuilds, or more accurately, silent upstream base image rebuilds. That's a long name, so I like "silent rebuilds" better; it has a better ring to it. This video is about a problem I really want to call attention to on your path to having the least number of CVEs in production possible, without crazy amounts of work or custom home-built images with little parts stripped out.

The majority of us are using upstream base images from Docker Hub, from GitHub's public image catalog, from AWS's public registry, from Chainguard, or from any one of the companies now providing not just base images but hardened base images. These are all upstream controlled, meaning we don't build them ourselves; we typically rely on the vendor to provide that base image. It's usually a base OS of some sort, whether that's Debian, Alpine, maybe Ubuntu, or Wolfi if you're using Chainguard, which created its own custom base image. Then there's usually something on top: if you're not just using a generic base image, you're probably using something like Python or Node.js, a programming language with its own libraries and dependencies. And maybe there's an application layer on top of that, like a WordPress site, a Ghost blog, or Drupal. Even if you're building your own custom images, that base image is the cause of years and years of us talking about how to reduce image size and, more importantly, CVE count.

CVEs in production are really all that matters at the end of the day for us DevOps people. Something in test or staging is obviously important too, but what really matters is making sure production has the least number of vulnerabilities possible. And even when we're actively thinking about them and trying to reduce the number, we're only able to get our production total, across all the servers, or the entire Kubernetes cluster, or however you define production, down so far. It's really hard and challenging to get zero CVEs everywhere all the time. I've actually never seen it: at all the government agencies I've worked with, all the big companies, the small startups, there's always something. So there's a lot of analysis and work involved.

Regardless of all that: even if the upstream base images you're using today didn't come with CVEs on day one when you first started using them, the longer they sit running on any server, the more CVEs they're likely to have. And that is the topic of this conversation.
I also wanted to call attention to a particular style of CVE sneaking into your images after you've deployed them, because CVEs are discovered all the time for existing code. That's how CVEs happen: we find the vulnerabilities, eventually patch them, and eventually roll out the patch. So there's a strategy to all this, and there are multiple ways to update your images, which are really the production artifact. In this conversation, I'm essentially explaining the work we've been doing to provide you an open source guide to the tooling, basically a prescription for using GitHub Actions, because that's my favorite. (Hey, did you know I'm making a course on GitHub Actions? You can sign up below in the notes; there's a link somewhere in this video.) As part of that, we were making this piece of content around Chainguard images specifically, and I started to imagine a bigger solution to this problem, and we're creating that for you. I don't really have a name for the solution other than a series of GitHub workflows and automations that will ensure, from multiple vectors, that your container images actually have the fewest CVEs you can currently get. That might be in the form of Dependabot or Renovate, and it might be a newer tool from Chainguard called Digestabot.
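To give you a feel for it, Digestabot ships as a GitHub Action, so adopting it is mostly a matter of a scheduled workflow. Here's a minimal sketch; the action name is real (chainguard-dev/digestabot), but I'm treating the version tag and input names as assumptions, so verify them against the project's README before relying on this:

```yaml
# .github/workflows/digestabot.yaml - hedged sketch; verify the inputs against
# https://github.com/chainguard-dev/digestabot before using.
name: digestabot
on:
  schedule:
    - cron: "0 6 * * *" # check once a day for new digests behind pinned tags
  workflow_dispatch: {}  # allow manual runs too
permissions:
  contents: write        # Digestabot pushes a branch with the digest bump
  pull-requests: write   # ...and opens a PR against your default branch
jobs:
  digestabot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: chainguard-dev/digestabot@v1 # version assumed; pin a real release
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
```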
It's not a very popular or well-known project, so I want to call attention to it. We talk about what it is, the problem, and the solution. Let's reduce all of this down and create a strategy people can implement. I hope you enjoy this episode with Nirmal and me talking about silent rebuilds.

Hey man. I'm excited. You and I have been talking about Docker builds, Docker images, security, slim versus Alpine, CVE scanning, all of that, for a decade. Correct. And it still feels like I keep finding new things, things I realize are way more important than I thought, that I haven't been considering and that I should be solving for my clients, my students, and my courses. We all know Docker Hub has official images, over 200 at this point I believe, all very popular open source projects, languages, and frameworks. These things, even the Alpine or slim variants, come with an OS underneath, either Debian or Alpine. And we've had other attempts at container base OS standards over the years. Chainguard created Wolfi, which is essentially their container OS, and we've had them on the show multiple times to talk about what Wolfi is and how it helps them create zero-CVE images. We don't yet have that for Debian, Ubuntu, or Alpine. I'm pretty sure that if I pulled down almost any image running Debian underneath, there's going to be some vulnerability; it's going to be pretty rare that I'm actually at zero. That's just the nature of Linux right now: we can't update, guarantee, fix, and roll out updates of everything all the time. So everyone is always managing somewhere between one and infinity CVEs in production at any given moment, and the goal is to get that number down as tight as possible on a rolling basis.

The reality is, it's all a moment in time. The minute we scan something, and you have to trust a scanner, because they're all different and some are better than others at finding things, you have your security stance at that moment: how many CVEs you have. A minute later, you're out of date. So you do your best. You take snapshots. Your security team maybe scans once a day, or once a week, or once a month, or not at all, but maybe someday they'll scan. Or you scan only in CI and kind of ignore it once it goes into production. There are all these variations, and everybody's trying different things. But when we talk about containers, we all tend to agree: you build a container, you want the smallest image you can get away with. If you can go distroless, great. If you can use something like Chainguard or a paid hardened image where you have support and guarantees, great. But that image is only great on the day you last looked at it, when you last certified it, scanned it, or approved it. Every day since then, it's gotten worse, and the only way you'll know how much worse is to re-scan it. We now have production tools for this: you can use Trivy, an open source tool for scanning for vulnerabilities in images (among other things), and it can now scan a whole Kubernetes cluster.
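For example, Trivy covers both the single-image scan and the cluster-wide view. A quick sketch (the k8s subcommand's flags have changed across Trivy versions, so check trivy k8s --help for yours):

```bash
# Scan one image for known CVEs in its OS packages and language dependencies
trivy image python:3.13-slim

# Scan the workloads running in your current kubecontext; flag names vary by
# Trivy version, so treat this invocation as an approximation
trivy k8s --report summary
```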
So you can get a posture of what your images are throughout the cluster and what the CVE counts are in production, in real time, because that's probably different from what your CI or your local machine is telling you. Again, I think these are all known things; most people running Kubernetes today understand this to at least some degree. Yeah. And they know that if you pin, which everybody tells you to do (don't pin latest, pin your images to Python 3.13.3), there's some determinism. You're trying to achieve some determinism with the versioning, so you can kind of control what's inside your container. Yeah. Imagine a boss coming to their DevOps engineer and saying: I want you to give me strategies so we can implement projects that will reduce the CVE count across our production infrastructure by 90 percent. My stretch goal is 90 percent; my goal is 50 percent. Please give me a strategy. You're going to end up with strategy plans around, obviously, the host OS, and that's a whole other thing; I'm really just going to talk about container images. But that's not enough, because minification reduces the blast radius, yet you still have the application-layer dependencies. If your developers aren't updating the dependencies in
their app, their gems, their jars, then I, as the infrastructure manager, can only do so much. Dependabot and Renovate are largely the automation tools we all use to solve that problem for application developers. For us infrastructure people, it's all about minimal base images; that's the best we can add to that workload. So a lot of people move to Alpine, or they buy Chainguard, or at least they go to slim and learn about slim versus the normal Python or Ruby images. But that only gets you so far, because the minute you move to a slim or minimal image, it's really about how fast you update when a new one comes out. That becomes the next big problem. If you update today and you're on this minimal image, super lean on your slim images, great. But tomorrow your CVE count in production probably went up, or the day after; one of these days, pretty soon, you will have more. So your job becomes: how do I update faster? If that Alpine 3.12 base image, or whatever, is underneath all my images as what we call the base image, then maybe my job is to make sure it's always fresh, always the latest version. Yep. Dependabot and Renovate can also help with that. They can now manage Helm chart updates automatically, Kubernetes manifests, Compose files, Dockerfiles, Kustomize, basically any way you want to deliver a container in infrastructure code. Those two tools will tell you about a new version. Right. So when you go from Python 3.13.0 to Python 3.13.1, presumably that .1 was a CVE fix or a bug fix, maybe not a security bug, but something. Hopefully nothing will break, but something is fixed. So generally, a lot of teams take the stance of pinning to Python 3.13 and having Dependabot or Renovate just kind of run every day (they tend to run only once a day). The best I can get is: the day Alpine releases a new base image, or Ubuntu, or Debian, or the latest Postgres, whatever that base image is, gets updated to a new version, one of those two tools (they do roughly the same thing) will give me a PR, and then it's all about how fast I can click the button and deploy. So you're shortening the window between CVE detected and CVE fixed in production. Right. That's like half of my talk: this is where we were. But the line has moved. It was always moving; I just didn't always know it.

Wait a minute. Because what you just described would be great, even sophisticated, for most organizations today. It is, but it's not the pinnacle. It's not enough. There are things happening that I'm calling silent rebuilds. This has always happened; at some point in every container engineer's career, they realize it's happening, because it's not shouted from the rooftops. Claude and ChatGPT, neither one of them, could find significant discussions of this problem anywhere on the internet, except around one tool from one company: Chainguard made Digestabot. Digestabot is like Dependabot and Renovate, but it only does one thing. Okay, take Python 3.13.1. That image gets rebuilt by the person that owns it, by Docker Hub, and the image doesn't stay the same. Say you were to go and download Python 3.13.1 today.
Mm-hmm. And then you downloaded that image in a month: would it be the identical image? With just that tag, not the SHA, but the most specific tag you can get? Okay, I'm not trying to put pressure on you. No, no, I understand. Let me make sure we explain this properly, because there's a gigantic amount of nuance in what you're about to talk about. This is what happens when we don't prep our show beforehand. Okay. I'm not going to spoil the ending; I think I understand where this is headed, but just to reiterate for our audience: you're talking about going to Docker Hub, and Python is a perfect example, which is why I'm picking on it, because there's so much machine learning going on, and those workloads are insanely sensitive to minor versions of a bunch of libraries. So I've definitely been in the position where I need a very specific version of Python in a container. So I go and I say FROM python... what did you say?
3.13 point something. Yeah, 3.13.3. So that's what I put as my base image, and then I add QEMU, CUDA, and all the other stuff I need into the notebook container image, or whatever it is. So what you're saying is: I go ahead and build my image from that base image. Assume I change something in my application code but I don't change that FROM base image, I don't have it cached in my local registry, and I do a --no-cache rebuild a week later. In theory, it should be the exact same thing except for the application code changes. Actually, let's assume I don't even touch the application code; all of that is the same. You're saying that if I redo that build the next day, or two days later, or a week later, it's non-deterministic. That build is not the same, even though I've done all the best practices to make this as deterministic as possible. That's what you're saying: I will find out there are differences for some reason. Right. If you were to manually hash the image, it would be a different hash. And there's a reason, a very underrated, little-known fact. We know that tags are mutable, and as far as I know, all official Docker Hub images have mutable tags. Docker Hub, as well as Harbor, both give you the option, for your own images, to make tags immutable, which means the minute I push that tag for any image, I can never reuse it for another image build. That's a new option in Docker Hub; you can turn it on for your organization. But official images are mutable. I think that's also in ECR. It's also in ECR. Anyway, keep going. It probably has to be this way, because Python runs on something in that image, whether it's Debian or Wolfi or Alpine, and Python changes at a different speed than the underlying OS, right? And in most cases, the tags don't capture that. Some official images in Docker Hub are tagged so you can see the version of the app that's running, like Postgres, plus the minor version of Alpine underneath it, but not the patch version. With Debian, a lot of times what you'll see is bookworm: that's the codename of the Debian release underneath, and then eventually there's a new version of Debian. For some reason, we don't use the SemVer tag versions of Debian on Docker Hub; we always use the friendly names, the fun names, the codenames, whatever you want to call them. We've had bookworm, and we've had a lot of others. And those two things move at different speeds. So you could try to pin to a tag that includes both the version of Postgres you need and the version of Alpine you want to stick to, because you want to be as deterministic as possible. But even that's not enough, because underneath each of those tags, even pinned to both, it's still mutable, meaning Docker rebuilds the image when the underlying OS part changes, and they don't tell you about it. This isn't a conspiracy; they just don't have a good way. There's no UX for it. The UX is staring at digest lists.
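You can see this for yourself without rebuilding anything: resolve what a tag points to today, log it, and compare later. A minimal sketch using crane from the go-containerregistry project, assuming it's installed (the log file name is just an example):

```bash
#!/usr/bin/env sh
# Record the digest a mutable tag currently resolves to; run this daily
# (cron, CI, etc.). A new line appearing means a silent rebuild happened.
TAG="python:3.13.8"     # any tag you care about
LOG="tag-history.log"   # hypothetical log location
DIGEST=$(crane digest "$TAG")
if ! grep -q "$DIGEST" "$LOG" 2>/dev/null; then
  echo "$(date -u +%Y-%m-%d) $TAG $DIGEST" >> "$LOG"
fi
cat "$LOG"
```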
Right: the UX is looking at SHA hashes and timestamps, and knowing what you last pulled versus what's there today. Okay. Just before the show, and I'll shout out Eric Smalling at Chainguard again, he let me know that their "chain control" tooling keeps an archive tracking every digest for every tag. There's a limit to how far back they track, but they track this. So I can prove the theory. Here's their tool pulling that history. I don't know exactly how they get the data, but my guess is the same way I would: I actually spent the week using Claude Code to build a tool I'm calling Tag Tracker, which does this for me. It runs every day, downloads all my favorite images in all the different variations of their tags, whether that's python:3, python:3.13, or python:3.13.8, it gets them all, and then it tracks the digest. If the digest changes, it makes a log entry. So Chainguard has this tool, and they've been doing this all along. And the python:3.13.8 image, the most specific pinned tag you can have, has had four builds in the time it's existed.

It was built October 7th, October 8th, October 9th, and October 13th. So which one are you running? No, you decided to pull python:3.13.8 in production. You have to look at the digest and compare. Yeah. Now, the good news is that Kubernetes and Swarm, the two orchestrators I care about, are smart enough to at least resolve that tag to the digest. For those not aware, the digest is a content guarantee: if you use the digest, it's in theory nearly impossible to have a collision of that name, so it's a unique, content-addressable string. That was the whole premise and basis for how we build Docker images, how we store them in registries, and how we run them. So the cool thing is that both Kubernetes and Swarm, if you say "give me this version of Python," will at least ensure the exact same digest is on every node. Right. I didn't even think about that, because you could have had a node with this Python app on the October 8th build, and then your EKS cluster or whatever spins up a new node the next day... Yeah, and this used to be able to happen. There was a day when orchestration wasn't always resolving the digest before issuing commands to each node to download an image; each node might have resolved it individually. At some point this has to be resolved inside containerd and dockerd; the low-level tooling always does this, because it has to pull down those tarballs from a source. But somewhere a human types in a version number, and at some point the computer converts it into a digest in everything we use.

So then what happens is, when someone learns this, they go: oh, well, I was told to pin the digests. So they use tools to go into their Dockerfiles and pin to the digest (I actually wrote a blog post on how to do this). The challenge there is that we want human-readable version tags. We don't communicate in digests; I don't say "the prerequisite for this is python at digest blah-blah-f13" or something. That's not how we communicate about versions or what we need, right? Yes. And the way you get around that problem, at least the way I do it, is something some people don't realize: in the FROM line, you can specify the tag and the digest together (see the example below). When you do it this way, the tag becomes useless to the machine; the machine no longer cares about the tag. You could put whatever tag you want in there. The tag is like documentation, a comment, so the humans know which tag was intended, while the digest was resolved by the computer. Now, there's tooling that can give you this SHA hash and then update it when the Python version updates, making sure the hash is the correct one. But here's the catch. Say I'm really cool and I'm running the latest arm image on my servers, and when I created my cluster and deployed, I was on the October 7th digest. A week later, there's a new build of that same tag, and none of the tools we've mentioned so far will catch it on their own. You need to be sure you're on the latest build of every image, even though the tag hasn't changed. You need to be aware of this.
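To make that concrete, here's what that FROM line looks like. The digest below is a made-up placeholder, not a real Python image digest; resolve your own with crane digest or docker buildx imagetools inspect:

```Dockerfile
# Plain tag pinning: looks deterministic, but the tag is mutable upstream.
# FROM python:3.13.8-slim

# Tag plus digest: the engine pulls exactly this digest; the tag is now just
# documentation for humans. Placeholder digest below - replace with the output
# of `crane digest python:3.13.8-slim`.
FROM python:3.13.8-slim@sha256:0000000000000000000000000000000000000000000000000000000000000000
```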
You need to have PRs that are automated for this, and you need to be redeploying these. Okay, a little ad: if you don't know about High Fivers, it's a group of DevOps professionals who all meet once a month on Discord. We have a video call, we complain about our jobs, we talk about DevOps, the problems we're having and the solutions, and we talk about Kubernetes and containers. If you're interested, you can become a member of this channel and pick the High Fivers tier. You can join us once a month for the cost of a cup of coffee, and you get to join and learn; that's the whole point of the group. (I'm thinking of renaming it the Agentic DevOps Guild to make it sound like we're some sort of superheroes.) Anyway, we met yesterday for this month, and I was listening to one of our regulars, Brandon, talking about having this exact problem at work, but without realizing the core problem or the actual best solution. His team was going microservices, and they were going so specific that they were making a microservice per verb, not just per endpoint, but per verb. Wow. Yeah. Carpe diem, I guess.
That's not a wrong decision in the right situations. But what happens is you end up writing code that doesn't need to change very often. You're pinning to whatever, let's say a Golang or Java image, and the code is kind of done: a thousand lines, finished. So it sits there and goes stale on the servers. For lots of us deploying cloud-native apps, the developers are constantly developing, so the images get refreshed pretty quickly in production and never age. But what about your DaemonSets, your Postgres? Yeah, all this other stuff that just sits there. Every day it sits there running, it's getting more CVEs over time. There's no way to avoid it; I don't care what you're running, it's going to have CVEs eventually. So we get back to this core problem: this thing is being rebuilt for the good of the internet, and no one knows it's being rebuilt. That's why I call it silent rebuilds. Interesting. I'm clearly on a soapbox here, because I've known this for five years and haven't cared enough. It wasn't until Eric and I started hanging out and talking about what we were going to create together that I started to care, because I realized I could buy all the Chainguard images, spend however much money I want on every image I could possibly imagine, and deploy them into production on day one with zero CVEs. And on day two, they have CVEs. What do I do then? What's my plan? Yeah. And Chainguard's answer to that is they literally rebuild every day; according to Eric, and I'll put them on the spot, they are just constantly rebuilding. So even if this doesn't seem interesting to you, if you're operating container environments, it's super important, because it should be part of the strategy you started this conversation with. How do you answer that question if you're being goaled on a better security posture, part of which is removing as many CVEs as possible in whatever window of time you're tracking them in? Yeah. And if you want to learn more, because I think you've just barely scratched the surface on this topic with me today, I appreciate that. To me, Digestabot feels like a very useful tool that solves a very specific niche problem, very much Unix-style: solve one problem and do it well. But in our quest to move to zero CVEs, the eternal quest that shall probably never be fully realized... The North Star. Yeah, the North Star. CVE publication is random, CVE fixes are random, and upstream releases happen at random times. So at any minute of any day, there are many things in the works, and eventually a fix arrives at the base image that you are not building, but are using from some base image provider. And that moment starts a race. When I imagine this... I watched the F1 movie this summer, a great Brad Pitt movie. Loved it. You've got to be a Brad Pitt fan, but I think it was the movie of the summer. It was fantastic.
So I'm thinking of a Formula One scenario: the minute my upstream image is rebuilt to remove that CVE, a clock starts. My clock: how fast can I detect it, PR it, and push it to production? For some teams, that never happens. But as the maturity of your team improves, and you get more automated and more sophisticated, and you become aware that some of these mechanisms even exist, you start shortening that window, ideally to the point someday where things aren't polling anymore; everything is webhooked, everything is chained. There's a great video from Chainguard about how they built out their entire internal infrastructure. It's largely based on GitHub Actions, with a little bit of Argo, and they build everything from source. They build everything up deterministically; they can do truly reproducible builds, unlike a lot of what I see out there. It's all very sophisticated, top-tier, expert-level stuff. The rest of us aren't that. And they have competitors; Docker obviously has hardened images too. So there are a lot of smart people, but you and I don't have to be that smart. My point is that they're doing the hard work, but because they're all providing us these updated digests, because they're rebuilding, it's now on us.
And now that I've told you, it's on us: we have the knowledge, so it's our job to detect that difference and do something about it. I've got to draw this diagram. There's the day the tag is invented, say 3.18.9, the epoch. Then there are downstream events: the day a new digest happens for that tag, and the day you actually deploy it. And I don't think that deploy happens in a lot of organizations at all, because there's never an opportunity for them to update the digest of the same tag. We've been yelling from the rooftops "pin to the SHA, pin to the digest," but when you do that, you've actually created a new problem: you're freezing yourself at a moment in time, with a set of binaries that get worse, security-wise, every day after. They're not aging like fine wine; they're aging like spoiled milk. There was good intent behind that advice. Oh yeah, and you still have to do it; removing non-determinism is good intent. Right, but pinning is like half the solution; that's what you're saying. It's an unintentional side effect: when you pin to a digest, there's no opportunity for your existing tools to deploy something newer. A lot of teams, especially Node teams that only use Node in CI to build a front end (they're not actually running Node on a server, they just need JavaScript compilation, CSS minification, and the like), pin to the minor version, or even just the major version, because as long as the upstream open source team obeys SemVer rules, nothing breaks. The problem is when you pin to that thing. And my bet is that Docker, Chainguard, GitHub, and AWS's public images are all doing the same thing Docker Hub originally did, which is: we've got a new version of the underlying OS, let's do the right thing, reduce the CVE count, and update the tags. That is the right thing. But there's very little indication in the UI that it ever happened, right? It's not communicated with the right language about the deterministic, or reproducible, nature of that build. When you look at all the tags, you see dates: built yesterday, built yesterday. But nothing indicates this is the third variant, the third rebuild, of this image. Imagine if every time it rebuilt, it added something to the tag: build one, build two, build three. Yeah, which they just don't do. And maybe they don't need to; that doesn't really solve the problem, it just makes you more aware. When OS package managers rebuild an app, as far as I know, like in apt when curl is changed in some fundamental way, it gets a new version. If you look at apt packages, the version is the curl version plus a suffix, which is kind of like the apt revision (something like curl 7.88.1-10+deb12u5 on Debian 12, where the part after the hyphen is the distro's own revision). At least, that's how I understand it.
So I actually think this is a pretty unique problem to container images, because they include both app code dependencies and OS package dependencies. And because of those two, you could be the most secure person in the world, and if you don't do this particular step, you're still going to have more CVEs in production than you should. It's actually very little work to fix; knowing about it is, I think, the actual problem for everyone. Well, thank you so much, Bret. Now I know what silent rebuilds are. Silent rebuilds; we're going to see if the name sticks. I even made a little graphic. Ooh, well, now it's for sure going to be a thing. I tried to get ChatGPT to make the image, and it kept getting it wrong, so I had to go manual and use Excalidraw. Humans are still valuable. Like a true engineer, I used a basic diagramming tool to create a logo.

Okay, that was the stream we did, but since then I've had some updates I want to tell you about real quick. First, I've released a video walkthrough, a repo, and a long blog post breaking down the problem, with a detailed series of solutions using Renovate, Dependabot, or Digestabot to ensure you're getting PRs for all the silent rebuilds. Second, I did more research and confirmed again that the official Docker Hub images are rebuilt to a different digest, for the same image tag, on a random schedule.
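For the Renovate route, digest pinning is basically one setting. Here's a minimal renovate.json sketch, assuming Renovate's pinDigests option applied to Docker datasources (check the Renovate docs for the current schema):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    {
      "matchDatasources": ["docker"],
      "pinDigests": true
    }
  ]
}
```

With that in place, Renovate rewrites FROM lines to the tag@digest form and then opens a PR whenever the digest behind the pinned tag changes, which is exactly the silent-rebuild case.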
Remember, official images of open source software are usually maintained by volunteers in open source repos, and each official image has its own repo dedicated to the Dockerfile that builds and pushes the image to Docker Hub. Those images are silently rebuilt, typically, like any GitHub Actions workflow, whenever there's a commit to the release branches. Those commits might be something that reduces CVEs, or just an update to a README file, or any other reason you might make a commit to a repo. The real point here is that we don't know why these images are silently rebuilt, and there are no tools currently to help with that, which means we have to assume every rebuild of an image tag with a new digest is for a good reason, and that we should deploy it with the same mindset as updates that do change the SemVer tag. We just don't know why, unlike with SemVer changes. Again, the solution isn't hard: implement Renovate or Dependabot with the right settings, which I show off in the repo (a minimal Dependabot example follows below). Both will ensure your base images use digest pinning, and both will check, every time they run (daily is recommended), for silent rebuilds of a tag to a different digest. I like these options because they bring your container dependency updates into the same tool and process you use for application-level dependency updates. In my example repo for this solution, I have the settings you need to add to each repo, plus a bunch of sample PRs showing the various ways these tools check for image updates, including silent rebuilds. All the resources are in the show notes, and if you have questions, feel free to start a discussion in the repo or in my Discord server. Thanks for watching; I'll see you in the next episode.
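For reference, the Dependabot half of that setup is only a few lines in .github/dependabot.yml. A minimal sketch; the example repo mentioned above has the complete settings:

```yaml
# .github/dependabot.yml - watch Dockerfile FROM lines, including digest pins,
# and open a PR when the tag or the digest behind it changes.
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"        # where the Dockerfile lives
    schedule:
      interval: "daily"   # daily checks catch silent rebuilds quickly
```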