An undergrad accidentally beat a Turing Award winner. A dog outperformed IBM's quantum computer. And Valve's video game code is now running on Meta's data centers. These are some of the wildest computer science papers from 2025. This guy is going to help me with Shor's algorithm over here. He just doesn't know it yet.

All right, first 2025 paper. Some of the best discoveries are accidental, which makes this one of my favorites. Picture this: you're an undergraduate computer science student working on a research proposal, and during your research process, you decide to implement a hash table. No big deal. You create a new one in a way that makes sense to you. Then you present your research to your professor. Your professor takes your research, reads it, and just drops the paper to the ground. And you're like, "Oh my gosh, I'm about to fail this paper." But instead he says, "Oh my gosh, you just beat a Turing Award winner."

And this is actually a lot easier than it sounds. Hash tables are kind of like driving through a parking lot. When the lot is 50% full, it's super simple: you drive in, park right away, you're good to go. But everybody also knows that once you hit about 99% capacity, it gets really difficult to find a free spot, and you're driving around and around until you find that one last available space. In computer science terms, this is called open addressing without reordering. And Andrew Yao, a Turing Award winner, conjectured that the best you can possibly do is O(1/δ), where δ is the fraction of the table that's still empty. Krapivin, the undergrad student who unintentionally came up with this optimized solution, achieved O(log²(1/δ)). And if you can't just magically picture that in your mind, don't worry, just look at this graph. The old Turing-Award-winning bound is in red, and the new undergraduate solution is in green. That's really, really cool.

Going back to our parking lot analogy, though, this is kind of like taking all the parking spots and separating them into individual sections. If you're a car coming in looking for a free spot and you see that one section is full, you can completely skip it and go on to the next section that might have an available slot. Heavy sigh. He doesn't care about what I'm saying.

Now, in 2025, they finally published the formal mathematical proof for this concept. And I know what you might be thinking: "Wow, if this optimized version is now out there and mathematically proven, why doesn't every standard library start implementing this technique?" Unfortunately, mathematical proofs of optimizations don't always translate into real optimizations in practice. Google's Swiss tables, for example, are extremely optimized for the L1 CPU cache. Kind of a big surprise that the real world is actually really complicated out there, but you can implement this methodology yourself in your own project for practice, and I definitely encourage you to do that.
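If you do want to try it, here's a minimal toy sketch of the "parking lot sections" idea. To be clear, this is my own simplified illustration, not the paper's actual funnel hashing algorithm (the real thing uses geometrically shrinking levels and careful probe budgets); the section sizes and hashing choices here are made-up assumptions, and it only shows the skip-a-full-section intuition.

```cpp
// Toy sketch: the table is split into fixed-size sections, and an insert
// that finds its home section full skips ahead to the next section instead
// of probing the whole table. NOT the paper's real funnel hashing.
#include <cstdint>
#include <functional>
#include <iostream>
#include <optional>
#include <vector>

class SectionedTable {
public:
    SectionedTable(size_t num_sections, size_t section_size)
        : sections_(num_sections), section_size_(section_size),
          slots_(num_sections * section_size, std::nullopt) {}

    bool insert(uint64_t key) {
        size_t h = std::hash<uint64_t>{}(key);
        size_t home = h % sections_;
        for (size_t s = 0; s < sections_; ++s) {
            size_t base = ((home + s) % sections_) * section_size_;
            // Linear probing confined to one section ("one parking row").
            for (size_t p = 0; p < section_size_; ++p) {
                size_t idx = base + ((h + p) % section_size_);
                if (!slots_[idx]) { slots_[idx] = key; return true; }
            }
            // This section is full: skip it entirely, try the next one.
        }
        return false;  // every section is full
    }

private:
    size_t sections_, section_size_;
    std::vector<std::optional<uint64_t>> slots_;
};

int main() {
    SectionedTable t(/*num_sections=*/64, /*section_size=*/16);  // 1024 slots
    for (uint64_t k = 1; k <= 900; ++k)
        if (!t.insert(k * 2654435761ULL))
            std::cout << "table full at key " << k << "\n";
    std::cout << "done: ~88% load, probing stayed bounded per section\n";
}
```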
Now, it's time for our next 2025 story. And this story is about game developers versus the Linux kernel. When I hold it like this, it just gives me the strongest urge to throw it. I don't know why.

So historically, for decades, the CPU scheduler has been built directly into the kernel. And keep in mind, for reference, the CPU scheduler is the thing that decides what runs right here, right now. Previously, changing the scheduling algorithm was a really complicated process. Standard CPU scheduling algorithms tend to prioritize fairness. If you think of your CPU as a CPU pie, the scheduler prioritizes everyone getting a nice equal slice of that pie. It's a really polite kind of process. Games, however, do not care about politeness or fairness. They care about frame times. Ideally, you'd want a separate CPU scheduler for your gaming processes than for, say, your scientific computing processes, since nobody wants a sudden lag spike in their game. But historically, it's been extremely difficult to swap out these kinds of algorithms.

Let me put my Google hat on for a second. Where is it? Since I do work there, after all. And now I get to brag a little that Google was actually the first to solve this problem, back in 2021, when they released the ghOSt framework, which lets you swap out CPU scheduling algorithms entirely from user space.
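To make that concrete, here's a minimal user-space sketch of a pluggable scheduling policy. This is my own toy illustration, not code from ghOSt or sched_ext (those hook into the actual kernel via an agent or a BPF program); the Task fields and policy functions are made-up assumptions. The point is just that "which task runs next?" becomes a swappable function instead of hard-wired kernel logic.

```cpp
// Toy pluggable scheduler: the pick-next-task decision is a function value
// that can be replaced at runtime, loosely analogous to how ghOSt/sched_ext
// let you replace the kernel's scheduling policy without a reboot.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Task {
    std::string name;
    int runtime_so_far = 0;  // CPU time this task has already received
    int deadline = 0;        // when it needs to run by (lower = sooner)
};

// A policy is just a function: given runnable tasks, pick an index.
using Policy = std::function<size_t(const std::vector<Task>&)>;

// Fair-share: run whoever has received the least CPU so far.
size_t fair_share(const std::vector<Task>& ts) {
    size_t best = 0;
    for (size_t i = 1; i < ts.size(); ++i)
        if (ts[i].runtime_so_far < ts[best].runtime_so_far) best = i;
    return best;
}

// Deadline-first: run whoever's deadline comes soonest.
size_t earliest_deadline(const std::vector<Task>& ts) {
    size_t best = 0;
    for (size_t i = 1; i < ts.size(); ++i)
        if (ts[i].deadline < ts[best].deadline) best = i;
    return best;
}

int main() {
    std::vector<Task> tasks = {{"game_render", 90, 16}, {"batch_job", 10, 1000}};
    Policy policy = fair_share;
    std::cout << "fair policy picks:     " << tasks[policy(tasks)].name << "\n";
    policy = earliest_deadline;  // "swap the scheduler" at runtime
    std::cout << "deadline policy picks: " << tasks[policy(tasks)].name << "\n";
}
```

Swapping `policy` at runtime is the user-space analogue of what these frameworks let you do to a live kernel, no reboot required.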
Segment 2 (05:00 - 10:00)
Later on, Meta created sched_ext (SCX), a CPU scheduler extension class built on BPF. This essentially lets you swap out the entire CPU scheduler without having to involve the Linux kernel developers or even reboot the machine, which just absolutely blows my mind. Let me just swap out the entire scheduler and keep the whole thing running. That's cool.

But Valve really cares about their gamers. Now, I don't have a Steam Deck in front of me right here, but imagine I did. Imagination. Valve created their own CPU scheduler specifically for the Steam Deck, called scx_lavd, which stands for Latency-criticality Aware Virtual Deadline scheduler. Please stop giving projects names so long I can't remember them. Basically, it prioritizes deadlines over throughput, which is a fancy way of saying, "I don't care if I receive a notification a little bit late, but don't you dare drop a single frame of my game."

Most requests come in pretty fast, but occasionally, that 1% or 0.1% of the time, a frame takes way longer to load. In programming terms, we call this tail latency, and it's exactly the culprit behind the kinds of stutters that Valve is trying to prevent. Here's where it gets really interesting. If you think about it, a lag spike in a game and a slow request on a web server are mathematically the same kind of problem. And really interestingly, in December 2025 at the Linux Plumbers Conference in Tokyo, Meta actually admitted that they're using Valve's CPU scheduler in their own data centers. That means the same algorithm keeping your game running smoothly on your Steam Deck is actively being used to keep things like Instagram feeling super snappy. It's a really interesting talk, and I'll link to the slides in the description of this video.
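Quick aside on tail latency, since averages hide it completely. Here's a minimal sketch, with made-up frame times, showing how a single 200 ms stutter in a thousand frames barely moves the average but dominates the upper percentiles:

```cpp
// Why tail latency matters: the average frame time can look great while the
// 99.9th percentile (the occasional stutter) is awful. Numbers are invented.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Nearest-rank style percentile, rounding up; fine for illustration.
double percentile(std::vector<double> v, double p) {
    std::sort(v.begin(), v.end());
    size_t idx = static_cast<size_t>(std::ceil(p * (v.size() - 1)));
    return v[idx];
}

int main() {
    // Hypothetical: 999 smooth frames at 16 ms, plus one 200 ms stutter.
    std::vector<double> frame_ms(999, 16.0);
    frame_ms.push_back(200.0);

    double avg = 0;
    for (double f : frame_ms) avg += f;
    avg /= frame_ms.size();

    std::printf("average = %.2f ms\n", avg);  // ~16.18 ms, looks totally fine
    std::printf("p50     = %.2f ms\n", percentile(frame_ms, 0.50));   // 16 ms
    std::printf("p99.9   = %.2f ms\n", percentile(frame_ms, 0.999));  // 200 ms
}
```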
Now, it's time for the next 2025 computer science paper. And I know this is the one you've actually been waiting for, because you're like, "What's up with the quantum dog?" This one is what I would call an anti-hype paper, and I love it. It's called "Replication of Quantum Factorisation Records with an 8-bit Home Computer, an Abacus, and a Dog." It's kind of meant to be a joke paper, but it also makes some really good points. Like it or not, there's been a lot of exaggeration in the news lately about quantum computing. You'll suddenly see a headline like, "New advancement in quantum computing. Suddenly all encryption is dead." Well, this one is by Peter Gutmann and Stephan Neuhaus, and they attempt to replicate some of the mathematics of these multi-million dollar quantum computers with far more primitive hardware, biological hardware included. The name Gutmann might sound familiar, because he invented the Gutmann method back in the '90s, the 35-pass secure wipe method for hard disk drives.

There's a lot of gold in this paper, but the dog section specifically tries to mock compiled Shor's algorithm. A lot of early quantum computing records essentially cheated: the circuits were tailored to one specific number, not general compute machines. So how do you replicate a compiled Shor's algorithm? It's really simple. You have Gutmann over here acting as the compiler, and he gives instructions to his colleague's dog, Scribble, who is acting as the quantum computer. And what is the instruction? The instruction is to bark three times, which my dog over here would currently fail at. And that's it. With that, they were able to successfully replicate IBM's 2019 quantum factorization record. But that wasn't even enough for them. They didn't just want to replicate it, they wanted to one-up it. So Gutmann gets Scribble the dog to bark five times, and this factored 35. And that, according to the paper, is enough to exceed the IBM quantum computer they were trying to beat. Who would win? A $40 million quantum computer, or one good boy?

But the main point of the paper is a lot less silly, because they propose new quantum factorization evaluation criteria for future research. The proposal basically states that until quantum records are based on random primes that are unknown to the experimenter, without pre-processing, they're pretty much meaningless. And I'm not trying to say that quantum computing is fake or anything. It's definitely real, and it's definitely cool. But this paper is basically some mathematicians poking fun at the scary headlines designed to make the general public nervous. It's a good paper to send to those friends of yours who periodically claim that RSA encryption is dead.
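If you want to see why "compiled" records are cheating, here's a tiny illustration of my own (not code from the paper): a factorizer tailored to one specific number already contains the answer, while an honest general-purpose routine has to actually do the work.

```cpp
// The paper's criticism in miniature: a circuit "compiled" for one number
// already knows its factors, so running it demonstrates nothing. This is
// the moral equivalent of Scribble barking five times.
#include <cstdint>
#include <iostream>
#include <utility>

// "Compiled" factorizer: only valid for N = 35, answer baked in.
std::pair<uint64_t, uint64_t> compiled_factor_35() { return {5, 7}; }

// Honest general-purpose factorizer: works for arbitrary composite N.
std::pair<uint64_t, uint64_t> trial_division(uint64_t n) {
    for (uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return {d, n / d};
    return {1, n};  // n is prime (or 1)
}

int main() {
    auto [a, b] = compiled_factor_35();
    std::cout << "compiled: 35 = " << a << " x " << b << "\n";
    auto [c, d] = trial_division(35);
    std::cout << "honest:   35 = " << c << " x " << d << "\n";
    // The paper's proposed criterion: records only count when the primes
    // are random and unknown to the experimenter ahead of time.
}
```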
Segment 3 (10:00 - 15:00)
He sees a fork and he thinks I'm about to eat food, and he wants some. He doesn't even have any food. For this next paper, we're going to be talking about forks. No, not these kinds of forks. Although, look at this: perfectly balanced, as things should be. This particular paper won a best paper award in one of the ACM categories, and for a really good reason, because it basically creates a time machine for RAM.

Normally, when you're running a fork call, you're cloning a process, and all of this happens on just one single machine. CXLfork extends this concept to work across an entire cluster. Imagine you have a server, and one of your really cool posts suddenly goes viral. You need to spin up a ton of nodes to handle that sudden traffic spike, and that means a ton of cold starts: a boot-up process, loading a ton of libraries, all of which takes a ton of time. What CXLfork does is take a snapshot of an already warmed-up process and share that memory across a ton of machines, so that when a traffic spike hits, all of those machines can read the same underlying data using a really cool memory fabric called CXL. They measured that about 72% of a process's memory never changes, so why would you need to copy it a million times across a million processes? And the neatest thing is that because they're snapshotting a process in a clean state, if something goes wrong and your program crashes, well, whoop, you can just CXLfork-revert and you have a clean running process once again.

I could use this in my life. Back when I worked in aerospace, I actually did something really similar using a tool called CRIU, which stands for Checkpoint/Restore In Userspace. The entire idea behind that project was to quickly restart mission-critical processes that might have failed during execution, without having to reboot the entire airframe. CXLfork isn't really a weekend project you can play around with, since it's designed to run on really expensive enterprise hardware, and frankly, it's a lot more interesting because it's also designed to run across multiple compute nodes. But if you do want to get your hands dirty, you can play with CRIU, since it's free and runs on any Linux box. And you will start to see this trickle down over the next few years.
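For reference, here's a minimal single-machine sketch of the primitive that CXLfork generalizes. Plain fork() already hands the child a copy-on-write snapshot of the parent's warmed-up memory; CXLfork's trick is extending that snapshot across machines over CXL shared memory. This toy stays local and only shows the snapshot semantics:

```cpp
// fork() gives the child a copy-on-write snapshot of the parent's memory,
// so a "warmed-up" state (loaded libraries, populated caches) is inherited
// for free instead of being rebuilt from a cold start.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    // Pretend this is the expensive warm-up: loading libs, filling caches.
    std::vector<int> warm_cache(1 << 20, 42);

    pid_t pid = fork();  // snapshot: child shares pages copy-on-write
    if (pid == 0) {
        // Child: starts already warm, no cold start needed. Copy-on-write
        // guarantees it sees the fork-time value (42), regardless of any
        // later writes by the parent.
        std::printf("child sees warm cache[0] = %d\n", warm_cache[0]);
        return 0;
    }
    // Parent: its later writes copy the page, leaving the child untouched.
    warm_cache[0] = 7;
    waitpid(pid, nullptr, 0);
    std::printf("parent cache[0] = %d\n", warm_cache[0]);
}
```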
Now, on to our next 2025 computer science paper. Here's something that nobody talks about: your CPU is bored. Like, really bored. Modern processors these days are so good and so fast that they spend the majority of their time just waiting for memory. Computation is cheap; moving memory is expensive. And this mismatch is exactly where a lot of the interesting mathematical research and innovation is actively happening. Which brings me to Dijkstra's algorithm. You've probably heard of it: the classic shortest-path algorithm from your computer science classes. It's extremely well-studied and extremely useful, and optimized variants of it are used in really important things like computer networks and mapping applications.

Dijkstra's algorithm was actually mathematically beaten in 2025, and nobody noticed. I personally think it's kind of a big deal, but it's probably going to take approximately 15 years to show up in practice. I'm not going to lie, this paper is extremely complicated, but it is mathematically, provably faster. The only problem is that it's weird. It's recursive, and it does a ton of pointer chasing, which in actuality is terrible for the cache on a regular CPU. So in practice, it still doesn't look faster. But in terms of theoretical complexity, Dijkstra's algorithm runs in O(m + n log n) time, and this new one runs in O(m log^(2/3) n), so it is provably faster. The major catch is that the number of nodes needs to be astronomically large before you'd actually see the new algorithm beating Dijkstra. Think billions of nodes. So I'm not going to be able to show it winning in benchmarks. But what I can do is this: I've implemented both algorithms in C++, and I'm using Google Benchmark, which automatically warms the cache for you and everything. I'm going to run the two algorithms against each other and compare the ratio of their speeds as the number of nodes increases.
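I won't reproduce the new algorithm here (it's far too involved for a sketch), but here's roughly what the Dijkstra side of a Google Benchmark comparison can look like. The graph generator and its parameters are my own made-up choices for illustration, not the exact setup from the video; a second BENCHMARK for the new algorithm would slot in the same way.

```cpp
// Classic binary-heap Dijkstra under Google Benchmark, parameterized by
// node count, as a template for the kind of ratio comparison described.
#include <benchmark/benchmark.h>
#include <limits>
#include <queue>
#include <random>
#include <vector>

struct Edge { int to; double w; };
using Graph = std::vector<std::vector<Edge>>;

// Hypothetical random graph: n nodes, ~avg_deg outgoing edges per node.
Graph random_graph(int n, int avg_deg) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> node(0, n - 1);
    std::uniform_real_distribution<double> weight(1.0, 10.0);
    Graph g(n);
    for (int i = 0; i < n * avg_deg; ++i)
        g[node(rng)].push_back({node(rng), weight(rng)});
    return g;
}

std::vector<double> dijkstra(const Graph& g, int src) {
    std::vector<double> dist(g.size(), std::numeric_limits<double>::infinity());
    using Item = std::pair<double, int>;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        for (const Edge& e : g[u])
            if (d + e.w < dist[e.to]) {
                dist[e.to] = d + e.w;
                pq.push({dist[e.to], e.to});
            }
    }
    return dist;
}

static void BM_Dijkstra(benchmark::State& state) {
    Graph g = random_graph(static_cast<int>(state.range(0)), 4);
    for (auto _ : state)
        benchmark::DoNotOptimize(dijkstra(g, 0));
}
BENCHMARK(BM_Dijkstra)->Range(1 << 10, 1 << 20);  // sweep the node count

BENCHMARK_MAIN();
```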
Segment 4 (15:00 - 17:00)
So let me just pull up my terminal here, and I'm going to run my benchmark. We're increasing our nodes, increasing our nodes, and you can see the ratio is indeed getting smaller as we increase that number of nodes.

So why does this matter? Well, historically, it takes about 15 years for raw computer science theory to make it into actual standard library implementations. Delta-stepping is a really good example of this in the supercomputing space. And not only that, the pressure to reduce data movement is just growing and growing, and moving data across a bus is much more expensive than on-device computation. So the real winner from this paper, in my opinion, isn't the full algorithm itself. It's the components that comprise it, and I think there are going to be more derivations of this work as time goes on. Now, you just watch. Someone's going to take the filtering logic from this paper and implement it on computational memory, like Samsung's HBM-PIM architecture or something, and suddenly we're going to have an entirely new class of algorithms to play with, right? And if you're interested in hearing more of my theories on this, hi, you can follow me at LaurieWired on X, where I have a whole series about how these kinds of algorithms are going to become increasingly relevant over time, and how memory tiering is going to end up eating the world.

I think it's important to look at a lot of the positive things happening in computer science. There's a lot of negativity in this field, and a lot of people don't focus on the amount of progress we make in computer science every single year. So go out there, go read some research papers, go write some research papers that might end up in next year's video. As always, thank you so much for watching, everybody. Stay cozy, and until next time, LaurieWired out. Got to get those belly rubs.