Will AI Replace Digital Forensics Experts?

13Cubed 09.09.2025 5,643 views 173 likes

Video description

Is AI going to replace digital forensic investigators? In this episode, we'll test a local instance of DeepSeek-R1 in Windows forensics to see how it compares to a human investigator. Let’s find out if AI can handle the job! *** If you enjoy this video, please consider supporting 13Cubed on Patreon at patreon.com/13cubed. *** 📖 Chapters 00:00 - Intro 01:23 - The Questions Begin 10:43 - Closing Thoughts 🛠 Resources #Forensics #DigitalForensics #DFIR #ComputerForensics #WindowsForensics #AI #DeepSeek

Contents (3 segments)

Intro

Welcome to 13Cubed. In this episode, we're going to try something new. Let's answer the question: can AI replace our jobs as forensic investigators? Well, I think you probably already know the answer to that, which is going to be no. But I still thought it would be fun to take a look at a local large language model, or LLM, and see how well it can perform when asked Windows forensics questions. Now, we're running DeepSeek-R1, and this is the full 671 billion parameter model running locally on this Mac Studio. So, this is an M3 Ultra with 512 gigs of RAM. And because of that, I can fit the entire 4-bit quantized model for DeepSeek-R1 into memory, which is pretty awesome. If you're really interested in experimenting with LLMs locally, which I would highly recommend you do, then a Mac Studio or any kind of Mac is probably going to be your best choice just because of the ability to have that unified memory. Otherwise, you would be buying a 4090 or a 5090 or something like that, and the amount of power consumed by those platforms is insane. Right now, as I'm speaking, this Mac Studio is just sipping power. The total system power is well under 200 W. You'll notice that I've said hello to get us started. So the model is loaded, and now what I'd like to do is just say in the context of

The Questions Begin

Windows forensics, what is prefetch? So let's start with that. Now what I'm going to do is speed up the thinking part of this. But otherwise, I'm leaving the video unedited. So you're seeing this essentially in real time, with the exception of the thinking, which like I said, I'm going to speed up so we don't sit here all day waiting on it. And notice that it's starting to give us the response now. And so far what it's saying does make sense. It says that it monitors approximately the first 10 seconds of an application's startup. It creates a prefetch file, named with the application name and a hash, in Windows\Prefetch. All of that looks pretty spot-on really. It even talks about the run count and timestamps of recent executions. It gives some example file names here, and that's pretty cool. It even shows us how to analyze it. It does in fact mention PECmd from Zimmerman, which is cool. And it also has a little caveat section here saying that it's optional and can be disabled, and a few other tidbits of information. So this, I would say, looks mostly accurate. I just skimmed it and didn't read all of it. Feel free to pause the video if you want to check everything here.
But now let's get a little bit more complex by asking how many last run timestamps are tracked in modern versions of Windows. So let's see what this says. And you can see it's answering right now. And it actually came back with eight, which is correct in terms of the total number of last run timestamps. But it is showing us some different information here for Windows Vista, showing us four timestamps per prefetch file. Although mostly it looks correct in terms of modern versions. It is saying eight. So that's pretty cool. So notice that it has finished answering, but you're going to notice there are some problems here. Like for example, why did it change from XP's 128 to only 8? So it's conflating some information here.
XP would keep up to 128 prefetch files, wherein the oldest would be overwritten by the 129th and it would, you know, cycle over. But that has nothing to do with the total number of timestamps tracked within each prefetch file. That number is eight on modern versions of Windows. So, it's getting a little bit confused here, kind of mixing up some information.
Let me ask it a slightly more in-depth question. Let's say that we have a binary that executed 50 times. I should be able to determine the last eight times of execution based on the timestamps tracked in prefetch. How could I also determine the first time of execution? Now, what I'm going for here is something along the lines of using the file system timestamps, specifically the B or birth timestamp for the .pf file itself, as a means of determining the approximate time of the initial execution. Now, I would expect it to tell me that is something that you could do. So, in other words, you could determine the first time of execution and the last eight times of execution for my example of a run count of 50. But if I wanted to know, for example, the 32nd time the executable ran, well, I wouldn't be able to get that information, right? So, let's see if it understands what I'm asking it and how it responds.
So, it's telling us that prefetch alone is not going to be sufficient, because the .pf file is only going to show us the most recent eight executions. But in this case, let's see if it mentions the actual creation time of the file on the file system. And you can see it does mention it right here, the prefetch file creation time. You can see it right here in number one as it's answering. So that's actually fairly impressive. Now it is expanding on that and talking about things like Amcache and RecentFileCache.bcf, UserAssist, Windows event logs, and 4688s in particular. So yeah, not too bad. It's been mostly accurate in answering the questions so far.
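The reasoning in the 50-execution example can be sketched in a few lines of Python. This is only an illustration, assuming the .pf file's birth timestamp and its last run timestamps have already been extracted by some other tool; the function name, the field names, and the example values are all hypothetical, and the 10-second offset is an approximation based on the roughly 10-second monitoring window mentioned earlier.

```python
from datetime import datetime, timedelta

# Assumption: the .pf file is written roughly 10 seconds after the first launch.
PREFETCH_WRITE_DELAY = timedelta(seconds=10)
# Per-file limit on modern versions of Windows, as discussed in the transcript.
MAX_LAST_RUN_TIMESTAMPS = 8


def summarize_prefetch(pf_birth_time, last_run_times, run_count):
    """Summarize which execution times are recoverable from one .pf file.

    All inputs are assumed to have been extracted already by another tool.
    """
    recent = sorted(last_run_times, reverse=True)[:MAX_LAST_RUN_TIMESTAMPS]
    # Runs that are neither the first nor among the most recent have no timestamp.
    unrecoverable = max(run_count - len(recent) - 1, 0)
    return {
        "estimated_first_run": pf_birth_time - PREFETCH_WRITE_DELAY,
        "recent_runs": recent,
        "unrecoverable_runs": unrecoverable,
    }


# Made-up example values: a binary with a run count of 50.
birth = datetime(2025, 1, 1, 9, 0, 10)
last_runs = [datetime(2025, 3, day, 12, 0, 0) for day in range(1, 9)]
summary = summarize_prefetch(birth, last_runs, run_count=50)
print(summary["estimated_first_run"])
print(summary["unrecoverable_runs"])
```

With a run count of 50, the first run and the last eight are accounted for, and the remaining runs in between cannot be dated from prefetch alone. The birth-time estimate only holds if the .pf file has not been deleted and recreated since the first execution.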
Now let's see if we can ask it something a little bit more tricky. And you probably know what artifact I'm going to ask it about, and that's going to be Shimcache. I'm actually going to click stop generation at this point and say: can we use Shimcache to prove execution on modern versions of Windows? Now, of course, you know that the answer to this question is maybe, or it depends, because in the most recent Shimcache video that I made, we talked about the fact that Eric Zimmerman determined that we can look at the last four bytes of a Shimcache record and, in some cases for non-native Windows binaries, determine whether or not execution occurred.
And you can see that it's answering right now, showing us that it cannot directly prove program execution on modern versions. Okay, that makes sense. Its evidentiary value is indirect and circumstantial, not conclusive proof of execution. Here's the breakdown. Crucial for forensic analysis. Let's just skim this and see what kind of information it's telling us. I do see the fact that it's referencing the timestamp as the last modified time. So, that's cool. It's telling us that there are no timestamps of runs or executions, which is correct. It's telling us that it remains valuable for program presence. That's pretty cool. File modification time, malware indicators, file deletion awareness, persistence mechanism detection. Okay, so this is not too bad.
And it looks like it's actually creating a table here showing us Shimcache versus prefetch. Let's see what it says. Purpose: compatibility assessment. Purpose is performance optimization for prefetch. That makes sense. Proves execution: no. And then yes for prefetch. Key execution data: there's nothing here. The last eight times of execution for prefetch. Survives deletion: often. And then usually. It shows us OS changes. It shows us the location. Yeah. It's doing a better job than the last time I asked ChatGPT these questions.
I think I was using the o1 model at the time, and it got a lot of these details incorrect. Remember, this is DeepSeek-R1, and yeah, it's actually doing a fairly decent job.
Let's switch gears and ask it something completely different. How many total timestamps could exist on a Windows NTFS file system? And I'll just say answer tersely. I don't need a full explanation. Just tell me the total number of timestamps that could exist. Now, typically I expect the answer to be at least 20. And of course, the reasons for that we've talked about in previous episodes. I'll link some of those if you're not familiar with how NTFS timestamps work, but four would definitely not be the correct answer. Again, there could be at least 20 different timestamps tracked for a single file on a Windows NTFS file system. That's a question that I've seen AI models often get wrong. So, let's see what happens here.
Okay, you can see that it's answering now, and it's telling us that the Standard Information attribute is going to track four timestamps. It's also showing us that the File Name attribute is going to track another set. So, it's telling us that the total core timestamps are eight. And that's not exactly wrong, except for the fact that we also have another set of File Name timestamps if we have a long file name in play, as we know, because we have a set for the short file name and the long file name. And of course, we also have the $I30 set of File Name timestamps that are tracked. And again, for long file names, we'll have two sets of those, one for the short and one for the long name. It didn't mention anything about the USN Journal timestamp, which is another timestamp that could be associated with a given file. So it's telling us that the total core timestamps are eight. Now obviously that is incorrect. So it didn't do as well here.
Let's ask one final question of the model. And I think I'm going to ask something related to Shellbags.
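Going back to the NTFS timestamp count for a moment, the arithmetic behind the "at least 20" figure can be tallied as follows. This is a minimal sketch, not a parser: the set labels and function names are illustrative, each set is assumed to hold four timestamps, and each USN Journal record is assumed to contribute one additional timestamp.

```python
# Each set is assumed to hold four timestamps: modified, accessed, changed, born.
TIMESTAMPS_PER_SET = 4


def ntfs_timestamp_sets(has_short_name):
    """List the timestamp sets that can exist for one file, per the transcript."""
    sets = [
        "$STANDARD_INFORMATION",
        "$FILE_NAME (long or only name)",
        "$I30 index entry (long or only name)",
    ]
    if has_short_name:
        # A separate short (8.3) name adds its own $FILE_NAME and $I30 sets.
        sets += ["$FILE_NAME (short name)", "$I30 index entry (short name)"]
    return sets


def count_ntfs_timestamps(has_short_name, usn_records=0):
    """Total timestamps; each USN Journal record is assumed to add one more."""
    return len(ntfs_timestamp_sets(has_short_name)) * TIMESTAMPS_PER_SET + usn_records


print(count_ntfs_timestamps(has_short_name=True))
print(count_ntfs_timestamps(has_short_name=True, usn_records=1))
```

Five sets of four gives the 20 mentioned above for a file with both a long and a short name, and any USN Journal records for that file push the total higher, which is why the model's answer of eight falls short.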
So let's say: in the context of Shellbags, are files tracked or just folders? And once again I'm going to say answer tersely, because I don't need a book. Just answer the question. Now, what I'm going for here is the fact that Shellbags does track file names in terms of archive formats like zip files and, with newer versions of Windows, even other types of files like tar and tar.gz. Not the files within the archives, but the archive file names themselves may show up in Shellbags if they've been interacted with in Windows Explorer. And that's of course because Windows Explorer treats those archives as folders. So, let's see how it answers this. And it says that Shellbags primarily tracks folders, not individual files. And that's absolutely true. Let's see if it mentions anything about archives in the response here. And no, it did not. And of course, that's not completely unexpected. That's exactly the way I thought it would answer.
But I think you can see from this that it's doing a fairly decent job answering some basic questions about core Windows forensic artifacts. Like I said, this is not going to replace a human investigator. At least not at this point. Now, the whole purpose of this

Closing Thoughts

video is to tell you that while everyone is talking about AI and it's literally everywhere, I think it's really, really important to, number one, not panic, but number two, recognize that if you ignore AI and what's happening right now, you are going to quickly be left behind. You need to be playing with LLMs, these models, whether locally or even just in the cloud. And you need to understand exactly how these things work and what they offer in terms of capabilities. I use LLMs almost every day. And the way I use them is generally for mundane tasks that I don't want to manually perform. Like for example, maybe I'll give it the contents of a file and say please clean up this file and reformat it to show the data in this specific format. Something like that.
Now the caveat here is that you should not use publicly hosted models at all unless the information you are presenting to those models is completely public. If you're taking information from your workplace and feeding it into ChatGPT, for example, that's not a very wise thing to do. So I would highly recommend not doing that. Consider running a completely local model as I'm doing here, and you could try to bounce ideas or questions against that local model and see if it helps you out. Like I said, Apple silicon is a great way to do that. Or you could even consider something like the new Framework Desktop PC that, as I'm recording this, is going to be coming out fairly soon. It uses an AMD APU that contains integrated memory that can be shared with the video card. So in other words, you could actually load fairly large models. I believe the max spec is 128 gigs of RAM built into those APUs. So, you can do some pretty cool local LLM work using that. So, there are quite a few ways to do this without breaking the bank too much.
I still maintain that the absolute best way to answer forensic questions is to lab it up.
In other words, if you're trying to prove whether or not a specific artifact behaves a specific way given a very specific circumstance, then recreate that exact scenario using the same version of Windows, the software that you're testing, and just lab it up in an environment and see if you get the same results. Never take what the LLMs are telling you at face value. But of course, I would tell you that for any single source of information, you should always do your due diligence and trust, but verify. So the key takeaway of this episode is I would encourage you to play around with local language models if possible in your own home lab if you have the hardware to do so. But specifically in terms of AI replacing our jobs as forensic investigators, I really don't see that happening for the foreseeable future. I think there's always going to need to be the human element in terms of trusting your intuition or seeing something that doesn't quite look right and asking those curious questions to get to the bottom of exactly what happened in a given intrusion, for example. So, I think we're probably safe for now, but AI is not going away and you're going to quickly fall behind if you don't understand how this technology works and how it can supplement your work. Okay, so that wraps up this 13 Cubed episode. It was a bit different than normal. Again, completely unscripted and kind of doing it live, if you will. But I hope you enjoyed this quick look at this particular LLM. And as always, thanks for watching, thanks for subscribing, and I'll see you in the next 13 Cubed episode.
