Paperless-ngx + Local AI (Optional): Better OCR, Self-Hosted, No Cloud
26:42


Techno Tim · 27.01.2026 · 157,572 views · 4,264 likes

Video description
Build a complete Paperless-ngx stack in Docker and take control of your documents. We’ll get Paperless running first (works great on its own), then optionally add local AI with Ollama + Open WebUI and upgrade OCR using Paperless-GPT and Paperless-AI for more accurate, searchable text - no cloud required.

Full Guide Here: https://technotim.com/posts/paperless-ngx-local-ai/
Code Here: https://github.com/timothystewart6/paperless-stack
Paperless NGX: https://github.com/paperless-ngx/paperless-ngx
Paperless AI: https://github.com/clusterzx/paperless-ai
Paperless GPT: https://github.com/icereed/paperless-gpt
Merch Shop 🛍️: https://l.technotim.com/shop
Support me on Patreon: https://www.patreon.com/technotim
Sponsor me on GitHub: https://github.com/sponsors/timothystewart6
Subscribe on Twitch: https://www.twitch.tv/technotim
Become a YouTube member: https://www.youtube.com/channel/UCOk-gHyjcWZNj3Br4oxwh0A/join
Gear Recommendations: https://l.technotim.com/gear
Get Help in Our Discord Community: https://l.technotim.com/discord
2nd channel: https://www.youtube.com/@Technically-Tim

(Affiliate links may be included in this description. I may receive a small commission at no cost to you.)

00:00 - Intro - Self-hosted, private, AI optional
00:30 - What's included in this Paperless-NGX Stack?
01:11 - Paperless Architecture (Paperless ↔ AI ↔ Ollama)
01:46 - Stack Overview - All services explained
02:45 - Folder layout + ports + .env files
03:10 - Docker Compose up
03:23 - Ollama + Open WebUI setup (port 3001)
03:59 - NVIDIA GPU quick test
04:22 - First login: Paperless-ngx setup (port 8000)
04:43 - Create Paperless API token
05:21 - Paperless-AI setup + settings (port 3000)
08:17 - Upload docs + baseline OCR and metadata results
11:30 - Processing documents with Paperless-AI
12:50 - Updated metadata from Paperless-AI
14:39 - RAG Chat with your documents
16:09 - OCR is still not good
16:31 - Paperless GPT replaces OCR with a Vision model
16:51 - Paperless-GPT setup (port 3002)
17:25 - minicpm-v - an open source multimodal vision model
18:05 - Download minicpm-v to Ollama
18:22 - Paperless-GPT walkthrough (port 3002)
19:10 - Vision OCR demo (before/after)
21:05 - Vision OCR with Watermarks (before/after)
22:32 - Automate with Paperless workflows (auto tags)
24:40 - Testing automated workflow with Vision OCR
26:18 - My thoughts about PaperlessNGX with Local AI

Thank you for watching!


Building a personal document management system with AI analytics and local OCR

Deploying a self-hosted Paperless-ngx ecosystem with Ollama integration to automate classification and improve text recognition (OCR) quality, in 26 minutes.

Contents (26 segments)

Intro - Self-hosted, private, AI optional

Paperless NGX is a self-hosted document inbox that lets you drop in PDF scans, automatically runs OCR, organizes with tags and correspondents, and then makes the whole thing searchable so when you need a specific page, you're not digging through folders like it's 2006. And because it's self-hosted, it puts you back in control of your data. Your documents stay on your hardware under your rules. No uploading personal paperwork into ChatGPT or some random cloud service just to get indexed.

What's included in this Paperless-NGX Stack?

In this video, we're doing a full repeatable setup with Docker. We'll get Paperless up and running first, and then we'll optionally add some local AI with Ollama and show what's actually worth using once everything's set up. Also, there's a full step-by-step guide linked in the description, including the exact compose files and .env files we're going to use today. So what are we setting up then? We're setting up Paperless-ngx for document storage with some baseline OCR. We're going to use Ollama as our local AI engine. We'll then use paperless-ai for tagging, titles, metadata, and suggestions. Then we'll bring in paperless-gpt for vision model OCR, and this is a huge upgrade for OCR accuracy. I know that was a lot of information, but think of

Paperless Architecture (Paperless ↔ AI ↔ Ollama)

it like this. Paperless NGX is the filing cabinet. It stores, indexes, and searches. Ollama is the local brain. It runs our models. paperless-ai and paperless-gpt are add-ons that plug into Paperless that improve metadata, and in the case of paperless-gpt, it improves OCR. And just to be clear, all of the AI is optional. Paperless NGX works perfectly fine on its own. If you just want local document management and basic OCR, you can stop there and you'll have a great setup. Alright, so let's jump into the Compose stack. Here's my Paperless stack and

Stack Overview - All services explained

I'm using it with Postgres instead of SQLite. You might be wondering why I'm using Postgres. Well, it scales a little bit better if you start building a really big library. You'll notice that Paperless also depends on a few other services in this stack. One of them is Redis, an in-memory database. Another one is Gotenberg, which is new to me, but this helps convert documents to PDFs. The next is Tika, which extracts metadata from the documents that you want to bring into Paperless. Then we have Ollama, which runs and manages our local LLMs. Then we have Open WebUI, which gives us a web interface for managing Ollama and our local LLMs. Then we have our two AI services for Paperless: paperless-ai and paperless-gpt. Last but not least is Dozzle. I usually include this in all of my stacks. It's a web UI to see all of your container logs, which is super helpful when troubleshooting.

Folder layout + ports + .env files

You can see I've exposed a few ports on here. We have Paperless NGX on 8000, then we have paperless-ai on 3000, then we have Open WebUI on 3001, paperless-gpt on 3002, and then Dozzle running on port 8080. Also, if you couldn't tell already, I have a .env file per service so that not all services have access to all of the secrets.
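As a quick reference, the port layout described above can be written down as a small Python sketch. The port numbers come from the video; the `localhost` default, the dictionary keys, and the helper names are assumptions made for illustration, so substitute your own Docker host address.

```python
import socket

# Host ports as listed in the video. The service keys are illustrative labels.
SERVICE_PORTS = {
    "paperless-ngx": 8000,
    "paperless-ai": 3000,
    "open-webui": 3001,
    "paperless-gpt": 3002,
    "dozzle": 8080,
}


def service_urls(host="localhost"):
    """Return the browser URL for each service on the given host."""
    return {name: f"http://{host}:{port}" for name, port in SERVICE_PORTS.items()}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Once the stack is up, calling `is_listening("localhost", port)` for each entry in `SERVICE_PORTS` gives a rough check that every web UI is reachable.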

Docker Compose up

To bring the stack up, all we need to do is run docker compose up -d. To see if all the containers are running, you can run docker ps and check there, or you can go out to Dozzle and see that all of your containers are running. So now we want to go into Open WebUI.
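If you would rather script the `docker ps` check than read the output by eye, a minimal sketch might look like the following. It assumes the Docker CLI is on your PATH and that each container name contains its service name. That is typical for Compose but not guaranteed. The helper names are hypothetical.

```python
import subprocess


def parse_names(ps_output):
    """Turn the text output of `docker ps --format '{{.Names}}'` into a set of names."""
    return {line.strip() for line in ps_output.splitlines() if line.strip()}


def running_containers():
    """Return the names of running containers, as reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_names(result.stdout)


def missing_services(expected, running):
    """Return expected service names that do not appear in any running container name."""
    return sorted(s for s in expected if not any(s in name for name in running))
```

For example, `missing_services(["paperless", "redis", "ollama"], running_containers())` returns an empty list when all three are up.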

Ollama + Open WebUI setup (port 3001)

So once we get to Open WebUI, we'll want to get started and then we'll need to create an account. So let's create our admin account. To download a model, go to your profile, then go to the admin panel, then go to settings, and then go to models. Here we want to pull a model from Ollama.com. I found that Llama 3.2 3B works pretty well and it's a good starting point. So let's pull that one down. Alright, so we have that model pulled down. Let's test it really quick with chat just to confirm it responds. Tell me a joke.
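The same sanity check can be done without the web UI by calling Ollama's HTTP API directly. This is a sketch under assumptions: it presumes Ollama's default port 11434 is reachable from where you run the script (the stack may only expose it inside the Docker network), and that the model was pulled under the tag `llama3.2:3b`. The helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3.2:3b", base_url="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Tell me a joke.")` should return a short reply if the model is loaded and responding.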

NVIDIA GPU quick test

Okay, so we see a response. That's a good sign. That means the model is loaded and working. If you ever want to do a sanity check on your GPU and make sure that your server is using it, you can remote into your server and run nvtop. And if you go back to chat and have it output some more information, other than my UPS going off because it's drawing a lot of power, you can see that it's actually using my GPU here.

First login: Paperless-ngx setup (port 8000)

Now let's check on Paperless NGX. So when we get to the initial startup screen, we'll have to create an admin account and then we'll sign in. And congratulations, we now have Paperless NGX running. We'll come back to Paperless here in a second, but let's set up paperless-ai. And in order to set up paperless-ai, we need an API token from Paperless first.

Create Paperless API token

So to get our token from Paperless NGX, we can go into our profile. If we click into here and generate one, we can get one right here. Now let's copy this API token and paste it into our paperless-ai .env file, right into the Paperless API token. You'll also want to update your username; mine's admin. And then while we're here, you just want to make sure that the Ollama model you pulled down within Ollama matches here. I used Llama 3.2 3B. And the rest of this, outside of time zone, you can leave the way it is.
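To confirm the token works before handing it to other services, you can call the Paperless-ngx REST API with it. The sketch below assumes Paperless is reachable at `http://localhost:8000`. The token is passed in an `Authorization: Token ...` header, and the helper names are hypothetical.

```python
import json
import urllib.request


def paperless_request(path, token, base_url="http://localhost:8000"):
    """Build an authenticated GET request for a Paperless-ngx API path."""
    return urllib.request.Request(
        f"{base_url}/api/{path.strip('/')}/",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def document_count(token, base_url="http://localhost:8000"):
    """Return how many documents Paperless reports for this token's user."""
    request = paperless_request("documents", token, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())["count"]
```

If `document_count(your_token)` returns a number instead of raising an HTTP 401 or 403 error, the token is valid.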

Paperless-AI setup + settings (port 3000)

Let's restart our Paperless stack now so we can pick up the new env values. Let's go to paperless-ai now. I'm checking here to see what port it's running on, and it's running on port 3000. Okay, once we get here, we'll need to set up a user account as well. So we'll set up our user account here. For connection settings, you'll want to enter the Paperless API URL. In this stack, it's going to be the service name, paperless, port 8000, and then /api. Then for API token, you're going to grab that API token that we just got from Paperless and paste it into here, and then enter your Paperless user in here. For AI settings, it's the same thing we talked about in that .env: we're going to use Ollama. We're going to set the Ollama URL to the service name within our compose stack, which is just ollama. Then there's the model that we're using, a token limit, and a response token limit. I'm not going to adjust any of these. The default values work fine for me. In the advanced settings, you can choose whether or not you want to use existing correspondents and tags, and that's related to tags within Paperless NGX. You can set your scan interval, so how often paperless-ai is going to check for new documents. You can choose whether to process only specific pre-tagged documents, and whether to add an AI-processed tag to the documents, so after these documents are processed, do you want to add a tag to them? Do you want to use specific tags in prompts, that is, whether or not you want to include tags in the prompt that you'll see below? And then whether or not you want to disable automatic processing, and that will shut everything down without having to go and uncheck everything. This right here is just a toggle to basically say don't do anything to my documents. And then whether or not you want to use these AI features. So do we want to assign tags? I do. Do we want to detect correspondents within the data? Yeah, I do.
Then there's document type classification, title generation, and custom fields if you want them. You can add some additional fields, say for instance a total amount if you were scanning invoices. You could say that would be an integer, and you would add it as an integer, and then when a document is scanned, if the LLM can see a total amount in there, it might be able to plug that value into that custom field. Last but not least, there is a prompt, and this is the prompt that we're going to give the LLM in order to process the documents. Right now it doesn't have one. I would use the example prompt by clicking on that button, and it's going to fill in a pretty good prompt that should be tailored to Paperless-ngx. Now if you need to tweak anything, you definitely can in here. This is a much better prompt than I could ever write, so I'm going to keep this one in here. Let's go to save, and it saved my configuration. Every save you do is going to reset the UI, but we're done for now in paperless-ai, so now let's go back to Paperless and upload a few

Upload docs + baseline OCR and metadata results

documents. So I've generated a few sample documents: some documents of, say, devices with serial numbers, some sample invoices, along with some tax information and some receipts. I tried to generate a variety of documents to see how Paperless would handle these. To upload documents you can either click this button or you can drag and drop them into this area. So let's drag them into here. Now all of these documents are uploading, and if we go into documents we can see them. If we click into a document we can see some of the metadata about it. This one hasn't been processed with AI at all. It's only been imported into Paperless NGX, and we've got some OCR. So if we look at some of the metadata, like title, this is based on the file name, and then we don't have a lot of information here. We don't have the correspondent, we don't have document types, and we don't have any tags right now. If we go into content, we can see some of the information that was extracted using OCR. You can see it did a pretty good job, but this was a PDF and this information was text already; you see it's selectable. It did a pretty good job outside of this right here: "refund your your our refund or er t balance". I think it's getting really confused by this "test data, not a tax return" right here, and then we see "xr" here with zero dollars. Then if we go into metadata, this is pretty standard metadata. This is just metadata it got from either the document, or by doing a checksum of it, or by looking at the document type, but there's nothing interesting to see in here, nothing in notes, and nothing in history. So let's look at one of these other documents where you can see that OCR isn't the greatest. Say for instance I go into this made-up sample image of this camera. You can see we have a title, and the title again is just the file name. If we go into content we can see where it actually did some OCR, or didn't really do some OCR.
I'm not poking fun at Paperless itself; this is just the OCR library, I guess, that it's using. But you can see it didn't do the greatest job, and this is a made-up image with this text printed over it, so this is as clear as text could possibly be. You can see the model is X2000, then it goes to serial number and "screener", and then who knows what it's doing here. It just said nothing; it gave up here. Then I think on the FCC it did this symbol, maybe. Oh, that's probably the symbol right here. It got "made in Japan", and somehow it got a dollar sign, "ws", and then tilde dash, and I have no idea where that came from. So you can see that OCR itself isn't the greatest. We're going to see if we can improve this, but just remember this OCR right here. Now that we have paperless-ai running, let's hop over to there and process some of these images and enhance our metadata. So if we go back to paperless-ai, on our dashboard we can see that

Processing documents with Paperless-AI

it's now seeing 14 of those documents. If you don't see your documents, you can click scan now, and it will go out to Paperless NGX and scan for those new documents. But now it has 14 documents ready to process. Now, if it's not processing any of your documents, it might be because you have this setting turned on in here where it says process only specific pre-tagged documents. If you said yes, you would give it a tag, like process AI documents, and it would look for those tags and process only those. I'm going to say no, because I just want it to process everything. And then let's save our configuration. The UI is going to refresh here in a second. We should see this start processing. So if we go back to documents, you can see it's starting to process these documents. It's processed two already. It's getting a new document. It's scanning it. I can actually hear my GPU, like the electrical noise, when it scans this document. It's pretty wild because it's making a noise almost like it's scanning it. I know it's not, but just the electrical noise it's making is pretty awesome. Okay, so it processed all of my documents really fast. It did say one processed, and I refreshed and it was already done. So all 14 have been processed and we can see we have some information about this document.

Updated metadata from Paperless-AI

So let's actually hop back to Paperless NGX really quick just to see how it enhanced this document, and then we'll come back to some of these features here. So if we go back to Paperless NGX and we go into our documents, now this is really lit up with a lot more information. Just going back into this camera picture, you can see I have a lot more information about it. It automatically added some tags like electronics, Japan, made in Japan, x2000. It determined from this picture that this is probably product information. If we go into content, you can see that it's still not the greatest. It's still using the OCR that it had, because paperless-ai doesn't enhance the content using a vision model. If we go into metadata, that kind of looks the same. In notes and history, we actually get some more information about what was happening and what got updated by paperless-ai, which is pretty cool. And you can see the title itself got updated too, to X2000. I think all of these were based on the file name. Then if we go back out to all of our documents, you can see that, hey, now I have some titles. So this Minnesota tax return, not really my tax return, you can see it got a document type of tax return. It recognized that the correspondent in this document is Minnesota State. That's, I guess, who it would have been from, the IRS of Minnesota, but that's easy enough to fix or figure out. For some reason it thinks that the correspondent in this one is Amazon, which I guess could make sense. It might think that this is, I don't know, a receipt or something like that, or product information. I don't know why it chose Amazon, but it was a good guess. So this is a really good example of how paperless-ai enhanced all of these documents just by scanning them and feeding them through an LLM. But one other cool

RAG Chat with your documents

thing you can do with paperless-ai is actually chat about your documents, right? So say I wanted to look at my Minnesota tax return, my fake one, and ask, how much did I earn this year? And here we go, it actually answered pretty quickly. It said my wages for tax year 2025 were $65,978. If we go back to the document itself, was that true according to this fake return? Yeah, it was. It did pretty good. I mean, yeah, that's probably my adjusted gross, but it did really good. So if I wanted to do RAG chat with all of my documents, I just index them, and now they're all indexed. So now it knows about all of those documents, or at least it can go and retrieve information about those documents as I ask questions. So I could say, what is the total across all invoices? Because there were a few sample invoices in there. And it's saying that the total is eight thousand two hundred and eleven dollars and nine cents. Now, I was going to say we can go and look, but they're right here. It was able to pull up all of these fake invoices and look at them and look at the total. So pretty cool, you can do RAG chat here if you want. paperless-ai did a pretty good job processing my documents, and it did a good

OCR is still not good

job with titles and labels, but it really didn't do anything for OCR text quality, and for me that's top priority. If my content is wrong, well, then search is wrong, and then I'm back to hunting for documents like I did before. So that's when I started looking for a better OCR option. That's when

Paperless GPT replaces OCR with a Vision model

I found paperless-gpt. The reason this stood out to me was that it can do OCR using an LLM, specifically vision models. So instead of traditional OCR guessing at pixels, the model can actually understand what it's looking at. On my test documents, the text extraction was dramatically better. Now paperless-gpt has a lot of options. I configured a

Paperless-GPT setup (port 3002)

lot of them already. You'll want to make sure that you have your API token. Your LLM provider is going to be Ollama. You'll want to set your LLM model for this as well; I'm using the same one, Llama 3.2 3B. Then you'll want to make sure you set your OCR provider to LLM, and then your vision LLM provider is Ollama, and then you'll want to choose a vision LLM model, which is minicpm-v
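The settings listed above can be sanity-checked with a short script before restarting the stack. This is only a sketch: the variable names below are assumptions modeled on how the video describes each setting, so verify them against the paperless-gpt documentation or the .env file in the linked repository before relying on them.

```python
# Setting names are assumptions based on the video's wording, not confirmed
# against paperless-gpt itself. Adjust them to match your actual .env file.
REQUIRED = (
    "PAPERLESS_API_TOKEN",
    "LLM_PROVIDER",
    "LLM_MODEL",
    "OCR_PROVIDER",
    "VISION_LLM_PROVIDER",
    "VISION_LLM_MODEL",
)


def check_settings(env):
    """Return a list of human-readable problems found in the settings dict."""
    problems = [f"{key} is not set" for key in REQUIRED if not env.get(key)]
    if env.get("OCR_PROVIDER") not in (None, "", "llm"):
        problems.append("OCR_PROVIDER should be 'llm' for vision-model OCR")
    return problems


# Example values mirroring the choices made in the video.
example = {
    "PAPERLESS_API_TOKEN": "your-token-here",
    "LLM_PROVIDER": "ollama",
    "LLM_MODEL": "llama3.2:3b",
    "OCR_PROVIDER": "llm",
    "VISION_LLM_PROVIDER": "ollama",
    "VISION_LLM_MODEL": "minicpm-v",
}
```

An empty list from `check_settings(example)` means every expected setting has a value and the OCR provider points at the LLM path.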

minicpm-v - an open source multimodal vision model

minicpm-v is an LLM. It's a multimodal LLM, meaning you can give it an image or text and get text back out, and this one's really good. I've used it for a little while and I'm really impressed. It's really high performing, it's pretty small, and it's pretty accurate at getting data or text out of images. This is the one that they recommend, so I kind of stuck with it. A lot of these variables in here are the defaults. I only put them here so I understood what I could change. But while we're in here, we actually need to download this model right here. So for minicpm-v, we'll need to download this in Ollama. So back in Open WebUI, we're going to go to the admin panel,

Download minicpm-v to Ollama

go to settings, and then go to models, and then let's pull this new model. The model is minicpm-v, and it's the 8 billion parameter version, so let's pull this model. Once that's downloaded, now let's

Paperless-GPT walkthrough (port 3002)

go out to paperless-gpt, which should be on port 3002. Once we get to paperless-gpt, it's a pretty basic UI, but you don't really need to use it all that much, which is okay. You can see on the home page it's actually looking and scanning for all documents that have this tag of paperless-gpt manual, and we don't have any of those in here. We can do ad hoc analysis on some of our documents. With paperless-gpt we can also see a log or history of the documents that it's processed. We can see some additional settings, which are all of our prompts, and if you want to adjust these prompts, I've actually mapped them inside of our stack, so if you want to manually edit these prompts you can. I'm not going to change any of the prompts, nor have I yet. Here in OCR is where

Vision OCR demo (before/after)

you can do OCR on individual documents if you wanted to test this out. So let's test it out really quick and then we'll process some automatically. So this is asking for our document ID. If we go back to Paperless and then we go into our documents, let's select this document right here. That was, you know, this camera image, and let's select its document ID, which is right here in the URL. So this is a document ID of five. And let's look at the content again, just to make sure we see the content. It's still that OCR content. Back in paperless-gpt, let's just paste that document ID of five. And then let's submit a scan job. It's actually really fast. And here's the combined OCR result. You can see now that this is a lot better than it was before. But let me actually save the content so we can go compare within Paperless NGX. So let's save the content back to that document. Let's go back to our document. It's going to say, hey, it's detected changes. So let's close out of this document and then open it back up. Now let's go into content. And you can see here, one, it's in Markdown, which is pretty awesome. It actually did a lot better. So now it was actually able to pull the serial number right off here, DC45678901, and made in Japan, which it had before. But even this right here, it was able to see FC and then CE, made in Japan. And then it even made a note: "The FC and CE are likely abbreviations for regulatory compliance marks, but the full meanings of these acronyms were not provided within the image." So this did a lot better when we fed this image to an LLM, a vision model, because it's able to understand what it's looking at, unlike OCR, which is just doing pixel detection.
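To make the idea of feeding an image to a vision model concrete, here is a minimal sketch of the kind of request body Ollama accepts for multimodal models: the image goes in base64-encoded under an `images` list. This is not paperless-gpt's implementation. The prompt text and the function name are made up for illustration, and the model tag `minicpm-v` is assumed to match what was pulled earlier.

```python
import base64
import json


def build_vision_payload(image_bytes, model="minicpm-v",
                         prompt="Transcribe all text in this image as Markdown."):
    """Build a JSON-serializable body for Ollama's /api/generate with one image."""
    return {
        "model": model,
        "prompt": prompt,  # hypothetical prompt, not the one paperless-gpt uses
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
```

You would read an image with `open(path, "rb").read()`, pass the bytes to `build_vision_payload`, and POST the result as JSON to the Ollama server's `/api/generate` endpoint. The transcribed text comes back in the `response` field.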

Vision OCR with Watermarks (before/after)

So let's see if we can find one more. Remember we had that weirdness in the tax document? Yeah, right here. So we had a lot of this weird text right here, and I wasn't really sure where it was coming from. I thought maybe it was reading some of that, and some of this was overlapped right here. So let's grab this document ID and feed it to paperless-gpt, and let's see what it comes up with. Let's save this content back to the document. So let's go back into that document. It was document ID 14, right? 14, yeah, it's right here. So, content. Hey, look at this. This is pretty cool. It actually generated Markdown too, which is pretty awesome because LLMs like Markdown. So it actually found the title, created a subtitle, created body text, and created a table in Markdown. This is pretty cool. A footer, and then look at this, additional information: "sample test data, not a tax return". That is the text going all the way across there. And then it says there's a note at the bottom of the page: "synthetic sample document for upload/OCR testing only". That was me, and that was here when I generated this document. And even page one. So pretty cool, man, pretty, pretty cool stuff. This is way better than OCR. Having a vision model actually understand what it's looking at and then parse that out and even give us context around it, like these tables in Markdown, is pretty, pretty cool. So you're probably thinking, well,

Automate with Paperless workflows (auto tags)

that's a lot of work. I don't want to grab the ID every time, and then every time I upload one, have to go there and, you know, give it the ID. You don't have to do that. You can create a quick workflow just like this. So if you wanted to create a workflow, you could. I'm just going to call this on upload. So, a trigger of upload. Every time we add a new document, what are we going to do? We're going to apply an action, and what we're going to do is assign tags. We don't have these tags yet, but those tags are in our .env, and the tag I want to apply is paperless-gpt-ocr-auto. If I apply this tag to documents when they're uploaded, then paperless-gpt will automatically do OCR on them. Now, you can also do paperless-gpt-auto, which we'll put in there too, and that's going to tell paperless-gpt to also process the document titles and stuff like that. So let's get the OCR out of the way. We actually need to create this title and be very... oh, we can't create the title. Okay, okay. So let's actually create this tag first. Create this tag. There's one. Let's create the other tag while we're at it too. We want to create this tag also. All right, so now let's go back into our workflow, and for the workflow I'm going to say on document add, totally a developer name. For the trigger, what's going to happen is document added, and then our action is going to be to assign a tag. We'll assign two of these tags, so we'll do GPT auto and OCR. So this is going to process tags and titles, and also OCR too. So let's save. Oh, sort order. It should just set one for me. Anyway, sort order is one; it's the only one we have. Okay, so let's go into documents. Let's

Testing automated workflow with Vision OCR

upload one more test document. Let's upload this random diagram. You can see it had one of the tags, and now both tags are gone. So let's find this document again. Let's clear out of here. Their menus are a little bit weird. The only problem is that when it converts the document, and even the title, you're kind of left figuring out what you just uploaded. I just searched for diagram and it came up, which was pretty interesting because I couldn't find it there. "Designing a scalable web service architecture", that's actually exactly what I was diagramming and trying to show to someone in Discord, and I didn't even say what it was. Pretty cool. Let's see in content. Here we go: it's a diagram, some web service, arrow pointing down labeled user. Yeah, this is pretty good. Arrows connecting various components. So database cluster, three boxes connected by arrows in the database cluster. Yeah, that is kind of right. Object storage cluster. Yep, two circles and one box connected by arrows. It's actually three circles, but this is pretty good. So this is what it pulled out now using both vision and the LLM. So we got titles. We got some tags of IT services. Correspondent is Amazon. I don't know, AWS, it likes Amazon for some reason. And then we got some content out of it too. So really, really cool stuff.
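The two tags used by the workflow were created by hand in the Paperless UI. As an alternative, they could be created through the Paperless-ngx REST API. The sketch below only builds the requests; the tag names come from the video, while the base URL, the token handling, and the function name are assumptions for illustration.

```python
import json
import urllib.request

# Tag names as spoken in the video; they must match the values in your .env.
AUTO_TAGS = ["paperless-gpt-auto", "paperless-gpt-ocr-auto"]


def create_tag_request(name, token, base_url="http://localhost:8000"):
    """Build a POST request that creates a tag with the given name."""
    return urllib.request.Request(
        f"{base_url}/api/tags/",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing each request to `urllib.request.urlopen` would create the tags. The workflow itself (trigger: document added, action: assign both tags) is still configured in the Paperless UI as shown.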

My thoughts about PaperlessNGX with Local AI

So that's the full setup. Paperless NGX is the core, Ollama and OpenWebUI for local AI, paperless-ai for metadata suggestions, and paperless-gpt for vision model OCR you can trust. If you want to copy this exactly, check out the description for links to all of the documentation. I hope you enjoyed this video on Paperless NGX. I'm Tim, thanks for watching.
