By Akash Munshi — 12 Sep 2025

Why Every PDF Wants to Kill Me

If there’s one file format that has consistently tried to ruin my sanity as a builder, it’s the humble PDF.

The supposed “Portable Document Format,” invented to preserve how a document looks everywhere, has ironically turned into the greatest source of chaos in my life.

I’ve built AR try-ons, AI agents, regression testing platforms, data extraction tools, and even dabbled with AI marketplaces. But none of them comes close to the sheer pain of PDFs. Let me explain why PDFs are basically out to kill me.

PDFs Are Not Files, They’re Time Bombs

When you open a CSV, life is good. Columns, rows, neat data. When you open a JSON, it’s a tree—predictable.

When you open a PDF? Surprise! It could be any of these:

Text floating like lost balloons
Tables pretending they’re paintings
Images pretending they’re tables
Passwords you didn’t set but now magically exist
Bank statements where page one looks like a bio-data, page two like a menu card, and page three like a financial horror story

Every PDF is essentially a different religion with its own commandments. And I’m expected to be multilingual in all of them.

OCR: The Shiny Trap

“Oh, just use OCR,” they said.

“It’ll solve your problems,” they said.

OCR is like that friend who confidently tells you they can fix your laptop, and the next thing you know, the screen is upside down and the keyboard types only emojis.

Sure, OCR works for big, bold, clean text. But ask it to read 6pt font in a bank statement and suddenly it thinks your balance is B@1nce=∞. Tables? OCR thinks every row is a haiku.

OCR is not the savior—it’s the trap that makes you think you’re progressing, until you realize you’ve just been extracting gibberish at scale.

The Illusion of “Structure”

The cruelest thing about PDFs is how they trick you into believing they’re structured. You see a table, and your human eyes are like, “Yes, clearly a table.”

But the PDF itself is like:

“This number is actually an image.”
“This line of text is actually six separate characters floating on coordinates you’ll never understand.”
“Oh, and this box? Yeah, it’s decoration, not data.”

Basically, every PDF is cosplaying as Excel, while deep down it’s a chaotic scrapbook.
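
To make the "six separate characters floating on coordinates" problem concrete, here is a minimal sketch of how loose glyphs might be stitched back into lines. Everything in it is an assumption for illustration: the `Glyph` class, the tolerance values, and the sample data are hypothetical and not taken from any PDF library. Real extractors use far more careful layout analysis.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One character with its position (hypothetical structure, units in points)."""
    char: str
    x: float      # left edge
    y: float      # baseline; PDF origin is bottom-left, so larger y is higher up
    width: float


def glyphs_to_lines(glyphs, y_tolerance=2.0, gap_factor=0.3):
    """Group loose glyphs into text lines, adding a space at each wide gap."""
    # Visit glyphs from the top of the page down.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))

    rows = []
    for g in ordered:
        # Same row if the baseline is close to the row's first glyph.
        if rows and abs(rows[-1][0].y - g.y) <= y_tolerance:
            rows[-1].append(g)
        else:
            rows.append([g])

    lines = []
    for row in rows:
        row.sort(key=lambda g: g.x)  # left to right within the row
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_factor * prev.width:
                text += " "
            text += cur.char
        lines.append(text)
    return lines


# Glyphs arrive in arbitrary order with slightly jittered baselines.
sample = [
    Glyph("4", 60.0, 700.0, 6.0),
    Glyph("B", 10.0, 700.4, 6.0),
    Glyph("a", 16.0, 699.8, 6.0),
    Glyph("l", 22.0, 700.0, 6.0),
    Glyph("2", 66.0, 700.1, 6.0),
    Glyph("O", 10.0, 680.0, 6.0),
    Glyph("K", 16.0, 680.0, 6.0),
]

print(glyphs_to_lines(sample))  # ['Bal 42', 'OK']
```

The two tolerances are the fragile part: set `y_tolerance` too low and one line splits into several, too high and neighbouring lines merge.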

My Current Coping Mechanisms

I’ve tried everything:

PDFMiner: Great until you realize it returns text in the order a drunk toddler would read it, not left-to-right, top-to-bottom.
Tabula: Perfect for tables—until it suddenly thinks the entire PDF is one giant merged cell.
Vision LLMs: Very promising, but then I find myself validating outputs line by line like a paranoid accountant.
Regex: Works like black magic… until the day the format changes by a single space and your entire pipeline collapses.

So now my approach is part AI, part brute force, part therapy session.
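
The regex failure mode above is easy to reproduce. The sketch below compares a brittle pattern with a more forgiving one, using only Python's standard `re` and `decimal` modules. The "Closing Balance" label and the number format are assumptions made up for the example, not the layout of any real statement.

```python
import re
from decimal import Decimal

# Brittle: exactly one space after the colon, no thousands separators.
BRITTLE = re.compile(r"Closing Balance: (\d+\.\d{2})")

# More tolerant: any run of whitespace, optional colon, optional commas,
# optional minus sign, case-insensitive label.
TOLERANT = re.compile(
    r"closing\s+balance\s*:?\s*(-?[\d,]+\.\d{2})",
    re.IGNORECASE,
)


def parse_closing_balance(text):
    """Return the closing balance as a Decimal, or None if nothing matches."""
    match = TOLERANT.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


shifted = "Closing  Balance :  1,234.56"   # one extra space, one comma

print(BRITTLE.search(shifted))             # None
print(parse_closing_balance(shifted))      # 1234.56
```

A tolerant pattern only buys time. Returning `None` instead of raising means the caller still has to decide what a missing balance means.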

The Grand Plan: Make AI Fear the PDF

I don’t want to just “parse PDFs.” That’s too small a dream. I want to bend PDFs to my will.

The plan looks something like this:

Extract everything—text, tables, images, bounding boxes.
Reconstruct the PDF in an editable format, where each element is a Lego piece I can move, edit, or delete.
Apply AI agents that can reason about what’s a name, what’s an address, what’s a balance.
Validate automatically so I don’t lose my mind cross-checking.
Rebuild PDFs so clean, Adobe will cry.

If it works, I’ll basically have created the Avengers of PDF processing: OCR-Man, Regex-Hulk, Vision-LLM-Woman, and Tabula-Ironman—fighting together instead of sabotaging each other.
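
The "validate automatically" step in the plan could start with plain arithmetic. Here is a minimal sketch, assuming the extracted bank statement rows carry a signed amount and a printed running balance. The `Row` structure and the sample figures are hypothetical. This is one possible check, not a full validation layer.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Row:
    """One extracted statement line (hypothetical structure)."""
    description: str
    amount: Decimal    # positive = credit, negative = debit
    balance: Decimal   # running balance as printed on the statement


def find_balance_errors(opening_balance, rows):
    """Return indices of rows whose printed balance disagrees with the running total."""
    errors = []
    expected = opening_balance
    for i, row in enumerate(rows):
        expected += row.amount
        if row.balance != expected:
            errors.append(i)
            # Resync to the printed value so a single misread amount
            # does not flag every row that follows it.
            expected = row.balance
    return errors


rows = [
    Row("Salary", Decimal("50.00"), Decimal("150.00")),
    Row("Coffee", Decimal("-20.25"), Decimal("129.75")),
    # Amount misread as -70.00; the statement actually said -10.00.
    Row("Books", Decimal("-70.00"), Decimal("119.75")),
    Row("Refund", Decimal("5.00"), Decimal("124.75")),
]

print(find_balance_errors(Decimal("100.00"), rows))  # [2]
```

Using `Decimal` rather than `float` keeps the comparison exact, so a flagged row points to an extraction problem rather than a rounding artifact.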

Closing Thoughts

Every time I think I’ve solved PDFs, a new one shows up—more evil, more creatively broken, more determined to end me.

But here’s the truth: if I can tame PDFs, I can tame anything.

And maybe, just maybe, one day we’ll live in a world where PDFs behave like normal files instead of cursed relics from 1993.

Until then, I’ll keep fighting this war. If you don’t hear from me, just assume I opened another bank statement PDF.