I'm currently trying to transition from a fairly rigid day job into working as an independent developer. The goal is to build useful online tools and hopefully create a sustainable income stream doing something I find more engaging.
One consistent annoyance in my professional work has been dealing with PDFs – specifically, extracting information into editable formats without losing structure. Copy-pasting often creates a mess.
So, my first project tackling this is an online PDF to Markdown converter: https://pdftomarkdown.pro/
I've focused heavily on preserving formatting for headings, text flow, formulas, and especially table structure (getting rows and columns right in Markdown). It also has an online editor for quick modifications after conversion.
A key aspect for me was privacy: the application explicitly does not save the content of uploaded PDFs or the generated Markdown files. It only stores minimal metadata (email, filename, page count) to track plan limits for registered users.
It's very much a "scratching my own itch" project born out of that PDF frustration. Early days, but hoping it proves useful for others too.
Your point about needing batch processing to pull targeted data points out of PDFs (rather than converting the whole document) is a valuable insight.
While the current tool focuses on full conversion to Markdown, enhancing https://pdftomarkdown.pro/ to handle specific data extraction tasks like yours is definitely something I'll consider carefully for the future roadmap. Thanks for highlighting it!