
Have you tried Azure AI Document Intelligence?

In theory it's exactly this...



I second this. Failing that, have you tried GPT-4 Vision or Donut?


Still waiting for GPT-4V, but I doubt it will do this. Yes, I've tried Donut and other options, but this is a very gnarly problem.

One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably via another package, since it's basically a container for many packages). Then do the same with a blank template. That leaves you with an algorithmic problem: spatially matching the filled values to the key locations from the template.
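A rough sketch of that matching step, assuming the extraction has already produced text blocks with bounding boxes. The `Block` type, the function names, and the coordinates are all made up for illustration (this is not the unstructured.io API), and nearest-center matching is just one simple heuristic among several you could try:

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: text plus a bounding box in page coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def overlaps(a, b):
    """True if the two bounding boxes intersect."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def match_values_to_keys(template_blocks, filled_blocks):
    """Map each template label to the filled-in text nearest to it."""
    if not template_blocks:
        raise ValueError("need at least one template block to match against")

    # A block in the filled form that coincides with a template block
    # (same text, overlapping box) is a printed label, not a filled value.
    values = [
        f for f in filled_blocks
        if not any(f.text == t.text and overlaps(f, t) for t in template_blocks)
    ]

    matched = {}
    for v in values:
        vx, vy = v.center
        # Assign the value to the label whose center is closest.
        nearest = min(
            template_blocks,
            key=lambda t: hypot(t.center[0] - vx, t.center[1] - vy),
        )
        matched.setdefault(nearest.text, []).append(v.text)

    # Several value blocks may land on one label; join them in input order.
    return {key: " ".join(parts) for key, parts in matched.items()}


template = [Block("Name", 10, 10, 60, 20), Block("Date", 10, 40, 60, 50)]
filled = template + [
    Block("Jane Doe", 70, 10, 130, 20),
    Block("2023-10-01", 70, 40, 140, 50),
]
print(match_values_to_keys(template, filled))
```

On a real scan the two pages would need to be aligned first (scale, skew, offset), and dense grids probably need a smarter rule than plain nearest-center, e.g. only considering labels to the left of or above the value.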


I'm fairly confident GPT-4V will do this just fine, tbh.

You just need to extract each of the elements into a structured JSON or something, right?

I'll try with your example later today.


Exactly, the form has filled values in named cells, so we need a JSON of cellName -> filledValue mappings.
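To make the target concrete, here is a minimal sketch of checking whatever comes back (from a model or any other extractor) against the cell names on the form. The cell names and the `parse_cell_mapping` helper are hypothetical, and it assumes the reply is a single flat JSON object:

```python
import json


def parse_cell_mapping(raw_reply, expected_cells):
    """Parse a JSON object of cellName -> filledValue and flag mismatches."""
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of cellName -> filledValue")

    missing = sorted(set(expected_cells) - set(data))
    unexpected = sorted(set(data) - set(expected_cells))

    # Keep only the cells we asked for; absent cells map to None.
    mapping = {name: data.get(name) for name in expected_cells}
    return mapping, missing, unexpected


reply = '{"Name": "Jane Doe", "Date": "2023-10-01", "Extra": "x"}'
mapping, missing, unexpected = parse_cell_mapping(
    reply, ["Name", "Date", "Signature"]
)
print(mapping, missing, unexpected)
```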

Let me know how GPT-4V does!


I second trying GPT-4 Vision, though they have dumbed it down a bit since launch.



