Have you tried Azure AI Document Intelligence? In theory it's exactly this...

brianjking · on Dec 25, 2023

I second this. Failing that, have you tried GPT-4 Vision or Donut?

d4rkp4ttern · on Dec 25, 2023

Still waiting for GPT-4V, but I doubt it will do this. Yes, I’ve tried Donut and other options, but this is a very gnarly problem.

One option is to extract text blocks along with their coordinates (unstructured.io gives this, probably based on another package, because it’s basically a container for many packages). Then do the same with a blank template, and you then have an algorithmic problem of matching the filled values spatially with the key locations from the template.
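
A rough sketch of that matching step, in plain Python. Everything here is an assumption for illustration: the `Block` type, the function names, and the sample coordinates are hypothetical and not the output format of unstructured.io or any other package. It also assumes the origin is at the top-left of the page (y grows downward) and that each filled value sits below or to the right of its label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block: its text plus a bounding box in page units."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def same_place(a: Block, b: Block, tol: float) -> bool:
    """True if the two boxes start at (almost) the same point on the page."""
    return abs(a.x0 - b.x0) <= tol and abs(a.y0 - b.y0) <= tol


def filled_values(template, filled, tol=2.0):
    """Blocks of the filled form that have no counterpart in the blank template."""
    return [
        f for f in filled
        if not any(f.text == t.text and same_place(f, t, tol) for t in template)
    ]


def match_values_to_keys(template, filled, tol=2.0):
    """Map each template label to the filled text nearest to it.

    Heuristic (an assumption, not a general solution): a value belongs to the
    closest label whose top-left corner is above or to the left of the value.
    """
    parts = {}
    for value in filled_values(template, filled, tol):
        candidates = [
            k for k in template
            if k.x0 <= value.x0 + tol and k.y0 <= value.y0 + tol
        ]
        if not candidates:
            continue  # nothing plausible to attach this value to
        key = min(
            candidates,
            key=lambda k: (value.x0 - k.x0) ** 2 + (value.y0 - k.y0) ** 2,
        )
        parts.setdefault(key.text, []).append(value.text)
    # Several blocks matched to the same label are joined in reading order.
    return {name: " ".join(texts) for name, texts in parts.items()}


# Made-up sample data: a blank template and the same form filled in.
TEMPLATE = [
    Block("Name", 10, 10, 50, 20),
    Block("Date of birth", 200, 10, 280, 20),
    Block("Address", 10, 60, 60, 70),
]
FILLED = TEMPLATE + [
    Block("Jane Doe", 12, 24, 80, 36),
    Block("1990-01-01", 202, 24, 260, 36),
    Block("12 Main St", 12, 74, 90, 86),
]

mapping = match_values_to_keys(TEMPLATE, FILLED)
```

The resulting `mapping` is a plain dict of label to filled text, so `json.dumps(mapping)` gives the structured output directly. Real forms will break the "nearest label above or to the left" heuristic (checkboxes, multi-line cells, skewed scans), so treat this as a starting point only.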

brianjking · on Dec 25, 2023

I'm fairly confident GPT-4V will do this just fine, tbh.

You just need to extract each of the elements into a structured JSON or something, right?

I'll try with your example later today.

d4rkp4ttern · on Dec 26, 2023

Exactly, the form has filled values in named cells, so we need a JSON of cellName -> filledValue mappings.

Let me know how GPT-4V does!

qingcharles · on Dec 25, 2023

I second trying GPT-4 Vision, though they have dumbed it down a bit since launch.