
Google has a distributed embedding matching service in preview: https://cloud.google.com/vertex-ai/docs/matching-engine/over...

I guess it depends on what you mean by "simple". The algorithms are complex, but there are good tools that implement them. I would imagine smaller companies would use off-the-shelf tooling, and I would argue that is simpler. Vector embeddings are unbelievably powerful, and with one of the good tools plus pretrained embeddings they often yield better results than classical methods.
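To make "match a query embedding to document embeddings" concrete, here is a minimal dependency-free sketch. The vectors and document names are made up, and a real system would use a pretrained model producing hundreds of dimensions plus an ANN index rather than brute-force cosine similarity:

```python
import math

# Hypothetical pretrained embeddings: a few documents and one query,
# each a short vector (real models use hundreds of dimensions).
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0, 0.1],
    "doc_b": [0.1, 0.9, 0.1, 0.0],
    "doc_c": [0.8, 0.2, 0.1, 0.0],
    "doc_d": [0.0, 0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.05, 0.05]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Rank documents by similarity to the query embedding.
ranked = sorted(doc_embeddings,
                key=lambda d: cosine(query, doc_embeddings[d]),
                reverse=True)
print(ranked[:2])  # the two nearest documents
```

At real scale you would replace the `sorted` call with an approximate-nearest-neighbor library (FAISS, ScaNN, hnswlib, or a managed service like the one linked above), but the scoring idea is the same.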

Specifically for search, I use them to completely replace stemming, synonyms, etc. in ES. I match the query's embedding against the document embeddings and take the top 1000 or so. Then I ask ES for the BM25 score for those top 1000. I combine the embedding-match score with BM25, recency, etc. for the final rank. The results are much better than with stemming and the like, and it's simpler overall because I can use off-the-shelf tooling and the data pipeline is simpler.
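The combination step above can be sketched as follows. This assumes the embedding scores and BM25 scores for the top candidates are already in hand; the scores, ages, weights, and half-life are all made-up illustrative values, not values from the comment:

```python
# Hypothetical inputs: embedding-match scores for the top candidates
# (e.g. from an ANN index) and BM25 scores fetched from ES for those
# same document IDs. All numbers here are made up.
embedding_scores = {"doc1": 0.92, "doc2": 0.85, "doc3": 0.80}
bm25_scores = {"doc1": 4.2, "doc2": 9.7, "doc3": 6.1}
age_days = {"doc1": 400, "doc2": 10, "doc3": 30}

def normalize(scores):
    """Min-max normalize so the two signals are on a comparable 0-1 scale."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def final_rank(w_embed=0.5, w_bm25=0.35, w_recency=0.15, half_life=90.0):
    embed_n = normalize(embedding_scores)
    bm25_n = normalize(bm25_scores)
    # Exponential decay: a document loses half its recency
    # score every `half_life` days.
    recency = {d: 0.5 ** (age / half_life) for d, age in age_days.items()}
    combined = {
        d: w_embed * embed_n[d] + w_bm25 * bm25_n[d] + w_recency * recency[d]
        for d in embedding_scores
    }
    return sorted(combined, key=combined.get, reverse=True)

print(final_rank())
```

A weighted linear blend like this is only one way to fuse the signals; reciprocal rank fusion or a learned ranker are common alternatives, and the weights would normally be tuned against relevance judgments.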



> I match the query's embedding to the document embeddings,

I assume the documents are relatively small? Otherwise a document may contain too many different topics, making it hard to differentiate between different queries.


For my search use case, documents are mostly single-topic and less than 10 pages. However, I have found embeddings still work surprisingly well for longer documents with a few topics in them. But yes, multi-topic documents can certainly be an issue. Segmentation by sentence, paragraph, or page can help here. I believe there are ML-based topic segmentation algorithms too, but that certainly starts making it less simple.
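A minimal sketch of the segmentation idea: split a multi-topic document into paragraphs, embed each segment, and score the document by its best-matching segment so one topic doesn't dilute the other. The `embed` function here is a crude bag-of-words stand-in over a tiny hypothetical vocabulary, purely to keep the sketch runnable; a real pipeline would call a pretrained embedding model:

```python
def embed(text):
    """Stand-in for a real embedding model: a bag-of-words vector
    over a tiny fixed vocabulary, just to make the sketch runnable."""
    vocab = ["search", "ranking", "cooking", "recipe"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity, returning 0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# A two-topic document, segmented by paragraph (blank-line split).
document = "search ranking search\n\ncooking recipe recipe"
segments = [p for p in document.split("\n\n") if p.strip()]

# Score the document by its best-matching segment rather than a
# single whole-document embedding.
query = "search ranking"
doc_score = max(cosine(embed(query), embed(s)) for s in segments)
print(round(doc_score, 3))
```

With a whole-document embedding, the cooking paragraph would pull the vector away from the search topic; per-segment scoring keeps the match sharp at the cost of storing more vectors.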



