I have been using it extensively over the last few weeks. I'm very thankful for such a clean and practical API, and I think it will become the central solution for ingesting heterogeneous text in the Python ecosystem.
However, I'm afraid it is not there yet. Other libraries like PDFMiner give higher-quality output, and specialized libraries like Camelot are still needed to extract tables as reasonably well-formatted text. It also needs a lot of extra tooling for web scraping: it can read plain HTML from a URL, but it cannot run JavaScript or control things like the User-Agent header. One could argue that such features are out of scope, but their absence is bothersome for a library that presents a magic `partition` function for most standard text sources.
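For the User-Agent limitation specifically, a workaround is to fetch the page yourself and only then hand the raw HTML to the library's partitioner. Here is a minimal stdlib-only sketch of the first half of that, assuming the URL and the User-Agent string are just placeholders and that the library exposes an HTML partitioner that accepts raw text (as `unstructured`'s `partition_html(text=...)` does):

```python
import urllib.request

# Hypothetical workaround: build the request with our own User-Agent
# instead of letting the library fetch the URL with its default headers.
req = urllib.request.Request(
    "https://example.com",  # placeholder URL
    headers={"User-Agent": "Mozilla/5.0 (compatible; MyScraper/1.0)"},
)

# The Request object now carries the custom header; the actual fetch
# (and any JavaScript rendering via a headless browser) is still up to
# the caller, which is exactly the extra tooling complained about above.
print(req.get_header("User-agent"))
```

The downloaded HTML would then be passed to the partitioner as a string, bypassing the library's own URL handling entirely. It works, but it defeats the one-call convenience that makes `partition` attractive in the first place.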
I'm sure it will get there soon though. It shouldn't be hard to integrate with state-of-the-art parsers and tooling, and the simple API undoubtedly brings a lot of peace of mind.