If you have PDFs that are scans of resumes (as in his example), then PDF text extraction is the least of your problems: a scanned page is just an image, so there is usually no text layer to extract. Automatically generating an index of the words in the resumes is genuinely useful if you have a lot of them, but you'll need OCR to do it.
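Here is a rough sketch of what that could look like in Python. It assumes you use `pdf2image` (which needs poppler installed) to render each page to an image and `pytesseract` (which needs the tesseract binary) to OCR it. Those are just one possible choice of tools, not something from the original example. The `ocr_pdf` and `build_word_index` names are made up for illustration, and the index is a plain word -> set-of-files mapping.

```python
import re
from collections import defaultdict


def ocr_pdf(path: str) -> str:
    """OCR every page of a scanned PDF.

    Assumption: pdf2image and pytesseract are installed, along with the
    poppler and tesseract binaries they wrap. Imported lazily so the
    indexing code below works without them.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(path)  # one PIL image per page
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def build_word_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of resume names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in texts.items():
        # Keep + and # so terms like "c++" and "c#" survive tokenizing.
        for word in re.findall(r"[a-z0-9+#]+", text.lower()):
            index[word].add(name)
    return dict(index)


if __name__ == "__main__":
    # Stand-in for OCR output, e.g. {p: ocr_pdf(p) for p in pdf_paths}
    texts = {"alice.pdf": "Python, SQL and C++", "bob.pdf": "Java and SQL"}
    index = build_word_index(texts)
    print(sorted(index["sql"]))  # ['alice.pdf', 'bob.pdf']
```

OCR output on real scans will be noisy, so expect some misspelled words in the index.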