Having recently rewritten Linux's TLB code, this is quite wrong. For an ordinary page fault, there's no flush at all -- changing a page from not present to present doesn't require a flush on x86. Removing a page from page cache can be done with INVLPG, which had been around for a long, long time.
From 4.14 on, Linux has used PCID to improve context switches, independently of PTI. While writing that code, I did a bunch of benchmarking. INVPCID is not terribly useful, even with PCID. In fact, Linux only uses INVPCID on user pages to assist with a PTI corner case. It's not entirely clear to me what Intel had in mind when INVPCID was added
I think he means page fault every time a page is not present.
They're slower, because kernel needs to be mapped in and out of virtual address space, just like for syscalls.
If the access pattern is sufficiently local, perhaps this could be mitigated by using large (2MB) pages. A bad idea for a random access pattern, of course.