In my experience working with IMGUI-style libraries, it's definitely feasible to perform full layout and rendering for complex UI in 1-2ms at most. For simpler applications it should be basically free. It's depressing that people are willing to accept complex, slow layout APIs at this point considering it's been possible for stuff to be fast for a long time.
Perhaps surprisingly, the most expensive part is usually text layout and rendering - text shaping is just really expensive and in the industry standard libraries it can take a long time to lay out a string, so you have to aggressively cache and do fancy things to get good performance. ASCII text is fast, though.
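To make the "aggressively cache" part concrete, here is a minimal sketch of a shaped-text cache. It assumes layout results can be keyed by (text, font, size) and reused across frames. `ShapedTextCache` and `fake_shape` are hypothetical names for illustration; `fake_shape` just stands in for a call into a real shaping library and is not any library's API.

```python
from collections import OrderedDict


class ShapedTextCache:
    """Small LRU cache so each distinct string is shaped once, not every frame."""

    def __init__(self, shape_fn, capacity=1024):
        self._shape = shape_fn          # the expensive call (assumed, see fake_shape)
        self._capacity = capacity
        self._entries = OrderedDict()   # key -> shaped result, oldest first
        self.hits = 0
        self.misses = 0

    def layout(self, text, font, size):
        key = (text, font, size)
        entry = self._entries.get(key)
        if entry is not None:
            # Cache hit: mark as most recently used and skip shaping entirely.
            self._entries.move_to_end(key)
            self.hits += 1
            return entry
        # Cache miss: pay the shaping cost once and remember the result.
        self.misses += 1
        entry = self._shape(text, font, size)
        self._entries[key] = entry
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return entry


def fake_shape(text, font, size):
    # Hypothetical stand-in for a real shaper: one glyph per character,
    # each advanced by half the font size.
    return [(ch, i * size * 0.5) for i, ch in enumerate(text)]
```

In an immediate-mode loop the same labels get requested every frame, so after the first frame nearly every `layout` call should be a hit. A real cache would probably also need to key on things like script, language, and font features.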
Whenever people come up with low numbers like that, their mental model is usually just sticking some ASCII on the screen with a couple of colors blitted in. Good text rendering, transparency, shadows, the things that make a UI worth using, all make things somewhat more expensive. It doesn't have to be overbearingly expensive (modern computers are fast, after all), but it's definitely going to be more than what you think an imgui thing is going to take. And this is pretty important when you're making an OS; otherwise you're making something that a lot of people will not want to or be able to use.
I've been doing complex UI with a few different IMGUI libraries for years now and they can all lay out and rasterize complex scenes in under 2ms, even when written in C# instead of performance-tuned C++. Modern hardware is just plain fast; in particular, drop shadows and transparency are effectively free if you're drawing them on the GPU. (I say 'effectively' because if you opt out of transparency and antialiasing, you can do front-to-back rendering to skip drawing stuff entirely... but I don't know of many cases where people do that since it's not that much faster.)
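For anyone unfamiliar with the front-to-back trick, here is a rough CPU-side sketch of the idea, assuming every rectangle is fully opaque (which is exactly why it needs transparency and antialiasing turned off). `Rect` and `cull_occluded` are made-up names for illustration. On a GPU the same effect usually comes per pixel from depth testing rather than from whole-rectangle checks like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, other):
        # True when `other` lies entirely inside this rectangle.
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def cull_occluded(rects_front_to_back):
    """Return the rects that still need drawing, frontmost first.

    Anything fully covered by a single opaque rect in front of it is skipped.
    Checking only the kept rects is enough: a skipped rect is itself inside a
    kept one, so whatever it would have covered is covered by that one too.
    """
    visible = []
    for rect in rects_front_to_back:
        if any(front.contains(rect) for front in visible):
            continue  # hidden behind something already drawn
        visible.append(rect)
    return visible
```

This only catches rects hidden behind one single opaque rect, not ones hidden by a combination of several, so it is a conservative simplification.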
Well, text layout is not trivial. But computers are incredibly fast when dealing with small amounts of data.
I don't see why all of the text layout calculations couldn't be done at maximum speed with everything in L1 cache. Probably with quite a few branch mispredictions, but still.
The amount of work you can achieve in 1ms on a modern CPU is astonishing.
My intuition is that there are probably a bunch of O(N^3) complexity algorithms in typesetting for things like ligatures, and that's where the time goes... but I haven't written a shaping engine.