But that's the core of the problem -- the code has to be architecture-specific, because whether integer multiplication runs in constant time depends on the processor architecture. Once you're writing a multiply for an XX-generation Intel processor, you might as well write it in assembly, rather than rely on a MUL_XX macro that will probably do the right thing.
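
To make that concrete, here is a rough sketch of what such a wrapper tends to look like. The name `ct_mul32` and the choice of x86-64 with GCC/Clang inline assembly are my own assumptions for illustration, not anything from a real library. Note that the assembly branch only pins down which instruction gets emitted; whether that instruction actually runs in constant time is still a property of the specific processor.

```c
#include <stdint.h>

/* Hypothetical wrapper, for illustration only: ct_mul32 is a made-up name. */
static inline uint32_t ct_mul32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* Force a single 32-bit IMUL so the compiler cannot replace the
       multiply with shifts, adds, or a branch on special operand values.
       AT&T syntax: imull src, dst computes dst = dst * src. */
    uint32_t r = a;
    __asm__("imull %1, %0" : "+r"(r) : "r"(b) : "cc");
    return r;
#else
    /* Fallback: plain C multiply. The compiler chooses the instruction
       sequence, so nothing about timing is guaranteed here. */
    return a * b;
#endif
}
```

The architecture-specific branch is already assembly, and the portable fallback carries no timing guarantee at all, which is why the macro buys so little.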