It's not only possible, it's not even uncommon for a C programmer to get a 90x improvement in speed in their own C program. If you have naive memory management, or incorrectly implemented concurrency or parallelism, you can easily lose two orders of magnitude in speed.
Haha :) Maybe they shipped a better product, but management said "No, it's not possible that this could run that fast. Something must be wrong.", so they put in some "waiting".
Because the CPU in this case (ARM) has only 32k of cache, the memcpy was evicting the entire cache several times per loop iteration as a side effect of doing the work. The actual work of the loop had good cache locality, since the data was six stack variables totalling about 8k.
So? Copying a megabyte is a really expensive thing to do inside a loop, even ignoring caches. (A full-speed memcpy of a megabyte would take about 40 microseconds at a memory bandwidth of 24 GB/s, which is a long time.)