Hacker News

"93x faster" sounds roughly like a 46.5x improvement in marketing.


It's not only possible, it's not even uncommon for a C programmer to get a 90x speed improvement in their own C program. With naive memory management, or incorrectly implemented concurrency or parallelism, you can easily lose two orders of magnitude in speed.


This. In my case it was a 1 MB memcpy in the middle of a loop this morning. Enough to blow the CPU cache out of the water...

300x improvement instantly by moving it out of the loop.
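For anyone who hasn't hit this one, here is a minimal sketch of the pattern. It is not the actual code from that project: `work`, `copy_inside_loop`, `copy_hoisted` and the buffer names are hypothetical, and it assumes the source buffer does not change between iterations, which is the only case where hoisting is valid.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE (1024 * 1024)  /* 1 MB, the copy size mentioned above */

/* Hypothetical per-iteration work: reads only a small part of the buffer. */
static unsigned work(const unsigned char *buf, int i) {
    return buf[i % 64];
}

/* Slow: copies the whole 1 MB buffer on every iteration. */
unsigned copy_inside_loop(const unsigned char *src, unsigned char *scratch, int iters) {
    assert(src && scratch);
    unsigned sum = 0;
    for (int i = 0; i < iters; i++) {
        memcpy(scratch, src, BUF_SIZE);  /* repeated even though src never changes */
        sum += work(scratch, i);
    }
    return sum;
}

/* Faster: src is loop-invariant, so one copy before the loop is enough. */
unsigned copy_hoisted(const unsigned char *src, unsigned char *scratch, int iters) {
    assert(src && scratch);
    unsigned sum = 0;
    memcpy(scratch, src, BUF_SIZE);      /* done once */
    for (int i = 0; i < iters; i++) {
        sum += work(scratch, i);
    }
    return sum;
}
```

Both functions return the same value for the same inputs; only the number of copies differs. The speedup you actually see depends on the hardware and on how much real work the loop does, so time it rather than trusting a number from a comment.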


Are you sure it wasn't just because you were then no longer doing a large memcpy repeatedly?


Yes it was entirely covered by that :)

I think it was covered by "naive memory management" and "shitty outsourcing". I'm paid to fix their stuff.


Haha :) Maybe they shipped a better product, but the management said "No, it's not possible that this could run that fast. Something must be wrong.", so they put in some "waiting".


If the problem was just the time taken to do a 1MB copy inside a loop, why did you say the problem was clearing the CPU caches?


Because the CPU has 32 KB of cache in this case (ARM), the memcpy was evicting the entire cache several times in the loop as a side effect of doing the work. The actual function of the loop had good cache locality, as the data was 6 stack vars totalling about 8 KB.


So? Copying a megabyte is a really expensive thing to do inside a loop, even ignoring caches. (A full-speed memcpy would take roughly 40 microseconds, based on a memory bandwidth of 24 GB/s, which is a long time.)
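The back-of-envelope behind that number, as a small sketch. `transfer_time_us` is a hypothetical helper, and it assumes 1 GB = 1e9 bytes and that the whole transfer runs at the quoted bandwidth.

```c
#include <assert.h>

/* Time in microseconds to move `bytes` at `bandwidth_gb_per_s`,
 * assuming 1 GB = 1e9 bytes and a sustained transfer at that rate. */
double transfer_time_us(double bytes, double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    double seconds = bytes / (bandwidth_gb_per_s * 1e9);
    return seconds * 1e6;  /* seconds -> microseconds */
}
```

`transfer_time_us(1e6, 24.0)` comes out at about 41.7, which is where "roughly 40 microseconds" comes from. If the 24 GB/s figure is raw memory bandwidth rather than copy throughput, the read plus the write would roughly double that.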


My most painful personal experience dealing with this exact problem was with CUDA warps, during my undergrad research work.


Marketing are claiming a 91.3x.



