
I would be surprised if prefetching helped at all, given that the memory access pattern is completely linear. But yes, it's always a problem that the work in a paper is tuned to the particular machine the benchmark is run on, while the things it is compared against are not. Also agreed that it's suspicious that the Haskell is using AVX while the C isn't. However, compared to the second benchmark, the first one is very fair ;-)

Perhaps somebody with an AVX-capable processor can redo the two benchmarks with the correct compiler options (i.e. -mavx) and/or with ICC, replacing the C code of the second benchmark with code that is equivalent to the Haskell code:

    double s=0;
    for(int i=0; i<n; i++) s += pow(a[i]-b[i],2);


I made a quick test for the C versions to try out flags and such: https://gist.github.com/floodyberry/5335542

I found out gcc supports -fprefetch-loop-arrays, although it is not guaranteed to have a positive effect. In this case it does appear to run faster than without prefetching. The AVX versions also run faster than their standard SSE counterparts, even with 128-bit registers, and ymm registers are faster than xmm.
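
For anyone who wants to experiment with prefetching by hand instead of through the flag, a rough sketch using the GCC/Clang builtin `__builtin_prefetch` could look like the following. This is not what the gist does and not necessarily what -fprefetch-loop-arrays generates; the name `ssd_prefetch` and the distance `PREFETCH_AHEAD` are assumptions picked for illustration, and the right distance (if any helps at all) has to be found by measuring on the target machine.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sum of squared differences with manual prefetch hints.
   ssd_prefetch is a hypothetical name. */
double ssd_prefetch(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        /* Hint only: request a read (0) with low temporal locality (0).
           The bounds check keeps the address inside the arrays. */
        if (i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
        }
#endif
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}
```

The result is the same as the plain loop; only the memory hints differ, so it can be swapped into a benchmark without changing the checked output.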

gcc is actually faster than icc, and icc runs faster with the plain C version than with the vectorized versions. I also can't figure out how to make icc prefetch; the switch that controls it doesn't seem to have any effect.


Nice! What were the results?



