
I recently needed a decently performing FFT. Instead of implementing Cooley-Tukey, I realized the brute-force version essentially computes two vector×matrix products, so I interleaved and reshaped the matrices for sequential full-vector loads and wrote the brute-force version with AVX1 and FMA3 intrinsics. Good enough for my use case of moderately sized FFTs where the matrices fit in L2 cache.
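
To illustrate the idea, here is a minimal sketch of a brute-force DFT written as two vector×matrix products. It is not the code described above: it is plain scalar Python with no SIMD or interleaved layout, it assumes a real-valued input signal, and the names `dft_matrices` and `dft_real` are made up for this example.

```python
import math

def dft_matrices(n):
    """Precompute the cosine and sine matrices of an n-point DFT."""
    cos_m = [[math.cos(2 * math.pi * j * k / n) for k in range(n)]
             for j in range(n)]
    sin_m = [[math.sin(2 * math.pi * j * k / n) for k in range(n)]
             for j in range(n)]
    return cos_m, sin_m

def dft_real(x, cos_m, sin_m):
    """Brute-force DFT of a real signal x (assumed real for this sketch).

    X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n), which splits into
    Re X = x . cos_m and Im X = -(x . sin_m).
    """
    n = len(x)
    re = [sum(x[j] * cos_m[j][k] for j in range(n)) for k in range(n)]
    im = [-sum(x[j] * sin_m[j][k] for j in range(n)) for k in range(n)]
    return re, im

# Build the matrices once, then reuse them for every transform of that size.
cos_m, sin_m = dft_matrices(4)
re, im = dft_real([1.0, 2.0, 3.0, 4.0], cos_m, sin_m)
```

The cost is O(n²) per transform instead of O(n log n), but both products are simple streaming multiply-adds over precomputed tables, which is the kind of loop that maps naturally onto FMA instructions.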


I'm curious why you wouldn't just use a library like FFTW or Intel's IPP (or NVIDIA's cuFFT, if applicable)?


For FFTW, the showstopper was the GPL license. For IPP, it was 200 MB of binary dependencies; I also remember when Intel was caught testing specifically for Intel CPUs in their runtime libraries instead of checking CPUID feature bits, deliberately crippling performance on AMD CPUs. I literally don’t have any Intel CPUs left in this house. For cuFFT, the issue is vendor lock-in to NVIDIA.

And the problem is IMO too small to justify large dependencies. I only needed something like a 200×400 FFT as a minor component of a larger piece of software.


It would be interesting to see how it compares to https://gitlab.mpcdf.mpg.de/mtr/pocketfft. The C++ branch is header-only. I believe this is what SciPy uses by default.



