I don't think Julia belongs in the same list as C++, C, and Fortran. It is true that for some algorithms it is almost the same speed as C++ out of the box, but for many others it is still a factor of 10 or 100 slower. It also often requires significant tweaking to reach its best performance (e.g. avoiding abstract types), so it is almost like saying Python is a fast language because you can use Cython or Pythran. I really wish Julia fans would stop overstating the language's capabilities; it does a disservice to an otherwise great language.
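As a concrete illustration of the "avoid abstract types" tweak (a minimal sketch with hypothetical struct names, not code from any of the linked posts): declaring a field with an abstract type like `Real` forces Julia to box each value, while a concrete `Float64` field is stored inline and compiles to tight loops.

```julia
struct SlowPoint       # abstract field types: values are boxed, accesses allocate
    x::Real
    y::Real
end

struct FastPoint       # concrete field types: stored inline, type-stable
    x::Float64
    y::Float64
end

# The same generic function is fast or slow depending only on the element type.
total(ps) = sum(p.x + p.y for p in ps)

ps_slow = [SlowPoint(rand(), rand()) for _ in 1:10^6]
ps_fast = [FastPoint(rand(), rand()) for _ in 1:10^6]

# @time total(ps_slow)   # allocates per element access
# @time total(ps_fast)   # allocation-free inner loop
```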
There, the "naive" Julia code, simply implementing the algorithm as I would in Fortran, is a factor of 10 to 15 slower than the optimised Cython version (which would be about the same as a plain C version), and the optimised Julia version is still a factor of 5 slower than the Cython and Pythran versions. Can you show me how to optimise it so that Julia performs on par with Pythran or Cython?
The naive Julia code made a few fairly fundamental mistakes (using the abstract `Complex` instead of `Complex{Float64}`, and iterating in row-major rather than column-major order). The following is non-optimized Julia code that is roughly 6x faster (and much simpler) than the "optimized" code in the blog post. Some further optimizations (like using StaticArrays) would give another 2-4x over this, but I'll leave that as an exercise for the reader.
    apply_filter(x, y) = vec(y)' * vec(x)

    function cma!(wxy, E, mu, R, os, ntaps)
        L, pols = size(E)
        N = (L ÷ os ÷ ntaps - 1) * ntaps  # ÷ or div is integer division
        err = similar(E)  # allocate an array without initializing its values
        @inbounds for k in axes(E, 2)  # avoid assuming 1-based arrays; a single @inbounds covers the loop
            @views for i in 1:N  # every slice in this block is a view, not a copy
                X = E[i*os-1:i*os+ntaps-2, :]
                Xest = apply_filter(X, wxy[:, :, k])
                err[i, k] = (R - abs2(Xest)) * Xest  # abs2 avoids needless extra work
                wxy[:, :, k] .+= (mu * conj(err[i, k])) .* X  # remember the dots!
            end
        end
        return wxy, err  # note the order of returns; seems more idiomatic
    end