I don't think Julia belongs in the same list as C++, C, and Fortran. It is true that for some algorithms it is almost the same speed as C++ out of the box, but for many others it is still a factor of 10 or 100 slower. It also often requires significant tweaking to reach its best performance (e.g. avoiding abstract types), so it is almost like saying Python is a fast language because you can use Cython or Pythran. I really wish Julia fans would stop overstating the language's capabilities; it does a disservice to an otherwise great language.
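As a concrete illustration of the "avoid abstract types" tweak (a minimal sketch with hypothetical struct names, not code from any of the linked posts): declaring a field with an abstract type like `Real` forces Julia to box each value, while a concrete `Float64` field is stored inline and compiles to tight loops.

```julia
struct SlowPoint       # abstract field types: values are boxed, accesses allocate
    x::Real
    y::Real
end

struct FastPoint       # concrete field types: stored inline, type-stable
    x::Float64
    y::Float64
end

# The same generic function is fast or slow depending only on the element type.
total(ps) = sum(p.x + p.y for p in ps)

ps_slow = [SlowPoint(rand(), rand()) for _ in 1:10^6]
ps_fast = [FastPoint(rand(), rand()) for _ in 1:10^6]

# @time total(ps_slow)   # allocates per element access
# @time total(ps_fast)   # allocation-free inner loop
```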
There, the "naive" Julia code, simply implementing the algorithm as I would in Fortran, is a factor of 10 to 15 slower than the optimised Cython version (which would be about the same as a plain C version), and the optimised Julia version is still a factor of 5 slower than the Cython and Pythran versions. Can you show me how to optimise it so that Julia performs on par with Pythran or Cython?
The naive Julia code made a few fairly fundamental mistakes (using the abstract `Complex` instead of `Complex{Float64}`, and iterating in row-major rather than column-major order). The following is non-optimized Julia code that is roughly 6x faster (and much simpler) than the "optimized" code in the blog post. Some further optimizations (like using StaticArrays) would give another 2-4x over this, but I'll leave that as an exercise for the reader.
    apply_filter(x, y) = vec(y)' * vec(x)

    function cma!(wxy, E, mu, R, os, ntaps)
        L, pols = size(E)
        N = (L ÷ os ÷ ntaps - 1) * ntaps  # ÷ or div is integer division
        err = similar(E)  # allocate an array without initializing its values
        @inbounds for k in axes(E, 2)  # avoid assuming 1-based arrays; a single @inbounds covers the loop
            @views for i in 1:N  # every slice in this block is a view, not a copy
                X = E[i*os-1:i*os+ntaps-2, :]
                Xest = apply_filter(X, wxy[:, :, k])
                err[i, k] = (R - abs2(Xest)) * Xest  # abs2 avoids needless extra work
                wxy[:, :, k] .+= (mu * conj(err[i, k])) .* X  # remember the dots!
            end
        end
        return wxy, err  # note the order of returns; seems more idiomatic
    end