Spearman Rank in Standard Julia


Spearman Rank in Standard Julia

Well nearly, I did import the erfc function from the SpecialFunctions package. I don’t like it either. I’ll write my own soon to make up for it.

Special Thanks

I came across the text Numerical Recipes in C. It was first published in 1988, by the Cambridge University Press. The authors are William H. Press, Brian P. Flannery, Saul. A. Teukolsky, and William T. Veterling.

The book is beautiful. You should try to find a copy. It comes in Pascal and Fortran too!!!

I’m having fun with it and will translate some of the recipes from my first love C to Julia.

I’ll write up a review on the functions below in an upcoming edit. I’m so excited that it works that I had to publish.

Update: There is a Website!!!

numerical.recipes is a website with all of the code and the ebook. I thought it was open source at first, but they want some money. I guess it’s okay, but still. Check it out there.

The amazon book link is here

Update Again:

I found the PDF! It is available via penn state university. Here’s the download link

Using SpecialFunctions:erfc

I had to import the complementary error function. I wanted to use just the standard library, but I had to test the code below first. I’ll write the compelemntary error function in pure julia next.


using SpecialFunctions:erfc

Spearman Correlation Function.

It takes:

  • two distributions
  • the sample size

It returns a t score The original, used pointers to return multiple variables. I’ll probably rewrite the function to calculate the copmlimentary variables in seperate methods. Might as well take advantage of the multiple dispatch capability of the language.

function spearman(data1,data2,n)

    j =1 
    
    wksp1m= Vector{Float64}(0:n)
    wksp2m = Vector{Float64}(0:n)
    
    for j in 1:(n)
        wksp1m[j]=data1[j]
        wksp2m[j]=data2[j]
    end

    sort!(wksp1m)
    sort!(wksp2m)

    sf = crank(n,wksp1m)
    sg = crank(n,wksp2m)

    d = 0 

    for j in 1:n
        d += sqrt((Complex(wksp1m[j]-wksp2m[j])))
    end
    
    en=n
    en3n = (en*en*en)-en
    aved=(en3n/6.0)-((sf+sg)/12)
    fac=(1.0-sf/en3n)*(1.0-(sg/en3n))
    vard =((en-1.0)*en*en*sqrt(en+1.0)/36.0)*fac
    zd = (d-aved/sqrt(vard))
    probd=erfc((abs(zd)/1.4142136))
    rs = (1.0-(6.0/en3n)*(d+0.5*(sf+sg)))/fac
    t=(rs)*sqrt((en-2.0)/((rs+1.0)*(1.0-rs)))
    return t
    
end
    
spearman (generic function with 2 methods)

Crank

It ranks the distributions by modifying the original sorted array. So very C. I may play with this to return a new value, but I like that it modifies in place.

function crank(n,w)
    
    #w= Vector{Float64}(1:n)
    c = 0
    j = 0
    s = 0
    for j in 1:(n-1)
        if w[j+1] != w[j]
            w[j] = j
        else
            for jt in 1:(n)
                if (w[jt] != w[j])
                    break
                end
            end
            rank = .5*(j+jt-1)
            for ji in j:(jt-1)
                w[ji] = rank
            end
            t = jt-j
            s += t*t*t-1
        end
        c = j
        j=j
    end
    if c == n 
        w[n]=n
    end
    return s
end
crank (generic function with 1 method)

Main()

Creates two random distributions and ranks tests them for correlation..

function main()
    d1 = [5rand()+2 for i=1:50]
    d2 = [3rand()+2 for i=1:50]

    t =spearman(d1,d2,50,0,0,0,0,0)
    display(t)
end
main (generic function with 1 method)
main()
619.0719816953838 - 0.0im