CoCalc -- 2018-03-07.ipynb

Download

📚 The CoCalc Library - books, templates and other resources

📚 The Library - Shared Public Version

cocalc-examples / math157 / lectures / 2018-03-07 / 2018-03-07.ipynb

¹³²⁹²⁸ views
License: OTHER

Kernel: Julia

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

March 7, 2018: Julia (part 3 of 3)

Administrivia:

CAPE evaluations are available! They close Monday, March 19 at 8am. Since this course is highly experimental, your feedback will be very helpful for shaping future offerings.
Thomas's office hours (usually Tuesday 11am-12pm) are moved to Friday 11:30am-12:50pm.
Peter's office hours (usually Wednesday 3-5pm) are moved to Wednesday 5-7pm.
There will be an extra virtual office hour Thursday 6-7pm.

Advance notice for week 10:

No lectures on Monday, March 12 or Wednesday, March 14. You may wish to use this time to meet your final project group.
There will be a lecture on Friday, March 16, on the topic of "Where to go from here?" This lecture will not be counted for course attendance; that is, the last lecture for which attendance counts is Friday, March 9.
My office hours on Thursday, March 15 are cancelled. All other sections and office hours meet as scheduled.

Final project news

Both parts will be collected Sunday, March 18 at 8pm.
The groups for part 2 have been assigned. See the file final_project_groups.md in the shared project; then contact your group members as soon as possible. (I have created workspaces and chat rooms which may help with this; these are described in the same file.)
For part 1, problem 4b, instead of the exhaustive search, do random sampling with $10^4$ samples. Ditto for 4c.
A significant portion of the lecture hour on Friday, March 9 will be devoted to questions about Part 1 of the final project. I would recommend having a look at Part 1 before Friday if you have not yet done so.

The Nemo project

Since Julia was designed to be a general language for scientific computation, it does not include much of the functionality of a computer algebra system. In Python, we can remedy this by using Sage. Recently, the Nemo project was created to provide some (although for now only a little) of the same functionality in Julia.

In [45]:

using Nemo

Out[45]:

Welcome to Nemo version 0.7.3

Nemo comes with absolutely no warranty whatsoever

In Nemo, mathematical objects are generally constructed by first constructing a parent object which represents the collection of all objects of a particular type. So for example, Nemo big integers are constructed using the special object ZZ which stands for the collection of all integers.

In [46]:

ZZ

Out[46]:

Integer Ring

In [48]:

23455678^15

Out[48]:

6270690073961791488

In [47]:

ZZ(23455678)^15

Out[47]:

357865526951036277776151046050141172644498244064981619523667001448806919564814697587707949626796007199526060032

Some more examples from this list:

In [50]:

QQ(23/45)^12

Out[50]:

MethodError: no method matching (::Nemo.FlintRationalField)(::Float64)
Closest candidates are:
  FlintRationalField(::Nemo.fmpq) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/Nemo/src/flint/fmpq.jl:776
  FlintRationalField(::Nemo.fmpz, ::Nemo.fmpz) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/Nemo/src/flint/fmpq.jl:774
  FlintRationalField(::Nemo.fmpz) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/Nemo/src/flint/fmpq.jl:766
  ...

Stacktrace:
 [1] include_string(::String, ::String) at ./loading.jl:522

In [49]:

# Rational number, specified by numerator and denominator
QQ(23, 45)^12

Out[49]:

21914624432020321//68952523554931640625

In [51]:

# Integers mod n
ResidueRing(ZZ, 37)(2)^15 # Computing 2^15 mod 37

Out[51]:

23

In [52]:

# Real numbers with specified precision
sqrt(RealField(200)(2))

Out[52]:

[1.41421356237309504880168872420969807856967187537694807317668 +/- 1.63e-60]

By the way, real numbers in Nemo automatically handle interval arithmetic; that is, they keep track of error bounds as you carry out operations. This functionality is provided by the Arb library.

In [53]:

# Matrices with integer entries
S = MatrixSpace(ZZ, 10, 10)
S(rand(0:1,10,10))

Out[53]:

[1 0 0 0 0 0 0 1 1 1]
[0 0 1 0 1 1 0 0 1 0]
[0 1 0 1 0 1 1 0 0 0]
[0 1 0 1 0 1 1 1 1 1]
[0 1 1 1 1 0 1 0 1 0]
[1 1 1 0 0 0 1 1 0 0]
[1 1 0 0 0 0 0 1 1 0]
[0 1 1 0 0 1 1 0 1 0]
[0 1 0 0 0 0 0 0 0 1]
[1 1 1 0 0 1 0 1 1 0]

In [57]:

# Polynomials with integer coefficients
S, x = PolynomialRing(ZZ, "x")
(x+1)^4

Out[57]:

x^4+4*x^3+6*x^2+4*x+1

Performance comparisons

Here is an example from the book Julia High Performance by way of Stack Overflow. Switch to the Python 3 (Ubuntu Linux) kernel now.

In [1]:

import scipy as sp
import time

# Sum of 1/t^2 from t = 1 to n by using loops:
# --------------------------------------------
def pisum(w,n):
    u_sum = 0
    for vi in range(w):
        u_sum = 0
        for vj in range(1,n+1,1):
            u_sum += 1.0/(vj*vj)
    return u_sum

# Sum of 1/t^2 from t = 1 to n by using scipy functions:
# ------------------------------------------------------
def pisum1(w,n):
    for vi in range(w):
        vj = sp.arange(1,n+1,1)
        vj = sp.multiply(vj,vj)
        u_sum_sp = (sp.divide(sp.ones(n),vj)).sum()
    return u_sum_sp

# Sum of 1/t^2 from t = 1 to n by using scipy functions & calculating 
# the sum via pure python:
# -------------------------------------------------------------------
def pisum2(w,n):
    for vi in range(w):
        vj = sp.arange(1,n+1,1)
        vj = sp.multiply(vj,vj)
        u_sum_py = sum(sp.divide(sp.ones(n),vj))
    return u_sum_py

In [2]:

# Benchmarking the methods :
# ==========================   

w = 500 
n = 10000

# 1) Loops:
# ---------    
ta = time.clock()
u_sum_loops = pisum(w,n)
eltime_loops = time.clock() - ta

# 2) scipy:
# ---------
ta = time.clock()
u_sum_sp = pisum1(w,n)
eltime_sp= time.clock() - ta


# 3) scipy & sum via python:
# --------------------------
ta = time.clock()
u_sum_py = pisum2(w,n)
eltime_py= time.clock() - ta

In [3]:

row_format = '{:<35} {:<5} {:<}'
print("Overview calculation time:")
print("-"*50)
print(row_format.format("elapsed time using loops:","%.5f" % eltime_loops,"sec."))
print(row_format.format("elapsed time using scipy:","%.5f" % eltime_sp,"sec."))
print(row_format.format("elapsed time using python:","%.5f" % eltime_py,"sec."))

line1 = "sum loops:",u_sum_loops
line2 = "sum scipy:",u_sum_sp
line3 = "sum scipy and sum via python:",u_sum_py
row_format = '{:<29} {:<18}'
print("Overview Sum:")
print("-"*50)
print(row_format.format(*line1))
print(row_format.format(*line2))
print(row_format.format(*line3))

Out[3]:

Overview calculation time:
--------------------------------------------------
elapsed time using loops:           0.86448 sec.
elapsed time using scipy:           0.03202 sec.
elapsed time using python:          0.61134 sec.
Overview Sum:
--------------------------------------------------
sum loops:                    1.6448340718480652
sum scipy:                    1.6448340718480603
sum scipy and sum via python: 1.6448340718480652

Now switch the kernel back to Julia.

In [6]:

# Wrong way

function pisum(w,n)
    u_sum = 0;
    for vi = 1:w
        u_sum = 0;
        for vj = 1:n
            u_sum += 1.0/(vj*vj);
        end
    end
    u_sum
end

@time u_sum = pisum(500,10000)

Out[6]:

  0.172473 seconds (15.00 M allocations: 228.994 MiB, 9.09% gc time)

1.6448340718480652

In [3]:

# Right way

function pisum(w,n)
    u_sum = 0.0;
    for vi = 1:w
        u_sum = 0.0;
        for vj = 1:n
            u_sum += 1.0/(vj*vj);
        end
    end
    u_sum
end
@time u_sum = pisum(500,10000)

Out[3]:

  0.021794 seconds (2.21 k allocations: 113.106 KiB)

1.6448340718480652

Note that the second approach is 2x faster than scipy and about 100x faster than basic Python; however, the first approach is no faster than basic Python. Why the discrepancy?

In [4]:

n = 10
u_sum = 0;
println(typeof(u_sum))
for vj = 1:n
   u_sum += 1.0/(vj*vj);
   println(typeof(u_sum))
end

Out[4]:

Int64
Float64
Float64
Float64
Float64
Float64
Float64
Float64
Float64
Float64
Float64

Note that u_sum is originally created as an integer 0, and then at the first step it is converted to a floating-point 0. That doesn't seem like a big penalty to pay, and at runtime it isn't.

But remember that Julia's performance benefits derive from it having a just-in-time (JIT) compiler. The issue here is that the Julia compiler sees that this conversion is necessary and is forced to compile in a different (and less efficient) way because of this. This phenomenon is known as type instability.

The moral is that, while Julia attempts to make it easier to write efficient code, this is not entirely automatic; one must still pay some attention to the underlying structure of the language in order to make things not just run, but run efficiently.

Here is an example of a different issue.

In [7]:

w = 500
n = 10000
tic()
u_sum = 0.0;
for vi = 1:w
    u_sum = 0.0;
    for vj = 1:n
        u_sum += 1.0/(vj*vj);
    end
end
eltime = toc()

Out[7]:

elapsed time: 0.992005493 seconds

0.992005493

What is going on here? I avoided type instability and I still have slow code.

This time, the issue is that the global scope does not allow type specificity. That is, anything time-sensitive should be done inside a function call.

See this blog post for other examples of easy traps to fall into when trying to write Julia code.

More examples

The page Julia by Example has some basic examples which might be of use when getting started with the language. Here are a few of them.

Printing

In [9]:

# available number format characters are f, e, g, c, s, p, d:
# (pi is a predefined constant; however, since its type is 
# "MathConst" it has to be converted to a float to be formatted)
@printf "fix trailing precision: %0.3f\n" float(pi)
@printf "scientific form: %0.6e\n" 1000pi
# g is not implemented yet
@printf "a character: %c\n" 'α'
@printf "a string: %s\n" "look I'm a string!"
@printf "right justify a string: %50s\n" "width 50, text right justified!"
@printf "a pointer: %p\n" 100000000
@printf "print a integer: %d\n" 1e50

Out[9]:

fix trailing precision: 3.142
scientific form: 3.141593e+03
a character: α
a string: look I'm a string!
right justify a string:                    width 50, text right justified!
a pointer: 0x0000000005f5e100
print a integer: 100000000000000007629769841091887003294964970946560

Error handling

In [5]:

a=[]
# try, catch can be used to deal with errors as in Python
try
    pop!(a, 1) # Try changing this to create an error...
catch err
    println("hello")
    showerror(STDERR, err, backtrace())
    println()
end
println("Continuing after error")

Out[5]:

hello

Continuing after error

MethodError: no method matching pop!(::Array{Any,1}, ::Int64)
Closest candidates are:
  pop!(::IntSet, ::Integer) at intset.jl:77
  pop!(::IntSet, ::Integer, ::Any) at intset.jl:80
  pop!(::ObjectIdDict, ::ANY) at associative.jl:445
  ...
Stacktrace:
 [1] backtrace() at ./error.jl:46
 [2] include_string(::String, ::String) at ./loading.jl:522
 [3] include_string(::Module, ::String, ::String) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/Compat/src/Compat.jl:174
 [4] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/IJulia/src/execute_request.jl:154
 [5] (::Compat.#inner#16{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})() at /ext/julia/julia-0.6.2/share/julia/site/v0.6/Compat/src/Compat.jl:496
 [6] eventloop(::ZMQ.Socket) at /ext/julia/julia-0.6.2/share/julia/site/v0.6/IJulia/src/eventloop.jl:8
 [7] (::IJulia.##14#17)() at ./task.jl:335

Plotting

Plotting is provided using an external package called Winston. (I don't know where the name comes from.)

In [6]:

using Winston

In [7]:

# plot some data
pl = plot(cumsum(rand(500) .- 0.5), "r", cumsum(rand(500) .- 0.5), "b")

Out[7]:

In [8]:

# save the current figure
savefig("winston.svg")
# .eps, .pdf, & .png are also supported

DataFrames

The DataFrames package does what you might expect. Let's get the irises dataset as a CSV from here, then try the following.

In [9]:

using DataFrames

In [10]:

iris = readtable("iris.csv")
head(iris)

Out[10]:

In [11]:

# Check the names and element types of the columns of our new DataFrame
names(iris)

Out[11]:

6-element Array{Symbol,1}:
 :x           
 :Sepal_Length
 :Sepal_Width 
 :Petal_Length
 :Petal_Width 
 :Species     

In [12]:

eltypes(iris)

Out[12]:

6-element Array{Type,1}:
 Int64  
 Float64
 Float64
 Float64
 Float64
 String 

In [14]:

# Subset the DataFrame to only include rows for one species
head(iris[iris[:Species] .== "virginica", :])

Out[14]:

In [15]:

# Count the number of rows for each species
by(iris, :Species, df -> size(df, 1))

Out[15]:

In [16]:

# Discretize entire columns at a time
iris[:Sepal_Length] = round.(Integer, iris[:Sepal_Length])
iris[:Sepal_Width] = round.(Integer, iris[:Sepal_Width])
head(iris)

Out[16]:

In [17]:

# Tabulate data according to discretized columns to see "clusters"
tabulated = by(
    iris,
    [:Species, :Sepal_Length, :Sepal_Width],
    df -> size(df, 1)
)
head(tabulated)

Out[17]:

In [18]:

# you can setup a grouped dataframe like this
gdf = groupby(iris,[:Species, :Sepal_Length, :Sepal_Width])

# and then iterate over it
for idf in gdf
    println(size(idf,1))
end

Out[18]:

In [19]:

# insert random numbers at col 5:
insert!(iris, 5, rand(nrow(iris)), :randCol)
head(iris)

Out[19]:

In [20]:

# remove it
delete!(iris, :randCol)
head(iris)

Out[20]:

For more on Julia...

The page Julia by Example has a few very basic examples which might be of use when getting started with the language.
The page Learning Julia has a number of links to tutorials, videos, sample Jupyter notebooks, and even books on the topic.

In [0]: