Julia – a Fresh Approach to Numerical Computing and Data Science
The Julia programming language was created in 2009 by Jeff Bezanson, Stefan Karpinski, and Viral B. Shah. It was publicly announced in 2012 and has had a growing community of contributors and users ever since. There are over 700 packages in the official registry, and the base language has had over 400 contributors.
Julia aims to address the “two language problem” that is all too common in technical computing. Interactive data exploration and algorithm prototyping are incredibly convenient and productive in high-level dynamic languages like Python, Matlab, or R, but scaling initial prototypes to handle larger data sets quickly runs into performance limitations in those environments. The conventional approach has been to rely on compiled C (or C++, or Fortran) extensions to optimize performance-critical calculations. For common operations on standard arrays and dataframes there are mature extension libraries – for example NumPy, SciPy, and Pandas in Python. A tension arises, however, whenever existing libraries do not implement all the functionality you need. Dropping into a lower-level, statically compiled language for performance imposes an error-prone and productivity-draining context switch, and raises the barrier to entry for contributing to or writing new libraries. Recent projects in existing languages help improve this situation – Cython and Numba for Python, improved just-in-time (JIT) compilation in Matlab, Rcpp for R, and others. However, the compatibility requirements of operating within the same language semantics as an existing ecosystem mean these approaches are either limited to an easily optimized, restrictive subset of the language, or not suitable for day-to-day interactive use.
Julia takes a different (“greenfield”) approach. Designing a new language without needing to maintain source (and, for compiled extensions to keep working, binary) compatibility with languages designed decades ago allows us to incorporate now well-established modern techniques for making a dynamic language perform well. Type inference and careful design of the standard library to take advantage of type information allow optimization and aggressive specialization of fully generic code. Julia code is JIT compiled to native instructions via the LLVM compiler framework, which enables portability across multiple platforms and architectures. Thanks to LLVM, Julia code can also leverage hardware vectorization via SIMD instructions on modern processors.
As a simple example, we can look at a basic implementation of dual numbers. These are numbers of the form `a + bε`, similar to complex numbers, but where complex numbers satisfy `i² = -1`, dual numbers are defined by `ε² = 0`. This is a number-like type that should have immutable value semantics rather than mutable reference semantics, so we define it in Julia using the `immutable` keyword. The value component and epsilon component should be the same numeric type, which we express in Julia with a parametric type `{T}` as follows:
```
immutable DualNumber{T}
    value::T
    epsilon::T
end
```
To define basic arithmetic operations on this new type, we first import the addition and multiplication operators, then extend them with new methods defined for the `DualNumber` type:
```
import Base: +, *

# one-liner function definition:
+(x::DualNumber, y::DualNumber) = DualNumber(x.value + y.value,
                                             x.epsilon + y.epsilon)

# longer form function definition:
function *(x::DualNumber, y::DualNumber)
    return DualNumber(x.value * y.value,
                      x.value * y.epsilon + y.value * x.epsilon)
end
```
We can inspect the various stages of Julia's compilation pipeline, from LLVM IR down to optimized native code, for different specializations of `DualNumber`, such as `DualNumber{Int64}` and `DualNumber{Float64}`, using the `@code_llvm` and `@code_native` macros:
```
julia> x_i64 = DualNumber(1, 2)
DualNumber{Int64}(1,2)

julia> y_i64 = DualNumber(3, 4)
DualNumber{Int64}(3,4)

julia> @code_llvm x_i64 * y_i64

define void @"julia_*_21313"(%DualNumber* sret, %DualNumber*, %DualNumber*) {
top:
  %3 = load %DualNumber* %1, align 8
  %4 = extractvalue %DualNumber %3, 0
  %5 = extractvalue %DualNumber %3, 1
  %6 = load %DualNumber* %2, align 8
  %7 = extractvalue %DualNumber %6, 0
  %8 = extractvalue %DualNumber %6, 1
  %9 = mul i64 %7, %4
  %10 = insertvalue %DualNumber undef, i64 %9, 0
  %11 = mul i64 %8, %4
  %12 = mul i64 %7, %5
  %13 = add i64 %11, %12
  %14 = insertvalue %DualNumber %10, i64 %13, 1
  store %DualNumber %14, %DualNumber* %0, align 8
  ret void
}

julia> @code_native x_i64 * y_i64
        .text
Filename: none
Source line: 3
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 3
        movq    (%rsi), %r8
        movq    8(%rdx), %rax
Source line: 3
        imulq   %r8, %rax
Source line: 3
        movq    (%rdx), %rcx
        movq    8(%rsi), %rdx
Source line: 3
        imulq   %rcx, %rdx
        imulq   %r8, %rcx
        movq    %rcx, (%rdi)
        addq    %rax, %rdx
        movq    %rdx, 8(%rdi)
        movq    %rdi, %rax
        popq    %rbp
        ret

julia> x_f64 = DualNumber(1.0, 2.0)
DualNumber{Float64}(1.0,2.0)

julia> y_f64 = DualNumber(3.0, 4.0)
DualNumber{Float64}(3.0,4.0)

julia> @code_llvm x_f64 * y_f64

define void @"julia_*_21344"(%DualNumber.12* sret, %DualNumber.12*, %DualNumber.12*) {
top:
  %3 = load %DualNumber.12* %1, align 8
  %4 = extractvalue %DualNumber.12 %3, 0
  %5 = extractvalue %DualNumber.12 %3, 1
  %6 = load %DualNumber.12* %2, align 8
  %7 = extractvalue %DualNumber.12 %6, 0
  %8 = extractvalue %DualNumber.12 %6, 1
  %9 = fmul double %4, %7
  %10 = insertvalue %DualNumber.12 undef, double %9, 0
  %11 = fmul double %4, %8
  %12 = fmul double %5, %7
  %13 = fadd double %11, %12
  %14 = insertvalue %DualNumber.12 %10, double %13, 1
  store %DualNumber.12 %14, %DualNumber.12* %0, align 8
  ret void
}

julia> @code_native x_f64 * y_f64
        .text
Filename: none
Source line: 3
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 3
        vmovsd  (%rsi), %xmm1
Source line: 3
        vmulsd  8(%rdx), %xmm1, %xmm0
Source line: 3
        vmovsd  (%rdx), %xmm3
Source line: 3
        vmulsd  8(%rsi), %xmm3, %xmm2
        vmulsd  %xmm3, %xmm1, %xmm1
        vmovsd  %xmm1, (%rdi)
        vaddsd  %xmm2, %xmm0, %xmm0
        vmovsd  %xmm0, 8(%rdi)
        movq    %rdi, %rax
        popq    %rbp
        ret
```

Custom printing for a type can be achieved by extending the `show` function:

```
julia> function Base.show(io::IO, x::DualNumber)
           print(io, string(x.value, "+", x.epsilon, "ε"))
       end
show (generic function with 98 methods)

julia> DualNumber(1, 2)
1+2ε
```
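With `show` in place, it is easy to sanity-check the arithmetic interactively; addition, for example, is componentwise:

```
julia> DualNumber(1, 2) + DualNumber(3, 4)
4+6ε
```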
Once addition and multiplication are defined for a type, the only other method needed to use it in linear algebra operations such as matrix multiplication is `zero`, since Julia provides fully generic implementations of many linear algebra operations.
```
julia> Base.zero{T}(::Type{DualNumber{T}}) = DualNumber(zero(T), zero(T))
zero (generic function with 14 methods)

julia> values1 = rand(1:10, 6, 4);

julia> epsilon1 = rand(1:10, 6, 4);

julia> values2 = rand(1:10, 4, 3);

julia> epsilon2 = rand(1:10, 4, 3);

julia> dualmat1 = map(DualNumber, values1, epsilon1)
6x4 Array{DualNumber{Int64},2}:
  9+10ε  4+7ε    6+3ε  4+4ε
 10+3ε   7+9ε    4+4ε  7+2ε
  2+7ε   9+4ε   10+8ε  5+8ε
 10+4ε   2+8ε    3+6ε  6+2ε
  3+2ε   7+10ε   5+2ε  5+4ε
  3+4ε   4+2ε    7+3ε  8+1ε

julia> dualmat2 = map(DualNumber, values2, epsilon2)
4x3 Array{DualNumber{Int64},2}:
 6+2ε   2+8ε   4+3ε
 5+3ε   5+2ε  10+1ε
 3+7ε   5+6ε   3+3ε
 1+10ε  6+8ε   2+6ε

julia> dualmat1 * dualmat2
6x3 Array{DualNumber{Int64},2}:
  96+220ε   92+242ε  102+200ε
 114+216ε  117+257ε  136+209ε
  92+245ε  129+256ε  138+183ε
  85+191ε   81+240ε   81+195ε
  73+184ε   96+196ε  107+183ε
  67+191ε  109+177ε   89+129ε
```
Dual numbers have a very useful application in optimization, where they can be used for automatic differentiation. See http://julialang.org/blog/2015/10/auto-diff-in-julia for a Summer of Code report about the ForwardDiff.jl package, which contains a comprehensive, high-performance implementation of dual numbers, along with generalizations to higher-order derivatives.
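To sketch the idea using only the definitions above: seeding the epsilon component with 1 propagates the derivative through the calculation, because `ε² = 0` discards every term beyond first order:

```
julia> x = DualNumber(3, 1)   # epsilon component seeded with dx/dx = 1
3+1ε

julia> x * x * x              # f(x) = x^3: f(3) = 27, f'(3) = 3 * 3^2 = 27
27+27ε
```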
Julia is intended to be a best-of-both-worlds language, offering the productivity of Python together with performance comparable to C++, and allowing a wide range of coding styles spanning from C-like loops to high-level functional abstractions. But there are millions of lines of existing code in a growing number of other languages that it would be foolish not to leverage. Julia’s designers recognize this and have incorporated features to facilitate interoperability with other languages. Calling functions from C or Fortran shared libraries requires no boilerplate glue code; the `ccall` foreign function interface is simple to use even from Julia’s interactive read-eval-print loop (REPL) prompt:
```
julia> ccall((:printf, "libc"), Void, (Cstring, Cstring), "%s\n", "Hello from C printf!")
Hello from C printf!
```
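Argument and return types map directly onto C types, and zero-argument calls work the same way; for example, the `clock` example from the Julia manual (the returned tick count will differ on every run):

```
julia> t = ccall((:clock, "libc"), Int32, ())
2292761
```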
See http://docs.julialang.org/en/release-0.4/manual/calling-c-and-fortran-code for more details. The PyCall.jl (https://github.com/stevengj/PyCall.jl) and JavaCall.jl (https://github.com/aviks/JavaCall.jl) packages allow calling directly into Python or Java libraries in a similar way. A foreign function interface for C++ libraries via embedding the LLVM Clang compiler library is under development (https://github.com/Keno/Cxx.jl).
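As a quick sketch of what this looks like with PyCall.jl (assuming the package and a local Python installation are available), mirroring the example from its README:

```
julia> using PyCall

julia> @pyimport math

julia> math.sin(math.pi / 4) - sin(pi / 4)
0.0
```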
Julia also supports Lisp-like macros, which can be used for programmatic code generation and domain-specific languages through metaprogramming: Julia syntax can be quoted and manipulated as an expression data structure.
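A minimal sketch of quoting and a simple macro (the `@sayhello` macro here is a hypothetical illustration, not part of any library):

```
# quoting captures code as a data structure that can be inspected and transformed
ex = :(a + b * 2)
typeof(ex)          # Expr

# a macro receives expressions at parse time and returns generated code
macro sayhello(name)
    :(println("Hello, ", $name, "!"))
end

@sayhello "Julia"   # prints: Hello, Julia!
```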
Julia has built-in support for distributed-memory parallel programming using a message passing multiprocessing model (http://docs.julialang.org/en/release-0.4/manual/parallel-computing). Experimental support for shared-memory multithreading was recently merged into the development master branch, thanks to a collaboration with Intel Labs.
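A minimal sketch of the message passing model, using the 0.4-era API from that manual chapter (run on a machine with spare cores; the result will vary):

```
# launch two additional local worker processes
addprocs(2)

# count "heads" in parallel: @parallel partitions the range across the
# workers and combines the partial sums with the (+) reducer
nheads = @parallel (+) for i = 1:200000000
    Int(rand(Bool))
end
```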
Julia version 0.4.0 was released in October, with many new features such as caching of compiled packages and a rewritten, higher-performance garbage collector. Point releases are tagged on a monthly schedule, bringing bugfixes to the stable branch, while new features are developed on master for 0.5.0, which is planned for release in early 2016. The language is progressing towards a 1.0 release by finalizing standard library interfaces and features, and by developing capabilities such as an integrated LLDB-based (http://lldb.llvm.org) debugger and Atom-based (https://atom.io) editor plugins and development environment. The language is already very usable for data analytics and algorithm development, and libraries are being rapidly developed for database connections, user interface development, statistical analysis, numerical optimization, and visualization. Packages for large-scale parallel data analysis using HDFS, Hive, and MPI libraries such as Elemental are among the highlights of the growing Julia ecosystem; see http://pkg.julialang.org/ for a listing of registered packages.
You can try Julia in your browser at https://www.juliabox.org, or download the latest release from http://julialang.org/downloads. The language is developed on GitHub at https://github.com/JuliaLang/julia, and the mailing list is at https://groups.google.com/forum/#!forum/julia-users.
Reference: Julia – a Fresh Approach to Numerical Computing and Data Science from our JCG partner Tony Kelman at the MapR blog.