This report is the continuation of the work done in:
Basic Comparison of Python, Julia, R, Matlab and IDL
Here we:
- Add new versions of languages
- Add JAVA
- Add more test cases.
- For each language, consistantly use the same method to measure the elapsed time.
- Provide source codes for all the test cases.
- Present all the timing results to the fourth digit accuracy (any number less tha 0.0001 is rounded to 0).
While reading this report, be mindful of the following:
- Our intention is not to claim that one language is better than the other.
- In our work, we are often asked to address users’ issues on the computing languages Python, Matlab, IDL, etc. We only have few hours to understand the coding principles of those languages and quickly write codes that resolve users’ issues. We present results in the point of view of a novice programmer.
- If you are an advanced programmer or a language developer and you have results (obtained with optimization techniques) you want to share, feel free to contact us (with a web link) and we wil provide a link to your results here.
All the experiments presented here were done on Intel Xeon Haswell processor node. Each node has 28 cores (2.6 GHz each) and 128 Gb of available memory. We consider the following versions of the languages:
Language | Version | Free? |
---|---|---|
Python | 2.7.1 | Yes |
Julia | 0.6.2 | Yes |
JAVA | 1.8.0_92 | Yes |
IDL | 8.5 | No |
Matlab | R2017a | No |
R | Yes | |
GNU Compilers | 7.3 | Yes |
Intel Compilers | 18.0.1.163 | No |
Scala | 2.12.4 | Yes |
Problem 1: Fibonacci Number
The Fibonacci numbers are the sequence of numbers defined by the linear recurrence equation:
|
Fibonacci numbers find applications in the fields of economics, computer science, biology, combinatoric, etc.
We implemented both the iterative method and the recursive one, and we record the elepased time for generating the Fibonacci numbers for a given n.
Language | Option | n=25 | n=35 | n=45 |
---|---|---|---|---|
Python | 0 | 0 | 0 | |
Python + Numba | 0 | 0 | 0 | |
Julia | 0 | 0 | 0 | |
IDL | 0 | 0 | 0 | |
Matlab | 0.0098 | 0.0032 | 0.0025 | |
R | ||||
JAVA | 0 | 0 | 0 | |
Scala | 0 | 0 | 0 | |
Fortran | gfortran | 0 | 0 | 0 |
gfortran -O3 | 0 | 0 | 0 | |
ifort | 0 | 0 | 0 | |
ifort -O3 | 0 | 0 | 0 | |
C | gcc | 0 | 0 | 0 |
gcc -Ofast | 0 | 0 | 0 | |
icc | 0 | 0 | 0 | |
icc -Ofast | 0 | 0 | 0 |
Table 1.1: Elapsed times (in seconds) obtained by computing Fibonacci numbers using then iterative method.
Language | Option | n=25 | n=35 | n=45 |
---|---|---|---|---|
Python | 0.0211 | 2.5284 | 311.2046 | |
Python + Numba | 0.03 | 0.1 | 8.82 | |
Julia | 0 | 0.0335 | 4.130 | |
IDL | 0.0301 | 2.2573 | 304.2285 | |
Matlab | 0.0128 | 0.5149 | 58.9283 | |
R | ||||
JAVA | 0.0016 | 0.0414 | 4.8609 | |
Scala | 0.001 | 0.045 | 5.289 | |
Fortran | gfortran | 0 | 0.0840 | 10.4326 |
gfortran -O3 | 0 | 0.0280 | 3.4602 | |
ifort | 0 | 0 | 0 | |
ifort -O3 | 0 | 0 | 0 | |
C | gcc | 0 | 0.04 | 5.07 |
gcc -Ofast | 0 | 0.01 | 1.66 | |
icc | 0 | 0.02 | 3.15 | |
icc -Ofast | 0 | 0.02 | 3.07 |
Table 1.2: Elapsed times (in seconds) obtained by computing Fibonacci numbers using the recursive method.
Problem 2: Copy Arrays
This test case is meant to show how fast languages access non-contiguous memory locations.
Consider an arbitrary nxnx3 matrix A. We want to perform the following operations on A:
A(i,j,1) = A(i,j,2)
A(i,j,3) = A(i,j,1)
A(i,j,2) = A(i,j,3)
For instance, in Python the code looks like:
for i in range(n):
for j in range(n):
A[i,j,1] = A[i,j,2]
A[i,j,3] = A[i,j,1]
A[i,j,2] = A[i,j,3]
The above code segment uses loops. We are also interested on how the same operations are done using vectorization:
A[:,:,1] = A[:,:,2]
A[:,:,3] = A[:,:,1]
A[:,:,2] = A[:,:,3]
The problem allows us to see how each language handles loops and vectorization. We record the elapsed time needed to do the array assignments.
Language | Option | n=5000 | n=7000 | n=9000 |
---|---|---|---|---|
Python | 18.6055 | 37.1279 | 61.0172 | |
Python + Numba | 0.26 | 0.26 | 0.34 | |
Julia | 0.0907 | 0.1386 | 0.2274 | |
IDL | 6.8773 | 13.2422 | 21.9349 | |
Matlab | 0.2787 | 0.5223 | 0.8437 | |
R | 19.750 | 38.635 | 63.820 | |
JAVA | 0.1420 | 0.2680 | 0.4350 | |
Scala | 0.204 | 0.349 | 0.51 | |
Fortran | gfortran | 0.1760 | 0.3480 | 0.5760 |
gfortran -O3 | 0.0720 | 0.1360 | 0.2200 | |
ifort | 0.0680 | 0.1360 | 0.2120 | |
ifort -O3 | 0.0680 | 0.1320 | 0.2120 | |
C | gcc | 0.1700 | 0.3400 | 0.5700 |
gcc -Ofast | 0.0900 | 0.1800 | 0.2900 | |
icc | 0.0900 | 0.1800 | 0.3000 | |
icc -Ofast | 0.0900 | 0.1800 | 0.3000 |
Table 2.1: Elapsed times (in seconds) obtained by copying a matrix using loops.
Language | Option | n=5000 | n=7000 | n=9000 |
---|---|---|---|---|
Python | 0.4953 | 0.9689 | 1.5962 | |
Python + Numba | 0.834 | 1.29 | 1.96 | |
Julia | 0.2926 | 0.5471 | 0.8964 | |
IDL | 0.4091 | 0.8093 | 1.3315 | |
Matlab | 0.2845 | 0.5841 | 0.9193 | |
R | 2.956 | 5.785 | 9.566 | |
Fortran | gfortran | 0.0960 | 0.2480 | 0.3080 |
gfortran -O3 | 0.0920 | 0.1840 | 0.3040 | |
ifort | 0.1200 | 0.2320 | 0.3760 | |
ifort -O3 | 0.1200 | 0.2320 | 0.3880 |
Table 2.2: Elapsed times (in seconds) obtained by copying a matrix using vectorization.
Problem 3: Matrix Multiplication
We multiply two randomly generated nxn matrices A and B:
C=AxB
This problem shows the importance of taking advantage of built-in libraries available in each language. The elapsed times presented here only measure the times spent on the multiplication (as the size of the matrix varies).
Language | Option | n=1500 | n=1750 | n=2000 |
---|---|---|---|---|
Python | intrinsic | 0.58 | 0.96 | 0.97 |
Python + Numba | (loop) | 3.64 | 6.33 | 13.57 |
Julia | intrinsic | 0.1494 | 0.2391 | 0.3497 |
IDL | intrinsic | 0.3028 | 0.3613 | 0.4797 |
Matlab | intrinsic | 0.9567 | 0.2575 | 0.2943 |
R | 0.920 | 1.158 | 0.951 | |
JAVA | (loop) | 6.8530 | 13.4700 | 29.2320 |
Scala | (loop) | 9.258 | 14.482 | 23.363 |
Fortran | gfortran (loop) | 17.2450 | 31.2299 | 60.1837 |
gfortran -O3 (loop) | 3.3202 | 5.3043 | 12.3367 | |
gfortran (matmul) | 0.3520 | 0.5600 | 0.8280 | |
gfortran -O3 (matmult) | 0.3480 | 0.5560 | 0.7840 | |
ifort (loop) | 1.1400 | 1.8081 | 3.1001 | |
ifort -O3 (loop) | 0.5200 | 0.8240 | 1.2760 | |
ifort (matmul) | 1.1400 | 1.8121 | 2.9001 | |
ifort -O3 (matmul) | 1.1400 | 1.8121 | 2.9881 | |
ifort (DGEMM) | 0.2120 | 0.2280 | 0.3320 | |
C | gcc (loop) | 13.4900 | 20.9600 | 31.4800 |
gcc -Ofast (loop) | 1.3500 | 2.3900 | 4.3700 | |
icc (loop) | 1.2100 | 2.1600 | 4.0200 | |
icc -Ofast (loop) | 1.1500 | 1.7000 | 2.6600 |
Table 3.1: Elapsed times (in seconds) obtained by multiplying two randomly generated matrices.
Problem 4: Gauss-Legendre Quadrature
The Gauss-Legendre quadrature formulas approximate the integral of a functionby a weighted sum of function-values. When m function-values are used, the formula is exact for polynomials of degree zero through 2m — 1.
Language | Option | n=50 | n=75 | n=100 |
---|---|---|---|---|
Python | 0.1345 | 0.0183 | 0.0186 | |
Julia | 1.2962 | 1.3553 | 1.3556 | |
IDL | 0.0006 | 0.0009 | 0.0014 | |
R | ||||
JAVA | ||||
Matlab | 0.7739 | 0.7197 | 0.0853 | |
Fortran | gfortran | 0 | 0.004 | 0.008 |
gfortran -O3 | 0 | 0.004 | 0.008 | |
ifort | 0 | 0.004 | 0.008 | |
ifort -O3 | 0 | 0.004 | 0.008 | |
C | gcc | |||
gcc -Ofast | ||||
icc | ||||
icc -Ofast |
Table 4.1: Elapsed times (in seconds) obtained by performing the Gauss-Legendre qudrature.
Problem 5: Numerical Approximation of the 2D Laplace Equation
We find the numerical solution of the 2D Laplace equation:
We use the Jacobi iterative solver. We are interested in fourth-order compact finite difference scheme (Gupta, 1984):
The Jacobi iterative solver stops when the difference of two consecutive approximations falls below 10^{-6}.
Language | Option | n=100 | n=150 | n=200 |
---|---|---|---|---|
Python | 142.7886 | 705.268 | 2188.007 | |
Python + Numba | 1.2764 | 5.4262 | 16.396 | |
Julia | 1.0309 | 5.1724 | 16.1657 | |
optimized | 1.0987 | 5.5039 | 17.1473 | |
optimized_smind | 0.6215 | 3.0289 | 9.4964 | |
IDL | 83.6360 | 416.5523 | 1298.777 | |
Matlab | 1.8199 | 4.9914 | 9.1465 | |
R | 22.921 | 108.305 | 363.53 | |
JAVA | 0.4850 | 2.0210 | 5.5980 | |
Scala | 0.545 | 2.289 | 6.202 | |
Fortran | gfortran | 0.840 | 3.800 | 10.945 |
gfortran -O3 | 0.668 | 3.068 | 8.881 | |
ifort | 0.5360 | 2.4680 | 7.1520 | |
ifort -O3 | 0.5360 | 2.4640 | 7.1520 | |
C | gcc | 0.500 | 2.4200 | 7.7000 |
gcc -Ofast | 0.2100 | 1.0400 | 3.1800 | |
gcc -fPIC -Ofast -O3 -xc -shared | 1.1410 | 5.5953 | 17.3381 | |
icc | 0.4500 | 2.2300 | 6.7900 | |
icc -Ofast | 0.3200 | 1.6000 | 4.8700 |
Table 5.1: Elapsed times (in seconds) obtained by numerically solving the Posson equation using a Jacobi iterative solver with loops.
Language | Option | n=100 | n=150 | n=200 |
---|---|---|---|---|
Python | 2.3209 | 10.7638 | 41.2477 | |
Python + Numba | 3.5021 | 12.5186 | 36.1285 | |
Julia | optimized_vectorized | 2.3787 | 14.0944 | 42.1255 |
IDL | 1.9159 | 10.1320 | 32.2211 | |
Matlab | 3.5102 | 6.4710 | 16.4999 | |
Fortran | gfortran | 0.876 | 3.948 | 11.329 |
gfortran -O3 | 0.3560 | 1.7880 | 5.0880 | |
ifort | 0.3000 | 1.5440 | 4.4400 | |
ifort -O3 | 0.2840 | 1.5680 | 4.4520 |
Table 5.2: Elapsed times (in seconds) obtained by numerically solving the Posson equation using a Jacobi iterative solver with vectorization.
Problem 6: Belief Propagation
The Belief Propagation can be applied to fields such as speech recognition, computer vision, image processing, medical diagnostics, parity check codes, etc. Its calculations involve a repeated sequence of matrix multiplications, followed by normalization. The Matlab, C and Julia codes are shown in the Justin Domke's weblog (Domke 2012). We report the computing times for various values of the number of iterations (N) when the matrix dimension is 5000x5000.
Language | Option | n=250 | n=500 | n=1000 |
---|---|---|---|---|
Python | 4.8186 | 7.3240 | 13.9176 | |
Julia | 3.957 | 7.684 | 14.855 | |
IDL | 18.3229 | 35.8977 | 71.0820 | |
Matlab | 2.6299 | 4.0708 | 6.8691 | |
R | 25.463 | 46.985 | 92.654 | |
JAVA | 321.403 | 642.395 | 1284.106 | |
Fortran | gfortran | 22.5574 | 39.9224 | 89.9696 |
gfortran -O3 | 5.1603 | 9.5885 | 18.7051 | |
ifort | 4.6082 | 8.8605 | 17.3810 | |
ifort -O3 | 4.6322 | 8.7325 | 17.4130 | |
C | gcc | 2.6400 | 5.2800 | 10.5700 |
gcc -Ofast | 2.3500 | 4.7200 | 9.4400 | |
icc | 1.4500 | 2.9000 | 5.8000 | |
icc -Ofast | 1.4400 | 2.9000 | 5.8100 |
Table 6.1: Elapsed times (in seconds) obtained by doing the Belief Propagation computations.
Problem 7: Metropolis-Hastings Algorithm
The Metropolis–Hastings (M–H) algorithm is a method for obtaining random samples from a probability distribution. We perform calculations for the implementation of a Metropolis-Hastings algorithm using a two dimensional distribution (Domke 2012). Results are shown when the number of iterations (N) varies.
Language | Option | n=5000 | n=10000 | n=15000 |
---|---|---|---|---|
Python | 0.02642 | 0.0637 | 0.0937 | |
Julia | 0.0002 | 0.0004 | 0.0006 | |
IDL | 0.0058 | 0.0219 | 0.0291 | |
Matlab | 0.0164 | 0.0194 | 0.0276 | |
R | 0.105 | 0.166 | 0.24 | |
JAVA | 0.006 | 0.007 | 0.009 | |
Scala | 0.009 | 0.012 | 0.014 | |
Fortran | gfortran | 0 | 0.0040 | 0.0040 |
gfortran -O3 | 0 | 0 | 0 | |
ifort | 0 | 0 | 0 | |
ifort -O3 | 0 | 0.0040 | 0 | |
C | gcc | 0 | 0 | 0 |
gcc -Ofast | 0 | 0 | 0 | |
icc | 0 | 0 | 0 | |
icc -Ofast | 0 | 0 | 0 |
Table 7.1: Elapsed times (in seconds) obtained by doing the Metropolis algorithm computations.
Problem 8: Manipulation of netCDF Files
We have a set of daily NetCDF files (7305) covering a period of 20 years (1990-2009). The files for a given year are in a sub-directory labeled YYYY (for instance 1990, 1991, 1992, etc.). We want to write a script that opens each file, reads a three-dimensional variable (longitude/latitude/level), and manipulates it. A pseudo code for the script reads:
Loop over the years
Obtain the list of NetCDF files
Loop over the files
Read the variable (longitude/latitude/level)
Compute the zonal mean average (new array of latitude/level)
Extract the column array at latitude 86 degree South
Append the column array to a "master" array (or matrix)
The goal here is to be able to do a generate the data to do a contour plot that looks like (obtained with Python):
This is the kind of problems that a typical user we support faces: a collection of thousands of files that needs to be manipulated to extract the desired information. Having tools that allow us to quickly read data from files (in formats such as NetCDF, HDF4, HDF5, grib) is critical for the work we do.
Note that unlike in Problem 4 of the previous report (where the daily files are in directories associated with months), the daily files to be read in in this case are stored in directories associated with the years. The access to the files is easier in this current problem and we expect the timing numbers to be reduced.
We report in Table 8.1 the elapsed times it took to solve Problem 8 with the various languages.
Language | Elapsed Time (s) |
---|---|
Python | 558.4496 |
Julia | 580.5683 |
IDL | 504.5634 |
Matlab | 646.2261 |
Table 8.1: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files on a single processor.
All the above runs were conducted on a node that has 28 cores. Basically, only one core was used. We want to take advantage of all the available cores by spreading the reading of the files and making sure that the data of interest are gathered in the proper order. We use the multi-processing capabilities of the various languages to slightly modify the scripts. For each year, the daily files are read in by different threads (cores).The results are shown in Table 8.2.
Language | numThreads=2 | numThreads=4 | numThreads=8 | numThreads=16 |
---|---|---|---|---|
Python | 352.7964 | 238.1065 | 170.9945 | 105.3949 |
Table 8.2: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files using multiple threading.
Problem 9: Function Evaluations
We create an array x of length n and loop several times to perform the six operations:
y = sin(x)
x = asin(y)
y = cos(x)
x = acos(y)
y = tan(x)
x = atan(y)
Language | n=80000 | n=90000 | n=100000 |
---|---|---|---|
Python | 52.1014 | 58.4591 | 64.8276 |
Julia | 55.5550 | 62.3450 | 69.2350 |
IDL | 37.4798 | 42.0187 | 34.8829 |
Matlab | 5.1866 | 5.6523 | 4.6116 |
R | 89.500 | 101.439 | 112.269 |
Problem 10: Simple FFT
We create a nxn random complex matrix M and compute the following:
r = fft(M)
r = abs(r)
Language | n=10000 | n=15000 | n=20000 |
---|---|---|---|
Python | 10.5087 | 25.5764 | 45.1959 |
Julia | 3.916 | 11.489 | 20.632 |
IDL | 16.6154 | 36.5711 | 73.3394 |
Matlab | 2.6606 | 6.0293 | 10.7011 |
R | 60.722 | 157.626 | 269.651 |
Problem 11: Square Root of a Matrix
We consider an nxn matrix A with 6s on the diagonal and 1s everywhele else. We are lloking for the matrix B such that BxB = A. We record the time for determining B.
Language | Option | n=1000 | n=2000 | n=4000 |
---|---|---|---|---|
Python | SciPy sqrtm | 2.2227 | 5.2814 | 45.7643 |
Julia | sqrtm | 0.4129 | 2.511 | 19.111 |
Matlab | sqrtm | 0.9683 | 1.3916 | 2.3767 |
R | 1.057 | 3.602 | 19.122 |
Problem 12: Look and Say Sequence
We write codes to determine the look and say number of order n. Instead of starting with a single digit, we begin with 1223334444.
This test case highlights how languages manipulate strings of arbitrary length.
Language | Options | n=40 | n=45 | n=48 |
---|---|---|---|---|
Python | 2.2921 | 37.4429 | 224.4362 | |
Julia | 2.769 | 44.333 | 345.069 | |
IDL | 19.9563 | 296.4768 | 1570.4234 | |
Matlab | 412.5993 | 4501.6751 | ||
R | 0.509 | 1.678 | 3.611 | |
Java | 0.0487 | 0.0947 | 0.1582 | |
Scala | 0.0390 | 0.1020 | 0.1720 | |
Fortran | gfortran | 0.0160 | 0.0200 | 0.0200 |
gfortran -O3 | 0.0200 | 0.0240 | 0.0240 | |
ifort | 0.0120 | 0.0160 | 0.0120 | |
ifort -O3 | 0.0160 | 0.0200 | 0.0080 | |
C | gcc | 0.0800 | 0.2600 | 0.5300 |
gcc -Ofast | 0.0400 | 0.2500 | 0.5000 | |
icc | 0.0700 | 0.2600 | 0.4800 | |
icc -Ofast | 0.0700 | 0.2100 | 0.4600 |
References
- Justin Domke, Julia, Matlab and C, September 17, 2012.
- Michael Hirsch, Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL, June 2016.
- Murli M. Gupta, A fourth Order poisson solver, Journal of Computational Physics, 55(1):166-172, 1984.
- Jean Francois Puget, A Speed Comparison Of C, Julia, Python, Numba, and Cython on LU Factorization, January 2016.
- Alex Rogozhnikov, Log-likelihood benchmark, September 2015.
- Sebastian Raschka, Numeric matrix manipulation - The cheat sheet for MATLAB, Python Nympy, R and Julia, June 2014.
- Yousef Saad, Iterative Methods for Sparse Linear Systems (2 ed.), SIAM, ISBN 0898715342, 200366
Source Files
All the source files for the problems presented here are in the attached file: sourceFiles2018.tar.gz
If you have a comment/suggestion/question, contact Jules Kouatchou (Jules.Kouatchou@nasa.gov)gcc