Journey

The beginning of a thousand-mile journey begins with a single step. (more)

Julia

Interesting project.. Julia!

The benchmark table (for more info visit the homepage of the project):

julia

Tags:

Science And Python

This is an interesting video where Fernando Pérez talks about uses of Python in science. He covers many topics like Matplotlib, NumPy, SciPy, Cython, IPython, wrappers for R, etc. I highly recommend watching it!

 

Tags: , , , ,

One of my friends is reading this book and encountered a problem when he tried to separate the definition from declaration of a template class.

To illustrate this assume we have this header file, foo.h, with declaration of template class Foo:

template<typename T>
class Foo {
public:
  Foo();
  void someMethod(T x);
private:
  T x;
};

, with the following definition of member functions in separate file foo.cpp:

template<typename T>
Foo<T>::Foo()
{
  ...
}

template<typename T>
void Foo<T>::someMethod(T x)
{
  ...
}

Now, if you try to use this template class and instantiate T to be actually a double or int, then the compiler will throw an error, that it doesn't recognize Foo<double>::Foo() or Foo<int>::Foo(). Problem is when processing Foo.cpp compiler would see template definition and when processing our new .cpp file it would see Foo<int>.

There are more solutions to this problem. One is to put this line at the end of header file Foo.h:

#include "Foo.cpp"

And the other one is to put explicit instantiations into the Foo.cpp file:

template class Foo<int>;
template class Foo<double>;

None of these solutions is perfect but I prefer the first one just because you are not limited just to those instantiations you put into .cpp file in the second solution.

You can find more on this here, here and here.

Tags: , , ,

If you are having errors when saving text into your MySQL database like:

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 3832: ordinal not in range(256),

you should make sure your database and tables are set to UTF-8 and when connecting to the database you should specify the charset as well:

connect(host, user, password, dbname, charset='utf8')

 

 

Tags: , , ,

A friend of mine was submitting some information on one of Bloomberg's websites and sent me this screenshot:

Bloomberg

It's 2012 and not pre 1991..

Tags:

I read the details about non-farm payrolls and other employment statistics number on BLS website and I found an interesting number. This is the quote from the site:

 

 For example, the confidence interval for the monthly change in
total nonfarm employment from the establishment survey is on the order
of plus or minus 100,000. Suppose the estimate of nonfarm employment
increases by 50,000 from one month to the next. The 90-percent confidence
interval on the monthly change would range from -50,000 to +150,000 
(50,000 +/- 100,000).

 

Which means that for the latest number which was 114,00, the true value is in interval 14,000 to 214,000 with 90% probability. Is it reliable? But even though the markets reacts so heavily on even a small deviation from expectations..

Tags: , ,

Some of my colleagues didn't know that you can use mathematical constants that are part of "cmath". Here is the small snippet that shows how to use PI from cmath library. Be aware that you need to write "#define _USE_MATH_DEFINES" before you include cmath.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
#define _USE_MATH_DEFINES
 
#include 
#include 
 
using namespace std;
 
int main() {
 
  cout &lt;&lt; M_PI &lt;&lt; endl;
 
  return 0;
}

 

The full list of constants is:

 

Symbol Expression Value
M_E e 2.71828182845904523536
M_LOG2E log2(e) 1.44269504088896340736
M_LOG10E log10(e) 0.434294481903251827651
M_LN2 ln(2) 0.693147180559945309417
M_LN10 ln(10) 2.30258509299404568402
M_PI pi 3.14159265358979323846
M_PI_2 pi/2 1.57079632679489661923
M_PI_4 pi/4 0.785398163397448309616
M_1_PI 1/pi 0.318309886183790671538
M_2_PI 2/pi 0.636619772367581343076
M_2_SQRTPI 2/sqrt(pi) 1.12837916709551257390
M_SQRT2 sqrt(2) 1.41421356237309504880
M_SQRT1_2 1/sqrt(2) 0.707106781186547524401

Tags: ,

In my previous post about rewriting my code to run in parallel part one I mentioned that we will make a small change to adfTest() function as well. In this post we will perform this small but performance-dramatic change.

When you take a closer look at the source code of this particular function from fUnitRoots package you will surely spot this part of code:

 

View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
if (lags &gt; 1) {
        y.diff.lag = z[,2:lags]
        if (type == "nc"){
            res = lm(y.diff ~ y.lag.1 - 1 + y.diff.lag)
        }
        if (type == "c"){
            res = lm(y.diff ~ y.lag.1 + 1 +  y.diff.lag) }
        if (type == "ct") {
            res = lm(y.diff ~ y.lag.1 + 1 + tt + y.diff.lag) }
    } else {
        if (type == "nc") {
            res = lm(y.diff ~ y.lag.1 - 1)
        }
        if (type == "c"){
            res = lm(y.diff ~ y.lag.1 + 1) }
        if (type == "ct") {
            res = lm(y.diff ~ y.lag.1 + 1  + tt)  }
    }

 

As we have discussed earlier the lm() is inferior to fastLm() function in terms of performance. So why not use this function instead? The code snippet after the change looks like this:

 

View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
if (lags &gt; 1) {
        y.diff.lag = z[,2:lags]
        if (type == "nc"){
            res = fastLm(cbind(y.lag.1,y.diff.lag), y.diff)
        }
        if (type == "c"){
            res = fastLm(cbind(1, y.lag.1, y.diff.lag), y.diff) }
        if (type == "ct") {
            res = fastLm(cbind(1, y.lag.1, tt, y.diff.lag), y.diff) }
    } else {
        if (type == "nc") {
            res = fastLm(y.lag.1, y.diff)
        }
        if (type == "c"){
            res = fastLm(cbind(1, y.lag.1), y.diff) }
        if (type == "ct") {
            res = fastLm(cbind(1, y.lag.1, tt), y.diff) }
    }

 

Important note: do not forget to add this line before you use fastLm() function:

 

View Code RSPLUS
1
require(RcppArmadillo)

 

Using this modified function, call it adfTest2(), in the parallel test we performed previously, the code finishes in 386.57 seconds. In comparison, the fastest code finished in 820.143 seconds before! That's an amazing improvement.

To sum it up, we started with our serial code using old adfTest() function what took 1454.935 seconds. Using parallel code and improved adfTest2() it took 386.57 seconds. We saved 1068.365 seconds, i.e. we reduced the computational time by roughly 73.4%!

Huraaa

Huraaa!

Tags: , ,

Following my previous post about rewriting my code to run in parallel I have modified the code for downloading the S&P 500 prices from Yahoo to run i parallel as well. To be honest, I quite enjoy writing the code to run in parallel. It's fun for various reasons, but some theoretical background is highly recommended (e.g. here, here or here).

The good news is that it takes very short time (148 seconds) to download the data from Yahoo, but on the other hand the merge function still takes way too much time to complete. To be more specific, on average, 80% of time is spent on merge and 20% on downloading the actual data from Yahoo. It's faster than the original code but I don't like the idea of spending so much time on merge function.

 

View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
rm(list = ls(all = TRUE))
require(doMC)
require(quantmod)
require(tseries)
require(timeDate)
 
registerDoMC(20)
 
symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = F, stringsAsFactors = F)
nrStocks = length(symbols[,1])
 
dateStart = "2007-01-01"
 
z = foreach(i = 1:nrStocks, .combine = merge.zoo) %dopar% {
    cat("Downloading ", i, " out of ", nrStocks , "\n")
    x = get.hist.quote(instrument = symbols[i,], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = T)
    colnames(x) = symbols[i,1]
    x
}
 
z = as.xts(z)
 
registerDoMC()

 

One intuitive solution is to preallocate the memory and save the results there. However, I could't find a way how to modify a variable that is out of foreach scope when run in parallel. I understand that we could corrupt the data, but locking the modify/update would solve this issue (updating doesn't take much time). I tried to google/yahoo/duckduckgo/bing the solution but without luck. Do you know the answer?

This solution has jet another drawback.. missing data.

Then I saw one line of code where I change my data from type "zoo" into "xts". Xts is written in C, whereas zoo is written in pure R (I read some articles about intentions to merge this packages but who knows when it will be). So why not to change the variable into xts right after the download? Simple..

And the result? On average, 43 seconds!

 

View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
rm(list = ls(all = TRUE))
require(doMC)
require(quantmod)
require(tseries)
require(timeDate)
 
registerDoMC(20)
 
symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = F, stringsAsFactors = F)
nrStocks = length(symbols[,1])
 
dateStart = "2007-01-01"
 
z = foreach(i = 1:nrStocks, .combine = merge.xts) %dopar% {
    cat("Downloading ", i, " out of ", nrStocks , "\n")
    x = get.hist.quote(instrument = symbols[i,], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = T)
    colnames(x) = symbols[i,1]
    x = as.xts(x)
}
 
registerDoMC()

 

 

Tags: , ,

« Older entries