The beginning of a thousand-mile journey begins with a single step. (more)
Tags: julia
This is an interesting video where Fernando Pérez talks about uses of Python in science. He covers many topics like Matplotlib, NumPy, SciPy, Cython, IPython, wrappers for R, etc. I highly recommend watching it!
Tags: ipython, matplotlib, numpy, Python, scipy
One of my friends is reading this book and encountered a problem when he tried to separate the definition from declaration of a template class.
To illustrate this assume we have this header file, foo.h, with declaration of template class Foo:
template<typename T> class Foo { public: Foo(); void someMethod(T x); private: T x; };
, with the following definition of member functions in separate file foo.cpp:
template<typename T> Foo<T>::Foo() { ... } template<typename T> void Foo<T>::someMethod(T x) { ... }
Now, if you try to use this template class and instantiate T to be actually a double or int, then the compiler will throw an error, that it doesn't recognize Foo<double>::Foo() or Foo<int>::Foo(). Problem is when processing Foo.cpp compiler would see template definition and when processing our new .cpp file it would see Foo<int>.
There are more solutions to this problem. One is to put this line at the end of header file Foo.h:
#include "Foo.cpp"
And the other one is to put explicit instantiations into the Foo.cpp file:
template class Foo<int>; template class Foo<double>;
None of these solutions is perfect but I prefer the first one just because you are not limited just to those instantiations you put into .cpp file in the second solution.
Tags: c++, declaration, defintion, template class
If you are having errors when saving text into your MySQL database like:
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 3832: ordinal not in range(256),
you should make sure your database and tables are set to UTF-8 and when connecting to the database you should specify the charset as well:
connect(host, user, password, dbname, charset='utf8')
A friend of mine was submitting some information on one of Bloomberg's websites and sent me this screenshot:
It's 2012 and not pre 1991..
Tags: bloomberg
I read the details about non-farm payrolls and other employment statistics number on BLS website and I found an interesting number. This is the quote from the site:
For example, the confidence interval for the monthly change in total nonfarm employment from the establishment survey is on the order of plus or minus 100,000. Suppose the estimate of nonfarm employment increases by 50,000 from one month to the next. The 90-percent confidence interval on the monthly change would range from -50,000 to +150,000 (50,000 +/- 100,000).
Which means that for the latest number which was 114,00, the true value is in interval 14,000 to 214,000 with 90% probability. Is it reliable? But even though the markets reacts so heavily on even a small deviation from expectations..
Tags: BLS, nonfarm, nonfarm payrolls
Some of my colleagues didn't know that you can use mathematical constants that are part of "cmath". Here is the small snippet that shows how to use PI from cmath library. Be aware that you need to write "#define _USE_MATH_DEFINES" before you include cmath.
1 2 3 4 5 6 7 8 9 10 11 12 13 | #define _USE_MATH_DEFINES #include #include using namespace std; int main() { cout << M_PI << endl; return 0; } |
The full list of constants is:
Symbol | Expression | Value |
---|---|---|
M_E | e | 2.71828182845904523536 |
M_LOG2E | log2(e) | 1.44269504088896340736 |
M_LOG10E | log10(e) | 0.434294481903251827651 |
M_LN2 | ln(2) | 0.693147180559945309417 |
M_LN10 | ln(10) | 2.30258509299404568402 |
M_PI | pi | 3.14159265358979323846 |
M_PI_2 | pi/2 | 1.57079632679489661923 |
M_PI_4 | pi/4 | 0.785398163397448309616 |
M_1_PI | 1/pi | 0.318309886183790671538 |
M_2_PI | 2/pi | 0.636619772367581343076 |
M_2_SQRTPI | 2/sqrt(pi) | 1.12837916709551257390 |
M_SQRT2 | sqrt(2) | 1.41421356237309504880 |
M_SQRT1_2 | 1/sqrt(2) | 0.707106781186547524401 |
Tags: c++, math constants
In my previous post about rewriting my code to run in parallel part one I mentioned that we will make a small change to adfTest() function as well. In this post we will perform this small but performance-dramatic change.
When you take a closer look at the source code of this particular function from fUnitRoots package you will surely spot this part of code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | if (lags > 1) { y.diff.lag = z[,2:lags] if (type == "nc"){ res = lm(y.diff ~ y.lag.1 - 1 + y.diff.lag) } if (type == "c"){ res = lm(y.diff ~ y.lag.1 + 1 + y.diff.lag) } if (type == "ct") { res = lm(y.diff ~ y.lag.1 + 1 + tt + y.diff.lag) } } else { if (type == "nc") { res = lm(y.diff ~ y.lag.1 - 1) } if (type == "c"){ res = lm(y.diff ~ y.lag.1 + 1) } if (type == "ct") { res = lm(y.diff ~ y.lag.1 + 1 + tt) } } |
As we have discussed earlier the lm() is inferior to fastLm() function in terms of performance. So why not use this function instead? The code snippet after the change looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | if (lags > 1) { y.diff.lag = z[,2:lags] if (type == "nc"){ res = fastLm(cbind(y.lag.1,y.diff.lag), y.diff) } if (type == "c"){ res = fastLm(cbind(1, y.lag.1, y.diff.lag), y.diff) } if (type == "ct") { res = fastLm(cbind(1, y.lag.1, tt, y.diff.lag), y.diff) } } else { if (type == "nc") { res = fastLm(y.lag.1, y.diff) } if (type == "c"){ res = fastLm(cbind(1, y.lag.1), y.diff) } if (type == "ct") { res = fastLm(cbind(1, y.lag.1, tt), y.diff) } } |
Important note: do not forget to add this line before you use fastLm() function:
1 | require(RcppArmadillo) |
Using this modified function, call it adfTest2(), in the parallel test we performed previously, the code finishes in 386.57 seconds. In comparison, the fastest code finished in 820.143 seconds before! That's an amazing improvement.
To sum it up, we started with our serial code using old adfTest() function what took 1454.935 seconds. Using parallel code and improved adfTest2() it took 386.57 seconds. We saved 1068.365 seconds, i.e. we reduced the computational time by roughly 73.4%!
Tags: adfTest, R, Rcpp Armadillo
Following my previous post about rewriting my code to run in parallel I have modified the code for downloading the S&P 500 prices from Yahoo to run i parallel as well. To be honest, I quite enjoy writing the code to run in parallel. It's fun for various reasons, but some theoretical background is highly recommended (e.g. here, here or here).
The good news is that it takes very short time (148 seconds) to download the data from Yahoo, but on the other hand the merge function still takes way too much time to complete. To be more specific, on average, 80% of time is spent on merge and 20% on downloading the actual data from Yahoo. It's faster than the original code but I don't like the idea of spending so much time on merge function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | rm(list = ls(all = TRUE)) require(doMC) require(quantmod) require(tseries) require(timeDate) registerDoMC(20) symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = F, stringsAsFactors = F) nrStocks = length(symbols[,1]) dateStart = "2007-01-01" z = foreach(i = 1:nrStocks, .combine = merge.zoo) %dopar% { cat("Downloading ", i, " out of ", nrStocks , "\n") x = get.hist.quote(instrument = symbols[i,], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = T) colnames(x) = symbols[i,1] x } z = as.xts(z) registerDoMC() |
One intuitive solution is to preallocate the memory and save the results there. However, I could't find a way how to modify a variable that is out of foreach scope when run in parallel. I understand that we could corrupt the data, but locking the modify/update would solve this issue (updating doesn't take much time). I tried to google/yahoo/duckduckgo/bing the solution but without luck. Do you know the answer?
This solution has jet another drawback.. missing data.
Then I saw one line of code where I change my data from type "zoo" into "xts". Xts is written in C, whereas zoo is written in pure R (I read some articles about intentions to merge this packages but who knows when it will be). So why not to change the variable into xts right after the download? Simple..
And the result? On average, 43 seconds!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | rm(list = ls(all = TRUE)) require(doMC) require(quantmod) require(tseries) require(timeDate) registerDoMC(20) symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = F, stringsAsFactors = F) nrStocks = length(symbols[,1]) dateStart = "2007-01-01" z = foreach(i = 1:nrStocks, .combine = merge.xts) %dopar% { cat("Downloading ", i, " out of ", nrStocks , "\n") x = get.hist.quote(instrument = symbols[i,], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = T) colnames(x) = symbols[i,1] x = as.xts(x) } registerDoMC() |