Why Clean Data is a Mandatory Pre-Analysis Step
Futures Magazine's September '99 issue featured a
cover story about the accuracy of the top end-of-day data providers for
investors. We are pleased to report that CSI was the undeniable champion in
terms of data accuracy and in other ways that might surprise you. The full text
of the Futures story, written by Sheldon Knight of K-Data Inc., is available as
a mailed reprint directly from CSI by completing the on-line visitor
information request below. Please see the comparative rankings of
US data firms shown below, which were compiled from information supplied in the
Sheldon Knight study.
Although we are admittedly biased, we found the results of
Mr. Knight's analysis very interesting, even compelling: The largest market
data firms in the nation just didn't stack up next to CSI's stellar
performance. Not only did CSI dramatically outdistance all of the competition
with the fewest errors overall, but we did so with zero omissions.
According to Sheldon Knight, "The data management functions of [CSI's] Unfair
Advantage are by far the most flexible tested, and the database is one of the
most comprehensive." Great data and great software; what more can we say? We
would like to disclose a little more about the differences that make CSI the
best data source in the industry. It's all in the details.
On Data Accuracy
In the study, a total of 1,203 errors and omissions were noted
among the ten firms tested. CSI accounted for 27 of them in the
1,506-day test. Dividing the remaining 1,176 errors among the nine competing
firms gives an average of about 131 errors each over the same period, so the
average CSI competitor's error count was roughly 385% higher than CSI's, or
nearly five times as large.
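The arithmetic behind that comparison can be checked with a short sketch. Only the three input figures come from the article; the variable names are illustrative. Note that the 385% figure follows from the rounded average of 131, while the unrounded average gives a value just under 384%.

```python
# Figures quoted from the study summary above.
total_errors = 1203   # errors and omissions across all ten firms
csi_errors = 27       # CSI's errors and omissions
competitors = 9       # the other firms tested

# Average errors per competitor after removing CSI's share.
avg_competitor = (total_errors - csi_errors) / competitors

# Percentage by which the average competitor exceeds CSI,
# computed from the rounded average (as the article does) and the exact one.
pct_higher_rounded = (round(avg_competitor) - csi_errors) / csi_errors * 100
pct_higher_exact = (avg_competitor - csi_errors) / csi_errors * 100

print(f"average competitor errors: {avg_competitor:.1f}")
print(f"percent higher (rounded average): {pct_higher_rounded:.0f}%")
print(f"percent higher (exact average): {pct_higher_exact:.1f}%")
```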
CSI's 18 errors in the soybean futures test were the fewest
of all vendors. The average error size was less than half that of the
second-place firm, and a very small fraction of that of most other firms. In the
S&P 500 analysis, CSI's error count of 9 tied for the lowest with one other
vendor. Data sources were varied and sometimes overlapping, but CSI's record of
minimal errors probably has much more to do with procedure, pride, commitment,
diligence, and customer participation than source. It is very rare for an error
to get past the many data scrubbers on the CSI staff.
On Data Presentation
Data presentation was briefly noted in the Sheldon Knight study, but it
deserves additional comment. It refers to the handling of
after-the-close settlements that can result in exchanges quoting settlement
prices that are outside the day's trading range (above the high or below the
low). It is common (but not necessarily correct) for summary day-end data
vending firms to expand the high-low range to accommodate the assigned
settlement price, even though settlement prices do not necessarily represent
prices where actual trading took place. CSI delivers actual trading statistics
to customers and gives the option of presenting data 1) in actual form, based
on exchange statistics, 2) with highs and lows expanded to include the
settlement, or 3) with the settlement price modified so that it lies within the
actual highs and lows. According to the article, only CSI has recorded the
historical statistics on all markets so that they can be presented in any one
of these ways. It is clear that CSI's competitors have forever lost the ability
to present an unaltered historical record.
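The three presentation options described above can be illustrated with a minimal sketch. This is not CSI's software; the `Bar` type, the `present` function, and the mode names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bar:
    """One day's summary statistics (hypothetical structure)."""
    high: float
    low: float
    settle: float


def present(bar: Bar, mode: str) -> Bar:
    """Return the bar in one of the three presentation forms."""
    if mode == "actual":
        # 1) Exchange statistics exactly as recorded.
        return bar
    if mode == "expand_range":
        # 2) Widen the high/low so the range includes the settlement.
        return replace(bar,
                       high=max(bar.high, bar.settle),
                       low=min(bar.low, bar.settle))
    if mode == "clamp_settle":
        # 3) Move the settlement so it lies within the actual high/low.
        return replace(bar, settle=min(max(bar.settle, bar.low), bar.high))
    raise ValueError(f"unknown mode: {mode}")


# A day on which the settlement was assigned above the traded high.
raw = Bar(high=100.0, low=95.0, settle=101.0)
expanded = present(raw, "expand_range")
clamped = present(raw, "clamp_settle")
```

Both derived forms can be computed from the actual record, but the reverse is not possible: a vendor that stored only the expanded bar can no longer tell whether 101.0 was ever a traded price.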
On Analytical Validity
The Futures study clearly demonstrates that technical
analysis requires accurate data. In the study, S&P 500 data from CSI, Omega
Research, and Bridge were used on the same simple breakout system with
strikingly different results. The profit scenario varied from 20% to much more
than 100% over the full period of study. This is strong evidence
that a flawed database can lead to a useless result and
a wasted effort: parameter settings determined from flawed data cannot
be expected to work as well in the market to which they will
be applied. Unfair Advantage's software and database are designed so that every
user is equipped with exactly the same data set at all times, forcing any
common analytical tool that is derived from past information to produce
equivalent results on different machines.
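To see how a single bad data point can change a breakout system's output, consider the following sketch. The rule, the lookback length, and the prices are all invented for illustration; the article does not describe the system used in the study, so this is a generic breakout rule, not a reconstruction of it.

```python
def breakout_days(highs, closes, lookback):
    """Indices of days whose close exceeds the highest high of the
    preceding `lookback` days (a generic breakout rule)."""
    signals = []
    for i in range(lookback, len(closes)):
        if closes[i] > max(highs[i - lookback:i]):
            signals.append(i)
    return signals


# Made-up prices for six days.
clean_highs = [100, 101, 102, 101, 103, 104]
closes = [99, 100, 101, 100, 103.5, 103]

# Same series with one hypothetical erroneous high on day 2.
bad_highs = list(clean_highs)
bad_highs[2] = 112

clean_signals = breakout_days(clean_highs, closes, lookback=3)
bad_signals = breakout_days(bad_highs, closes, lookback=3)
print("clean data signals:", clean_signals)
print("flawed data signals:", bad_signals)
```

In this toy series the clean data produces one breakout signal and the flawed data produces none, because the single bad high raises the breakout threshold for every day whose lookback window contains it.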
Building a trading model based upon flawed past data is
certain to degrade system effectiveness into the future. This truth, learned
decades ago by CSI's founder Bob Pelletier, is the driving force behind CSI's
policies. Before CSI was incorporated in 1970, studies done by Pelletier, a
General Electric mathematician at that time, were inevitably tripped up by some
obscure error that dominated parameter settings and falsely influenced the
outcome of simulation exercises by forcing undeserved profits from the flawed
data. It may seem that a small error here or there would not be important, but
that is not what this work showed. Experiences like this made it abundantly
clear that errors must be eliminated if any benefit is to be derived
from hindsight testing.
Several of the data vendors included in the study are either
allied with or directly tied to very expensive analysis programs, but those
programs do not necessarily require them as data sources. Although CSI is explicitly excluded
from the data download screens and menus of most of those programs,
discriminating users of the industry's most powerful software tools still come
to CSI for data. They know that it is pure folly to accept the suggestion that
an average data firm can deliver the accuracy needed to create an exceptional
trading system. Now that the importance of data accuracy has been revealed,
perhaps even more traders will come directly to CSI, whether or not their
software producer steers them in that direction. Software companies whose
products are compatible with CSI data include: Equis Int'l (MetaStock®), Omega
Research, Windows on Wall Street, ProfiTaker, and many others.
Putting it All Together
It should be mentioned that the errors measured in the
Sheldon Knight study were discovered in hindsight, based upon each company's
one-time historical submission of their global data reserves. An even more
telling result might have emerged had the study been conducted on an ongoing
basis by observing each contributor's performance one day at a time over an
extended period. With an ongoing study, the reader could have a better
understanding of each company's performance when it means the most: immediately
after each day's database update is posted. This way, a firm's timing of
delivery on all stock and world futures markets, diligence in avoiding
omissions, and ability to stay on top of information gathering in spite of
unpredictable obstacles could be studied.
Many factors contributed to CSI's impressive performance
reported in the Futures article, and most of them might be dismissed as
insignificant details. Back-up electrical power, multiple information sources,
a large experienced staff competent in applying checks and balances, and
rewards for diligent customers reporting questionable data items are a few of
the details CSI attends to each day. They seem to make the difference.