Please email solutions to data-science-challenge (AT) tworoads (dot) co (dot) in, along with a short name such as “mithrandir”.
A majority of data science applications emphasize speed of computation as well as accuracy. There is anecdotal evidence that non-linear decision tree methods perform well in a variety of data science applications. One of the challenges of using them in HFT is implementing them efficiently. Most events trigger some “knee-jerk” reactions, where the response function is largely simple and the short-term effect of the event is easy to compute. Other responses are more complex, and modeling them requires more sophisticated, perhaps non-linear, models that take more time to evaluate. Using a sophisticated model for an elementary prediction risks making it too slow for the task. This problem is very prevalent in finance, where some relationships hold over very short horizons, such as the ten seconds after an event, while others do not hold consistently over short horizons but show up more often over longer periods such as months and years.
Input:
You can download a collection of data files, each of which is a data set of indicator values snapped at regular intervals. The structure of the data files is explained further in README.txt in the instruction files. We have written a wrapper file, process_data.cpp, that reads the data and calls the function OnInputChange on the TertiaryRandomForest class. The function takes two arguments: the index of the input variable that has changed, and its new value. For instance, if indicator 5 has changed and its new value is -1, process_data.cpp will call OnInputChange ( 5, -1 ) on the TertiaryRandomForest.
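Because each OnInputChange call changes exactly one input, a forest only needs to re-traverse the trees that actually split on that input. The sketch below illustrates this incremental idea under stated assumptions: it uses ordinary binary trees with `<` threshold splits stored as flat node arrays (the real tree format, including whatever "Tertiary" refers to, is defined in README.txt and is not reproduced here), it averages tree outputs to form the forest prediction, and the `Node`/`Tree` types and the `GetPrediction()` method are hypothetical, not part of the provided wrapper.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node layout: internal nodes test inputs[feature] < threshold;
// leaves (feature == -1) carry the tree's output value.
struct Node {
    int feature;       // input index to test, or -1 for a leaf
    double threshold;  // go to `left` if inputs[feature] < threshold
    int left, right;   // child indices into Tree::nodes
    double value;      // leaf output (ignored for internal nodes)
};

struct Tree {
    std::vector<Node> nodes;  // nodes[0] is the root

    double Evaluate(const std::vector<double>& inputs) const {
        int i = 0;
        while (nodes[i].feature >= 0) {
            const Node& n = nodes[i];
            i = (inputs[n.feature] < n.threshold) ? n.left : n.right;
        }
        return nodes[i].value;
    }
};

class TertiaryRandomForest {
public:
    TertiaryRandomForest(std::vector<Tree> trees, std::size_t num_inputs)
        : trees_(std::move(trees)), inputs_(num_inputs, 0.0),
          outputs_(trees_.size(), 0.0), dirty_(trees_.size(), true),
          uses_(num_inputs) {
        // Record which trees read each input, so a change marks only those trees.
        for (std::size_t t = 0; t < trees_.size(); ++t) {
            std::vector<bool> seen(num_inputs, false);
            for (const Node& n : trees_[t].nodes) {
                if (n.feature >= 0 && !seen[n.feature]) {
                    seen[n.feature] = true;
                    uses_[n.feature].push_back(t);
                }
            }
        }
    }

    // Called by the wrapper when one input changes; assumes a valid index.
    void OnInputChange(int index, double value) {
        if (inputs_[index] == value) return;  // unchanged input: nothing to redo
        inputs_[index] = value;
        for (std::size_t t : uses_[index]) dirty_[t] = true;
    }

    // Hypothetical accessor: re-traverses only dirty trees, then averages the
    // cached per-tree outputs (an O(trees) sum, but no extra traversals).
    double GetPrediction() {
        double sum = 0.0;
        for (std::size_t t = 0; t < trees_.size(); ++t) {
            if (dirty_[t]) {
                outputs_[t] = trees_[t].Evaluate(inputs_);
                dirty_[t] = false;
            }
            sum += outputs_[t];
        }
        return trees_.empty() ? 0.0 : sum / trees_.size();
    }

private:
    std::vector<Tree> trees_;
    std::vector<double> inputs_;
    std::vector<double> outputs_;
    std::vector<bool> dirty_;
    std::vector<std::vector<std::size_t>> uses_;
};
```

The design trades a little memory (cached outputs and the input-to-tree index) for skipping traversals; how much it saves depends on how many trees reference each indicator.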
To measure correctness, the predicted value is printed after every ‘samplingrate’ function calls. We will compare your predictions to those of our benchmark solution, and as long as every prediction is within 1% of the benchmark value, we will consider your values correct. We allow this margin of error to account for floating-point errors in computation, as well as to permit any optimizations that approximate computations make possible. In this domain, a very small difference in the predicted price should not affect the outcome, and if that margin of error allows one to reduce latency, the benefit is often greater than the cost.
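A self-check against the 1% rule can be sketched as below, assuming the tolerance is measured relative to the magnitude of the benchmark prediction. How a benchmark value of exactly zero is handled is not specified, so the small absolute floor `abs_tol` is a hypothetical safeguard, and both function names are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// True if `candidate` is within `rel_tol` (default 1%) of `reference`,
// relative to |reference|; `abs_tol` is an assumed floor for reference == 0.
bool WithinTolerance(double reference, double candidate,
                     double rel_tol = 0.01, double abs_tol = 1e-12) {
    double diff = std::fabs(candidate - reference);
    return diff <= rel_tol * std::fabs(reference) || diff <= abs_tol;
}

// True only if the runs have the same length and every sampled prediction passes.
bool AllWithinTolerance(const std::vector<double>& reference,
                        const std::vector<double>& candidate,
                        double rel_tol = 0.01) {
    if (reference.size() != candidate.size()) return false;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (!WithinTolerance(reference[i], candidate[i], rel_tol)) return false;
    }
    return true;
}
```

Comparing the printed outputs of a fast, approximate implementation against an exact but slow one this way shows how much approximation the margin actually leaves room for.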
Q: What does each file in the instruction files mean?
A: Please look at the README.txt in the instruction files.