Pattern Recognition by Neural Network Ensemble

Background

Neural networks (NNs) are "black box" machine learning programs. They first undergo a training phase in which they are fed a set of inputs with known outcomes, and the error between the network's output and the known result is back-propagated to adjust the network's weights. After many training iterations, an NN can detect subtle patterns in large data sets and make predictions based on what it has learned from past observations. Ensemble techniques combine the outputs of a collection of NNs into one collective decision. Our goal is to create an ensemble of nets, each trained individually on a different set of historical stock market data, and then combine them to reach a collective decision about buying, selling, or holding shares of stock.
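The background above does not say how the nets' outputs will be coupled. One simple possibility, sketched below under that assumption, is plurality voting: each net casts a BUY, SELL, or HOLD vote and the most common decision wins. The function name `ensemble_decision` and the tie-breaking rule (fall back to HOLD) are hypothetical choices, not project decisions; averaging the nets' output probabilities is a common alternative.

```python
from collections import Counter

DECISIONS = ("BUY", "SELL", "HOLD")


def ensemble_decision(votes):
    """Combine the individual nets' decisions by plurality vote.

    Assumption: a tie for first place is resolved as HOLD, treating
    "do nothing" as the cautious choice when the nets disagree evenly.
    """
    if not votes:
        raise ValueError("need at least one vote")
    unknown = set(votes) - set(DECISIONS)
    if unknown:
        raise ValueError(f"unknown decisions: {sorted(unknown)}")
    counts = Counter(votes)
    top = max(counts.values())
    winners = [d for d in DECISIONS if counts[d] == top]
    return winners[0] if len(winners) == 1 else "HOLD"


# Eight hypothetical votes, one per net in the ensemble.
print(ensemble_decision(["BUY", "BUY", "HOLD", "SELL", "BUY", "HOLD", "BUY", "SELL"]))  # BUY
print(ensemble_decision(["BUY", "BUY", "SELL", "SELL", "HOLD", "HOLD", "BUY", "SELL"]))  # HOLD (BUY/SELL tie)
```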

References

An introduction to neural networks
Neural Network FAQ
An ensemble of neural networks for weather forecasting
Programming Collective Intelligence (with NN examples)
Neural networks and the stock market (with good further references)

Project

The vision for this project is to construct an ensemble of Multilayer Perceptron (MLP) neural networks. Each member of the ensemble will take a unique subset of inputs from a larger set of total inputs, with some overlap between members. We will target the stock market for a number of reasons:
  1. Historical data is abundant and freely accessible on the web.
  2. The inputs that affect the stock market are almost limitless. Some are obvious, like the previous day's closing price or the volume of shares traded the previous day; others are less obvious, such as the daily weather forecast for Wall Street, which, combined with other factors, might slightly influence brokers' selling habits.
  3. There is real motivation to develop a system that makes money in the stock market.

The first thing to do, after getting up to speed on MLP NNs, is to create a list of inputs. Anything you think could possibly have an effect, great or small, should be included. Be creative, and don't worry if you pick an input that later proves to have no effect on the price of the stock: in theory, training will reduce the weights attached to that input to low values, so it should not harm the outcome. Our initial ensemble will consist of 8 nets. Try to come up with 25 different inputs (or do you have a good argument for why this number should be higher or lower?). We will divide the inputs into two groups: constants, which every net will be trained with, and variables, which only a few nets will use. Ultimately, each net will receive a unique set of inputs. Keep in mind that for training purposes, we will need to collect historical data on all of our inputs, so anything that is tracked on a website or available in electronic archives is fair game.
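A minimal sketch of how the constant and variable inputs might be distributed across the 8 nets. It assumes each net receives all constant inputs plus a random draw of `per_net` variable inputs, redrawing whenever a subset repeats so that every net ends up with a unique input set while still overlapping with the others. The function `assign_inputs`, the input names, the 3/22 split of the 25 inputs, and `per_net=6` are all hypothetical placeholders.

```python
import math
import random


def assign_inputs(constant_inputs, variable_inputs, n_nets=8, per_net=6, seed=0):
    """Give each net every constant input plus its own random subset of the
    variable inputs. Subsets may overlap, but no two nets get the same one."""
    if math.comb(len(variable_inputs), per_net) < n_nets:
        raise ValueError("not enough variable inputs for unique subsets")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    seen = set()
    nets = []
    while len(nets) < n_nets:
        subset = frozenset(rng.sample(variable_inputs, per_net))
        if subset in seen:
            continue  # duplicate draw: try again so each net stays unique
        seen.add(subset)
        nets.append(list(constant_inputs) + sorted(subset))
    return nets


# Hypothetical names: a few inputs mentioned in this document plus placeholders,
# for 25 inputs in total (3 constants + 22 variables).
constants = ["prev_close", "prev_volume", "fed_rate"]
variables = ["wall_st_weather"] + [f"input_{i:02d}" for i in range(5, 26)]

nets = assign_inputs(constants, variables)
for n, inputs in enumerate(nets, start=1):
    print(f"net {n}: {inputs}")
```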

We will start by focusing on the Dow Jones Industrial Average and attempt to apply our system to other stocks in the future. To start with, the output will be a simple BUY, SELL, or HOLD decision, so a graphical representation of one of our 8 nets would look something like this:
[Figure: diagram of one of the 8 MLP nets in the ensemble]
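As a rough illustration of what one net computes, here is a minimal pure-Python MLP, assuming one hidden layer of sigmoid units and a softmax output with one unit each for BUY, SELL, and HOLD, trained by back-propagating the cross-entropy error one example at a time. The class name `MLP`, the hidden-layer size, learning rate, and weight initialization are illustrative assumptions, and the inputs are assumed to be normalized to small values. Existing neural network software, if found, would likely replace this.

```python
import math
import random

# Output units, one per decision (order chosen for this sketch).
LABELS = ("BUY", "SELL", "HOLD")


def _sigmoid(a):
    a = max(-60.0, min(60.0, a))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-a))


class MLP:
    """One hidden layer of sigmoid units and a softmax output over LABELS."""

    def __init__(self, n_inputs, n_hidden=10, seed=0):
        rng = random.Random(seed)
        # Small random initial weights (illustrative choice).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in LABELS]
        self.b2 = [0.0] * len(LABELS)

    def forward(self, x):
        """Return hidden activations and output probabilities for inputs x."""
        h = [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max logit for a numerically stable softmax
        e = [math.exp(zk - m) for zk in z]
        total = sum(e)
        return h, [ek / total for ek in e]

    def predict(self, x):
        _, p = self.forward(x)
        return LABELS[p.index(max(p))]

    def train_step(self, x, label, lr=0.1):
        """One back-propagation update on a single example.

        Returns the cross-entropy loss measured before the update.
        """
        h, p = self.forward(x)
        t = LABELS.index(label)
        # d(loss)/d(logit k) for softmax + cross-entropy is p_k - [k == t].
        dz = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]
        # Error at each hidden unit, using the output weights before updating.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * dz[k] * h[j]
            self.b2[k] -= lr * dz[k]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * dh[j] * x[i]
            self.b1[j] -= lr * dh[j]
        return -math.log(p[t])


net = MLP(n_inputs=4, seed=1)
x = [0.2, -0.1, 0.5, 0.0]  # hypothetical normalized inputs for one day
first = net.train_step(x, "BUY")
for _ in range(200):
    last = net.train_step(x, "BUY")
print(f"loss {first:.3f} -> {last:.3f}, prediction: {net.predict(x)}")
```

The demo trains on a single made-up example only to show the mechanics; real training would loop over many days of historical data.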

Programming

Before doing any programming, try to find neural network software available on the internet or from textbooks that can be adapted for this project.

If programming is necessary, the customer would prefer Python, because the language is rich in text-processing tools, is easily extensible, and supports object-oriented, procedural, and functional programming styles. There are also some good code examples for neural networks available on the web. It is also easy to integrate libraries written in C or C++, so if we find useful code written in either language, we can use the Python C API to make use of it.

Once the inputs have been decided upon, we will need to write code that automates the collection of the data, either through screen-scraping or through database queries where available. Ideally, the data will be easy to collect and parse, so that we can automate training on the previous year of data or more. Each training example (one trading day) should include all the inputs as they stood at the beginning of that trading day. The feedback output used for training will be based on the actual closing price for that day. If a particular piece of data does not change daily, such as the Federal Reserve rate, just use the rate in effect on that day. The decision to BUY, SELL, or HOLD will be based on the direction (positive or negative) in which the stock moved that day and the percentage of change. A slight change in either direction results in a HOLD. The definition of "slight" will be determined as we go.
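The labeling rule and the handling of slowly changing data described above can be sketched as follows. This assumes the day's change is measured from a reference price at the start of the day (here, the previous day's close) and uses a hypothetical 0.5% band for a "slight" change, since the actual threshold is still to be determined. `label_day` and `forward_fill` are illustrative names, and the prices and rates are made-up values, not real Dow Jones or Federal Reserve data.

```python
def label_day(reference_price, close_price, hold_band_pct=0.5):
    """Return BUY, SELL, or HOLD from the day's percentage change.

    reference_price: price at the start of the day (assumed here to be the
    previous day's close). hold_band_pct: hypothetical threshold for a
    "slight" change; the real value is still to be determined.
    """
    change_pct = 100.0 * (close_price - reference_price) / reference_price
    if change_pct > hold_band_pct:
        return "BUY"   # stock rose noticeably: buying at the open would have paid
    if change_pct < -hold_band_pct:
        return "SELL"  # stock fell noticeably
    return "HOLD"      # slight change either way


def forward_fill(values):
    """Replace missing (None) entries with the most recent known value, so a
    slowly changing input such as an interest rate has a value every day.
    Leading entries with no earlier value stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


closes = [10000.0, 10080.0, 10070.0, 9950.0]  # made-up daily closing values
labels = [label_day(prev, cur) for prev, cur in zip(closes, closes[1:])]
rates = forward_fill([2.0, None, None, 2.25])  # made-up rate, published irregularly
print(labels)  # ['BUY', 'HOLD', 'SELL']  (+0.80%, -0.10%, -1.19%)
print(rates)   # [2.0, 2.0, 2.0, 2.25]
```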