Other Issues
- How should we set Alpha, the Learning Rate Parameter?
Train with several candidate values of alpha, using a tuning set or
cross-validation, and then select the value that gives the lowest error.
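The procedure above can be sketched as follows. This is an illustrative example only: the model (a single sigmoid unit trained by gradient descent), the data, and the candidate alpha values are all assumptions, not part of the notes.

```python
import numpy as np

# Hypothetical toy setup: a single sigmoid unit trained by gradient
# descent on synthetic, linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

# Hold out part of the data as a tuning set.
X_train, y_train = X[:150], y[:150]
X_tune, y_tune = X[150:], y[150:]

def train(alpha, epochs=200):
    """Gradient descent with learning rate alpha; returns weights."""
    w = np.zeros(3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
        w -= alpha * X_train.T @ (p - y_train) / len(y_train)
    return w

def tune_error(w):
    """Fraction of tuning-set examples misclassified."""
    pred = (X_tune @ w) > 0
    return np.mean(pred != y_tune)

# Try several candidate learning rates; keep the one with the
# lowest tuning-set error.
candidates = [0.001, 0.01, 0.1, 1.0]
errors = {a: tune_error(train(a)) for a in candidates}
best_alpha = min(errors, key=errors.get)
print(best_alpha, errors[best_alpha])
```

The same selection loop works with k-fold cross-validation in place of the single tuning set.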
- How should we estimate the Error?
Use cross-validation (or some other evaluation method) multiple
times with different random initial weights. Report the average
error rate.
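A minimal sketch of this estimation procedure, assuming a simple sigmoid-unit model and synthetic data (both illustrative): run k-fold cross-validation several times, each run starting from different random initial weights, and report the average error.

```python
import numpy as np

# Illustrative data: noisy linear concept.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = (X @ np.array([2.0, -1.0, 0.5])
     + rng.normal(scale=0.5, size=100) > 0).astype(float)

def train(Xtr, ytr, seed, alpha=0.1, epochs=200):
    """Gradient descent from random initial weights (seeded)."""
    w = np.random.default_rng(seed).normal(scale=0.1, size=3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w)))
        w -= alpha * Xtr.T @ (p - ytr) / len(ytr)
    return w

def cv_error(seed, k=5):
    """One round of k-fold cross-validation; returns mean fold error."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w = train(X[train_idx], y[train_idx], seed + i)
        pred = (X[test_idx] @ w) > 0
        errs.append(np.mean(pred != y[test_idx]))
    return np.mean(errs)

# Repeat cross-validation with different random initial weights,
# then report the average error rate over all runs.
runs = [cv_error(seed) for seed in range(5)]
avg_error = float(np.mean(runs))
print(round(avg_error, 3))
```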
- How many Hidden Layers and How many Hidden Units per Layer
should there be?
Usually just one hidden layer is used (i.e., a 2-layer network).
How many units should it contain? Too few and the network cannot
learn the target function; too many and it generalizes poorly
(overfits). Determine the number experimentally, using a tuning set
or cross-validation to select the value that minimizes error.
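The experimental selection described above can be sketched as follows, again with illustrative data and a hand-rolled one-hidden-layer network (all names and sizes here are assumptions): train one network per candidate hidden-layer size and keep the size with the lowest tuning-set error.

```python
import numpy as np

# Illustrative nonlinear concept on a held-out tuning set.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(160, 2))
y = (np.sin(3 * X[:, 0]) > X[:, 1]).astype(float)
Xtr, ytr, Xtu, ytu = X[:120], y[:120], X[120:], y[120:]

def fit_error(hidden, epochs=400, alpha=0.5, seed=0):
    """Train a 2-layer net with `hidden` units; return tuning error."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=0.5, size=(2, hidden))
    w2 = r.normal(scale=0.5, size=hidden)
    for _ in range(epochs):
        H = np.tanh(Xtr @ W1)                       # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))         # output sigmoid
        d2 = p - ytr                                # output error signal
        dH = np.outer(d2, w2) * (1 - H**2)          # backprop to hidden
        w2 -= alpha * H.T @ d2 / len(ytr)
        W1 -= alpha * Xtr.T @ dH / len(ytr)
    pred = 1.0 / (1.0 + np.exp(-(np.tanh(Xtu @ W1) @ w2))) > 0.5
    return np.mean(pred != ytu)

# Candidate hidden-layer sizes; pick the one minimizing tuning error.
sizes = [1, 2, 4, 8, 16]
errs = {h: fit_error(h) for h in sizes}
best_size = min(errs, key=errs.get)
print(best_size, errs[best_size])
```

In practice one would also average over several random initializations per size, as in the error-estimation bullet above.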
- How many examples should there be in the Training Set?
The larger the training set, the better the generalization, but the
longer the training time required. As a rule of thumb, to obtain
1 - e correct classification on the testing set, the training set
should be of size approximately n/e, where n is the number of weights
in the network and e is an error fraction between 0 and 1. For
example, if e = 0.1 and n = 80, then a training set of 800 examples,
trained until 95% correct classification is achieved on the training
set, should produce about 90% correct classification on the testing
set.
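The rule of thumb above written out as a small helper (the function name is illustrative):

```python
def training_set_size(n_weights, e):
    """Approximate training-set size for 1 - e test accuracy:
    roughly n/e examples, where n is the number of weights."""
    return round(n_weights / e)

# Worked example from the text: n = 80 weights, target e = 0.1.
size = training_set_size(80, 0.1)
print(size)  # 800 examples; train to ~95% training accuracy
             # for roughly 90% testing accuracy
```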