Other Issues
- How should we set Alpha, the Learning Rate Parameter?
Train with several candidate values of alpha, using a tuning set or
cross-validation, and then select the value that gives the lowest error.
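The procedure above can be sketched as follows. This is an illustrative example only: the model (a single sigmoid unit trained by gradient descent), the data, and the candidate alpha values are all assumptions, not part of the notes.

```python
import numpy as np

# Hypothetical toy setup: a single sigmoid unit trained by gradient
# descent on synthetic, linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

# Hold out part of the data as a tuning set.
X_train, y_train = X[:150], y[:150]
X_tune, y_tune = X[150:], y[150:]

def train(alpha, epochs=200):
    """Gradient descent with learning rate alpha; returns weights."""
    w = np.zeros(3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
        w -= alpha * X_train.T @ (p - y_train) / len(y_train)
    return w

def tune_error(w):
    """Fraction of tuning-set examples misclassified."""
    pred = (X_tune @ w) > 0
    return np.mean(pred != y_tune)

# Try several candidate learning rates; keep the one with the
# lowest tuning-set error.
candidates = [0.001, 0.01, 0.1, 1.0]
errors = {a: tune_error(train(a)) for a in candidates}
best_alpha = min(errors, key=errors.get)
print(best_alpha, errors[best_alpha])
```

The same selection loop works with k-fold cross-validation in place of the single tuning set.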
- How should we estimate the Error?
Use cross-validation (or some other evaluation method) multiple
times with different random initial weights. Report the average
error rate.
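A minimal sketch of this estimation procedure, assuming a simple sigmoid-unit model and synthetic data (both illustrative): run k-fold cross-validation several times, each run starting from different random initial weights, and report the average error.

```python
import numpy as np

# Illustrative data: noisy linear concept.
rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = (X @ np.array([2.0, -1.0, 0.5])
     + rng.normal(scale=0.5, size=100) > 0).astype(float)

def train(Xtr, ytr, seed, alpha=0.1, epochs=200):
    """Gradient descent from random initial weights (seeded)."""
    w = np.random.default_rng(seed).normal(scale=0.1, size=3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w)))
        w -= alpha * Xtr.T @ (p - ytr) / len(ytr)
    return w

def cv_error(seed, k=5):
    """One round of k-fold cross-validation; returns mean fold error."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        w = train(X[train_idx], y[train_idx], seed + i)
        pred = (X[test_idx] @ w) > 0
        errs.append(np.mean(pred != y[test_idx]))
    return np.mean(errs)

# Repeat cross-validation with different random initial weights,
# then report the average error rate over all runs.
runs = [cv_error(seed) for seed in range(5)]
avg_error = float(np.mean(runs))
print(round(avg_error, 3))
```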
- How many Hidden Layers and How many Hidden Units per Layer
should there be?
Usually just one hidden layer is used (i.e., a 2-layer network).
How many units should it contain? Too few and the network cannot
learn the target function; too many and it generalizes poorly
(overfits). Determine the number experimentally, using a tuning set
or cross-validation to select the value that minimizes error.
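The experimental selection described above can be sketched as follows, again with illustrative data and a hand-rolled one-hidden-layer network (all names and sizes here are assumptions): train one network per candidate hidden-layer size and keep the size with the lowest tuning-set error.

```python
import numpy as np

# Illustrative nonlinear concept on a held-out tuning set.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(160, 2))
y = (np.sin(3 * X[:, 0]) > X[:, 1]).astype(float)
Xtr, ytr, Xtu, ytu = X[:120], y[:120], X[120:], y[120:]

def fit_error(hidden, epochs=400, alpha=0.5, seed=0):
    """Train a 2-layer net with `hidden` units; return tuning error."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=0.5, size=(2, hidden))
    w2 = r.normal(scale=0.5, size=hidden)
    for _ in range(epochs):
        H = np.tanh(Xtr @ W1)                       # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))         # output sigmoid
        d2 = p - ytr                                # output error signal
        dH = np.outer(d2, w2) * (1 - H**2)          # backprop to hidden
        w2 -= alpha * H.T @ d2 / len(ytr)
        W1 -= alpha * Xtr.T @ dH / len(ytr)
    pred = 1.0 / (1.0 + np.exp(-(np.tanh(Xtu @ W1) @ w2))) > 0.5
    return np.mean(pred != ytu)

# Candidate hidden-layer sizes; pick the one minimizing tuning error.
sizes = [1, 2, 4, 8, 16]
errs = {h: fit_error(h) for h in sizes}
best_size = min(errs, key=errs.get)
print(best_size, errs[best_size])
```

In practice one would also average over several random initializations per size, as in the error-estimation bullet above.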
- How many examples should there be in the Training Set?
The larger the training set, the better the generalization, but the
longer the training time required. As a rule of thumb, to obtain
1 - e correct classification on the testing set, the training set
should be of size approximately n/e, where n is the number of weights
in the network and e is an error fraction between 0 and 1. For
example, if e = 0.1 and n = 80, then a training set of 800 examples,
trained until 95% correct classification is achieved on the training
set, should produce about 90% correct classification on the testing
set.
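The rule of thumb above written out as a small helper (the function name is illustrative):

```python
def training_set_size(n_weights, e):
    """Approximate training-set size for 1 - e test accuracy:
    roughly n/e examples, where n is the number of weights."""
    return round(n_weights / e)

# Worked example from the text: n = 80 weights, target e = 0.1.
size = training_set_size(80, 0.1)
print(size)  # 800 examples; train to ~95% training accuracy
             # for roughly 90% testing accuracy
```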