• ALVINN project

  • ALVINN paper

  • ALVINN slides

  • ALVINN is a robot car. It has a camera, and is programmed to follow the road by staying in the middle of its lane.

  • The control software is a 2-layer feedforward network using backpropagation for learning.

  • Raw input is a 480 x 512 pixel image, captured 15 times per second. Output is one of 30 discrete steering positions from hard-left to hard-right.

  • The input color image is preprocessed to obtain a reduced-resolution image of 30 x 32 pixels, where each pixel is an integer from 0 to 255 (i.e., one byte per pixel) representing, approximately, the brightness of the pixel. These 960 pixel values are the inputs to the 960 input units.

  • There is complete connectivity from each input unit to each of four (4) hidden units in the single hidden layer.

  • There is complete connectivity from each hidden unit to each of 30 output units, where output unit 1 represents steering sharp-left, output unit 15 represents steering straight ahead, and output unit 30 represents steering sharp-right.
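    The 960-4-30 architecture described above can be sketched as a plain forward pass. This is only an illustration: the sigmoid activation, the bias weights, the scaling of pixel bytes to [0, 1], and all function names are assumptions, not details given in these notes.

```python
import math
import random

# Layer sizes taken from the notes: 30 x 32 inputs, 4 hidden units, 30 outputs.
N_INPUT, N_HIDDEN, N_OUTPUT = 30 * 32, 4, 30

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(n_in, n_out, rng, scale=0.1):
    # One row per receiving unit; the last column is a bias weight (assumption).
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def layer(weights, inputs):
    # Fully connected layer: every input feeds every receiving unit.
    outputs = []
    for row in weights:
        total = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(sigmoid(total))
    return outputs

def forward(pixels, w_hidden, w_output):
    # Scale byte-valued pixels to [0, 1] before feeding the network (assumption).
    x = [p / 255.0 for p in pixels]
    hidden = layer(w_hidden, x)
    return layer(w_output, hidden)

rng = random.Random(0)
w_hidden = init_weights(N_INPUT, N_HIDDEN, rng)
w_output = init_weights(N_HIDDEN, N_OUTPUT, rng)
image = [rng.randrange(256) for _ in range(N_INPUT)]  # stand-in for a 30 x 32 image
outputs = forward(image, w_hidden, w_output)          # 30 activations, one per steering unit
```

    With bias weights included, this sketch has 4 x 961 + 30 x 5 = 3,994 trainable weights, almost all of them between the input and hidden layers.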

  • Teacher output for an example image is a set of 30 output values at the 30 output units, Gaussian distributed centered on the desired steering direction d. More specifically, given the desired direction d and a Gaussian distribution with variance 10, the desired output at output unit i is

    O_i = e^{-(i - d)^2 / 10}
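    As a small illustration of the teacher signal, the target vector can be computed directly from the formula above. The function name teacher_output and the parameter name width (the constant 10 in the exponent) are hypothetical.

```python
import math

def teacher_output(d, n_units=30, width=10.0):
    # Desired activation of output units 1..n_units for steering direction d,
    # following the formula in the notes: exp(-(i - d)^2 / 10).
    return [math.exp(-((i - d) ** 2) / width) for i in range(1, n_units + 1)]

target = teacher_output(15)  # desired direction: straight ahead
```

    The unit at the desired direction gets a target of 1, and neighboring units fall off smoothly, so a near miss in steering is penalized less than a large one.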

  • Given actual output values from the 30 output units, a least-squares best fit of a Gaussian with variance 10 is computed. The peak of this Gaussian corresponds to the output steering direction computed by the network. The difference between this output steering direction and the teacher steering direction is the error.
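    One simple way to carry out such a fit is a grid search over candidate peak positions, keeping the one with the smallest sum of squared differences. This is a sketch under assumptions: the grid step, the fixed peak height of 1, and the function names are illustrative, and the notes do not say how the fit was actually computed.

```python
import math

def gaussian(d, n_units=30, width=10.0):
    # Same shape as the teacher output: exp(-(i - d)^2 / 10) for units 1..n_units.
    return [math.exp(-((i - d) ** 2) / width) for i in range(1, n_units + 1)]

def decode_steering(outputs, step=0.1, width=10.0):
    # Try peak positions 1.0, 1.0 + step, ..., n and keep the least-squares best fit.
    n = len(outputs)
    best_d, best_err = None, float("inf")
    for k in range(int(round((n - 1) / step)) + 1):
        d = 1.0 + k * step
        err = sum((o - g) ** 2 for o, g in zip(outputs, gaussian(d, n, width)))
        if err < best_err:
            best_d, best_err = d, err
    return best_d

def steering_error(outputs, teacher_d):
    # Difference between the decoded direction and the teacher direction.
    return decode_steering(outputs) - teacher_d
```

    Because the peak position is fitted rather than read off the single largest unit, the decoded direction can fall between two output units.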

  • ALVINN continuously learned on the fly as the vehicle traveled, initially "observing" a human driver and later observing its own driving. The current steering position for each input image is saved as the teacher value for that image. One major problem with training on "real data" is that no "negative" examples are presented to the system, assuming the human driver and later the neural network driver never veer off the road. Another major problem is that continuous training may cause the network to overfit the data in recent images at the expense of forgetting old images. So, for example, driving on a long straight road may cause the network to forget how to follow curvy roads.

    To solve both of these problems, ALVINN takes each input image and computes other views of the road by performing various perspective transformations (shift, rotate, and fill in missing pixels) so as to simulate what the vehicle would be seeing if its position and orientation on the road were not correct. For each of these synthesized views of the road, a "correct" steering direction is approximated. The real and the synthesized images are then used for training the network.

    To avoid overfitting using just the most recent images captured, ALVINN maintains a buffer pool of 200 images (both real and synthetic). When a new image is obtained, it replaces one of the images in the buffer pool so that the average steering direction of all 200 examples is straight ahead. In this way, the buffer pool always keeps some images in many different steering directions.

    Initially, a human driver controls the vehicle for about 5 minutes while the network learns weights starting from initial random weights. After that, one epoch of training using the 200 examples in the buffer pool is performed approximately every 2 seconds.
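    The buffer-pool replacement described above can be sketched as follows. The selection rule here (replace whichever stored example leaves the pool's mean steering direction closest to straight ahead, unit 15) is an assumption chosen to match the description; the notes do not give the exact rule ALVINN used, and the names replace_index and add_example are hypothetical.

```python
STRAIGHT = 15.0  # output unit that represents steering straight ahead

def replace_index(steerings, new_steering, straight=STRAIGHT):
    # Pick the slot whose replacement by the new example leaves the pool's
    # mean steering direction closest to straight ahead.
    n = len(steerings)
    total = sum(steerings) + new_steering
    return min(range(n), key=lambda k: abs((total - steerings[k]) / n - straight))

def add_example(pool, example):
    # pool: list of (image, steering) pairs, modified in place.
    k = replace_index([s for _, s in pool], example[1])
    pool[k] = example
    return k
```

    For example, with stored steering directions [10, 15, 20, 15] and a new example at 25, this rule replaces the example at 20, giving a pool mean of 16.25 rather than 17.5 or 18.75.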