Learning Obstacle Avoidance by Example
To help us introduce additional forms
of learning in artificial neural networks, we are going to turn to a
different robot task: obstacle avoidance. Few moments are as
embarrassing as having your robot run into something during the
middle of a demonstration, so obstacle avoidance is usually near the
top of the list of "things to get right". There are many
approaches to obstacle avoidance depending on the sensors your robot
has, the specifics of its drive train (e.g. walking versus wheels)
and even the size of the robot. But our goal in this section is to
use a neural network to learn how to avoid obstacles.
Our robot is going to need some new
sensors so that it can actually "see" the obstacles it needs
to avoid. A good place to start is to add some infrared (IR) and
sonar range sensors. The picture below shows our setup:

The robot now has three IR sensors
underneath the lower platform (circled in red) and four sonar sensors
around mid height (circled in yellow). The particular IR sensors used
here (Sharp GP2D12) have a maximum range of 31
inches while the sonar sensors (Ping) have a range of 133 inches
(about 11 feet). It is often a good idea to use at least two
different types of sensors when trying to measure something about the
real world. The reason is that different sensors will have different
"blind spots" as well as different ranges and resolution.
For example, IR sensors tend to have a narrower beam than sonar which
gives them better resolution but it also means that the beam can pass
right through a small gap in an obstacle. Sonar is better at
detecting objects with holes but it can fail to return an echo from a
flat surface such as a wall when approaching it at a shallow enough
angle. It can also fail to return an echo from softer surfaces such
as a pant leg on a person. On the other hand, the longer range of
sonar gives the robot a better chance of reacting to an obstacle
before it is too late. Using both types of sensors gives us the best
of both worlds.
The IR and sonar readings are used as
the values of the input units in our artificial neural network. The
input units are then fully cross connected to our two output units
that control the robot's motor signals as usual. The resulting
network looks like the following. (Only a few of the connections are
labeled for clarity.)
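A minimal Python sketch of the fully connected layer described above, assuming each output unit simply computes its bias plus a weighted sum of the seven inputs. The names (forward, N_INPUTS, N_OUTPUTS) and the placeholder values are illustrative only, and any squashing function the real controller applies is left out because the text does not specify it here.

```python
N_INPUTS = 7   # 3 IR readings + 4 sonar readings
N_OUTPUTS = 2  # left and right motor signals


def forward(weights, biases, inputs):
    """Return one value per output unit: its bias plus the weighted sum of all inputs."""
    return [
        b + sum(w * x for w, x in zip(row, inputs))
        for row, b in zip(weights, biases)
    ]


# Placeholder values for illustration only; these are NOT trained values.
weights = [[0.0] * N_INPUTS for _ in range(N_OUTPUTS)]
biases = [0.5, 0.5]
motors = forward(weights, biases, [0.0] * N_INPUTS)
```

Every input feeds every output, so the layer holds 7 x 2 = 14 connection weights plus 2 biases.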

As you can imagine, it would be a bit
of a challenge to directly program this network's connections. In
other words, it would be difficult to write down a set of if-then
conditions mapping combinations of sonar and infrared readings into
motor outputs. It is this kind of situation where neural networks can
really show their strength. But the question now becomes: what is the
best way for the network to learn a good set of connections? Standard
supervised learning is not very practical as we'd have to intervene
every time the robot ran into an object, then show it the correct
maneuver for avoiding the collision. Learning by trial and
error (i.e., unsupervised reinforcement learning) could be
employed, and we will return to that possibility later. However, for
this occasion, we are going to try a form of guided learning
instead.
Guided Learning
The idea behind guided learning is
simple: we ask someone already expert in the skill we are trying to
learn to control our movements for us while we simply relax and
experience the sensations. The hope is that our brain will
associate the sensations with the movements so that we can better
execute the actions ourselves.
In the case of our robot, we will
control its motion using a joystick while steering around obstacles.
In the meantime, our robot will record the corresponding sensor
readings and motor control signals and use them to train its neural
network. The hope is that the resulting network connections will then
allow our robot to avoid future obstacles on its own. Note that our
goal is not to learn a specific path around a specific arrangement of
obstacles; rather, we want our robot to learn a general skill for
avoiding obstacles regardless of their position.
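The recording step can be sketched roughly as follows. This is only an illustration: record_samples, read_sensors and read_motors are hypothetical names standing in for whatever the robot's own software provides, and the default rate of five samples per second is the one the article uses for its demonstration.

```python
import time


def record_samples(read_sensors, read_motors, duration_s, rate_hz=5, sleep=time.sleep):
    """Collect (sensor readings, motor signals) pairs while a human drives the robot.

    read_sensors and read_motors are hypothetical callables returning the
    current sensor values and joystick-driven motor signals.
    """
    samples = []
    for _ in range(int(duration_s * rate_hz)):
        samples.append((list(read_sensors()), list(read_motors())))
        sleep(1.0 / rate_hz)  # wait until the next sampling instant
    return samples
```

Each stored pair is one training example: the sensor readings are the network inputs and the operator's motor signals are the target outputs.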
How should we use the recorded samples
to train the network? Fortunately, we have already seen the answer
since guided learning is really just a form of supervised exemplar
learning in disguise. The difference is that we collect all the
examples first, and then we train the network all at once, rather
than training it one sample at a time. The process is often referred
to as batch or offline learning. Since the number of
recorded input-output samples could be very large, one might wonder
about memory storage issues. However, with today's computers, the
storage requirements for several minutes or even hours of guided
training are not a problem. For example, if we sample our seven sensor
readings five times per second and collect
the data for five minutes, we will have to store an array of 7 x 5 x
60 x 5 = 10,500 numbers (or 9 x 5 x 60 x 5 = 13,500 if the two motor
control signals are counted as well). Assuming 1 byte per number, that
amounts to only 10 to 14 KB, which is almost insignificant by today's
standards. As it turns out, we will only need about 60 seconds' worth
of data anyway, so our storage requirements are very small.
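The storage arithmetic is easy to check in a few lines of Python. The constant and function names are made up for this sketch; the numbers (seven sensors, two motor signals, five samples per second) come from the text.

```python
SENSORS = 7   # 3 IR + 4 sonar
MOTORS = 2    # left and right wheel signals
RATE_HZ = 5   # samples per second


def stored_values(values_per_sample, minutes):
    """Number of values recorded over the given number of minutes."""
    return values_per_sample * RATE_HZ * 60 * minutes


sensors_only = stored_values(SENSORS, 5)            # 7 x 5 x 60 x 5 = 10500
with_motors = stored_values(SENSORS + MOTORS, 5)    # 9 x 5 x 60 x 5 = 13500
one_minute = stored_values(SENSORS + MOTORS, 1)     # 9 x 5 x 60 = 2700
```

Even at several bytes per value, one minute of training data fits in a few kilobytes.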
Robot Demonstration
The video below shows the robot under
joystick control by the human operator. As you can see, the operator
is careful to guide the robot close to obstacles without running into
them, which is the behavior we want our robot to learn to perform on its
own. Note also that we collect our data using a simplified obstacle
placement in which just one obstacle lies near the robot at a time. We
do the same thing when we want to teach someone a new skill: isolate
different aspects of the skill so that the learner can focus on one
key element at a time. As it turns out, even though our robot is
trained in a simplified environment, we will see that it can then
apply what it has learned to avoid complicated obstacle arrangements.
The readings from the IR and sonar
sensors as well as the two wheel speeds are sampled five times per
second during recording. We only need about 60 seconds of such
training to collect enough data. Next we use the recorded input-output
samples to train the neural network controller using the most
excellent AForge.NET neural network package which can be found at
http://www.aforgenet.com. The process uses the same delta rule
algorithm employed earlier (see details below), only this time we use
offline batch learning to modify the connections. In this case it took
only 10 passes through the data, also known as training
epochs, before the network
connections converged to their final values. And these 10 epochs took
a total of only 3 milliseconds of computing time on a desktop
PC.
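For readers who want to see the idea in code, here is a minimal sketch of batch delta-rule training written in plain Python. It is not the AForge.NET API and not the code used for the demonstration; it assumes linear output units, and the function names and the learning rate of 0.1 are assumptions made for illustration.

```python
def train_batch(samples, n_inputs, n_outputs, epochs=10, rate=0.1):
    """Batch delta rule for a single layer of linear units.

    samples is a list of (inputs, targets) pairs. The weight changes are
    accumulated over the whole data set and applied once per epoch.
    """
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    n = len(samples)
    for _ in range(epochs):
        dw = [[0.0] * n_inputs for _ in range(n_outputs)]
        db = [0.0] * n_outputs
        for inputs, targets in samples:
            for j in range(n_outputs):
                out = biases[j] + sum(w * x for w, x in zip(weights[j], inputs))
                err = targets[j] - out           # delta: target minus actual output
                db[j] += err
                for i in range(n_inputs):
                    dw[j][i] += err * inputs[i]
        for j in range(n_outputs):               # one update per pass through the data
            biases[j] += rate * db[j] / n
            for i in range(n_inputs):
                weights[j][i] += rate * dw[j][i] / n
    return weights, biases


def mean_squared_error(samples, weights, biases):
    """Average squared output error over all samples and output units."""
    total, count = 0.0, 0
    for inputs, targets in samples:
        for j, target in enumerate(targets):
            out = biases[j] + sum(w * x for w, x in zip(weights[j], inputs))
            total += (target - out) ** 2
            count += 1
    return total / count
```

With the recorded data, a call such as train_batch(samples, 7, 2, epochs=10) would correspond to the ten passes mentioned above, although how quickly the connections settle depends on the learning rate and on how the inputs are scaled.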
Once learning is complete, we place our
robot under the control of the network and set it loose among a
collection of newly placed obstacles. The following video shows the
entire sequence from recording, to training, to autonomous obstacle
avoidance:
As you can see, the robot does
remarkably well at avoiding obstacles using its neural network
controller. This underscores the power of using neural networks to
learn complicated input-output relations rather than trying to
program all the possible if-then scenarios ourselves. All we had to
do was guide the robot around a few obstacles for 60 seconds, train
the network with the sampled data, then let it roam on its own. It is
also worth bearing in mind that while seven sensors might seem like a
lot of input, it also means that at any given moment, all the
robot "sees" is seven numbers representing seven distance
measurements. So while we watch the video and can see with our own eyes
that the robot avoided "the wall" or went around "the
ball", the robot is lucky to get one or two readings
bouncing off these objects, and on that sparse information it has to
decide whether to turn or not.
Viewing the Network Connections
After the network was trained with the
recorded data, what were the resulting connection strengths between
input and output units? For the demonstration shown above, the
resulting connection matrix and biases had the following values,
shown to two decimal places:


The first 2x7 matrix represents the
connections between the seven input units and two output units while
the second 2x1 matrix holds the two biases on the output units. The
connections to the left motor are on the top row and the right motor
connections on the bottom row, with the inputs ordered as follows: left
IR, middle IR, right IR, left sonar, left-front sonar, right-front
sonar and right sonar.
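To keep the row and column order straight when reading such a matrix, a small lookup helper can be useful. The sketch below is only an illustration: the names are hypothetical, and the matrix in the usage note is a placeholder, not the trained connection values.

```python
# Column order of the connection matrix, as listed in the text.
INPUT_NAMES = [
    "left IR", "middle IR", "right IR",
    "left sonar", "left-front sonar", "right-front sonar", "right sonar",
]
# Row order: left motor on the top row, right motor on the bottom row.
OUTPUT_NAMES = ["left motor", "right motor"]


def connection(weights, output_name, input_name):
    """Look up the weight from a named input unit to a named output unit."""
    row = OUTPUT_NAMES.index(output_name)
    col = INPUT_NAMES.index(input_name)
    return weights[row][col]
```

For example, with a 2x7 list of lists named weights, connection(weights, "right motor", "left IR") returns the entry in the bottom row, first column.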
Another way to visualize the
connections is to use the most excellent Matrix2PNG program from the
Bioinformatics department at UBC. The program represents connection
strengths with different colors as shown below:

In this image, green represents
positive values, red negative and black represents numbers near zero.
(Note that some of the darker greens and reds look almost black in
the image.)
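The color scheme can be imitated with a few lines of Python. This is not how Matrix2PNG itself is implemented or invoked; it is just a sketch of the same idea, with a hypothetical function name, showing why small weights come out nearly black.

```python
def weight_to_rgb(value, max_abs):
    """Map a weight to an (R, G, B) color: green if positive, red if negative.

    Brightness grows with the weight's magnitude relative to max_abs, so
    values near zero are close to black.
    """
    level = int(round(255 * min(abs(value) / max_abs, 1.0)))
    if value > 0:
        return (0, level, 0)
    if value < 0:
        return (level, 0, 0)
    return (0, 0, 0)
```

A weight at one tenth of the largest magnitude gets roughly one tenth of full brightness, which is hard to tell apart from black on most screens.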
Let's look first at the six circles on the left of the image
representing the connections between the three IR sensors and the two
motors. We see that a reading on the left IR sensor activates the
left motor and inhibits the right motor and vice versa for the right
IR sensor, just as we would hope if we want the robot to turn away
from obstacles. The middle IR sensor inhibits both motors when
activated which means "slow down" if an obstacle is straight ahead.
Looking now at the connections for the sonar sensors, we see a similar pattern, though it is easier to see for the two front sonar sensors than for the two laterally pointing sensors. Both left sonar sensors activate the left motor and either inhibit the right motor or activate it less strongly. The opposite pattern holds for the two right sonar sensors.
Finally, the last two connections are
the biases, and their positive values give us the "all clear"
behavior of our robot: when no sensor is detecting an
obstacle, the bias units drive both motors forward. Note that these
values were not programmed in. They arose naturally through
learning since our guided training included some driving straight
ahead with no nearby obstacles.
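The effect of this sign pattern can be illustrated with a toy example. The weights below are hypothetical: only their signs follow the description above, and the magnitudes are made up rather than taken from the trained network. The sketch also assumes linear outputs and inputs scaled so that 0 means "nothing detected" and larger values mean a nearer obstacle.

```python
# HYPOTHETICAL weights: signs follow the text, magnitudes are invented.
# Column order: left IR, middle IR, right IR, left sonar, left-front sonar,
# right-front sonar, right sonar.
WEIGHTS = [
    [0.5, -0.3, -0.5, 0.2, 0.4, -0.4, -0.2],   # left motor
    [-0.5, -0.3, 0.5, -0.2, -0.4, 0.4, 0.2],   # right motor
]
BIASES = [0.6, 0.6]  # positive biases: drive forward when nothing is detected


def motor_outputs(inputs):
    """Bias plus weighted sum of the sensor inputs for each motor."""
    return [
        b + sum(w * x for w, x in zip(row, inputs))
        for row, b in zip(WEIGHTS, BIASES)
    ]


all_clear = motor_outputs([0, 0, 0, 0, 0, 0, 0])       # outputs equal the biases
obstacle_left = motor_outputs([1, 0, 0, 0, 0, 0, 0])   # left motor faster: turn right
obstacle_ahead = motor_outputs([0, 1, 0, 0, 0, 0, 0])  # both motors slow down
```

With no readings the outputs equal the biases, a left IR reading speeds up the left motor and slows the right one so the robot veers away, and a middle IR reading lowers both outputs.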
Overall, the neural network has learned
a set of appropriate connections for the task at hand. What's more,
the network does not care about the particular arrangement of
obstacles in its path. As the video above shows, even cul-de-sacs
are handled smoothly as the robot turns away from the nearest
wall or obstacle at any given moment.