Learning in Artificial Neural Networks
In the previous article, we developed a light following robot
controlled by an artificial neural network where we manually set the
connection weights between inputs and outputs. Since the number of
connections was small and the behavior was simple, it was fairly easy
to guess a set of weights that would work. However, this will rarely
be the case with bigger networks and more complex behaviors. In those
situations, we will need a way for the robot to learn
the connection strengths between input and output units.
Here is a picture of our robot showing the two light sensors
mounted on the front. Note that the sonar sensors also visible in
this picture are not used in this experiment.
There are a number of learning algorithms that can be applied to
artificial neural networks, depending on the architecture of the
network and on whether we teach the network the correct responses or
it must figure them out on its own. When a teacher is involved, the
process is called supervised learning. Otherwise, it is called
unsupervised learning.
In both cases, there are different types of feedback that the network
can use to modify the connections. The following table illustrates
the various learning situations our robot might encounter. In each
cell of the table, we have briefly described a situation that
illustrates the relevant learning mode.
Exemplar
  Supervised: Learning a language with correct pronunciation provided by a teacher.
  Unsupervised: N/A

Reinforcement
  Supervised: A child learns what it is allowed to touch using only "yes/no" feedback from the parent.
  Unsupervised: Touching a hot stove element causes pain.

Guided
  Supervised: A golf instructor moves your arm through an example of a good swing.
  Unsupervised: N/A

Statistical
  Supervised: The living room is always called the "living room" regardless of where we are standing and what we see.
  Unsupervised: Over time we observe that most houses have a living room, most apples are red, etc.

Observational
  Supervised: You observe your tennis instructor perform a correct swing and try to emulate their behavior.
  Unsupervised: When other people cut in line, people treat them badly.
With supervised exemplar learning, the correct pattern of activity
across the output units is known and can be compared to the current
output pattern. In reinforcement learning, both supervised and
unsupervised, we only know if our current output pattern is correct
or incorrect without knowing any more details. And with guided
learning, the output units are essentially "remote controlled"
through the correct sequence of values for a given set of input
values. Statistical learning does not involve outputs per se,
but simply a collection of input patterns whose general trend or
summary properties we want to determine. Observational learning
involves watching another individual—usually of the same
species—perform some action, then extracting the information
required from those observations for performing the action yourself.
We will have occasion to examine all of these learning modes in
greater depth in future articles, but for our light following robot,
we are going to begin with the easiest entry in the table:
supervised exemplar learning.
It's All About The Connections
In all cases of learning, we need a method for changing the
connection weights over time so that performance improves. The
particular method of modifying the weights is called the update
rule for that network. In general, we are looking for a process
that will strengthen the connection between units that are active at
the same time while weakening the connection when they are activated
out of sync or in opposite directions; i.e., one positive, the other
negative.
The choice of update rule is usually determined by the
architecture of the network and the type of learning involved. We
will introduce different update rules in this series of articles as
they become applicable, including Hebbian learning, the delta rule,
backpropagation, self-organizing maps, and genetic
algorithms. Since learning takes place in the connections between
units, artificial neural networks are also referred to as
connectionist networks.
Supervised Exemplar Learning
When teaching a child to speak, we often engage in a kind of trial-and-error
learning that is at the heart of many artificial neural
networks. In the case of language, the child will often make an
utterance when trying to name an object. We will then correct the
child with an utterance of our own that we hope the child will better
mimic in the future. How might we apply a similar approach to teach
our robot to follow a beam of light?
Suppose we initialize the four connection weights in our neural
controller to random values. If we shine a light on the two sensors,
the random weights will map the sensor readings to a pair of motor
control signals and our robot will move in some direction. Of course,
the movement will be just as likely to be directed away from the
light as towards it, so at this point our robot will not follow the
path of the beam.
Now suppose that for each pair of readings from the light sensors,
we let the robot respond with a motion, but then we give it the
correct pair of motor signals that it should have produced to
follow the light. This would be analogous to asking a child to name a
cat and when the child says "dog", we provide the
correction by saying "cat". The question is, how do we
modify the connections in the neural network based on this
information so that the robot is more likely to make the correct
movement in the future?
Hebbian Learning
Fortunately, this problem has been worked on since the 1940s and
now has many different solutions depending on your goals and the
complexity of the network involved. Many of the solutions are
extensions of an early update rule called Hebbian learning.
Hebbian learning was first formulated in 1949 by Canadian
psychologist Donald Hebb and is often summarized by the phrase "cells
that fire together, wire together". The idea is that if
our current output unit values perform the correct motor response for
the current input unit values, then we should strengthen the
connections such that this mapping is more likely to occur in the
future. This can be achieved by using the following update rule:
w_{ij}(t + 1) = w_{ij}(t) + α(y_{i} · x_{j})
The Hebbian update rule tells us that the connection weight w_{ij}
between the j^{th} input unit and i^{th}
output unit at time t+1
should equal the old weight at time t plus the product of the
input and output values, scaled by a learning constant α
that is usually a number between 0 and 1. The update rule can also be
written as:
∆w_{ij} = α(y_{i} · x_{j})
where ∆w_{ij} = w_{ij}(t + 1) − w_{ij}(t).
The Hebbian update rule can also be written in vector form as
follows:
∆w = α·y·x^{T}
where x^{T} is the transpose of x.
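To make the rule concrete, here is a minimal sketch of a Hebbian update in Python with NumPy; the array shapes and the value of α are illustrative choices, not values taken from the robot:

```python
import numpy as np

def hebbian_update(w, x, y, alpha=0.3):
    """One Hebbian update: w(t+1) = w(t) + alpha * y * x^T.

    w is the weight matrix of shape (n_outputs, n_inputs),
    x the input activity vector, y the output activity vector.
    """
    # np.outer(y, x) builds the matrix of products y_i * x_j, so each
    # connection w_ij changes in proportion to the coincidence of the
    # activities of the two units it connects.
    return w + alpha * np.outer(y, x)
```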
The reason we multiply the input and output activities is simple:
if both units are highly active, then the product of their values is
a large number and we strengthen the connection between them by a
large amount. On the other hand, if one of the units has a low
activity while the other's is high, then their product will be smaller
and the connection between them will be strengthened to a smaller
degree. Furthermore, if our units can take on negative values as well
as positive, then the product of a positive value and a negative
value will result in a decrease of the connection strength
between them. But the product of two large negative values will again
result in a strengthening of the connection. In essence, the Hebbian
update rule is a coincidence detector between input activities
and output activities. In fact, it is very closely related to the
process of detecting statistical correlations across the input
values.
The Delta Rule
We will explore Hebbian learning in greater detail in the next
section on unsupervised reinforcement learning. For now, we need a
variation on the Hebbian update rule known as the delta rule.
The idea is that if we already produce nearly the correct behavior,
then we should not modify the weights very much. On the other hand,
if we are wildly incorrect, then we should change the weights a lot.
This can be accomplished most simply by using the following update
rule:
∆w = α(τ − y)·x^{T}
In this equation, ∆w is the change in connection
strengths, x is the vector of activity across the input units,
y is the current response in the output units, and τ
is the target response—the output we want our network to
have when the input is x. The difference, τ –
y, is then a measure
of how far off the current response is when compared to the target
response. Once we compute this difference, the update rule becomes
the same as Hebbian learning, only now we are using the difference
between the target and current output values in the product with x^{T}
instead of just the output values.
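In code, the only change from the Hebbian sketch above is that the raw output activity is replaced by the error τ − y (again a sketch with illustrative shapes and α):

```python
import numpy as np

def delta_update(w, x, y, target, alpha=0.5):
    """One delta-rule update: w(t+1) = w(t) + alpha * (target - y) * x^T."""
    # If y is already close to the target, the error (target - y) is small
    # and the weights barely change; a wildly wrong response produces a
    # correspondingly large change.
    return w + alpha * np.outer(target - y, x)
```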
We are nearly ready to test the learning rule on our robot. But
before we do, we need to figure out how we are going to transmit the
target motor signals to the neural network. Of course, we could do it
manually, perhaps using a joystick for input, but this would be very
slow and cumbersome. Instead, we will use our neural controller from
the previous section to provide the teaching signals. We can think of
it as a kind of "left brain/right brain" setup, where the
right brain of our robot is the neural network we are trying to
train, and the left brain already has a network that knows how to do
the job.
After initializing the connections in our network to random values
between −1 and +1, our learning algorithm will proceed through the
following steps:
1. Take a reading from the two light sensors.
2. Generate a movement by feeding these signals through our current network ("right brain").
3. Generate a "virtual movement" by feeding the same input signals through our working network ("left brain"). These are the target values.
4. Update the connection strengths in our "right brain" network based on the current output values and the target output values using the delta rule.
It is important to note that the left brain controller that we are
using as a teacher is not actually generating any movement. Nor is it
directly updating the weights in the right brain controller. Instead,
it is simply providing the correct output values for each pair of
light readings—correct in the sense that these output values
have been shown to work in an earlier demonstration. Our right brain
network still has to learn its own set of connections to move the
robot in the correct way. However, by using this virtual teacher
instead of manually providing the correct target values, we can run
the whole learning scenario much faster and watch the robot progress
in real time.
Normalization
Before testing our learning algorithm, there is one last bookkeeping
issue we must deal with. If one is not careful, Hebbian-style
update rules can lead to connection weights that grow without bound
over the course of learning. The easiest way to keep the weight
values manageable is to use normalization. There are a number
of popular normalization techniques one can use. Here we will
describe three of them: vector normalization, scaling, and
thresholding.
The first method divides each element of a vector or matrix by the
overall length or magnitude of the vector or matrix. For instance,
instead of using the raw input vector [x_{1}, x_{2}],
we use its normalized version [x_{1}, x_{2}] / ‖x‖, where
‖x‖ = √(x_{1}^{2} + x_{2}^{2}) is just the Euclidean norm or length
of the vector [x_{1}, x_{2}]. A similar process is used to normalize the
output vector [o_{1}, o_{2}].
Normalization turns both input and output vectors into unit
vectors pointing in the direction of the original data.
Similarly, after each update of the weight matrix, we renormalize
the matrix by dividing each element by the new matrix magnitude
‖w‖ = √(Σ w_{ij}^{2}), the square root of the sum of the squares of
all the elements.
As a concrete example, recall that the connection matrix we used
in the light following robot was:

[ −1   2 ]
[  2  −1 ]

The magnitude of this matrix is √10 ≈ 3.16, as
you can figure out in your head by adding up the squares of all four
elements, then taking the square root. Dividing each element by this
number yields the following normalized matrix (rounded to two decimal
places):

[ −0.32   0.63 ]
[  0.63  −0.32 ]
Note how normalization does not change the sign of the elements
nor their relative sizes.
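As a sketch, all of this comes down to a single division by the magnitude; NumPy's default matrix norm is exactly the square root of the sum of the squared elements used above, and the example matrix is the one from the preceding paragraph:

```python
import numpy as np

def normalize(a):
    """Divide a vector or matrix by its overall magnitude (Euclidean
    norm for vectors, Frobenius norm for matrices)."""
    return a / np.linalg.norm(a)

w = np.array([[-1.0,  2.0],
              [ 2.0, -1.0]])
print(normalize(w))  # elements round to -0.32 and 0.63, as above
```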
The second method scales each element by the maximum value
that the element can take on. For example, if we were measuring
temperature where the highest temperature we expect to see is 100°,
then we would divide each element by 100 to get its normalized value.
The third method uses a threshold
function to convert the original data into discrete values, typically
0 and 1. For example, when using a sonar sensor to measure distance,
one might set the threshold to three feet (36 inches) and assign a
value of 1 to any reading below this number and a 0 to any reading
above it. In this way, the sensor's original data is converted into a
kind of alert wherein objects closer than three feet generate a
warning whereas objects further away do not. It might seem that
thresholding the inputs would result in noticeably "discrete"
behavior from the robot and it would if we only had one input. But
imagine the more realistic situation where we have ten sensors. Then
the thresholded data is the equivalent of a 10-digit binary number
that can take on 2^{10}
or 1024 values. Having over a thousand
possible graded responses would generally be sufficient for almost
any situation.
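Sketches of the other two methods, using the 100° maximum and the 36-inch cutoff from the examples above:

```python
def scale(value, max_value=100.0):
    """Scale a reading by the largest value it can take on."""
    return value / max_value

def threshold(distance_inches, cutoff=36.0):
    """Convert a sonar distance into a binary alert: 1 if an object
    is closer than the cutoff, 0 otherwise."""
    return 1 if distance_inches < cutoff else 0
```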
Depending on the situation, one
normalization method may be preferred over the others. Thresholding
throws away the most information but is often simpler to implement in
dedicated hardware. Vector normalization also tends to lose more
information since all vectors pointing in a given direction are
mapped into the same normalized vector. For example, the vectors [1,
0] and [10, 0] would both be mapped into [1, 0] by the first method.
But if the maximum value the elements can take on is 10, the second
method would yield the two different vectors [0.1, 0] and [1, 0]. As
we shall see in a later section on obstacle avoidance, one can even
use a combination of methods with good results.
Computer Simulations
Computer simulations are frequently used to test a learning
algorithm before it is applied to real data, or, in our case, used on
a real robot. You can save yourself a lot of time by debugging the
algorithm in a simulation.
To simulate the delta rule learning algorithm for our light
following robot, we start by picking two random numbers between T
and 1024 to represent the readings on the two light sensors where T
is the minimum light value we want our robot to respond to. After
normalizing the resulting vector, we multiply the values by the
normalized "right brain" connection matrix to obtain the
output values that the network would currently send to the motors.
Then we multiply the same normalized input values by the "left
brain" connection matrix to see what the target output values
should be. Finally, we use the delta
rule to update the right brain connection strengths and renormalize
the matrix. This process is then repeated for as many learning
trials as we like.
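Here is a compact sketch of that simulation loop. The teacher matrix is the normalized "left brain" controller from the normalization section, while T, α, and the number of trials are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng()

def normalize(a):
    return a / np.linalg.norm(a)

# "Left brain": the known working controller that supplies the targets.
teacher = normalize(np.array([[-1.0,  2.0],
                              [ 2.0, -1.0]]))

# "Right brain": the network being trained, started at random in [-1, +1].
student = normalize(rng.uniform(-1.0, 1.0, size=(2, 2)))

T = 200        # minimum light value we want to respond to (illustrative)
alpha = 0.5    # learning parameter

for trial in range(200):
    x = normalize(rng.uniform(T, 1024, size=2))  # simulated sensor readings
    y = student @ x                              # current motor response
    target = teacher @ x                         # what the response should be
    # Delta-rule update followed by renormalization of the matrix.
    student = normalize(student + alpha * np.outer(target - y, x))
```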
One of the advantages of a simulation is that we can play with
different values of the learning parameter α
to see what effect it has on learning. The following graph shows the
results for a value of α
= 0.3:
The
colored lines are plots of the four connection weights over learning
cycles. Note that the initial weight values at the far left of the
graph are the random values selected at the start of the simulation.
Since our target output values are generated by a known matrix with
diagonal elements −0.32 and off-diagonal elements 0.63, we hope that
w_{11} and w_{22} will converge to
−0.32 and that w_{12} and w_{21} will
converge to 0.63. Using the legend at the top of the graph to
identify which curve goes with each weight, you can see that after
200 learning trials, our weights are close to the correct values.
We can speed up learning by increasing α.
The next graph shows the results with α
= 0.5:
As
you can see, our connections converge toward the correct values much
more quickly this time. However, one can make the learning parameter
too large. Here is the result when α
= 0.9:
In
this case, all four connection weights simply bounce around the
correct values but never settle down. This result illustrates the
need to balance the speed of learning against the stability of the
result.
Robot Demonstration
At last we are ready to try our learning algorithm on the robot.
Based on our simulation results, we will use a learning parameter
of α = 0.5. The robot's
behavior is then governed by the following steps:
1. First, we set the robot's network weights to random real values between −1 and +1, followed by normalization.
2. Our robot reads the values from its two light sensors. If both values are below the minimum threshold, we wait 100 msec and read them again. This simply means that we only want the robot to begin learning when it senses a bright light.
3. When at least one light reading is above threshold, we normalize the input vector and multiply it by our current weight matrix to produce an output vector, which we also normalize.
4. The values of the output vector are scaled by our maximum motor speeds, then we apply the control signals to the motors, causing the robot to make a movement.
5. Initially, this movement will not be correct; i.e., the robot will be just as likely to turn away from the light as toward it. We need a way to teach the robot what the correct move should have been. There are at least two ways we could provide the teaching signal. Either we could use a joystick or other form of manual control to input the correct motor signals, or, like the simulation above, we can simply use the "left brain" network to map the light sensor values into the correct output values. Since this latter option is much easier to implement, that is the one we will use.
6. So we multiply the input vector by the left brain network and normalize the output values. We then use the delta rule to update the control weights based on the difference between this output and the robot's real outputs.
7. The learning cycle is then repeated. A sketch of this on-robot loop is shown below.
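In the sketch below, read_light_sensors() and set_motor_speeds() are hypothetical stand-ins for whatever robot API you are using, and T and MAX_SPEED are illustrative values:

```python
import time
import numpy as np

def normalize(a):
    return a / np.linalg.norm(a)

teacher = normalize(np.array([[-1.0,  2.0],      # "left brain" teacher
                              [ 2.0, -1.0]]))
w = normalize(np.random.uniform(-1.0, 1.0, size=(2, 2)))  # step 1

T = 200            # minimum light threshold (illustrative)
MAX_SPEED = 100    # maximum motor speed (illustrative)
alpha = 0.5

while True:
    left, right = read_light_sensors()           # step 2 (hypothetical API)
    if left < T and right < T:
        time.sleep(0.1)                          # wait 100 msec, read again
        continue
    x = normalize(np.array([left, right]))       # step 3
    y = normalize(w @ x)
    set_motor_speeds(MAX_SPEED * y[0], MAX_SPEED * y[1])  # step 4 (hypothetical)
    target = normalize(teacher @ x)              # step 5: virtual teacher
    w = normalize(w + alpha * np.outer(target - y, x))    # step 6: delta rule
    # step 7: the loop repeats
```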
The following video shows the result. Note that since the light
following task is fairly simple and we are using an essentially
perfect real-time teacher, learning is very fast. Within 10-15 seconds of
learning, our robot is following the light beam as desired.
What were the network connections after learning? In this
particular case, the connections converged to values very close to
the matrix we used to provide the teaching signals. Different final
values can be obtained under different training conditions and with
different initial random connection weights. However, in all cases,
the general pattern of connections will be the same; namely, the
diagonal elements will inhibit the motors and the off-diagonal
elements will turn them on, thus causing the robot to turn in the
correct direction. Also, the diagonal elements will have a smaller
absolute value than the off-diagonal elements, which is what allows
the robot to move forward when the light is roughly the same
intensity at both sensors.
It is important to note that the initial random connections
between our input and output neurons are actually instrumental in
getting the robot to move in response to light. There is no need for
an external "movement generator": the connection matrix is
both the source of the initial exploratory movements as well as the
medium in which learning takes place.
Summary
We have seen that a robot operating in a real-world environment
can learn a simple light following behavior using a supervised
learning algorithm called the delta rule. In this case, the "teacher"
was an internalized version of a connection matrix we already knew
could produce the behavior. The same results could have been obtained
by manually sending the teaching signals using a joystick or other
such method, but the learning would have taken much longer since we'd
have to manually correct the robot after every small motion. A more
interesting version of such "guided learning" will be the
subject of a forthcoming article.