Pi Robot

Learning in Artificial Neural Networks

In the previous article, we developed a light following robot controlled by an artificial neural network where we manually set the connection weights between inputs and outputs. Since the number of connections was small and the behavior was simple, it was fairly easy to guess a set of weights that would work. However, this will rarely be the case with bigger networks and more complex behaviors. In those situations, we will need a way for the robot to learn the connection strengths between input and output units.

Here is a picture of our robot showing the two light sensors mounted on the front. Note that the sonar sensors also visible in this picture are not used in this experiment.

There are a number of learning algorithms that can be applied to artificial neural networks, depending on the architecture of the network and on whether we teach the network the correct responses or it has to figure them out on its own. When a teacher is involved, the process is called supervised learning. Otherwise, it is called unsupervised learning. In both cases, there are different types of feedback that the network can use to modify the connections. The following table illustrates the various learning situations our robot might encounter. In each cell of the table, we have briefly described a situation that illustrates the relevant learning mode.

Learning Mode	Supervised	Unsupervised
Exemplar	Learning a language with correct pronunciation provided by a teacher.	N/A
Reinforcement	A child learns what it is allowed to touch using only "yes-no" feedback from the parent.	Touching a hot stove element causes pain.
Guided	A golf instructor moves your arm through an example of a good swing.	N/A
Statistical	The living room is always called the "living room" regardless of where we are standing and what we see.	Over time we observe that most houses have a living room, most apples are red, etc.
Observational	You observe your tennis instructor perform a correct swing and try to emulate their behavior.	When other people cut in line, people treat them badly.

With supervised exemplar learning, the correct pattern of activity across the output units is known and can be compared to the current output pattern. In reinforcement learning, both supervised and unsupervised, we only know if our current output pattern is correct or incorrect without knowing any more details. And with guided learning, the output units are essentially "remote controlled" through the correct sequence of values for a given set of input values. Statistical learning does not involve outputs per se, but simply a collection of input patterns whose general trend or summary properties we want to determine. Observational learning involves watching another individual—usually of the same species—perform some action, then extracting the information required from those observations for performing the action yourself.

We will have occasion to examine all these learning modes in greater depth in future articles, but for our light following robot, we are going to begin with the easiest learning mode, which is supervised exemplar learning (the Supervised entry in the Exemplar row of the table).

It's All About The Connections

In all cases of learning, we need a method for changing the connection weights over time so that performance improves. The particular method of modifying the weights is called the update rule for that network. In general, we are looking for a process that will strengthen the connection between units that are active at the same time while weakening the connection when they are activated out of sync or in opposite directions; i.e., one positive, the other negative.

The choice of update rule is usually determined by the architecture of the network and the type of learning involved. We will introduce different update rules in this series of articles as they become applicable including Hebbian learning, the delta rule, back propagation, self-organizing maps, and genetic algorithms. Since learning takes place in the connections between units, artificial neural networks are also referred to as connectionist networks.

Supervised Exemplar Learning

When teaching a child to speak, we often engage in a kind of trial and error learning that is at the heart of many artificial neural networks. In the case of language, the child will often make an utterance when trying to name an object. We will then correct the child with an utterance of our own that we hope the child will better mimic in the future. How might we apply a similar approach to teach our robot to follow a beam of light?

Suppose we initialize the four connection weights in our neural controller to random values. If we shine a light on the two sensors, the random weights will map the sensor readings to a pair of motor control signals and our robot will move in some direction. Of course, the movement will be just as likely to be directed away from the light as towards it, so at this point our robot will not follow the path of the beam.

Now suppose that for each pair of readings from the light sensors, we let the robot respond with a motion, but then we give it the correct pair of motor signals that it should have produced to follow the light. This would be analogous to asking a child to name a cat and when the child says "dog", we provide the correction by saying "cat". The question is, how do we modify the connections in the neural network based on this information so that the robot is more likely to make the correct movement in the future?

Hebbian Learning

Fortunately, this problem has been worked on since the 1940s and now has many different solutions depending on your goals and the complexity of the network involved. Many of the solutions are extensions of an early update rule called Hebbian learning. Hebbian learning was first formulated in 1949 by Canadian psychologist Donald Hebb and is often summarized by the phrase "cells that fire together, wire together". The idea is that if our current output unit values perform the correct motor response for the current input unit values, then we should strengthen the connections such that this mapping is more likely to occur in the future. This can be achieved by using the following update rule:

w_ij(t + 1) = w_ij(t) + α(y_i · x_j)

The Hebbian update rule tells us that the connection weight w_ij between the j^th input unit and i^th output unit at time t+1 should equal the old weight at time t plus the product of the input and output values, scaled by a learning constant α that is usually a number between 0 and 1. The update rule can also be written as:

∆w_ij = α(y_i · x_j)

where

∆w_ij = w_ij(t + 1) - w_ij(t)

The Hebbian update rule can also be written in vector form as follows:

∆w = α·y·x^T

where x^T is the transpose of x.

The reason we multiply the input and output activities is simple: if both units are highly active, then the product of their values is a large number and we strengthen the connection between them by a large amount. On the other hand, if one of the units has a low activity while the other is large, then their product will be smaller and the connection between them will be strengthened to a smaller degree. Furthermore, if our units can take on negative values as well as positive, then the product of a positive value and a negative value will result in a decrease of the connection strength between them. But the product of two large negative values will again result in a strengthening of the connection. In essence, the Hebbian update rule is a coincidence detector between input activities and output activities. In fact, it is very closely related to the process of detecting statistical correlations across the input values.
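To make the sign behavior described above concrete, here is a minimal sketch of one Hebbian step in plain Python. The function name `hebbian_update` and the sample activity values are assumptions chosen for illustration; they are not taken from the robot's code.

```python
def hebbian_update(w, x, y, alpha=0.1):
    """Return a new weight matrix after one Hebbian step.

    w[i][j] connects input unit j to output unit i, so the change in
    w[i][j] is alpha * y[i] * x[j] (the outer product of y and x).
    """
    return [
        [w[i][j] + alpha * y[i] * x[j] for j in range(len(x))]
        for i in range(len(y))
    ]


# Illustrative activities only: units with matching signs should have
# their connection strengthened, units with opposite signs weakened.
w = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, -1.0]   # input activities
y = [0.5, -0.5]   # output activities

w = hebbian_update(w, x, y, alpha=0.1)
print(w)
```

In this sketch the weights joining units of the same sign (w₁₁ and w₂₂) increase, while the weights joining units of opposite sign (w₁₂ and w₂₁) decrease, which is the coincidence-detector behavior described above.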

The Delta Rule

We will explore Hebbian learning in greater detail in the next section on unsupervised reinforcement learning. For now, we need a variation on the Hebbian update rule known as the delta rule. The idea is that if we already produce nearly the correct behavior, then we should not modify the weights very much. On the other hand, if we are wildly incorrect, then we should change the weights a lot. This can be accomplished most simply by using the following update rule:

∆w = α(τ – y)· x^T

In this equation, ∆w is the change in connection strengths, x is the vector of activity across the input units, y is the current response in the output units, and τ is the target response—the output we want our network to have when the input is x. The difference, τ – y, is then a measure of how far off the current response is when compared to the target response. Once we compute this difference, the update rule becomes the same as Hebbian learning, only now we are using the difference between the target and current output values in the product with x^T instead of just the output values.
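As a rough sketch of the delta rule, the following plain Python applies a single update and compares the output before and after. The helper names `forward` and `delta_update`, along with the starting weights, input, and target, are hypothetical values picked for the example.

```python
def forward(w, x):
    """Compute the output y = w · x for a weight matrix and input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def delta_update(w, x, target, alpha=0.5):
    """Return new weights after one step of dw = alpha * (target - y) * x^T."""
    y = forward(w, x)
    error = [t_i - y_i for t_i, y_i in zip(target, y)]
    return [
        [w[i][j] + alpha * error[i] * x[j] for j in range(len(x))]
        for i in range(len(w))
    ]


# Illustrative values only (not the robot's actual weights).
w = [[0.2, -0.4], [0.7, 0.1]]
x = [1.0, 0.0]          # a unit-length input vector
target = [0.0, 1.0]     # the output we want for this input

before = forward(w, x)
w = delta_update(w, x, target, alpha=0.5)
after = forward(w, x)
print("before:", before, "after:", after)
```

Because the input here has unit length, one step with α = 0.5 moves the output halfway from its old value toward the target. A small error therefore produces a small weight change and a large error produces a large one.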

We are nearly ready to test the learning rule on our robot. But before we do, we need to figure out how we are going to transmit the target motor signals to the neural network. Of course, we could do it manually, perhaps using a joystick for input, but this would be very slow and cumbersome. Instead, we will use our neural controller from the previous section to provide the teaching signals. We can think of it as a kind of "left brain/right brain" setup, where the right brain of our robot is the neural network we are trying to train, and the left brain already has a network that knows how to do the job.

After initializing the connections in our network to random values between -1 and +1, our learning algorithm will proceed through the following steps:

1. Take a reading from the two light sensors.
2. Generate a movement by feeding these signals through our current network ("right brain").
3. Generate a "virtual movement" by feeding the same input signals through our working network ("left brain"). These are the target values.
4. Update the connection strengths in our "right brain" network based on the current output values and the target output values using the delta rule.

It is important to note that the left brain controller that we are using as a teacher is not actually generating any movement. Nor is it directly updating the weights in the right brain controller. Instead, it is simply providing the correct output values for each pair of light readings—correct in the sense that these output values have been shown to work in an earlier demonstration. Our right brain network still has to learn its own set of connections to move the robot in the correct way. However, by using this virtual teacher instead of manually providing the correct target values, we can run the whole learning scenario much faster and watch the robot progress in real time.

Normalization

Before testing our learning algorithm, there is one last bookkeeping issue we must deal with. If one is not careful, Hebbian-style update rules can lead to connection weights that grow without bound over the course of learning. The easiest way to keep the weight values manageable is to use normalization. There are a number of popular normalization techniques one can use. Here we will describe three of them: vector normalization, scaling, and thresholding.

The first method divides each element of a vector or matrix by the overall length or magnitude of the vector or matrix. For instance, instead of using the raw input vector [x₁, x₂], we use its normalized version:

[x₁, x₂] / ‖x‖

where

‖x‖ = √(x₁² + x₂²)

is just the Euclidean norm or length of the vector [x₁, x₂]. A similar process is used to normalize the output vector [o₁, o₂]. Normalization turns both input and output vectors into unit vectors pointing in the direction of the original data.

Similarly, after each update of the weight matrix, we re-normalize the matrix by dividing each element by the new matrix magnitude:

w_ij ← w_ij / ‖w‖, where ‖w‖ = √(Σ w_ij²) summed over all i and j

As a concrete example, recall the connection matrix we used in the light following robot in the previous article. The magnitude of this matrix is found by adding up the squares of all four elements, then taking the square root. Dividing each element by this number yields the following normalized matrix (rounded to two decimal places):

[ -0.32   0.63 ]
[  0.63  -0.32 ]

Note how normalization does not change the sign of the elements nor their relative sizes.
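The two normalization steps can be sketched in a few lines of plain Python. The function names are hypothetical, and the example matrix with -0.5 on the diagonal and 1.0 off the diagonal is an assumption: it was chosen only because it normalizes to the -0.32 and 0.63 values quoted in this article, and it may not be the matrix actually used on the robot.

```python
import math


def normalize_vector(v):
    """Divide each element by the Euclidean length of the vector."""
    length = math.sqrt(sum(e * e for e in v))
    if length == 0.0:
        return list(v)  # a zero vector has no direction; leave it unchanged
    return [e / length for e in v]


def normalize_matrix(m):
    """Divide each element by the square root of the sum of all squared elements."""
    magnitude = math.sqrt(sum(e * e for row in m for e in row))
    if magnitude == 0.0:
        return [list(row) for row in m]
    return [[e / magnitude for e in row] for row in m]


# Assumed example matrix (see the note above); its magnitude is sqrt(2.5).
example = [[-0.5, 1.0], [1.0, -0.5]]
normalized = normalize_matrix(example)
print([[round(e, 2) for e in row] for row in normalized])

# A vector example: [3, 4] has length 5.
print(normalize_vector([3.0, 4.0]))
```

Rounded to two decimal places, the normalized example matrix has -0.32 on the diagonal and 0.63 off the diagonal, so the signs and relative sizes of the elements are preserved.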

The second method scales each element by the maximum value that the element can take on. For example, if we were measuring temperature where the highest temperature we expect to see is 100º, then we would divide each element by 100 to get its normalized value.

The third method uses a threshold function to convert the original data into discrete values, typically 0 and 1. For example, when using a sonar sensor to measure distance, one might set the threshold to three feet (36 inches) and assign a value of 1 to any reading below this number and a 0 to any reading above it. In this way, the sensor's original data is converted into a kind of alert wherein objects closer than three feet generate a warning whereas objects further away do not. It might seem that thresholding the inputs would result in noticeably "discrete" behavior from the robot and it would if we only had one input. But imagine the more realistic situation where we have ten sensors. Then the thresholded data is the equivalent of a 10-digit binary number that can take on 2¹⁰ or 1024 values. Having over a thousand possible graded responses would generally be sufficient for almost any situation.

Depending on the situation, one normalization method may be preferred over the others. Thresholding throws away the most information but is often simpler to implement in dedicated hardware. Vector normalization also tends to lose more information than scaling, since all vectors pointing in a given direction are mapped into the same normalized vector. For example, the vectors [1, 0] and [10, 0] would both be mapped into [1, 0] by the first method. But if the maximum value the elements can take on is 10, the second method would yield the two different vectors [0.1, 0] and [1, 0]. As we shall see in a later section on obstacle avoidance, one can even use a combination of methods with good results.
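The comparison above can be checked with a short sketch that puts the three methods side by side. The function names are hypothetical. The sonar readings are made-up values, and treating a reading exactly at the threshold as "not close" is an arbitrary choice, since the text only specifies readings below and above it.

```python
import math


def normalize_vector(values):
    """Method 1: divide by the Euclidean length (keeps direction only)."""
    length = math.sqrt(sum(v * v for v in values))
    return [v / length for v in values] if length > 0 else list(values)


def scale(values, max_value):
    """Method 2: divide by the largest value an element can take."""
    return [v / max_value for v in values]


def threshold(values, limit):
    """Method 3: 1 for readings below the limit, 0 otherwise."""
    return [1 if v < limit else 0 for v in values]


# Vector normalization maps both vectors to the same unit vector,
# while scaling by the maximum value of 10 keeps them distinct.
for vector in ([1, 0], [10, 0]):
    print(vector, normalize_vector(vector), scale(vector, 10))

# Made-up sonar distances in inches with a 36 inch threshold.
print(threshold([12, 36, 80], 36))
```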

Computer Simulations

Computer simulations are frequently used to test a learning algorithm before it is applied to real data, or, in our case, used on a real robot. You can save yourself a lot of time by debugging the algorithm in a simulation.

To simulate the delta rule learning algorithm for our light following robot, we start by picking two random numbers between T and 1024 to represent the readings on the two light sensors, where T is the minimum light value we want our robot to respond to. After normalizing the resulting vector, we multiply the values by the normalized "right brain" connection matrix to obtain the output values that the network would currently send to the motors. Then we multiply the same normalized input values by the "left brain" connection matrix to obtain the target output values. Finally, we use the delta rule to update the right brain connection strengths and re-normalize the matrix. This process is then repeated for as many learning trials as we like.
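The simulation procedure just described might be sketched as follows in plain Python. This is not the original simulation code: the function names, the value T = 100 for the minimum light reading, and the random seed are all assumptions. The teacher matrix uses the rounded values -0.32 and 0.63 quoted in this article, re-normalized so that its magnitude is exactly 1.

```python
import math
import random


def normalize_vector(v):
    length = math.sqrt(sum(e * e for e in v))
    return [e / length for e in v] if length > 0 else list(v)


def normalize_matrix(m):
    magnitude = math.sqrt(sum(e * e for row in m for e in row))
    if magnitude == 0:
        return [list(row) for row in m]
    return [[e / magnitude for e in row] for row in m]


def forward(w, x):
    """Multiply the weight matrix by the input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def simulate(alpha=0.3, trials=200, min_light=100, max_light=1024, seed=0):
    """Train a random "right brain" matrix against the "left brain" teacher."""
    rng = random.Random(seed)
    left = normalize_matrix([[-0.32, 0.63], [0.63, -0.32]])
    right = normalize_matrix(
        [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    )
    for _ in range(trials):
        # Two random light readings between T (min_light) and 1024.
        x = normalize_vector([rng.uniform(min_light, max_light) for _ in range(2)])
        y = forward(right, x)        # what the learner would send to the motors
        target = forward(left, x)    # what the teacher says it should send
        # Delta rule update followed by re-normalization of the matrix.
        right = normalize_matrix([
            [right[i][j] + alpha * (target[i] - y[i]) * x[j] for j in range(2)]
            for i in range(2)
        ])
    return right


weights = simulate(alpha=0.3, trials=200)
print([[round(e, 2) for e in row] for row in weights])
```

Changing the `alpha` and `trials` arguments is an easy way to experiment with the learning parameter. The exact trajectory depends on the seed and on the assumed value of T, so the curves from this sketch should not be expected to match the article's graphs point for point.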

One of the advantages of a simulation is that we can play with different values of the learning parameter α to see what effect it has on learning. The following graph shows the results for a value of α = 0.3:

The colored lines are plots of the four connection weights over learning cycles. Note that the initial weight values at the far left of the graph are the random values selected at the start of the simulation. Since our target output values are generated by a known matrix with diagonal elements -0.32 and off-diagonal elements 0.63, we hope that w₁₁ and w₂₂ will converge to -0.32 and that w₁₂ and w₂₁ will converge to 0.63. Using the legend at the top of the graph to identify which curve goes with each weight, you can see that after 200 learning trials, our weights are close to the correct values.

We can speed up learning by increasing α. The next graph shows the results with α = 0.5:

As you can see, our connections converge toward the correct values much more quickly this time. However, one can make the learning parameter too large. Here is the result when α = 0.9:

In this case, all four connection weights simply bounce around the correct values but never settle down. This result illustrates the need to balance speed of learning against the stability of the result.

Robot Demonstration

At last we are ready to try our learning algorithm on the robot.

Based on our simulation results, we will use a learning parameter of α = 0.5. The robot's behavior is then governed by the following steps:

1. First, we set the robot's network weights to random real values between -1 and 1, followed by normalization.
2. Our robot reads the values from its two light sensors. If both values are below the minimum threshold, we wait 100 msec and read them again. This simply means that we only want the robot to begin learning when it senses a bright light.
3. When at least one light reading is above threshold, we normalize the input vector and multiply it by our current weight matrix to produce an output vector, which we also normalize.
4. The values of the output vector are scaled by our maximum motor speeds, then we apply the control signals to the motors, causing the robot to make a movement.
5. Initially, this movement will not be correct; that is, the robot will be just as likely to turn away from the light as toward it. We need a way to teach the robot what the correct move should have been. There are at least two ways we could provide the teaching signal. Either we could use a joystick or other form of manual control to input the correct motor signals, or, like the simulation above, we can simply use the "left brain" network to map the light sensor values into the correct output values. Since this latter option is much easier to implement, that is the one we will use.
6. So we multiply the input vector by the left brain network and normalize the output values. We then use the delta rule to update the control weights based on the difference between this output and the robot's real outputs.
7. The learning cycle is then repeated.
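The steps above can be sketched as a single learning cycle in plain Python. This is a hedged illustration rather than the robot's actual firmware: `read_light_sensors` and `set_motor_speeds` are hypothetical callbacks standing in for the real hardware interface, and the threshold of 100 and maximum speed of 100 are assumed values. The weights are re-normalized after each update, as described in the Normalization section.

```python
import math
import random
import time


def normalize_vector(v):
    length = math.sqrt(sum(e * e for e in v))
    return [e / length for e in v] if length > 0 else list(v)


def normalize_matrix(m):
    magnitude = math.sqrt(sum(e * e for row in m for e in row))
    if magnitude == 0:
        return [list(row) for row in m]
    return [[e / magnitude for e in row] for row in m]


def apply_weights(w, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def learning_cycle(right, left, read_light_sensors, set_motor_speeds,
                   alpha=0.5, min_light=100, max_speed=100, poll_delay=0.1):
    """Run one learning cycle and return the updated right brain weights."""
    readings = read_light_sensors()
    while max(readings) < min_light:      # both readings are below threshold
        time.sleep(poll_delay)            # wait 100 msec, then read again
        readings = read_light_sensors()

    x = normalize_vector(readings)
    y = normalize_vector(apply_weights(right, x))
    set_motor_speeds([max_speed * y_i for y_i in y])   # make the movement

    # The left brain supplies the target; it never drives the motors.
    target = normalize_vector(apply_weights(left, x))
    updated = [
        [right[i][j] + alpha * (target[i] - y[i]) * x[j] for j in range(len(x))]
        for i in range(len(right))
    ]
    return normalize_matrix(updated)


# Stand-ins for real hardware so the sketch can run on a desktop machine.
def fake_sensors():
    return [random.uniform(100, 1024), random.uniform(100, 1024)]


def fake_motors(speeds):
    pass  # a real robot would send these speeds to its motor controller


left_brain = normalize_matrix([[-0.32, 0.63], [0.63, -0.32]])
right_brain = normalize_matrix(
    [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
)
for _ in range(100):
    right_brain = learning_cycle(right_brain, left_brain, fake_sensors, fake_motors)
print([[round(e, 2) for e in row] for row in right_brain])
```

Passing the sensor and motor functions in as arguments is a design choice made for this sketch: it lets the same loop run against simulated hardware first and a real robot later.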

The following video shows the result. Note that since the light following task is fairly simple and we are using an essentially perfect real-time teacher, learning is very fast. Within 10-15 seconds of learning, our robot is following the light beam as desired.

What were the network connections after learning? In this particular case, the connections converged to the values:

which is very close to the matrix we used to provide the teaching signals. Different values can be obtained under different training conditions and because the initial random values of the connections can change. However, in all cases, the general pattern of connections will be the same; namely, the diagonal elements will inhibit the motors and the off-diagonal elements will turn them on, thus causing the robot to turn in the correct direction. Also, the diagonal elements will have a smaller absolute value than the off-diagonal elements, which is what allows the robot to move forward when the light is roughly the same intensity at both sensors.

It is important to note that the initial random connections between our input and output neurons are actually instrumental in getting the robot to move in response to light. There is no need for an external "movement generator": the connection matrix is both the source of the initial exploratory movements as well as the medium in which learning takes place.

Summary

We have seen that a robot operating in a real-world environment can learn a simple light following behavior using a supervised learning algorithm called the delta rule. In this case, the "teacher" was an internalized version of a connection matrix we already knew could produce the behavior. The same results could have been obtained by manually sending the teaching signals using a joystick or other such method, but the learning would have taken much longer since we'd have to manually correct the robot after every small motion. A more interesting version of such "guided learning" will be the subject of a forthcoming article.