Biological brains excel at object
tracking. Even very young infants can move their eyes and head to
follow a finger or face moving in front of them. So imagine we aim
our web camera at an object of interest and then move the object at
random. Our task is to program our robot to move the camera to keep
the object centered in the field of view.
Visual Filters and Blobs
Tracking an object implies that we can
recognize at least one visual characteristic of the object from one
video frame to the next. In computer vision, a good visual property
to start with is color. A brightly colored ball or balloon will do
nicely since it will stand out sharply against the other colors in
the scene. So let's attempt to track a bright orange balloon as we
move it about in front of our camera as shown in the image below:
To center the balloon in the camera's
field of view, we first have to be able to locate it in the current
view. To do this, we filter the current image by removing all
pixels that don't match a certain level of the target color. What
remains, we hope, are just those pixels that belong to our balloon.
The process of filtering an image to highlight the object of interest
goes beyond color. As we will see later, we can define filters even
for complex objects such as the pattern of a human face. For this
reason, visual object detection and recognition is often described in
terms of finding the correct filter or filters for the task at hand.
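As a rough illustration of the color filtering idea described above, here is a minimal Python sketch. It is not RoboRealm's RGB filter: the function name red_filter and the threshold values are assumptions chosen for this example, and the image is just a nested list of (r, g, b) tuples.

```python
def red_filter(image, min_red=200, max_other=120):
    """Return a binary mask: 1 where a pixel is strongly red, else 0.

    `image` is a list of rows; each row is a list of (r, g, b) tuples
    with values in 0-255. The thresholds are illustrative assumptions,
    not the values used by RoboRealm.
    """
    return [
        [1 if (r >= min_red and g <= max_other and b <= max_other) else 0
         for (r, g, b) in row]
        for row in image
    ]


# A tiny 2x3 test image: one orange-red pixel, the rest dull or non-red.
image = [
    [(30, 30, 30), (240, 90, 20), (120, 110, 100)],
    [(200, 200, 200), (90, 40, 30), (10, 80, 200)],
]
mask = red_filter(image)
print(mask)
```

Only the orange-red pixel survives. The light gray pixel has a high red value too, but it is rejected because its green and blue values are also high.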
Returning to our orange balloon, the
image below shows the result of filtering the original image using
RoboRealm's RGB filter set to match only pixels with a high
level of red:
As you can see, the majority of the
pixels are removed and all we are left with are two red pixel areas
or blobs, one contained within the boundary of the balloon and
the other reflected from a magazine lying on the floor. Note how the
redness of the cat falls below the threshold we set for the RGB
filter so that as far as the filter is concerned, the cat does not
exist. This nicely illustrates a key point about perception: we tend
to see what we are looking for. And while this may cause us to miss
something important at times, it is the only way our visual system
can selectively attend to some aspect of the world while ignoring the
rest.
To eliminate the magazine pixels and
isolate just the balloon, we apply some additional filters using
RoboRealm. First, we use the Erode filter which whittles away
at each area of the image so that small pixel patches like the
magazine tend to disappear altogether. Then we use the Dilate
filter to bring back some of the pixels we lost on our bigger
blobs. Finally, we use the Convex Hull filter to round out
the border of the balloon and reduce its raggedness. The result
looks like the following:
The balloon has been isolated as a
fairly round white blob which we can now easily locate and track as
it moves across the field of view. To get the balloon's coordinates,
we use RoboRealm's Center of Gravity module which places a box
around the pixels in the image and returns the coordinates of its
center point. The result can then be superimposed on the original
image as shown below:
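RoboRealm performs these clean-up steps with its own modules. Purely to illustrate what eroding, dilating and taking the center of a bounding box do to a binary mask, here is a minimal Python sketch. The function names, the 4-neighbour structuring element and the test mask are assumptions made for this example, and the Convex Hull step is omitted.

```python
def erode(mask):
    """Keep a pixel only if it and its 4 neighbours are all set.
    Border pixels are always cleared in this simplified version."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (mask[y][x] and mask[y - 1][x] and mask[y + 1][x]
                    and mask[y][x - 1] and mask[y][x + 1]):
                out[y][x] = 1
    return out


def dilate(mask):
    """Set a pixel if it or any of its 4 neighbours is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = [(y, x), (y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if any(0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                   for ny, nx in neighbours):
                out[y][x] = 1
    return out


def box_center(mask):
    """Center (x, y) of the bounding box around all set pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


# A 3x3 "balloon" blob plus one stray pixel standing in for the magazine.
mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = dilate(erode(mask))
center = box_center(cleaned)
print(box_center(mask), center)
```

Before cleaning, the stray pixel stretches the bounding box and pulls its center away from the blob. After eroding and dilating, the stray pixel is gone and the center sits on the blob.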
RoboRealm has nicely isolated the
orange balloon even with the cat all over it. The green diagonal
line in the picture is the displacement of the balloon from the
center of the frame and tells us how we need to move the camera to
center the balloon in the field of view. When we translate this
displacement into rotations of the pan and tilt servos of our robot's
head and camera (details below), we get the behavior shown in the
videos below. The first video shows the view from the robot camera:
The second video shows the view from an
external camera that includes both the robot and the moving balloon:
While tracking of the balloon is relatively satisfactory in these
videos, there is a noticeable lag between changes in the movement of
the target and the response from the robot. There are a number of
reasons for this. First, the camera used (a DLink 920) is operating
wirelessly over 802.11g and there is an irreducible delay in getting
the latest image frame back to the router and over to the main
computer. Second, the tracking algorithm (detailed below) depends on
there being a displacement between the target and the center of the
image frame. For this reason, it is impossible to track the balloon
in perfect sync since this would imply a zero displacement at all
times. The only way this could happen would be for the robot to
anticipate the movement of the balloon before it actually happens.
Clearly, animal and human brains are able to do just this under certain
circumstances, but such prediction is outside the scope of this article. And
finally, the frame rate of the video camera is also a limiting factor.
In the videos shown here, the frame rate was 30 fps. Better results
can be obtained when using a directly attached USB camera running at
90 fps.
Having said all this, we can improve the tracking speed by adjusting
some parameters in the algorithm, as explained below. Here are a couple
of examples demonstrating faster tracking, including a number of times
when the balloon is kicked into the air:
We will now look at the details of how
we map the visual coordinates of the center of gravity (COG) of the
orange blob into appropriate servo commands to move the head and
camera. We start with the observation that the further the balloon
is from the center of the image, the faster we need to move the
camera since we have a greater distance to travel to re-center the
target. So the servo speeds need to be proportional to the
displacement of the COG from the center of the image. The view
through the camera lens is illustrated in the diagram below:
Let's begin with the horizontal
component of the COG displacement and the corresponding motion of the
head's panning servo. A similar analysis would apply to vertical
displacements and the servo that tilts the head. Let Fx be the
horizontal field of view of our camera in degrees and let Rx be the
horizontal resolution in pixels. (In the videos shown above, Fx is 61°
and the resolution is 320x240 pixels, so that Rx is 320.) Now suppose
the COG of the balloon is currently displaced horizontally by Dx pixels
from the center of the image. It is easier to work with this
displacement in degrees, which we can compute from (Dx / Rx) · Fx. To
pan the head through that angular distance in T seconds, the required
servo speed Sx in degrees per second is given by:
Sx = (Dx / Rx) · Fx / T
Since the values of Rx, Fx and T can be fixed for a given situation, we
see that the servo rotation speed is simply proportional to the COG
displacement from the center of the image:
Sx = kx · Dx
where kx = Fx / (Rx · T)
The final detail we are missing is how to command our servos to move with
a particular rotational speed in degrees per second as specified by
the above equation. If M is the maximum rotational
speed of our servos, and I is the control value corresponding
to that maximum speed, then the control signal C required to get the
servo moving at speed S is given by:
C = I · S / M
Combining this with the previous
equation, we have:
Cx = kx' · Dx
where kx' = kx · I / M
Let's now look at a concrete example.
For the camera used in the videos above, the horizontal field of
view Fx is 61 degrees and the
horizontal resolution Rx is 320 pixels.
Reading the manual for the Dynamixel AX-12+ servos, we find that the
maximum speed M is 114 rpm or 684 degrees/second and the
maximum control signal I is 1023. Finally, suppose we want
the robot to move to the target's position in ¼ of a second
(250ms). Then T = 0.25. Plugging these numbers in for kx'
above, we find:
Cx = 1.14 · Dx
In other words, the control signal we
send to our servo is simply 1.14 times the displacement of the
balloon in pixels from the center of the image. Of course, as soon
as either the camera or the balloon moves, the value of Dx
changes and so must our control signal Cx.
Fortunately, even an inexpensive desktop PC can execute this update
at least 20 times per second (once every 50ms) so that the result is
fairly smooth tracking as seen in the previous videos. A similar
analysis for the vertical displacement of the target would show:
Cy = 1.12 · Dy
where we have used Fy = 45° and Ry = 240. Note that the
multipliers in these two control equations are based on the
assumption of a ¼ second reaction time. For faster tracking,
try increasing these values. For example, to respond in 1/10 of a
second (100ms), the equations would be Cx = 2.85 · Dx
and Cy = 2.80 · Dy.
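The arithmetic behind these multipliers can be checked with a few lines of Python. This is a minimal sketch rather than the article's own code: the function names are made up for the example, and clamping the control signal to the range ±I is an added assumption that is not part of the equations above.

```python
def control_gain(fov_deg, resolution_px, reaction_time_s,
                 max_speed_deg_s=684.0, max_control=1023):
    """Return k' = (F / (R * T)) * (I / M) for one axis.

    Defaults use the AX-12+ figures quoted in the text:
    M = 684 degrees/second and I = 1023.
    """
    k = fov_deg / (resolution_px * reaction_time_s)  # degrees/second per pixel
    return k * max_control / max_speed_deg_s         # control units per pixel


def control_signal(displacement_px, gain, max_control=1023):
    """Return C = k' * D, clamped to +/- max_control (clamping is an assumption)."""
    c = gain * displacement_px
    return max(-max_control, min(max_control, c))


# Gains for a 1/4 second and a 1/10 second reaction time.
kx_quarter = control_gain(61, 320, 0.25)
ky_quarter = control_gain(45, 240, 0.25)
kx_tenth = control_gain(61, 320, 0.10)
ky_tenth = control_gain(45, 240, 0.10)

print(round(kx_quarter, 2), round(ky_quarter, 2))
print(round(kx_tenth, 2), round(ky_tenth, 2))

# Control value for a target 100 pixels right of center at T = 0.25 s.
print(round(control_signal(100, kx_quarter), 1))
```

Rounded to two decimal places, the gains reproduce the multipliers in the text: 1.14 and 1.12 for a ¼ second reaction time, and 2.85 and 2.80 for 1/10 of a second.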
Programming the Object Tracking Thread
We are now ready to implement our tracking algorithm in code. As with all
of our robot's behaviors, tracking will take place in its own thread.
On each update cycle of the algorithm, we first query RoboRealm for
the target's current horizontal and vertical coordinates in the
visual field. These give us our Dx and Dy
displacement values which we then plug into our control equations to
give us the servo input values. The servos are then commanded to
update their rotation speeds accordingly. The complete thread is
shown below: