A Roadmap for Pi Robot
Last Updated: Feb 27, 2012
One of the hazards of working in
robotics is that it encourages ADD--there are just so many areas to
explore: sensor integration, visual perception, face and speech
recognition, problem solving, learning and memory, language
comprehension, social communication, and so on. One could
literally spend a lifetime bouncing around from one topic to
another. This is beginning to happen
with the Pi Robot project, especially now that ROS makes it even easier
to get distracted by the "latest cool thing." So I thought I'd
draw up a roadmap of sorts to give us something a little more
structured to follow.
From the beginning, the primary goal behind the Pi Robot Project has
been to build a robot that can autonomously navigate
around a typical household or office environment while interacting with
people (and pets) and learning from experience. What we have
discovered over the past five years is that it takes a lot of
preliminary work to put this all together. The good news is that we
have settled on a few key ingredients: ROS for the overall software
framework, Dynamixel servos for joints, a Kinect for 3D vision, and a
Hokuyo laser scanner for SLAM and obstacle avoidance (though one can do
something similar with the Kinect alone). Pi's arms have also
grown to six servos (degrees of freedom) each. It takes six
degrees of freedom to specify the position and orientation of Pi's hand
or an object in space, so six servos per arm makes the problem of
reaching for such an object easier to solve.
But we still have lots of work to do on the basics. Here are the
major signposts we need to pass along the way to our goal. We'll
use a check mark icon to indicate what we already have under control and an hourglass icon to flag what we have yet to do:
1. Motor Control
- Navigation, Path Planning and SLAM (Simultaneous Localization and Mapping)
- Pan and tilt servo control for Pi's camera.
- Topological navigation using semantic labels; e.g. "Go to the living room."
- Both forward and inverse kinematics have been tested using David Lu's arm_kinematics ROS package.
- Test the OpenRAVE kinematics package and compare results to above
- Program simple reaching tasks
- Incorporate collision avoidance while reaching (ROS arm_navigation stack)
2. Visual Feature Detection and Tracking
- Tracking color features (CamShift)
- Tracking "points of interest" using optical flow (Lucas-Kanade)
- Face detection
- Skeleton tracking (Kinect + openni_tracker ROS package)
- Face detection followed by tracking. (See pi_face_tracker.)
- Implement the TLD (Tracking-Learning-Detection) algorithm by Zdenek Kalal
3. Visual Object Detection and Recognition
- Recognizing feature patterns as object classes (e.g. "chair", "cat", "person")
- Recognizing an object as a specific instance of a class (e.g. "person"=>"Joe")
4. Speech Recognition and Speech Synthesis (See pi_speech_tutorial)
- Speech recognition using CMU's PocketSphinx (ROS stack rharmony). Recognize basic phrases such as "go forward" or "turn left".
- Recognize more complex phrases such as "bring the blue ball".
- Map recognized phrases into robot actions.
- Find a suitable voice for Pi Robot using the Festival TTS package.
- Implement semantic frames using the RoboFrameNet ROS stack.
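To get a feel for mapping recognized phrases into robot actions, here is a minimal Python sketch. The phrase list and action names are hypothetical, not taken from the actual pi_speech_tutorial code:

```python
# Minimal sketch: map recognized speech phrases to robot action names.
# Phrases and action names here are invented for illustration.

def make_command_map():
    """Return a lookup table from recognized phrases to action names."""
    return {
        "go forward": "drive_forward",
        "turn left": "rotate_left",
        "turn right": "rotate_right",
        "stop": "halt",
    }

def phrase_to_action(phrase, command_map):
    """Normalize a recognized phrase and look up the matching action."""
    return command_map.get(phrase.strip().lower(), "unknown")
```

A real recognizer returns noisy text, so the normalization step (strip and lowercase) does more work than it looks like here.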
5. Simple Goal Directed Actions
- Create a collection of ROS action
servers and clients for performing simple tasks such as "pick up the
cup" or "turn to face Joe". ROS actions provide a mechanism for
defining a goal, setting the task in motion while feedback updates
progress toward the goal, and reporting when the goal has been
completed, timed out, or preempted.
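The real implementation will use the ROS actionlib package; the pure-Python sketch below (with invented step counts) only illustrates the goal/feedback/result pattern that makes actions different from plain function calls:

```python
# Pure-Python sketch of the ROS-style action pattern: a goal is set in
# motion, feedback reports progress, and a terminal status is returned.
# A real implementation would use the ROS actionlib package.

def run_action(goal_steps, step, timeout_steps=100):
    """Advance toward goal_steps, collecting feedback; return final status."""
    progress = 0
    feedback = []
    for _ in range(timeout_steps):
        progress += step
        feedback.append(min(progress, goal_steps))  # progress report
        if progress >= goal_steps:
            return "succeeded", feedback
    return "timed_out", feedback
```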
6. Action Sequences and Task Planning
- We will use the ROS SMACH
package for executing a series of actions aimed at solving more
complicated goals. SMACH stands for "State Machine" and allows
the creation of a task hierarchy that defines how a particular task can
be broken down into sub-tasks. For example, the high level task
"bring me a beer" might be broken down into the following subtasks:
"navigate to kitchen"=>"locate fridge"=>"grasp handle"=>"open
door"=>"locate beer"=>"grasp beer"=>etc. SMACH allows us
to set up this chain of events and then set it in motion, while the
underlying library manages the contingencies between sub-tasks.
- We will use the ROS Executive Teer stack for more complex task planning.
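As a rough illustration of the SMACH idea, in plain Python rather than the actual smach package, a sequential chain of hypothetical sub-tasks from the "bring me a beer" example might look like:

```python
# Sketch of a SMACH-style sequential state machine. Sub-task names
# follow the "bring me a beer" example; the real thing would use the
# ROS smach package and its StateMachine container.

def run_sequence(tasks, world):
    """Run sub-tasks in order; abort the chain if any sub-task fails."""
    for name, task in tasks:
        if not task(world):
            return "aborted_at:" + name
    return "succeeded"

# Hypothetical sub-tasks operating on a shared "world" dictionary.
def navigate_to_kitchen(world):
    world["location"] = "kitchen"
    return True

def locate_fridge(world):
    # Can only find the fridge if we actually made it to the kitchen.
    return world.get("location") == "kitchen"
```

The point of the library, which this sketch glosses over, is managing the contingencies: what to do when a sub-task aborts, is preempted, or needs to retry.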
7. Learning and Memory
- Psychologists define three primary kinds of memory: procedural, episodic, and semantic. Pi Robot will need all three, but especially the first two.
Psychologists and machine learning experts also distinguish many different forms of learning: supervised, reinforcement, guided, statistical, and observational.
- Procedural Memory:
Learning to play golf is a kind of procedural memory. At first
you can't even hit the ball, but with some practice, your eye-hand
coordination improves to the point where you might actually par one or
two holes. Once we have Pi's arm kinematics worked out, the
solutions we compute for, say, reaching for an object, will apply to an
ideal situation where there is no "slop" in Pi's joints, which of
course there is. We can therefore insert a neural network between
Pi's vision system and his arm kinematics that will then learn to
compensate for these imperfections.
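As a toy version of this idea, here is a simple error-correction rule rather than a full neural network, with an invented constant "slop" of 3 degrees; a network would learn a correction that varies across the workspace:

```python
# Sketch: learning to compensate for "slop" in a joint. The ideal
# kinematic solution commands an angle, but the simulated joint settles
# a few degrees off; a simple LMS-style rule learns the correction from
# practice. The 3-degree slop is an invented constant.

def settle(commanded, slop=-3.0):
    """Simulated joint: always lands `slop` degrees off the command."""
    return commanded + slop

def learn_correction(trials=200, rate=0.1):
    """Estimate the offset so that commanded + correction hits the target."""
    correction = 0.0
    for trial in range(trials):
        target = trial % 90                     # practice across a range of angles
        observed = settle(target + correction)  # where the arm actually ended up
        error = target - observed               # positive if we undershot
        correction += rate * error              # nudge the correction
    return correction
```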
- Episodic Memory:
If asked to retrieve a particular object that lies somewhere in your
house, it will benefit Pi to remember where he might have last seen it.
This is an example of episodic memory. Another example
would be for Pi to remember the various activities he performed
yesterday or last week. For example, you might ask "Did you tidy
the living room yesterday?" It is tempting to think that because
a robot has a computer for a brain, we could simply store all the data
that comes in through its sensors. But video alone would fill
even a large hard drive within a day or two. So we must be
selective in what is stored.
- Semantic Memory:
Knowing the capital of Kazakhstan is an example of semantic memory.
The only way to know this is to have heard or read the fact at
some point. Robots can operate a little differently than people
in this regard thanks to the Web and data structures called semantic networks.
A semantic network connects a collection of facts or concepts by
links that represent the relationships between them such as "birds lay
eggs". A number of projects are well under way (e.g. ConceptNet)
that enable a computer program (which can be run on your robot) to
query these large semantic databases in a way similar to the way we
access basic facts such as "What is the capital of Kazakhstan?" Answer: Astana.
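A toy semantic network can be as simple as a table of subject-relation-object triples; the two facts below come straight from the text, while real projects like ConceptNet store millions of such links:

```python
# Tiny semantic network sketch: concepts connected by labeled relations,
# stored as (subject, relation) -> object triples.

FACTS = {
    ("bird", "lays"): "eggs",
    ("kazakhstan", "capital"): "astana",
}

def query(subject, relation, facts=FACTS):
    """Answer 'what is the <relation> of <subject>?' from stored triples."""
    return facts.get((subject.lower(), relation.lower()), "unknown")
```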
- Supervised Learning:
A good example of supervised learning is color naming; i.e. "this banana
is yellow, but that one is green". For a robot to use color names
the same way we do, it must be shown examples of different colors and
told the correct label. (See for example the work of Kimberly Jameson.)
One way to do this is to use an artificial neural network that
takes color histograms as inputs and produces color names as outputs.
With enough training using a human teacher, the network learns to
categorize colors in a manner similar to people. I have done some
preliminary work on this using a simple Perceptron neural network and
it performs remarkably well.
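Here is a stripped-down sketch of that perceptron, using raw RGB triples in place of color histograms; the training samples are invented for illustration:

```python
# Sketch of color naming with a single perceptron. RGB triples stand in
# for the color histograms mentioned above; samples are invented.

def train_perceptron(samples, epochs=50, rate=0.1):
    """samples: list of ((r, g, b), label) pairs with label +1 or -1."""
    w, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else -1
            if out != label:  # perceptron rule: update only on mistakes
                w = [wi + rate * label * xi for wi, xi in zip(w, x)]
                bias += rate * label
    return w, bias

def name_color(x, w, bias):
    """Map an RGB triple to a learned color name."""
    return "yellow" if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else "green"

SAMPLES = [
    ((0.9, 0.9, 0.1), 1),   # yellow banana
    ((0.8, 0.8, 0.2), 1),   # yellow banana
    ((0.2, 0.8, 0.2), -1),  # green banana
    ((0.1, 0.7, 0.1), -1),  # green banana
]
```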
- Reinforcement Learning:
If you touch a hot stove for the first time, you will suffer pain, but
then you will be unlikely to ever do it again. Conversely, if you
choose a new route to work that gets you there 10 minutes faster, you'll
likely choose that route again. Reinforcement learning requires
an action followed by an outcome that can be scored positive or
negative. Positive reinforcement increases the probability of
repeating the action while negative reinforcement reduces it.
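The commute example can be sketched as a two-armed bandit with an epsilon-greedy choice rule; the route names and travel times below are invented:

```python
# Sketch of reinforcement learning on the commute example: two routes,
# with action values updated from experienced travel time (a faster
# commute yields a higher reward).

import random

def choose_route(values, epsilon=0.1):
    """Epsilon-greedy: usually take the best-known route, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(values))
    return max(values, key=values.get)

def learn_routes(trials=500, rate=0.1):
    travel_time = {"old_route": 30.0, "new_route": 20.0}  # minutes (invented)
    values = {"old_route": 0.0, "new_route": 0.0}
    for _ in range(trials):
        route = choose_route(values)
        reward = -travel_time[route]                 # shorter trip, higher reward
        values[route] += rate * (reward - values[route])
    return values
```

After a few hundred trials the value estimates approach the true (negative) travel times, and the greedy choice settles on the faster route.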
- Guided Learning:
An example of guided learning takes place when a golf instructor
moves your arm through an example of a good swing.
The idea is that your brain will map the proprioceptive sensations into
motor commands you can produce on your own. This kind of learning
is fairly easy to do on a robot since we have continuous feedback from
the servos regarding their current position, speed, torque, and even temperature.
- Statistical Learning: Statistical learning is closely related to data mining.
Take for example the simple question: "How many bedrooms are there in a
house?" Of course, there is no single answer. In any given
house, there may be one bedroom or a hundred. But if we knew the
number for every house, the number of bedrooms would form a distribution
with a peak somewhere around 2. Now suppose you enter someone's
house for the first time. How many bedrooms should you guess it
has? Statistical learning theory tells us how we can make an
informed guess from the underlying distribution, but since we don't
know the underlying distribution exactly, we must estimate
the distribution from our experience and then make our guess from that
estimate. One of the more popular methods used in machine
learning for performing these calculations is Bayesian Classification. Others involve simple clustering methods. We will have much to say and do with these methods and other statistical learning techniques.
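The bedrooms example can be sketched directly: estimate the distribution from a sample of houses, then guess its mode. The observation list below is made up for illustration:

```python
# Sketch of the bedrooms example: estimate a distribution of bedroom
# counts from observed houses, then guess the most probable count for a
# new house. The sample data is invented.

from collections import Counter

def estimate_distribution(observations):
    """Relative frequency of each bedroom count in the sample."""
    counts = Counter(observations)
    total = len(observations)
    return {k: v / total for k, v in counts.items()}

def best_guess(distribution):
    """Most probable value under the estimated distribution."""
    return max(distribution, key=distribution.get)
```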
- Observational Learning and Imitation:
Pi Robot's ability to mimic the arm movements of a person standing in
front of him is an example of imitation. Imagine the
possibilities this opens up for teaching a robot a particular
task. Suppose you want Pi to stir something in a pot.
Trying to program a stirring motion from scratch into Pi's various arm
joints would be a difficult task. But if we simply let Pi watch us
stir something, he can then mimic our actions and store them for future
use. Observational learning can go one step further than imitation
alone. For suppose moving my arm a certain way results in damage
to my hand. In this case, we would *not* want Pi to imitate the
action but rather, avoid the action he just observed.
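A bare-bones version of imitation is record-and-replay of observed joint angles. The stirring trajectory below is synthesized; on the real robot it would come from skeleton tracking:

```python
# Sketch of imitation by record-and-replay: joint angles observed while
# someone stirs are stored, then played back later. The circular
# trajectory is synthesized for illustration; skipping "forbidden"
# poses stands in for avoiding an action observed to cause harm.

import math

def record_stir(samples=20):
    """Fake observation of a circular stirring motion (two joint angles)."""
    return [(math.cos(2 * math.pi * t / samples),
             math.sin(2 * math.pi * t / samples))
            for t in range(samples)]

def replay(trajectory, forbidden=None):
    """Replay a stored trajectory, skipping poses flagged as harmful."""
    forbidden = forbidden or set()
    return [pose for i, pose in enumerate(trajectory) if i not in forbidden]
```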
8. Reasoning and Problem Solving
Reasoning and problem solving are often taken as the hallmark of human
intelligence, but in fact many animal species are quite good at them.
Suppose we ask Pi Robot to retrieve an item that is blocked by another
object. What would it take for Pi to figure out that he first needs to
move the blocking object out of the way? Computer programs that can solve problems
have been around since the dawn of AI (see for example ACT-R)
but most assume that the problems to be solved can be given a definite
formal structure (like chess), which is often not the case in
real-world situations. For example, if Pi can move the blocking
object anywhere he likes, where should he move it? More recently,
progress has been made on more general planning and scheduling systems as well as hierarchical task networks. Needless to say, this will be one of our more difficult challenges.
9. Executive Controller: What Should I Do Next?
Overriding all of Pi's behavior must be some form of executive
control. Why? The funny thing about a robot is that if
you turn it on and don't give it something to do, it will just sit
there! (I actually do this myself sometimes...) Ordinarily, we
give a robot something to do by running a specific program aimed at
carrying out a particular task such as "navigate to the dining room" or
"pick up the cup" or "mimic my arm motions". But if we want our
robot to simply wander about the house and perform actions on the fly,
we need a way for Pi to have a set of default behavioral goals that can
nonetheless be interrupted by specific commands or events. A
popular mechanism in robotics for achieving this scenario is called Subsumption.
The idea is to set up a hierarchy of default behaviors based
on their priority. For example, if Pi has nothing better to do,
his default behavior might be "roam around the house and note anything
out of the ordinary". At the same time, he could be streaming the
video image from his camera to a web page so that you can monitor your
home while you're away and send you alerts by email when something odd
is detected. A behavior with higher priority than "roam"
would be "recharge batteries if running low". Another would be
"escape if stuck". In fact, since "roam" would be one of the
lowest priority behaviors, almost anything else would preempt it such
as "Pi, please find the TV remote". Fortunately, ROS has just the
right mechanisms for implementing this executive controller; namely the SMACH and Executive Teer packages that we heard about in Section 6. So we shouldn't have too much work to do once we get to this point.
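A minimal subsumption-style arbiter can be sketched as a priority-ordered list of trigger conditions, using the behaviors named above; the sensor-state keys are invented for illustration:

```python
# Sketch of a subsumption-style arbiter: behaviors are checked in
# priority order and the first whose trigger condition holds wins.
# Behavior names follow the examples in the text; the sensor-state
# dictionary keys are invented.

def arbitrate(state):
    """Return the active behavior for the current sensor state."""
    behaviors = [                                             # highest priority first
        ("escape", lambda s: s.get("stuck", False)),
        ("recharge", lambda s: s.get("battery", 1.0) < 0.2),
        ("obey_command", lambda s: s.get("command") is not None),
        ("roam", lambda s: True),                             # default behavior
    ]
    for name, trigger in behaviors:
        if trigger(state):
            return name
```

With nothing better to do, the robot roams; a low battery preempts roaming, and being stuck preempts everything.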