Univerza v Ljubljani
Fakulteta za računalništvo in informatiko
Barry Ridge
Učenje osnovnih funkcionalnih lastnosti
predmetov v robotskem sistemu
DOKTORSKA DISERTACIJA
Mentor: prof. dr. Aleš Leonardis
Somentor: doc. dr. Danijel Skočaj
Ljubljana, 2014
University of Ljubljana
Faculty of Computer and Information Science
Barry Ridge
Learning Basic Object Affordances in a
Robotic System
DOCTORAL DISSERTATION
Supervisor: prof. dr. Aleš Leonardis
Co-supervisor: assist. prof. dr. Danijel Skočaj
Ljubljana, 2014
Dedicated to those who come after me.
Acknowledgements
First and foremost I would like to thank my supervisor prof. dr. Aleš Leonardis
and co-supervisor assist. prof. dr. Danijel Skočaj for giving me the opportunity
to do this research, for making it possible to come and live in the beautiful
country that is Slovenia, and for supporting me time and again when the going
got tough. I am eternally grateful. I would like to thank Radu Horaud, who
headed the VISIONTRAIN project which provided my original fellowship, for
the opportunity to be part of an exceptionally stimulating early research training
programme. I would also like to thank Jeremy Wyatt for having given me the
opportunity to work on the CogX project, a fantastic example of what great
people can do when they put their minds together, even while spread over a
continent. A big thank you must go to Aleš Ude for offering me my current job
at the Jožef Stefan Institute which has allowed me to both finish this Ph.D. and
continue my research.
I would like to thank all of my former colleagues at the ViCoS lab in Ljubljana.
Matej Kristan helped me more times than I can ever remember, often seeming
like an oracle on all matters machine learning and otherwise. Roland Perko, the
master of low-level vision, showed me plenty of tricks and above all, taught me
how to be easy-going. A big thank you to Dušan Omerčević for kick-starting my
YouTube career and all the positive energy and great discussions. Luka Fürst got
me started with Matlab and reminded me that Vim is the one true text editor.
I must thank Matej Artač for providing much of the software and groundwork
that got me started working with stereo cameras and 3-D point clouds. Thank
you to Aleš Štimec for pointing me in the direction of cross-modal learning, for
all of the great help with various pieces of hardware and software, and for many
stimulating discussions. I would like to thank Luka Čehovin, without whom
I probably would not have had a webpage, SVN access, multi-core computing
power, and who knows what else. Thank you to Ondrej Drbohlav for helping me
figure out how to get curvature features from surface fitting. Special thanks must
also go to the ViCoS coffee club for all the vibrant discussions lubricated by one
of life’s finer, darker little luxuries.
I would like to thank my current colleagues at the Humanoid and Cognitive
Robotics Lab, and more generally, the Department for Automation, Biocybernet-
ics and Robotics, at the Jožef Stefan Institute, who have bid me a warm welcome
into their midst and continue to provide both buoyant camaraderie and a vibrant
research environment. Thank you to Bojan Nemec, Anton Ružić, Andrej Gams,
Jan Babič, Igor Mekjavić, Leon Žlajpah and Igor Kovač for overseeing such a
fantastic department. Thank you to my lab-mates Miha Deniša, Robert Bevec,
Tadej Petrič, Rok Vuga, Aljaž Kramberger, Luka Peternel, Denis Forte, Nejc
Likar and Jernej Čamernik, for their help, support, laughs, lunchtime conversa-
tions, and for being such great colleagues. And thank you to Adam McDonnell
for being the very embodiment of home away from home. And finally, thank you
to Tanja Dragojevič and Marija Kavčič for keeping the whole show running so
well.
A very special thank you to my friend Andrej Schulz for the cartoon illustra-
tion in the introduction. Sometimes pictures really can paint a thousand words.
Or perhaps a little more in this case.
I would like to thank all of the great friends I have made over the years during
my time in Ljubljana who have made my passage through here all the lighter.
Many have come and gone during that time, and whether the rest of us stay on
or move on, I’m quite sure the memories will burn on.
I would like to thank my doctors, Ines Glavan Lenassi dr. med. and Draženka
Miličević dr. med., without whose kind and professional care at crucial junctures,
I would certainly not have been capable of completing this work.
Last, but of course, most importantly, I would like to thank my wonderful
family without whose love and support I would never have come this far.
Barry Ridge
Ljubljana
October 2014
Abstract
One of the fundamental enabling mechanisms of human and animal intelligence,
and equally, one of the great challenges of modern-day autonomous robotics,
is the ability to perceive and exploit environmental affordances. To recognise how
you can interact with objects in the world, that is, to recognise what they afford
you, is to speak the language of cause and effect, and as with most languages,
practice is one of the most important paths to understanding. This is clear from
early childhood development. Through countless hours of motor babbling, chil-
dren gain a wealth of experience from basic interactions with the world around
them, and from there they are able to learn basic affordances and gradually more
complex ones. Implementing such affordance learning capabilities in a robot,
however, is no trivial matter. This is an inherently multi-disciplinary challenge,
drawing on such fields as autonomous robotics, computer vision, machine learn-
ing, artificial intelligence, psychology, neuroscience, and others.
In this thesis, we attempt to study the problem of affordance learning by
embracing its multi-disciplinary nature. We use a real robotic system to perform
experiments using household objects. Camera systems record images and video
of these interactions from which computer vision algorithms extract interesting
features. These features are used as data for a machine learning algorithm that
was inspired in part by ideas from psychology and neuroscience. The learning
algorithm is perhaps the main focal point of the work presented here. It is a
self-supervised multi-view online learner that dynamically forms categories in
one data view, or sensory modality, that are used to drive supervised learning in
another. While useful in and of itself, the self-supervised learner can potentially
benefit from certain augmentations, particularly over shorter training periods.
To this end, we also propose two novel feature relevance determination methods
that can be applied to the self-supervised learner.
With regard to robotic experiments, we make use of two different robotic
setups, each of which involves a robot arm operating in an experimental envi-
ronment with a flat table surface, with camera systems pointing at the scene.
Objects placed in the environment can be manipulated, generally pushed, by the
arm, and the camera systems can record image and video data of the interaction.
One of the setups includes a stereo camera and the other an RGB-D sensor,
thus allowing for the extraction of range data
and 3-D point cloud data. In the thesis, we describe computer vision algorithms
for extracting both salient object features from the static images and point cloud
data, and effect features from the video data of the object in motion.
A series of experiments is described that evaluates the proposed feature
relevance algorithms, the self-supervised multi-view learning algorithm, and the
application of these to real-world object push affordance learning problems using
the robotic setups. Some surprising results emerge from these experiments.
Beyond those, under the conditions we present, our framework is shown to be
able to autonomously discover object affordance categories in data, to predict
the affordance categories of novel objects, and to determine the most relevant
object properties for discriminating between those categories.
Key words:
affordances; affordance learning; self-supervised learning; multi-view learning;
cross-modal learning; multi-modal learning; feature relevance determination;
online learning; cognitive robotics; developmental robotics