Cooperative Human Robot Interaction Systems: IV. Communication of Shared Plans with Naïve Humans using Gaze and Speech

2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, Tokyo, Japan

Cooperative Human Robot Interaction Systems: IV. Communication of Shared Plans with Naïve Humans using Gaze and Speech

Stéphane Lallée, Katharina Hamann, Jasmin Steinwender, Felix Warneken, Uriel Martienz, Hector Barron-Gonzales, Ugo Pattacini, Ilaria Gori, Maxime Petit, Giorgio Metta, Paul Verschure, Peter Ford Dominey

Abstract—Cooperation is at the core of human social life. In this context, two major challenges face research on human-robot interaction: the first is to understand the underlying structure of cooperation, and the second is to build, based on this understanding, artificial agents that can successfully and safely interact with humans. Here we take a psychologically grounded and human-centered approach that addresses these two challenges. We test the hypothesis that optimal cooperation between a naïve human and a robot requires that the robot can acquire and execute a joint plan, and that it communicates this joint plan through ecologically valid modalities including spoken language, gesture and gaze. We developed a cognitive system that comprises the human-like control of social actions, the ability to acquire and express shared plans, and a spoken language stage. In order to test the psychological validity of our approach, we tested 12 naïve subjects in a cooperative task with the robot. We experimentally manipulated the presence of a joint plan (vs. a solo plan), the use of task-oriented gaze and gestures, and the use of language accompanying the unfolding plan. The quality of cooperation was analyzed in terms of proper turn taking, collisions and cognitive errors. Results showed that while successful turn taking could take place in the absence of the explicit use of a joint plan, its presence yielded significantly greater success. One advantage of the solo plan was that the robot would always be ready to generate actions, and could thus adapt if the human intervened at the wrong time, whereas in the joint plan the robot expected the human to take his/her turn. Interestingly, when the robot represented the action as involving a joint plan, gaze provided a highly potent nonverbal cue that facilitated successful collaboration and reduced errors in the absence of verbal communication. These results support the cooperative stance in human social cognition, and suggest that cooperative robots should employ joint plans and fully communicate them in order to sustain effective collaboration, while being ready to adapt if the human makes a midstream mistake.

Index Terms—cooperation, joint plan, shared intention, gaze, spoken language, HRI, cognitive architecture.

Stéphane Lallée and Paul Verschure are with SPECS, UPF & ICREA, Barcelona, Spain. Katharina Hamann and Jasmin Steinwender are with the Max Planck Institute EVA, Leipzig. Felix Warneken is with the Dept. of Psychology, Harvard University. Uriel Martienz and Hector Barron-Gonzales are with the University of Sheffield. Ugo Pattacini, Ilaria Gori and Giorgio Metta are with the Italian Institute of Technology. Maxime Petit and Peter Ford Dominey are with the Robot Cognition Laboratory, INSERM, Lyon.

I. INTRODUCTION

A key challenge of robotics is to endow robots with the capability to collaborate closely with humans. This requires systems that can directly interact with humans while adapting to novel exigencies in the environment and responding to the inherently complex actions that humans perform.
Despite these complexities, one biological system masters these challenges with apparent ease: human children. From early on in their lives, young children are able to socially interact with others in a cooperative fashion, demonstrating successful cooperation in fairly complex and sometimes novel situations, often without much learning and before they have developed a proper command of language or abstract thought [1, 2]. Research in human cognitive development has investigated the cognitive foundations of this capability to cooperate. Two aspects of human social cognition that stand out in this capability are (1) the capability to understand and represent others as intentional agents, and (2) the capability and motivation to share intentions. Together, these capabilities provide the basis for dialogic interactions centered on shared intentions, which lead to the construction of joint plans [3-5]. Joint plans correspond to representations created and negotiated by two agents that allow them to act together in a coordinated way to achieve their shared goal.

Because of the presumed crucial role of joint plans in cooperative behavior, we have focused on the implementation of joint plan learning and use in the context of cooperative human-robot interaction [6-8]. Our previous research demonstrated that a robot equipped with the ability to learn and use joint plans could successfully learn new cooperative tasks and use the learned joint plan to perform the shared task with novel objects. In the current research, we extend this work on cooperation and evaluate the psychological plausibility and efficiency of a human-like dyadic interaction based on joint plans expressed through gesture, gaze and speech. We test the hypothesis that optimal cooperation between a naïve human and robot requires that the robot has a joint plan, and that it communicates this joint plan through all available modalities, including spoken language and gaze.

Dyadic social interaction is a central feature of human behavior that entails regular patterns in which each interactant's actions influence the other's behavior [9]. In this area a number of features of inter-human social behavior stand out. Timing, turn-taking, and synchronization dynamics in dyadic interaction have long been recognized as fundamental for communication [10]. Developmental psychologists have proposed that an important aspect of satisfactory positive interaction (for instance during social play) is reciprocal involvement, expressed by the level of mutual responsiveness observed in conversational turn-taking [11]. Recently, several authors have highlighted the impact of these interaction dynamics, such as timing and facial/gestural expressiveness, on human-robot interaction [12-14]. To investigate such effects, Staudte & Crocker [12] exposed subjects to videos of a robot gazing at different objects in a linear array tangent to the line of sight, paired with sentences referring to these objects that were either congruent or incongruent with the videos. Subjects had to respond whether the sentence accurately described the scene. The principal finding of the study was an effect of robot gaze on human performance, with the most rapid performance for congruent gaze, the poorest performance for incongruent gaze, and intermediate performance when the robot produced no gaze. Extending such studies into the domain of actual physical HRI, Huang et al.
demonstrated that when task-related gaze cues anticipate the linguistic references in verbal communication, recall and response times are significantly improved relative to conditions where gaze is delayed or inconsistent [13]. Similarly, Boucher et al. demonstrated that when robot gaze is directed to a target object prior to the completion of the verbal specification of that object, subjects can anticipate the spoken specification and begin to manipulate that object with significantly reduced (even negative, i.e. anticipatory) reaction times, compared to conditions where gaze is masked or eliminated [14]. These studies indicate the crucial role of gaze in coordinating joint action.

In the behavioral sciences, in the context of joint action, it is often difficult to determine whether an activity performed by multiple agents should be conceived of as a joint collaborative activity based upon joint intentions, or merely as the common outcome of individual intentions. For example, each individual agent might be acting on an individual intention towards an individual goal, and even though the outcome emerges from the combined efforts of the agents, they are not necessarily acting jointly. In other words, what qualifies as a plural activity [15] does not necessarily qualify as a joint collaborative activity. In order to test whether coordinated joint activity can emerge in the absence of joint plans, one could experimentally manipulate the presence or absence of joint plans in experimenters who would interact with naïve subjects. However, while manipulating a human experimenter to use, or not use, a joint plan is methodologically difficult if not impossible, it is technically feasible with a robot.

The motivation for the current research is thus twofold. The principal motivation is to manipulate the presence vs. absence of a joint plan within the robot cognitive system, in order to determine whether joint task outcomes with naïve subjects improve in the presence of a joint plan vs. parallel individual plans. The second motivation, within this context, is to determine the effects of spoken language and task-relevant gaze cues on the successful communication and achievement of joint action.

Fig. 1. Cooperative interaction task, in the condition with a joint plan communicated by gaze alone, without speech. A. The yellow box initially covers the blue toy. The human observes as the robot moves towards the box to uncover the toy. B. The robot gazes at the target object for the human to grasp. C. The robot looks at the human to indicate that the human should act. D. The human responds to the gaze cue and initiates the action to move the toy to the indicated location.

We will report on experiments in which twelve naïve subjects interact in a cooperative task with the iCub humanoid robot [16] under four experimental conditions: full cooperation, cooperation with no speech, cooperation with no gaze, and finally the solo condition in which the robot does not use a joint plan. We measure the effects of these manipulations on several specific measures of cooperation performance. The human and robot are to work together to achieve the goal of uncovering a toy that is covered by a box, so that the exposed toy can then be retrieved, as illustrated in Figure 1.

II. COOPERATIVE ROBOT SYSTEM METHODS

The experiments were performed with the iCub robot [16] and the ReacTable, an instrumented table that can detect the identity and location of objects placed upon it, illustrated in Figure 1.
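As a rough illustration of the kind of information the table provides to the rest of the system, the sketch below maintains a minimal world-state store from table detections. This is a hedged sketch only: the poll_markers function, the object labels and the coordinates are hypothetical placeholders and do not correspond to the actual tracker interface used by the authors.

    # Hedged sketch: maintaining a simple world state from table detections.
    # poll_markers() is a hypothetical stand-in for the real tracker interface.
    import time
    from dataclasses import dataclass

    @dataclass
    class TableObject:
        name: str         # e.g. "box" or "toy" (example labels, not from the paper)
        x_mm: float       # position on the table surface, in millimetres
        y_mm: float
        last_seen: float  # timestamp of the most recent detection

    class WorldState:
        """Minimal object store, loosely in the spirit of an object-properties collector."""
        def __init__(self):
            self.objects = {}

        def update(self, detections):
            now = time.time()
            for name, (x, y) in detections.items():
                self.objects[name] = TableObject(name, x, y, now)

        def location_of(self, name):
            obj = self.objects.get(name)
            return (obj.x_mm, obj.y_mm) if obj else None

    def poll_markers():
        # Hypothetical: would query the table's marker tracker; fixed values here.
        # The box and toy share a position because the box initially covers the toy.
        return {"box": (120.0, 340.0), "toy": (120.0, 340.0)}

    world = WorldState()
    world.update(poll_markers())
    print(world.location_of("toy"))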
The iCub is controlled by a cognitive system [6-8] that provides for the creation and use of joint plans for the execution of sensory-guided actions in cooperation with the human subject. Our novel contribution with respect to cognitive system development is the introduction of coordinated gaze, speech, and shared plan learning and execution, illustrated in Figure 3 and developed in section B below.

A. Cognitive System for Cooperation

The cognitive system is based on the creation and use of joint plans. A joint plan is defined as a sequence of actions, with each action allocated to one of the agents, such that both agents represent this plan and use it to achieve their shared goal. A hallmark of the joint plan is that it allows for role reversal; that is, the cooperation partners can reverse their complementary roles, each taking on the previous role of the other [17].

Fig. 2. Control architecture for cooperative interaction. Spatial locations of objects on the ReacTable are communicated to the Object Properties Collector, which stores the state of the world. Spatial Reasoning detects object movement and spatial relations among objects, providing the required information for Action Recognition. Supervision and Planning monitors the interaction, focuses gaze attention on linguistic references, and manages joint/shared plan execution. Shared plans and action definitions are stored for long-term use in the Knowledge Base. Motor Command handles the manipulation of objects, with object predicates transformed to motor space using the Passive Motion Paradigm (PMP). See text for details.

The current research thus exploits a series of developments that have resulted in the ability of the robot to learn a joint plan, to use that joint plan in cooperation with a human, and to demonstrate role reversal [6-8]. Again, in role reversal, the cognitive agent is able to use the same joint plan but exchange roles with the other partner. This ability has been recognized by developmental psychologists as evidence that the agent has a bird's-eye view of the joint plan, rather than a purely ego-centric view [17]. The system is outlined in Figure 2.

Joint/shared plans are managed by the Supervision and Planning subsystem. Through spoken language interaction, new joint plans can be established through different combinations of spoken language specification, imitation or demonstration. Spoken language generated by the robot has a semantics that is defined in terms of the shared task [18]. Verbs refer to the actions of manipulating objects on the table, common nouns refer to objects, and pronouns to the robot and human. In the current experiment, a pre-learned joint plan for the shared task is employed. The goal of this plan is that the initially covered toy is uncovered and retrieved: the first agent uncovers the toy, the second agent can then retrieve the toy, and finally the first agent replaces the box on the table.

Perceptual information about the location of objects and the human partner is extracted from the ReacTable (see below) and stored in the Object Properties Collector (OPC). When the joint plan calls for the robot to manipulate an object, the spatial coordinates of that object in the task-space of the robot are used to generate the appropriate action (specified in more detail below).
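To make the notion of a joint plan concrete, the following sketch encodes the three-step shared plan described above as a sequence of agent-tagged actions and implements role reversal as a swap of the agent labels. This is an illustrative data structure only, not the representation used in the authors' cognitive system; the agent and action names are examples.

    # Hedged sketch of a joint plan as an agent-tagged action sequence.
    from dataclasses import dataclass

    @dataclass
    class PlanStep:
        agent: str    # "robot" or "human"
        action: str   # verb, e.g. "uncover", "retrieve", "replace"
        obj: str      # object the action applies to

    # Pre-learned shared plan from the paper: the first agent uncovers the toy,
    # the second agent retrieves it, the first agent replaces the box on the table.
    joint_plan = [
        PlanStep(agent="robot", action="uncover",  obj="toy"),
        PlanStep(agent="human", action="retrieve", obj="toy"),
        PlanStep(agent="robot", action="replace",  obj="box"),
    ]

    def reverse_roles(plan):
        """Role reversal: each partner takes on the other's previous role."""
        swap = {"robot": "human", "human": "robot"}
        return [PlanStep(swap[s.agent], s.action, s.obj) for s in plan]

    def next_step(plan, completed):
        """Return the next pending step, or None once the shared goal is reached."""
        return plan[completed] if completed < len(plan) else None

    reversed_plan = reverse_roles(joint_plan)
    print(next_step(reversed_plan, 0))   # now the human uncovers the toy first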
In order to coordinate the unfolding joint action, the robot must perceive when the human has performed his or her actions. A simple spatial reasoning engine detects spatial relations of proximity and changes in position based on the OPC, so that Action Recognition can detect actions such as put(object, location) [6].

B. Coordinated Speech and Gaze Control

In order to coordinate speech and gaze, the Attention Focus module of Supervision and Planning handles the translation of actions in the joint plan into utterances and determines the linguistic references. When the linguistic referent in the utterance is identified, the gaze is directed to it, and the word is sent to the speech synthesizer. This results in continuous speech with coordinated gaze: gaze attains the target several hundred milliseconds before the linguistic reference is pronounced. At each step in the unfolding of a plan (joint or solo), the robot retrieves the current action and identifies the agent, the object, and the final location where that object should be placed. In conditions where gaze is active, prior to the execution of the next action the robot directs coordinated gaze movements to the agent (if the agent is the human), then the object, and finally the desired location where that object should be placed. In conditions where language communication is active, the gaze is coordinated with the timing of the spoken language. That is, if the robot says "You put the box on the left", as each word "you", "box" and "left" is spoken, the gaze is directed to the corresponding referent. The location of the human subject is pre-specified based on the experimental setup, as illustrated in Figure 1. The temporal structure of speech-gaze coordination is illustrated in Figure 3.

Fig. 3. Speech and gaze coordination. For all pertinent linguistic references (agent, object, recipient location), gaze is directed to the referent object, location, or human just prior to pronunciation of the linguistic referent. The period where gaze precedes linguistic referent termination is indicated by thick grey lines in the Gaze Channel. The final imperative gaze to the user is indicated by the unfilled line on "Gaze towards listener". Figure format modified from [13].

When present, gaze thus provides an additional communication channel for joint plan management. To achieve this coordination, the iCub eyes can move independently in horizontal and vertical head-centered orbits, and the head has three additional degrees of freedom. Coordinated eye-head gaze can be directed with an inverse kinematics engine that brings the eyes to a target in the three-dimensional task space surrounding the iCub. These movements are composed of an initial oculomotor saccade followed by a slower head movement; the robot's eye-movement and head-movement completion times are approximately 100 ms and 600 ms, respectively. The eyes thus attain the target first, with the head continuing to move toward the target and the eyes compensating for this continued head movement in order to stay fixated on the target. The generation of these human-like movements, as studied in human gaze behavior, is achieved in the robot with the iCub gaze controller. The controller acts intrinsically in Cartesian space, taking as input the spatial location of the object of interest toward which to direct the robot's attention, and then generates proper minimum-jerk velocity commands simultaneously to the neck and the eyes.
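The speech-gaze timing just described can be illustrated with a simple word-by-word scheduler that issues a gaze command to each referent shortly before the corresponding word is sent to the synthesizer. This is a hedged sketch under stated assumptions: the gaze_at and say functions, the 400 ms lead time and the per-word duration are illustrative placeholders, not the iCub interfaces or the measured timings.

    # Hedged sketch of speech-gaze coordination: gaze reaches each referent
    # shortly before the corresponding word is spoken.
    import time

    GAZE_LEAD_S = 0.4  # assumed lead; the paper reports "several hundred milliseconds"

    def gaze_at(target):
        # Placeholder for the robot's gaze controller (eye saccade ~100 ms, head ~600 ms).
        print(f"[gaze] -> {target}")

    def say(word):
        # Placeholder for the speech synthesizer.
        print(f"[speech] {word}")

    def speak_with_gaze(utterance, referents, word_duration_s=0.3):
        """Speak word by word; when a word is a linguistic referent (agent, object,
        location), direct gaze to its target slightly before the word is produced."""
        for word in utterance.split():
            target = referents.get(word.lower())
            if target is not None:
                gaze_at(target)
                time.sleep(GAZE_LEAD_S)      # let gaze arrive before the word
            say(word)
            time.sleep(word_duration_s)

    # Example utterance from the paper: "You put the box on the left"
    speak_with_gaze(
        "You put the box on the left",
        referents={"you": "human partner", "box": "box position", "left": "left location"},
    )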
C. Spatial Localization and Accurate Object Manipulation

Objects are manipulated by the human and robot on an instrumented table (ReacTable), as illustrated in Figure 1. Fiducial markers on the base of the objects are detected by an IR camera beneath the ReacTable surface, providing millimeter accuracy for object localization. The 2D surface of the table is calibrated into the Cartesian space of the iCub by a linear transformation computed from three calibration points on the table surface that are pointed to by the iCub. Thus, three points are physically identified both in the Cartesian space of the iCub and on the surface of the ReacTable, providing the basis for the calculation of a transformation matrix that projects object coordinates from the space of the table into the Cartesian space of the iCub (a minimal numerical sketch of this calibration is given at the end of this excerpt). These coordinates can then be used as spatial arguments to the motor control system of the iCub.

Motor control is provided by PMP. The Passive Motion Paradigm (PMP) [19] is based on the idea of employing virtual force fields in order to perform reaching tasks while avoiding obstacles, taking inspiration from theories conceived by Khatib during the 1980s [20]. In the current system this is implemented with a tool that relies on a powerful and fast nonlinear optimizer, namely Ipopt [21]; the latter solves the inverse problem while dealing with constraints that can be effectively expressed both in the robot's configuration space (e.g., joint limits) and in its task space. This new tool [22] represents the backbone of the Cartesian Interface, the software component that allows controlling the iCub directly in the operational space, preventing the robot from getting stuck in kinematic singularities and providing trajectories that are much smoother than the profiles yielded by the first implementation of PMP.

III. HUMAN EXPERIMENTAL METHODS

A total of N = 12 naïve university subjects were tested in each of four conditions (specified in Table 1) in a human-robot cooperation task, illustrated in Figure 1. The task was based on experimental paradigms used with human infants [4]. Specifically, the goal of the shared task was to retrieve a small object, the toy, that was covered by a larger object, the box. One participant would lift the box, allowing the other to take the toy, and finally the first participant would replace the box on the table. Thus the joint plan requires three successive movements, allocated as stated to the two participants. Prior to the start of the experiments, subjects were informed of the structure of the task and were then shown an example of how the shared task unfolded, with one of the experimenters interacting with the robot. Subjects were simply instructed to attempt to achieve the joint goal of retrieving the toy.
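As referenced in Section II.C above, the table-to-robot calibration can be illustrated as follows: with three table points paired with the corresponding points in the robot's Cartesian frame, a 2D-to-3D affine transformation is exactly determined. The sketch below solves for such a transform; the calibration coordinates are invented for illustration and are not the values used in the experiments.

    # Hedged sketch: recover a 2D table -> 3D robot Cartesian affine transform
    # from three point correspondences (all coordinates below are made up).
    import numpy as np

    # Three calibration points on the table surface (x, y), in metres.
    table_pts = np.array([
        [0.00, 0.00],
        [0.30, 0.00],
        [0.00, 0.20],
    ])

    # The same three points as pointed to by the robot, in its Cartesian frame (x, y, z).
    robot_pts = np.array([
        [-0.30, -0.10, 0.05],
        [-0.30,  0.20, 0.05],
        [-0.10, -0.10, 0.05],
    ])

    # Homogeneous table coordinates: each row is [x, y, 1].
    P = np.hstack([table_pts, np.ones((3, 1))])

    # Solve P @ M = robot_pts for the 3x3 affine matrix M (exactly determined by 3 points).
    M = np.linalg.solve(P, robot_pts)

    def table_to_robot(x, y):
        """Project a table coordinate into the robot's Cartesian space."""
        return np.array([x, y, 1.0]) @ M

    print(table_to_robot(0.15, 0.10))  # e.g. a detected object position on the table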