Trends and challenges in robot manipulation


Science  21 Jun 2019:
Vol. 364, Issue 6446, eaat8414
DOI: 10.1126/science.aat8414

Hand it to you

Our ability to grab, hold, and manipulate objects involves our dexterous hands, our sense of touch, and feedback from our eyes and muscles that allows us to maintain a controlled grip. Billard and Kragic review the progress made in robotics to emulate these functions. Systems have developed from simple, pinching grippers operating in a fully defined environment, to robots that can identify, select, and manipulate objects from a random collection. Further developments are emerging from advances in computer vision, computer processing capabilities, and tactile materials that give feedback to the robot.

Science, this issue p. eaat8414

Structured Abstract


Humans have a fantastic ability to manipulate objects of various shapes, sizes, and materials and can control the objects’ position in confined spaces with the advanced dexterity capabilities of our hands. Building machines inspired by human hands, with the functionality to autonomously pick up and manipulate objects, has always been an essential component of robotics. The first robot manipulators date back to the 1960s and are some of the first robotic devices ever constructed. In these early days, robotic manipulation consisted of carefully prescribed movement sequences that a robot would execute with no ability to adapt to a changing environment. As time passed, robots gradually gained the ability to automatically generate movement sequences, drawing on artificial intelligence and automated reasoning. Robots would stack boxes according to size, weight, and so forth, extending beyond geometric reasoning. This task also required robots to handle errors and uncertainty in sensing at run time, given that the slightest imprecision in the position and orientation of stacked boxes might cause the entire tower to topple. Methods from control theory also became instrumental for enabling robots to comply with the environment’s natural uncertainty by empowering them to adapt exerted forces upon contact. The ability to stably vary forces upon contact expanded robots’ manipulation repertoire to more-complex tasks, such as inserting pegs in holes or hammering. However, none of these actions truly demonstrated fine or in-hand manipulation capabilities, and they were commonly performed using simple two-fingered grippers. To enable multipurpose fine manipulation, roboticists focused their efforts on designing humanlike hands capable of using tools. Wielding a tool in-hand became a problem of its own, and a variety of advanced algorithms were developed to facilitate stable holding of objects and provide optimality guarantees. 
Because optimality was difficult to achieve in a stochastic environment, from the 1990s onward researchers aimed to increase the robustness of object manipulation at all levels. These efforts initiated the design of sensors and hardware for improved control of hand–object contacts. Studies that followed were focused on robust perception for coping with object occlusion and noisy measurements, as well as on adaptive control approaches to infer an object’s physical properties, so as to handle objects whose properties are unknown or change as a result of manipulation.


Roboticists are still working to develop robots capable of sorting and packaging objects, chopping vegetables, and folding clothes in unstructured and dynamic environments. Robots used for modern manufacturing have accomplished some of these tasks in structured settings that still require fences between the robots and human operators to ensure safety. Ideally, robots should be able to work side by side with humans, offering their strength to carry heavy loads while presenting no danger. Over the past decade, robots have gained new levels of dexterity. This enhancement is due to breakthroughs in mechanics with sensors for perceiving touch along a robot’s body and new mechanics for soft actuation to offer natural compliance. Most notably, this development leverages the immense progress in machine learning to encapsulate models of uncertainty and support further advances in adaptive and robust control. Learning to manipulate in real-world settings is costly in terms of both time and hardware. To further elaborate on data-driven methods but avoid generating examples with real, physical systems, many researchers use simulation environments. Still, grasping and dexterous manipulation require a level of reality that existing simulators are not yet able to deliver—for example, in the case of modeling contacts for soft and deformable objects. Two roads are hence pursued: The first draws inspiration from the way humans acquire interaction skills and prompts robots to learn skills from observing humans performing complex manipulation. This allows robots to acquire manipulation capabilities in only a few trials. However, generalizing the acquired knowledge to apply to actions that differ from those previously demonstrated remains difficult. The second road constructs databases of real object manipulation, with the goal to better inform the simulators and generate examples that are as realistic as possible. 
Yet achieving realistic simulation of friction, material deformation, and other physical properties may not be possible anytime soon, and real experimental evaluation will be unavoidable for learning to manipulate highly deformable objects.


Despite many years of software and hardware development, achieving dexterous manipulation capabilities in robots remains an open problem—albeit an interesting one, given that it necessitates improved understanding of human grasping and manipulation techniques. We build robots to automate tasks but also to provide tools for humans to easily perform repetitive and dangerous tasks while avoiding harm. Achieving robust and flexible collaboration between humans and robots is hence the next major challenge. Fences that currently separate humans from robots will gradually disappear, and robots will start manipulating objects jointly with humans. To achieve this objective, robots must become smooth and trustable partners that interpret humans’ intentions and respond accordingly. Furthermore, robots must acquire a better understanding of how humans interact and must attain real-time adaptation capabilities. There is also a need to develop robots that are safe by design, with an emphasis on soft and lightweight structures as well as control and planning methodologies based on multisensory feedback.

Holding two objects in one hand requires dexterity.

Whereas a human can grab multiple objects at the same time (top), a robot (bottom) cannot yet achieve such dexterity. In this example, a human has placed the objects in the robot’s hand.



Dexterous manipulation is one of the primary goals in robotics. Robots with this capability could sort and package objects, chop vegetables, and fold clothes. As robots come to work side by side with humans, they must also become human-aware. Over the past decade, research has made strides toward these goals. Progress has come from advances in visual and haptic perception and in mechanics in the form of soft actuators that offer a natural compliance. Most notably, immense progress in machine learning has been leveraged to encapsulate models of uncertainty and to support improvements in adaptive and robust control. Open questions remain in terms of how to enable robots to deal with the most unpredictable agent of all, the human.

Have you ever found yourself busy foraging in your bag in search of a set of keys? If so, you may recall that it took only a few seconds to find them among the disparate contents of the bag. For certain, you did not reflect on your abilities and may have carried on this display of unique dexterity through a swift in-hand manipulation, taking out the correct key and inserting it into the lock even though the corridor lights had gone out. All day long, our fingers grasp, move, and transform objects and interact with objects in various media such as air, water, and oil. We do not spend time thinking about what our hands and fingers are doing or how the continuous integration of various sensory modalities (such as vision, touch, proprioception, and hearing) helps us outperform any other biological system in the breadth of the interaction tasks we can execute. Largely overlooked, and perhaps most fascinating, is the ease with which we perform these interactions, resulting in a belief that they are also easy to accomplish in artificial systems such as robots.

Manipulating objects is such a ubiquitous activity that we forget how difficult it was to acquire this competence as a child. Children are born with simple grasp reflexes. It takes them 3 years to develop an individuated control of each finger and another 6 years to display an adult-equivalent ability for making smooth contact and for planning sequences of manipulation skills (1). Even for humans, some dexterous activities may pose a challenge. For example, tying shoes may be done in various ways, and there may be several valid models of how to execute such an activity. In addition, we can visually demonstrate how to do something and what the expected result may be, but we cannot easily communicate the magnitude of the applied forces and torques or the size of the friction coefficient necessary to satisfy stability conditions. Still, we find ways of achieving manipulation goals through training and exploration even if the end result is not always optimally performed. We may also adapt as circumstances dictate (e.g., tying shoes with an excess of free shoelace or when the ends are quite short), forcing us to deviate from our normal methods. Thus, the context in which interactions are performed affects various parameters of the execution.

Although robotics has made vast progress in mechanical design, perception, and robust control targeted to grasping and handling objects, robotic manipulation is still a poor proxy for human dexterity. To date, no robots can easily hand-wash dishes, button a shirt, or peel a potato.

What can robots do today?

Robots are skilled at picking up and manipulating objects in repetitive and familiar settings such as industrial assembly setups. In such settings, the geometry, material properties, and weight of the objects are commonly known. Robots can handle some variation in routine movements in terms of adapting to small differences in the object properties, but the whole process is typically optimized to a limited set of expected variations. In early factory settings, robot arms followed predetermined trajectories and assumed that objects would always appear at the same place. Today, robots can adapt their trajectory to retrieve objects at different locations, making it possible for objects to be placed by humans or simply dropped on a conveyer belt instead of being deposited at exact positions by other machines. The classical assembly lines in which robots were bolted into the floor and placed one after another, typical for the automobile industry, can now be made more flexible. Objects moving on conveyers can be detected fairly easily by cameras and picked up if fully visible. However, detection of transparent objects or objects partially hidden (e.g., when stacked on top of one another) remains difficult.

With the need to frequently change the type of goods produced, the robotics industry strives for multipurpose object grasping and handling solutions. One step toward this objective is to provide robots with a choice of grippers varying in size and strength and to enable robots with tool-changing mechanisms so that they can select the correct tool. To determine which tool to use for a given task, a robot must have knowledge of an object’s properties, such as shape, weight, material, and so forth. This information is readily available in factories where all objects are known. However, this requirement presents a limitation for robots in other settings, where the set of objects to be manipulated may not be known beforehand.

What can robots not do today?

Although robots are adept at handling rigid objects, they still struggle with flexible materials—such as fruits and vegetables or clothing items—that differ in size, weight, and surface properties. Manipulations that produce a deformation (e.g., inserting, cutting, or bending) are particularly difficult, as accurate models of the deformations are needed. Industrial grippers often use pneumatic vacuum pumps to pick up objects by sucking. This technique is unbeatable when it comes to grasping an object but is much less useful for object manipulation (e.g., reorienting the object and placing it in a confined space). One step to address this challenge is to provide robots with more dexterous hands. Yet creating hands as dexterous as human hands is difficult, owing to a lack of sensors and actuators equivalent in size, precision, and efficiency to our skin and muscles.

Improvements in robots’ dexterity are not limited to the engineering of more-capable hands. Advanced software programs are required to analyze in real time the large flux of visual, tactile, and force information and to relate these different senses to recognize objects and model their transformations. Additionally, robots need advanced cognitive capabilities to predict where, how, and why to manipulate objects. The rest of this Review describes why overcoming these challenges is difficult and where the field of robotics stands today.

Why is designing robotic hands difficult?

Although research on robot hands has been ongoing for more than five decades (2–4), the most common hand used in many applications to date is still a parallel jaw gripper, usually without any extra sensing. Picking up objects with a gripper devoid of sensing is akin to grasping with the tip of your thumb and index finger when both are numb! This tool may suffice for simple pick-and-place actions, but not for more-complex motions such as shuffling keys. Because the human hand performs intricate movements with ease, it is a natural inspiration for robotics. But designing robotic hands with sensors and actuators similar to those of the human hand is difficult for many reasons.

When constructing anthropomorphic robot hands, it is challenging to fit all of the necessary actuators, sensors, and mechanical structure in the limited available space. Another obstacle is to keep the total weight of the hand low so that it satisfies the payload requirements of the arm to which it is attached. Hence, compared with human hands, most anthropomorphic hands and prostheses do not have nearly as many controllable degrees of freedom (5, 6).

The human hand is soft and flexible, with a dexterous thumb whose distinctive range of motion remains difficult to replicate mechanically (7), as the intricate combinations of tendons and muscles differ markedly from traditional serial robotic joint design (8). Today, robotic hands are still largely composed of rigid plastic and metal components, with electric motors as actuators. This rigidity is partially the cause of the lack of dexterity, as it allows no room for mistakes when executing grasps. Rigid fingers closing on an object may easily move rather than grasp an object if its pose is not perfectly estimated, and applying too much force may crush the object. A growing trend in robotics is the development of soft hands that can conform to an object’s shape, absorb unexpected forces at contact, and compensate for load change during manipulation (9, 10).

Softness can be achieved through a change in hardware or software or a combination of both (Fig. 1). Softness from material used to construct hands builds on solutions from 3D manufacturing and materials science. For instance, one can manufacture rigid and flexible materials in a layer-by-layer manner to create foldable fingers that can deploy and retract as needed (11). Currently, the low payload and slow speed of these elastomers restrict manipulation to light objects only. As an alternative to generate more power, pneumatic or hydraulic actuation may be used (12, 13).

Fig. 1 Soft hands.

Robotic hands are traditionally made of hard materials with rigid control of fingers. Recent designs aim to mimic the human hand’s natural compliance by using soft actuators, soft materials, and advanced controllers. (A) Rigid material and actuators and (B) a rigid cover with partially soft cable-driven actuation: Both hands become soft through software intervention, modulating pressure at the fingertips via tactile feedback. [Reproduced from (17, 33)] (C) A soft, foldable gripper that can adapt shape and stiffness. [Reproduced from (11)] (D) Soft actuation and material for a rehabilitation glove that can be worn by a human. [Reproduced from (12)]

Human hands are covered with a multipurpose skin that provides the appropriate level of friction and damping. Human skin is a high-frequency and high-resolution sensor that provides precise information on normal and tangential forces, information that is critical for grip adjustment. Human skin can also measure stretch and temperature. By contrast, robot hands typically measure exerted forces through miniature force sensors placed solely at the fingertips (14). Force sensors yield very accurate 3D measurements, but they cannot easily reveal the exact location of contact. To move objects once held in the hand or to hold multiple objects at once (Fig. 1), one needs to measure precise contact points, not just at the fingertips but also along the length and side of the fingers and inside the palm. This can be achieved through artificial skins that provide contact measurements all along the limbs. Interest in artificial skins can be traced back to the 1980s (15, 16), but major advances were achieved in the past decade.

At present, we find a variety of affordable commercial products, several of which can be customized to a robot’s shape. Touch sensors measure the normal contact force; a few also provide data on tangential forces, torque, temperature, vibrations, or surface properties. Nevertheless, most touch sensors are rigid, and their placement is constrained to fingertips and along limb segments. Touch detection at the joint (knuckle, elbow, knee) is, however, crucial to detect entrapment. It is also useful to guide exploration inside objects (17). Such contact can be detected only by soft sensors that bend and extend along the flexion and extension points of the limb (Fig. 2, right) (18). Hence, flexible and stretchable skins are of utmost interest to roboticists (19). Prototypes exist in laboratories, and we can expect to see their deployment soon, given the current interest in soft electronics (20).

Fig. 2 Sense of touch for robots.

(Left) Vision can be used to infer contact forces (red). [Reproduced from (21)] (Right) Stretchable artificial skin measures contact at knuckles, which may be useful for exploring internal parts of objects. [Reproduced from (18)]

As an alternative to using skin, it is possible to deduce haptic (contact and force) information from vision. One can, for instance, infer forces from vision through a dynamic model of contacts (21) (Fig. 2, left) or use an optical sensor that renders deformation of an object’s geometry at a high spatial resolution (22). The necessity of estimating the exact position of an object, its local geometry, and other properties such as weight and weight distribution depends strongly on the application. It is the interplay among hand design, material, and internal and external sensing that offers the appropriate redundancy. One additional challenge is the need to measure contact at a very high frequency for accurate and timely detection of slip (23). Such high spatial and temporal resolution, together with the real-time processing of vision data for object tracking, leads to a computational overload with a massive data stream that must be interpreted in real time. This processing is usually carried out by a CPU (central processing unit) located away from the hand. Alternatively, processing may be performed on the hand itself through the use of a dedicated CPU (24), but such CPUs have focused solely on processing vision data. Additional studies are needed to develop hardware for processing tactile information in conjunction with vision.

Dexterous robot hands may hence be realized by using research in materials science for the design of soft actuation, enabling contact sensing along the entire surface of the hand, and by using advances in electronics for onboard, real-time processing of multisensory data.

Design beyond anthropomorphism

Although the human hand is fascinating, it does not have to be the ultimate solution for robotics. A human hand design may be desirable for aesthetic reasons; for instance, when designing hand prostheses or humanoid robots. But this same design may be superfluous for many robots. Industrial hands remain a good solution for specific tasks. Rather than try to replicate the positioning of human fingers, these hands have two or three fingers arranged symmetrically around the palm, a design particularly suited for industrial screwing.

Robotics keeps oscillating between anthropomorphic and traditional industrial designs for hands. But the gripping systems of simpler animals may also provide inspiration. For instance, fish suck in their prey. Adding suction at robots’ fingertips is useful under water, as this technology cancels the flow generated by the hand (25).

Why not create hands that both leverage and go beyond nature? For instance, the human thumb is amazing, but it creates an asymmetry that constrains the orientation of the hand for manipulation. Two thumbs on the same hand, however, would provide a dexterity beyond human capability (Fig. 3).

Fig. 3 Designing hands beyond human dexterity.

Two thumbs would make it possible to execute screwing and unscrewing motions with one hand rather than two. This capability may be useful for robots and humans via prostheses. [Illustrations: Laura Cohen]

Desiderata for the next generation of robotic hands

The objects around us have been built and adapted to our hands, which are still rather small and very robust in comparison with contemporary robot hands. Enabling robots to pick up small items such as pens, raisins, screws, and needles is a clear functionality goal. Today, robotic arms and hands are commonly developed separately, and integrating them is an engineering job of its own. Industrial arms have substantial payloads but are commonly designed to be bolted into the floor and are too large to be deployed outside industrial settings. The arms of humanoid robots and robots intended for fine assembly tasks have low payloads, which are typically not sufficient to carry a hand and an object held by the hand. Adding sensing functionality to arms and hands requires cabling that can quickly become complicated. Furthermore, many hands come with no or limited means of measuring contact and forces. Thus, a change in paradigm is needed to move away from developing robotic arms devoid of hands and hands devoid of arms. We must further ensure that hands are developed in a “plug-and-play” manner and can easily be attached and detached through existing tool-switching systems. State-of-the-art force and tactile sensors must become an inherent part of the arm–hand system.

Robot dexterity is as much a by-product of advances in hardware as it is of advances in software. It requires suitable algorithms to rapidly and efficiently process the vast amount of information collected through sensors and actuators. At the same time, it needs algorithms to adequately control the movement of a hand in relation to object, scene, and task properties. We next review advances in perception, control, and learning for manipulation.

Perception for manipulation

As for humans, robot perception for manipulation is multimodal (Fig. 4). Vision is instrumental for recognizing and localizing objects. When associated with a database of existing objects, robot vision can help infer geometric and physical properties of known and even unknown objects (26), and this information is important for shaping the aperture of the hand and the forces to be applied. Proprioception—namely, knowledge of where the robot’s limbs are located—is needed to guide the arm and hand toward the object, with visual support to continuously track the object. Touch and force measurements become important once contact has occurred and the object is held or explored by the hand. The associated control algorithms are used to guide the grasp and/or to infer the object’s physical properties, such as rigidity and mass distribution, that may have been poorly estimated or unknown previously. Sound has also received attention recently as a means to infer an invisible object’s content and to monitor changes in content during manipulation (27).

Fig. 4 Manipulation is multimodal.

Vision is used before contact, whereas haptics and sound are involved upon contact to estimate an object’s physical properties that cannot be directly observed. [Photo: Learning Algorithms and Systems Laboratory, EPFL]

As an example, a robot is tasked with fetching a package of milk from a refrigerator. Before the robot holds the package in hand, it may not know how much milk the package contains nor the package’s actual weight. Given that the package may be made of cardboard, the robot needs to know the weight in order to apply a suitable grasping force and avoid destroying the package. In the case of milk, sound may also provide information about the viscosity when the package is shaken, as milk will sound different from another substance, such as yogurt.
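As a rough illustration of why the weight matters, the sketch below estimates the normal force that a two-finger pinch would need to hold the package against gravity under a simple Coulomb friction model and compares it with a crush limit. The function name, the friction coefficient, the safety factor, and the crush limit are illustrative assumptions, not values from this Review.

```python
def required_grip_force(mass_kg, friction_coefficient, crush_limit_n,
                        safety_factor=1.5, gravity=9.81):
    """Normal force per finger (N) for a two-finger pinch, or None if unsafe.

    Assumes rigid point contacts on two vertical faces with Coulomb friction:
    the two friction forces (mu * N each) must together carry the weight m * g.
    """
    if mass_kg <= 0 or friction_coefficient <= 0 or safety_factor < 1:
        raise ValueError("mass and friction must be positive, safety factor >= 1")
    # Smallest normal force that prevents slip: 2 * mu * N >= m * g.
    minimum_force = mass_kg * gravity / (2.0 * friction_coefficient)
    force = safety_factor * minimum_force
    # Above the (hypothetical) crush limit the package would be damaged,
    # so no safe pinch force exists for this estimate of the weight.
    return force if force <= crush_limit_n else None


if __name__ == "__main__":
    for mass in (0.1, 1.0, 2.0):  # nearly empty, full, and overfull package
        print(mass, required_grip_force(mass, 0.5, crush_limit_n=20.0))
```

Under these assumptions, an error in the estimated mass translates directly into either slip (too little force) or damage (too much), which is why the weight has to be inferred once the package is in hand.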

Over the past few years, major efforts have been undertaken to analyze visual information, and progress has been considerable. Nevertheless, robots still struggle to recognize objects that are partially occluded (28), particularly when viewed from a moving camera or when an object moves in a robot’s hands (29). In comparison to developing vision algorithms, much less effort has been devoted to analyzing haptic information, given that solutions for covering entire hands with haptic sensors are still lacking. Today, visual and haptic information are still used primarily in a sequential manner [e.g., with visual information provided in the preparation phase and haptic data provided upon contact (30)], and only a few recent works integrate both modalities for recognition, grasping, in-hand adaptation, and shape reconstruction (31–35). In comparison, humans are proficient at alternating between different senses, from vision to touch and back, and can do so rapidly even if these senses change in processing frequency. By contrast, robots still lack the ability to decide what sensors to use, when to use them, and when to switch between sensors.

Grasping: A stepping stone

Before a robot can manipulate an object in hand, it must be able to close its fingers around the object. If grasping is conceptualized only as getting fingers around an object with no additional constraints considered, the challenge of grasping may appear to be solved. However, grasping an object is a far more daunting problem. For decades, researchers have worked to establish the theory of how to form a stable grasp. This became an intricate mathematical exercise aimed at determining the minimal number and optimal positions of the fingertips on the object’s surface to ensure stability (36).
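One classical result from this line of theory concerns two-finger grasps in the plane: with point contacts and Coulomb friction, a grasp is commonly taken to be force closure when the segment joining the two contacts lies strictly inside both friction cones. The sketch below checks that condition. The function name, the inputs, and the planar simplification are assumptions made for illustration and are not necessarily the formulation used in (36).

```python
import math


def is_antipodal_force_closure(p1, n1, p2, n2, friction_coefficient):
    """Planar two-contact force-closure test with Coulomb friction.

    p1, p2: contact points (x, y) on the object boundary.
    n1, n2: surface normals at the contacts, pointing into the object.
    """
    half_angle = math.atan(friction_coefficient)  # friction cone half-angle

    def angle_between(u, v):
        dot = u[0] * v[0] + u[1] * v[1]
        norm = math.hypot(*u) * math.hypot(*v)
        if norm == 0.0:
            raise ValueError("zero-length vector")
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, dot / norm)))

    line_12 = (p2[0] - p1[0], p2[1] - p1[1])
    line_21 = (-line_12[0], -line_12[1])
    # The connecting line must lie inside the cone at each contact.
    return (angle_between(line_12, n1) < half_angle
            and angle_between(line_21, n2) < half_angle)


if __name__ == "__main__":
    # Opposite faces of a box, fingers directly facing each other.
    print(is_antipodal_force_closure((-1, 0), (1, 0), (1, 0), (-1, 0), 0.5))
    # Same faces, second finger shifted sideways: the line leaves the cones.
    print(is_antipodal_force_closure((-1, 0), (1, 0), (1, 1.5), (-1, 0), 0.5))
```

The test presumes exact contact positions, normals, and friction, which is precisely the kind of assumption that becomes fragile once object models and sensing are uncertain.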

Although it is valuable, most of this theoretical work relies on assumptions such as a known 3D model of the object, a rigid point contact, and no uncertainty in the process. To incorporate uncertainty originating from imperfect object models and dynamics in the interaction process, we must go beyond modeling a single point contact and pursue substantial advances in the basic theory.

Thus, many of the more recent approaches are data driven (37). To avoid computing an optimal grasp each time a robot encounters an object, one can build a database of grasps and employ methodologies for sampling and ranking candidate grasps in real time. This approach deals with uncertainty in perception and provides fast and online generation of grasps for known, familiar, and even unknown objects. Prior knowledge of object properties determines the necessary perceptual processing and associated object representations for generating and ranking grasp candidates. Although this method works well for known and familiar objects, unknown objects necessitate additional heuristics for the discovery of geometric structures (e.g., handles, for which a robot would have a candidate grasp). This challenge is closely related to the classical problems of instance recognition and categorization in computer vision, but the notion that grasping is not an isolated process adds a new dimension.
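A minimal sketch of the sample-and-rank idea follows. The `GraspCandidate` fields, the aperture-based feasibility filter, and the scoring rule (stored success rate weighted by perception confidence) are hypothetical choices for illustration; real systems rely on far richer object representations and quality metrics.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class GraspCandidate:
    name: str
    required_aperture_m: float  # finger opening the grasp needs
    success_rate: float         # fraction of past successes in the database
    pose_confidence: float      # perception confidence in [0, 1]


def rank_grasps(candidates, max_aperture_m, top_k=3):
    """Return up to top_k feasible candidates, best score first."""
    # Discard grasps the gripper cannot physically perform.
    feasible = [c for c in candidates
                if c.required_aperture_m <= max_aperture_m]
    # Score = stored success rate discounted by perception uncertainty.
    return heapq.nlargest(
        top_k, feasible,
        key=lambda c: c.success_rate * c.pose_confidence)


if __name__ == "__main__":
    database = [
        GraspCandidate("handle", 0.03, 0.90, 0.8),
        GraspCandidate("body", 0.12, 0.95, 0.9),
        GraspCandidate("rim", 0.05, 0.70, 0.9),
    ]
    for grasp in rank_grasps(database, max_aperture_m=0.08):
        print(grasp.name)
```

Because ranking is a cheap lookup over precomputed candidates, no optimal grasp has to be recomputed for each encounter with an object.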

In addition to being object dependent, grasps are also robot dependent. Moreover, as the number of degrees of freedom of the hand increases, so does the complexity of the control. This is particularly an issue for anthropomorphic hands. One avenue of research to simplify the control draws inspiration from biology and promotes the use of postural synergies (38). Synergies form a basis of the subspace of effective human movements in relation to those that are possible by the kinematics of the body. These have been used as a tool for robot hand analysis, control, and design choice (39–42). Several studies have also demonstrated how underactuated hands can be leveraged to grasp and manipulate objects in unstructured environments and how this work may lead to adaptive hands that are relatively cheap, lightweight, and easy to control compared with fully actuated hands (43–48). More recent work has optimized hand design to improve manipulation capabilities (49, 50), providing open-source software for such design. Other recent work has suggested that the ability of compliant hands to deform in and with the environment may reduce the cognitive load of manipulation (51). Furthermore, this idea can be studied systematically using morphological computation (52), in which compliant interactions allow adaptation of behavior to a particular context, without the need for explicit control.
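Postural synergies are often extracted by applying principal component analysis to recorded hand postures, so that a few activation values drive many joints. The sketch below finds only the dominant synergy, using power iteration in plain Python. The function names and the toy three-joint data are assumptions for illustration, not the procedure of (38).

```python
import math


def first_synergy(postures, iterations=200):
    """Return (mean_posture, unit_direction) of the dominant synergy.

    postures: list of joint-angle vectors, one per recorded hand posture.
    """
    n, d = len(postures), len(postures[0])
    if n < 2:
        raise ValueError("need at least two postures")
    mean = [sum(p[j] for p in postures) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in postures]
    # Sample covariance matrix of the joint angles.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting guess for the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current guess
            break
        v = [x / norm for x in w]
    return mean, v


def posture_from_synergy(mean, direction, activation):
    """Joint angles commanded by a single scalar synergy activation."""
    return [m + activation * s for m, s in zip(mean, direction)]


if __name__ == "__main__":
    # Toy data: three joints that always flex in a fixed 1:2:3 ratio.
    recorded = [[t, 2 * t, 3 * t] for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
    mean, synergy = first_synergy(recorded)
    print(posture_from_synergy(mean, synergy, 0.3))
```

In this toy case one scalar activation reproduces every recorded posture, which is the sense in which synergies reduce the dimensionality of hand control.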

From grasping to manipulation

Grasping is not an end on its own; it is also related to the task a human or robot is executing. For example, one grasps a cup differently depending on whether the goal is to drink from it, fill it with fluid, put it in a dishwasher, or serve it to another person (53) (Fig. 5). Similarly, although a knife, fork, or spoon may be held with the same grasp when used to mix soup, this grasp differs from those employed when these utensils are used for eating or cutting. To determine the optimal way to grasp an object, one must understand the purpose of the grasp. Hence, while roboticists aimed to solve the problem of how to grasp an object, they first had to identify the reason for executing the grasp. Today, researchers consider grasping as part of an overall plan for object manipulation.

Fig. 5 Grasp functionality.

(Top) A human grasps an item differently depending on whether the aim is to hold it, open the cap, or hand it to someone else. (Bottom) Robots can also be programmed to hold the same glass differently depending on whether they are tasked with handing it to a human or pouring out its contents. [Photos: Learning Algorithms and Systems Laboratory, EPFL]

To determine the correct grasp to use with the correct tool, one must first have the correct tool at their disposal. When in need of a hammer but no hammer is in reach, a human will instead select the first object sturdy enough to act as a hammer. Future efforts to develop robots that can reason in this manner when the most appropriate tool is not available will be critical to facilitate deployment of robots in natural environments. Additionally, robots with this capability will be able to use tools originally designed for human dexterity to perform household tasks without making undesired modifications to our households. How to program such “common sense” tool use is hence an important avenue for research, and some initial work has been conducted in this direction (54–57).

Manipulations that remain difficult

The previous sections detail the many problems that remain to be solved before robots can perform grasps with a human level of intelligence. This said, robots are already fairly efficient at grasping and releasing certain types of objects. They are also capable of performing a variety of simple manipulation actions such as throwing (58), sliding (59), poking (60), pivoting (61), and pushing (62). Difficulties arise when these actions must be performed in cluttered environments or require contact-rich interactions (e.g., when an object of interest is placed close to or is covered by other objects or is located in a confined space such as a shelving unit). It is necessary to plan a feasible path and generate a set of intermediate actions to ensure no damage to the hand or other objects. Today, it is also recognized that perception and control are tightly coupled, and the field of interactive perception (63) regards manipulation as a means to perceive and perception as a means to achieve better manipulation.

Manipulation actions that generate changes on the object (cutting, crushing) remain particularly difficult, as they require a model of the deformation and advanced perception to monitor the alterations (64). To adapt to the changes induced, these actions also require adjusting the forces applied by the hand (e.g., in response to a reduction of friction when unscrewing a bottle cap or an increase in viscosity when digging into a melon) (Fig. 6) (65). Thus, modeling an object’s friction and viscosity properties is still an important open problem.

Fig. 6 Remaining challenges for robot manipulation.

Dexterous movement of objects within the hand (left), manipulation of deformable objects (e.g., fruits and vegetables) (65) (middle), and manipulation of objects in collaboration with humans (right) present ongoing difficulties. [Photos: Learning Algorithms and Systems Laboratory, EPFL]

In-hand manipulations in which an object is moved while being held are also particularly complicated. Examples include twirling a pen across fingers or preparing a key to be inserted into a keyhole. These actions comprise an extensive combination of (re)grasping movements and sliding and rotating maneuvers, as well as interactions between two arms and hands, in some cases. When discussing such advanced interactions with objects in robotics, we commonly talk about intrinsic and extrinsic dexterity. The former denotes the ability of the hand to manipulate objects using its available degrees of freedom. Hands with high intrinsic dexterity often mimic the structure of the human hand (66). Alternatively, the hand can be simpler, and the end-effector is designed specifically for a particular task (67, 68). Extrinsic dexterity is the ability to compensate for the lack of degrees of freedom by using external support, such as friction, gravity, and contact surfaces (69). This functionality also enables dexterous manipulation with simple parallel grippers.

One of the largely underdeveloped areas in robotics is dual-arm or bimanual (70) manipulation, as well as the use of the second hand and/or arm to support both intrinsic and extrinsic dexterity (71). Some recent work in this area (72) proposes integration of object representation, definition of simple movement primitives, and planning to model the problem in an efficient manner. This area is likely to see more contributions in the future, given that most of today’s humanoid robots have bimanual capabilities. Furthermore, manipulation does not stop at simply controlling the hand; it requires control of the arm, torso, and ultimately the entire body (73). The challenges listed above only increase in scale when one wishes to enable a full humanoid robot to manipulate objects while maintaining its balance (74) (Fig. 7). Finally, control of more-complex manipulation skills that require reasoning, such as using an object to retrieve another object, is still in its infancy.

Fig. 7 Whole-body manipulation.

Manipulation of a heavy object by a humanoid robot requires coordination of arms and body to maintain balance. [Reproduced from (74)]

Learning for manipulation

Human dexterity is a skill acquired during childhood and further refined throughout life, in activities such as playing a musical instrument or practicing a craft. Similarly, robot dexterity cannot be achieved in the confines of our laboratories. To be able to manipulate the vast array of objects that exist throughout the world, robots must be able to learn continuously, adapt their perception, and control unfamiliar objects.

Learning also addresses some challenges linked to the lack of accurate models of objects and contact dynamics and the increasing complexity of control for robots with many degrees of freedom. Hence, many of the present approaches to dexterous manipulation rely on learning methodologies in place of control-theoretic approaches. For instance, learning can be used to embed representations of stable or suitable grasps (75–78), which can then be applied to verify stability and generate regrasping motions at run time or to catch a fast-moving object (79). Learning is particularly suitable for embedding the dynamic nature of grasping and manipulation, as well as for modeling manipulation of complex nonrigid objects. Learning has been used to model contacts (80) and is also beneficial for reducing control dimensions by determining latent space, as required in bimanual dynamics (65).

Nevertheless, relying solely on learning is not a viable way to solve all problems, and it has certain limitations. First, learning requires data for training, and a common approach is to generate data from trial-and-error experiments. However, this process is tedious and may damage the robot. A growing trend in providing training data is to test the algorithms in simulation first and then refine the learning on a real platform (e.g., for learning dexterous in-hand manipulation) (81–83). Training in simulation depends on having an accurate simulator of the tasks. Alternatively, robots may learn from image data and videos available on the internet (84) or from demonstration by a live expert, usually a human. Yet it may not always be possible to find an expert, especially when the tasks are dangerous or require extreme precision. Hence, although learning is important, it cannot be the answer to every problem in robotics.

Manipulating objects in interaction and collaboration with humans: Reality and challenges

Human–robot collaboration in manufacturing setups has been deemed crucial for the industry (85, 86). Although historically, humans were prohibited from entering the robot’s environment (ISO 10218; ANSI/RIA R15.06-1999), it is now accepted that robots can work in close proximity to and in collaboration with humans. However, potentially dangerous scenarios may still occur and need to be addressed. Currently, human–robot collaboration is allowed through the use of manipulators that remain fairly light and are endowed with internal force sensors for detecting unexpected contact or collision with humans. For applications that require maneuvering heavy weight, robots capable of managing the weight may be coupled with an external vision system for monitoring human presence. Yet challenges remain for accurately detecting human presence. At present, the best solution is to combine sensing of proximity and force with external vision-based monitoring. Nonetheless, the 100% fence-based safety paradigm is gone, and industrial standards now target risk minimization and mitigation (ISO/TS 15066).
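
The combination of force, proximity, and vision-based sensing described above can be sketched as a simple mode selector. This is only an illustrative sketch: the function name, the three operating modes, and all threshold values are hypothetical placeholders and are not taken from ISO/TS 15066 or any other standard.

```python
def select_mode(contact_force_n, proximity_m, human_in_view,
                force_limit_n=50.0, stop_distance_m=0.3, slow_distance_m=1.0):
    """Pick a robot operating mode from three independent sensing channels.

    contact_force_n: unexpected contact force from internal force sensors (N)
    proximity_m:     distance to the nearest detected obstacle (m)
    human_in_view:   True if the external vision system sees a person
    All thresholds are hypothetical placeholders.
    """
    # An unexpected contact or a very close obstacle always stops the robot.
    if contact_force_n > force_limit_n or proximity_m < stop_distance_m:
        return "stop"
    # A person in view or a nearby obstacle reduces speed.
    if human_in_view or proximity_m < slow_distance_m:
        return "slow"
    return "normal"


print(select_mode(contact_force_n=0.0, proximity_m=2.0, human_in_view=False))
print(select_mode(contact_force_n=0.0, proximity_m=2.0, human_in_view=True))
print(select_mode(contact_force_n=80.0, proximity_m=2.0, human_in_view=False))
```

The most restrictive channel decides the outcome, so a failure of one sensor to detect a person can still be caught by another.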

In addition to facing a world in which objects move and change, robots are now expected to manipulate these objects in collaboration with humans. Interactive and collaborative manipulation adds a new dimension to robot manipulation but presents a wealth of challenges (87). For example, when a robot is tasked with handing an object to a person or carrying a large object jointly with someone, the robot must grasp and move the object carefully and with foresight, so that the robot can infer where the human will move and the human does not get injured. As simple as it may seem, the act of a robot handing an object to a human entails several complex questions, which have in turn inspired studies on how to enable a robot to properly perform this task (88–92). These questions range from how to present an object for optimal human grasping to others related to social factors, such as the role of gaze, social cues, and awareness of user state. There is no agreement on what factors are most important in determining how handovers are carried out between two humans, let alone between a robot and a human. Although most research has focused on a robot handing objects to humans, there have also been studies on robots taking objects from humans (93–95). Additionally, several efforts have been aimed at enabling a robot to manipulate objects jointly with humans, and the act of carrying objects jointly with humans has been demonstrated using both humanoid robots (96–99) and mobile manipulators (100–103). Notable recent efforts have explored human–robot joint manipulation of deformable materials (104), helping humans to dress (105), and assistive support (106).

Hence, for robots to work seamlessly with humans, researchers are striving to equip robots with the tools for better perception of humans and more adaptive control modes. In addition, roboticists seek guarantees in terms of machine performance and the use of common evaluation scenarios and benchmarks.


Since the 1960s, substantial progress has been achieved in several areas of robotic manipulation. We have established the basic theory of evaluating stability of a grasp, control algorithms that can adapt to unpredicted situations, and changing dynamics when the appropriate sensor feedback is available to perform state estimation. Lately, the field has also seen advances in data-driven approaches in which even dexterous in-hand manipulation can be accomplished, but only for very specific problems and in highly tailored environments. Achievement of robust, flexible, and adaptive grasping and manipulation of completely unknown objects in media such as water and oil (not solely in air) is expected to result in a major manufacturing revolution, which will affect most of the work that relies on fine manipulation and high dexterity. However, systematic development is ongoing toward several technologies vital for meeting and exceeding human dexterity and fine manipulation capabilities.

First, there is still a need for basic theoretical development. We must seek to understand and model soft point contact and provide stability rules for both point contacts and surface contacts. We also need to develop a better method for modeling objects whose states change markedly after manipulation (e.g., a cucumber after being sliced, an onion after being chopped). A thorough description of manipulation and task goals will be required for planning and generating appropriate intermediate grasping and manipulation actions. This emphasis on theory and planning is also relevant for data-driven approaches, as we need better tools for simulating soft bodies and generating relevant scenarios and examples that include force and torque information.

In addition to the aforementioned modeling and software aspects, we also seek to achieve substantial progress in hardware development and design. One area of particular relevance is that of robot sensing. It will be important to develop skinlike sensors that are well integrated with hand design but do not require excessive cabling or add substantial weight. This sensing functionality should facilitate force and torque measurements, determining shear forces to detect and counteract slippage. To achieve dexterous in-hand manipulation, we also need actuated hands that can be controlled with high frequencies. Such hands must function in different media (air, water, and oil) without being damaged or needing to be covered by special gloves. Overall, we need hands that are light, cheap, robust, and easily integrated with any type of robotic arm.
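
To make the role of shear sensing concrete, the sketch below shows one simple way measured shear forces could be used to predict and counteract slippage. It assumes a Coulomb friction model with a known friction coefficient, which is an assumption introduced here and not a method prescribed above. The function names and the safety factor are likewise hypothetical.

```python
import math


def slip_margin(normal_force, shear_x, shear_y, friction_coeff):
    """Return mu * normal_force - |shear| (in N).

    A positive margin means the contact is predicted to hold under the
    assumed Coulomb model; zero or negative means slip is predicted.
    """
    return friction_coeff * normal_force - math.hypot(shear_x, shear_y)


def required_normal_force(shear_x, shear_y, friction_coeff, safety_factor=1.5):
    """Grip force needed to resist the measured shear, with a safety factor."""
    if friction_coeff <= 0:
        raise ValueError("friction coefficient must be positive")
    return safety_factor * math.hypot(shear_x, shear_y) / friction_coeff


# Hypothetical fingertip reading: 10 N grip, 5 N total shear, mu = 0.5.
margin = slip_margin(10.0, 3.0, 4.0, 0.5)
new_grip = required_normal_force(3.0, 4.0, 0.5)
print(margin, new_grip)
```

In this example the margin is zero, so the contact is at the edge of slipping, and raising the grip force to the returned value restores a positive margin. Real contacts with soft fingertips deviate from this model, which is one reason better contact models are listed above as an open need.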

Finally, an important industrial challenge will be to bring robots in closer proximity to humans and enable safe physical interaction and collaboration. Fences that used to separate humans from robots will disappear gradually. Robots will thus need to be engaged in collaborative tasks to jointly manipulate objects with humans while adapting to unexpected human behavior. Equipping robots with advanced physical interaction capabilities to achieve safe and smooth synchronization of motion between machine and human is still a major hurdle. This objective will require advances in detailed tracking of human fine body movement, as well as a better understanding of how humans collaborate and achieve joint goals through planning and direct physical interaction. Furthermore, there is a demand for robots that are safe by design, putting focus toward soft and lightweight structures as well as control and planning methodologies based on multisensory feedback. Human ways of acting will continue to serve as inspiration for future robot systems, and robots will serve as a tool for better understanding humans.

References and Notes

Acknowledgments: We thank the reviewers for many helpful comments to improve the article, A. Kheddar and J. Paik for providing images of their research, and L. Cohen for hand-drawn illustrations. Funding: We acknowledge funding from the European Research Council, the Knut and Alice Wallenberg Foundation, and the Swedish Foundation for Strategic Research. Competing interests: The authors declare no competing interests.
