Cooking robot may offer artificial culinary intelligence
January 26, 2015
One of the greatest questions in the development of artificial intelligence is how to provide robots with a software template that enables them to recognize objects and learn actions by watching humans.
Researchers from the University of Maryland Institute for Advanced Computer Studies and the National Information Communications Technology Research Centre of Excellence in Australia have developed a software system that allows robots to learn actions and make inferences by watching cooking videos from YouTube.
“It’s very difficult [to teach robots] actions where something is manipulated because there’s a lot of variation in the way the action happens,” said co-author Cornelia Fermüller, a research scientist at the University of Maryland’s Institute for Advanced Computer Studies. “If I do it or someone else does it, we do it very differently. We could use different tools so you have to find a way of capturing this variation.”
The intelligent system that enabled the robot to glean information from the videos includes two artificial neural networks that mimic the way the human eye processes visual information to recognize objects, according to the study. The networks enabled the robot to recognize objects it viewed in the videos and determine the type of grasp required to manipulate objects such as knives and tomatoes when chopping, dicing and preparing food.
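To make that two-network idea concrete, the following is a minimal sketch, not the researchers’ published system: two tiny convolutional networks, one guessing which object is in view and one guessing whether the hand is using a power or precision grip. The object labels, layer sizes and input size are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): two small convolutional networks,
# one for object recognition and one for grasp-type classification, whose
# outputs can later be combined to infer the manipulation action.
import torch
import torch.nn as nn

OBJECTS = ["knife", "tomato", "bowl", "cutting_board"]   # assumed label set
GRASPS = ["power", "precision"]                          # classes named in the article

class SmallCNN(nn.Module):
    """Tiny image classifier standing in for each recognition network."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

object_net = SmallCNN(num_classes=len(OBJECTS))   # what is being handled?
grasp_net = SmallCNN(num_classes=len(GRASPS))     # how is it being held?

# One 64x64 RGB patch cropped around the hand, standing in for a video frame.
patch = torch.randn(1, 3, 64, 64)
obj = OBJECTS[object_net(patch).argmax(dim=1).item()]
grasp = GRASPS[grasp_net(patch).argmax(dim=1).item()]
print(f"object={obj}, grasp={grasp}")
```

The networks above are untrained, so the printed labels are arbitrary; the point is the division of labor between an object recognizer and a grasp classifier.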
“In addition to [accounting for variation] there is the difficulty involved in capturing it visually,” Fermüller said. “We’ve looked at the goal of the task and then decomposed it on the basis of that.”
Fermüller said the group classified the two types of grasping the robot performed as “power” versus “precision.” Broadly, power grasping is used when an object needs to be held firmly in order to apply force, as when holding a knife to make a cut. Holding a tomato in place to stabilize it is considered precision grasping, a more fine-grained action that calls for accuracy, according to the paper.
When observing human activity in real life, robotic systems are able to perceive the movements and objects they are designed to recognize in three dimensions over time, Fermüller said. However, when the movement and objects are viewed in a video, that information is not as immediately understood.
“The way we think of videos is as a three-dimensional entity in the sense that there are two dimensions of space and one dimension of time,” said Jason Corso, an associate professor of electrical engineering and computer science at the University of Michigan. “It’s not as 3D as the world we live in, but one can use a video … which is a spacetime signal, and from it correspond feature points that could be used to reconstruct the 3D environment that is being seen or imaged in that video.”
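One common way to obtain the correspondences Corso describes is to detect feature points in one video frame and track them into the next with optical flow. The sketch below uses OpenCV for that step; the file name is a placeholder, and this illustrates the general idea rather than the study’s actual pipeline.

```python
# Illustrative only: find corner features in one frame and track them into the
# next, producing the point correspondences that 3D reconstruction builds on.
import cv2

cap = cv2.VideoCapture("cooking_clip.mp4")   # placeholder video file
ok1, prev = cap.read()
ok2, curr = cap.read()
cap.release()

if ok1 and ok2:
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

    # Detect up to 200 corner features in the first frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is not None:
        # Track those points into the second frame (Lucas-Kanade optical flow).
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)

        # Each successfully tracked (x, y) pair across the two frames is a
        # spacetime correspondence of the kind used to recover scene geometry.
        matches = [(p.ravel(), q.ravel())
                   for p, q, s in zip(pts, nxt, status) if s]
        print(f"{len(matches)} feature correspondences between consecutive frames")
```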
According to the paper, the development of deep neural networks that can efficiently capture raw data from video and enable robots to perceive actions and objects has revolutionized how visual recognition in artificially intelligent systems functions. The algorithms programmed into the University of Maryland’s cooking robot are one example of this neural functioning.
“So what was used here was really the hand description and object tool description, and then the action was inferred out of that,” Fermüller said.
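A toy version of that inference step might look like the following, where a lookup table maps each hand’s grasp type and the object it holds to a named action. The rule table is invented for illustration and is not the method used in the paper.

```python
# Toy illustration: infer an action from per-hand grasp and object descriptions.
# The rule table below is a made-up stand-in, not the paper's actual approach.
from typing import Optional

RULES = {
    # (left grasp, left object, right grasp, right object) -> action
    ("precision", "tomato", "power", "knife"): "cut tomato",
    ("precision", "bowl", "power", "spoon"): "stir bowl",
}

def infer_action(left_grasp: str, left_obj: str,
                 right_grasp: str, right_obj: str) -> Optional[str]:
    """Look up an action from how each hand grips and what it is holding."""
    return RULES.get((left_grasp, left_obj, right_grasp, right_obj))

print(infer_action("precision", "tomato", "power", "knife"))  # -> cut tomato
```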
Previous research on robotic manipulation and action recognition has been conducted using hand trackers and motion capture gloves to overcome the inherent limitations of trying to design artificial intelligence that can learn by example, she said.
“Part of the problem is that robot hands today are so behind what biological manipulation is capable of,” said Ken Forbus, a professor of computer science and education at Northwestern University. “We have more dynamic range in terms of our touch sensing. It’s very, very difficult to calibrate, as there’s all sorts of problems that might be real problems and any system is going to have to solve them.”
Forbus said some of the difficulty in robotic design arises from the fact that the tools robots are outfitted with are far behind the ones humans are born with, both physically and in terms of sense perception.
“There is tons of tacit knowledge in human understanding—tons,” Forbus said. “Not just in manipulation, [but] in conceptual knowledge.”
According to Forbus, artificial intelligence researchers have three ways to incorporate this type of conceptual thinking into intelligent systems. The first option is to try to design robots that can think and analyze in a manner superior to humans, and the second is to articulate the tacit knowledge humans possess by boiling it down into a programmable set of rules. The third is to model the AI on the kind of analogical thinking humans use as they discern information and make generalizations that provide a framework for how to act in future experiences.
“That’s a model that’s daunting in the sense that it requires lots and lots of [programmed] experience,” Forbus said. “But it’s promising in that if we can make analogical generalization work in scale … it’s going to be a very human-like way of doing it.”