
Automatic Grasp Selection Using A Camera In A Hand Prosthesis


Joseph DeGol, Aadeel Akhtar, Bhargava Manja, Tim Bretl

2016 International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '16)


Oral Paper

Best Student Paper Award (3rd Place)


URL

Paper

Slides



ABSTRACT

In this paper, we demonstrate how automatic grasp selection can be achieved by placing a camera in the palm of a prosthetic hand and training a convolutional neural network on images of objects with corresponding grasp labels. Our labeled dataset is built from common graspable objects curated from the ImageNet dataset and from images captured by our own camera placed in the hand. We achieve a grasp classification accuracy of 93.2% and show through real-time grasp selection that augmenting current electromyography-controlled prosthetic hands with a camera may be useful.



DATA

For this paper, we used three datasets: Deep Grasping, ImageNet, and HandCam, each described in more detail below. These datasets were adapted or created to test the learning of grasps for a prosthetic hand when an embedded camera sees an object in its field of view. More specifically, each dataset consists of close-range images of objects that can be picked up by a human hand.

Annotations:
Each image was hand-annotated with one of five grasps: power, tool, key, pinch, and three-jaw chuck. The annotations are provided in JSON format as follows:

  { 
      "[image_name_0]":
      {
          "grip": "[grip_type]",
          "comment": "[comment_text]"
      },
      "[image_name_1]":
      {
          "grip": "[grip_type]",
          "comment": "[comment_text]"
      },
      etc...
  }
                                    
[image_name] - Name of annotated image
[grip_type] - one of five values: "3 jaw chuck", "key", "pinch", "power", or "tool"
[comment_text] - space for any comments during annotation
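A minimal sketch of loading and validating an annotation file in this format. The in-line sample entries below are hypothetical; in practice, you would read the downloaded annotation file instead (for example with json.load on an open file):

```python
import json

# The five grip values documented above.
VALID_GRIPS = {"3 jaw chuck", "key", "pinch", "power", "tool"}

def load_annotations(json_text):
    """Parse annotation JSON into {image_name: grip_type}, rejecting
    any grip label outside the five documented values."""
    grips = {}
    for image_name, fields in json.loads(json_text).items():
        if fields["grip"] not in VALID_GRIPS:
            raise ValueError(f"unknown grip for {image_name}: {fields['grip']}")
        grips[image_name] = fields["grip"]
    return grips

# Two hypothetical entries in the documented format:
sample = """{
    "img_0001.png": {"grip": "power", "comment": ""},
    "img_0002.png": {"grip": "key", "comment": "held by the edge"}
}"""
print(load_annotations(sample))  # {'img_0001.png': 'power', 'img_0002.png': 'key'}
```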

Deep Grasping:
This dataset was published by the Robot Learning Lab at Cornell by Ian Lenz, Honglak Lee, and Ashutosh Saxena as part of their paper "Deep Learning for Detecting Robotic Grasps". More information and the dataset itself are available at http://pr.cs.cornell.edu/deepgrasping/. In case the data is changed or removed in their future work, we also include here the data and annotations that we used for our experiments.

Download Data

Download Annotations

Below are some sample images from this dataset. All images are taken from roughly the same perspective with an object on a white background. All images have a resolution of 640 x 480.

Below is a table of the bias in the dataset based on our annotations. More specifically, it shows the percentage at which each grasp is represented in this dataset. As can be seen, the representation of each grasp is far from equal, which motivated us to collect additional data.

        Key     Pinch    Power    3 Jaw Chuck   Tool
Bias    0.0 %   21.8 %   47.0 %   28.0 %        3.2 %
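The bias percentages in these tables can be recomputed directly from the annotation files; here is a minimal sketch (the four sample labels below are hypothetical, not the actual Deep Grasping counts):

```python
from collections import Counter

def grasp_bias(annotations):
    """Return the percentage of images carrying each grasp label."""
    counts = Counter(entry["grip"] for entry in annotations.values())
    total = sum(counts.values())
    return {grip: 100.0 * n / total for grip, n in counts.items()}

# Four hypothetical entries in the documented annotation format:
annotations = {
    "a.png": {"grip": "power", "comment": ""},
    "b.png": {"grip": "power", "comment": ""},
    "c.png": {"grip": "pinch", "comment": ""},
    "d.png": {"grip": "tool", "comment": ""},
}
print(grasp_bias(annotations))  # {'power': 50.0, 'pinch': 25.0, 'tool': 25.0}
```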

ImageNet:
ImageNet is a large, popular dataset in the computer vision community, originally developed for object recognition and detection. It consists of over 14 million images with annotated objects and bounding boxes. More information about ImageNet can be found at http://image-net.org/index. Since we want to classify between five grasps, we downloaded images for 25 common graspable object categories: Ball, Basket, Blowdryer, Bowl, Calculator, Camera, Can, Cup, Deodorant, Flashlight, Glassware, Keys, Lotion, Medicine, Miscellaneous, Mugs, Paper, Pen, Remote, Scissors, Shears, Shoes, Stapler, Tongs, and Utensils. The images were curated with a preference for close-up views of real objects (avoiding computer-generated images), resulting in a final count of 5,180 images. These images were then annotated with one of the five grasps.

Download Data

Download Annotations

Below are some sample images from our curated portion of the ImageNet dataset. As can be seen, the images are taken from a variety of viewpoints and lighting conditions. The resolution of the images varies.

Below is a table of the bias in the dataset based on our annotations. More specifically, it shows the percentage at which each grasp is represented in this dataset. As can be seen, the representation of each grasp is not equal; however, it is closer to uniform than in the Deep Grasping annotations.

        Key      Pinch    Power    3 Jaw Chuck   Tool
Bias    11.8 %   10.6 %   47.5 %   19.2 %        10.9 %

HandCam:
The HandCam dataset was created to test how well our algorithm works for grasp selection from the camera in our prosthetic hand. We trained our classifiers on the ImageNet dataset above and used the HandCam dataset only for testing. For each of the five grasps, ten objects were chosen and photographed from five different perspectives. This results in 50 images per grasp and 250 images total, giving an equal representation of each grasp.
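The balanced 5 grasps x 10 objects x 5 perspectives design can be sketched as follows; the file-naming scheme below is hypothetical and only illustrates the structure, not the actual HandCam file names:

```python
from itertools import product

GRASPS = ["3 jaw chuck", "key", "pinch", "power", "tool"]
N_OBJECTS = 10  # objects photographed per grasp
N_VIEWS = 5     # perspectives per object

# Hypothetical file names enumerating every (grasp, object, view) triple.
images = [
    f"{grasp.replace(' ', '_')}_obj{obj:02d}_view{view}.png"
    for grasp, obj, view in product(GRASPS, range(N_OBJECTS), range(N_VIEWS))
]
print(len(images))                 # 250 images in total
print(len(images) // len(GRASPS))  # 50 images per grasp
```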

Download Data

Download Annotations

Below are some sample images from our newly created HandCam dataset. The images are taken from the camera on the hand, a Point Grey Firefly MV USB 2.0 camera with an output resolution of 640 x 480 pixels.

Below is a table of the bias in the dataset based on our annotations. More specifically, it shows the percentage at which each grasp is represented in this dataset. As can be seen, the representation is uniform by design.

        Key      Pinch    Power    3 Jaw Chuck   Tool
Bias    20.0 %   20.0 %   20.0 %   20.0 %        20.0 %


VIDEO

Here is a video that describes the task of automatic grasp selection and demonstrates a camera-equipped prosthetic hand selecting a grasp when viewing a nearby object. A link to download the video is also provided.

Download



BIBTEX
@inproceedings{DeGol:EMBC:16,
  author    = {Joseph DeGol and Aadeel Akhtar and Bhargava Manja and Tim Bretl},
  title     = {Automatic Grasp Selection using a Camera in a Hand Prosthesis},
  booktitle = {EMBC},
  year      = {2016}
}


Last Updated: [General:7/14/16] [CamHand:7/14/16]