
Tactile Augmentation for Vision-based Pose Estimation

Overview

Most sensing modalities have failure cases in specific scenarios. Vision struggles with reflective objects and under certain lighting conditions. Ultrasonic sensors are unreliable in locations with excessive background noise. It is therefore important to have redundant methods of validating one's perception of the world.


As with many projects in robotics, our motivation stemmed from the way humans naturally interact with their environment: when we are unsure of our sense data, we validate our surroundings by touch. We therefore sought to bring tactile feedback to vision-based reconstruction to mitigate these failure cases. To deliberately make visual recognition difficult, we chose specular or transparent objects that yield insufficient initial point clouds.


We thus aimed to use touch feedback to augment vision-based pose estimation of such difficult objects, using a particle-filter-inspired algorithm to update hypothesized object poses (particles) with tactile data.


Implementation/Results

For our experimental setup, we used the Sawyer robot arm as the main platform. On the Sawyer's end effector, we mounted an Intel RealSense camera and a resistive touch sensor connected to an Arduino. The Sawyer, RealSense, and Arduino were all connected to a desktop computer running our programs in ROS.
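As a rough sketch of how the touch sensor could feed into the rest of the system, the node below reads the Arduino's raw readings over serial and republishes them as a ROS topic. The serial port, baud rate, contact threshold, and topic name are all hypothetical stand-ins, not our exact configuration.

#!/usr/bin/env python
# Hypothetical sketch: relay the Arduino's resistive touch readings into ROS.
# The serial port, baud rate, threshold, and topic name are all assumptions.
import rospy
import serial
from std_msgs.msg import Bool

def main():
    rospy.init_node('touch_sensor_node')
    pub = rospy.Publisher('/touch_detected', Bool, queue_size=1)
    ser = serial.Serial('/dev/ttyACM0', 9600, timeout=0.1)  # assumed port/baud
    rate = rospy.Rate(50)  # poll at 50 Hz
    while not rospy.is_shutdown():
        raw = ser.readline().decode('ascii', errors='ignore').strip()
        if raw.isdigit():
            # The Arduino is assumed to print one raw reading per line.
            pub.publish(Bool(data=int(raw) > 512))  # assumed contact threshold
        rate.sleep()

if __name__ == '__main__':
    main()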


Our proposed approach was as follows, and is shown below. We first get an initial estimate of the object's pose from the RealSense point cloud and seed our particle filter with this initial guess. We then start our touch feedback loop: we sample n new particles, centered randomly around the weighted average of the current particles; randomly select a particle to try to touch; and reweight the particles based on whether or not a touch was detected. We repeat this loop a fixed number of times, or until the particles' center of mass converges. A rough sketch of the loop follows.
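The snippet below is a minimal sketch of this loop, assuming the particles are stored as an (n, d) NumPy array of candidate positions. The noise scale, reweighting length scale, and exponential weighting rule are illustrative assumptions rather than our tuned values, and try_touch stands in for the motion-planning and sensing step.

import numpy as np

def weighted_mean(particles, weights):
    """Center of mass of the current pose hypotheses."""
    return np.average(particles, axis=0, weights=weights)

def touch_feedback_loop(particles, weights, try_touch, n_iters=20, noise=0.02):
    """One possible realization of the loop: resample around the weighted
    mean, probe a randomly chosen particle, and reweight on the result."""
    for _ in range(n_iters):
        center = weighted_mean(particles, weights)
        # Sample n new particles centered (with noise) on the current estimate.
        n = len(particles)
        particles = center + noise * np.random.randn(n, particles.shape[1])
        weights = np.full(n, 1.0 / n)
        # Pick a particle at random and attempt to touch at its position.
        idx = np.random.randint(n)
        touched = try_touch(particles[idx])  # moves the arm, reads the sensor
        # Reweight: boost particles near the probe on a hit, suppress on a miss.
        dists = np.linalg.norm(particles - particles[idx], axis=1)
        if touched:
            weights *= np.exp(-dists / 0.05)        # assumed length scale
        else:
            weights *= 1.0 - np.exp(-dists / 0.05)
        weights /= weights.sum()
    return weighted_mean(particles, weights)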


This implementation can be found at: Link

A full analysis can also be found at: PDF Link


Figure 1: Proposed Algorithm Flowchart/Experimental Setup


Below is a video looping through our algorithm a few times. Notably, we can see the particle clouds dispersing farther from the end effector as touches are not detected.

Below are a few examples of the algorithm before and after successful touches.


Conclusion

Overall, our algorithm is generally able to move the particles towards the true pose of the object, but it requires fine-tuning to produce an accurate final pose estimate. One major problem we face is that our task has very sparse feedback: when the robot fails to touch the object, it gains little information beyond the fact that the object is likely not at that location. It thus becomes difficult for the particles to converge, as they keep moving around randomly. Realistically, if the particle filter keeps failing to touch anything, it devolves into a uniform random search of the workspace.


Regardless, the implications of our work are numerous. In many robotics-related computer vision problems, perfect lighting is rarely available, so redundancy in object localization is very beneficial. For example, a robot such as Spot could use its arm to execute our proposed algorithm when searching for objects in a dimly lit environment such as a cave or a dark warehouse.


There is still much future work to be done. Improving the touch feedback loop is one option: we could try different strategies for deciding where to touch the object, such as targeting the highest-weight particle instead of selecting one at random. We could also change how the gripper approaches a touch in terms of orientation and trajectory; right now we only approach from the top, which isn't ideal for narrow objects. Another option is to touch and reweight particles before resampling, so that the weights carry more tactile data and are more accurate. The history of touches could also be taken into account to alleviate our sparse feedback problem; a sketch of one such history-aware reweighting appears below. Finally, we could explore different reweighting schemes, incorporate more vision-based feedback, and include more concrete error metrics to track performance.
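As a hedged sketch of that history idea, the function below recomputes weights from the full record of touch attempts: particles near any past hit are boosted and particles near any past miss are suppressed, so negative information accumulates instead of being washed out at each resampling step. The length scale and multiplicative form are assumptions, not something we have validated.

import numpy as np

def reweight_with_history(particles, hits, misses, scale=0.05):
    # Hypothetical history-aware weighting. particles: (n, d) array of
    # candidate positions; hits/misses: lists of previously probed positions.
    weights = np.ones(len(particles))
    for h in hits:
        d = np.linalg.norm(particles - h, axis=1)
        weights *= np.exp(-d / scale)        # boost near confirmed contacts
    for m in misses:
        d = np.linalg.norm(particles - m, axis=1)
        weights *= 1.0 - np.exp(-d / scale)  # suppress near failed touches
    return weights / weights.sum()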
