Photographic Inverse Kinematics

Lingdong Huang 


(Or, How to Pretend You Can Dance)


During a discussion with Cathy from the Tangible Media Group about a calligraphy project, we came up with an interesting idea: if we take, say, a million photographs of a person, each with their hand at a different position on a plane, then we can later specify an arbitrary position on that plane and find a photograph in which the person's hand is at that position. We can even specify a path on the plane and get an animation of the person's hand following that path, even if the person never performed that motion in real life.

I was super excited about the idea (Cathy seemed less so, though), and later went back to the studio to test it out. I found a quiet corner so nobody had to wonder what funny thing I was doing. Then, using my laptop to record video, I held my phone with its flashlight on (so that later, when using computer vision to locate my hand, I could simply track the brightest spot) and scanned every hypothetical pixel on a virtual plane in front of me, first horizontally, then vertically.

Here's a sped-up video of what I was doing:

Then I analyzed the video with OpenCV. The background had some fairly bright white spots, so simply tracking the brightest pixel did not work well. Therefore, I blended in a bit of background subtraction: "If something is always white, it's probably some white wall in the background; if something is white AND moving, that's the thing we're interested in tracking!" Finally, I used a crude form of non-maximum suppression (a Gaussian blur followed by picking the peak, basically) to pinpoint the center of the bright spot.
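The tracking recipe above can be sketched roughly as follows. This is a toy, pure-Python illustration rather than the OpenCV code used for the project: frames are assumed to be small grayscale grids (lists of lists of floats in [0, 1]), the background is assumed to be the per-pixel median over all frames, and the names `track_bright_spot`, `blur`, and `sigma` are hypothetical.

```python
import math
from statistics import median


def gaussian_kernel(sigma):
    """1D Gaussian weights, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, treating outside as 0."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - radius
            if 0 <= k < n:
                acc += w * values[k]
        out.append(acc)
    return out


def blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def track_bright_spot(frames, sigma=1.5):
    """Return one (x, y) per frame: the center of the bright, moving spot."""
    height, width = len(frames[0]), len(frames[0][0])
    # Assumption: anything bright in most frames (a white wall) is background.
    background = [[median(f[y][x] for f in frames) for x in range(width)]
                  for y in range(height)]
    points = []
    for frame in frames:
        # "White AND moving": brightness times how far it exceeds the background.
        score = [[frame[y][x] * max(0.0, frame[y][x] - background[y][x])
                  for x in range(width)] for y in range(height)]
        # Blurring turns a blob into a single peak at its center.
        smooth = blur(score, sigma)
        _, x, y = max(((smooth[y][x], x, y)
                       for y in range(height) for x in range(width)),
                      key=lambda t: t[0])
        points.append((x, y))
    return points
```

On real video, the same steps would be run on image arrays with OpenCV or NumPy; nested lists are far too slow for full-resolution frames and are used here only to keep the sketch dependency-free.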

Here's a picture of all the points I covered, according to the analysis:

You can see that where the pose is more comfortable, I tend to go slower and produce more data points, and where the pose is uncomfortable, I tend to rush through and produce fewer. I tried to be as even as possible when I did the motion, though I somewhat anticipated this distribution, considering biological and psychological limitations.

Next, the fun part: specify any point on the image, and my photographic likeness will reach it with its hand! Though more efficient methods are possible (a spatial index such as a k-d tree, for example), I simply iterate through all the hand locations and find the nearest neighbor. Now you can move your mouse around to make me dance! Finally, my innate inability to dance is solved by this simple software!
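The brute-force lookup described above might look something like this. It is a minimal sketch, assuming the tracked hand positions are stored as a list of (x, y) tuples with one entry per video frame; `nearest_frame` and `hand_points` are hypothetical names.

```python
def nearest_frame(hand_points, target):
    """Return the index of the frame whose hand position is closest to target."""
    tx, ty = target
    # Squared distance is enough for comparison, so no square root is needed.
    return min(
        range(len(hand_points)),
        key=lambda i: (hand_points[i][0] - tx) ** 2 + (hand_points[i][1] - ty) ** 2,
    )


# Three tracked frames; the mouse is near the second one.
hand_points = [(10, 10), (50, 20), (30, 40)]
print(nearest_frame(hand_points, (48, 25)))  # prints 1
```

Showing the frame at the returned index for every mouse move is all the "dancing" takes. The linear scan costs one pass over all points per query, which is fine for a single recording.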

We could also make my hand track a predetermined path, for example, a Chinese character. (Somewhat less exciting, because I know calligraphy already.)
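One way to turn a path into an animation is to resample it at a fixed spacing and look up the nearest frame for each sample. The sketch below assumes the path is a polyline of (x, y) vertices (a stroke of a character, say); `sample_path`, `frames_along_path`, and `step` are hypothetical names and not part of the original project.

```python
import math


def sample_path(vertices, step):
    """Resample a polyline so consecutive samples are at most `step` apart."""
    samples = [vertices[0]]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / step))
        for i in range(1, n + 1):
            t = i / n
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


def frames_along_path(hand_points, vertices, step=5.0):
    """Return frame indices to play back so the hand follows the path."""
    indices = []
    for px, py in sample_path(vertices, step):
        # Brute-force nearest neighbor, as in the mouse-following demo.
        i = min(
            range(len(hand_points)),
            key=lambda j: (hand_points[j][0] - px) ** 2 + (hand_points[j][1] - py) ** 2,
        )
        # Skip repeats so the animation does not stall on one photograph.
        if not indices or indices[-1] != i:
            indices.append(i)
    return indices
```

A smaller `step` gives smoother motion where the data points are dense; where they are sparse, several samples map to the same frame and are collapsed into one.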

There's a lot more to explore! What would it take to locate the hand in 3D space instead of on a 2D plane? What about locating two hands in separate locations? Could we make a database of all possible poses of a person, every limb, every finger? Is there a better way to capture the data, perhaps software that shows a point on screen for you to match with your hand? What other applications can there be? Of course, the most urgent task of all is to find a real dancer or model and a nice backdrop to photograph, so it's not just a random dude (i.e. me) doing it in front of a bunch of servers.