Professor Christoph Bregler
Analyzing & Animating People
Crass CS Students Quarterly recently had the opportunity to interview Stanford professor Christoph Bregler, who is currently conducting research in the area of animating and analyzing people. This reporter was lucky enough to catch up with Prof. Bregler while he was taking a break from his research, enjoying a drink at a dimly lit bar in one of the seedier parts of Menlo Park.
Crass CS Students Quarterly: So, Chris... Can I call you Chris?
Professor Bregler: (lighting a cigarette) Call me Professor Bregler.
CCSSQ: Alright. So, Professor Bregler, what are you working on right now?
PB: I am currently researching methods of finding and analyzing representations of people and synthesizing them.
CCSSQ: Synthesizing people?
PB: (blows smoke in face of reporter) No. Synthesizing the animations of people.
CCSSQ: Oh, I get it. Like Scooby Doo.
PB: No, not like Scooby Doo. Scooby Doo was a cartoon dog.
CCSSQ: Oh, good point. Anyway, what good does it do to recognize and synthesize humans?
PB: Maybe I should just talk. You can listen and take notes.
CCSSQ: Sure thing, Chris, er, Professor Bregler.
PB: The area I'm researching is interdisciplinary. Working to understand and analyze human motion requires aspects of computer vision, machine learning, and graphics.
CCSSQ: Did you ever play that game Street Fighter II? That had awesome graphics. I could do the spinning piledriver.
PB: Anyway, you asked earlier why I'm studying the recognition and synthesis of humans. There are two main fields which will benefit from this research. The first is Human-Computer Interaction. By learning how to recognize and synthesize humans, we can create computers which interact with humans more naturally, with more natural input. It also improves real-time interaction. The second field which benefits is Data Storage and Mining. At this point there are no methods of searching for images in a large collection based on the content of the image. By finding methods of searching images for particular pieces of information, such as people doing particular things, this research would allow us to search for images based on content the same way we are currently able to search for text.
CCSSQ: That's cool and all, but isn't that built into Java or something?
PB: Please be quiet. Anyway, this type of research encompasses whole new animation paradigms. Traditionally, animation is done using key framing, examples of which you may have seen in movies such as Pixar's Toy Story and DreamWorks' Antz. In this new field, however, we are using a new type of motion capture based animation. You may have seen or read about motion capture based animation which requires actors to wear special body suits or get marked up with special markers. Our method, however, is quite different. We are using video motion capture techniques to extract the animation information directly from standard video.
There are two types of motion which need to be analyzed: rigid (also known as articulated) and non-rigid. Rigid motions are simple, like the movement of an arm. Lots of research has been done with articulated motion, especially in robotics. Non-rigid motion is a lot harder. Face movements are a type of non-rigid motion. There are so many muscles in the face that many subtleties are possible.
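(Editor's note: one way to see why articulated motion is the simpler case is that an entire arm pose can be written down as a handful of joint angles. The sketch below is our own illustration, not Prof. Bregler's method. The function name two_link_arm, the planar two-segment arm, and the unit segment lengths are all assumptions made for the example.)

```python
import math


def two_link_arm(shoulder_deg, elbow_deg, upper_len=1.0, fore_len=1.0):
    """Forward kinematics of a hypothetical planar arm with two rigid segments.

    The shoulder sits at the origin. Returns the (x, y) positions of the
    elbow and the wrist for the given joint angles in degrees.
    """
    shoulder = math.radians(shoulder_deg)
    elbow_bend = math.radians(elbow_deg)
    # The upper arm rotates about the shoulder.
    elbow = (upper_len * math.cos(shoulder), upper_len * math.sin(shoulder))
    # The forearm rotates about the elbow, relative to the upper arm.
    wrist = (
        elbow[0] + fore_len * math.cos(shoulder + elbow_bend),
        elbow[1] + fore_len * math.sin(shoulder + elbow_bend),
    )
    return elbow, wrist


# Arm held straight out along the x axis.
straight_elbow, straight_wrist = two_link_arm(0, 0)
# Upper arm raised to vertical, forearm bent back to horizontal.
bent_elbow, bent_wrist = two_link_arm(90, -90)
```

(Two numbers fully describe this arm. A face has no such small set of hinges, which is one reason the non-rigid case is harder.)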
For each of these two types of motion, analysis can be decomposed into three distinct steps: measurement, recognition, and animation. Measurement employs constraint optimization techniques, recognition entails estimation of high-dimensional structures, and animation makes use of rendering techniques.
If you want to measure articulated motion, you need to use visual tracking. The standard techniques for this are template matching, edge/shape/color detection, background subtraction, or optical flow analysis. Most of these techniques have been studied quite thoroughly. New challenges include the problem of complex variation, self-occlusion, and noise (such as folds in material or low contrast).
CCSSQ: Sometimes my clothing is wrinkled because the dryer in my dorm doesn't work very well. Is that what you mean by folds? Anyway, what can you do with this stuff?
PB: These measurements, combined with biometric data, allow us to collect motion data of people running, walking, even skipping. From there we can synthesize animations of people based on the motion capture data. We've found that this synthesis is more realistic if we combine data from several different camera views.
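(Editor's note: of the standard techniques listed, template matching is the easiest to show in a few lines. The sketch below is our own toy illustration, not the tracker used in Prof. Bregler's lab. It assumes tiny grayscale frames stored as lists of lists and scores each candidate position by the sum of squared differences. The names ssd and match_template are hypothetical.)

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and one frame patch."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def match_template(frame, template):
    """Return the (top, left) position whose patch best matches the template."""
    frame_h, frame_w = len(frame), len(frame[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    # Slide the template over every position where it fits entirely.
    for top in range(frame_h - tmpl_h + 1):
        for left in range(frame_w - tmpl_w + 1):
            score = ssd(frame, template, top, left)
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos


# Toy data: a bright 2x2 patch moves one pixel down and one pixel right.
template = [[9, 9],
            [9, 8]]
frame_a = [[0, 0, 0, 0, 0],
           [0, 9, 9, 0, 0],
           [0, 9, 8, 0, 0],
           [0, 0, 0, 0, 0]]
frame_b = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 9, 9, 0],
           [0, 0, 9, 8, 0]]

pos_a = match_template(frame_a, template)
pos_b = match_template(frame_b, template)
# Displacement of the tracked patch between the two frames, as (rows, columns).
motion = (pos_b[0] - pos_a[0], pos_b[1] - pos_a[1])
```

(An exhaustive search like this only works when the patch keeps its appearance from frame to frame. Folds, low contrast, and self-occlusion change the patch itself, which is where the challenges mentioned above come in.)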
There are many applications for these techniques, including analyzing dancers or celebrities such as Charlie Chaplin.
Another part of our research is on the analysis and animation of faces. Faces are difficult to analyze and animate because we are so attuned to them that we quickly notice anything wrong. Also, we still don't have a full understanding of the kinematics involved in facial expressions and gestures. Thus, we collect images to create constraints. For example, we built up a model of Kennedy's Cuban missile crisis speech. One particular subfield of this research would be getting a computer to read lips, known as visual speech recognition.
CCSSQ: This stuff is pretty cool. How can I get involved in it? Do I have to be smart? You are really smart.
PB: Yes, you have to be very smart. You need to have a solid background in linear algebra and graphics, and good programming skills are always a plus.
CCSSQ: What classes do you teach? I want to be in your class! You are so smart!
PB: Maybe at some point I'll teach graphics. Next quarter I'm teaching a course on machine learning. Come check it out!