I undertook a PhD in Computer Graphics and Animation at the University of Sheffield from 2000 to 2004, working under Dr Alan Watt and Dr Steve Maddock in the Computer Graphics Group. My subject was the synthesis of visual speech movements (i.e. lip, jaw, and tongue) for animation. The research covered a wide range of topics, including:
- Deformation techniques – research into geometry-based deformation techniques for animation, such as free-form deformations (FFDs), geometric muscle models, wires, bones, etc.
- Capture techniques – work on computer vision and traditional motion capture technologies, including retargetting of motion data to new subjects.
- Animation techniques – including motion-based models (e.g. concatenative synthesis) and target-based models (e.g. coarticulation dominance functions). Developed my own variants based upon spacetime optimisation and n-gram concatenation.
- Evaluation of speech synthesis – evaluated the work using both objective, metric-based measurements and subjective evaluation of the speech. Looked into confusions to see whether the animation was enhancing or impeding intelligibility.
Spacetime Synthesis Technique for Visual Speech
Most speech animation is performed by blending between phonetic targets, i.e. the articulatory equivalents of the fundamental sounds in a language. This is the case for Cohen & Massaro's dominance functions and for most other techniques based upon visemes (i.e. visual phonemes). However, it is uncertain what form these blending functions take, or even whether they exist at all. In my work I separated the physical constraints of the articulators (i.e. the speed and acceleration of movement) from the targets themselves. Each target has a relative importance over the parameters which define the vocal tract, and the system seeks the optimal trajectory which satisfies the physical constraints. This seems a more natural way to define the motion. Targets are no longer points in space: they have an ideal value plus an allowable variance from that ideal, and the solved trajectory meets all of the constraints of the motion. This research resulted in a SIGGRAPH sketch. Below is an animation demonstrating the technique, using synthesised audio to guide the animation.
video: target-based synthesis of speech movements.
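To make the idea of targets with an ideal value and an allowable variance more concrete, here is a minimal sketch in Python. It is not the method from the SIGGRAPH sketch: it assumes a single articulatory parameter, and it swaps the hard speed and acceleration constraints for a soft quadratic penalty on acceleration, so that the whole problem reduces to one linear least-squares solve. The names `solve_trajectory`, `solve_linear`, and `smoothness` are hypothetical and chosen only for this illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def solve_trajectory(n_frames, targets, smoothness=10.0):
    """Find the trajectory x[0..n_frames-1] minimising

        sum over targets of (x[frame] - ideal)^2 / variance
        + smoothness * sum over t of (x[t-1] - 2 x[t] + x[t+1])^2

    targets is a list of (frame, ideal, variance) tuples. At least two
    targets on different frames are needed for a unique solution.
    Assumption: acceleration is penalised softly rather than bounded.
    """
    a = [[0.0] * n_frames for _ in range(n_frames)]
    b = [0.0] * n_frames
    # Target terms: a small variance means a heavily weighted target.
    for frame, ideal, variance in targets:
        a[frame][frame] += 1.0 / variance
        b[frame] += ideal / variance
    # Smoothness terms: squared second differences (discrete acceleration).
    stencil = (1.0, -2.0, 1.0)
    for t in range(1, n_frames - 1):
        for i, ci in enumerate(stencil):
            for j, cj in enumerate(stencil):
                a[t - 1 + i][t - 1 + j] += smoothness * ci * cj
    return solve_linear(a, b)


# Usage: a closed-open-closed gesture for one parameter over 21 frames.
trajectory = solve_trajectory(
    21, [(0, 0.0, 0.01), (10, 1.0, 0.5), (20, 0.0, 0.01)]
)
print([round(v, 3) for v in trajectory])
```

In this sketch a target with a large variance is allowed to be undershot when reaching it would demand too much acceleration, which is one simple way of expressing that some targets matter less than others.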
Retargetting of Face Motion Capture Data
Traditional motion capture systems use markers attached to the human face to record the movement of the skin. These markers accurately record movement at discrete points, but that motion is tied to the physical shape of the actor’s face. This limits its use, because in many cases we want an actor to puppeteer another character with different facial characteristics. During my PhD research I developed a method for retargetting motion from an actor onto a character model with a different physical structure. This method is based upon Radial Basis Functions, a form of continuous mapping function, which morph the space of the actor’s face to fit that of the target model. This allows motion from an actor to be used to animate any human target character.
video: demonstration of the retargetting system.
video: further demonstration of retargetting.
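As a rough illustration of how a Radial Basis Function mapping can morph one face space into another, here is a minimal Python sketch. It is not the system shown in the videos. It assumes a Gaussian kernel with a hand-picked `width`, and it interpolates the displacement between corresponding actor and character landmarks, so points far from every landmark are left unchanged. The names `fit_rbf_warp`, `gaussian`, and `solve_linear` are hypothetical.

```python
import math


def gaussian(r, width):
    """Gaussian radial basis function (an assumed kernel choice)."""
    return math.exp(-((r / width) ** 2))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_rbf_warp(actor_pts, character_pts, width=1.0):
    """Return a function mapping actor space to character space.

    actor_pts and character_pts are corresponding landmark positions
    (same order, same dimension, all actor points distinct). The warp
    is point + sum_j w_j * gaussian(|point - actor_pts[j]|).
    """
    n = len(actor_pts)
    dims = len(actor_pts[0])
    phi = [[gaussian(math.dist(p, q), width) for q in actor_pts]
           for p in actor_pts]
    # One weight vector per coordinate, fitted to landmark displacements.
    weights = [
        solve_linear(
            phi, [character_pts[i][d] - actor_pts[i][d] for i in range(n)]
        )
        for d in range(dims)
    ]

    def warp(point):
        basis = [gaussian(math.dist(point, q), width) for q in actor_pts]
        return tuple(
            point[d] + sum(w * b for w, b in zip(weights[d], basis))
            for d in range(dims)
        )

    return warp


# Usage: fit on neutral-pose landmarks, then warp a captured marker frame.
actor_neutral = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
character_neutral = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.5), (2.0, 1.5)]
warp = fit_rbf_warp(actor_neutral, character_neutral)
captured_frame = [(0.1, 0.0), (0.9, 0.1), (0.0, 1.1), (1.0, 0.9)]
print([warp(marker) for marker in captured_frame])
```

Fitting once on a neutral pose and then passing every captured marker position through `warp` is the simplest way to use such a mapping. A Gaussian kernel is used here only because it keeps the linear system well behaved for a small example; other kernels would serve equally well for illustration.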