My Career: Science, Research, Policy, and Ethics

Theoretical Contributions

With Carol Fowler, Robert Remez, and Michael Turvey, I helped introduce the study of speech from the perspective of dynamical systems and action theory. My theoretical approach to perception and production, particularly in the case of speech, eschews attention to the momentary and punctate aspects of the signal, focusing not on traditional features and cues but on the spatiotemporal coordination of global aspects of the system, such as spectral coherence over long stretches of time (an approach related to current speech understanding systems such as Siri and Amazon Alexa).

With Robert Remez and other colleagues, I used the technique of sinewave synthesis to explore perceptual organization. We noted that "the criteria for the perceptual organization of speech - visible, audible, and even palpable - are actually specified in a general form, removed from any particular sensory modality ...", criteria that are, to me, related to the underlying spectral coherence of signals created by biological activity.
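
As an illustration, the idea can be sketched in a few lines of Python: each formant track is reduced to a single time-varying sinusoid and the replicas are summed. This is only a minimal sketch, not the original Haskins sinewave synthesis software; the frame rate, sample rate, and three-formant input format are illustrative assumptions.

    import numpy as np

    def sinewave_synthesis(formant_freqs, formant_amps, frame_rate=100, sample_rate=16000):
        """Minimal sketch of sinewave synthesis: replace each formant track
        with a single time-varying sinusoid and sum the replicas.

        formant_freqs, formant_amps: arrays of shape (n_frames, n_formants),
        e.g. center frequencies (Hz) and amplitudes of F1-F3 measured every 10 ms.
        """
        n_frames, n_formants = formant_freqs.shape
        samples_per_frame = sample_rate // frame_rate
        n_samples = n_frames * samples_per_frame
        t_frames = np.arange(n_frames) / frame_rate
        t_samples = np.arange(n_samples) / sample_rate

        signal = np.zeros(n_samples)
        for k in range(n_formants):
            # Interpolate the frame-rate tracks up to the audio sample rate.
            freq = np.interp(t_samples, t_frames, formant_freqs[:, k])
            amp = np.interp(t_samples, t_frames, formant_amps[:, k])
            # Accumulate phase so each sinusoid follows its changing frequency.
            phase = 2 * np.pi * np.cumsum(freq) / sample_rate
            signal += amp * np.sin(phase)
        return signal

Fed with measured center frequencies and amplitudes of the first three or four formants, the summed sinusoids contain none of the traditional acoustic cues of natural speech, yet listeners can still organize and understand the result as speech.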

My approach stresses the constraints and structure stemming from the realities of embodied systems, again across both time and physical space. I helped expand the modeling of speech production to incorporate an event-based approach for controlling movement of the vocal tract over time and articulatory space, building on the conceptual approach developed by Paul Mermelstein and colleagues at Bell Laboratories. This work was influenced, in part, by the event-based focus of James J. Jenkins. My articulatory synthesis model, ASY, illustrates how simple physical changes, such as velar opening, directly account for degrees of nasality, avoiding the complexity of attempting to reconcile numerous spectral cues. This event orientation evolved into a gestural computational system developed at Haskins Laboratories that combined ASY with the articulatory phonology of Catherine Browman and Louis Goldstein and the task-dynamic model of Elliot Saltzman. In this system, utterances are organized ensembles (or constellations) of units of articulatory action called gestures. Each gesture is modeled as a dynamical system that characterizes the formation (and release) of a local constriction within the vocal tract (the gesture’s functional goal or ‘task’). Goldstein and Rubin have described the "dances of the vocal tract" that underlie the production of continuous speech.
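
A single gesture of this kind can be sketched schematically in Python as a point attractor: a critically damped second-order (mass-spring) system that pulls a tract variable toward its constriction target. The variable names and parameter values below are illustrative assumptions, not those of the actual task-dynamic implementation.

    import numpy as np

    def gesture_trajectory(x0, target, stiffness, duration, dt=0.001):
        """Schematic gesture as a point-attractor dynamical system: a critically
        damped mass-spring drives a tract variable (e.g. lip aperture, in mm)
        toward its constriction target ('task').

        x'' = -stiffness * (x - target) - damping * x',  damping = 2 * sqrt(stiffness)
        """
        damping = 2.0 * np.sqrt(stiffness)      # critical damping: no overshoot
        n_steps = int(duration / dt)
        x, v = x0, 0.0
        trajectory = np.empty(n_steps)
        for i in range(n_steps):
            accel = -stiffness * (x - target) - damping * v
            v += accel * dt                     # simple Euler integration
            x += v * dt
            trajectory[i] = x
        return trajectory

    # Example: lip aperture closing from 10 mm toward a 0 mm target, as in a bilabial stop.
    lip_aperture = gesture_trajectory(x0=10.0, target=0.0, stiffness=400.0, duration=0.3)

In the full system, many such gestures overlap in time and share articulators, and it is their coordination, rather than any single static target, that gives rise to the coarticulated movement patterns of fluent speech.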

Biomechanical constraints stemming from such embodiment can also be exploited in the recovery of vocal tract shapes from the acoustic signal, as seen in the continuity mapping approach of John Hogden, used by Hogden, Rubin, and colleagues to re-conceptualize how realistic physical constraints affect pattern recognition. This involves reverse engineering the path from the acoustic signal to its physiological source (the so-called inverse problem) using a gradient-based maximum likelihood approach.
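
The flavor of the idea can be sketched in Python: recover a smooth hidden path from acoustic observations by gradient ascent on a Gaussian log-likelihood combined with a continuity penalty that favors small frame-to-frame articulator movements. This is a toy illustration under an assumed forward map and assumed parameter values, not Hogden's actual continuity mapping algorithm.

    import numpy as np

    def recover_smooth_path(acoustics, forward_map, forward_grad,
                            smoothness=10.0, lr=0.01, n_iters=2000):
        """Toy acoustic-to-articulatory inversion: gradient ascent on a Gaussian
        log-likelihood of the observed acoustics, plus a continuity penalty that
        keeps the recovered hidden path physically plausible (smooth)."""
        x = np.zeros_like(acoustics)            # initial guess for the hidden path
        for _ in range(n_iters):
            # Data term: push the predicted acoustics toward the observations.
            residual = acoustics - forward_map(x)
            grad = residual * forward_grad(x)
            # Continuity term: penalize large frame-to-frame jumps in the path.
            jumps = np.diff(x)
            grad[:-1] += smoothness * jumps
            grad[1:] -= smoothness * jumps
            x += lr * grad
        return x

    def g(x):                                   # hypothetical forward (articulatory-to-acoustic) map
        return np.tanh(x)

    def dg(x):                                  # its derivative with respect to the hidden state
        return 1.0 - np.tanh(x) ** 2

    observed = np.tanh(np.linspace(-2.0, 2.0, 200)) + 0.05 * np.random.randn(200)
    smooth_path = recover_smooth_path(observed, g, dg)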

Working with colleagues around the world (particularly in Japan, Canada, Brazil, and France), we expanded the approach to include an understanding of the importance of spatiotemporal coordination in audiovisual speech. These collaborations with Eric Vatikiotis-Bateson, Takaaki Kuratate, Kevin Munhall, Hani Yehia, and others focused on multimodality by exploring the simultaneous combination of speech, facial information, and gesture, leading to innovations in analysis, synthesis, and simulation.
