Abramson/Lisker VOT Stimuli
Arthur Abramson (1925-2017) and Leigh Lisker (1918-2006) were American linguists best known for their pioneering work on voice onset time (VOT). Their initial collaboration at Haskins Laboratories resulted in the paper “A cross-language study of voicing in initial stops,” published in 1964 in the journal Word, which has become one of the most widely cited papers in all of phonetics. In it, they introduced VOT as an acoustic measure for characterizing voicing distinctions in stop consonants.
Voice onset time (VOT) is a feature of the production of stop consonants, readily identifiable in the speech waveform. It is defined as the time elapsed between the release of the stop consonant constriction (often marked by a noise “burst”) and the onset of voicing, that is, the onset of periodic vocal fold vibration. The burst is visible in an acoustic spectrogram as a brief vertical spike extending from low to high frequencies. Negative VOT values mark voicing that begins during the period of articulatory closure for the consonant and continues into the release; this applies to those unaspirated voiced stops in which no voicing is present at the instant the closure is formed. Positive VOT values mark voicing that begins after the closure has been released.
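The definition above amounts to a signed time difference. As a minimal sketch, assuming the release and the voicing onset have already been located by hand in the waveform (the function names and the example times are hypothetical, not part of the original materials):

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: time of voicing onset minus time of release.

    Both arguments are times in seconds measured from the start of the
    recording (hypothetical hand-labeled values).
    """
    return (voicing_onset_s - release_s) * 1000.0


def describe_vot(vot: float) -> str:
    """Classify a VOT value by its sign."""
    if vot < 0:
        return "voicing lead: voicing begins before the release"
    if vot > 0:
        return "voicing lag: voicing begins after the release"
    return "voicing begins at the release"


# Hypothetical labels: release at 0.100 s, voicing onset at 0.160 s.
lag = vot_ms(0.100, 0.160)
# Hypothetical labels: voicing begins at 0.010 s, release at 0.100 s.
lead = vot_ms(0.100, 0.010)
print(round(lag), describe_vot(lag))
print(round(lead), describe_vot(lead))
```

The sign convention follows the text: voicing that starts during the closure gives a negative value, and voicing that starts after the release gives a positive one.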
The signals below come from an Abramson/Lisker VOT experiment that examined the labial place of articulation. To control VOT in measured increments, Abramson and Lisker used the Haskins Laboratories formant synthesizer. For this continuum from /ba/ to /pa/, the basic pattern comprised three steady-state formants for a vowel of the type [a]. Labial stop releases were simulated by means of appropriate formant transitions. VOT variants were synthesized with voicing starting anywhere from 150 ms before the release (“M” in filenames, for minus) to 150 ms after the release (“P” in filenames, for plus).
(Details: The continuum is in 10 ms steps. Note that the positive 140 ms token (P140) is missing. Some of the tokens seem to be about 5 ms too long: P30, P50, P90, P100, P110. This discrepancy is probably due to the combination of recording conditions used to make these versions. A few ms of silence precede each syllable. All files are in MP3 format.)
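For anyone scripting over the stimulus set, the naming scheme described above can be enumerated programmatically. The following is only a sketch under stated assumptions: it generates token labels of the form M150 ... P150, whereas the full filenames, the label used for the 0 ms token, and the function name itself are guesses and not taken from the original files.

```python
def vot_token_labels(step_ms=10, limit_ms=150, missing=("P140",)):
    """List (label, vot_ms) pairs for the continuum, skipping missing tokens.

    "M" marks minus (voicing before the release), "P" marks plus
    (voicing after the release), as described in the text.
    """
    tokens = []
    for vot in range(-limit_ms, limit_ms + step_ms, step_ms):
        if vot < 0:
            label = f"M{-vot}"
        elif vot > 0:
            label = f"P{vot}"
        else:
            label = "0"  # assumption: the real label for the 0 ms token is unknown
        if label not in missing:
            tokens.append((label, vot))
    return tokens


tokens = vot_token_labels()
print(tokens[0], tokens[-1])
print(len(tokens))  # 30 under these assumptions: 31 steps minus the missing P140
```

If the set does not include a 0 ms token, or labels it differently, the zero branch would need adjusting to match the actual files.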