Speech Perception

… (Liberman, 1996). It can be concluded that the movement of a speaker's face and lips can strongly influence the perception of speech stimuli. Audiovisual integration also occurs for non-speech sounds; for example, sound localization is often influenced by vision (Moore, 1997).

Models of Speech Perception

There are many models of speech perception.

There is not one specific model that is generally accepted. Three influential models discussed here are the motor theory, the cue-based approach, and the TRACE model.

Motor Theory

In the motor theory, the objects of speech perception are the intended phonetic gestures of the speaker. According to Liberman (1996), "they are represented in the brain as motor commands that call for movements of the articulators through certain linguistically significant configurations." The listener perceives the articulatory gesture the speaker intends to make when producing the word or utterance. In the motor theory, speech perception and speech production are closely linked and innately specified.


This model accounts for many characteristics of speech perception. However, it does not specify how the translation from signal to perceived gesture is accomplished, leaving the model incomplete (Liberman, 1996). The motor theory is motor in two senses. First, it takes the proper object of phonetic perception to be a motor event. Second, it assumes that adaptations of the motor system for controlling the organs of the vocal tract took precedence in the evolution of speech (Liberman and Mattingly, 1985).

Cue-Based Approach

In the cue-based approach there is a sequence of processing steps. The speech signal first undergoes analysis in the peripheral auditory system. The next step is a set of acoustic property detectors, including onset detectors, spectral change detectors, formant frequency detectors, and periodicity detectors. These detectors compute relational attributes of the signal.
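As an illustration, one of the detectors named above, a periodicity detector, can be sketched as a short autocorrelation routine. This is a hypothetical simplification with made-up thresholds, not the auditory-based analysis Stevens describes:

```python
import numpy as np

def detect_periodicity(signal, sample_rate, min_f0=60.0, max_f0=400.0):
    """Estimate whether a frame is periodic (voiced) via autocorrelation.

    Returns (is_periodic, estimated_f0_hz). Illustrative sketch of one
    acoustic property detector; the 0.5 threshold is arbitrary.
    """
    signal = signal - np.mean(signal)          # remove DC offset
    corr = np.correlate(signal, signal, mode="full")
    corr = corr[len(corr) // 2:]               # keep non-negative lags only
    corr /= corr[0]                            # normalize so lag 0 equals 1

    # Search lags corresponding to plausible fundamental frequencies.
    min_lag = int(sample_rate / max_f0)
    max_lag = int(sample_rate / min_f0)
    peak_lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))

    is_periodic = corr[peak_lag] > 0.5         # illustrative threshold
    return is_periodic, sample_rate / peak_lag

# A 100 Hz sine is strongly periodic, and its period is recovered.
sr = 8000
t = np.arange(0, 0.1, 1 / sr)
voiced, f0 = detect_periodicity(np.sin(2 * np.pi * 100 * t), sr)
```

A real detector of this kind would operate on successive short frames of running speech rather than on an isolated signal.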

The next step is an array of phonetic feature detectors. These examine the set of auditory property values over a stretch of time and decide whether a particular phonetic feature (e.g., nasality) is present. All of these decisions are language-specific. On this view, it should be possible to find a relatively uniform mapping between acoustic patterns and perceived speech, as long as the acoustic patterns are analyzed in appropriate ways (Stevens, 1986).

TRACE Model

The TRACE model consists of a large number of units organized into three levels: the feature, phoneme, and word levels.

Each level contains highly interconnected processing units called nodes. TRACE accounts for several aspects of human speech perception. Like humans, TRACE uses information from overlapping portions of the speech wave to identify successive phonemes, and its tendency toward categorical perception is affected by many of the same parameters that affect the degree of categorical perception shown by humans (Elman and McClelland, 1986). TRACE is a connectionist model, based on neural networks.

At the lowest level, nodes represent phonetic features; at the second level, phonetic segments; and at the highest level, words. When a node's activation reaches a particular level it fires, indicating that a feature, phoneme, or word is present (Moore, 1997). At the feature level there are banks of detectors for each dimension of speech sounds, and each bank is reproduced for several successive moments in time.

At the word level there are detectors for every word. The detectors are replicated across time slices. Units with adjacent centers span overlapping ranges of slices (Elman and McClelland, 1986). When a node fires, activation is passed along to connected nodes. Excitatory links exist between nodes at different levels, which can cause a node at the next level to fire. There are also inhibitory links between nodes within the same level, which allows highly activated nodes to inhibit competitive nodes with less activity.

This results in one node taking most of the activity. The flow of activation is not just from the feature detectors up to the word level: excitatory activation flows in both directions, which allows information gathered at the word level to influence phonetic identification (Moore, 1997). Like humans, TRACE sometimes cannot identify a word until it has heard part of the next word. It can, however, better determine where a word will begin when it is preceded by a word rather than a non-word.
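The excitatory and inhibitory dynamics described above can be sketched in a few lines. This is a hypothetical toy, not the actual TRACE implementation: two competing nodes at the same level each receive bottom-up excitation and inhibit one another, so the better-supported node comes to dominate. All parameter values are invented for illustration:

```python
import numpy as np

def interactive_activation(bottom_up, inhibition=0.6, decay=0.2, steps=50):
    """Toy within-level competition: each node is excited by its own
    bottom-up input and inhibited by the activation of its rivals."""
    act = np.zeros_like(bottom_up, dtype=float)
    for _ in range(steps):
        rival = act.sum() - act                  # total rival activation
        net = bottom_up - inhibition * rival     # excitation minus inhibition
        act = np.clip(act + net - decay * act, 0.0, 1.0)
    return act

# Two competing phoneme nodes, e.g. /b/ vs /p/, where /b/ receives
# slightly stronger acoustic support; competition amplifies the difference
# until one node dominates.
act = interactive_activation(np.array([0.55, 0.45]))
```

Note how a small difference in bottom-up support (0.55 vs 0.45) is driven toward a near winner-take-all outcome, which is the behavior the inhibitory links are described as producing.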

Although the model is influenced by word beginnings, it can recover from underspecification or distortion of a word's beginning. The model is able to use activations of phoneme units in one part of the TRACE to adjust the connection strengths determining which feature will activate which phoneme. The model is called TRACE because the pattern of activation left by a speech input is a trace of the analysis of the input at each of the levels (Elman and McClelland, 1986).

Resistance of Speech to Corrupting Influences

One factor that can greatly affect speech perception is background noise. For satisfactory communication, the signal-to-noise ratio should be at least +6 dB; below this, speech perception drops drastically.
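The signal-to-noise ratio quoted above is a ratio of powers expressed in decibels. A minimal sketch, with illustrative values:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(signal_power / noise_power)

# +6 dB means the signal has roughly four times the power of the noise;
# 0 dB means signal and noise power are equal.
ratio_satisfactory = snr_db(4.0, 1.0)   # about +6 dB
ratio_equal = snr_db(1.0, 1.0)          # exactly 0 dB
```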

Moore (1997) stated that at a 0 dB signal-to-noise ratio, word articulation scores fall to 50%. A second factor that may affect speech perception is a change in the frequency spectrum. Many transmission channels pass only a certain range of frequencies, which may remove parts of the speech signal, since the information carried by the speech wave is not confined to any particular frequency range. A third factor is peak clipping: if an amplifier is overloaded, the peaks of the waveform may be flattened off, causing a loss of part of the speech signal.
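Peak clipping is easy to demonstrate: every sample beyond the amplifier's limit is flattened to that limit. A minimal sketch with made-up values:

```python
import numpy as np

def peak_clip(waveform, limit):
    """Hard-clip a waveform: flatten every sample beyond +/- limit,
    as an overloaded amplifier would."""
    return np.clip(waveform, -limit, limit)

# A sine wave whose peaks exceed the limit gets its tops flattened,
# but the zero crossings and overall timing pattern survive, which is
# consistent with intelligibility being largely preserved.
t = np.linspace(0.0, 0.01, 100)
wave = np.sin(2 * np.pi * 200 * t)
clipped = peak_clip(wave, limit=0.5)
```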

Such clipping degrades the quality and naturalness of speech but does not greatly affect its intelligibility (Moore, 1997).

Conclusion

When discussing speech perception, one is seldom concerned with perception of speech alone, but rather with essential aspects of language. Speech is a complex stimulus varying in both frequency and time. A basic problem in the study of speech perception is relating properties of the speech wave to specific linguistic units. A second problem is finding cues in the acoustic waveform that clearly indicate a particular linguistic unit. Often, a phoneme will be correctly identified only if information obtained from the word or syllable is used.

Speech is perceived and processed in a different way from non-speech stimuli, a phenomenon known as the speech mode. Speech intelligibility is relatively unaffected by severe distortions of the signal; speech is an effective method of communication that remains reliable under difficult conditions (Moore, 1997).

Works Cited

Fant, G. (1973). Speech Sounds and Features. Cambridge, MA: The MIT Press.

Liberman, A.M. (1996). Speech. Cambridge, MA: The MIT Press.

Liberman, A.M. and Mattingly, I.G. (1985). The Motor Theory of Speech Perception Revised. Cognition, 21, 1-36.

Lobacz, P. (1984). Processing and Decoding the Signal in Speech Perception. Hamburg: Helmut Buske Verlag.

Luce, P.A. and Pisoni, D.B. (1986). Trading Relations, Acoustic Cue Integration, and Context Effects in Speech Perception. In M.E.H. Schouten (Ed.), The Psychophysics of Speech Perception.

Moore, B.C.J. (1997). An Introduction to the Psychology of Hearing (4th ed.). San Diego, CA: Academic Press.

Stevens, K.N. (1986). Models of Phonetic Recognition II: A Feature-Based Model of Speech Recognition. In P. Mermelstein (Ed.), Montreal Satellite Symposium on Speech Recognition.

Studdert-Kennedy, M. and Shankweiler, D. (1970). Hemispheric Specialization for Speech Perception. Journal of the Acoustical Society of America, 48, 579-592.