# Auditory Models

A collection of software, research, history, reflections, and data related to auditory models.

## Demos and Software (AMO)

### Talk at Johns Hopkins on human phoneme recognition

"The Role of the Cochlea in Human Speech Recognition," a CLSP seminar by Jont Allen (UIUC), August 7, 2007, Center for Language and Speech Processing, Johns Hopkins University (Vimeo video).

### Demos (These older demos are inferior to the Interspeech-2013 Tutorial demos, above)

• KunLun, software to analyze and modify speech (wav format) using the AI-gram: KunLun (zip) and example phrases as wav files (zip)
• Video demos of what KunLun can do: Video-demos
• Support documentation describing the basic speech perception research behind KunLun:
1. Allen, J. B. and Li, F. (2009). "Speech perception and cochlear signal processing," IEEE Signal Processing Magazine (invited, Life Sciences), 26(4), pp. 73-77, July. (pdf, djvu)
2. Li, F. and Allen, J. B. (2011). "Manipulation of consonants in natural speech," IEEE Trans. Audio, Speech, and Language Processing (officially published Jul. 2010; appeared Mar. 2011), pp. 496-504. (pdf)
3. Li, F., Menon, A. and Allen, J. B. (2010). "A psychoacoustic method to find the perceptual cues of stop consonants in natural speech," J. Acoust. Soc. Am., Apr., pp. 2599-2610. (pdf)
4. Li, F., Trevino, A., Menon, A. and Allen, J. B. (2012). "A psychoacoustic method for studying the necessary and sufficient perceptual cues of American English fricative consonants in noise," J. Acoust. Soc. Am., 132(4), Oct., pp. 2663-2675. (pdf)
• AI-gram source code (zip, txt). If you would like to download this code, ask me for the password.

### Research Objectives and Accomplishments

The research in the Human Speech Recognition group is directed at a fundamental understanding of speech perception in both normal-hearing (NH) and hearing-impaired (HI) ears. These are not two separate problems; they lie on a continuum. Most people are born with normal hearing, and within a few years we learn, without apparent effort, to understand human speech. How this happens is a mystery, but what happens is not. The research we have done over the past 10 years, documented in the section below, is a systematic study of how speech processing and communication fail under various conditions. Only by stressing the system, causing failure, can we hope to understand it. There are at least four levels of experimentation:

1. The first level of experiments uses NH ears listening to speech in noise.
2. In the second level, the speech is filtered before the noise is added.
3. In the third level, the speech is truncated in time.
4. Finally, small regions of the speech are modified by a few dB, or removed altogether.
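The first level of experiments depends on mixing speech with noise at a controlled signal-to-noise ratio. A minimal sketch of that mixing step, assuming NumPy (the function name and signals here are illustrative, not the group's actual code):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then return the noisy mixture (illustrative sketch)."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power satisfies p_speech / p_noise_new = 10^(snr_db/10)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: white noise added to a synthetic signal at -2 dB SNR,
# the level referenced in the Findings section below.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixed = mix_at_snr(speech, noise, -2.0)
```

Filtering (level 2) and truncation (level 3) would be applied to `speech` before this mixing step.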

Examples of such processing are given later on this page.

#### Findings

We have found that speech perception is a discrete (binary) zero-error task (Singh and Allen, 2012). Working at the token level, we defined two groups: zero-error (ZE) and non-zero-error (NZE). ZE speech is speech that NH listeners never misidentify at and above -2 dB SNR; the NZE sounds are all the rest. Every CV sound we have tested contains many ZE tokens: most CV consonants consist of more than 80% ZE utterances.

The remaining 20% of the CVs may be broken down into medium-error (ME) tokens, with error rates between 0% and 10%, and high-error (HE) tokens, with error rates above 10%. ME consonants are typically utterances that are mispronounced to varying degrees. HE consonants are typically heard as a different sound with high probability (>20%). Based on the entropy across NH listeners, we view such sounds as mislabeled. These errors can usually be traced to a specific, easily identified flaw in the production of the sound.

### Historical Documents

Page last modified on November 30, 2016, at 07:23 AM