Welcome to the Speech and Signal Processing Laboratory
For the past few years we have been collaborating with auditory
scientists (from the University of Amsterdam and the Eaton-Peabody Lab for
Auditory Physiology (MIT/Harvard)) to learn how the auditory
system processes acoustic signals such as speech, encodes them, and
draws inferences from them. Our goal is to identify those aspects of
auditory processing that are responsible for its superiority over
current artificial implementations and to emulate the practically
useful ones in a computer.

The biggest barrier to widespread use of automatic
speech recognition (ASR) systems in real-life situations is their
unreliable performance in background noise and interference. In marked
contrast to current artificial systems, human listeners are able to
correctly identify speech utterances in many acoustically challenging
contexts. Humans also do remarkably well at separating individual
voices from those of other speakers and from acoustic clutter of all
sorts (the cocktail-party effect). How are we able to do this?
Examination of auditory perception and its neurophysiological basis
suggests to us that this difference is due to powerful sound-separation
mechanisms coupled with robust spectro-temporal representations of
signals used by the auditory system.

Currently, every speech-recognition system that engineers have built
uses framewise feature vectors. The feature vectors are derived from
short-term spectral envelopes computed by standard spectral analysis or
by using a bank of fixed bandpass filters (BPFs). When speech is
degraded by noise, interference, and channel effects (such as telephone
channels and reverberation), perturbations at one frequency affect the
entire feature vector, rendering the extracted features vulnerable.
This type of framewise spectral-envelope extraction, which models the
speech and the interference together, is at odds with how the auditory
system processes and recognizes speech.
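As a point of reference, the following is a minimal sketch, in Python with NumPy/SciPy, of this kind of framewise front end. The frame length, the uniform band layout, and the function name framewise_features are illustrative assumptions, not the feature extractor of any particular recognizer. It shows how interference confined to a single frequency, once the short-term spectrum is pooled into log band energies and passed through a cepstral DCT, perturbs every component of every frame's feature vector.

# Minimal sketch of a conventional framewise front end (illustrative only).
import numpy as np
from scipy.fft import dct
from scipy.signal import stft

def framewise_features(x, fs, n_bands=20, n_cep=13, frame_len=0.025, hop=0.010):
    """One feature vector per frame: pool the short-term power spectrum into
    fixed bands (a crude BPF bank), take logs, then apply a DCT across bands
    (cepstrum-like coefficients, as in MFCC-style front ends)."""
    nperseg = int(frame_len * fs)
    noverlap = nperseg - int(hop * fs)
    f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    power = np.abs(X) ** 2                                # (freq bins, frames)
    edges = np.linspace(0, len(f), n_bands + 1, dtype=int)
    bands = np.stack([power[edges[k]:edges[k + 1]].sum(axis=0)
                      for k in range(n_bands)], axis=1)   # (frames, bands)
    log_env = np.log(bands + 1e-12)
    return dct(log_env, type=2, norm="ortho", axis=1)[:, :n_cep]

if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(0)
    x = rng.standard_normal(fs)                           # stand-in for a speech signal
    tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # narrowband interference
    clean = framewise_features(x, fs)
    noisy = framewise_features(x + tone, fs)
    # The interference lives at a single frequency, yet after log-band pooling
    # and the DCT it spreads across all coefficients of each frame's vector.
    print("mean change per coefficient:",
          np.round(np.mean(np.abs(noisy - clean), axis=0), 3))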
In the auditory system, by contrast, sound components are spectrally and
temporally separated, analyzed, and subsequently fused into unified
objects, streams, and voices that exhibit perceptual attributes such as
pitch, timbre, loudness, and location.
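By way of contrast, here is an equally rough sketch of a fixed multichannel, spectro-temporal representation: the signal is split by a bank of bandpass filters and each channel's temporal envelope is retained. The Butterworth bands and the function name channel_envelopes are illustrative stand-ins for cochlea-like filtering, not the auditory model used in this research; the point is simply that a narrowband disturbance stays confined to the few channels whose passbands cover it.

# Minimal sketch of a multichannel spectro-temporal representation (illustrative only).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def channel_envelopes(x, fs, center_freqs):
    """Band-pass the signal into fixed channels and return each channel's
    temporal (Hilbert) envelope: a time-frequency surface rather than a
    single framewise spectral-envelope vector."""
    envs = []
    for fc in center_freqs:
        lo, hi = 0.8 * fc, 1.2 * fc                      # roughly 1/3-octave bands
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envs.append(np.abs(hilbert(band)))
    return np.stack(envs)                                 # (channels, samples)

if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(0)
    x = rng.standard_normal(fs)                           # stand-in for a speech signal
    fcs = np.geomspace(200, 3200, 12)                     # log-spaced channel centers
    clean = channel_envelopes(x, fs, fcs)
    tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
    noisy = channel_envelopes(x + tone, fs, fcs)
    # The 1 kHz interference disturbs only the channels whose passbands cover it;
    # the remaining channels' envelopes are nearly untouched.
    change = np.mean(np.abs(noisy - clean), axis=1)
    for fc, c in zip(fcs, change):
        print(f"{fc:7.1f} Hz  envelope change {c:.3f}")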
We propose to develop methods and algorithms to process complex acoustic
signals observed by one or more acoustic sensors. The long-term goal is
to develop a machine that can deal with the day-to-day booming, buzzing
acoustic environment around us and draw inferences from sounds the way
human beings and animals do. Current signal-analysis methods are
inadequate for this purpose. Since the auditory system provides an
existence proof of such a system, it seems reasonable to use it as an
inspiration for our strategy. However, our algorithm development is
anchored in fundamental signal-processing principles. The major aims of
our current research are as follows:
Recent Publications
Acknowledgements
This research was supported by grants from the National Science
Foundation under grant numbers EIA-0130793 and CCR-0105499.
Email Dr. Kumaresan (kumar@ele.uri.edu) with comments.
Page last updated: September 10, 2009