We report here some preliminary results obtained on the noisy Aurora speech database using two features, the Average Log Envelope (ALE) and the Average Instantaneous Frequency (AIF), computed at the outputs of several fixed bandpass filters. These preliminary results indicate that the ALE and AIF features we advocate are at least as good as traditional features based on the MFCC (mel-frequency cepstral coefficient) filterbank and its relatives. We are therefore hopeful that our current adaptive-filterbank-based feature extraction methods, combined with improved separation of signal and interference, can improve recognition performance significantly.
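A minimal sketch of how ALE and AIF might be computed for a single frame, assuming a bank of Hamming-windowed FIR bandpass filters, envelope and phase taken from the analytic signal, and plain averaging over the frame. The section does not specify the filter design, band edges, frame length, or averaging, so those choices and the helper names (`bandpass_fir`, `analytic_signal`, `ale_aif`) are hypothetical.

```python
import numpy as np


def bandpass_fir(low_hz, high_hz, fs, num_taps=101):
    """Hamming-windowed sinc bandpass filter.

    Assumption: a hypothetical stand-in for the fixed bandpass filters,
    whose actual design is not given in the text.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # The difference of two ideal low-pass impulse responses is a bandpass.
    h = (2 * high_hz / fs) * np.sinc(2 * high_hz * n / fs) - (
        2 * low_hz / fs
    ) * np.sinc(2 * low_hz * n / fs)
    return h * np.hamming(num_taps)


def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def ale_aif(frame, fs, bands, eps=1e-10):
    """Return (ALE, AIF) arrays with one value per (low_hz, high_hz) band."""
    ale, aif = [], []
    for low_hz, high_hz in bands:
        y = np.convolve(frame, bandpass_fir(low_hz, high_hz, fs), mode="same")
        z = analytic_signal(y)
        # Average log envelope: mean of log |analytic signal| over the frame.
        ale.append(np.mean(np.log(np.abs(z) + eps)))
        # Average instantaneous frequency in Hz: mean derivative of the
        # unwrapped phase, converted from radians/sample.
        phase = np.unwrap(np.angle(z))
        aif.append(np.mean(np.diff(phase)) * fs / (2 * np.pi))
    return np.array(ale), np.array(aif)
```

Because the mean of successive phase differences telescopes, the AIF in this sketch is simply the net phase advance over the frame divided by the frame duration. For an 8 kHz signal, a call such as `ale_aif(frame, 8000, [(300, 600), (600, 1000)])` returns one ALE and one AIF value per band.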
Experiments with the Aurora 2 database were conducted to determine the
level of robustness under mismatched conditions, i.e. when the models
were trained on clean speech and tested on noisy utterances. By holding
the back-end constant, we ensured that any increase in word accuracy
relative to the standard methods was due to our front-end processing
techniques. The results are shown in the following figure.
The top panel shows the word accuracy rate, and the bottom panel shows
the performance of our method relative to the standard mel-cepstrum
front-end with the 3-mixture HMM back-end specified by the European
Telecommunications Standards Institute (ETSI) STQ-Aurora group. A
negative value indicates poorer performance. The results indicate a
substantial improvement for certain tasks, especially at SNRs of 0 to
15 dB.
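The section does not state how the comparison in the bottom panel is computed. A sketch under one common assumption in Aurora 2 reporting, namely that it is the relative reduction in word error rate (WER = 100 - word accuracy) with respect to the reference front-end; `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(acc_ref, acc_new):
    """Relative word-error-rate reduction in percent.

    acc_ref, acc_new: word accuracy (%) of the reference and new front-ends.
    Assumption: WER = 100 - accuracy, and the comparison is relative to the
    reference WER. A negative result means the new front-end is worse.
    """
    wer_ref = 100.0 - acc_ref
    wer_new = 100.0 - acc_new
    if wer_ref == 0:
        raise ValueError("reference WER is zero; relative change is undefined")
    return 100.0 * (wer_ref - wer_new) / wer_ref
```

Under this convention, raising accuracy from 80% to 90% halves the WER and counts as a 50% relative improvement, even though the absolute gain is only 10 points.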
Average recognition rates improved for every task in sets A and B.
Accuracy rates for set C were disappointing, falling below those of
the reference front-end. This poorer performance is probably due to
the mismatched channel conditions in set C. We expect to rectify it by
tracking the formants with our current feature extraction methods. As
the overall accuracy rates in the last line of the table show, our
method improved on the reference by an average of 13.97% for set A and
17.92% for set B, but by -31.72% (that is, it performed worse) for set
C. Overall, with clean training, our features yield an improvement of
7.97%.
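Aurora 2 results are conventionally averaged over noise types and over the 0 to 20 dB SNR conditions; the section does not say which conditions enter its averages, so the sketch below assumes that convention. `set_average` and `overall_average` are hypothetical helpers. Since set C has two noise types while sets A and B have four each, pooling every condition weights set C half as much.

```python
from statistics import mean

# Assumption: the usual Aurora 2 averaging range of 0-20 dB SNR.
AVERAGED_SNRS_DB = (0, 5, 10, 15, 20)


def set_average(acc_by_noise):
    """Mean word accuracy (%) for one test set.

    acc_by_noise maps a noise type to {snr_db: accuracy}; SNRs outside
    AVERAGED_SNRS_DB (e.g. clean or -5 dB) are ignored.
    """
    return mean(
        acc[snr] for acc in acc_by_noise.values() for snr in AVERAGED_SNRS_DB
    )


def overall_average(*test_sets):
    """Mean over every (noise, SNR) cell of all sets, so each set is weighted
    by its number of noise types (4, 4, 2 gives 0.4, 0.4, 0.2)."""
    return mean(
        acc[snr]
        for test_set in test_sets
        for acc in test_set.values()
        for snr in AVERAGED_SNRS_DB
    )
```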