Figure (1): a)-e) Waveforms/spectrogram of the utterance ``three''; f) and g) waveform/spectrogram of instrumental music (note the sharp ``attack'' phase of each note).
Figure (1) shows examples of a speech utterance and
notes from a musical instrument (castanet). Figure (1a) shows the
waveform of the utterance ``three'' (its spectrogram is in Figure (1b)),
which starts with a stop consonant. The characteristic signature of a
stop consonant is an (almost) complete closure of the vocal tract
followed by a sharp release of broadband energy called a burst. Figure (1c)
zooms in on these events. The burst qualifies as a
spectrally-diffuse component in our terminology. The second burst is
followed by aspiration (a noise-like signal that has energy in several
frequency bands and hence is presumed to be a sum of spectrally-compact
components), which is in turn followed by the onset of periodic voicing.
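To make the spectrally-diffuse versus spectrally-compact distinction concrete, one common proxy is spectral flatness: the ratio of the geometric to the arithmetic mean of the power spectrum. This is a minimal sketch under stated assumptions, not necessarily the criterion used in this work. The helper `spectral_flatness`, the 8 kHz sampling rate, the single impulse standing in for a burst, and the 500 Hz sinusoid standing in for a compact component are all hypothetical choices made for illustration.

```python
import numpy as np


def spectral_flatness(x: np.ndarray, eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of the power spectrum of x.

    Values near 1 indicate a flat (broadband, spectrally-diffuse) spectrum;
    values near 0 indicate energy concentrated in a few bins
    (spectrally-compact). eps avoids log(0) for empty bins.
    """
    power = np.abs(np.fft.rfft(x)) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return float(geometric_mean / arithmetic_mean)


fs = 8000                 # hypothetical sampling rate (Hz)
n = np.arange(512)        # one 512-sample analysis frame

# Idealized burst: a single impulse, i.e. a sudden broadband release of energy.
burst = np.zeros(512)
burst[10] = 1.0

# Idealized spectrally-compact component: a steady 500 Hz sinusoid.
tone = np.cos(2 * np.pi * 500 * n / fs)

print(f"burst flatness: {spectral_flatness(burst):.3f}")  # approximately 1
print(f"tone flatness:  {spectral_flatness(tone):.3g}")   # close to 0
```

A real burst is noisier than a single impulse and a real formant component is not a pure tone, so measured values fall between these two extremes; the measure only ranks signals along the compact-to-diffuse axis.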
The signal component corresponding to
the narrow first formant region is shown in Figure (1d). Clearly, we
can associate a carrier frequency (the dominant harmonic's frequency)
with this spectrally-compact signal. We model such a signal component
(more precisely, its complex or analytic version) using a bandpass signal
model (see [1]). The signal component in the third formant region is
shown in Figure (1e). This signal component originates from a broad
formant and hence is a sum of many harmonically related components.
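For the spectrally-compact component of Figure (1d), the carrier-frequency idea can be illustrated with a generic analytic-signal decomposition $x_a(t) = a(t)\,e^{j\phi(t)}$, where $a(t)$ is the envelope and $\dot\phi(t)/2\pi$ the instantaneous frequency. This is a minimal sketch assuming a synthetic amplitude-modulated tone; it is not the bandpass signal model of [1]. The helpers `analytic_signal` and `dominant_frequency`, the 500 Hz carrier, and the 10 Hz envelope are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal of a real sequence via the FFT.

    Keeps DC (and Nyquist for even lengths), doubles positive frequencies,
    and zeroes negative frequencies, so analytic_signal(x).real ~= x.
    """
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def dominant_frequency(x: np.ndarray, fs: float) -> float:
    """Frequency (Hz) of the largest-magnitude bin of the real FFT."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


fs = 8000                      # hypothetical sampling rate (Hz)
t = np.arange(800) / fs        # 100 ms frame
# Hypothetical first-formant component: slow 10 Hz envelope on a 500 Hz carrier.
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 10 * t)
x = envelope * np.cos(2 * np.pi * 500 * t)

z = analytic_signal(x)
est_envelope = np.abs(z)                          # a(t)
phase = np.unwrap(np.angle(z))                    # phi(t)
inst_freq = np.diff(phase) * fs / (2 * np.pi)     # instantaneous frequency (Hz)

print(dominant_frequency(x, fs))                        # close to 500
print(float(np.median(inst_freq)))                      # close to 500
print(float(np.max(np.abs(est_envelope - envelope))))   # tiny: envelope recovered
```

The frame length was chosen so that every frequency present completes an integer number of cycles; this keeps the FFT-based analytic signal free of edge effects, which a real speech frame would not guarantee.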
Adding many time-varying sinusoids produces reinforcement at some
time instants and cancellation at others, so that the envelope goes
to zero, or nearly so, at certain times. Thus the waveform in Figure (1e) appears to be
composed of a sequence of ``bandpass pulses''. We would classify this
signal component as spectrally-diffuse. In this case it would
seem that the carrier frequency, although important, does not dominate
the description of this signal; the time locations of the
bandpass pulses are also relevant parameters. Therefore we model
each ``bandpass pulse'' in Figure (1e) in the frequency domain
(see [1]). This model is, of course, applicable to the bursts in
Figure (1c) as well.
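The reinforcement/cancellation argument can be checked numerically: summing equal-amplitude harmonics of a fundamental $f_0$ across a band yields an envelope that peaks once per pitch period $1/f_0$ and dips to (nearly) zero in between, i.e. a train of bandpass pulses. This is a minimal sketch assuming idealized equal amplitudes and zero phases; the values of `fs`, `f0`, `t0`, the harmonic range, and the peak picker `pulse_times` are hypothetical, and nothing here reproduces the frequency-domain pulse model of [1].

```python
import numpy as np

fs = 16000                     # hypothetical sampling rate (Hz)
f0 = 100.0                     # hypothetical fundamental (pitch) frequency (Hz)
t0 = 0.005                     # hypothetical time of the first pulse (s)
t = np.arange(1600) / fs       # 100 ms of signal
harmonics = np.arange(22, 31)  # harmonics 22..30 (2.2-3.0 kHz): a broad "formant"

# The analytic signal of sum_k cos(2*pi*k*f0*(t - t0)) is the same sum with each
# cosine replaced by the corresponding complex exponential.
z = np.exp(2j * np.pi * np.outer(t - t0, harmonics * f0)).sum(axis=1)
x = z.real                     # real-valued waveform: a train of bandpass pulses
env = np.abs(z)                # envelope: large where harmonics reinforce, ~0 where they cancel


def pulse_times(env: np.ndarray, fs: float, rel_threshold: float = 0.5) -> np.ndarray:
    """Times (s) of local envelope maxima above rel_threshold * max(env)."""
    thr = rel_threshold * env.max()
    mid = env[1:-1]
    is_peak = (mid > env[:-2]) & (mid >= env[2:]) & (mid > thr)
    return (np.nonzero(is_peak)[0] + 1) / fs


times = pulse_times(env, fs)
print(np.round(times, 4))      # one pulse per pitch period 1/f0, starting at t0
print(env.max(), env.min())    # peak is about 9 (the number of harmonics); minimum near 0
```

For $M$ equal-amplitude consecutive harmonics the envelope is $|\sin(M\pi f_0\tau)/\sin(\pi f_0\tau)|$ with $\tau = t - t_0$, so the cancellations fall at $\tau = m/(M f_0)$ for integers $m$ not divisible by $M$, and the pulse locations at $\tau = m/f_0$. The relative threshold in `pulse_times` is there to skip the small side lobes between pulses.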