Zaphod B's Sculpt-O-Sound (TM) Audio Products



Project 1: Formant-O-Matic FS-01


Formant-O-Matic is a formant synthesizer running on Linux. It can be controlled via a GUI built using T.C. Zhao and Overmars's XForms library. Its original use was to convert text to speech. The production of speech is based on the so-called source-filter model. The source is a model of the vocal cords and the filter is a model of the vocal tract. By repeatedly exciting the vocal tract (filter) with a pulse from the vocal cords (source), sounds can be produced. Put those sounds in the right sequence (although, as far as I know, the underlying logic of this sequence has not been found for any language you care to pick) and even speech can be produced. Since not all sounds are produced by the vocal cords, and some are produced by friction of air, a noise source is also part of the speech production model.
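To make the source-filter idea a bit more concrete, here is a minimal sketch in Python. It is not the Formant-O-Matic code. It assumes a plain impulse train as the source and a cascade of textbook two-pole resonators (the kind used in Klatt-style synthesizers) as the filter. The function names and the three formant values are made up for illustration only.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients of a two-pole resonator y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c

def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Run a signal through one resonator (one formant)."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def pulse_train(f0_hz, duration_s, sample_rate):
    """Source: one unit impulse per pitch period (a very crude vocal cord model)."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize(source, formants, sample_rate):
    """Filter: pass the source through each (frequency, bandwidth) resonator in turn."""
    out = source
    for freq_hz, bw_hz in formants:
        out = resonate(out, freq_hz, bw_hz, sample_rate)
    return out

SAMPLE_RATE = 16000
source = pulse_train(100.0, 0.05, SAMPLE_RATE)
# Hypothetical formant settings (center frequency, bandwidth) in Hz.
vowel = synthesize(source, [(700.0, 90.0), (1200.0, 110.0), (2500.0, 170.0)], SAMPLE_RATE)
```

Changing the formant list changes the vowel-like colour of the sound, while changing the pulse rate changes the pitch. That independence of source and filter is the whole point of the model.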

So what has all this to do with making music? Simple! I have replaced the vocal cord model with a subroutine that can read any .WAV file. So it is now possible to produce speech with a different source: instead of the vocal cord model, which results in speech that sounds sort of human, I can use e.g. a bass guitar and have the output sound like a bass guitar that is actually speaking. So imagine the input text for Formant-O-Matic to be the all-explaining term 'joh', and add the timbre of a bass guitar to that. This combination is ideally suited for use in a house production!
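Reading a sample to use as the source could look roughly like the sketch below. It uses Python's standard wave module and only handles 16-bit PCM files, which is an assumption of this example and not a statement about what Formant-O-Matic itself accepts. The helper names read_wav_mono16 and write_wav_mono16 are hypothetical.

```python
import struct
import wave

def read_wav_mono16(path):
    """Read a 16-bit PCM WAV file; return (samples scaled to -1..1, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Samples are interleaved per channel; keep only the first channel.
    mono = values[::channels]
    return [v / 32768.0 for v in mono], rate

def write_wav_mono16(path, samples, sample_rate):
    """Write samples in -1..1 as a mono 16-bit PCM WAV file."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

The list of samples returned by read_wav_mono16 can then take the place of the vocal cord pulse as input to the filter.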

Now it gets a bit more complicated. The essence of producing speech is to make a sequence of sounds in the right order. These sounds may vary over time in amplitude and in frequency content. So what Formant-O-Matic provides you with is the ability to read a glottal pulse replacement, like a pulse from a bass guitar. What you do next is input the text you want to synthesize. Then an amplitude envelope for the guitar pulse is generated, along with the center frequencies and bandwidths for the filter. Since the amplitude envelope and filter settings may vary over time during the utterance, a whole sequence of them is generated and depicted in 3 plotting areas. In fact, a set of parameters is generated for every 5 ms of speech.

The values needed are read from a file that contains information on frequencies and amplitudes for the Dutch language. The values are valid for steady-state sounds, so in a sequence of sounds Formant-O-Matic uses linear interpolation between the steady states of the parameters. In real speech the transitions are more complex (context dependent). If the output of the synthesis process isn't what was expected, it is possible to change the parameters over time by simply dragging them to the desired value. To make it easier to adjust the filter parameters, the waveform display (a typical time domain representation) showing the input and output signal can be exchanged for one showing a sonagram (i.e. a frequency domain representation).
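The linear interpolation between steady states, with one parameter set every 5 ms, can be sketched as follows. The function name, the (time, value) target format and the example first-formant values are assumptions made for this illustration. They are not taken from the Dutch parameter file.

```python
FRAME_MS = 5  # one parameter set per 5 ms of speech

def interpolate_track(targets, frame_ms=FRAME_MS):
    """Linearly interpolate one parameter between steady-state targets.

    targets: at least two (time_ms, value) pairs with strictly increasing times.
    Returns one value per frame, from the first to the last target time.
    """
    frames = []
    seg = 0
    t = targets[0][0]
    t_end = targets[-1][0]
    while t <= t_end:
        # Move on to the segment that contains the current frame time.
        while seg < len(targets) - 2 and t > targets[seg + 1][0]:
            seg += 1
        (t0, v0), (t1, v1) = targets[seg], targets[seg + 1]
        frac = (t - t0) / (t1 - t0)
        frames.append(v0 + frac * (v1 - v0))
        t += frame_ms
    return frames

# Hypothetical first-formant track: glide from 300 Hz to 700 Hz in 20 ms, then hold.
f1_track = interpolate_track([(0, 300.0), (20, 700.0), (40, 700.0)])
print(f1_track)
# prints [300.0, 400.0, 500.0, 600.0, 700.0, 700.0, 700.0, 700.0, 700.0]
```

The same routine would be run once per parameter (amplitude, each center frequency, each bandwidth). Dragging a point in the plotting area then amounts to overwriting a single value in the resulting list.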


But enough talk, listen to these examples. They are self-explanatory!

  • Input example 1: Single note played with a Kurzweil K2000 slap bass
  • Output example 1: Synthesized output
  • Input example 2: Some notes played with a Kurzweil K2000 slap bass
  • Output example 2: Short house style loop

    Literature: Bouten, J.S. Integratie van het Klatt '90 bronmodel in de Allofoonsynthese werkomgeving.
    Doctoraalscriptie Taal, Spraak en Informatica, Katholieke Universiteit Nijmegen, 1991.






    Last change: 17 dec 2003,
    ©1999-3001 Zaphod B., all rights reserved.
    Content subject to change without notice.
    Content provided 'as is', see disclaimer.