MaxMBROLA Project
A MBROLA-based real-time voice synthesizer for Max/MSP

NEW (as of 2010): MAXMBROLA support is now continued here

MaxMBROLA~ external object: MBROLA inside Max/MSP

Max/MSP objects work as small servers. They are initialized when they are imported into the workspace, and they contain a set of dedicated functions (methods) which are activated when the object receives particular messages. These messages can be simple numbers, symbols, or complex messages with a header and arguments. Given this real-time, request-based communication protocol between objects, our task is to define a particular set of messages (headers and arguments).
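The request-based behaviour described above can be illustrated with a small sketch. This is not the Max/MSP API and not the code of the external: the class name MessageServer and its methods are hypothetical, and the replies are placeholder strings. It only shows the idea of splitting a message into a header and arguments and dispatching it to a dedicated method.

```python
from typing import Callable, Dict, List


class MessageServer:
    """Toy model of a Max-style object: messages in, method calls out."""

    def __init__(self) -> None:
        # Maps a message header (first word) to the method that handles it.
        self._methods: Dict[str, Callable[[List[str]], str]] = {}

    def register(self, header: str, method: Callable[[List[str]], str]) -> None:
        """Associate a header with the method to call when it is received."""
        self._methods[header] = method

    def receive(self, message: str) -> str:
        """Split a message into header and arguments, then dispatch it."""
        parts = message.split()
        if not parts:
            return "empty message"
        header, args = parts[0], parts[1:]
        method = self._methods.get(header)
        if method is None:
            return f"unknown message: {header}"
        return method(args)


# Hypothetical handlers; a real object would change the synthesizer state.
server = MessageServer()
server.register("pause", lambda args: "synthesis paused")
server.register("voice", lambda args: f"loading database {args[0]}")

print(server.receive("pause"))
print(server.receive("voice MyHD:Users:TCTSLab:MyVoices:fr1:fr1"))
```

A message whose header has no registered method is reported rather than silently dropped, which is a design choice of this sketch only.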


Fig 1. Internal structure of the MaxMBROLA~ external object.

As shown in Figure 1, the possible requests can be separated into two main channels. On one side, there is parameter modification, which influences the internal state of the synthesizer. On the other side, there is the phonetic/prosodic stream, which generates speech instantaneously.

Available actions of the object


Fig 2. Supported messages of the MaxMBROLA~ external object.

Internal state modifications

Some modifications of the internal state of the MBROLA synthesizer can be applied with Max/MSP requests. Here is a description of the supported actions. The labels used to name the inlets (from left to right: Messages, Fs, Time, Pitch and Voice) and examples of the supported messages are illustrated in Figure 2.

  • The use of the synthesizer always starts with the initialization task (Messages inlet). That function starts the MBROLA engine, loads the requested diphone database and sets all the internal parameters to their default values. All the existing MBROLA databases are compatible with the external. Here are some details about using MBROLA voices.

    voice MyHD:Users:TCTSLab:MyVoices:fr1:fr1

  • The stream provided by the external can be frozen (Messages inlet). It means that the phonetic/prosodic content stays in memory but the MBROLA engine stops the synthesis task.

    pause

  • The MBROLA engine can be stopped (Messages inlet). That function flushes the phonetic/prosodic content, stops the synthesis process and sets all the internal parameters to their default values. The database of diphones stays loaded.

    stop

  • The Fs inlet receives a floating-point number that controls the output sampling rate. The original sampling rate depends on the database (16000 Hz or 22050 Hz); linear interpolation is performed so that the external object can be used with any sampling rate.

  • The Time, Pitch and Voice inlets each receive a floating-point number. These values are respectively the time ratio (deviation from the reference speed of speech), the pitch ratio (deviation from the reference fundamental frequency of speech) and the voice ratio (compression/dilation ratio of the spectrum width). For each inlet, 1.0 is the default value. The object doesn't transmit values lower than 0.01 (that is, 100 times lower than the default value).
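The sampling-rate conversion mentioned for the Fs inlet can be sketched as follows. This is a minimal offline illustration of linear interpolation, not the code of the external, which works in real time inside Max/MSP. The function name resample_linear and its parameters are hypothetical.

```python
from typing import List


def resample_linear(samples: List[float], src_rate: float, dst_rate: float) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    # Number of output samples covering the same duration as the input.
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    # Distance, in input samples, between two consecutive output samples.
    step = src_rate / dst_rate
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final input sample.
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# A 16000 Hz ramp converted to twice the rate: new samples fall halfway
# between the original ones.
print(resample_linear([0.0, 1.0, 2.0, 3.0], 16000, 32000))
```

With a 16000 Hz database and a 44100 Hz audio setting, the same function would be called with src_rate=16000 and dst_rate=44100.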

Phonetic/prosodic stream processing

Here are the requests that generate speech in the Max environment. All of the following messages are sent to the Messages inlet.

  • A loading request allows a standard *.pho file (which includes the list of phonemes to be produced and the target prosody) to be used for synthesis. Examples are available together with MBROLA voices, and a complete explanation of the standard SAMPA notation (Speech Assessment Methods Phonetic Alphabet, the machine-readable phonetic alphabet used in many speech synthesizers) is given here.

    phostream MyHD:Users:TCTSLab:MyPhoFiles:mavoix.pho

  • We developed a function that directly accepts SAMPA streams inside Max messages, giving the user interactive control over speech production. The standard SAMPA notation has been modified to fit the Max message structure. For example, the following stream:

    phonemes & b 50 0 156 * a 150 100 204 #

    begins by initializing the synthesizer, then produces a syllable /ba/ of 200 (50 + 150) milliseconds with a fundamental frequency increasing from 156 Hz to 204 Hz (two pitch points). Finally, it flushes the phoneme buffer. More details about the syntax can be found here.
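To make the structure of the example stream concrete, here is a small parsing sketch. It is not the parser of the external. It assumes that a leading & is the initialization marker, a trailing # is the flush marker and * separates consecutive phonemes, which is a reading of the example above rather than a documented grammar. The names Phoneme, parse_phonemes_message and to_pho_lines are hypothetical. Each phoneme is taken to be a symbol, a duration in milliseconds, and pairs of (position in % of the duration, pitch in Hz), as in standard *.pho files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phoneme:
    symbol: str                               # SAMPA symbol, e.g. "b"
    duration_ms: float                        # duration in milliseconds
    pitch_points: List[Tuple[float, float]]   # (position in %, F0 in Hz)


def parse_phonemes_message(message: str) -> List[Phoneme]:
    """Parse a 'phonemes & ... * ... #' message into a list of phonemes."""
    tokens = message.split()
    if not tokens or tokens[0] != "phonemes":
        raise ValueError("expected a message starting with 'phonemes'")
    body = tokens[1:]
    # Assumption: '&' (initialize) may only lead and '#' (flush) may only
    # trail; neither carries phoneme data.
    if body and body[0] == "&":
        body = body[1:]
    if body and body[-1] == "#":
        body = body[:-1]
    phonemes: List[Phoneme] = []
    group: List[str] = []
    for token in body + ["*"]:  # trailing separator closes the last group
        if token != "*":
            group.append(token)
            continue
        if not group:
            continue
        # A group is: symbol, duration, then (position, pitch) pairs.
        if len(group) % 2 != 0:
            raise ValueError(f"malformed phoneme group: {group}")
        values = [float(v) for v in group[2:]]
        points = list(zip(values[0::2], values[1::2]))
        phonemes.append(Phoneme(group[0], float(group[1]), points))
        group = []
    return phonemes


def to_pho_lines(phonemes: List[Phoneme]) -> List[str]:
    """Render phonemes as lines in the standard *.pho layout."""
    lines = []
    for p in phonemes:
        fields = [p.symbol, f"{p.duration_ms:g}"]
        for pos, f0 in p.pitch_points:
            fields += [f"{pos:g}", f"{f0:g}"]
        lines.append(" ".join(fields))
    return lines


stream = "phonemes & b 50 0 156 * a 150 100 204 #"
parsed = parse_phonemes_message(stream)
print(sum(p.duration_ms for p in parsed))  # total duration in milliseconds
print(to_pho_lines(parsed))
```

For the example stream, the two parsed phonemes add up to the 200 milliseconds mentioned above, and each one carries a single pitch point.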

The MaxMBROLA_NR~ external object: let Max/MSP resample!

Coming soon...

Some sound examples

MaxMBROLA_NR_phofile.mp3
Sound extracted from Max/MSP (sfrecord~ object). The MaxMBROLA_NR~ object, in a poly~ sub-patch with the down 2 argument (with fr4 loaded, a 22 kHz voice), receives the phostream request loading the file trenet.pho.

MaxMBROLA_phofile.mp3
Sound extracted from Max/MSP (sfrecord~ object). The MaxMBROLA~ object (with fr1 loaded, a 16 kHz voice) receives the phostream request loading the file trenet.pho.

MaxMBROLA Project - Nicolas D'Alessandro, Raphaël Sebbe, Baris Bozkurt & Thierry Dutoit
Laboratoire de Théorie des Circuits et Traitement du Signal - FPMs
Last update : 27 / 06 /2005