

Phoneme Tables

Phoneme files
Phoneme Definitions
Phoneme Properties


Type
Properties
Place of Articulation
starttype
endtype
lengthmod
voicingswitch

Phoneme Instructions


length
ipa
WAV
FMT
VowelStart
VowelEnding
Vowelin
Vowelout
ChangePhoneme
ChangeIfDiminished
ChangeIfUnstressed
ChangeIfNotStressed
ChangeIfStressed
IfNextVowelAppend
RETURN
CALL

Conditional Statements


Conditions
Attributes

Sound Specifications
Vowel Transitions


A phoneme table defines all the phonemes which are used by a language,
together with their properties and the data for their production as
sounds.

Generally each language has its own phoneme table, although additional
phoneme tables can be used for different voices within the language.
These alternatives are referenced from Voice files.

A phoneme table does not need to define all the phonemes used by a
language. It can inherit the phonemes from a previously defined phoneme
table. For example, a phoneme table may redefine (or add) some of the
vowels that it uses, but inherit most of its consonants from a standard
set.

The source files for the phoneme data are in the "phsource" directory in
the espeakedit download package. "Vowel files", which are referenced in
FMT(), VowelStart(), and VowelEnding() instructions are made using the
espeakedit program.
Phoneme Files
The phoneme tables are defined in a master phoneme file, named
phonemes. This starts with the base phoneme table followed by
phoneme tables for other languages and voices. These inherit phonemes
from the base table or previously defined tables.

In addition to phoneme definitions, the phoneme file can contain the
following:


    include <filename>

  
Includes the text of the specified file at this point. This allows
different phoneme tables to be kept in different text files, for
convenience. <filename> is a relative path. The included file can
itself contain include statements.
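
The nested behaviour of include can be sketched in Python. This is only an illustration, not the eSpeak NG implementation: the `files` mapping stands in for real file access, the file names are hypothetical, relative path resolution is left out, and the circular-include check is an assumption added for safety.

```python
def expand_includes(name, files, _active=()):
    """Return the lines of `name` with every `include <filename>` line expanded.

    `files` is a hypothetical in-memory mapping of file name to text,
    used here instead of reading from disk.
    """
    if name in _active:
        # Assumption: a file that includes itself (directly or not) is an error.
        raise ValueError(f"circular include: {name}")
    lines = []
    for line in files[name].splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "include":
            # Included files may themselves contain include statements.
            lines.extend(expand_includes(words[1], files, _active + (name,)))
        else:
            lines.append(line)
    return lines


files = {
    "phonemes": "first line\ninclude part_one\nlast line",
    "part_one": "part one text\ninclude part_two",
    "part_two": "part two text",
}
print(expand_includes("phonemes", files))
```

The included text appears exactly where the include line was, so the order of definitions in the combined file follows the order of the include statements.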


    phonemetable <name> <parent>

  
Starts a new phoneme table, and ends the previous table.

<name> is the name of this phoneme table. This name is used in Voice files.

<parent> is the name of a previously defined phoneme table whose phoneme
definitions are inherited by this one. The name base indicates the first
(base) phoneme table.
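
Inheritance through <parent> can be pictured as a lookup that walks up the chain of tables. The sketch below is a simplified model under stated assumptions: the table names `en` and `en-us` and the string "definitions" are hypothetical, and the real program may resolve inheritance differently (for example when the phoneme file is compiled).

```python
# Hypothetical tables: each names its parent and holds its own definitions.
tables = {
    "base": {"parent": None, "phonemes": {"t": "base t", "s": "base s"}},
    "en": {"parent": "base", "phonemes": {"aI": "en aI"}},
    "en-us": {"parent": "en", "phonemes": {"t": "en-us t"}},
}


def lookup(table_name, phoneme):
    """Find a phoneme in the named table, falling back to its parents."""
    name = table_name
    while name is not None:
        table = tables[name]
        if phoneme in table["phonemes"]:
            return table["phonemes"][phoneme]
        name = table["parent"]
    raise KeyError(phoneme)


print(lookup("en-us", "t"))   # redefined in en-us
print(lookup("en-us", "aI"))  # inherited from en
print(lookup("en-us", "s"))   # inherited from base
```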
Phoneme Definitions
A phoneme table contains a list of phoneme definitions. Each starts with
the keyword phoneme and the phoneme name (this is the name used in
the pronunciation rules in a language's *_rules and *_list files),
and ends with the keyword endphoneme.

The phoneme mnemonics are based on the scheme by
Evan Kirshenbaum,
which represents International Phonetic Alphabet symbols using ASCII
characters.

For example:


    phoneme aI
      vowel
      starttype #a endtype #i
      length 230
      FMT(vowels/ai)
    endphoneme

    phoneme s
      vls alv frc sibilant
      voicingswitch z
      lengthmod 3
      Vowelin  f1=0  f2=1700 -300 300  f3=-100 80
      Vowelout f1=0  f2=1700 -300 250  f3=-100 80  rms=20

      IF nextPh(isPause) THEN
        WAV(ufric/s_)
      ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN
        WAV(ufric/s!)
      ENDIF
      WAV(ufric/s)
    endphoneme

  
A phoneme definition contains both static properties and executed
instructions. The instructions may contain conditional statements, so
that the effect of the phoneme may be different depending on adjacent
phonemes, whether the syllable is stressed, etc.

The instructions of a phoneme are interpreted in two different phases.
In the first phase, the instructions may change the phoneme and replace
it by a different phoneme. In the second phase, instructions are used to
produce the sound for the phoneme.

The import_phoneme statement can be used to copy a previously
defined phoneme from a specified phoneme table. For example:


    phoneme t
      import_phoneme base/t[
    endphoneme

means: phoneme t in this phoneme table is a copy of phoneme t[ from phoneme
table base. A length instruction can be used after import_phoneme to
vary the length from the original.
Phoneme Properties
Within the phoneme definition the following lines may occur. (V) marks items
that apply only to vowels, and (C) items that apply only to consonants.
Type
One of these must be present.


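| Type | Description |
|------|-------------|
| `vowel` | |
| `liquid` | semi-vowels, such as: `r`, `l`, `j`, `w` |
| `nasal` | nasal e.g.: `m`, `n`, `N` |
| `stop` | stop (plosive) e.g.: `p`, `b`, `t`, `d`, `k`, `g` |
| `frc` | fricative e.g.: `f`, `v`, `T`, `D`, `s`, `z`, `S`, `Z`, `C`, `x` |
| `afr` | affricate e.g.: `tS`, `dZ` |
| `pause` | |
| `stress` | Used for stress symbols, e.g.: `'` `,` `=` `%` |
| `virtual` | Used to represent a class of phonemes. |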


Properties


| Property | Applies to | Description |
|----------|------------|-------------|
| `vls` | (C) | voiceless e.g. `p`, `t`, `k`, `f`, `s` |
| `vcd` | (C) | voiced e.g. `b`, `d`, `g`, `v`, `z` |
| `sibilant` | (C) | e.g.: `s`, `z`, `S`, `Z`, `tS`, `dZ` |
| `palatal` | (C) | A palatal or palatalized consonant. |
| `rhotic` | (C) | An `r` type consonant. |
| `unstressed` | (V) | This vowel is always unstressed, unless explicitly marked otherwise. |
| `nolink` | | Prevent any linking from the previous phoneme. |
| `nopause` | | Used in a `liquid` or `nasal` phoneme to prevent eSpeak NG inserting a short pause if a word starts with this phoneme and the previous word ends with a vowel. |
| `trill` | (C) | Apply trill to the voicing. |


Place of Articulation


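| Articulation | Applies to | Description |
|--------------|------------|-------------|
| `blb` | (C) | bilabial |
| `ldb` | (C) | labio-dental |
| `dnt` | (C) | dental |
| `alv` | (C) | alveolar |
| `rfx` | (C) | retroflex |
| `pla` | (C) | palato-alveolar |
| `pal` | (C) | palatal |
| `vel` | (C) | velar |
| `lbv` | (C) | labio-velar |
| `uvl` | (C) | uvular |
| `phr` | (C) | pharyngeal |
| `glt` | (C) | glottal |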


starttype

  
    starttype <phoneme>

  
Allocates this phoneme to a group so that conditions such as nextPh(#e) can
test for any of a group of phonemes. The pre-defined groups for vowels are:
#@ #a #e #i #o #u. Additional groups can be defined as phonemes
with type virtual.
endtype

  
    endtype <phoneme>

  
Allocates this phoneme to a group so that conditions such as prevPh(#e) can
test for any of a group of phonemes. The pre-defined groups for vowels are:
#@ #a #e #i #o #u. Additional groups can be defined as phonemes
with type virtual.
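
The difference between starttype and endtype matters mostly for diphthongs. The following Python sketch models it for the aI phoneme from the earlier example, which has starttype #a and endtype #i. The dictionary layout and function names are hypothetical, not part of eSpeak NG.

```python
# Group membership of the example diphthong, as declared in its definition.
PHONEMES = {
    "aI": {"starttype": "#a", "endtype": "#i"},
}


def next_ph_in_group(next_phoneme, group):
    """Model of nextPh(<group>): the following phoneme is judged by how it starts."""
    return PHONEMES[next_phoneme]["starttype"] == group


def prev_ph_in_group(prev_phoneme, group):
    """Model of prevPh(<group>): the preceding phoneme is judged by how it ends."""
    return PHONEMES[prev_phoneme]["endtype"] == group


print(next_ph_in_group("aI", "#a"))  # True: aI starts like an #a vowel
print(prev_ph_in_group("aI", "#i"))  # True: aI ends like an #i vowel
print(prev_ph_in_group("aI", "#a"))  # False
```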
lengthmod

  
    lengthmod <integer>

  
(C) Determines how this consonant affects the length of the previous vowel.

This value is used as an index into the length_mods table in the CalcLengths()
function in the eSpeak NG program.
voicingswitch

  
    voicingswitch <phoneme>

  
This is used for some languages to change between voiced and unvoiced phonemes.
Phoneme Instructions
Phoneme Instructions may be included within conditional statements.

During the first phase of phoneme interpretation, an instruction which
causes a change to a different phoneme will terminate the instructions.
During the second phase, FMT() and WAV() instructions will terminate the
instructions.
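
Because the first WAV() or FMT() reached ends the second phase, an instruction placed after an IF block acts as the default case. The Python sketch below mirrors the structure of the s phoneme shown earlier, with `return` standing in for that termination. Representing the neighbouring phoneme as a plain string (and a pause as "pause") is an assumption made for the illustration.

```python
def sound_for_s(next_phoneme):
    """Return the WAV path the `s` example would reach first."""
    if next_phoneme == "pause":            # IF nextPh(isPause) THEN
        return "ufric/s_"
    elif next_phoneme in ("p", "t", "k"):  # ELIF nextPh(p) OR nextPh(t) OR nextPh(k) THEN
        return "ufric/s!"
    # Reached only when no earlier WAV() ended the instructions.
    return "ufric/s"


for following in ("pause", "t", "aI"):
    print(following, sound_for_s(following))
```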
length

  
    length <length>

  
The relative length of the phoneme in milliseconds. Typical values are about
140 for a short vowel and from 200 to 300 for a long vowel or diphthong.
A length instruction is needed for vowels. It is optional for consonants.
ipa

  
    ipa <ipa string>

  
In many cases, eSpeak NG makes IPA (International Phonetic Alphabet) phoneme
names automatically from eSpeak NG phoneme names. If this is not correct, then
the phoneme definition can include an ipa instruction to specify the correct
IPA name. IPA strings may include non-ASCII characters. They may also include
characters specified by their character codes in the form U+ followed by 4
hexadecimal digits. For example a string: aU+0303 indicates 'a' with a
'combining tilde'.
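
The U+ notation can be illustrated with a short Python function that expands such codes into the characters they name. This is a sketch of the notation only. The function name is hypothetical, and how eSpeak NG itself parses these strings is not shown here.

```python
import re


def decode_ipa(text):
    """Replace each U+XXXX code (4 hexadecimal digits) with its character."""
    return re.sub(
        r"U\+([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


decoded = decode_ipa("aU+0303")
print(decoded)       # 'a' followed by a combining tilde
print(len(decoded))  # 2
```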
WAV

  
    WAV(<wav file>, <amplitude>)

  
<wav file> is a path to a WAV file (22 kHz, 16 bits, mono) within phsource/
which will be played to produce the sound. This method is used for unvoiced
consonants. <wav file> does not include a .WAV filename extension, although
the file to which it refers may or may not have one.

<amplitude> is optional. It is a percentage change to the amplitude of the
WAV file. So, WAV(ufric/s, 50) means: play file 'ufric/s.wav' at 50% amplitude.
Default value is 100.
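
The optional amplitude argument and its default can be modelled in a few lines of Python. The parser and the scaling function below are hypothetical helpers written for this illustration. They assume the amplitude is applied as a simple percentage of each sample, which the text implies but does not spell out.

```python
import re


def parse_wav(instruction):
    """Split a WAV(...) instruction into (path, amplitude percent)."""
    match = re.fullmatch(
        r"WAV\(\s*([^,\s)]+)\s*(?:,\s*(-?\d+)\s*)?\)", instruction.strip()
    )
    if match is None:
        raise ValueError(f"not a WAV instruction: {instruction}")
    path = match.group(1)
    # The amplitude defaults to 100 when it is omitted.
    amplitude = int(match.group(2)) if match.group(2) else 100
    return path, amplitude


def scale(samples, amplitude):
    """Apply an amplitude percentage to a list of integer samples."""
    return [int(sample * amplitude / 100) for sample in samples]


print(parse_wav("WAV(ufric/s, 50)"))
print(parse_wav("WAV(ufric/s_)"))
print(scale([1000, -2000], 50))
```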
FMT

  
    FMT(<vowel file>, <amplitude>)

  
<vowel file> is a path to a file (within phsource/) which defines how to
generate the sound (a vowel or voiced consonant) from a sequence of formant
values. Vowel files are made using the espeakedit program, which is not part
of this project.

<amplitude> is optional. It is a percentage change to the amplitude of the
sound which is synthesized from the FMT() instruction. Default value is 100.


    addWav(<wav file>, <amplitude>)
  

For voiced consonants, a FMT() instruction may be followed by an addWav()
instruction. addWav() has the same format as a WAV() instruction, but the
WAV file is mixed with the sound which is synthesized from the FMT() instruction.
VowelStart

  
    VowelStart(<vowel file>, <length adjust>)

  
This is used to modify the start of a vowel when it follows a sonorant consonant
(such as [l] or [j]). It replaces the first frame of the <vowel file> which
is specified in a FMT() instruction by this <vowel file>, and adjusts the
length of the original by a signed value <length adjust>. The VowelStart()
instruction may be specified either in the phoneme definition of the vowel, or
in the phoneme definition of the sonorant consonant which precedes the vowel.
The former takes precedence.
VowelEnding

  
    VowelEnding(<vowel file>, <length adjust>)

  
This is used to modify the end of a vowel when it is followed by a sonorant
consonant (such as [l] or [j]). This <vowel file> is appended to the
<vowel file> which is specified in a FMT() instruction, and the length of
the original is adjusted by a signed value <length adjust>. The VowelEnding()
instruction may be specified either in the phoneme definition of the vowel, or
in the phoneme definition of the sonorant consonant which follows the vowel.
The former takes precedence.
Vowelin

  
    Vowelin <vowel transition data>

  
(C) Specifies the effects of this consonant on the formants of a following
vowel. See vowel transitions.
Vowelout

  
    Vowelout <vowel transition data>

  
(C) Specifies the effects of this consonant on the formants of a preceding
vowel. See vowel transitions.
ChangePhoneme

  
    ChangePhoneme(<phoneme>)

  
Change to the specified phoneme.
ChangeIfDiminished

  
    ChangeIfDiminished(<phoneme>)

  
Change to the specified phoneme (such as schwa, @) if this syllable has
"diminished" stress.
ChangeIfUnstressed

  
    ChangeIfUnstressed(<phoneme>)

  
Change to the specified phoneme if this syllable has "diminished" or
"unstressed" stress.
ChangeIfNotStressed

  
    ChangeIfNotStressed(<phoneme>)

  
Change to the specified phoneme if this syllable does not have "primary" stress.
ChangeIfStressed

  
    ChangeIfStressed(<phoneme>)

  
Change to the specified phoneme if this syllable has "primary" stress.
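
The four stress-dependent change instructions differ only in which stress levels trigger them. The Python sketch below puts the rules side by side. The stress level names come from the descriptions above, except "secondary", which is an assumed extra level used to show the difference between ChangeIfUnstressed and ChangeIfNotStressed. The vowel name "a" is a hypothetical example.

```python
def applies(instruction, stress):
    """Say whether a change instruction fires for the given syllable stress."""
    if instruction == "ChangeIfDiminished":
        return stress == "diminished"
    if instruction == "ChangeIfUnstressed":
        return stress in ("diminished", "unstressed")
    if instruction == "ChangeIfNotStressed":
        return stress != "primary"
    if instruction == "ChangeIfStressed":
        return stress == "primary"
    raise ValueError(f"unknown instruction: {instruction}")


def resolve(current, instructions, stress):
    """Return the phoneme after the first change instruction that fires."""
    for instruction, target in instructions:
        if applies(instruction, stress):
            # A change to another phoneme ends the first phase.
            return target
    return current


# A hypothetical vowel "a" that reduces to schwa (@) when unstressed.
print(resolve("a", [("ChangeIfUnstressed", "@")], "diminished"))  # @
print(resolve("a", [("ChangeIfUnstressed", "@")], "secondary"))   # a
print(resolve("a", [("ChangeIfNotStressed", "@")], "secondary"))  # @
```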
IfNextVowelAppend

  
    IfNextVowelAppend(<phoneme>)

  
If the following phoneme is a vowel then this additional phoneme will be
inserted before it. Usually it is a short pause, to distinguish two adjacent
vowels from a diphthong.
RETURN
Ends execution of the instructions.
CALL

  
    CALL <phoneme table>/<phoneme>

  
Executes the instructions of the specified phoneme.
Conditional Statements
Phoneme definitions can contain conditional statements such as:


    IF <condition> THEN
        <statements>
    ENDIF

  
or more generally:


    IF <condition> THEN
        <statements>
    ELIF <condition> THEN
        <statements>
    ...
    ELSE
        <statements>
    ENDIF

where the ELSE part and the multiple ELIF parts are optional.

Multiple conditions may be joined with AND or OR, but not a mixture of
ANDs and ORs.

A condition may be preceded by NOT. For example:


    IF <condition> AND NOT <condition> THEN
        <statements>
    ENDIF
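
The rule that AND and OR cannot be mixed means a condition list is either "all of these" or "any of these". A small Python evaluator makes that concrete. It is a sketch only: the token list and the `facts` mapping of condition text to truth value are hypothetical stand-ins for the real phoneme context.

```python
def eval_condition(tokens, facts):
    """Evaluate conditions joined by AND or by OR, each optionally preceded by NOT."""
    joiners = {token for token in tokens if token in ("AND", "OR")}
    if len(joiners) > 1:
        raise ValueError("AND and OR cannot be mixed in one condition")
    results = []
    negate = False
    for token in tokens:
        if token in ("AND", "OR"):
            continue
        if token == "NOT":
            negate = True
            continue
        value = facts[token]
        results.append(not value if negate else value)
        negate = False
    return any(results) if "OR" in joiners else all(results)


facts = {"nextPh(p)": False, "nextPh(t)": True, "nextPh(k)": False}
print(eval_condition(["nextPh(p)", "OR", "nextPh(t)", "OR", "nextPh(k)"], facts))
print(eval_condition(["nextPh(t)", "AND", "NOT", "nextPh(p)"], facts))
```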

  
Conditions
Conditions can be:


    thisPh(<attribute>)

  
Test the current phoneme.


    prevPh(<attribute>)

  
Test the previous phoneme.


    prevPhW(<attribute>)

  
Test the previous phoneme, but only within the same word. Returns false if
there is no previous phoneme in the word.


    prev2PhW(<attribute>)

  
Test the phoneme before the previous phoneme, but only within the same word.
Returns false if it is not in this word.


    nextPh(<attribute>)

  
Test the following phoneme.


    next2Ph(<attribute>)

  
Test the phoneme after the next phoneme.


    nextPhW(<attribute>)

  
Test the next phoneme, but only within the same word. Returns false if there
is no following phoneme in the word.


    next2PhW(<attribute>)

  
Test the phoneme after the next phoneme, but only within the same word. Returns
false if not found before the word end.


    next3PhW(<attribute>)

  
Test the third phoneme after the current phoneme, but only within the same word.
Returns false if not found before the word end.


    nextVowel(<attribute>)

  
Test the next vowel after the current phoneme, but only within the same word.
Returns false if there is none.


    prevVowel(<attribute>)

  
Test the previous vowel before the current phoneme, but only within the same
word. Returns false if there is none.


    PreVoicing()

  
This is used as part of the instructions for voiced stop consonants (e.g. [d]
and [g]). If true then produce a voiced murmur before the stop.


    KlattSynth()

  
Returns true if the voice is using the Klatt synthesizer rather than the eSpeak synthesizer.
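
The word-bounded conditions (prevPhW, prev2PhW, nextPhW, next2PhW, next3PhW) all follow one pattern: look at a fixed offset from the current phoneme, and return false if that position falls outside the word. The Python sketch below models this, treating a word as a list of phoneme names. The function name and the representation are hypothetical.

```python
def check_in_word(word, index, offset, predicate):
    """Apply predicate to the phoneme at index + offset, or return False
    when that position lies outside the word."""
    position = index + offset
    if position < 0 or position >= len(word):
        return False
    return predicate(word[position])


word = ["s", "t", "aI"]

# prevPhW at the first phoneme: there is no previous phoneme in the word.
print(check_in_word(word, 0, -1, lambda ph: True))        # False
# next2PhW(aI) at the first phoneme.
print(check_in_word(word, 0, 2, lambda ph: ph == "aI"))   # True
# nextPhW at the last phoneme: nothing follows within the word.
print(check_in_word(word, 2, 1, lambda ph: True))         # False
```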
Attributes

  
    <phoneme name>

  
True if the phoneme has this phoneme name.


    <phoneme group>

  
True if the phoneme has this starttype (or if it has this endtype if it is
used in prevPh()). The pre-defined phoneme groups are #@, #a, #e, #i,
#o, #u.


    isPause

  
True if the phoneme is a pause.


    isPause2

  
nextPh(isPause2) is used to test whether the next phoneme is not a vowel or
liquid consonant within the same word.


    isVowel
    isNotVowel
    isLiquid
    isNasal
    isVFricative

  
These test the phoneme type.


    isPalatal
    isRhotic

  
These test whether the phoneme has this property.


    isWordStart
    notWordStart

These test whether this is the first phoneme in a word.


    isWordEnd

  
True if this is the final phoneme in a word.


    isFirstVowel
    isSecondVowel
    isFinalVowel

True if this is the first, second, or final vowel in a word.


    isAfterStress

  
True if this phoneme is after the stressed vowel in a word.


    isVoiced

  
True if this phoneme is a vowel or a voiced consonant.


    isDiminished

  
True if the syllable stress is "diminished".


    isUnstressed

  
True if the syllable stress is "diminished" or "unstressed".


    isNotStressed

  
True if the syllable stress is not "primary stress".


    isStressed

  
True if the syllable stress is "primary stress".


    isMaxStress

  
True if this is the highest stressed syllable in the word.
Sound Specifications
There are three ways to produce sounds:


1. Playing a WAV file, by using a WAV() instruction. This is used for unvoiced
   consonants such as [p], [t] and [s].
2. Generating a wave from a sequence of formant parameters, by using a FMT()
   instruction. This is used for vowels and also for sonorants such as [l],
   [j] and [n].
3. A mixture of these. A stored WAV file is mixed with a wave generated from
   formant parameters. Use a FMT() instruction followed by addWav(). This is
   used for voiced stops and fricatives such as [b], [g], [v] and [z].

Vowel Transitions
These specify how a consonant affects an adjacent vowel. A consonant may
cause a transition in the vowel's formants as the mouth changes shape
between the consonant and the vowel. The following attributes may be
specified. Note that the maximum rate of change of formant frequencies
is limited by the program.


    len=<integer>

  
Nominal length of the transition in milliseconds. If omitted, a default value is used.


    rms=<integer>

  
Adjusts the amplitude of the vowel at the end of the transition. If omitted
a default value is used.


    f1=<integer>

  
0: f1 formant frequency unchanged.

1: f1 formant frequency decreases.

2: f1 formant frequency decreases more.


    f2=<freq> <min> <max>

  
<freq>: The frequency towards which the f2 formant moves (Hz).

<min>: Signed integer (Hz). The minimum f2 frequency change.

<max>: Signed integer (Hz). The maximum f2 frequency change.


    f3=<change> <amplitude>

  
<change>: Signed integer (Hz). Frequency change of f3, f4, and f5 formants.

<amplitude>: Amplitude of the f3, f4, and f5 formants at the end of the
transition. 100 = no change.


    brk

  
Break. Do not merge the synthesized wave of the consonant into the vowel. This
will produce a discontinuity in the formants.


    rate

  
Allow a greater maximum rate of change of formant frequencies.


    glstop

  
Indicates a glottal stop.
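
To show how these attributes combine on one line, the Python sketch below splits transition data, such as the Vowelout line of the s phoneme shown earlier, into named values. It is a hypothetical reader written for this illustration, not the parser used by eSpeak NG. It assumes that bare numbers belong to the most recent name=value attribute and that brk, rate and glstop are simple flags.

```python
FLAGS = ("brk", "rate", "glstop")


def parse_transition(data):
    """Parse vowel transition data into a dict of integer lists and flags."""
    result = {}
    key = None
    for token in data.split():
        if "=" in token:
            key, value = token.split("=", 1)
            result[key] = [int(value)]
        elif token in FLAGS:
            result[token] = True
            key = None
        else:
            if key is None:
                raise ValueError(f"value without an attribute: {token}")
            # Further numbers extend the current attribute, e.g. f2's min and max.
            result[key].append(int(token))
    return result


print(parse_transition("f1=0  f2=1700 -300 250  f3=-100 80  rms=20"))
```

For the line above, f2 yields the three values described under <freq>, <min> and <max>, and f3 yields <change> and <amplitude>.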