Wolfgang von Kempelen's Speaking Machine

A replica of Kempelen's Speaking Machine, built 2007-09 at the Department of Phonetics, Saarland University, Saarbrücken, Germany

Wolfgang von Kempelen's Speaking Machine is a manually operated speech synthesizer that the Austro-Hungarian author and inventor Wolfgang von Kempelen began developing in 1769. In the same year he completed his far more infamous contribution to history: The Turk, a chess-playing automaton later revealed to be an elaborate hoax operated by a human chess player concealed inside it.[4] But while the Turk’s construction was completed in six months, Kempelen’s Speaking Machine occupied the next twenty years of his life.[2] After two conceptual “dead ends” over the first five years of research, Kempelen’s third approach led him to the design he felt comfortable deeming “final”: a functional model of the human vocal tract.[3]

First design

Kempelen’s first experiment with speech synthesis involved only the most rudimentary elements of the vocal tract necessary to produce speech-like sounds. A kitchen bellows, of the kind used to stoke fires in wood-burning stoves, served as a set of lungs to supply the airflow. A reed taken from a common bagpipe served as the glottis, the source of the raw fundamental sound in the vocal tract. The bell of a clarinet made a sufficient mouth, despite its rigid form. This basic model could produce only simple vowel sounds, though some additional articulation was possible by positioning a hand at the bell opening to obstruct the airflow. It lacked the physical hardware needed to form the nasals, plosives and fricatives that most consonants require. Kempelen, like many other early pioneers of phonetics, mistakenly attributed the perceived “higher frequencies” of certain sounds to the glottis rather than to the formants of the entire vocal tract, so he abandoned his single-reed design for a multiple-reed approach.[2][3]

Second design

The second design involved a console, similar to that of a musical [...] resonant length was not crucial to the creation of the high-frequency components of certain vowels and fricatives, so he tuned them all to the same pitch for the sake of consistency between letters. While not all letters were represented at this point, Kempelen had developed the technology required to produce most vowels and several consonants, including the plosive “p” and the nasal “m”, and was thus in a position to begin forming syllables and short words. However, this immediately exposed the primary flaw of his second design: the parallel arrangement of the multiple reeds allowed more than one letter to be sounded at a time. In building syllables and words, the sonic “overlap” (now referred to as co-articulation) produced sounds very uncharacteristic of human speech, undermining the intention of the design altogether. Kempelen comments:
“In order to continue my experiments it was necessary, above all, that I should have a perfect knowledge of what I wanted to imitate. I had to make a formal study of speech and continually consult nature as I conducted my experiments. In this way my talking machine and my theory concerning speech made equal progress, the one serving as guide to the other.”[3]
“It was possible, following the methods I’d been using, to invent separate letters, but never to combine them to form syllables, and that it was absolutely necessary to follow nature which has only one glottis and one mouth, through which every sound emerges and which gives a unity to them.”[2][3]
Thus, Kempelen began work on his third and ultimately final design, which was in many ways a “close-as-possible” representation of the physiology of the vocal tract.

Third design

The third approach followed a similar design to the first, which was conceptually more faithful to the natural design of the human vocal tract than the second. It consisted, as before, of a bellows, a reed and a simulated mouth (this time made of [...] French, Italian and English (German was possible, but required greater skill from the operator, due to the more frequent use of consonants in the German language). Its greatest limitation was the bellows, which, despite having six times the capacity of human lungs, ran out of air much faster than their human counterpart. Because the design was based on a single reed as the glottal sound source, it had none of the co-articulation problems inherent in the second design. But that single reed also meant that the Speaking Machine “spoke” in a monotone.[4] Kempelen spent some time trying to introduce prosodic pitch-variation mechanisms into the reed assembly, but to no avail, and he decided to leave that improvement to later experimenters. All of these important additions to the third design came from Kempelen’s two decades of intensive research into the vocal tract in relation to spoken languages, in which the behavior of each crucial physiological element of speech production was scrutinized and replicated acoustically, mechanically, or both.[3]

A significant contribution

Von Kempelen died in 1804, shortly after the completion and exhibition of his Speaking Machine, though not before publishing an extremely comprehensive account of his previous twenty years of research in phonetics. The 456-page book, Mechanismus der menschlichen Sprache nebst Beschreibung einer sprechenden Maschine (The Mechanism of Human Speech, with a Description of a Speaking Machine), published in 1791,[2][4] documented every technical aspect of both Kempelen’s construction of the Speaking Machine (including the preliminary designs) and his studies of the human vocal tract.[3]

In 1837, Sir Charles Wheatstone revived the work of Wolfgang von Kempelen, creating an improved replica of his Speaking Machine.[3][4] Using new technology developed over the previous 50 years, Wheatstone was able to further analyze and synthesize components of acoustic speech, giving rise to a second wave of scientific interest in phonetics. After viewing Wheatstone’s improved replica of the Speaking Machine at an exposition, a young Alexander Graham Bell set out to construct his own speaking machine with the help and encouragement of his father.[4][5] Bell’s experiments and research ultimately led to his invention of the telephone in 1876,[4] which revolutionized global communication.

In 1968, Marcel van den Broecke (University of Amsterdam) built a replica as part of an MA thesis, on which he reported in "Wolfgang von Kempelen's Speaking Machine as a Performer", chapter 2 (pp. 9-19) of "Sound Structures", edited by Marcel van den Broecke, Vincent van Heuven and Wim Zonneveld (Foris Publications, Dordrecht, Netherlands/Cinnaminson, USA, 1983). Acoustic predictions, made by applying N-tube approximations of the vocal tract to the replica's characteristics, confirmed what had already been established perceptually, namely that the machine could produce only two vowel-like sounds: an /a/-like vowel and an /o/-like vowel. Of the consonants produced, the general-purpose plosive is very convincing. A general-purpose nasal can also easily be identified, but the sibilants and the rattling /r/ are as unpleasant as eyewitness von Windisch reported two centuries earlier.
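
To give a sense of what an N-tube approximation computes, the following is a minimal sketch of the simplest non-trivial case, N = 2: a lossless back tube closed at the glottis joined to a front tube open at the lips. It is not the model used in the thesis. The function name, the speed of sound, and all lengths and areas are illustrative assumptions and not measurements of the replica. Resonances are found numerically from the standard condition tan(k·l_back)·tan(k·l_front) = A_front/A_back, where k = 2πf/c.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s; assumed round figure for warm, humid air


def two_tube_resonances(l_back, a_back, l_front, a_front,
                        f_max=4000.0, step=1.0):
    """Resonance frequencies (Hz) below f_max of a lossless two-tube tract.

    The back tube is closed at the glottis and the front tube is open at
    the lips. Lengths are in metres; areas may use any common unit because
    only their ratio matters. (Hypothetical helper for illustration.)
    """

    def g(freq):
        # Zeros of g are the resonances. This form is equivalent to
        # tan(k*l_back) * tan(k*l_front) = a_front / a_back but has no poles.
        k = 2.0 * math.pi * freq / SPEED_OF_SOUND
        return (a_front * math.cos(k * l_back) * math.cos(k * l_front)
                - a_back * math.sin(k * l_back) * math.sin(k * l_front))

    def bisect(lo, hi):
        # Plain bisection on an interval where g changes sign.
        lo_positive = g(lo) > 0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if (g(mid) > 0) == lo_positive:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    resonances = []
    for i in range(int(f_max / step)):
        lo, hi = i * step, (i + 1) * step
        if (g(lo) > 0) != (g(hi) > 0):  # sign change: a root lies in [lo, hi]
            resonances.append(bisect(lo, hi))
    return resonances


# Equal areas: the two tubes behave as one uniform 17.5 cm tube.
uniform = two_tube_resonances(0.0875, 1.0, 0.0875, 1.0)

# Narrow back cavity, wide front cavity (assumed 1:7 area ratio).
a_like = two_tube_resonances(0.0875, 1.0, 0.0875, 7.0)
```

With equal areas the sketch reduces to the familiar quarter-wavelength resonances of a uniform tube, near 500, 1500 and 2500 Hz for the assumed 17.5 cm length. Narrowing the back tube relative to the front raises the first resonance and lowers the second, which is the direction associated with an /a/-like vowel. Real predictions would also need losses, lip radiation and the actual geometry.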


