A Very Brief History of the Chinese Language

I very much enjoyed Hongyuan Dong’s A History of the Chinese Language, which packs about as much information as possible into 200 pages while being about as readable as you can hope for. Here is 3000+ years of linguistic history further compressed into about six screens:

1. The Sino-Tibetan Language Family

Chinese is in the Sino-Tibetan language family. Other languages in the family include the Tibetan languages, Burmese, and other lesser-known languages scattered around the Himalayas and Southeast Asia.

The Chinese languages have no close cousins, as the other groups of Sino-Tibetan languages diverged many thousands of years ago and are not obviously similar.

Some linguists think the Tai-Kadai languages (including Thai and Lao) and the Hmong-Mien languages of southern China and Southeast Asia are related to Chinese, but the consensus view is that the similarities are due to millennia of contact, not a common origin. Vietnamese is quite similar to Chinese on the surface, but this is certainly due to contact.

2. The Varieties of Chinese

Much ink has been spent on the question of whether the varieties of Chinese are “languages” or “dialects”. In China they are generally called 方言 “local speech.” 

The regional varieties are classified in seven main groups. The Mandarin group includes the standard language, based on Beijing speech, and a vast swath of dialects in northern, central, and southwestern China. The other groups are mostly spoken in the southeast and include Cantonese, Shanghainese, Taiwanese Hokkien, and Hakka among others.

The varieties are not just different accents; they are different enough that speakers either cannot understand each other at all or can do so only with difficulty, much like speakers of Spanish, Portuguese, French, and Italian. The varieties are, however, tied together by a shared formal written form (previously Classical Chinese, now written Mandarin), and of course the standard language is increasingly present everywhere.

Non-Chinese languages are also native to the frontier areas of northern, southern, and western China, to isolated pockets in southern China, and to Taiwan, where they are spoken by the indigenous peoples.

3. Old Chinese: Confucius’ Language

Old Chinese covers the period from the earliest inscriptions in the late 2nd millennium BCE through the Han dynasty, which ended in the 3rd century CE. There is an extensive written corpus, including the Confucian and Daoist classics; in fact, until the 20th century, Classical or Literary Chinese, which is based on this form of the language, was the main written standard.

Although we have many Old Chinese texts, it is still difficult to reconstruct the pronunciation because Old Chinese was never recorded in a phonetic script. The main sources of information on the pronunciation are:

1. The Classic of Poetry 詩經, a collection of poems with a clear rhyming structure. We can assume that if two characters rhyme in these poems, then they would have rhymed in spoken Old Chinese.

2. Chinese characters themselves often contain a component that is a phonetic “hint”, and we can assume similar pronunciations for sets of characters with the same hint (e.g. the element 皮 indicates that 疲被陂 should have been pronounced similarly).

3. Extrapolating backward from Middle Chinese pronunciation, which we know a little more about.

Linguists have done this and reconstructed a pronunciation that looks quite unfamiliar to us these days. It lacks tones and it contains many consonant clusters unlike anything in the modern Chinese languages. For instance, 蒹葭蒼蒼 白露為霜 is jiān jiā cāng cāng bái lù wéi shuāng in Mandarin, but is reconstructed as something like kem kra tsang tsang brak praks gwraj srang. Of course, Old Chinese was spoken for over a thousand years in a vast empire, so the reconstructed pronunciation only represents its general structure and not any specific real person’s speech.
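To make the first source of evidence concrete, here is a minimal sketch of the rhyme-linking idea: if characters rhyme within a stanza, and a character from that stanza also rhymes in another stanza, all of them can be placed in one rhyme class. The function name rhyme_classes, the union-find approach, and the small input list are my own illustrative assumptions, not a description of how any particular linguist works. The rhyme words are taken from the opening of 蒹葭 and 關雎 and from a passage of 七月 as I read them, and should be treated as sample input rather than a vetted dataset.

```python
from collections import defaultdict

# Rhyme words per stanza (illustrative input, see the note above).
STANZA_RHYMES = [
    ["蒼", "霜", "方", "長", "央"],  # 蒹葭, first stanza
    ["鳩", "洲", "逑"],              # 關雎, first stanza
    ["霜", "場", "羊", "疆"],        # 七月, shares 霜 with 蒹葭
]


def rhyme_classes(stanzas):
    """Group characters into classes by chaining rhymes across stanzas."""
    parent = {}

    def find(ch):
        # Return the representative of ch's class, creating it if needed.
        parent.setdefault(ch, ch)
        while parent[ch] != ch:
            parent[ch] = parent[parent[ch]]  # shorten the chain as we go
            ch = parent[ch]
        return ch

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every character in a stanza is linked to that stanza's first rhyme word.
    for chars in stanzas:
        for ch in chars:
            union(chars[0], ch)

    classes = defaultdict(set)
    for ch in list(parent):
        classes[find(ch)].add(ch)
    return sorted(sorted(members) for members in classes.values())


if __name__ == "__main__":
    for members in rhyme_classes(STANZA_RHYMES):
        print("".join(members))
```

With this input the sketch yields two classes, because 霜 links the 蒹葭 and 七月 stanzas while the 關雎 rhymes stay separate. The same grouping could be applied to phonetic series such as 皮 疲 被 陂. Note that this only says which characters belonged together, not how they actually sounded.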

As for grammar, the syntax of Old Chinese is quite different from that of modern Chinese, but neither has much inflection or complex morphology. There may have been more inflection in archaic forms of Old Chinese. Vestiges of it survive today in characters that are used for two distinct but related words (e.g. 傳 is used for both chuán “to transmit” and zhuàn “a record”), but that sort of thing is difficult to reconstruct from non-phonetic written records.

4. Middle Chinese: Li Bai’s Language

Middle Chinese covers the period from the fall of the Han to the Song dynasty in the 13th century. This period encompasses the Tang dynasty poets such as Li Bai, who are still widely read.

During the Middle Chinese period, Chinese scholars started to write about how their language was pronounced. With dictionaries and tabulations such as the Qieyun (切韻), we have clear indications about which characters were pronounced the same in the prestige dialect, as well as somewhat less clear indications about their phonetic realization.

The dictionaries and tables give us most of our knowledge about the abstract phonological structure of Middle Chinese, but the precise phonetic details are mostly reconstructed by comparing the present-day dialectal forms. To a lesser extent, we can also learn about the pronunciation by examining the forms of loanwords from Sanskrit into Chinese and from Chinese into Japanese, Korean, and Vietnamese.

The sounds of Li Bai’s language are recognizably Chinese, though no longer intelligible to a Mandarin speaker. Like Mandarin, it had four tones (although they don’t match up one-to-one with the tones of Mandarin) and the syllable structure was similar, though several mergers and splits separate it from modern forms in the details.

By this time, the vernacular language had diverged from the classical language still used in most writing.

5. Vernacular Writing and Early Mandarin

Although most people wrote in Classical Chinese, we can see new developments in the grammar and vocabulary of the language in two ways:

1. Vernacular writing such as folk songs, plays and novels

2. New constructions popping up in otherwise classical documents

Using these two methods, many of the grammatical and lexical differences between contemporary Mandarin and Classical Chinese can be traced back several centuries. We can assume that these changes accumulated gradually in the spoken vernaculars.

By the start of the Yuan dynasty in the 13th century, the spoken language of northern China was recognizable as an early form of Mandarin.

6. Modernity and Standardization

Around the end of the Qing dynasty at the start of the 20th century, linguistic reform became associated with modernity. Intellectuals spurred a change from literary Chinese to writing in vernacular Mandarin, and this was made official by the Republic of China government. Both the Mainland and Taiwan governments established official standard spoken languages based on Beijing Mandarin. Many words for modern scientific, political, and economic concepts were either coined using Chinese word roots to translate foreign words or borrowed from Japanese terms that had themselves been coined using Chinese word roots.

7. Chinese Characters

Most of what you read about Chinese characters is half-truths, this included.

The oldest known Chinese characters are inscribed on the “oracle bones” of the 2nd millennium BCE. These are already Chinese characters proper, not merely pictographic proto-writing. Therefore, there must have been earlier stages that have not been preserved.

Character shapes recognizable as early forms of those in use today were first standardized in the Qin dynasty (3rd century BCE).

Characters come in several types:

1. A few characters are pictograms. 

2. A few characters are pictograms with added marks to indicate abstract meaning.

3. A few characters are compounds of pictograms.

4a. Most characters are compounds of a phonetic element and a semantic element.

4b. Some characters have a semantic hint, but the other element is not purely phonetic.

5. A few characters are re-purposed or variant forms of other characters.

8. In Conclusion

For English and many other world languages, most of the forces we think of as shaping the modern forms came from outside: Latin, Greek, French, Christianity, the colonial empires, and globalization. Chinese linguistic history is much more inward-facing and tends to draw attention to deeper time scales. In both cases, today’s linguistic forms are deeply marked by the traces of linguistic history as well as by cultural traditions regarding literature and identity.