writing in east asia

[estimated reading time 26 minutes]

the language frame

east asia is home to myriad languages, but three dominate: chinese, japanese and korean. today we will look at how these languages are written and how that compares to english.

the first thing to note is that these languages share certain elements but in many ways are extremely different. chinese and japanese writing share enough characters that a reader of one can often partly follow written text in the other, but the languages are not grammatically related. this may seem odd, but in word order (subject-verb-object) chinese is far closer to english and other western languages than either japanese or korean is. looking at the other pair, japanese and korean, it is tempting to see them as two divergent paths in the evolution of a single language that eventually broke far enough apart to be separate (a relationship some linguists have proposed but none has proven). they share closely parallel grammar and a large overlap of vocabulary, much of it borrowed from chinese, though with major pronunciation differences.

with the background out of the way, we can look at their writing systems in more depth. english uses an arcane, cumbersome writing system, based on an alphabet developed millennia ago, that is neither phonetic nor efficient. chinese has an even more ancient system combining meaning symbols with sound components, but each character has a single fixed reading, making it phonetically consistent and extremely efficient. japanese combines a semi-phonetic version of the chinese character system, somewhat simplified and repurposed, with a purely phonetic system of sound-only characters. korean uses a highly efficient phonetic writing system that appears to be the best of all possible worlds but, through an odd twist of linguistic history, undermines the whole enterprise with final consonants (batchim) whose spelling often differs from their pronunciation and with multiple notations for what is effectively the same sound.

the conclusion we will eventually arrive at is that an optimal writing system would take the phonetic absolutism (one fixed reading per character) of chinese and japanese kana and notate it with the phonetic system from korean, eliminating the meaning symbols of chinese and japanese and simplifying korean’s writing to be exactly what is pronounced, without exception, regardless of meaning or tradition. it also follows that, while there are many positive aspects to english as a language, its writing system has nothing to offer: it is a disaster that would be better scrapped and forgotten as a historical error with no redeeming features. this is, however, the end of a longer road. to get there, it is best to compare these written languages in five areas: character sets, meaning vs sound, visual efficiency, learning complexity and phonetic accuracy. this ignores the spoken forms and grammar and focuses only on how the languages are written. it also sets aside romanized forms, such as chinese written in pinyin or japanese in romaji, as irrelevant to the questions at hand.

writing with character

to begin, let’s review the basics and look at how each of these languages appears on the page, using a simple greeting as an example.

in english… hello. nice to meet you. how are you?

in japanese… こんにちは。はじめまして。お元気ですか？

in korean… 안녕하세요. 만나서 반갑습니다. 어떻게 지내세요?

finally, in chinese… 你好。很高兴认识你。你好吗？

as a point of comparison, english has 26 characters, each with two basic versions (upper and lower case), each representing many possible sounds with a huge degree of overlap. it is a single character set that communicates only sound, and does even that very poorly.

japanese has three character sets. at this point, it’s useful to clarify that we will use roman transliteration for all technical terms here. these three are kanji, hiragana and katakana. kanji are chinese characters borrowed for their meaning (some later simplified in japan) and usually given about two potential readings (sometimes more, occasionally only one). hiragana and katakana are broadly equivalent: two sets of characters for the same sounds, used for different purposes (usually hiragana for grammatical elements and katakana for foreign and special words, though this is a loose characterization at best). kana are phonetic and do not vary in reading, but they are not simple phonemes. most represent a whole syllable, usually a consonant plus a vowel, so there are far more of them than the individual sounds of a sound-poor language would require. for example, the first character in the greeting we just saw is こ, which combines the sounds /k/ and /o/ rather than using a separate symbol for each. the first kanji in the greeting, 元, represents three sounds here, /g/, /e/ and /n/. while there is no hard count of the number of kanji, an estimate in the tens of thousands is close enough, as only about 2000 to 3000 are necessary for daily use (the official jōyō list for general use contains 2,136) and the rest are rare enough to be irrelevant. it is useful to note that most basic ideas are written in kanji (犬, 猫, 魚: dog, cat, fish), imported concepts in katakana (コーヒー, ラジオ, テレビ: coffee, radio, television) and grammatical or non-conceptual information in hiragana (the honorifics さん, ちゃん and くん, for example).
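the claim that hiragana and katakana are two notations for one set of sounds shows up neatly in how unicode encodes them: the two sets sit in parallel blocks, so each basic hiragana maps to its katakana twin by a fixed code-point offset. this python sketch assumes only the main hiragana block (U+3041 to U+3096) and leaves everything else, such as kanji, punctuation and the katakana long-vowel mark ー, untouched.

```python
# hiragana (U+3041..U+3096) and katakana (U+30A1..U+30F6) occupy parallel
# unicode ranges, so each hiragana maps to its katakana twin by a fixed offset
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KATAKANA_OFFSET = 0x30A1 - 0x3041  # 0x60


def to_katakana(text: str) -> str:
    """convert hiragana to katakana, leaving every other character unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START <= code <= HIRAGANA_END:
            out.append(chr(code + KATAKANA_OFFSET))
        else:
            out.append(ch)  # kanji, katakana and punctuation pass through
    return "".join(out)


def to_hiragana(text: str) -> str:
    """the reverse mapping: the correspondence runs in both directions."""
    out = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START + KATAKANA_OFFSET <= code <= HIRAGANA_END + KATAKANA_OFFSET:
            out.append(chr(code - KATAKANA_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


print(to_katakana("こんにちは"))  # コンニチハ
print(to_hiragana("テレビ"))      # てれび
```

because the mapping is a plain offset, converting back and forth loses nothing for the basic characters, which is the one-to-one relationship described above; a handful of rare katakana with no hiragana partner are outside this sketch.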

korean, while extremely similar to japanese in structure and content, has a completely different writing system, hangul, a brilliant design created in the fifteenth century to replace the chinese characters in use at the time. those characters (hanja), shockingly, remained in use by many koreans well into the twentieth century and are still understood by many in older generations. hangul, however, is a thoroughly modern approach to writing. it uses 14 basic consonants and 10 basic vowels, loosely shaped to indicate their sound and grouped into syllable blocks. the first block in the example, 안, contains a silent placeholder (ㅇ), a vowel (ㅏ) and a consonant (ㄴ), representing the sounds /a/ and /n/. while this sounds like a perfect system, and is likely the closest to one that any natural language has ever been given, there is a shocking break with this phonetic paradise. final consonants, called batchim, are frequently written to preserve a word’s root or meaning rather than its actual sound, making spelling frighteningly difficult even for many native speakers. this adds to a second spelling difficulty: multiple possible spellings for identical or nearly identical sounds. english has no fixed relationship between its characters and sounds. japanese kana have a bidirectional relationship, with only one way to write each sound and only one way to say each character, subject to a variety of caveats. hangul, however, has a mostly unidirectional link: one way to pronounce the written form, but often two or more ways to write a single sound.
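the block structure of hangul is regular enough that unicode encodes every possible syllable arithmetically, which makes a short illustration possible. this python sketch, assuming the standard unicode layout for precomposed syllables (U+AC00 to U+D7A3), splits a block back into its components. unicode counts 19 initial consonants and 21 vowels rather than 14 and 10, because it lists doubled consonants and compound vowels built from the basic letters as separate entries; the compatibility jamo (ㄱ, ㅏ and so on) are used here purely for display.

```python
# unicode lays out all 11,172 precomposed hangul syllables arithmetically:
# syllable = 0xAC00 + (initial * 21 + vowel) * 28 + final
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")       # 19 entries
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")  # 21 entries
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = none
FIRST_SYLLABLE, LAST_SYLLABLE = 0xAC00, 0xD7A3


def decompose(syllable: str) -> tuple[str, str, str]:
    """split one precomposed hangul block into (initial, vowel, final)."""
    code = ord(syllable)
    if not FIRST_SYLLABLE <= code <= LAST_SYLLABLE:
        raise ValueError(f"{syllable!r} is not a precomposed hangul syllable")
    index = code - FIRST_SYLLABLE
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    return INITIALS[initial], VOWELS[vowel], FINALS[final]


for block in "안녕":
    print(block, decompose(block))  # 안 ('ㅇ', 'ㅏ', 'ㄴ') / 녕 ('ㄴ', 'ㅕ', 'ㅇ')
```

the arithmetic only describes spelling; it says nothing about how batchim are actually pronounced, which is exactly the gap described above.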

chinese is the other extreme from english, where we began this survey. it has approximately the same number of functional characters as japanese, since modern chinese writing (the simplified form used on the mainland, as opposed to the traditional form used elsewhere by far fewer people) evolved from the same ancient characters from which japanese derived its kanji. much like japanese, two or three thousand characters are commonly known and useful for daily reading and writing, the rest being mostly arcane or specialized and less useful in a general context. how they function, however, is far simpler than their japanese counterparts. each symbol represents a single syllable and, in the vast majority of cases, has a single reading (a small number, such as 行 and 长, have two). a sentence with ten symbols has ten syllables and can only be read one way within a single language (reading it in cantonese as opposed to mandarin is certainly an option, but this isn’t significant to the discussion). there is no separate script for grammatical elements: grammar and meaning are written with the same characters.

the sound and the fury

english letters encode only sound, not meaning. the link between pronunciation and writing should therefore be extremely strong, but english’s ambiguous connection between sound and written characters, in both directions, results in ambiguity at best and often complete confusion. there is no way to accurately predict how a new english word will be written or spoken without having encountered it before. there are certainly rules, but they are followed only intermittently and the exceptions are myriad and spectacularly nonspecific. an example may help.

sight
site
night
knight
nite
date
bait
eight
ate

in these nine examples, the first two have identical pronunciation but different meanings (vision, location). the next three are pronounced the same, but again the meanings vary (time, person, time). the final four differ only in their initial consonant sound (/d/, /b/ and, for the last two, which are identical, no initial consonant at all), yet the shared /eɪt/ sound is spelled three different ways.
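grouping the nine examples by sound makes the many-spellings-for-one-sound problem concrete. the transcriptions in this python sketch are hand-written broad ipa, an assumption based on general american pronunciation (some british speakers say “ate” as /ɛt/), not output from any pronunciation dictionary.

```python
from collections import defaultdict

# hand-written broad transcriptions (assumption: general american pronunciation)
PRONUNCIATION = {
    "sight": "saɪt", "site": "saɪt",
    "night": "naɪt", "knight": "naɪt", "nite": "naɪt",
    "date": "deɪt", "bait": "beɪt",
    "eight": "eɪt", "ate": "eɪt",
}


def spellings_by_sound(words: dict[str, str]) -> dict[str, list[str]]:
    """group spellings that share one pronunciation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for spelling, sound in words.items():
        groups[sound].append(spelling)
    return dict(groups)


for sound, spellings in spellings_by_sound(PRONUNCIATION).items():
    print(f"/{sound}/: {', '.join(spellings)}")
```

nine spellings collapse into five sounds, and nothing in a sound tells you which of its spellings to use.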

while many scholars of english claim that this is not an issue with the structure of english writing, as context makes confusion unlikely, this argument has two obvious problems. the first is that there is vast scope for writing to be completely comprehensible but technically incorrect: a spelling can communicate perfectly well, accurately representing the sounds (the whole point of written language in a non-symbolic system), yet be judged inferior because it is not the expected spelling. the second is slightly more esoteric but a stronger argument against the system’s failures. if, as these scholars claim (perhaps even truthfully), context resolves all confusion between alternative spellings, then those alternatives are unnecessary and standardizing on a single spelling would be a vast improvement. the only reasonable argument for english’s confused spelling, therefore, becomes the strongest justification for replacing it.

korean is the other language in the set where writing communicates sound without meaning. apart from the idiosyncratic batchim, the only difficulty here is that several characters overlap in sound. for example, ㅐ and ㅔ are, for most modern speakers, realistically the same sound represented by two possible components, making it impossible to know from sound alone which will be used in a word. there are certainly ways to make an educated guess but, as with english, the overlap is a problem. unlike english, however, this problem only occurs in one direction. there is no confusion in pronunciation: 개 and 게 differ in meaning (dog, crab), but there is no ambiguity when speaking them from the written version. yes, some traditional theorists will argue that there is a slight difference between these two (and other pairs and groups of) components, but this minimal difference does not weaken the point: whether or not a small difference exists, it is not an argument against simplification. this being said, as a simple sound-to-text representation, hangul beats english by orders of magnitude.
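the one-directional nature of the problem can be sketched in python by giving each syllable a sound key in which ㅐ and ㅔ collapse together. this is a toy model assuming the merger described above and nothing else about korean phonology: reading a syllable yields exactly one key, while writing a key can yield more than one syllable.

```python
# toy model, assuming only that ㅐ and ㅔ are pronounced identically;
# not a full description of korean phonology
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FIRST_SYLLABLE = 0xAC00
MERGED = {"ㅐ": "ㅔ"}  # in the sound key, ㅐ is heard as ㅔ


def compose(initial: str, vowel: str, final_index: int = 0) -> str:
    """build a precomposed syllable from its parts (unicode layout)."""
    index = (INITIALS.index(initial) * 21 + VOWELS.index(vowel)) * 28 + final_index
    return chr(FIRST_SYLLABLE + index)


def sound_of(syllable: str) -> tuple[str, str, int]:
    """reading direction: every spelling has exactly one sound key."""
    initial, rest = divmod(ord(syllable) - FIRST_SYLLABLE, 21 * 28)
    vowel, final = divmod(rest, 28)
    v = VOWELS[vowel]
    return INITIALS[initial], MERGED.get(v, v), final


def spellings_of(sound: tuple[str, str, int]) -> list[str]:
    """writing direction: one sound key may have several spellings."""
    initial, vowel, final = sound
    candidates = [v for v in VOWELS if MERGED.get(v, v) == vowel]
    return [compose(initial, v, final) for v in candidates]


print(sound_of("개") == sound_of("게"))  # True: dog and crab sound alike
print(spellings_of(sound_of("게")))     # ['개', '게']
```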

the relationship between sound and meaning is very different in the other two languages we’ve been focusing on.

chinese is written with a single character set, but some components of those characters provide meaning while others indicate sound. this is not necessarily obvious at first glance. a good example (the majority of characters work this way to some degree, but this one is an obviously simple combination) is 洋 (ocean). it combines 水 (water, in the compressed form 氵 to fit the space) with 羊 (sheep), so the result carries the meaning of water (shuǐ) with the sound of sheep (yáng). this allows an interesting result: it is in many cases possible to understand the meaning of a chinese sentence without remembering how to pronounce it. while this can be awkward, it certainly makes it easier to communicate in partial-comprehension situations.

the relationship between written and spoken sound is unidirectional but fixed. there are many ways to write the syllable yáng, for example: 洋 and 羊, as we have already seen, but also 杨 (the poplar tree), which immediately comes to mind, and many others. there is, however, only one possible pronunciation for each of these three characters, so there can be no confusion when speaking from written text.
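the same one-way relationship can be shown with a tiny lookup table. the five characters and their mandarin readings in this python sketch are the ones mentioned in this essay; a real dictionary would list many more characters read yáng, as well as a small number of characters with more than one reading.

```python
# a hypothetical five-character dictionary with standard mandarin readings
READINGS = {
    "洋": "yáng",  # ocean
    "羊": "yáng",  # sheep
    "杨": "yáng",  # poplar
    "雨": "yǔ",    # rain
    "糖": "táng",  # sugar
}


def read(text: str) -> str:
    """text to sound: each character in this table has exactly one reading."""
    return " ".join(READINGS[ch] for ch in text)


def possible_characters(syllable: str) -> list[str]:
    """sound to text: one syllable can correspond to several characters."""
    return [ch for ch, reading in READINGS.items() if reading == syllable]


print(read("洋"))                    # yáng
print(possible_characters("yáng"))  # ['洋', '羊', '杨']
```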

japanese takes a different approach to the link between pronunciation and meaning. most kanji have at least two possible readings, generally called onyomi and kunyomi. the first is an approximation of the original chinese pronunciation adapted to japanese phonetic norms; the second is a native japanese reading, usually of more than one syllable. many characters have more than two readings, and some have only one. this could easily lead to confusion but rarely does, except for young children and beginner language learners, as a fairly consistent pattern determines which is used: onyomi generally appear in compounds of two or more kanji, and kunyomi for kanji standing alone or followed by kana, with a limited set of exceptions learned alongside the vocabulary. so while 学 in 大学 (university) is read /gaku/, in 学ぶ (learn) it is /mana/. having mastered the pattern, reading kanji for sound is a largely straightforward exercise.
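the core of that pattern can be expressed as a simplified heuristic: a kanji followed by hiragana, or standing alone, takes its kunyomi, while a kanji inside a compound of several kanji takes its onyomi. this python sketch uses a hypothetical three-entry dictionary and deliberately ignores the exceptions and sound changes real words have (学校, for instance, is read gakkō, not gakukō).

```python
# hypothetical mini-dictionary: kanji -> (onyomi, kunyomi), in romaji
KANJI = {
    "大": ("dai", "oo"),
    "学": ("gaku", "mana"),
    "犬": ("ken", "inu"),
}
KANA = {"ぶ": "bu"}  # only the kana needed for the examples below


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3096"


def read_word(word: str) -> str:
    """simplified heuristic: kunyomi before trailing kana or for a lone kanji,
    onyomi inside a multi-kanji compound; no exceptions handled."""
    kanji_count = sum(ch in KANJI for ch in word)
    parts = []
    for i, ch in enumerate(word):
        if ch in KANA:
            parts.append(KANA[ch])
            continue
        onyomi, kunyomi = KANJI[ch]
        followed_by_kana = i + 1 < len(word) and is_hiragana(word[i + 1])
        parts.append(kunyomi if followed_by_kana or kanji_count == 1 else onyomi)
    return "".join(parts)


print(read_word("大学"))  # daigaku (university)
print(read_word("学ぶ"))  # manabu (learn)
print(read_word("犬"))    # inu (dog)
```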

kana (both hiragana and katakana) are bidirectional sound pairs. while there are contexts where one or the other is used, within a single character set writing is no more than a direct transcription of sound. the syllable /ka/ is always written in hiragana as か, /go/ as ご, and so on. it is worth pointing out that there are three extremely common exceptions: は, を and へ, which, when used as particles, are pronounced /wa/, /o/ and /e/ rather than /ha/, /wo/ and /he/. but, as this is predictable from grammatical role, it likely poses no real issues, unlike english exceptions, which appear mostly random unless the history of a word is already known.
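because the particle exception depends only on grammatical role, reading becomes mechanical once a sentence has been split into words and particles. this python sketch assumes that split has already been done (the genuinely hard part, since japanese is written without spaces) and uses a hypothetical romaji table covering only the kana in the examples.

```python
# hypothetical romaji table covering only the kana used below
KANA_ROMAJI = {"わ": "wa", "た": "ta", "し": "shi", "は": "ha", "を": "wo", "へ": "he"}
# as particles, these three take their spoken value instead
PARTICLE_READINGS = {"は": "wa", "を": "o", "へ": "e"}


def read_tokens(tokens: list[tuple[str, bool]]) -> str:
    """tokens are (kana, is_particle) pairs from an assumed earlier parsing step."""
    words = []
    for kana, is_particle in tokens:
        if is_particle:
            words.append(PARTICLE_READINGS[kana])
        else:
            words.append("".join(KANA_ROMAJI[ch] for ch in kana))
    return " ".join(words)


print(read_tokens([("わたし", False), ("は", True)]))  # watashi wa
print(read_tokens([("はし", False)]))                  # hashi
```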

of course, english gets its spelling system from its history, the language being a mashup of germanic old english and norman french, with various other languages’ spelling conventions layered on top. this is not, however, an excuse for continuing, half a millennium later, to use a standardized spelling system that is perhaps the least useful of any modern language. it is, though, an explanation for its existence. kana were created specifically for japanese and hangul was designed with korean in mind, but the alphabet used for english was created more than a thousand years before english was first spoken, and it was adopted without any real consideration of its complete unsuitability to the task, as english was a vulgar language of the common masses and teaching a new writing system was simply never considered. today replacement is a valid option, but tradition and laziness prevail in english-speaking countries, leading to the continued use of this problematic character set.

the quick without the dead language

turning to the efficiency of written communication, there is a useful comparative statistic to remember. an average english book contains about 350,000 to 450,000 characters (about 80,000 words, but “word” has far less meaning outside english, especially in symbolic languages, so it is useless as a comparative tool). a similarly average chinese book of approximately similar content length contains about 100,000 to 140,000 characters, making chinese roughly three times as dense.
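the “about three times” figure is a midpoint inside a fairly wide range. dividing the quoted bounds against each other, as in this python sketch (which uses only the approximate character counts given above, not measured data), shows how far the ratio can move.

```python
# approximate characters per book, as quoted in the text (not measured data)
ENGLISH = (350_000, 450_000)
CHINESE = (100_000, 140_000)


def density_ratio(longer: tuple[int, int], shorter: tuple[int, int]) -> tuple[float, float, float]:
    """return the (lowest, midpoint, highest) ratio of character counts."""
    lowest = longer[0] / shorter[1]   # short english book vs long chinese book
    highest = longer[1] / shorter[0]  # long english book vs short chinese book
    midpoint = (sum(longer) / 2) / (sum(shorter) / 2)
    return lowest, midpoint, highest


low, mid, high = density_ratio(ENGLISH, CHINESE)
print(f"{low:.2f} to {high:.2f}, midpoint {mid:.2f}")  # 2.50 to 4.50, midpoint 3.33
```

the ratio lands anywhere between 2.5 and 4.5, with the midpoints giving roughly 3.3, so “about three times” is a reasonable, if rough, summary.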

this is certainly a very generalized statistic, but it is a good place to begin, and these two languages sit at the extremes of the scale. korean, with its slightly longer constructions, is likely close to the chinese figure but a little higher, though i’m not aware of any official data on the subject. japanese, carrying extra grammatical information in kana, likely requires additional characters for the same content, making its writing less efficient again than korean in most cases.

there is another question regarding efficiency, though, which will immediately come to mind for anyone who speaks korean or japanese. these are topic-focused languages, while english is strongly subject-focused. it is nearly impossible in modern standard english to drop either the subject or the verb. chinese often omits verbs of state (adjectives can act as predicates without a linking verb), occasionally others, while dropping subjects less freely. japanese and korean, however, rarely need a subject to be stated at all. this is not the only difference that lets these languages compress information, but it is likely the one with the largest impact on overall information density.

there is a comparison to be made between these languages on speaking speed, too, though this is not an issue of writing: japanese and korean are generally perceived as being spoken much more quickly than chinese or english. in my experience, the same information can be communicated just as quickly in korean or japanese as in mandarin, while english feels far slower. while unrelated to writing, it is a relatively important comparison to keep in mind.

to return to the example from earlier…

english… hello. nice to meet you. how are you?

japanese… こんにちは。はじめまして。お元気ですか？

korean… 안녕하세요. 만나서 반갑습니다. 어떻게 지내세요?

chinese… 你好。很高兴认识你。你好吗？

counting the characters in each gives an easy and somewhat representative comparison.

english 37, japanese 20, korean 27, chinese 14 (spaces and punctuation included). this doesn’t take into account many aspects of written language, but it clearly demonstrates what has already been said: english has an overwhelmingly inefficient writing system, chinese a vastly more efficient one, and japanese and korean fall in the middle, their agglutinating grammar requiring more characters for similar meanings than chinese’s tense-neutral, non-conjugating system. it may be useful to note that all four versions of these three statements are relatively formal, and the korean is especially so.
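the count above appears to include spaces and punctuation, since that is the only way the english sentence reaches 37 characters. this python sketch reproduces that method by counting unicode code points, normalizing to nfc first so that each korean syllable block counts as one character, and also reports the count with spaces removed, since japanese and chinese do not use them.

```python
import unicodedata

# the greeting from the examples above, in each language
GREETINGS = {
    "english": "hello. nice to meet you. how are you?",
    "japanese": "こんにちは。はじめまして。お元気ですか？",
    "korean": "안녕하세요. 만나서 반갑습니다. 어떻게 지내세요?",
    "chinese": "你好。很高兴认识你。你好吗？",
}


def count_characters(text: str) -> tuple[int, int]:
    """return (code points incl. spaces and punctuation, code points without spaces)."""
    text = unicodedata.normalize("NFC", text)  # one code point per hangul block
    return len(text), len(text.replace(" ", ""))


for language, text in GREETINGS.items():
    total, without_spaces = count_characters(text)
    print(f"{language}: {total} characters, {without_spaces} without spaces")
```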

hello world?

english is not the most difficult language to learn. with their extensive case systems (finnish, german) or complex structures (dutch), several european languages are far more time-consuming for new learners, while the clumsy connected scripts of arabic and hindi (among other languages in those regions) are difficult for those who didn’t acquire them as children, compared to the discrete-block characters of the languages we have discussed today. there are many things about the english writing system, however, that make it perhaps the most difficult writing system for learners seeking spoken fluency and the ability to move between speaking and writing in both directions.

english uses a multi-version character set. it has a completely meaningless secondary set of characters called capital letters, a traditional formal affectation adopted centuries ago with no lasting meaning. readers and writers are expected to understand their use, despite no change in either meaning or pronunciation and no general agreement on many aspects of their standardization. this is compounded by the fact that, unlike the other three languages discussed here, english has a completely separate handwritten form, only vaguely standardized and extremely regionally variant. each of these handwritten forms also has at least two versions of each character, a small and a large one, often with several additional decorative variants. the difficulty grows because these regional forms, while the printed text is somewhat standardized, are not mutually legible across boundaries. the most notable case is linked (cursive) script between western europe and north america, where native speakers understand the language but often can’t decipher the handwriting quickly enough to read it comfortably.

while comparatively complex, chinese characters have only one form within a language. variations exist between simplified and traditional, but a single region generally uses only one, so the split is insignificant for learners. whether typed or handwritten, by anyone of any age, provided the writing is reasonably neat, any reader can read any character without difficulty. japanese characters are similar in this way. complex characters are often replaced with phonetic equivalents for young people and learners, but this doesn’t require additional knowledge, just an awareness that it happens (犬 becomes いぬ and 魚 becomes さかな in children’s books, for example: dog, fish). in korean, where no symbolic characters need to be learned, a single, consistent writing system simplifies the whole writing-speaking-learning process, making the 24-component set the only thing necessary to learn for reading comprehension, beyond, of course, vocabulary and grammar, which are needed for spoken comprehension before anything is written or read anyway.

this can be compared to english non-phonetic writing, where learners are ridiculed for not differentiating “site” and “sight” or “but” and “butt” (identical pronunciation, divergent meanings) and where there is continuous fighting about capitalization (pc, tv and internet, to name just three of the thousands of words whose capitalization is debated, often in work or school arguments that simply should never have existed).

alpha, bravo and charlie

phonetic accuracy is a somewhat odd point of comparison for the four writing systems we’re looking at, and it is here that they diverge most.

english has no phonetic accuracy. the reason for this is simple: it has no unified, standardized basis for its spelling system, and there is vast overlap between characters’ pronunciations, often independent of context. even native speakers frequently disagree on correct pronunciation (adult, progress, thorough, data, etc.), and spelling offers no solution to these issues, regardless of its dictionary standardization, which is spurious at best, as many common dictionaries disagree on the pronunciation and usage of frequently used words like adult and data. it may be worth noting that this is not an original observation. english has evolved over time without oversight or modernization and, to the extent that anything is intentional in language, the lack of standardization in english is purposeful rather than accidental. this doesn’t make it any less problematic for native speakers or learners.

korean occupies an interesting place for phonetic accuracy. in most cases, it is pronounced exactly as predicted from the spelling and spelled as expected. with overlapping character components for certain sounds and the oddly arcane batchim rules, however, there are exceptions that make certain components of hangul distinctly phonetically inaccurate. overall, it is nevertheless an excellent example of how a two-way phonetic transcription and communication system can work. with minor upgrades and modernization, hangul could achieve a perfect score on this front.

japanese is purely phonetically accurate once the rules have been learned. these rules are not quite as simple as most in korean or chinese, but they are highly consistent, so it is largely a matter of memorization. there are no silent characters and, while there is not a one-to-one relationship between characters and sounds in either direction, selecting which character represents each sound and which sound is represented by each character is a logical process that is both predictable and relatively straightforward. some words differ only in pitch accent, but this is largely insignificant to the link between pronunciation and text: 雨 (rain) and 飴 (candy) share the same sounds, ame, with different pitch, for example.

chinese is also phonetically accurate in the same way as japanese, with far fewer exceptions. it has a predictable speech structure where each character is a single syllable, making the conversion from text to speech rapid. its large symbol library is the main difficulty, and the unidirectional transfer is potentially problematic, especially as its pronunciation aids (pinyin on the mainland, zhuyin in taiwan) sit outside the characters themselves, whereas japanese routinely writes hiragana above or beside kanji (furigana) to indicate pronunciation, especially for more difficult or less frequently used characters. while 雨 and 糖 in mandarin each have only one pronunciation (yǔ and táng, respectively), yǔ and táng each have many potential written forms, making this accuracy extremely useful when reading chinese but far less helpful when writing down speech.

final thoughts

english is a language with many potential upsides. it is evolutionary, so it quickly changes with trends to create new words; this may be positive or negative. it is extremely widely adopted worldwide. it is, in fact, difficult to find a location where english is not at least somewhat useful in communication, and this really can’t be said for any other modern language. it fills much the role that latin filled across the roman empire in the classical period and that classical chinese filled across most of east asia for centuries, especially among educated classes. english, mostly as a result of american political and military dominance, has taken that role in the twentieth and twenty-first centuries.

it is, however, hamstrung by its arcane, inefficient and cumbersome writing system (among other things, but that is a topic for another day). a complete lack of phonetic accuracy, an inefficient visual presence and the absence of spelling standardization or even pronunciation agreement, compounded by multiple potential character forms and competing handwritten character sets by location and generation, leave english writing widely accepted but simply not fit for purpose.

looking at these other writing systems has led to various conclusions. chinese is extremely efficient, its single set of characters combining phonetic consistency with high-compression meaning. japanese takes that same structure and augments it with phonetic characters, sacrificing some efficiency for easier learning, especially for children and beginners. korean largely eliminates the many-characters-for-one-sound problem common to all chinese and much japanese writing, but adds a problem already present in english, overlapping characters, and, with batchim, pronunciation variations not accounted for by the basic character components.

this is realistically the end of the comparison, but it may be useful to comment on how it could inform future modernization of english. this is no longer an objective view, however, and that must be admitted.

english writing is a useless system that must be replaced. it could be replaced with a symbolic meaning-pronunciation system like chinese or a phonetic transcription system like hangul. it would be ridiculous to adopt the mess of multiple character sets that japanese currently uses. that mess works far better than the english alphabet, but it is so plagued with problems that switching to it wouldn’t be worth the effort when something far more useful already exists. i suspect japanese wouldn’t be using it today, either, if an option like hangul had been available when its writing took shape, and that its continued use is a result of tradition and the cost of change rather than any belief that it is the best possible approach to writing.

what we have seen is a comparison between a symbolic system (chinese) and a phonetic system (korean). korean is less efficient in its writing but vastly easier to learn. assuming pronunciation vagaries like duplicate letters and silent or implied final consonants were eliminated before it (or something like it) was adopted as a new writing system for english, this would mean increased efficiency (though not as much as a purely symbolic system) and complete context-independence, allowing learners and advanced readers, including software text-to-speech and future artificial intelligence, to read accurately without difficulty, even words not previously encountered.

while there are many things about english that would be better with modernization and simplification, this is a completely feasible upgrade that could take place in a single generation with minimal effort. a phonetic system of around twenty characters, loosely based on hangul, would likely halve the length of english books (an exciting environmental improvement, to say the least) and make reading and writing a much easier and more intuitive process for children and learners, likely eliminating at least a significant minority, potentially a majority, of text-based learning disabilities arising from stream-written languages like english and non-phonetic transcription systems like the english alphabet.

as a side note, it may be useful to consider what impact such a change would have on other western languages that share the same script (or at least the same idea of script). it is likely that french, spanish, german, italian and various less widely spoken languages would quickly shift to a phonetic system if english made the upgrade, probably within a few years, a decade at most. languages using other alphabets (russian, hebrew, greek) would likely follow suit, though perhaps slightly more slowly, especially in the case of russian, where tradition is a cultural obsession. indigenous languages, especially in the americas, have been very poorly served by the latin alphabet as used for european languages, and such a change could mean a resurgence in their functional usability, especially in central and south america, where languages like guaraní, quechua and nahuatl are frequently difficult to transliterate using vague european spellings and letters.