Just a place to jot down my musings.

Sunday, July 19, 2009

On spellings, part two

Having looked at one particular Hindi example last time, it may be helpful to begin from a bird's eye view this time by looking at what spelling does in general. Before proceeding any further, let me attest to my complete ignorance of the various East Asian / CJK writing systems; I have no idea how much of what I'm going to try to sketch out here applies to those systems. (Needless to say, cuneiform, hieroglyphs, Mayan glyphs, Germanic runes, and a whole host of other writing systems also fall into the category of writing-system-of-which-I'm-woefully-ignorant, but they're different from CJK in that they fall into the subcategory of writing-system-of-which-I'm-woefully-ignorant-and-of-which-any-unfortunate-reader-of-this-blog-is-also-likely-to-be-ignorant while the CJK systems belong to the writing-system-of-which-I'm-woefully-ignorant-but-with-which-some-particular-unfortunate-reader-of-this-blog-may-perchance-be-familiar subcategory.)

Okay, with that giant caveat out of the way, let's begin. A system of writing for a language is a way of committing linguistic elements to a material less ethereal than air—whether this be sand or stone is immaterial. It should be recognized off the bat that the basic elements of the writing system, the "graphemes", so to speak, need not necessarily exist in injective correspondence with any of the language's basic units, whether they be morphemes or phonemes or tonemes or whatever. (The best example of this is the English alphabet: aside from the letters 'a', 'o', and 'i', which correspond to one-letter words, no letter necessarily or uniquely represents a basic phonetic or grammatical element of the English language. I could be overlooking something big here, so please correct me.) The point of noting this is to acknowledge that the relation between the writing system and the language is fundamentally conventional: the written entity is but a signifier that signifies a particular linguistic entity, just as that linguistic entity is itself a signifier for something else. 

At this point I've definitely stepped into a deep and messy dungheap that the philosophy of language attempts to analyze, and I'm not even going to pretend to extricate myself or justify one philosophical position over another. All I'm trying to say here, and I think this is fairly uncontroversial in and of itself, is that the relationship between a writing system and a language is a learned one, and hence this relationship can change as the relations among the many signifiers and signifieds evolve over time. Sometimes this can involve a radical shift that forces people to completely relearn these conventions, as happens when an existing writing system is pressed into use for a new language, or when the same language comes to be written in a new writing system.

What has been said up to this point holds true, I think, for all writing systems, even the CJK systems that I'm trying to avoid. However, I think the concept of spelling makes sense only within a broadly alphabetic system (I'm including syllabaries, abjads, and abugidas into this one, and yes, I realize hiragana and katakana fall into this bucket), although I hypothesize that an analogous concept must exist within the CJK system as well. The reason is this: for the entire graphical system of signs to work as a method of communication, it should be possible for individuals to recognize graphical signifiers created by others. I'm using 'recognize' in two different, possibly chronologically successive, ways here: first, they need to be able to 'recognize' the graphical signifier as a signifier in general, that is as a visual artifact expressly designed to communicate a linguistic idea; second, they need to be able to 'recognize' the particular signified being referred to by the signifier. This may seem absurdly complex, but it should be clear that we need both steps.

Take a simple substitution cipher of the form that I used in order to pass on secret messages to friends when we were in the fifth grade. We treated the English alphabet as the additive group Z / 26 (and no, I had no fricking clue what that meant when I was in the fifth grade; I'm merely using unambiguous terminology that I picked up in college), and applied a simple translation of +1 or -2 or whatever took our fancy. This meant we could pass messages around that looked like "mfu't ejtdvtt ipx up ublf pwfs uif xpsme upebz bu mvodi." To the uninitiated, this may have looked like a garbled assortment of letters spelled out by a maniacal dyslexic monkey hammering away at a typewriter, but we cool kids (or nerds, as we were unfairly maligned) recognized these strings as intelligible messages and responded enthusiastically to them. Of course, once the jocks got wind of what was happening, they realized we were using codes of some sort; at this point they had attained enlightenment of the first order. However, as long as they didn't know precisely what the transformation was, the higher enlightenment eluded their grasp.
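For the curious, the shifting scheme described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `shift_cipher` is made up for this post, and it handles only the 26 unaccented letters of the English alphabet, passing everything else (spaces, apostrophes, full stops) through untouched.

```python
import string


def shift_cipher(text: str, shift: int) -> str:
    """Translate each English letter by `shift` positions, working modulo 26.

    Hypothetical helper for illustration only; non-letters are left unchanged.
    """
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            # Map 'a'..'z' to 0..25, add the shift mod 26, and map back.
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


secret = "mfu't ejtdvtt ipx up ublf pwfs uif xpsme upebz bu mvodi."

# The jocks' "first-order enlightenment" is knowing this is a code at all;
# the "higher enlightenment" is knowing that the shift was +1, so -1 undoes it.
plain = shift_cipher(secret, -1)
print(plain)
```

Since there are only 26 possible shifts, anyone who has reached the first stage could in principle reach the second by trying them all, which is perhaps why this worked better against fifth graders than it would against anyone else.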

So now it should be clear what I'm driving at. Spelling is the arrangement of graphical units in particular sequences agreed upon by convention, and good spelling results in the production of well-formed graphical signs that unambiguously signify linguistic entities as agreed upon by convention. This definition of good spelling is vague, but should pass empirical tests. For instance, take one of the many bugbears of internet comments: the word "definately". Now this is indeed a well-formed graphical sign, for it's unlikely that someone will pass over it as a meaningless jumble of letters; and it certainly unambiguously refers to a linguistic concept of certainty. It falls short of achieving the status of "good spelling", however, as it fails to be recognized as one by convention. Take a different common error, this time the infuriating substitution of "you're" by "your", and vice versa. Now, this may be better described as a grammatical mistake; however there are certainly occasions when even individuals who are aware of the major grammatical distinction between the two words unintentionally substitute them, perhaps while typing in a rush. In such a situation, this substitution may be better understood as a spelling error, for while it passes the first and third requirements, it fails the unambiguity requirement because it unintentionally refers to a different concept while being intended for another. (I will confess that I haven't fully fleshed out the lack of ambiguity idea in my head, for there are situations where a language permits homographs and this definition in its current form would classify them as misspellings. Perhaps we ought to allow some room for intention here, or perhaps we should allow convention to override the lack of ambiguity requirement. In either case, I don't think this forms an insurmountable obstacle to the bigger picture of spelling that I'm trying to create.)

Great, now we have an idea of what good spelling is. But the next question is, how does good spelling arise? More precisely, what forces influence the conventions that define good spelling? Since it's true (or so we suppose) that the relationship between the graphical signifier and the linguistic signified is arbitrary, it is possible in theory to pick totally arbitrary symbols for every single possible signifier, and then to memorize the relation between signifier and signified. This is true to some extent of CJK ideograms (only as a partial and inaccurate oversimplification of the whole picture, I know, I know), and perhaps to a somewhat greater extent of Egyptian hieroglyphs. But unless you want to memorize a very large number of arbitrary symbols that lack even the structure and composition of CJK ideograms, this is not a great system. The alternative is to capture some approximation of the pronunciation of a linguistic entity, and not its semantic field. After all, if the graphical signifier reproduces the pronunciation of the linguistic entity, then any linguistic entity can be adequately represented in the writing system in a manner acceptable to all speakers / readers, right?

And with that, we're back to where we left off with the previous post. A purely pronunciation-based spelling is highly limited by spatial, ethnic, class, and temporal variations in pronunciation, and is also likely to conceal morphological structures within the language. Case in point: modern Turkish spelling. Because Turkish is governed by vowel harmony, its grammatical suffixes occur in a variety of "shapes" that are governed by the phonetic environments they occur in. The most complex example I'm aware of is the enclitic -dir, which functions as the third person copula. This copula takes no fewer than eight different shapes: the initial consonant is either voiced 'd' or devoiced 't', and the vowel is one of the four harmonizing variants of 'i', giving -dir, -dır, -dur, -dür, -tir, -tır, -tur, and -tür. Under modern Turkish spelling rules, there are thus eight different ways of writing this same suffix, even though the choice of variant is governed by two incredibly simple and entirely predictable phonological rules. Thus, it is easy for the learner to read Turkish, but to recognize that -tür and -dir are in fact the same grammatical suffix is slightly trickier. I wonder if it would not be more logical, in some sense, to use the same spelling for the suffix everywhere when it is a truth universally acknowledged (among native Turkish speakers anyway) that the pronunciation of this suffix must change in a certain entirely regular way.
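To show just how mechanical the two rules are, here is a rough Python sketch that picks the spelling of the copula for a given stem. The names `HARMONY`, `VOICELESS`, and `add_copula` are my own inventions for illustration. The sketch assumes the stem is already in lowercase Turkish orthography, and it ignores the loanwords that harmonize irregularly, so treat it as a toy rather than a description of how Turkish morphology is actually implemented anywhere.

```python
# Four-way vowel harmony: the suffix vowel is determined by the stem's last vowel.
HARMONY = {
    "a": "ı", "ı": "ı",   # back unrounded
    "e": "i", "i": "i",   # front unrounded
    "o": "u", "u": "u",   # back rounded
    "ö": "ü", "ü": "ü",   # front rounded
}

# After a voiceless consonant the suffix's initial 'd' is devoiced to 't'.
VOICELESS = set("çfhkpsşt")


def add_copula(stem: str) -> str:
    """Attach the third person copula (-dir and its variants) to a lowercase stem.

    Hypothetical helper for illustration; regular stems only.
    """
    last_vowel = next((ch for ch in reversed(stem) if ch in HARMONY), None)
    if last_vowel is None:
        raise ValueError(f"no vowel found in stem: {stem!r}")
    consonant = "t" if stem[-1] in VOICELESS else "d"
    return stem + consonant + HARMONY[last_vowel] + "r"


for stem in ["güzel", "kitap", "doktor", "türk"]:
    print(stem, "->", add_copula(stem))
```

The whole decision fits in a lookup table and one set membership test, which is the sense in which a single underlying spelling for the suffix would lose no information for a speaker who already knows the rules.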

Thus, even within one single dialect of a language arbitrarily chosen as the standard of pronunciation and good spelling, a strict pronunciation-based spelling pattern can obscure existing, meaningful morphological patterns within the language. And as the pronunciation of that dialect evolves over time, its readers face two difficult choices: either to change spellings over time, and thus to violate the initial conventions of good spelling; or to retain spellings according to the older pronunciation, thus preserving valuable morphological and etymological information, but drifting away from contemporary pronunciation. Urdu is a prime example of this. Its Perso-Arabic script is gorgeous to look at, and far more suited to elegant calligraphy than staid Devanagari. Yet learning to read the language using the Perso-Arabic abjad (particularly in nasta`līq) is a nightmare, thanks to the script's inability to deal with the complex vowels and consonant clusters strewn throughout Hindustani / Urdu (the line between these two is indiscernible at times, and a bleeding gash at others). Devanagari, unsuited as it is to Hindustani, comes out far ahead in this race. As far as pronunciation goes, the Perso-Arabic script is a terrible choice. But when it comes to words of Persian or, even better, Arabic origin, the Perso-Arabic script beats Devanagari hollow. Possessing all the consonantal symbols needed to represent these words, the Perso-Arabic script faithfully reproduces the exact shape of the word as it was in the original, preserving a vast amount of information on the Semitic roots and stems that is utterly lost in modern (and probably medieval too) pronunciation, and that is simply unrepresentable in Devanagari in any meaningful way. Try writing فعل مضارع in Devanagari :D

This brings to mind a weird idea I held for the longest time as a kid. A certain word pronounced tĕh-kī-kāt in India and spelled tah kī kāt or tahkīkāt is the normal word used to refer to investigations, usually carried out by the police. I always thought of it as three different words that meant "the kāt of a tĕh", whatever those two words meant separately (since I could never find them in even the best dictionaries). I was utterly shocked, totally blown away, when I realized, perhaps seven or eight years later, that the word was in fact the Arabic word تحقيقات which had undergone some, uh, alterations in pronunciation in India. On the one hand the Devanagari script did a good job of representing the actual pronunciation of the word as it normally is in India (although there are diacritics used in Devanagari that can in theory represent the 'q' sound); on the other hand it wasn't merely useless but actually actively retarded my efforts to understand the word's meaning.

I've expended over three thousand words on this topic so far, but I'm still nowhere near proposing a flawless solution to the existing problems of spelling in alphabetic systems. The mathematician in me is willing to accept an infinite stream of arbitrary graphical symbols as representations of actual semantic content, thus allowing us to actually bypass linguistic structure (although I can already hear poststructuralists baying for my blood); the Sanskritist in me wants to forget the idea that language in all its fluidity can ever be graven in stone, and wants to revert to chains of oral transmission of knowledge. In some ways, I think Sanskrit does get the best deal of them all: a religious faith in the value of accurate recitation of the scriptures combined with exacting morpho-phonological rules (aka sandhi, the bane of all first year Sanskritists) that together do a great job of defining both spelling and pronunciation. On the other hand, this does mean you have to spend twelve years in a forest āśrama studying the language before you can use it for anything. Sigh. There really is no such thing as a free lunch.



Why pearls, and why strung at random?

In his translation of the famous "Turk of Shiraz" ghazal of Hafez into florid English, Sir William Jones, the philologist and Sanskrit scholar and polyglot extraordinaire, transformed the following couplet:

غزل گفتی و در سفتی بیا و خوش بخوان حافظ

که بر نظم تو افشاند فلک عقد ثریا را


into:

Go boldly forth, my simple lay,
Whose accents flow with artless ease,
Like orient pearls at random strung.

The "translation" is terribly inaccurate, but worse, the phrase is a gross misrepresentation of the highly structured organization of Persian poetry. Regardless, I picked it as the name of my blog for a number of reasons: 
1) I don't expect the ordering of my posts to follow any rhyme or reason
2) Since "at random strung" is a rather meaningless phrase, I decided to go with the longer but more pompous "pearls at random strung". I rest assured that my readers are unlikely to deduce from this an effort on my part to arrogate some of Hafez's peerless brilliance!

About Me

Cambridge, Massachusetts, United States
What is this life if, full of care,
We have no time to stand and stare.
—W.H. Davies, “Leisure”