
Word-Meaning At The Heart Of Linguistic Modelling


Indic Academy is a partner for the 6th International Sanskrit Computational Linguistics Symposium, scheduled to take place from October 23 to 25 at IIT Kharagpur, West Bengal.

Modern linguists believe that the relationship between word and meaning is not natural but arbitrary. Shakespeare pointed out that a rose by any other name would smell as sweet. But Sanskrit does things differently. After all, this is the language in which the word for a thing, “padārtha”, literally means “the meaning of the word”, establishing an inseparable bond between word and meaning. The Indic way is to look at the world linguistically, where objects in the world are the meanings of the words indicating them (Dr Nagaraj Paturi).

Iconicity is the root of all languages – an attempt to find a unity between the word, its sound and the thing itself. And in the Veda, it is achieved in full measure. “In extraordinary, detailed elaboration, in numerous correspondences and equivalences with both the phenomenal world, and an unseen world beyond, in energy, scope and ambition, and arguably, in sheer aural, textual, and gestural beauty, the Veda is unparalleled. That ambition is nothing less than what traditionalists have characterized as the “Universal Truth” in sound, form and meaning. And it includes as well, not just a formulation or representation of that Truth, but a valid, repeatable means of accessing and engaging with that truth.” (CK Shridhar)

While the Vedas themselves describe language as the divine feminine – Devi – with numerous hymns dwelling on speech, its power, its mechanisms of production and so on, linguist Pranjal Koranne says the relatively modern study of language in India began with the need to preserve the Vedas themselves. Sanskrit scholars recognized the power of the language in all its minute details, and so equal energy and effort were spent on preserving the language of the Veda with great precision.

“Since the Vedas also contained hymns that were part of rituals, methods to maintain accuracy of speech were necessary, and a sophisticated system of phonetics was employed to codify the language of ritual to keep it free from change,” writes Koranne. The earliest attempts at codification can be seen in the Pada Pāṭha and the Prātiśākhya, which studied the rules of word formation.

Vedāṅga, or the six limbs of the Vedas – Śikṣā, Chandas, Vyākaraṇa, Nirukta, Jyotiṣa, and Kalpa – while preserving the Vedas, also provided invaluable linguistic insights. The first four deal extensively and exclusively with the language and the rules governing correct pronunciation, in order to unlock the power within the sound. The last two do not deal with language itself, but allow an experience of the Vedas through ritual, giving precise details on the time and process of its execution. And the Indian grammatical tradition, with its three schools of śābdabodha (verbal cognition) – vyākaraṇa, nyāya, and mīmāṃsā – offers various approaches to linguistic analysis directly relevant to computational linguistics, write Akshar Bharati and Amba Kulkarni of the Department of Sanskrit Studies, University of Hyderabad.

Three Sanskrit grammarians stand tall in their contribution to the foundations of grammar worldwide – Panini, Katyayana and Patanjali. Panini wrote the Ashtadhyayi, Katyayana expanded the work of Panini in his Vartikas, and Patanjali wrote commentaries on both of the above, known as the Mahabhasya. Scholars refer to Panini’s work as the most ancient systematic work of grammar in the world.

Pranjal Koranne says that for Computational Linguistics, the Paninian tradition has insights to offer in many areas of research. “The Ashtadhyayi and the tradition of commentaries that has developed over the ages provide us with perhaps the most comprehensive grammar ever developed for any language. So, though knowledge of Sanskrit is necessary to understand and apply this ‘grammar’, the framework developed is general enough to be applicable to many modern Indian languages as well as languages of other families. The other area of research is the format in which the Ashtadhyayi is presented: not just the use of coded language in the concise ‘sutra’ format, but also Panini’s arrangement of rules, has much to add to our knowledge of how algorithms, particularly for language processing and generation, are written.”

Amba Kulkarni, who will be presenting a paper at the conference, has written, “The importance of Ashtadhyayi is threefold. The first, as is well known, is as an almost exhaustive grammar for a natural language, with meticulous detail yet small enough to memorize. The second: though Ashtadhyayi is written to describe the then prevalent Sanskrit language, it provides a grammatical framework which is general enough to analyse other languages as well. This makes the study of Ashtadhyayi important from the point of view of the concepts it uses for language analysis. The third aspect of Ashtadhyayi is its organization. The set of fewer than 4,000 sutras is similar to a computer program, with one major difference: the program is written for a human being and not for a machine, thereby allowing some non-formal or semi-formal sutras which require a human being to interpret and implement them. Nevertheless, we believe that the study of Ashtadhyayi from a programming point of view may lead to a new programming paradigm because of its rich structure.”
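The programming analogy above can be made concrete with a small illustration. The sketch below is a toy model, not Panini’s actual sutra formalism: a few external vowel-sandhi operations encoded as ordered rewrite rules, where the engine, like a sutra-style system, tries rules in a fixed order and applies the first that matches. The rule table and function names here are invented for illustration.

```python
# Toy sketch: external vowel sandhi as ordered rewrite rules.
# Each rule: (final vowel of the left word, initial vowel of the
# right word, merged vowel). Rules are tried in order; the first
# match is applied, loosely echoing Panini's rule-ordering discipline.

SANDHI_RULES = [
    ("a", "i", "e"),   # guna sandhi:          a + i -> e
    ("a", "u", "o"),   # guna sandhi:          a + u -> o
    ("a", "a", "ā"),   # savarna-dirgha:       a + a -> ā
]

def apply_sandhi(left: str, right: str) -> str:
    """Join two words, applying the first matching sandhi rule."""
    for final, initial, merged in SANDHI_RULES:
        if left.endswith(final) and right.startswith(initial):
            return left[:-len(final)] + merged + right[len(initial):]
    return left + right  # no rule applies: plain concatenation

print(apply_sandhi("deva", "indra"))    # devendra
print(apply_sandhi("hita", "upadeśa"))  # hitopadeśa
print(apply_sandhi("rāma", "avatāra"))  # rāmāvatāra
```

The point of the sketch is the organizational idea the quote describes: a compact, ordered rule set from which surface forms are generated mechanically, with conflicts resolved by rule precedence rather than ad-hoc logic.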

Some of the topics that will be covered in the October conference include:

- Sanskrit Sentence Generator
- Dependency Parser for Sanskrit Verses
- Revisiting the Role of Feature Engineering for Compound Type Identification in Sanskrit
- LDA Topic Modeling for pramāṇa Texts: A Case Study in Sanskrit NLP Corpus Building
- On Sanskrit and Information Retrieval
- Introduction to Sanskrit Shabdamitra: An Educational Application of Sanskrit Wordnet
- Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts
- An Introduction to the Textual History Tool
- Framework for Question-Answering in Samskrta through Automated Construction of Knowledge Graphs
- A Machine Learning Approach for Identifying Compound Words from a Sanskrit Text
- Vaijayantīkośa Knowledge-Net
- A Platform for Community-sourced Indic Knowledge Processing at Scale
- Pāli Sandhi – A Computational Approach

Dr Pawan Goyal, Assistant Professor, Department of Computer Science and Engineering, IIT Kharagpur, talks in this interview about the connection between Sanskrit and computational linguistics.

In the general area of language, linguistics and computation, do Sanskrit, its structure and its vocabulary present any new insights? Would you say it is easier or more difficult to work with?

PG: The importance of Sanskrit in general linguistics has been recognized in the West since the beginning of the 19th century. The discovery of Sanskrit in the West was the start of historical linguistics. The discovery of Panini’s grammar was key to the structuralist interpretations of language, and the beginning of general linguistics with de Saussure, who started his research on Sanskrit. Later theories of enunciation, putting emphasis on speech acts, basically reinvented notions discussed in India since Bhartṛhari. More recently, the Paninian computational system was recognized as a pioneer in information theory and informatics.

Sanskrit is a semi-formal system, fixed essentially by Panini as a high-register prakrit (prakrits were the natural languages spoken by North-Indians and descending from Vedic), since his generative grammar, admirably precise as a formal descriptive apparatus, became normative. This means that the enormous intellectual production of ancient India is available as a remarkably homogeneous corpus, spanning 25 centuries. The design and implementation of computer-aided processing tools is thus of paramount importance to analyse the enormous store of knowledge and literature available as Sanskrit text.

However, Sanskrit analysis is a non-trivial task in computational linguistics and philology. Thus the need for scholar gatherings such as our series of symposia, where traditional scholars meet computer experts in joint endeavours.

Earlier people have talked about how Sanskrit is a very good language for programming. Is this really true? Are there any special benefits it may provide?

PG: This is mostly fake news. It all started with an article by Rick Briggs, a NASA scholar, in the AI Magazine in 1985. The article explains how concepts from traditional darśanas could be used for knowledge representation in artificial intelligence research. The article itself is rigorous and interesting, but it spread on the Internet as uncontrollable disinformation. At some point it was said that an article in Esquire Magazine explained that Sanskrit was the ultimate programming language, studied in secret NASA laboratories. This is just crazy rumour piling on conspiracy theories. Sanskrit is definitely not usable as a programming language. Such nonsense is detrimental to the respect that the ancient science of Vyākaraṇa genuinely deserves.

Is there a role that Sanskrit can play in a potential integrated world of IOT (Internet of Things)? Can it contribute/shape the direction that digitization and AI will take human society in the future?

PG: Perhaps. This is a sensible programme of research. But Sanskrit cannot be reduced to a universal system of signs; it is also co-extensive with Indian culture.

Structural semantics is a long-term research topic, whereas IOT technologies are short-term solutions, cutting many corners. It seems hard to synchronize their communities.

In all the years that you have worked in this area, is there any special insight you have gleaned on the semantic aspect of the Sanskrit language, i.e. the relationship between word and meaning in Sanskrit?

PG: Well, the relationship between word and meaning is at the heart of linguistic modelling. But lexical semantics is only one piece of the semantics puzzle, which must also account for the meaning of sentences, of texts, of communicative dialogues, etc. The meaning of sentences in the Paninian model, through the kāraka/ākāṅkṣā theory, is an elegant way of expressing the dependency structure of linguistic notions, an essential step on the way to meaning analysis (śabdabodha).
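The kāraka-based dependency structure mentioned above can be sketched with a minimal example. The data structure and helper function below are illustrative assumptions, not the output of any actual parser: the analysis of “rāmaḥ vanam gacchati” (“Rama goes to the forest”) expressed as labelled dependency edges anchored on the verb, in the spirit of the Paninian model.

```python
# A minimal sketch of a kāraka analysis as labelled dependency edges.
# Each edge: (dependent word, kāraka relation, head word). The verb
# "gacchati" (goes) is the head; kartā is the agent, karma the goal.

karaka_edges = [
    ("rāmaḥ", "kartā", "gacchati"),   # agent of the action
    ("vanam", "karma", "gacchati"),   # goal/object of the action
]

def dependents(head: str, edges) -> list:
    """All (word, relation) pairs governed by a given head word."""
    return [(w, rel) for w, rel, h in edges if h == head]

for word, relation in dependents("gacchati", karaka_edges):
    print(f"{word} --{relation}--> gacchati")
```

Nothing here parses anything; the sketch only shows the shape of the representation: sentence meaning as a set of expectancy (ākāṅkṣā) relations between words and the verb, which is what a dependency parser for Sanskrit would aim to recover.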


Aparna Sridhar
Aparna M Sridhar is a senior journalist, editor for Center for Soft Power (www.centerforsoftpower.org) and consulting editor for arts at IndicToday.
