A Living Chronicle · 6,000 Years

भाषाओं का इतिहास The Languages of India

From the reconstructed whispers of Proto-Dravidian in the fourth millennium BCE to the twenty-two scheduled languages of the modern Republic: the story of how a subcontinent spoke, wrote, fought over, and preserved one of the richest concentrations of linguistic diversity on Earth.

780+ Living Languages
4 Language Families
22 Scheduled Languages
13 Script Families

The Four Language Families

India's linguistic landscape is dominated by four great language families, each with distinct origins, structures, and histories. Together they account for over 99% of India's 1.4 billion speakers — yet the relationships between them remain one of the great puzzles of historical linguistics.

Indo-European

Indo-Aryan Branch
~78%

The Indo-Aryan languages descend from Proto-Indo-European via Proto-Indo-Iranian and Proto-Indo-Aryan. Speakers entered the northwestern subcontinent around 1500 BCE, bringing Vedic Sanskrit — the oldest attested form. Over three millennia, this branch diversified through Prakrits and Apabhramsha dialects into the modern languages that dominate northern, central, and western India.

The branch includes Hindi, commonly ranked among the world's four most-spoken languages (~528 million first-language speakers in the 2011 census, or roughly 580 million counted together with Urdu), and Bengali (97 million), which has one of the richest literary traditions in Asia. Classical Sanskrit, codified by Pāṇini in his Aṣṭādhyāyī (~500 BCE), remained the prestige language of scholarship and religion for over two millennia.

The sheer internal diversity is staggering: a Kashmiri speaker and a Bengali speaker — both Indo-Aryan — are more different from each other linguistically than a Spanish speaker is from a Romanian speaker. Rajasthani alone has been argued to contain over 70 distinct dialects. The Hindi Belt's apparent uniformity masks a continuum of Bhojpuri, Awadhi, Chhattisgarhi, Marwari, and other varieties, each with millions of speakers and rich oral traditions, that are classified as "Hindi" for census purposes.

Proto-Indo-European → Proto-Indo-Iranian → Proto-Indo-Aryan → Vedic Sanskrit (~1500 BCE) → Classical Sanskrit (~500 BCE) → Prakrits (~300 BCE) → Apabhramsha (~600–1300 CE) → Modern Indo-Aryan Languages
Hindi 528M · Bengali 97M · Marathi 83M · Urdu 51M · Gujarati 55M · Odia 37M · Punjabi 33M · Assamese 15M · Maithili 14M · Kashmiri 7M · Sindhi 2.7M · Konkani 2.3M · Nepali 2.9M · Dogri 2.6M · Sanskrit 0.02M

Dravidian

Indigenous to the Subcontinent
~20%

The Dravidian family is indigenous to the Indian subcontinent, with Proto-Dravidian reconstructed to ~4000–3000 BCE. Before the Indo-Aryan expansion, Dravidian languages were likely spoken across a much larger area — evidenced by retroflex consonants borrowed into Vedic Sanskrit, Dravidian loanwords in the Rigveda, and surviving northern pockets like Brahui in Balochistan, Kurukh in Jharkhand, and Malto in the Rajmahal Hills.

Tamil boasts the oldest continuous literary tradition among Dravidian languages, with Sangam literature dating to ~300 BCE–300 CE. The Tolkappiyam, the oldest extant Tamil grammar, is one of the most remarkable linguistic documents of the ancient world. Some scholars have proposed that the undeciphered Indus script records a Dravidian language, though this remains unproven.

Dravidian languages have a distinctive typological profile: they are agglutinating (words built by stringing morphemes together), strictly verb-final, use postpositions instead of prepositions, and distinguish clusivity in pronouns (separating "we including you" from "we excluding you"). Centuries of contact have spread several of these traits into Indo-Aryan as well. The retroflex consonants (ட, ண, ள) that feel so characteristic of Indian languages to foreign ears are widely thought to be a Dravidian feature that Indo-Aryan acquired through that contact.

Proto-Dravidian (~4000 BCE) → Proto-South Dravidian / Proto-North Dravidian (~1500 BCE) → Old Tamil (~600 BCE) · Proto-Kannada · Proto-Telugu → Medieval literary languages → Modern Dravidian
Telugu 81M · Tamil 69M · Kannada 44M · Malayalam 35M · Tulu 1.8M · Gondi 2.9M · Brahui (Pakistan) · Kurukh · Kodava

Austro-Asiatic

Munda Branch
~1.5%

The Munda languages form the westernmost branch of the vast Austro-Asiatic family, which stretches from central India to Vietnam. Distantly related to Khmer and Vietnamese, the Munda speakers arrived in India around 1500 BCE, most likely via a maritime route from Southeast Asia to the Mahanadi River Delta in modern Odisha, subsequently spreading into the Chota Nagpur Plateau.

With ~11 million speakers across Jharkhand, Odisha, West Bengal, Bihar, and Chhattisgarh, the Munda languages are characterized by agglutinating morphology, three grammatical numbers (singular, dual, plural), two genders (animate/inanimate), and inclusive/exclusive first-person plural distinctions. Santali, the largest Munda language (7.4 million speakers), was added to India's Eighth Schedule in 2003 and has its own script — Ol Chiki — created by Pandit Raghunath Murmu in 1925.

The Munda connection to Southeast Asia is supported by vocabulary for rice cultivation, kinship, and body parts shared with Khmer and Vietnamese. Genetic evidence is consistent with a Southeast Asian origin: Munda speakers carry the O2a Y-chromosome haplogroup common in mainland Southeast Asia. The Sora people of Odisha practice a form of spirit mediumship in which the dead speak through living mediums, and anthropologists have documented an entire corpus of this "language of the dead."

Santali 7.4M · Mundari 1.6M · Ho 1.4M · Korku 727K · Sora 410K · Kharia 298K · Juang 30K · Bonda 9K

Sino-Tibetan

Tibeto-Burman Branch
~1%

Northeast India harbors the epicenter of phylogenetic diversity within the Tibeto-Burman branch of Sino-Tibetan — perhaps 20 independent subgroups and 100 to 300 individual languages across the seven northeastern states. This extraordinary diversity, concentrated in the highlands surrounding the Brahmaputra River valley, is only now beginning to be fully documented.

Major groups include Bodo-Garo (spoken in Assam and Meghalaya), Kuki-Chin-Mizo (Mizoram, Manipur, Myanmar border), the diverse "Naga" languages (Nagaland, with over 16 distinct languages), Meitei/Manipuri (the state language of Manipur and, alongside Bodo, one of only two Tibeto-Burman languages in India's Eighth Schedule), and the Tani languages of Arunachal Pradesh. Many languages remain poorly described, and new languages continue to be identified in the 21st century.

Arunachal Pradesh alone — population 1.4 million — hosts an estimated 30–50 languages from at least 5 different Tibeto-Burman subgroups. The Koro language, with only ~800 speakers, was discovered by linguists as recently as 2010 in a remote valley — a reminder that India's linguistic map is still being drawn. Many of these languages are tonal (using pitch to distinguish meaning), a feature absent from most other Indian language families.

Meitei 1.8M · Bodo 1.5M · Mizo 830K · Karbi 490K · Garo 1.1M · Tangkhul 158K · Ao Naga 260K · Angami 150K · Kokborok 950K
🔎 Did you know?

Pāṇini's grammar (~500 BCE) is arguably the first formal language in human history. His 4,000 sutras describe Sanskrit with such precision that linguists have compared its rule-based structure to a programming language. Noam Chomsky acknowledged Pāṇini as a precursor to modern generative grammar — 2,500 years before the field existed.
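To make the programming-language comparison concrete, here is a minimal sketch of one well-known Sanskrit sound rule written as a rewrite rule: when a word ending in a or ā meets a word beginning with i/ī or u/ū, the two vowels merge into e or o. The function name and the romanized examples are illustrative assumptions, not Pāṇini's own notation, which is far more compact and handles thousands of interacting rules.

```python
import unicodedata

# Merged vowel produced when a final a/ā meets the listed initial vowel.
# Illustrative subset only; real Sanskrit sandhi has many more cases.
MERGED_VOWEL = {"i": "e", "ī": "e", "u": "o", "ū": "o"}


def join_with_vowel_merge(first: str, second: str) -> str:
    """Join two romanized words, applying the a/ā + i/u merge when it fits."""
    # Normalize so that letters like "ā" are single code points.
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    if first and second and first[-1] in ("a", "ā") and second[0] in MERGED_VOWEL:
        return first[:-1] + MERGED_VOWEL[second[0]] + second[1:]
    # No rule applies: plain concatenation.
    return first + second


if __name__ == "__main__":
    print(join_with_vowel_merge("deva", "indra"))
    print(join_with_vowel_merge("mahā", "utsava"))
```

The point of the sketch is the shape of the rule (a condition on the context plus a substitution), which is the property that invites the comparison with formal grammars.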

The word "sugar" comes from Sanskrit śarkarā → Persian shakar → Arabic sukkar → Medieval Latin succarum → Old French sucre → English "sugar." Similarly, "orange" traces back to Sanskrit nāraṅga, "jungle" to Hindi jaṅgal, and "shampoo" to Hindi chāmpō (to press/massage).

Language Isolates

Nihali is spoken by ~2,000 people in Maharashtra's Buldhana district and adjoining Madhya Pradesh, with no proven relation to any family. Kusunda is a near-extinct isolate in Nepal. Burushaski, spoken in northern Pakistan, is unrelated to any known family. These orphan languages hint at an even more complex prehistory.

Andamanese Languages

The languages of the Andaman Islands — Great Andamanese (nearly extinct), Onge, Jarawa, and the language of the uncontacted Sentinelese — may represent remnants of the earliest Out-of-Africa migrations (~65,000 years ago). They form at least two distinct families with no proven external connections.

Tai-Kadai & Khasian

Northeast India also hosts Tai-Kadai languages (Ahom, Tai Phake, Khamti — related to Thai/Lao) brought by medieval migrations, and Khasian languages (Khasi, Pnar — part of the Austro-Asiatic family, not Munda), spoken by ~1.4 million in Meghalaya's matrilineal Khasi society.


Timeline of Indian Languages

Six thousand years of linguistic evolution — from the reconstructed proto-languages of prehistory, through the great literary traditions, colonial encounters, and post-independence language movements that shaped the modern subcontinent.

Prehistoric Era · ~4000–2600 BCE

~4000 BCE
Proto-Dravidian Spoken
Linguistic reconstruction suggests Proto-Dravidian was spoken around the fourth millennium BCE, some six thousand years ago. The culture is associated with the Neolithic complexes of South India: a rural economy with agriculture, animal husbandry, hunting, and early metallurgy.
~3000 BCE
Proto-Dravidian Diversifies
Proto-Dravidian begins to split into Proto-North Dravidian, Proto-Central Dravidian, and Proto-South Dravidian. The botanical vocabulary suggests speakers inhabited the dry deciduous forests of central and peninsular India, from Saurashtra to South India.

Indus Valley Civilization · 2600–1900 BCE

2600–1900 BCE
Indus Script in Use
The mature Indus Valley Civilization flourishes across the Indus and Ghaggar-Hakra river systems. The Indus script, roughly 400 distinct symbols found on seals, pottery, and tablets, remains undeciphered. Some scholars propose it recorded a Dravidian language; others suggest it may be only proto-writing. Some researchers also argue that Mesopotamian trade vocabulary preserves Dravidian loanwords, pointing to Sumerian terms for ivory ("zuamsi") and sesame ("ĝeš-i₃"), though the proposal remains contested.

Vedic Period · 1500–500 BCE

~1500 BCE
Indo-Aryans Enter the Subcontinent
Indo-Aryan speakers migrate into the northwestern subcontinent from the Bactria-Margiana region, bringing Vedic Sanskrit. The Rigveda, composed orally over centuries, contains the oldest Indo-Aryan hymns. Already at this stage, over a dozen words are borrowed from Dravidian, and retroflex consonants (~88 words in the Rigveda) suggest deep contact with a Dravidian substrate population.
~1500 BCE
Munda Speakers Arrive
Proto-Munda speakers, ancestral to today's Santali, Mundari, and Ho communities, arrive on the coast of Odisha from Southeast Asia via a maritime route. They bring rice and millet cultivation and settle the Mahanadi Delta, later spreading into the Chota Nagpur Plateau and adjacent highland regions.
~1200 BCE
Rigvedic Period Complete
The text of the Rigveda is essentially complete. Its archaic language preserves cognates with Avestan (Old Iranian) that disappear from later Vedic texts. Five chronological strata are identified in Vedic literature: Rig-vedic, Mantra, Samhita prose, Brahmana prose, and Sutra language.
~1100 BCE
Kuru Kingdom & Sanskritization
The establishment of the Kuru Kingdom in the Ganges-Yamuna Doab initiates a process of Sanskritization in northern India. Over centuries, native Dravidian speakers in the north shift to Indo-Aryan languages through elite dominance, retaining Dravidian structural features (gerunds, retroflex consonants) in the new languages — a process of language shift, not replacement.
~600 BCE
Old Tamil Emerges
Old Tamil emerges as a distinct language from Proto-South Dravidian. The earliest Tamil-Brahmi inscriptions, cave inscriptions in the Madurai and Tirunelveli districts, date to the 2nd century BCE, making Tamil the earliest attested Dravidian language.
~500 BCE
Pāṇini Codifies Sanskrit
Pāṇini composes the Aṣṭādhyāyī ("Eight-Chapter Grammar") — an astonishing work of ~4,000 sutras that precisely defines Classical Sanskrit grammar. It is essentially a prescriptive grammar that standardizes the language, though it also describes Vedic forms already passing out of use. Around this time, Sanskrit transitions from a spoken first language to a learned language of religion, scholarship, and the court.

Classical Period · 500 BCE – 300 CE

~300 BCE
Prakrits & Ashoka's Brahmi Edicts
Emperor Ashoka's rock and pillar edicts, inscribed in Brahmi script across the Mauryan Empire (~250 BCE), represent the earliest indisputable Indian writing. Written in various Prakrits (the vernacular speech, as opposed to "refined" Sanskrit), they are the foundation of all subsequent Indian epigraphy. In the northwest, the Kharosthi script (written right-to-left, derived from Aramaic) is used alongside Brahmi.
~300 BCE – 300 CE
Sangam Literature in Tamil
The Sangam period produces the oldest extant body of secular literature in any Dravidian language — over 2,000 poems in Tamil covering love (akam) and war (puram). These texts, composed by over 470 poets including women, reveal a sophisticated literary culture in the deep south contemporaneous with the Mauryan and post-Mauryan periods in the north.
~300 BCE
Pali & Buddhist Canon
Pali, a Middle Indo-Aryan language, becomes the vehicle for the Theravada Buddhist scriptures. The dramatic Prakrits also emerge — Sauraseni (basis for some later Hindi dialects), Magadhi (eastern dialect), and Maharashtri (literary Prakrit associated with western India, ancestor of Marathi). In Sanskrit drama, characters speak different Prakrits based on their social status.
~100 CE
Tolkappiyam Takes Form
The Tolkappiyam, the oldest extant Tamil grammar, is compiled (dating debated — possibly 3rd century BCE to 5th century CE). It describes Tamil phonology, morphology, and poetics in extraordinary detail, and serves as evidence of a sophisticated indigenous grammatical tradition independent of the Sanskrit grammatical tradition of Pāṇini.

Medieval Period · 300–1200 CE

300–500 CE
Gupta Golden Age of Sanskrit
The Gupta Empire presides over the "Golden Age" of Classical Sanskrit literature — Kalidasa's Shakuntala and Meghaduta, the Panchatantra, the Kamasutra, and major works of mathematics and astronomy. The Gupta script evolves from Brahmi and will later give rise to Nagari, Sharada, Siddham, and eventually all modern North Indian scripts including Devanagari.
~450 CE
Kannada's Earliest Inscription
The Halmidi inscription (~450 CE) in Karnataka is the earliest known Kannada inscription. Kannada and Telugu begin their journeys as distinct literary languages, eventually developing rich literary traditions. The southern Brahmic scripts begin diverging into the more rounded forms characteristic of South Indian writing.
600–1300 CE
Apabhramsha — The Bridge
The Prakrits gradually transform into Apabhramsha ("fallen away"), the transitional dialects connecting Middle Indo-Aryan to Early Modern Indo-Aryan. Significant Apabhramsha literature survives in Jain libraries. Poets like Pushpadanta (10th c.), Hemachandra of Patan, and Sarahapad of Kamarupa compose in these evolving dialects. The largest modern languages (Bengali, Hindi, Marathi, Gujarati, Punjabi, Odia) all crystallize from these Apabhramsha varieties.
~800 CE
Devanagari Script Emerges
The Devanagari script — descended from Brahmi through Gupta and Nagari — takes recognizable form around the 8th–10th centuries. It would become the dominant script for Sanskrit and later for Hindi, Marathi, and several other North Indian languages. Meanwhile, the Odia script develops its characteristic rounded forms adapted for writing on palm leaves (straight lines would tear the leaf).
~933 CE
First Hindi Text
The Shravakachar by Devasena of Dhar (dated 930s CE), a Jain text, is often cited as the first book written in Hindi, though the boundary between late Apabhramsha and early Hindi is not sharp. Meanwhile, Malayalam begins to diverge from Middle Tamil around the 10th century, influenced by the geographic isolation of the Western Ghats and heavy Sanskrit borrowing.

Islamic Period · 1200–1700 CE

~1200 CE
Persian Becomes Court Language
With the establishment of the Delhi Sultanate, Persian becomes the language of administration, court, and high culture across much of North India. This begins a profound and lasting influence on North Indian languages — Persian and Arabic loanwords flood into local speech, fundamentally reshaping vocabulary for governance, law, culture, food, and daily life.
~1300 CE
Hindustani Emerges
The Khariboli dialect of the Delhi region begins gaining prestige, absorbing Persian and Arabic vocabulary to become "Hindustani" — the ancestor of both modern Hindi and Urdu. Amir Khusrow writes poetry in both Khariboli and Brajbhasha, calling the language "Hindavi." For centuries, Hindi and Urdu remain a single language with different registers — the formal split along religious and script lines comes much later.
1400–1600 CE
Bhakti Movement — Vernacular Explosion
The Bhakti devotional movement produces an explosion of literature in vernacular languages. Kabir composes in Sadhukkadi (a mixed Hindi); Tulsidas writes the Ramcharitmanas in Awadhi (~1574); Meerabai sings in Rajasthani Braj; Surdas composes in Braj Bhasha. In the south, Alvars and Nayanars had already composed devotional poetry in Tamil centuries earlier. These movements democratize literature, taking it from the Sanskrit elite to the masses.
~1540 CE
Gurmukhi Script Standardized
Guru Angad Dev, the second Sikh Guru, standardizes the Gurmukhi script for writing Punjabi. The Guru Granth Sahib, compiled by Guru Arjan Dev (1604), is written in multiple languages — Punjabi, Braj, Khariboli, Sindhi, Sanskrit, and even Persian — a living testament to India's multilingual reality. The Gurmukhi script descends from the Landa mercantile scripts of the Punjab.

Colonial Period · 1700–1947

1786
Sir William Jones — Indo-European Discovery
Sir William Jones delivers his famous discourse to the Asiatic Society of Bengal, observing that Sanskrit, Greek, and Latin share a common ancestor — effectively founding comparative linguistics and the concept of the Indo-European language family. This insight would revolutionize our understanding of human migration and linguistic evolution.
1836
Brahmi Script Deciphered
Norwegian scholar Christian Lassen uses bilingual Greek-Brahmi coins of the Indo-Greek king Agathocles to achieve the first secure decipherment of Brahmi letters. James Prinsep completes the decipherment shortly after, unlocking thousands of years of Indian epigraphic history. Ashoka's edicts can finally be read for the first time in over a millennium.
1856
Caldwell's Dravidian Grammar
Robert Caldwell publishes "A Comparative Grammar of the Dravidian or South Indian Family of Languages" — the foundational work that establishes Dravidian as an independent language family, distinct from Indo-European. This is a watershed moment: it proves that South India's languages are not "corrupted Sanskrit" but belong to an entirely separate, ancient family.
1894–1928
Grierson's Linguistic Survey of India
George Abraham Grierson leads the massive Linguistic Survey of India, ultimately documenting 364 languages and dialects, grouped under 179 languages and written in more than 190 scripts. Published in 19 volumes, it remains one of the most ambitious linguistic documentation projects ever undertaken, a snapshot of Indian linguistic diversity at the turn of the 20th century.
Early 1900s
Hindi-Urdu Split Formalizes
The single Hindustani language formally splits into "Hindi" (Sanskritized vocabulary, Devanagari script) and "Urdu" (Persianized vocabulary, Nastaliq script) along communal lines. This politicization of language — where two registers of the same speech become markers of religious identity — would have profound consequences through Partition and beyond. The fundamental grammar remains identical.
1925
Ol Chiki Script Created
Pandit Raghunath Murmu, a Santal educator, creates the Ol Chiki script specifically for the Santali language. Unlike Devanagari or Latin adaptations, Ol Chiki is designed from scratch to represent Santali's unique phonology — its vowel length, glottal stops, and nasal vowels. It becomes a powerful symbol of Santal cultural identity and is officially adopted for Santali in 2003.

Post-Independence · 1947–Present

1950
Constitution — 14 Scheduled Languages
India's Constitution establishes Hindi (in Devanagari) as the official language with English as an additional language for 15 years. The Eighth Schedule lists 14 languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. Article 29 protects the right of minorities to preserve their languages and scripts.
1952–1956
Linguistic States Reorganization
Potti Sreeramulu's death after a 58-day fast for a Telugu state (19 October to 15 December 1952) triggers the creation of Andhra State (1953). The States Reorganisation Commission (Fazal Ali Commission) follows, and the 1956 Act reorganizes India along linguistic lines, the most extensive redrawing of state boundaries since independence. Language becomes the primary organizing principle of Indian federalism.
1965
Anti-Hindi Agitation in Tamil Nadu
As the 15-year transition period for English ends, massive protests erupt in Tamil Nadu against the imposition of Hindi. Students and political leaders argue that replacing English with Hindi would disadvantage non-Hindi speakers in education and government employment. The agitation leads to the Official Languages Act amendment, ensuring English continues indefinitely as a co-official language — a pivotal moment in India's language politics.
1967–2003
Eighth Schedule Expands to 22
The Eighth Schedule grows through three amendments: Sindhi (1967, 21st Amendment), Konkani + Manipuri + Nepali (1992, 71st Amendment), and Bodo + Dogri + Maithili + Santali (2003, 92nd Amendment). The inclusion of Santali marks the first Austro-Asiatic language and first tribal language to receive constitutional recognition. As of 2025, 38 more languages seek inclusion.
2004–Present
Classical Language Status
The Government of India begins granting "Classical Language" status: Tamil (2004), Sanskrit (2005), Kannada (2008), Telugu (2008), Malayalam (2013), and Odia (2014), followed in 2024 by Marathi, Pali, Prakrit, Assamese, and Bengali. This status recognizes languages with ancient literary heritage and provides funding for preservation. Meanwhile, UNESCO's Atlas of the World's Languages in Danger lists 197 Indian languages as endangered. The tension between preservation and loss defines the modern linguistic landscape.
🔎 Did you know?

India has more English speakers than England. With ~125 million English speakers (2011 census), India is the world's second-largest English-speaking country. English in India has developed its own distinctive grammar, vocabulary, and idioms — "prepone" (the opposite of postpone), "do the needful," "kindly revert back" — that are perfectly valid Indian English, not errors.

The Rigveda was transmitted orally for well over a millennium before being written down, using elaborate mnemonic techniques (padapatha, kramapatha, jatapatha) in which verses were recited word by word, in overlapping pairs, and in forward-and-backward interlocking patterns to prevent even a single syllable from changing. When finally written, the oral version proved virtually identical across geographic regions, one of the most remarkable feats of human memory.
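The recitation patterns named here work like an error-checking scheme, and a short sketch makes the structure visible. The code below assumes the commonly described forms: padapatha recites the words one at a time, kramapatha chains overlapping pairs (1-2, 2-3, 3-4), and jatapatha recites each pair forward, backward, and forward again (1-2-2-1-1-2). Function names are hypothetical, the words are placeholders, and the sound changes that occur where real words meet are ignored.

```python
from typing import List, Tuple


def pada_patha(verse: str) -> List[str]:
    """Word-by-word recitation: split the verse into separate words."""
    return verse.split()


def krama_patha(words: List[str]) -> List[Tuple[str, str]]:
    """Overlapping pairs: (w1, w2), (w2, w3), (w3, w4), ..."""
    return [(words[i], words[i + 1]) for i in range(len(words) - 1)]


def jata_patha(words: List[str]) -> List[Tuple[str, ...]]:
    """Each pair recited forward, backward, then forward again."""
    return [(a, b, b, a, a, b) for a, b in krama_patha(words)]


if __name__ == "__main__":
    words = pada_patha("w1 w2 w3 w4")  # placeholder words, not a real verse
    print(words)
    print(krama_patha(words))
    for unit in jata_patha(words):
        print(" ".join(unit))
```

Because every interior word appears in two overlapping pairs and in several positions within each braided unit, a slip in one place contradicts the neighboring units, which is what makes the redundancy useful for preserving the text.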


Maps — Where Languages Live

Geography has always been the silent architect of language. Mountains isolate; rivers connect; plains spread. These maps trace how India's languages distributed themselves across the subcontinent — and how political boundaries were later drawn to follow them.

Modern Language Distribution

The 22 scheduled languages of India mapped by approximate geographic center, with circle size indicating speaker population. Dashed regions show the approximate territory of each language family.

Legend: Indo-Aryan (~78%) · Dravidian (~20%) · Austro-Asiatic (~1.5%) · Sino-Tibetan (~1%)

Prehistoric Migrations & Homeland Theories

Approximate migration routes and homeland regions based on linguistic reconstruction, comparative vocabulary, and archaeological evidence. All dates are scholarly estimates and remain debated.

Legend: Indo-Aryan Migration (~1500 BCE) · Dravidian Homeland & Southward Shift · Munda Maritime Route (~1500 BCE) · Northern Dravidian Pockets

Scripts of India

The major writing systems in use across India today — almost all descended from the ancient Brahmi script (3rd century BCE). India uses more distinct scripts than any other country on Earth.

Legend: Nagari-derived scripts · South Indian Brahmic scripts · Tribal/independent scripts · Perso-Arabic script

The Writing Systems

Almost every script used in India today descends from Brahmi — a single ancestral writing system first attested in Ashoka's edicts (~250 BCE). From that common root, regional variations diverged into the angular scripts of the north and the rounded scripts of the south, producing one of the world's richest families of writing systems.

Brahmi

~3rd century BCE

Ancestor of nearly all Indian scripts. First attested in Ashoka's edicts. Deciphered by Lassen & Prinsep (1836–38). The letter "ba" (𑀩) became a symbol of India's epigraphic heritage.

Tamil

~5th century CE

From Tamil-Brahmi via Vatteluttu. Rounded forms. 12 vowels, 18 consonants, one aytam. One of the oldest scripts still in active use.

Kannada

~5th century CE

Shares ancestry with Telugu script. Kadamba and Chalukya dynasty inscriptions show its evolution.

Telugu

~7th century CE

From Bhattiprolu variant of Brahmi. Closely related to Kannada script. Known for its rounded aesthetics.

Devanagari

~8th century CE

Via Gupta → Nagari. Used for Hindi, Marathi, Sanskrit, Nepali, Konkani. 48 primary characters.

Malayalam

~9th century CE

Blend of Vatteluttu + Grantha scripts. Large character set to represent both Indo-Aryan and Dravidian sounds.

Bengali-Assamese

~11th century CE

Eastern Nagari variant. Distinctive spirals and loops. Used for Bengali, Assamese, Bishnupriya Manipuri.

Odia

~14th century CE

Distinctive rounded forms — adapted for palm-leaf manuscripts (straight lines would tear the leaf). Curvilinear beauty.

Gujarati

~16th century CE

Nagari variant without the headline (shirorekha). Influenced by mercantile writing traditions of Gujarat.

Gurmukhi

~1540 CE

Standardized by Guru Angad for Punjabi. From Landa mercantile scripts. Used for Guru Granth Sahib.

Ol Chiki

1925 CE

Created by Pandit Raghunath Murmu for Santali. Not derived from Brahmi — designed from scratch for Munda phonology.

Meitei Mayek

Revived 20th c.

Historical Manipuri script revived for the Meitei language. Usually classed with the Tibetan group of scripts, descended from Gupta Brahmi.

🔎 Did you know?

Indian currency notes carry 17 languages: Hindi and English on the front, and 15 others (Assamese, Bengali, Gujarati, Kannada, Kashmiri, Konkani, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu) in a language panel on the back. Few banknotes anywhere are as multilingual.

Odia script is round because of palm leaves. Ancient Odisha used palm leaves as writing material. Straight lines would split the leaf along its grain, so scribes developed the distinctively curved, circular letterforms that make Odia one of the most visually unique scripts in the world. The same constraint shaped the rounded scripts of South India.

"Malayalam" is the longest palindromic language name in English — it reads the same forwards and backwards.
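The palindrome claim is easy to check mechanically. A minimal sketch, assuming a case-insensitive comparison of the English spelling (the function name is hypothetical):

```python
def is_palindrome(name: str) -> bool:
    """Return True if the name reads the same forwards and backwards."""
    letters = name.casefold()  # ignore the capital first letter
    return letters == letters[::-1]


if __name__ == "__main__":
    for language in ["Malayalam", "Tamil", "Kannada"]:
        print(language, is_palindrome(language))
```

The check covers only the romanized name; in its own script the name is not written as a mirror-image sequence.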


The Peoples & Their Languages

Languages don't exist in abstraction — they are spoken by peoples with histories, migrations, social structures, and struggles. Here are the stories of some of the communities whose languages have shaped the subcontinent.

The Vedic Aryans

The Indo-Aryan speakers who entered the northwest around 1500 BCE brought a pastoral-agricultural culture centered on cattle, fire rituals, and oral hymn traditions. Their language, Vedic Sanskrit, was preserved with extraordinary fidelity through elaborate oral transmission techniques for over a millennium before writing arrived. The social system they established — with Sanskrit as the language of ritual and learning — created a prestige hierarchy that shaped Indian multilingualism for three thousand years.

Key linguistic legacy

The distinction between Sanskrit (the "refined" language of elites) and Prakrit (the "natural" vernacular of the common people) established a pattern of diglossia that persists across India to this day — formal literary registers vs. colloquial speech.

The Sangam Tamils

The Tamil-speaking peoples of the deep south — under the Chera, Chola, and Pandya dynasties — produced one of the ancient world's great literary traditions during the Sangam period (~300 BCE–300 CE). Over 2,000 poems by 470+ poets (including women like Avvaiyar) describe a sophisticated civilization with bustling ports, maritime trade with Rome, and a culture that valued both love poetry and heroic valor.

Cultural significance

Tamil identity is deeply intertwined with the language itself — the concept of "Tamil Tay" (Mother Tamil) treats the language as a living goddess. The anti-Hindi agitation of 1965 and the Dravidian political movement are direct expressions of this linguistic nationalism.

The Santal People

The Santals, one of India's largest tribal communities (~7 million), speak Santali, a Munda language whose ancestral speakers arrived from Southeast Asia ~3,500 years ago. They maintain a rich oral tradition of myths, songs, and stories, and their social structure revolves around clan exogamy and village councils. The Santal Rebellion of 1855–56 against British colonial exploitation was one of India's earliest mass uprisings.

The Ol Chiki revolution

When Pandit Raghunath Murmu created the Ol Chiki script in 1925, he gave the Santals something no imposed script could — a writing system born from their own phonology and culture. Each letter has a name from nature (e.g., the letter for "la" is named after a creeper vine). It became a powerful tool of cultural assertion.

The Naga Peoples

Nagaland's state motto, "Unity," speaks to the reality of 16+ distinct Tibeto-Burman languages spoken by different Naga tribes within a single small state. Tribes like the Angami, Ao, Sema, Lotha, and Tangkhul have languages so different from each other that English and Nagamese (an Assamese-based creole) serve as lingua francas. Each tribe has its own distinct cultural practices, traditional dress, and oral literary traditions.

Linguistic puzzle

"Naga" is a geographic and political label, not a linguistic one. The languages called "Naga" belong to at least 5 different subgroups of Tibeto-Burman — they are not more closely related to each other than they are to Burmese or Tibetan.

The Bengali Renaissance

The 19th-century Bengali Renaissance, centered in Calcutta under figures like Ram Mohan Roy, Ishwar Chandra Vidyasagar, and Rabindranath Tagore, transformed Bengali from a regional language into a vehicle for modern literature, philosophy, journalism, and political thought. Tagore's Nobel Prize for Gitanjali (1913) made him the first non-European laureate in literature and brought Bengali writing worldwide recognition.

Script reform

Vidyasagar's rationalization of Bengali typography in the 1850s — reducing the bewildering variety of conjunct characters to a manageable set — made Bengali printing practical and literacy achievable, directly enabling the Renaissance's literary explosion.

The Sentinelese

The Sentinelese of North Sentinel Island, one of the last uncontacted peoples on Earth, speak a language that has never been recorded or analyzed. Their isolation, estimated at up to 60,000 years, means their language could preserve features from the earliest human migrations out of Africa. Because it has never been documented, it cannot be assigned to any language family. With a population of perhaps 50–200, it is one of the most critically endangered languages in existence, and one we may never document.

Why it matters

The Sentinelese remind us that linguistic diversity is not merely about numbers — each language encodes a unique way of understanding the world, developed over millennia. When a language dies with no record, an irreplaceable piece of human heritage vanishes forever.

🔎 Did you know?

India has been losing about five languages a year. Of the 780+ languages recorded in India, UNESCO classifies 197 as endangered. The People's Linguistic Survey of India (2010–2013) found that India had lost 250 languages in the last 50 years, mostly unrecorded tribal languages in the northeast and central highlands.

Nagamese, the lingua franca of Nagaland, is an Assamese-based contact language, usually described as a pidgin or creole, that emerged from trade between Assamese speakers and Naga tribes. Nearly everyone speaks it as a second language; it has very few native speakers. Nagaland's official language, English, is the mother tongue of almost none of its people.

The Three-Language Formula (1968) recommends every Indian student learn three languages: the regional language or mother tongue, English, and a further modern Indian language (Hindi in non-Hindi-speaking states, preferably a southern language in Hindi-speaking ones). In practice, implementation varies wildly: many Hindi-speaking states offer Sanskrit as the third language instead of a modern language from another region, and Tamil Nadu teaches only Tamil and English.


The 22 Scheduled Languages

The Eighth Schedule of the Indian Constitution lists languages officially recognized by the Republic. Originally 14 in 1950, the list has grown to 22 through three constitutional amendments (1967, 1992, and 2003). The process is as much political as linguistic, with 38 more languages currently demanding inclusion.

Language | Family | Speakers (2011) | Script | Year Added | Official In
Hindi | Indo-Aryan | 528M | Devanagari | 1950 | 9 states + NCT Delhi
Bengali | Indo-Aryan | 97.2M | Bengali-Assamese | 1950 | West Bengal, Tripura, Jharkhand
Marathi | Indo-Aryan | 83M | Devanagari | 1950 | Maharashtra, Goa
Telugu | Dravidian | 81.1M | Telugu | 1950 | Andhra Pradesh, Telangana
Tamil | Dravidian | 69M | Tamil | 1950 | Tamil Nadu, Puducherry
Gujarati | Indo-Aryan | 55.5M | Gujarati | 1950 | Gujarat
Urdu | Indo-Aryan | 50.7M | Perso-Arabic | 1950 | J&K, Bihar, Telangana, UP + more
Kannada | Dravidian | 43.7M | Kannada | 1950 | Karnataka
Odia | Indo-Aryan | 37.5M | Odia | 1950 | Odisha
Malayalam | Dravidian | 34.8M | Malayalam | 1950 | Kerala, Lakshadweep
Punjabi | Indo-Aryan | 33.1M | Gurmukhi | 1950 | Punjab
Assamese | Indo-Aryan | 15.3M | Bengali-Assamese | 1950 | Assam
Maithili | Indo-Aryan | 13.6M | Devanagari/Tirhuta | 2003 | Bihar, Jharkhand
Santali | Austro-Asiatic | 7.4M | Ol Chiki | 2003 | Jharkhand, West Bengal
Kashmiri | Indo-Aryan | 6.8M | Perso-Arabic | 1950 | Jammu & Kashmir
Nepali | Indo-Aryan | 2.9M | Devanagari | 1992 | Sikkim
Sindhi | Indo-Aryan | 2.7M | Devanagari/Perso-Arabic | 1967 | No single state
Dogri | Indo-Aryan | 2.6M | Devanagari | 2003 | Jammu & Kashmir
Konkani | Indo-Aryan | 2.25M | Devanagari | 1992 | Goa
Manipuri (Meitei) | Sino-Tibetan | 1.8M | Meitei Mayek | 1992 | Manipur
Bodo | Sino-Tibetan | 1.48M | Devanagari | 2003 | Assam
Sanskrit | Indo-Aryan | ~24,000 | Devanagari | 1950 | Himachal Pradesh, Uttarakhand
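As a rough cross-check of the family percentages quoted earlier, the speaker counts in this table can be totaled by family. The sketch below copies the 2011 figures from the table (in millions, with Sanskrit entered as 0.024); the function names are hypothetical, and the result covers only the 22 scheduled languages, not every language in India.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (language, family, speakers in millions), copied from the table above.
ROWS: List[Tuple[str, str, float]] = [
    ("Hindi", "Indo-Aryan", 528.0), ("Bengali", "Indo-Aryan", 97.2),
    ("Marathi", "Indo-Aryan", 83.0), ("Telugu", "Dravidian", 81.1),
    ("Tamil", "Dravidian", 69.0), ("Gujarati", "Indo-Aryan", 55.5),
    ("Urdu", "Indo-Aryan", 50.7), ("Kannada", "Dravidian", 43.7),
    ("Odia", "Indo-Aryan", 37.5), ("Malayalam", "Dravidian", 34.8),
    ("Punjabi", "Indo-Aryan", 33.1), ("Assamese", "Indo-Aryan", 15.3),
    ("Maithili", "Indo-Aryan", 13.6), ("Santali", "Austro-Asiatic", 7.4),
    ("Kashmiri", "Indo-Aryan", 6.8), ("Nepali", "Indo-Aryan", 2.9),
    ("Sindhi", "Indo-Aryan", 2.7), ("Dogri", "Indo-Aryan", 2.6),
    ("Konkani", "Indo-Aryan", 2.25), ("Manipuri (Meitei)", "Sino-Tibetan", 1.8),
    ("Bodo", "Sino-Tibetan", 1.48), ("Sanskrit", "Indo-Aryan", 0.024),
]


def family_totals(rows: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum speakers (in millions) for each language family."""
    totals: Dict[str, float] = defaultdict(float)
    for _language, family, millions in rows:
        totals[family] += millions
    return dict(totals)


def family_shares(rows: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Each family's percentage of all scheduled-language speakers."""
    totals = family_totals(rows)
    grand_total = sum(totals.values())
    return {family: 100 * total / grand_total for family, total in totals.items()}


if __name__ == "__main__":
    shares = family_shares(ROWS)
    for family in sorted(shares, key=shares.get, reverse=True):
        print(f"{family}: {shares[family]:.1f}%")
```

Among scheduled-language speakers alone, Indo-Aryan and Dravidian come out close to the four-to-one split quoted for the country as a whole, while the two smaller families look smaller than their national shares because most of their languages are not scheduled.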
🔎 Final trivia

Sanskrit has the largest number of words for "elephant" — over 100, including gaja, hastin, dvipa, nāga, and kuñjara. The richness of vocabulary for a single concept reflects how culturally central the animal was.

Bollywood invented "filmi" languages. Hindi film songs routinely mix Hindi, Urdu, Punjabi, English, and sometimes Arabic or Persian in a single lyric — a practice called "code-mixing" by linguists, but just called "normal" by 1.4 billion people.

Roughly one person in six on Earth speaks an Indian language as their mother tongue. If India's Hindi speakers formed their own country, it would be the third most populous nation. Taken together, India's languages represent more linguistic diversity than Europe, the Middle East, and Central Asia combined.