Welcome to Mysooru News

Pioneering the future of Indian languages in AI: Conference at CIIL, Mysuru begins

Mysuru: The Linguistic Data Consortium for Indian Languages (LDC-IL) of the Central Institute of Indian Languages (CIIL), Mysuru, inaugurated a two-day AI Benchmarking Conference on Thursday.

The conference aims to address critical issues in the evaluation and efficacy of AI and Generative AI-based applications for Indian languages. The event brings together leading experts, researchers and industry stakeholders to establish benchmarks, develop objective evaluation metrics and explore innovative solutions to enhance language technologies for India’s diverse linguistic landscape.

The conference was inaugurated with a traditional lamp-lighting ceremony led by Prof N Deiva Sundaram, former professor at the University of Madras and Managing Director of NDS Lingsoft Solutions Pvt Ltd., Chennai, alongside other dignitaries.

Dr Narayan Choudhary, Officer In-Charge, LDC-IL, welcomed the guests and attendees and emphasised the urgent need for benchmarking AI systems in India.

He highlighted CIIL’s pivotal role in developing datasets for AI, given its responsibility towards thousands of languages and dialects across the country.

Prof Umarani Pappuswami, Professor-cum-Deputy Director, CIIL, introduced the conference theme, outlining its vision and objectives.

She stressed the importance of developing Large Language Models (LLMs) for Indian languages and expanding CIIL’s legacy in language technology.

Prof Pappuswami also underscored the need for Text-to-Speech (TTS) models for northeastern languages and identified datasets, evaluation, and ranking as the three pillars of AI benchmarking.

She concluded by emphasising the importance of inclusive and ethical considerations in AI development.

In his inaugural address, Prof N Deiva Sundaram highlighted the challenges faced by low-resource languages like Tamil, despite their rich historical legacy.

He emphasised the necessity of benchmarking to identify the datasets and linguistic resources required to fine-tune language models.

Prof Sundaram expressed optimism that the conference would provide actionable insights to guide future advancements in AI for Indian languages.

Prof P R Dharmesh Fernandez, Professor-cum-Deputy Director, CIIL, delivered the general remarks, focusing on the role of collaboration and standardisation in building robust linguistic datasets.

He emphasised the need for linguistic intelligence to ensure high-quality data and reiterated CIIL’s social responsibility to promote Indian languages in AI.

A key highlight of the conference was the release of 15 newly developed datasets by LDC-IL.

These datasets, released by Prof Shailendra Mohan, Director, CIIL, and other dignitaries, mark a significant milestone in LDC-IL’s contributions to linguistic research and technology development.

These were introduced by Dr Rejitha K S, Resource Person, LDC-IL.

The datasets include:

1. Mother Tongue Parallel Text Corpus of India (147 mother tongues)
2. Gold Standard Rajasthani Raw Text Corpus
3. Gold Standard Chhattisgarhi Raw Text Corpus Vol. II
4. Gold Standard Kashmiri Raw Text Corpus Vol. II
5. Gold Standard Maithili Raw Text Corpus Vol. II
6. Gold Standard Telugu Raw Text Corpus Vol. II
7. Maithili Raw Speech Corpus Vol. II
8. Dogri Sentence Aligned Speech Corpus
9. Maithili Sentence Aligned Speech Corpus (Tirhuta Script)
10. Manipuri Sentence Aligned Speech Corpus (Bengali Script)
11. Manipuri Sentence Aligned Speech Corpus (Meetei Mayek)
12. Punjabi Sentence Aligned Speech Corpus
13. Telugu Sentence Aligned Speech Corpus
14. Assamese Text-to-Speech Corpus
15. Maithili Text-to-Speech Corpus

In addition, LDC-IL launched several AI applications designed to serve Indian languages, introduced by Dr Narayan Choudhary.

These applications, now available for public use at medha.ciil.org, include:
• Anuvadika (Machine Translator)
• Lipyantara (Transliterator)
• Lipidha (Optical Character Recognizer)
• Anulekhika (Automatic Speech Recognition for Indian Languages)
• Anuvachika (Text-to-Speech for Indian Languages)
• Dhvani Parivartka (Media Converter)
• Aksharanka (Unicode Value Identifier)
• Shabdasandhan, Nudiyalavi, Paatantara, and other desktop applications

Prof Shailendra Mohan, Director, CIIL, delivered the presidential address, sharing his vision for CIIL’s growth and its role in advancing collaborative technologies for Indian languages.

The inaugural session concluded with a vote of thanks by Amom Nandaraj Meetei, Resource Person, LDC-IL, who expressed gratitude to the dignitaries, participants, collaborators, and media for their contributions to the conference’s success.

The conference features two plenary talks, 15 research papers and one panel discussion, with participants from top academic institutions and industry leaders, including representatives from ChatGPT, OpenAI, Ola Krutrim, Bhashini and BharatGen.

Over the next two days, presenters will deliberate on the major challenges facing Indian languages in AI and explore strategies to enhance their representation and functionality in the digital age.

For more information about the conference, visit www.ldcil.org/benchmarking

– Team Mysoorunews 
