© Eren Li from Pexels
FAIR Forward and Prosa.ai discover Indonesia’s linguistic diversity for inclusive AI technology.
Spanning three time zones, Indonesia is a country full of diversity, with many different histories and perspectives. However, the country’s many regional languages are often overlooked: an estimated 700-plus languages spoken there are not represented in the digital space.
At the 10th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) in Lombok, FAIR Forward and its partners from Prosa.ai demonstrated that collecting language data using artificial intelligence (AI) is an important prerequisite for enabling everyone to participate in technological progress.
Prosa.ai is an Indonesian start-up that develops natural language processing (NLP) solutions for a wide range of clients, with a focus on Indonesian languages. FAIR Forward and Prosa.ai collect data and train models for three digitally underrepresented languages: Balinese, Bugis and Minangkabau. The aim is to develop local applications in Indonesia. Other regional languages are to be considered in the future.
Developing and applying AI language technology requires collaboration with local communities. Data collection shows how important underrepresented languages are for preserving language and culture and for developing digital solutions tailored to local challenges. Data annotation also offers new professional opportunities for rural dwellers, women and other marginalized groups. By writing short texts in their native language and by translating and quality-checking existing texts, they contribute to valuable NLP datasets in underrepresented languages. Because the work is done remotely, they can fit it flexibly into their everyday lives.
The audience at the conference showed great enthusiasm for the topic and pointed out important aspects such as data accuracy, consent and privacy in data collection, especially in the Indonesian context. There is hope for further collaboration with the attendees in the future.