How Africa's Languages Are Reshaping AI Translation, One Community at a Time
Meet Lesan AI, the Ethiopian Startup Building AI That Speaks Amharic and Tigrinya
This is the first Embedded episode of the year, and I had the real pleasure of speaking to Asmelash Hadgu of Lesan AI. Listen here on Spotify.
The limitations of current AI models in handling African languages such as Amharic (spoken in Ethiopia) and Tigrinya (spoken in Eritrea and in Ethiopia's Tigray region) highlight the digital divide. Asme Hadgu is the cofounder of Lesan AI. I've been thinking about how one would represent the vast number of languages spoken in Kenya in datasets, how to capture oral stories, and how to handle nuanced references that get lost in translation.
All of us on the continent or in the diaspora want to ensure that AI and machine translation do not leave behind the rich linguistic diversity of Africa. For me, the highlight of this conversation is that serving underrepresented languages isn't just about technical capability. It's about understanding and respecting the communities who speak them.
Our latest podcast is about how Lesan and Asme are approaching this problem and working to build translation technology that serves communities.
"One of our biggest achievements so far is a machine translation system that outperforms these big models from Google Translate or Facebook's kind of MMS and Microsoft Translator... The way you create machine translation is you start by collecting data, and again, just for comparing and contrasting, let's start with how the multinational corporations do it. It's through convenient data gathering... Lesan, in contrast, approaches this problem differently. So for Tigrinya or Amharic, we ask, 'Okay, how can we collect the fundamental building blocks of our language: the words, the sentences, and the questions?'"
Lesan’s approach to data collection: communities first
In this episode of Embedded, I learned that what sets Lesan apart is its methodical approach to data collection. Instead of relying solely on web scraping and convenience sampling, the team has developed a ground-up strategy that begins with community engagement:
They start by identifying the fundamental building blocks of each language: its words, sentences, and questions
They actively involve community members in the development process
They maintain transparency about their goals and methods
They prioritize consent and collaboration in data collection
"The way different companies approach language is very different... Big multinational corporations spend billions of dollars to work on a handful of languages, because those are mainly the languages spoken by people in the countries that spend. So that leaves out billions of people who don't necessarily speak these languages."
The impact of Lesan's work extends well beyond its technical achievements. In Tigray, for example, they have:
Enabled community members to share their stories globally
Facilitated the translation of educational materials
Bridged communication gaps
Supported humanitarian communication efforts
Any paid support helps us produce podcasts like this one. For $8 a month, you can help us keep going and keep bootstrapping this thing, and we would be so appreciative. Thank you in advance.