Ever sat in a Bangladeshi clinic and heard someone say something like:
“Apni aagei bolen chilen na je chest-e ekta tightness feel hocche?”
(Roughly: “Didn’t you say earlier that you were feeling a tightness in your chest?”)
That right there is code-switching—slipping between Bengali and English mid-sentence, like it’s no big deal. It happens all the time, especially in doctor-patient conversations. But while humans handle this effortlessly, most language models? They get confused faster than a junior doctor on their first ER shift.
And in healthcare, that’s a problem. You can’t afford to misinterpret symptoms because a model panicked at the word "tightness" sandwiched between two Bengali clauses.
So I decided to do something about it.
Enter: MediBeng Whisper Tiny
I built a small but mighty tool: a fine-tuned version of OpenAI’s Whisper Tiny model trained specifically to handle Bengali-English code-switched medical speech and translate it into clean English.
Yep, that means this model can listen to real mixed-language clinical convos and output clear, usable English—perfect for documentation, analysis, or even aiding with diagnosis.
Why I Did This
In real-world clinics across Bangladesh (and frankly, anywhere multilinguals exist), people constantly mix languages. It’s second nature. But no existing ASR model was cutting it when it came to understanding this hybrid speech, let alone translating it. The goal was simple:
Input: Bengali-English speech
Output: Clear English transcription
And for this? I had to train Whisper to stop freaking out every time it heard a Bengali sentence ending with an English medical term.
How I Did It (a.k.a. The Nerdy Bit)
✅ 1. Built a Custom Dataset – MediBeng
I couldn’t find a dataset that captured the quirky, authentic blend of Bengali and English that you hear in clinics. So I built my own: MediBeng.
It’s a synthetic but realistic dataset of code-switched speech based on how real people actually talk in clinical settings.
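To give a feel for the setup, here's a minimal sketch of pulling the dataset with Hugging Face `datasets`. The dataset ID and column names below are my shorthand assumptions; check the dataset card linked at the end of this post for the real ones.

```python
# Minimal sketch: load MediBeng with Hugging Face `datasets`.
# The dataset ID and column names are assumptions (see the dataset card).
from datasets import Audio, load_dataset

ds = load_dataset("pr0mila-gh0sh/MediBeng", split="train")

# Whisper's feature extractor expects 16 kHz audio.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

sample = ds[0]
print(sample["audio"]["array"].shape)  # raw waveform as a NumPy array
print(sample["translation"])           # target English text (column name assumed)
```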
✅ 2. Fine-Tuned Whisper Tiny
I used MediBeng to fine-tune Whisper Tiny, teaching it how to make sense of bilingual speech. That means adjusting the model's brain (aka weights) to understand when someone says:
“Take korar por ekdom bhalo feel kori nai.”
(Roughly: “After taking it, I didn’t feel good at all.”)
...they probably meant something worth documenting.
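For the curious, here's roughly what that looks like with the `transformers` Seq2SeqTrainer. This is a sketch, not my exact training recipe: the hyperparameters, dataset ID, column names, and the little padding collator are all illustrative assumptions.

```python
# Rough fine-tuning sketch; hyperparameters, dataset ID, and column names
# are illustrative assumptions, not the exact recipe.
from dataclasses import dataclass

import torch
from datasets import Audio, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-tiny", language="bengali", task="translate"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.generation_config.language = "bengali"
model.generation_config.task = "translate"  # mixed speech in, English text out

ds = load_dataset("pr0mila-gh0sh/MediBeng", split="train")  # ID assumed
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def preprocess(batch):
    # Log-mel features for the encoder, token IDs for the decoder.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["translation"]).input_ids
    return batch

ds = ds.map(preprocess, remove_columns=ds.column_names)

@dataclass
class PadCollator:
    processor: WhisperProcessor

    def __call__(self, features):
        # Pad audio features and label IDs separately; mask label padding
        # with -100 so the cross-entropy loss ignores it.
        batch = self.processor.feature_extractor.pad(
            [{"input_features": f["input_features"]} for f in features],
            return_tensors="pt",
        )
        labels = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        batch["labels"] = labels["input_ids"].masked_fill(
            labels["attention_mask"].ne(1), -100
        )
        return batch

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="medibeng-whisper-tiny",
        per_device_train_batch_size=16,
        learning_rate=1e-5,
        warmup_steps=500,
        max_steps=4000,
        fp16=torch.cuda.is_available(),
    ),
    train_dataset=ds,
    data_collator=PadCollator(processor),
)
trainer.train()
```

The key trick is `task="translate"`: instead of transcribing the Bengali as Bengali, Whisper's decoder is pushed to emit English, so the code-switched audio lands as one clean English sentence.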
✅ 3. Tested It on Real Conversations
After fine-tuning, I ran the model on real examples—and the improvement was obvious. It could now transcribe and translate code-switched speech much more accurately, giving output that was actually usable for patient records.
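Trying it yourself takes only a few lines. A quick inference sketch follows; the Hub model ID is my guess based on the model name linked below, and the audio path is a placeholder for your own 16 kHz recording.

```python
# Quick inference sketch; the Hub ID is assumed from the model's name, and
# "clinic_sample.wav" is a placeholder for your own recording.
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "pr0mila/MediBeng-Whisper-Tiny"  # assumed ID; see links below
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

speech, sr = librosa.load("clinic_sample.wav", sr=16_000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(inputs.input_features, task="translate")

print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```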
Why This Actually Matters
This isn’t just a fun project or an academic exercise; it solves a real problem. Healthcare professionals are already overwhelmed. Asking them to decode garbled transcripts of bilingual conversations just makes their job harder.
With MediBeng Whisper Tiny, we’re talking:
- Fewer transcription errors
- Better documentation
- Faster decision-making
- Happier doctors (and probably patients too)
The Big Picture
This project shows that even lightweight models like Whisper Tiny can be smart if you give them the right data. It also hints at the massive potential of domain-specific, culturally aware AI tools.
Imagine tailoring models like this for other language pairs: Hindi-English, Arabic-French, Spanish-English (a.k.a. Spanglish). You get the idea.
I started with Bengali-English medical speech because that’s where the pain was. But this blueprint? It’s open-source, replicable, and ready to be extended by anyone who wants to build smarter speech models for multilingual communities.
Check out the links below to dive into the code or replicate this yourself:
- MediBeng Whisper Tiny Model
- MediBeng Dataset
- GitHub Repository
TL;DR
- People code-switch. A lot. Especially in clinics.
- ASR models weren’t handling it well.
- So I trained Whisper Tiny on Bengali-English medical speech.
- It now translates mixed speech into clean English.
- It’s free and open for you to build on.
If you’re working in multilingual healthcare tech—or just love solving messy real-world AI problems—this one’s for you.
Let’s make speech models speak our language(s).