DevLog 20250706: Speech to Text Transcription using OpenAI Whisper

Originally published at dev.to

Overview

Mobile phones have had audio input for a long time, but none of the default options are particularly satisfactory. And despite the rise of capable online AI-based transcription services, there's still no easy tool for very simple scenarios like "turn this recording into some text."

In 2022, OpenAI released Whisper, a powerful model capable of transcribing many languages - but even now, there's no straightforward way to use it without invoking the API directly.

The API

Under the hood, Whisper is a deep neural network trained end-to-end to map raw audio to text. Conceptually, you:

  1. Provide an audio input - the model analyzes the waveform to extract linguistic and acoustic features.
  2. Leverage learned representations - its multi-layer architecture handles background noise, varied accents, and low-quality recordings.
  3. Produce a transcription - Whisper outputs a sequence of text that you can display, store, or post-process.

This high-level interaction keeps things simple: feed in speech, get back text - no need to manage model internals or low-level signal processing.
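The three-step interaction above boils down to a single HTTP call. Below is a minimal stdlib-only sketch, assuming the endpoint and form-field names from OpenAI's speech-to-text docs and an `OPENAI_API_KEY` environment variable; in practice you'd likely use the official `openai` SDK instead, whose equivalent call is `client.audio.transcriptions.create(model="whisper-1", file=f)`.

```python
# Minimal sketch of the Whisper transcription call using only the
# standard library. Endpoint and field names follow OpenAI's
# speech-to-text docs; OPENAI_API_KEY is assumed to be set.
import json
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_multipart(audio_bytes: bytes, filename: str, model: str = "whisper-1"):
    """Assemble a multipart/form-data body carrying the model name and file."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + f"\r\n--{boundary}--\r\n".encode()
    return boundary, body

def transcribe(path: str) -> str:
    """Feed in speech, get back text: POST the file, return the transcript."""
    with open(path, "rb") as f:
        boundary, body = build_multipart(f.read(), os.path.basename(path))
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

That's the whole surface area a tool like Transcriber needs to wrap: one file upload in, one JSON field (`text`) out.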

The Utility

Today I'm sharing our free Transcriber tool, which I've been using for almost half a year. It does a solid job at what it's meant to do:
https://methodox.itch.io/transcriber

We likely won't have time to develop it further, but sharing it online makes it more accessible for others looking for a similar solution.

Screenshot

Challenges

Currently, there's a limit on audio length due to OpenAI API restrictions. It would also be ideal to add real-time transcription - something like Google Voice IME.



Thanks for the clear overview and sharing the Transcriber tool! A bit more depth on how you handle API limits or plans for real-time transcription would be really helpful—any thoughts on that?

Hi Ben, thanks for the question! The Transcriber tool doesn't work around API limits natively - it's merely a shell around OpenAI's Whisper API.

As for plans, here are some ideas:

  1. Whisper limits file size to 25 MB, and it accepts mp3, mp4, mpeg, mpga, m4a, wav, and webm - instead of sending wav, one can send mp3, which compresses well. For human voices, the loss in audio quality won't matter.
  2. A naive way to implement real-time transcription would be to periodically ask OpenAI to transcribe the partial recording captured so far. That should give good context-awareness but is less efficient in API use. On the other hand, OpenAI does offer a streaming transcription API that should definitely be looked into: https://platform.openai.com/docs/guides/speech-to-text#streaming-the-transcription-of-an-ongoing-audio-recording Note that streaming requires WebSockets, though.
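Idea 1 can be sketched in a few lines. This assumes the `ffmpeg` CLI is available on PATH; the function names and the 64 kbit/s mono setting are illustrative choices, not part of Transcriber:

```python
# Sketch of idea 1: shrink a WAV recording to MP3 before upload so it
# fits under Whisper's 25 MB per-file limit. Assumes ffmpeg is installed.
import os
import subprocess

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # Whisper's per-file limit

def needs_compression(path: str) -> bool:
    """True when the file exceeds the Whisper upload limit."""
    return os.path.getsize(path) > MAX_UPLOAD_BYTES

def mp3_command(src: str, dst: str, bitrate: str = "64k") -> list:
    """Build the ffmpeg invocation; 64 kbit/s mono is ample for speech."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-b:a", bitrate, dst]

def compress_to_mp3(src: str, dst: str) -> str:
    """Transcode src to MP3 at dst, raising on ffmpeg failure."""
    subprocess.run(mp3_command(src, dst), check=True)
    return dst
```

A caller would check `needs_compression` first and only transcode (and upload the MP3) when the original WAV is too large.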

It turns out OpenAI has since updated their docs and now provides quite extensive guidance on handling longer inputs.
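For reference, the chunking approach from that guidance can be sketched with the stdlib `wave` module (OpenAI's own example uses PyDub); the chunk length and function name here are illustrative:

```python
# Sketch of handling longer inputs: split a WAV file into fixed-length
# chunks that each fit under the API limit, transcribe them one by one,
# and join the text. Each chunk is returned as complete WAV bytes.
import io
import wave

def split_wav(path: str, chunk_seconds: int = 600) -> list:
    """Split a WAV file into chunks of at most chunk_seconds each."""
    chunks = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # header is patched to the chunk's length on close
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

One caveat from the docs: naive splitting can cut a sentence mid-word, so chunk boundaries are best placed at silences when accuracy matters.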

Please let me know if you find any technique particularly helpful!
