Real-time Braille reader that uses the phone camera to transcribe embossed Braille into English text and speech.


DotVision addresses a real-world accessibility gap: physical Braille appears on paper, labels, classroom materials, product packaging, and embossed documents, but most sighted people, including caregivers, teachers, shopkeepers, and support staff, cannot read it. This causes everyday problems in education, healthcare, documentation, and public accessibility, because the Braille is present but the people who need to verify or interpret it cannot understand it on the spot. Our goal is to bridge that gap with a practical, camera-based Braille reading system that works in real time on a mobile device.
We built DotVision as a full end-to-end pipeline rather than a single model. The solution is a mobile-first web application with a live camera reader route: the user points the phone at physical embossed Braille and the app begins scanning immediately. For local processing, we use OpenCV.js in the browser to capture frames and refine the image by improving contrast, reducing noise, and making Braille dot patterns clearer. For on-device detection, we trained a local YOLO-based model to identify Braille regions and recognize Braille structure more reliably from real camera input. For more complex or uncertain cases, the pipeline can also call Gemini through OpenRouter on the server side to support transcription and improve robustness when the input is noisy or difficult to read.
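As a rough illustration of the image refinement step described above, the sketch below implements the three ideas (contrast improvement, noise reduction, dot separation) in plain TypeScript on a grayscale pixel array. It is a stand-in for the OpenCV.js operations, not the app's actual code: the `GrayFrame` type, the function names, and the choice of a 3x3 box blur and a fixed threshold are all assumptions made for this example.

```typescript
// Hypothetical frame shape: a flat, row-major array of 0-255 gray values.
type GrayFrame = { width: number; height: number; pixels: number[] };

// Contrast improvement: linearly stretch so the darkest pixel becomes 0
// and the brightest becomes 255.
function stretchContrast(frame: GrayFrame): GrayFrame {
  let min = 255;
  let max = 0;
  for (const p of frame.pixels) {
    if (p < min) min = p;
    if (p > max) max = p;
  }
  const range = max - min;
  const pixels = frame.pixels.map((p) =>
    range === 0 ? 0 : Math.round(((p - min) / range) * 255)
  );
  return { width: frame.width, height: frame.height, pixels };
}

// Noise reduction: replace each pixel with the mean of its 3x3 neighbourhood,
// using only the neighbours that fall inside the image.
function boxBlur3x3(frame: GrayFrame): GrayFrame {
  const { width, height, pixels } = frame;
  const out: number[] = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      let count = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const ny = y + dy;
          const nx = x + dx;
          if (ny >= 0 && ny < height && nx >= 0 && nx < width) {
            sum += pixels[ny * width + nx];
            count++;
          }
        }
      }
      out.push(Math.round(sum / count));
    }
  }
  return { width, height, pixels: out };
}

// Dot separation: map every pixel to pure black or white around a threshold.
function binarize(frame: GrayFrame, threshold: number): GrayFrame {
  const pixels = frame.pixels.map((p) => (p >= threshold ? 255 : 0));
  return { width: frame.width, height: frame.height, pixels };
}

// Full refinement pass in the order the write-up lists the steps.
function refineFrame(frame: GrayFrame, threshold = 128): GrayFrame {
  return binarize(boxBlur3x3(stretchContrast(frame)), threshold);
}
```

In a real browser pipeline the same steps would more likely be done with OpenCV.js on canvas image data, and an adaptive threshold would probably cope better with uneven lighting than the fixed value assumed here.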
The architecture is designed to balance speed, accuracy, and practicality. The app starts with the camera, processes the live Braille image locally for fast preprocessing, sends the cleaned result through the transcription pipeline, and returns readable English text to the user. The final output is shown instantly on screen and can also be read aloud using browser-based text-to-speech for accessibility. This gives users both visual and audio feedback, making the system useful for demonstrations, classroom use, and real-world assistive scenarios.
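The write-up does not say how the pipeline decides that a case is "complex or uncertain" enough to involve the server-side model. One plausible way to balance speed and accuracy is sketched below, assuming each local detection carries a confidence score. The `Detection` shape, the `chooseRoute` name, and the 0.6 default threshold are hypothetical and chosen only for illustration.

```typescript
// Hypothetical output of the local detector for one Braille cell or region.
interface Detection {
  label: string;
  confidence: number; // assumed to lie in the range 0..1
}

type Route = "local" | "server";

// Keep the result on-device when every detection is confident enough;
// otherwise hand the frame to the server-side transcription path.
function chooseRoute(detections: Detection[], minConfidence = 0.6): Route {
  // Nothing detected locally: treat the frame as a hard case.
  if (detections.length === 0) return "server";

  // The weakest detection decides, so a single doubtful cell triggers fallback.
  let weakest = 1;
  for (const d of detections) {
    if (d.confidence < weakest) weakest = d.confidence;
  }
  return weakest >= minConfidence ? "local" : "server";
}
```

Gating on the weakest detection is a conservative choice: it sends more frames to the server but avoids returning text with one silently misread character. Gating on the average confidence would trade some of that accuracy for fewer network round trips.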
In short, our solution turns physical Braille into usable English text and speech with a combination of browser-based image refinement, local machine learning for Braille detection, and cloud AI for difficult cases. It is intended to be fast, mobile-friendly, accessible, and practical enough to solve a real problem rather than just produce a demo.
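To make the "Braille to English text" step concrete, here is a minimal sketch of decoding six-dot cells with a lookup table. The write-up does not state that DotVision decodes this way (the model or the cloud step may produce text directly), so treat this purely as an illustration. It covers only the 26 uncontracted letters of standard six-dot Braille; numbers, punctuation, capital indicators, and contractions are left out. The function names and the dot-list input format are assumptions.

```typescript
// Standard six-dot Braille letters, keyed by raised dot numbers in ascending
// order (dots 1-3 run down the left column, 4-6 down the right).
const BRAILLE_LETTERS: Record<string, string> = {
  "1": "a", "12": "b", "14": "c", "145": "d", "15": "e",
  "124": "f", "1245": "g", "125": "h", "24": "i", "245": "j",
  "13": "k", "123": "l", "134": "m", "1345": "n", "135": "o",
  "1234": "p", "12345": "q", "1235": "r", "234": "s", "2345": "t",
  "136": "u", "1236": "v", "2456": "w", "1346": "x", "13456": "y",
  "1356": "z",
};

// Build a canonical key from a list of raised dots: sort, drop duplicates, join.
function cellKey(dots: number[]): string {
  return dots
    .slice()
    .sort((a, b) => a - b)
    .filter((d, i, arr) => i === 0 || d !== arr[i - 1])
    .join("");
}

// Decode a sequence of cells. An empty cell is a space; a pattern outside
// the letter table becomes "?" so the caller can flag it for review.
function decodeCells(cells: number[][]): string {
  return cells
    .map((dots) => {
      if (dots.length === 0) return " ";
      const letter = BRAILLE_LETTERS[cellKey(dots)];
      return letter !== undefined ? letter : "?";
    })
    .join("");
}
```

For example, the cells `[[1, 2, 5], [1, 5], [1, 2, 3], [1, 2, 3], [1, 3, 5]]` decode to `hello`. Unknown patterns are surfaced as `?` rather than dropped, which gives a natural hook for routing a difficult frame to a more capable transcription path.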
During the hackathon, we focused on building DotVision as a mobile-first live Braille reader. We set up the core Next.js application, defined the /reader scanning route, integrated the camera flow, and built the preprocessing and transcription pipeline step by step. We also added the landing page CTA, structured the API route for server-side Braille transcription, and designed the browser speech output flow. By the end, we had made progress toward a stable, demo-ready experience with auto-scan, a manual transcribe fallback, and an accessible UI for real-world use.