2024 ML / CV Python PyTorch CNN LSTM

Lip Reading CNN

A deep learning lip-reading system that analyses lip movement in video frames and infers spoken content from silent footage, with no audio required.

Full write-up coming soon. Detailed documentation for this project is currently being prepared.

Overview

Developed as part of my undergraduate research, this system detects and crops lip regions from video using face detection, then passes the sequence of frames through a CNN-LSTM architecture to classify the spoken words or characters. The goal was to explore whether reliable lip reading was achievable with a moderate-sized dataset and standard academic compute.
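Until the architecture diagrams are published, the general shape of a CNN-LSTM classifier can be illustrated with a minimal PyTorch sketch. This is not the thesis model: the class name `LipReadingCNNLSTM`, the layer sizes, the grayscale 48x96 lip crops, the 25-frame clips and the 10 output classes are all assumptions chosen for illustration, and face detection and cropping are assumed to have already happened. A small CNN encodes each cropped frame independently, an LSTM reads the resulting feature sequence, and the final hidden state is mapped to class scores.

```python
import torch
from torch import nn


class LipReadingCNNLSTM(nn.Module):
    """Per-frame CNN encoder followed by an LSTM over time (illustrative only)."""

    def __init__(self, num_classes: int = 10, feat_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        # Small 2D CNN applied independently to each grayscale lip crop.
        # All layer sizes here are assumptions, not the thesis configuration.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1) regardless of crop size
            nn.Flatten(),             # -> (N, 32)
            nn.Linear(32, feat_dim),
            nn.ReLU(),
        )
        # The LSTM consumes one feature vector per frame.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels=1, height, width)
        b, t, c, h, w = clips.shape
        # Fold time into the batch axis so the CNN sees individual frames.
        frames = clips.reshape(b * t, c, h, w)
        feats = self.cnn(frames).reshape(b, t, -1)  # (batch, time, feat_dim)
        # h_n holds the final hidden state of each LSTM layer.
        _, (h_n, _) = self.lstm(feats)
        return self.classifier(h_n[-1])  # (batch, num_classes)


# Two hypothetical clips of 25 cropped frames, each 48x96 pixels.
model = LipReadingCNNLSTM(num_classes=10)
dummy_clips = torch.randn(2, 25, 1, 48, 96)
logits = model(dummy_clips)
print(logits.shape)  # torch.Size([2, 10])
```

Classifying from the final hidden state suits word-level labels, where each clip maps to a single class. Character-level output would instead need a prediction per time step, which this sketch does not cover.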

Key Technologies

Python, PyTorch, CNN, LSTM

Status

Completed as undergraduate thesis research. Architecture diagrams, evaluation metrics and a link to the code repository will be added to this page shortly.