Abstract:
The world is currently in an era of AI and automation, and AI technologies are advancing rapidly. Automated navigation is one such rapidly developing area, and consequently many researchers focus on traffic sign detection and recognition (TSDR) systems. A key factor affecting the accuracy of a TSDR system is the clarity of traffic signs: haze, low light, and other atmospheric conditions can significantly degrade sign visibility. Traffic accidents often occur because drivers fail to recognize road signs while driving, so recognizing signs clearly and accurately is important for the safety of both drivers and pedestrians. Drivers can also lose focus while driving for various reasons; in such cases, a voice alert about an upcoming traffic sign can help bring their attention back to the road. However, existing methods fail to perform haze removal, TSDR, and voice warning simultaneously.
and voice warning simultaneously. Therefore, in this work, a TSDR system has been
developed with a deep learning-based HRU-Net algorithm with a voice assistant. According
to the proposed pipeline, the HRU-Net model takes haze images as input and produces a
dehazed image as output. The TSDR model then uses this haze-free image as input. After
detecting and classifying that image, the traffic sign is fed into a gTTS. It generates a concise,
real-time voice alert. It enables drivers to receive critical sign information without diverting
their attention from the road. The proposed system was evaluated using the CURE-TSD
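A minimal sketch of this three-stage pipeline is given below. The HRU-Net loader and the weight-file names are illustrative assumptions made for the sketch, while the YOLO and gTTS calls follow the public ultralytics and gtts package APIs; in the deployed system the generated alert would be played back immediately rather than only saved to a file.

```python
# Minimal sketch of the dehaze -> detect -> voice-alert pipeline described above.
# The HRU-Net loader and both weight-file names are illustrative assumptions;
# the YOLO and gTTS calls follow the public ultralytics and gtts package APIs.
import cv2
from ultralytics import YOLO
from gtts import gTTS

from hrunet import load_hrunet            # hypothetical helper returning the trained dehazing model

dehazer = load_hrunet("hrunet_dehaze.pt")        # assumed HRU-Net checkpoint
detector = YOLO("yolov8_cure_tsd.pt")            # assumed YOLOv8 weights fine-tuned on CURE-TSD


def alert_on_signs(frame):
    """Dehaze one frame, detect traffic signs, and speak the first detected sign."""
    clear = dehazer(frame)                       # stage 1: haze removal
    result = detector(clear)[0]                  # stage 2: detection and classification
    for box in result.boxes:
        label = result.names[int(box.cls)]       # class name, e.g. "no_overtaking"
        gTTS(text=f"Traffic sign ahead: {label}", lang="en").save("alert.mp3")
        break                                    # stage 3: one concise alert per frame
    return clear, result


frame = cv2.imread("hazy_road_scene.png")        # example hazy input frame
alert_on_signs(frame)
```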
The proposed system was evaluated on the CURE-TSD dataset, which contains roughly 45,000 traffic sign instances across 43 categories, captured under a wide range of environmental conditions. In the dehazing stage, the model achieved a Mean Absolute Error (MAE) of 0.0526, a Structural Similarity Index Measure (SSIM) of 0.8442, and a Peak Signal-to-Noise Ratio (PSNR) of 20.55 dB after around 50 training epochs.
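For reference, these image-quality metrics follow their standard definitions, computed between a dehazed output $\hat{I}$ and its haze-free ground truth $I$ (with $N$ pixels, maximum pixel value $\mathrm{MAX}_I$, local means $\mu$, variances $\sigma^2$, covariance $\sigma_{\hat{I}I}$, and stabilizing constants $C_1$, $C_2$):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|\hat{I}_i - I_i\bigr|, \qquad \mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}(\hat{I},I)}, \qquad \mathrm{SSIM}(\hat{I},I) = \frac{(2\mu_{\hat{I}}\mu_{I}+C_1)(2\sigma_{\hat{I}I}+C_2)}{(\mu_{\hat{I}}^{2}+\mu_{I}^{2}+C_1)(\sigma_{\hat{I}}^{2}+\sigma_{I}^{2}+C_2)}$$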
In the YOLOv8 detection and classification stage, the enhanced images produced by the dehazing step were used for training. At this stage, the model reached a precision of 99.07%, a recall of 99.13%, an mAP@0.5 of 99.38%, and an mAP@0.5:0.95 of 85.69% at the optimal 40th training epoch. The voice alert module achieved an average latency of roughly 230 ms between detection and audio playback while providing clear, concise feedback.
Compared to existing methods, the proposed system provides superior accuracy and responsiveness, offering a robust and practical solution for advanced driver-assistance systems under adverse visual conditions.