At the end of last year, I defended my master’s dissertation: A Performance Increment Strategy for Semantic Segmentation of Low-Resolution Images from Damaged Roads. This year, I was fortunate to win first place in two contests: Workshop de Visão Computacional (WVC 2023) and Workshop-Escola de Informática Teórica (WEIT 2023).
My dissertation can be found here.
Abstract: Semantic segmentation is vital for understanding a road scene and, consequently, achieving autonomous driving. However, new challenges arise when attempting these tasks in emerging countries, given the lack of high-quality infrastructure or limited computational resources. Recently, the Brazilian National Transport Confederation (CNT) reported that 85% of the Brazilian roads present some damage like cracks, holes, and patches; these damages are usually not regarded by the state-of-the-art deep learning models of road semantic segmentation, which are trained to meet the developed countries infrastructure in high-resolution datasets like Cityscapes (2048x1024) and CamVid (920x720). In 2019, the Road Transverse Knowledge (RTK) was specially designed to meet the emerging country reality; it consists of 701 fine-annotated images of low-resolution (352x288) and 12 classes with different road surfaces and damages like potholes, water puddles, and cracks. Based on the RTK dataset, this work points out the main challenges for emerging country roads: 1) small objects given low-resolution images, 2) multiscale objects given irregular-shaped objects, and 3) highly imbalanced classes given road-damages small size. Finally, this work proposes the performance increment strategy to enhance results in emerging country datasets; the strategy consists of a series of 15 experiments to choose the best option for each training setup like data augmentation, loss function, and optimizer. Furthermore, the strategy suggests architecture modifications such as the max-pooling layer removal from ResNet and hybrid and digressive dilation rates. In the end, the strategy raised the RTK benchmark from 0.547 to 0.798 mIoU on the validation set; and reached 0.688 mIoU in the TAS500 test set, the best results published so far.
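For readers unfamiliar with the metric, here is a minimal sketch of mean intersection over union (mIoU) on flat label lists. The function name `mean_iou` and the toy labels are illustrative assumptions; how the dissertation aggregates IoU (per image or over the whole dataset) is not stated here, so treat this only as the general idea.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that appear in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    (Hypothetical helper for illustration, not the dissertation's code.)
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes and 6 "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

Because every class contributes equally to the average, a rare class such as a small pothole weighs as much as the road surface, which is why highly imbalanced classes make the metric hard to raise.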
One of the dissertation’s key takeaways is that reducing the output stride (OS) or removing ResNet’s max-pooling layer in DeepLabV3+ architectures avoids false-positive islands scattered across the background.
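To make the output stride concrete, here is a small arithmetic sketch, assuming the standard ResNet layout (a stride-2 stem convolution, a stride-2 max-pooling layer, and stride-2 downsampling in the last three stages). The helper names `output_stride` and `feature_map_size` are hypothetical, exact feature-map sizes depend on padding, and the configurations shown are not necessarily the ones evaluated in the dissertation.

```python
def output_stride(dilated_stages=0, keep_maxpool=True):
    """Total downsampling factor of a standard ResNet backbone.

    dilated_stages: how many of the last stages (layer4 first, then layer3)
    replace their stride-2 downsampling with dilation, as DeepLab-style
    backbones do to reach OS 16 or OS 8.
    keep_maxpool: set to False to model removing the stem max-pooling layer.
    """
    strides = {
        "conv1": 2,
        "maxpool": 2 if keep_maxpool else 1,
        "layer1": 1,
        "layer2": 2,
        "layer3": 2,
        "layer4": 2,
    }
    for name in ["layer4", "layer3"][:dilated_stages]:
        strides[name] = 1  # stride swapped for dilation, no downsampling
    total = 1
    for s in strides.values():
        total *= s
    return total


def feature_map_size(width, height, os_factor):
    """Approximate encoder output size (ceiling division, padding ignored)."""
    return (-(-width // os_factor), -(-height // os_factor))


# RTK images are 352x288.
for label, os_factor in [
    ("OS 16 (layer4 dilated)", output_stride(dilated_stages=1)),
    ("OS 8 (layer3 and layer4 dilated)", output_stride(dilated_stages=2)),
    ("layer4 dilated, no max-pooling", output_stride(1, keep_maxpool=False)),
]:
    print(label, os_factor, feature_map_size(352, 288, os_factor))
```

On a 352x288 RTK image, going from OS 16 to OS 8 grows the encoder feature map from 22x18 to 44x36 cells, and dropping the max-pooling layer has the same effect on resolution, so small road damages cover more cells before the decoder upsamples the prediction.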