By Dr Stuart Golodetz, Postdoctoral Research Associate at the Nuffield Department of Clinical Neurosciences, University of Oxford, and head of object detection and tracking for the Smart Glasses Project.
This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.
People who are visually-impaired face numerous daily challenges, from how to find where to go and the best way to avoid obstacles on the way, to how best to locate, recognise and interact with other people and objects. This can have a significant impact on their independence, confidence and overall quality of life. However, although visual impairments can prevent people from making use of visual signals from the world around them, only a small percentage of visually-impaired people are completely blind in the sense that they receive no useful visual inputs at all.
It is far more common instead for people to retain some level of residual vision, whether that amounts to small regions of their visual field in which they can see, or the more limited ability to distinguish between light and dark. In some cases, the real issue is one of visual signals being drowned out by ‘noise’, and by boosting the signal-to-noise ratio in those regions it is sometimes possible to provide people with at least some ability to perceive what they are looking at.
The goal of our research at the Oxford SmartSpecs Project is to develop smart glasses that can enhance whatever sight visually-impaired people may have left, and provide them with additional information about their surroundings. Our glasses capture images of the world around the wearer with a depth camera, and use them to generate high-contrast images in real time. These images are then presented to the wearer on transparent displays, so as to augment rather than replace what remains of their vision. Transparent displays are also of benefit for social and aesthetic reasons, since they allow other people to see the wearer's eyes.
Initial tests have shown great promise, as the glasses allow even legally blind people to perceive objects in higher detail than previously possible. We believe they will ultimately be of huge benefit to the visually-impaired, with the potential to help millions of people find greater independence and quality of life.
The software that controls the glasses is written in C++ using Qt, and designed with a pipeline architecture. Images are captured from one or more cameras and then used as the initial inputs to a sequence of parameterised transformation steps. The individual steps can implement arbitrary data transformations, have any number of inputs and outputs, and be reordered freely, subject to compatibility constraints. The parameters allow the operation of the glasses to be customised for individual wearers based on their visual abilities. Suitable pipelines ultimately produce high-contrast output images to be rendered on the transparent displays.
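To make the pipeline idea concrete, here is a minimal sketch in C++. It is not the project's actual code: the names Image, Step, Pipeline and makeContrastStep are hypothetical, and for brevity each step takes one greyscale image and returns one, whereas the real steps can have any number of inputs and outputs. The gain parameter stands in for the kind of per-wearer setting described above.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical image type for illustration: one greyscale byte per pixel.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
};

// A step is a named transformation from one image to another.
// (Simplification: the real steps may have several inputs and outputs.)
struct Step {
    std::string name;
    std::function<Image(const Image&)> apply;
};

class Pipeline {
public:
    // Steps run in the order in which they are added.
    void addStep(Step step) { steps_.push_back(std::move(step)); }

    // Feed a captured image through every step in turn.
    Image run(Image image) const {
        for (const Step& step : steps_) {
            image = step.apply(image);
        }
        return image;
    }

private:
    std::vector<Step> steps_;
};

// A parameterised step: stretch contrast about mid-grey by 'gain',
// clamping the result to the valid 0..255 range.
Step makeContrastStep(double gain) {
    return Step{"contrast", [gain](const Image& in) {
        Image out = in;
        for (std::uint8_t& p : out.pixels) {
            double v = (static_cast<double>(p) - 128.0) * gain + 128.0;
            if (v < 0.0) v = 0.0;
            if (v > 255.0) v = 255.0;
            p = static_cast<std::uint8_t>(v);
        }
        return out;
    }};
}
```

With this shape, a wearer-specific configuration amounts to choosing which steps to add and which parameter values to pass, for example pipeline.addStep(makeContrastStep(2.0)) followed by pipeline.run(frame).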
In the current incarnation of the glasses, we use a pipeline that implements three separate image enhancement techniques. First, a depth camera is used to determine the position of objects within a range of about three metres. The edges of any surfaces within this distance are traced with a strong, bright outline to help separate them from other objects. Second, surfaces are shown in greyscale, and become brighter the closer they are to the user. This helps to indicate the presence of objects, in particular hazards such as walls and obstacles that might pose a collision risk. Third, the RGB image from the camera undergoes an edge enhancement treatment to bring out specific details. The presentation of these edges is controlled by the depth data, so details are only shown on nearby objects. The combination of these three processes allows the wearer to concentrate their attention on nearby objects, such as people and obstacles, while avoiding visual clutter in the background.
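The second and third techniques can be sketched in a few lines of C++. This is an illustration under stated assumptions rather than our implementation: the function names are hypothetical, the linear brightness ramp is an assumption (the text above says only that closer surfaces are brighter), and a depth of zero or less is assumed to mean a missing reading. The three-metre cut-off comes from the range quoted above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cut-off distance, taken from the "about three metres" figure in the text.
constexpr float kMaxRangeMetres = 3.0f;

// True if a depth reading is valid and within the display range.
// Assumption: a depth of zero or less marks a missing reading.
bool isNearby(float depthMetres, float maxRange = kMaxRangeMetres) {
    return depthMetres > 0.0f && depthMetres <= maxRange;
}

// Map depth to greyscale brightness: the closer the surface, the brighter.
// Anything out of range is drawn black. The linear ramp is an assumption.
std::uint8_t depthToBrightness(float depthMetres,
                               float maxRange = kMaxRangeMetres) {
    if (!isNearby(depthMetres, maxRange)) return 0;
    float closeness = 1.0f - depthMetres / maxRange;  // 0 at the cut-off
    return static_cast<std::uint8_t>(std::lround(closeness * 255.0f));
}

// Keep edge strengths only where the depth data says the pixel is nearby,
// so that background detail is suppressed.
std::vector<std::uint8_t> gateEdgesByDepth(
        const std::vector<std::uint8_t>& edges,
        const std::vector<float>& depth,
        float maxRange = kMaxRangeMetres) {
    std::vector<std::uint8_t> out(edges.size(), 0);
    for (std::size_t i = 0; i < edges.size() && i < depth.size(); ++i) {
        if (isNearby(depth[i], maxRange)) out[i] = edges[i];
    }
    return out;
}
```

Here the edges vector is assumed to hold the output of some edge detector run on the RGB image, one value per pixel, aligned with the depth map. Raising or lowering maxRange changes how far away an object can be and still be shown.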
Screenshot of the live video fed to the glasses. This shows object segmentation using depth and edge enhancement which is able to bring out certain features of a person's face. Image by the Oxford SmartSpecs Project.
Initial tests, most notably in Oxford's covered market, have indicated that our glasses excel at the recognition and enhancement of facial expressions and bodily movements and gestures, which is of particular importance to many blind individuals. The three-dimensionality of structures is also well preserved, which improves hand-eye coordination and allows a person to interact relatively naturally with objects that they can see. The distance within which objects are shown can be extended up to the range of the depth camera, and parameters such as edge density, contrast, field of view and the inclusion of colour can all be adjusted to suit the user's own visual abilities.
Our work now focuses on three key challenges. The first is how best to turn the glasses into an affordable commercial product for the visually-impaired. Secondly, we want to extend the software with new image enhancement methods. We have already made progress in this area with robust techniques for highlighting floor-level obstacles and trip hazards. Finally, we want to extend the software with computer vision techniques that can extract semantic information from images, such as through the detection of objects, signs or people's faces. This phase of the project is in collaboration with the Torr Vision Group in Oxford's Department of Engineering Science.