Image quality analysis with machine learning

Skill sets used

Image and video processing, Algorithms and mathematics, User interfaces

Image classification - assigning an image to a category based on its visual content - is a general problem with many applications. Examples include the fields of object recognition, remote sensing, content-based indexing and quality control. Argon Design has been involved in the development of a system designed to enable users to create realistic 3D virtual avatars of themselves based on video from their mobile phone camera.

The Problem

To provide a truly personal experience, the process involves creating an accurate 3D model of the user’s face. To create the face model, users have to take a short video of their head turning from one side to the other, typically using the front facing camera of a mobile device. This video is then input to various computer vision algorithms (Structure from Motion, Texture Extraction), which extract a 3D mesh model of the face and a corresponding texture map.

The accuracy of the extracted 3D model depends on the quality of the input video, because the vision algorithms are sensitive to the conditions under which it was captured. The system must ensure that the end result is lifelike and pleasing, otherwise there is a risk of the user abandoning the service. It is also important that the user can achieve the desired results without a large amount of trial and error, which again risks frustration and ultimately a bad user experience.


We created a real-time feedback mechanism to assist the user in the acquisition of good quality video. Prior to starting the video capture, the user is presented with a live view of the camera feed. The feedback mechanism analyses the live view and, if necessary, provides the user with recommendations on how to improve the conditions in order to achieve a high-quality end result.

Input quality feedback


After analysing the sensitivities of the model extraction algorithms, we identified two broad groups of input quality issues: illumination-related problems and contextual problems.

Incorrect illumination can cause problems for both structure and texture extraction. For example, the location of strong highlights on the face depends on the direction of the incident light, so the highlights tend not to move together with the rest of the facial landmarks as the user rotates their head in the video. This is problematic for structure extraction, as the highlights can be misinterpreted as static landmarks or can obscure real facial features. Similarly, strong directional lighting, or strongly coloured light sources, can result in uneven and unnatural skin tones after texture extraction.
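As a rough illustration of the kind of illumination heuristic involved, the sketch below scores the risk of strong specular highlights by measuring the fraction of near-saturated pixels in a grayscale face crop. The function name and threshold values are illustrative, not the ones used in the product.

```python
import numpy as np

def highlight_score(gray_face, sat_thresh=240, max_fraction=0.02):
    """Score the risk of strong specular highlights in a face crop.

    gray_face: 2D uint8 array (grayscale face region).
    Returns a score in [0, 1]; 0 means no near-saturated pixels,
    1 means the saturated fraction is at or above max_fraction.
    The thresholds here are illustrative placeholders.
    """
    frac = np.mean(gray_face >= sat_thresh)
    return float(min(1.0, frac / max_fraction))

# A flat, well-exposed patch scores 0; a patch with a blown-out
# highlight covering a few percent of its pixels scores near 1.
flat = np.full((64, 64), 128, dtype=np.uint8)
blown = flat.copy()
blown[:8, :8] = 255  # ~1.6% saturated pixels
```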

Contextual problems cause difficulty mostly during structure extraction and arise due to the assumptions and limitations of the algorithms involved. For example, if the user’s fringe is covering a portion of their forehead, or if the user is wearing glasses, these structures will be incorporated into the extracted 3D mesh, which as a result shows little resemblance to the shape of a human face.

(Example input classifications: wearing glasses, mouth open, excellent)

Quality analysis of the input image is therefore an image classification problem: we must decide whether any of the problematic conditions is present in the input image. Given enough reference data, we can use machine learning techniques to train classifiers that identify these quality issues. The availability of reference images already classified through other means (typically by humans) is a key requirement when applying machine learning. While it is trivial for a human to judge whether someone is wearing glasses, it is more difficult to objectively assess illumination problems when manually classifying reference input. In addition, the quality analysis must run fast enough on a mobile device to provide real-time feedback during the live video preview, a performance requirement that rules out some computationally expensive machine learning techniques.

The trade-off we have made is to use machine learning techniques to identify contextual issues, and heuristics for illumination-related problems. Contextual issues tend to vary relatively slowly (it is unlikely, for example, that the user will keep taking their glasses on and off at high frequency), so separate issues can be analysed on alternate frames. Observed illumination can change faster, for example as the user moves through a room, or as the camera's automatic exposure and white balance control adapts to the lighting conditions; this necessitates analysing illumination at a higher rate to keep the system responsive. Because we know that the input image contains a frontal face, we can also exploit a lot of prior information: average facial proportions and shape, the average chromaticity of skin colour, and typical skin texture in specific areas.
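The skin-chromaticity prior mentioned above might be used along the following lines to flag a colour cast. This is a minimal sketch: the reference chromaticity, tolerance and function name are illustrative stand-ins for empirically tuned values, not the production heuristic.

```python
import numpy as np

# Reference skin chromaticity (r, g) = (R, G) / (R + G + B); the value
# and tolerance below are illustrative, not empirically tuned.
SKIN_RG = np.array([0.46, 0.31])
TOLERANCE = 0.06

def colour_cast_score(cheek_rgb):
    """Score deviation of a cheek patch's mean chromaticity from skin.

    cheek_rgb: H x W x 3 float array of linear RGB values.
    Returns a score in [0, 1]; larger means a stronger suspected cast.
    """
    mean_rgb = cheek_rgb.reshape(-1, 3).mean(axis=0)
    chroma = mean_rgb[:2] / mean_rgb.sum()
    dist = np.linalg.norm(chroma - SKIN_RG)
    return float(min(1.0, dist / TOLERANCE))
```

Using chromaticity rather than raw RGB makes the check largely insensitive to overall brightness, which varies with exposure.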

Putting it all together, we have implemented the quality analysis as a multistage algorithm, which analyses the live video preview one frame at a time but exploits inter-frame correlation for efficiency. The output is a set of scores in a predefined range, each indicative of the presence of a quality issue. The steps the algorithm follows are roughly:

  1. Use Viola-Jones face detection to find the coarse location of the face in the frame, restricting the search to a region around the face's location in the previous frame.
  2. Use a fast, machine-learning-based algorithm to accurately locate a few important facial landmarks; using known typical spatial relations amongst these landmarks, compute a more precise estimate of the location and orientation of the face within the frame.
  3. Extract normalized sub-regions from the frame, based on the facial landmarks and the precise position estimate.
  4. Score the normalized sub-regions for illumination problems, using heuristics designed for how the specific conditions manifest in a normalized image.
  5. Score individual contextual problems on alternate frames using convolutional neural networks.
  6. Present a set of standardized quality scores to the recommendation logic.
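The alternate-frame scheduling of contextual checks can be sketched as follows. The analyser functions are hypothetical stubs standing in for the real heuristics and CNNs; only the scheduling pattern reflects the approach described above, where illumination is scored every frame and the slowly varying contextual checks take turns.

```python
from dataclasses import dataclass, field

def score_illumination(frame):
    return {"illumination": 0.0}  # placeholder for the heuristic scores

def score_glasses(frame):
    return {"glasses": 0.0}       # placeholder for a CNN classifier

def score_hair_occlusion(frame):
    return {"hair": 0.0}          # placeholder for a CNN classifier

CONTEXTUAL_CHECKS = [score_glasses, score_hair_occlusion]

@dataclass
class QualityAnalyser:
    frame_index: int = 0
    scores: dict = field(default_factory=dict)

    def analyse(self, frame):
        # Illumination can change quickly, so it is scored every frame.
        self.scores.update(score_illumination(frame))
        # Contextual issues vary slowly, so one check runs per frame,
        # round-robin; stale scores are kept for the skipped checks.
        check = CONTEXTUAL_CHECKS[self.frame_index % len(CONTEXTUAL_CHECKS)]
        self.scores.update(check(frame))
        self.frame_index += 1
        return dict(self.scores)
```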

Computing an improvement recommendation can itself be treated as a classification problem. The inputs in this case are the scores from the image quality analysis algorithm, and the possible output classes represent the most pressing problem with the image. This final classification can then be used to give the user feedback on how to alter the input to achieve good results.
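A minimal sketch of this recommendation step, assuming hypothetical score names, a single threshold and illustrative messages (the real logic classifies over the full score vector):

```python
# Map quality scores (0 = fine, 1 = severe) to one user-facing hint.
# Score names, threshold and messages are illustrative placeholders.
RECOMMENDATIONS = {
    "illumination": "Move to a spot with softer, more even lighting",
    "glasses": "Please remove your glasses",
    "hair": "Please move your hair away from your forehead",
}

def recommend(scores, threshold=0.5):
    """Pick the most pressing issue, or report that capture can proceed."""
    worst = max(scores, key=scores.get)
    if scores[worst] < threshold:
        return "Looking good - hold still and start recording"
    return RECOMMENDATIONS[worst]
```

Reporting only the single most pressing problem keeps the on-screen guidance simple for the user, rather than overwhelming them with several simultaneous instructions.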

The outcome

(Demo image: portrait of Vincent van Gogh, brighter towards top)

We have implemented our quality analysis algorithm as a native library, which developers can integrate into any application that would benefit from its functionality. We also developed a demo application to test the responsiveness of the algorithm and its integration with a live video preview. We make use of the BSD-licensed OpenCV library, which lets us take advantage of optimised implementations of image processing and computer vision algorithms developed by the community. While not a mandatory dependency, our own algorithms can also use a high-performance BLAS library to increase performance. As a result, our library is capable of analysing frames at typical video frame rates on a modern mobile platform.

Broader applications

The use of machine learning for image classification is a well-recognised technique, with applications in many fields including security, machine vision, automotive and healthcare.

As demonstrated in this case study, Argon Design has the skills to identify appropriate algorithms and to develop implementations that operate within real-world platform constraints.

Contact us

Do you have a project that you would like to discuss with us, or a general enquiry? Please feel free to contact us.