Solving a Mystery with OpenCV


Using OpenCv and my webcam track and draw on my screen.

Published on July 21, 2021 by Claudia Chajon

Python computer vision webcam machine learning

3 min READ


Using Machine Learning for My Own Purposes


See the work

Age and gender classification has been around for a few years now. While it might seem like a generally simple task for a computer, even humans get it wrong. Entire 90’s era talk shows have dedicated full episodes on the topic of gender alone.

As far as age prediction goes, so much depends on whether or not the person has ever used moisturizer and sunscreen. “Guess How Old I Am” is a common game played by humans.

I decided to use and analyze a well-known model trained by Gil Levi and Tal Hassner to predict both age and gender on several pictures. With the end goal being a successful prediction on a picture of a stranger. More on that mystery here.

About the model:

The structure of the Convolutional Neural Network used by Levi and Hassner is made up of:

  • Conv1 : The first convolutional layer has 96 filters/nodes of 7x7 pixels.
  • Conv2 : The second convolutional layer has 256 filters/nodes with 5x5 pixels.
  • Conv3 : The third convolutional layer has 384 filters/nodes with of 3x3 pixels.
  • Two fully connected layers containing 512 nodes each.

The prediction is based on the class with max probability for the image.

Step 1: Face Detection

First things first, where is your face?

def get_face(net, frame, conf_threshold=0.7):
    frameOpencvDnn = frame.copy()
    frameHeight = frameOpencvDnn.shape[0]
    frameWidth = frameOpencvDnn.shape[1]
    blob = cv.dnn.blobFromImage(frameOpencvDnn, 1, (300, 300), [104, 117, 123], True, False)

    net.setInput(blob)
    detections = net.forward()
    bboxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            bboxes.append([x1, y1, x2, y2])
            cv.rectangle(frameOpencvDnn, (x1, y1), (x2, y2), (255, 0, 255), int(round(frameHeight/150)), 2)
    return frameOpencvDnn, bboxes

Using OpenCV’s built-in face detector DNN, function get_face detects and places a frame around each detected face.

Step 2: Predict Gender

Everything is either a 0 or a 1 to a computer. Any more than that and their fans really start blowing. For now gender predictions will have to stay binary.

genders = ['Male', 'Female']

After faces are detected they are sent through the gender network to be classified.

Step 3: Predict Age

ages = ['(0 - 2)', '(4 - 6)', '(8 - 12)', '(15 - 20)', 
	'(25 - 32)', '(38 - 43)', '(48 - 53)', '(60 - 100)']

Age network is similar to gender network, therefore take the max out of all the outputs to get the predicted age group.

Output

result

result

Results

The gender detection results are pretty good! Age predictions have been inconsistent and kinda fun to look at. The Levi and Hassner paper provided a confusion matrix for the age prediction model.

matrix

  • age groups 0-2, 4-6, 8-13, and 25-32 have high accuracy scores, in contrast to age groups 15-20, 38-43, 48-53 and 60+

In conclusion, the CNN was pretty OK at predicting both gender and age. And really, how good are any of us at doing that?

Using a much larger dataset is one good experiment to try to boost the model. However, the idea of putting people into little boxes doesn’t really sit right with me. For academic purposes, its fine.