When Architech developer Jin Sung Kang first conceived of The Face of Toronto, the idea was to morph together the photos of a thousand Nuit Blanche participants in a real-time, big-screen exhibition that would reveal the ultimate blended face at the end of the night.
We loved the idea.
Jin went into extreme lab mode to build out the program that would capture all the images taken during the night and move them through the morphing process. At the time, he had to manually label each facial keypoint for the system to recognize what went where. While there were standard image processing methods we could use, there was no way we could trust these systems to work in the wild. On a night like Nuit Blanche, with too many uncontrolled conditions, there was simply no room for error.
That was January. In February, Architech Labs welcomed machine-learning whiz Yanshuai Cao to the family. In addition to being brilliant, Yanshuai brought expertise in deep learning, a subfield of machine learning that teaches computers to learn through artificial neural networks – models loosely inspired by the way neurons are connected in the human brain.
If we applied a deep learning model to our facial keypoint detection process, the computer would be able to recognize the keypoints automatically, quickly, and even under the kind of uncontrolled conditions we expected on the night. Incredible!
This technology already existed, so Yanshuai expanded on a small model he had access to and trained the network on a pool of 5,000 labeled images to recognize the various keypoints of the human face – everything from the jawline to the nose and eyes.
Since deep neural nets need a great volume of data to learn properly, the Labs team augmented the original 5,000 images by randomly zooming in and out on each face and adding mirrored versions of the same image. This way, we managed to train the network on 30,000 examples derived from the original 5,000.
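For the technically curious, here is a minimal sketch of what that kind of augmentation looks like. This is an illustration, not our production code: the function names are hypothetical, the "zoom" is a simple symmetric crop followed by a nearest-neighbour resize, and the keypoint coordinates are adjusted to stay in sync with each transformed image.

```python
import numpy as np

def mirror(image, keypoints):
    """Flip the image horizontally and mirror the keypoint x-coordinates."""
    h, w = image.shape[:2]
    kp = keypoints.copy()
    kp[:, 0] = (w - 1) - kp[:, 0]          # x flips around the image centre
    return image[:, ::-1], kp

def random_zoom(image, keypoints, rng, max_margin=0.1):
    """Crop a random symmetric margin, then resize back to the original
    size (nearest-neighbour), rescaling the keypoints to match."""
    h, w = image.shape[:2]
    my = int(rng.uniform(0, max_margin) * h)
    mx = int(rng.uniform(0, max_margin) * w)
    crop = image[my:h - my, mx:w - mx]
    ch, cw = crop.shape[:2]
    rows = (np.arange(h) * ch / h).astype(int)   # nearest-neighbour resize
    cols = (np.arange(w) * cw / w).astype(int)
    zoomed = crop[rows][:, cols]
    kp = keypoints.astype(float).copy()
    kp[:, 0] = (kp[:, 0] - mx) * w / cw          # keypoints follow the zoom
    kp[:, 1] = (kp[:, 1] - my) * h / ch
    return zoomed, kp

def augment(image, keypoints, n_copies=5, seed=0):
    """Turn one labeled face into (1 + n_copies) training examples."""
    rng = np.random.default_rng(seed)
    out = [(image, keypoints)]
    for _ in range(n_copies):
        img, kp = random_zoom(image, keypoints, rng)
        if rng.random() < 0.5:                   # mirror half the copies
            img, kp = mirror(img, kp)
        out.append((img, kp))
    return out
```

Six examples per labeled face is exactly the ratio above: 5,000 originals become 30,000 training examples.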
The neural net we used is called a cascaded convolutional neural network. It’s called “cascaded” because the networks successively pass on information to the subsequent networks, much like a waterfall. In our model, two networks begin by trying to roughly identify all the major keypoints in the face at the same time: One is trained on the outer keypoints (boundaries) of the face (chin, ears) while the other gets to work on identifying the inner keypoints of the face (corners of the mouth, eyes).
In the network’s second stage, we trained the model to estimate any small corrections that may be needed after the first-stage predictions to reduce the margin of error. With keypoints detected, the new face is ready to be morphed onto all the previous faces.
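The data flow of that two-stage cascade can be sketched in a few lines. The three "networks" below are stand-in stubs returning fixed values – in the real system each is a trained convolutional net – but the structure is the one described above: two first-stage networks locate outer and inner keypoints in parallel, and a second stage predicts small corrections that are added to the coarse estimates.

```python
import numpy as np

def outer_net(image):
    # Stage 1a (stub): coarse estimates of boundary keypoints (chin, ears).
    return np.array([[10.0, 80.0], [90.0, 80.0], [50.0, 98.0]])

def inner_net(image):
    # Stage 1b (stub): coarse estimates of inner keypoints (eyes, mouth corners).
    return np.array([[35.0, 40.0], [65.0, 40.0], [40.0, 70.0], [60.0, 70.0]])

def refine_net(image, coarse):
    # Stage 2 (stub): a small correction for every coarse keypoint.
    return np.full_like(coarse, 0.5)

def detect_keypoints(image):
    """Cascade: stage-1 coarse detection, then stage-2 refinement."""
    coarse = np.vstack([outer_net(image), inner_net(image)])   # stage 1
    corrections = refine_net(image, coarse)                    # stage 2
    return coarse + corrections

keypoints = detect_keypoints(np.zeros((100, 100)))
```

The cascade's appeal is that the refinement stage only has to learn small local adjustments, which is an easier problem than locating keypoints from scratch.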
What will the final “Face of Toronto” look like? We’re just as curious as you to find out!
Come find us on October 3 at the corner of Adelaide and York and add your face to the mix. Then be sure to document and share how your face contributed to the ultimate Face of Toronto.