PART1: THE AI PART
So this comedy theatre group wanted to do a show on AI and asked us for help: can you build something smart and funny?? After many crazy, hilarious, underwhelming, outrageous and impossible ideas, we converged to the idea of an app that shows how you look like some ugly celebrity via some steps:
- Your face looks similar to person A
- Person A looks like person B
- Person K looks like some ugly celebrity
- Ergo: you are ugly
- Hilarity ensues…
So it’s basically Google’s X degrees of separation except with faces instead of artworks.
So the first thing we need is images:
- A starting image of you (or a set to choose from)
- A set of images of ugly celebs
- A large set of images of people/faces for the intermediate steps. You could use the Labeled Faces in the Wild dataset for this, although larger images work better. We’ll get back to this topic later…
Then, what we need to build is:
- face detection: to find the location of a face in an image
- a similarity function do determine how similar faces are to each other
- a path-finding algorithm to get from the starting image to the final one
- an app to display the path of images
Now about those first two: the Python package face_recognition (which wraps around models from the dlib package) does an excellent job for this. You can detect the face area and get an encoding vector of the face simply like this (although you should add some exception handling):
This encoding, a vector of size 128, comes from the final layer of a CNN model trained to recognize faces.
If we have encodings for two faces, we can easily compare them with standard distance metrics like cosine distance or euclidean distance.
N.B. Computing an n * n distance matrix (and keeping it in memory) is peanuts for any n < 100K, so don’t worry about that.
Finding a path
How to get a path from a starting image to some ending image, where at each step the two images are highly similar?
Using the distances matrix, which gives the distance between any two images in our set, we can find the most similar image by selecting the row for that image and select the column index with the lowest value
Cool, we’ve found a similar image (if you have a good enough data set that is). But this doesn’t yet bring us any closer towards the ending image…
Q: Then what about path-finding algorithms like Dijkstra?
A: This doesn’t work because that looks for the shortest total path length. We’re looking to minimize the individual step length along the path.
Ergo: time to design an algorithm for ourselves! Here’s some pseudo-code:
This algorithm is very basic but works decently. We experimented with some alternatives
How to evaluate a path?
In order to compare algorithms and their resulting paths, we need some evaluation metric. Obviously we want all steps (i.e. the distance between any two successive images) to be as small as possible. So we could use the mean distance. If we adhere to the idea that the chain is as strong as the weakest link though, we should look at the maximum distance, because that identifies the weakest link. Let’s use a weighted average of those two:
A good data set
Now we have all the ingredients for a working solution! So, can we enjoy the fun and bask in glory? No: we notice that we get shitty matches because we have a shitty data set…
Some problems with online available data sets:
- Small image size leads to bad face encodings
- Bad lighting / pictures from an angle lead to bad face encodings
The result is that images, whose face encodings have high similarity, do not necessarily look similar to a human.
A different approach we tried was generating (about 5.000) images with faces using StyleGAN. This is a GAN that generates highly realistic pictures of faces in fairly high resolution (1024×1024 pixels), given some random vector as input. You can check examples of this at https://thispersondoesnotexist.com.
This definitely was an improvement. Another notch for the argument that good data is more important than good algorithms.
How to improve?
We now have a working solution that performs reasonably well… But what if you want to make it really awesome? How can we improve it?
The current approach for finding a good path from person A to person B is selecting images from a pre-defined set. Another, possibly better approach would be to generate the right images by interpolation, i.e. creating a path by generating an image that looks 90% like person A and 10% like person B, then 80/20, …, 20/80, 10/90.
In order to do this we need to learn the relationship between the input vector for a StyleGAN image and the resulting face encoding vector. By building a model that learns this function we could determine the input vector needed to generate the right image.
Another area for improvement could be the similarity/distance function. We notice that sometimes two images, that don’t seem similar to the human eye, do have similar face encodings. To remedy this, we could try to use a different, lower-level encoding that captures the ‘style’ of the image rather than the facial features.
Read on in part 2 to see how this idea was developed in a web app using Flask and deployed using Docker and Azure App service.