So this comedy theatre group wanted to do a show on AI and asked us for help: can you build something smart and funny?? After many crazy, hilarious, underwhelming, outrageous and impossible ideas, we converged to the idea of an app that shows how you look like some ugly celebrity via some steps:

This blog discusses how to build the AI part of the app, while part 2 concerns the aspects of building a working application in Flask and deployment via Docker and Azure App Service:

A large set of images of people/faces for the intermediate steps. You could use the Labeled Faces in the Wild dataset for this, although larger images work better. We’ll get back to this topic later…

Then, what we need to build is:

face detection: to find the location of a face in an image

a similarity function do determine how similar faces are to each other

a path-finding algorithm to get from the starting image to the final one

an app to display the path of images

Encoding faces

Now about those first two: the Python package face_recognition (which wraps around models from the dlib package) does an excellent job for this. You can detect the face area and get an encoding vector of the face simply like this (although you should add some exception handling):

This encoding, a vector of size 128, comes from the final layer of a CNN model trained to recognize faces. For more info on this, check my previous article on face recognition:

If we have encodings for two faces, we can easily compare them with standard distance metrics like cosine distance or euclidean distance.

from sklearn.metrics.pairwise import cosine_distances

# a 2D-array of encodings of size len(images) x 128
encodings = get_encodings(images)

# a 2D-array of distances between i and j (for row i and column j)
distances = cosine_distances(encodings, encodings)

N.B. Computing an n * n distance matrix (and keeping it in memory) is peanuts for any n < 100K, so don’t worry about that.

Finding a path

How to get a path from a starting image to some ending image, where at each step the two images are highly similar?

Using the distances matrix, which gives the distance between any two images in our set, we can find the most similar image by selecting the row for that image and select the column index with the lowest value

Cool, we’ve found a similar image (if you have a good enough data set that is). But this doesn’t yet bring us any closer towards the ending image…

Q: Then what about path-finding algorithms like Dijkstra?
A: This doesn’t work because that looks for the shortest total path length. We’re looking to minimize the individual step length along the path.

Ergo: time to design an algorithm for ourselves! Here’s some pseudo-code:

Algorithm 1:

# init
Is = starting image
Ie = ending image
Ii = some intermediate image
max_num_steps = 10

# Let gamma parameter decrease equidistantly from 1 to 0
# between each of the steps
gamma_list = np.arange(0, 1, 1 / (max_num_steps))[::-1]

Ii = Is
For each step, find an intermediate image Ij that minimizes
d1 = distance(Ii, Ij)
d2 = distance(Ij, Ie)
gamma = gamma_list[step]
cost = gamma * d1 + (1-gamma) * d2
until distance(Ii, Ie) < distance(Ii, Ij)

This algorithm is very basic but works decently. We experimented with some alternatives

Algorithm 2:

- start with (from, to)
- try to find an image i such that:
- distance(from, i) < distance(i, to)
- distance(i, to) < distance(from, to)
- if so, replace the path by [(from, i), (i, to)]
- continue to do the same for all elements in the path until there's no suitable intermediate image

How to evaluate a path?

In order to compare algorithms and their resulting paths, we need some evaluation metric. Obviously we want all steps (i.e. the distance between any two successive images) to be as small as possible. So we could use the mean distance. If we adhere to the idea that the chain is as strong as the weakest linkthough, we should look at the maximum distance, because that identifies the weakest link. Let’s use a weighted average of those two:

def evaluate path(path, distance_matrix, alpha=0.8):
distances = [
distance_matrix[path[i], path[i-1]]
for i in range(1, len(path))
]
cost = alpha*np.max(distances) + (1-alpha)*np.mean(distances)
return cost

A good data set

Now we have all the ingredients for a working solution! So, can we enjoy the fun and bask in glory? No: we notice that we get shitty matches because we have a shitty data set…

Some problems with online available data sets:

Small image size leads to bad face encodings

Bad lighting / pictures from an angle lead to bad face encodings

The result is that images, whose face encodings have high similarity, do not necessarily look similar to a human.

A different approach we tried was generating (about 5.000) images with faces using StyleGAN. This is a GAN that generates highly realistic pictures of faces in fairly high resolution (1024×1024 pixels), given some random vector as input. You can check examples of this at https://thispersondoesnotexist.com.

This definitely was an improvement. Another notch for the argument that good data is more important than good algorithms.

How to improve?

We now have a working solution that performs reasonably well… But what if you want to make it really awesome? How can we improve it?

The current approach for finding a good path from person A to person B is selecting images from a pre-defined set. Another, possibly better approach would be to generate the right images by interpolation, i.e. creating a path by generating an image that looks 90% like person A and 10% like person B, then 80/20, …, 20/80, 10/90.

In order to do this we need to learn the relationship between the input vector for a StyleGAN image and the resulting face encoding vector. By building a model that learns this function we could determine the input vector needed to generate the right image.

Another area for improvement could be the similarity/distance function. We notice that sometimes two images, that don’t seem similar to the human eye, do have similar face encodings. To remedy this, we could try to use a different, lower-level encoding that captures the ‘style’ of the image rather than the facial features.

Read on in part 2 to see how this idea was developed in a web app using Flask and deployed using Docker and Azure App service.