• Person A looks like person B
  • Person B looks like person C, and so on…
  • Person K looks like some ugly celebrity
  • Ergo: you are ugly
  • Hilarity ensues…
Cool, I look like an orc from Lord of the Rings!


So this is what we need:

  • A set of images of ugly celebs
  • A large set of images of people/faces for the intermediate steps. You could use the Labeled Faces in the Wild dataset for this, although larger images work better. We’ll get back to this topic later…
  • A similarity function to determine how similar two faces are
  • A path-finding algorithm to get from the starting image to the final one
  • An app to display the path of images


Encoding faces

Now, about those first two steps: the Python package face_recognition (a wrapper around models from the dlib library) does an excellent job here. You can detect the face area and get an encoding vector of the face simply like this (although you should add some exception handling):

Comparing faces

If we have encodings for two faces, we can easily compare them with standard distance metrics such as cosine distance or Euclidean distance.
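For example, both metrics are a couple of lines of NumPy (the function names here are just for illustration):

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two encoding vectors."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: 0 for identical directions, 1 for orthogonal."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Smaller distance means more similar faces; in practice you would pick one metric and use it consistently for both matching and path-finding.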


Finding a path

How do we get a path from a starting image to some ending image, where at each step the two consecutive images are highly similar?
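One way to frame this (a sketch, not necessarily the only approach): treat every face as a node, connect each node to its k nearest neighbours by encoding distance, and run a shortest-path search over that graph. The O(n²) neighbour search below is fine for a few thousand faces; for more you'd want an approximate nearest-neighbour index.

```python
import heapq
import numpy as np


def build_graph(encodings, k=5):
    """Connect each face to its k nearest neighbours by Euclidean distance."""
    n = len(encodings)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        dists = [(float(np.linalg.norm(encodings[i] - encodings[j])), j)
                 for j in range(n) if j != i]
        for d, j in sorted(dists)[:k]:
            graph[i].append((d, j))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra: find the path whose summed face distances are minimal."""
    pq = [(0.0, start, [start])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for d, nbr in graph[node]:
            if nbr not in seen:
                heapq.heappush(pq, (cost + d, nbr, path + [nbr]))
    return None  # start and goal are not connected
```

Minimising the *sum* of distances favours many small steps, which is exactly what we want: each hop stays a convincing lookalike.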

A good data set

Now we have all the ingredients for a working solution! So, can we enjoy the fun and bask in glory? No: we notice that we get shitty matches because we have a shitty data set…

  • Bad lighting / pictures from an angle lead to bad face encodings


How to improve?

We now have a working solution that performs reasonably well… But what if we want to make it really awesome? How can we improve it?

Which input vector will result in which face encoding?