Insight into Machine Learning: Powers and Pitfalls

October 11, 2018

Since completing the Microsoft Professional Program in Data Science (https://academy.microsoft.com/en-us/professional-program/tracks/data-science/) almost 2 years ago, I have been keeping a close eye on Machine Learning and Data Science trends.

I gained many valuable insights about Machine Learning from this program. Recently, while working on a simple TensorFlow example, I had an experience that perfectly illustrates one of them, and that is what I'm here to share today.

Powers

Here is a Git repository (https://github.com/ActiveState/tensorflask). It contains the distilled work of an individual or group who took the time to train a TensorFlow model to categorize dogs. By downloading this code, you harness the results of many hours of other people's work: curating thousands of sample photos of each type of dog, getting them into the correct format, and then training and tweaking the neural network to produce a very accurate dog classifier that you can simply download and use. Amazing!

In this exact example, it can categorize three types of dogs: poodle, pug, or dachshund.

Does it work?

Let's start with an easy one. Here's a picture of a dachshund, with no background and a full outline profile of the dog.

What was the output of the model?

Evaluation time (1-image): 0.109s
dachshund 0.999997
pug 2.7389e-06
poodle 2.90583e-08

Wow, pretty good! In less than a second, it reported a confidence score of 0.999997 (five 9s) that the image is a dachshund.
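Scores like these usually come from a softmax layer at the end of the network, which turns the raw per-class outputs (logits) into values between 0 and 1 that add up to 1. The sketch below assumes that is how this model produces its scores; the `softmax` helper and the logit values are my own illustrative assumptions, not taken from the tensorflask code or the trained model.

```python
import math

def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (dachshund, pug, poodle); illustrative values only.
labels = ["dachshund", "pug", "poodle"]
logits = [9.0, -3.8, -8.3]
probs = softmax(logits)

# Print labels from most to least likely, similar to the output format above.
for label, p in sorted(zip(labels, probs), key=lambda t: t[1], reverse=True):
    print(f"{label} {p:.6g}")
```

A large gap between logits becomes a score very close to 1 for the winning class, which is the kind of result the dachshund photo produced.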

Let's Up the Ante

Let's try some more difficult examples: one with other objects in the picture, and one that is simply a headshot.

Evaluation time (1-image): 0.086s
pug 0.886593
dachshund 0.111889
poodle 0.00151825

Evaluation time (1-image): 0.091s
poodle 0.999996
dachshund 3.51365e-06
pug 9.75337e-08

As you can see, the model was less certain about the baby pug, giving it a score of 0.886593, while it had absolutely no problem with the headshot of the poodle.
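One simple way to put a number on "less certain" is the gap between the top two scores. The `top_two_margin` helper below is a hypothetical illustration, not part of the tensorflask project; the scores are the ones reported above for the baby pug and the poodle headshot.

```python
def top_two_margin(scores):
    """Return the gap between the highest and second-highest scores."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1]

# Scores copied from the two evaluations above.
baby_pug = {"pug": 0.886593, "dachshund": 0.111889, "poodle": 0.00151825}
poodle_headshot = {"poodle": 0.999996, "dachshund": 3.51365e-06, "pug": 9.75337e-08}

for name, scores in [("baby pug", baby_pug), ("poodle headshot", poodle_headshot)]:
    print(f"{name}: margin {top_two_margin(scores):.6f}")
```

The baby pug's margin is noticeably smaller because the model gave some of its confidence to dachshund.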

Confidence In the Model

At this point, I think most people, myself included, would be fairly confident that the model can provide an accurate score for almost any picture of a single poodle, pug, or dachshund. But what happens when we throw a wolf at it, something that doesn't belong in any of the three categories?

Evaluation time (1-image): 0.095s
poodle 0.580887
pug 0.342018
dachshund 0.0770947

Damn! Not bad! Even though the model was only trained to categorize three types of dogs (poodle, pug, or dachshund), it handled an image of a wolf in the way you might expect. Reading the results, you could conclude that, according to the trained model, the wolf is a blend of a poodle and a pug.
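To quantify how spread out the wolf's scores are compared with the clean dachshund photo, one option is Shannon entropy: it is near 0 when one class gets almost all of the confidence and reaches log2(3), about 1.585 bits, when all three classes are equal. This is an illustrative measure I'm adding here, not something the model reports; the `entropy` helper is hypothetical and the scores are copied from the results above.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; higher means the scores are more spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Scores copied from the evaluations above.
dachshund_photo = [0.999997, 2.7389e-06, 2.90583e-08]
wolf_photo = [0.580887, 0.342018, 0.0770947]

print(f"dachshund photo: {entropy(dachshund_photo):.4f} bits")
print(f"wolf photo:      {entropy(wolf_photo):.4f} bits")
```

The wolf's entropy is far higher than the dachshund's, which matches the intuition that the model is "torn" between poodle and pug.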

At this point, you might be thinking: this model is great! I could curate photos of Pomeranians and German shepherds to extend its usefulness so it can categorize more types of dogs. Maybe you could even use an improved model to build an app that helps photographers label all the photos of dogs they take!

What happens if...

Running with this idea, you decide to see what happens when you run the model against a car. A car is clearly not a dog, so you'd expect the model to return a score of zero, indicating that it is not a dog. Let's see what happens...

Evaluation time (1-image): 0.085s
dachshund 0.890405
poodle 0.0899952
pug 0.0195995

What the ... a 0.890405 score that a Porsche 911 Turbo is a dachshund?
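A likely reason is visible in the numbers themselves: for every image in this post, the three scores add up to 1 (within the rounding of the printed values). Assuming the model ends in a softmax layer, as retrained image classifiers like this one typically do, it has no way to answer "none of the above"; it can only divide its confidence among poodle, pug, and dachshund. The short check below simply sums the scores reported above.

```python
# Scores copied from the five evaluations in this post.
results = {
    "dachshund": [0.999997, 2.7389e-06, 2.90583e-08],
    "baby pug": [0.886593, 0.111889, 0.00151825],
    "poodle headshot": [0.999996, 3.51365e-06, 9.75337e-08],
    "wolf": [0.580887, 0.342018, 0.0770947],
    "Porsche 911 Turbo": [0.890405, 0.0899952, 0.0195995],
}

for image, scores in results.items():
    # Each set of scores adds up to 1 (within rounding of the printed values),
    # so the model spreads all of its confidence across the three dog breeds.
    total = sum(scores)
    print(f"{image}: total {total:.5f}")
```

Even the car's scores sum to 1, so a "not a dog" answer of zero across the board was never possible with this output format.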

The Takeaway

This is a topic I touched on in my article discussing AWS speech recognition and language understanding technologies: you must truly understand the capabilities and limits of the technology you are working with, as well as the problem you are trying to solve.

This example demonstrates that this simple TensorFlow project has a human-level ability to distinguish between three different types of dogs, and even the potential for expert human-level ability to identify mixed breeds. Yet it also fails to recognize that an image isn't a dog at all, something even a 5-year-old can do.

I could go into detail about what the model is doing and how, or how its output could be changed to eliminate this particular problem, but that is beside the point.

The point is that Machine Learning has its strengths (Powers) and its weaknesses (Pitfalls). Just as it is foolish not to use Machine Learning technologies to improve your business, products, and services, it is equally foolish to follow their recommendations 100% without any critical analysis or review.

As the data-driven revolution ushers in amazing new insights and new perspectives on how we see the world (watch the documentary on AlphaGo: https://www.alphagomovie.com/), remember to always question, be curious, and dig deeper.

Kinman Lam