Saturday, May 12, 2018

Deep Learning for Image Classification: Baseball-player vs Cricket-player

If you are from a country where neither cricket nor baseball is played, the two sports can be hard to tell apart. So I tried to build a simple model to do just that.

This post is based on Jeremy Howard's original "Cats vs Dogs" classification work. I used the same architecture to build a neural network for a different, and perhaps more exciting, problem.

I created a model to label images based on whether they contain a batter (label=Baseball) or a batsman (label=Cricket). The aim was to build a basic model with relatively simple images, so there are no pictures showing only pitchers, catchers, bowlers, fielders, etc.

I used the fastai environment, a collection of libraries and lessons that keeps standard practices and tools available in one place. fastai is built on top of PyTorch, an open-source machine learning library developed at Facebook. PyTorch is a widely used alternative to TensorFlow.

The detailed post, with code and output, can be found on GitHub. In this post, I am sharing a quick summary.

Steps followed:
  1. Import fastai and other required libraries
  2. Set config (Path to your data, the size that the images will be resized to)
  3. Use a pre-existing NN architecture, resnet34, with weights pre-trained on ImageNet
  4. Train the model on our data and evaluate it on the validation set
  5. Explore the results
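
Step 2 assumes the images are already organized on disk. A minimal sketch, assuming the folder-per-class layout used in the fast.ai "Cats vs Dogs" lesson (train and valid folders, each with one subfolder per label, e.g. train/Baseball and train/Cricket): the label of each image is simply the name of the folder it sits in. The helper list_labeled_images and the exact folder names are hypothetical, for illustration only; this is not fastai's own code.

```python
from pathlib import Path

# Hypothetical helper for illustration; not part of fastai.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root, split):
    """Return sorted (relative_path, label) pairs for root/split/<label>/<image>.

    Assumes one subfolder per class, e.g. data/train/Baseball/img1.jpg,
    so the parent folder name doubles as the class label.
    """
    split_dir = Path(root) / split
    pairs = []
    for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image in sorted(label_dir.iterdir()):
            # Skip non-image files such as notes or metadata.
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image.relative_to(root).as_posix(), label_dir.name))
    return pairs
```

For a hypothetical data folder, list_labeled_images("data", "valid") would return pairs such as ("valid/Cricket/img1.jpg", "Cricket"), one per image.
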
Some interesting findings:

In just four epochs, the model reached 81% accuracy on the validation set (trn_loss and val_loss are the training and validation losses):
epoch   trn_loss   val_loss   accuracy
    0   1.289021   0.991551   0.25
    1   1.156484   0.812032   0.4375
    2   1.012042   0.668625   0.5625
    3   0.843908   0.548285   0.8125
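
Accuracy in the log is the fraction of validation images whose predicted label matches the true label. All four logged values are multiples of 1/16 (0.25 = 4/16, 0.4375 = 7/16, 0.5625 = 9/16, 0.8125 = 13/16), which is consistent with a small validation set such as 16 images, where 81% would mean 13 correct predictions; the log alone cannot confirm the exact size. A quick plain-Python check of that arithmetic (the helper names are illustrative):

```python
import math
from fractions import Fraction

# Validation accuracies copied from the training log above.
logged_accuracy = [0.25, 0.4375, 0.5625, 0.8125]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def smallest_common_denominator(values):
    """Smallest n such that every value can be written as k/n."""
    n = 1
    for v in values:
        d = Fraction(v).limit_denominator(1000).denominator
        n = n * d // math.gcd(n, d)  # least common multiple of n and d
    return n


print(smallest_common_denominator(logged_accuracy))  # 16
# Hypothetical: 13 of 16 images correct gives the final logged accuracy.
print(accuracy(["Cricket"] * 13 + ["Baseball"] * 3, ["Cricket"] * 16))  # 0.8125
```
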
Some correctly predicted images (values closer to 1 mean Cricket, values closer to 0 mean Baseball):

Some incorrectly predicted images (values closer to 1 mean Cricket, values closer to 0 mean Baseball):

This was interesting! I had chosen a few validation images that deviate slightly from the common pattern in the training data. For example, in the first image above (Baseball) we cannot see the ground, unlike in the training images. In the second image above (Cricket), the third stump is not in place.

Values closer to 0.5 indicate less confidence in the prediction.
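
The captions read the model's output as a score between 0 and 1, where values near 1 mean Cricket and values near 0 mean Baseball. A minimal sketch of turning such a score into a label and a rough confidence, assuming a 0.5 cut-off; the function names and the confidence formula (distance from 0.5, rescaled to the range 0 to 1) are illustrative choices, not fastai's API.

```python
def label_from_score(p_cricket, threshold=0.5):
    """Map a score in [0, 1] (near 1 = Cricket) to a class label."""
    if not 0.0 <= p_cricket <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return "Cricket" if p_cricket >= threshold else "Baseball"


def confidence(p_cricket):
    """Rough confidence: 0.0 at a score of 0.5, 1.0 at a score of 0 or 1."""
    return abs(p_cricket - 0.5) * 2


# Hypothetical scores, not taken from the post's results.
for score in (0.97, 0.55, 0.08):
    print(score, label_from_score(score), round(confidence(score), 2))
```

A score of 0.55 still yields the label Cricket, but with a confidence of only about 0.1, which is the low-confidence case described above.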

I believe the performance could be improved with more training data and more epochs. In the next post, I hope to experiment more with fine-tuning the model on a slightly more difficult problem.
