Friday, July 5, 2019

Handwritten Devanagari Character Identification


After playing with Cricket vs Baseball images, I wanted to try the fastai approach on a more concrete problem with published benchmarks. I chose a dataset of handwritten Devanagari characters (Devanagari is the script of my mother tongue, Marathi), for which the reported state-of-the-art (SoTA) accuracy is 98.47%. The aim was to check whether I could beat this number.

I used the fastai environment, a collection of libraries and lessons that brings standard deep learning practices and tools together in one place. fastai is built on top of PyTorch, an open-source machine learning library created by Facebook developers. PyTorch is a widely used alternative to TensorFlow.

The detailed post with code and output can be found on GitHub. This post is a quick summary.

Steps followed:
  1. Import fastai and other required libraries
  2. Set config (Path to your data, the size that the images will be resized to)
  3. Use variants of a pre-trained neural network architecture called ResNet
  4. Train the model on our data and evaluate it on the validation set
  5. Explore the results
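
The steps above can be sketched roughly as follows. This is a minimal sketch, assuming the fastai v1 vision API (the version current in mid-2019), and it is not the code from the GitHub notebook. The dataset path, folder names, image size, and epoch count are all placeholder assumptions; adjust them to your own setup.

```python
from pathlib import Path

# All values below are illustrative assumptions, not the settings
# used in the original notebook.
CONFIG = {
    "path": Path("data/devanagari"),  # hypothetical dataset root
    "train_dir": "Train",             # hypothetical folder, one subfolder per character
    "valid_dir": "Test",              # hypothetical folder used as the validation set
    "size": 32,                       # images are resized to size x size pixels
    "arch": "resnet101",              # other variants: "resnet34", "resnet50", ...
    "epochs": 4,
}


def train_and_evaluate(cfg):
    """Walk through steps 1-5 with the fastai v1 vision API."""
    # Step 1: import fastai (done lazily, so the config can be used without it).
    import fastai.vision as fv

    # Step 2: build the data from the configured folders.
    data = fv.ImageDataBunch.from_folder(
        cfg["path"],
        train=cfg["train_dir"],
        valid=cfg["valid_dir"],
        ds_tfms=fv.get_transforms(do_flip=False),  # flipping would distort characters
        size=cfg["size"],
    ).normalize(fv.imagenet_stats)

    # Step 3: pick a pre-trained ResNet variant by name.
    arch = getattr(fv.models, cfg["arch"])
    learn = fv.cnn_learner(data, arch, metrics=fv.accuracy)

    # Step 4: train, then evaluate on the validation set.
    learn.fit_one_cycle(cfg["epochs"])
    val_loss, val_acc = learn.validate()

    # Step 5: explore the results, starting with the worst predictions.
    interp = fv.ClassificationInterpretation.from_learner(learn)
    interp.plot_top_losses(9)

    return learn, float(val_acc)
```

Calling train_and_evaluate(CONFIG) runs the whole pipeline. Trying another ResNet variant only requires changing CONFIG["arch"].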

Dataset Samples

The following screenshot displays the top incorrect predictions. Some of these are hard to identify even for a native Devanagari reader and writer like me.
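
To make "top incorrect predictions" concrete: one common way to rank them is by per-sample loss, highest first. The sketch below is a plain-Python illustration of that idea, assuming cross-entropy loss. The probabilities and labels are made-up toy values, and the helper name top_losses is hypothetical.

```python
import math


def top_losses(probs, targets, k=3):
    """Return the k samples with the highest cross-entropy loss, worst first.

    Each result is a tuple (loss, sample_index, predicted_class, true_class).
    """
    scored = []
    for i, (p, t) in enumerate(zip(probs, targets)):
        # Loss is the negative log of the probability given to the true class.
        loss = -math.log(max(p[t], 1e-12))
        predicted = max(range(len(p)), key=p.__getitem__)
        scored.append((loss, i, predicted, t))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy example with 3 classes and 4 samples (made-up numbers).
probs = [
    [0.90, 0.05, 0.05],  # confident and correct
    [0.10, 0.80, 0.10],  # confident but wrong (true class is 0)
    [0.40, 0.35, 0.25],  # unsure but correct
    [0.02, 0.03, 0.95],  # confident but wrong (true class is 1)
]
targets = [0, 0, 0, 1]

worst = top_losses(probs, targets, k=2)
for loss, idx, predicted, actual in worst:
    print(f"sample {idx}: predicted {predicted}, actual {actual}, loss {loss:.3f}")
```

Confidently wrong predictions rise to the top of this ranking, which is why such a list tends to surface the most ambiguous handwriting.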



The Best Accuracy

I tried multiple ResNet architectures. The best accuracy, 99.44% on the validation dataset, was achieved with the ResNet101 architecture.

This beats the state-of-the-art accuracy (98.47%) reported by the dataset creator.
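
The gap looks small in accuracy terms, so it can help to view it as an error rate. The short calculation below uses only the two figures quoted in this post and assumes they are comparable, meaning both were measured on similar held-out data.

```python
sota_accuracy = 98.47  # percent, reported by the dataset creator
best_accuracy = 99.44  # percent, ResNet101 on the validation set

# Error rate is the share of characters that are misclassified.
sota_error = round(100 - sota_accuracy, 2)
best_error = round(100 - best_accuracy, 2)

# Fraction of the baseline's mistakes that the new model avoids.
relative_reduction = (sota_error - best_error) / sota_error

print(f"error rate: {sota_error}% -> {best_error}%")
print(f"relative error reduction: {relative_reduction:.1%}")
```

Under that assumption, the error rate drops from 1.53% to 0.56%, which removes roughly 63% of the misclassifications.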
