# Image Recognition Using a Shallow Neural Network
```python
import tensorflow as tf
```
## Improving Computer Vision using CNN
With a shallow neural network, our accuracy on the validation set will not be great. To improve it, we use a Convolutional Neural Network. The concept comes from the same convolutional filters used in basic image processing tasks like blurring. For example, take a $3 \times 3$ Gaussian blur filter:
\[f = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}\]

This filter is applied to every pixel in the image: each new pixel value is the convolution of the filter with the $3 \times 3$ neighborhood around the original pixel.
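As a sketch of what "applying the filter" means (plain NumPy, illustrative only):

```python
import numpy as np

# The 3x3 Gaussian blur kernel from the text; its entries sum to 1
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]]) / 16.0

def convolve(image, kernel):
    """Slide the kernel over every interior pixel ("valid" mode):
    each output pixel is the weighted sum of its 3x3 neighborhood."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(5, 5)
blurred = convolve(image, kernel)
print(blurred.shape)  # (3, 3)
```

Because the kernel's entries sum to 1, a region of uniform brightness passes through unchanged; edges and noise get smoothed.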
Implementing convolutions in a neural network can make recognition easier, especially on high-resolution images. For example, feeding a $1280 \times 800$ image directly into a dense layer would mean over a million input values, and the dense network would take far too many resources. A better approach is to (1) apply convolutional filters and (2) reduce the input dimensions intelligently. Convolutional filters are useful for extracting or suppressing features, and hence for distinguishing them from each other, which makes them very useful for Computer Vision. Stacking multiple layers of convolution followed by reduction produces a much smaller input that can then be fed into a Dense layer far more easily.
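To make the resource argument concrete, a quick back-of-the-envelope parameter count (assuming a single-channel input, a 128-unit dense layer, and 32 convolution filters; all illustrative choices):

```python
# Parameters for one dense layer acting on the flattened 1280x800 image,
# versus one 3x3 convolutional layer with 32 filters
inputs = 1280 * 800                    # 1,024,000 input values
dense_params = inputs * 128 + 128      # weights + biases
conv_params = 3 * 3 * 1 * 32 + 32      # kernel weights + biases

print(dense_params)  # 131072128
print(conv_params)   # 320
```

The convolutional layer's parameter count depends only on the kernel size and filter count, not on the image resolution, which is why convolutions scale so much better.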
Reduction is accomplished using the Keras layer called MaxPooling2D: it takes each $n \times n$ section of the image and compresses it to a single pixel using the maximum of the values.
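A minimal NumPy sketch of this operation (assuming $n = 2$; any remainder rows or columns are cropped):

```python
import numpy as np

def max_pool(image, n=2):
    """Compress each n x n block of the image to its maximum value."""
    h, w = image.shape
    cropped = image[:h - h % n, :w - w % n]          # drop remainder edges
    return cropped.reshape(h // n, n, w // n, n).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(max_pool(x))
# [[7 8]
#  [9 6]]
```

Each output pixel keeps the strongest response in its block, so the most prominent features survive while the image shrinks by a factor of $n$ per side.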
```python
num_convolutions = 32
model = tf.keras.models.Sequential([
    # First the convolutional layers
    tf.keras.layers.Conv2D(num_convolutions, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(num_convolutions, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Then the dense layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_data, training_labels, epochs=10)
test_loss, test_accuracy = model.evaluate(test_data, test_labels)
```
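The snippet assumes `training_data`, `training_labels`, `test_data`, and `test_labels` already exist. One way to obtain arrays matching the `(28, 28, 1)` input shape is Fashion MNIST (an assumption; the text does not name the dataset):

```python
import tensorflow as tf

# Fashion MNIST images are 28x28 grayscale, matching input_shape=(28, 28, 1)
(training_data, training_labels), (test_data, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values to [0, 1] and add the trailing channel dimension
training_data = training_data.reshape(-1, 28, 28, 1) / 255.0
test_data = test_data.reshape(-1, 28, 28, 1) / 255.0

print(training_data.shape)  # (60000, 28, 28, 1)
```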
The value of `num_convolutions` is essentially arbitrary; it is generally good to use powers of 2, starting from 32.
Running for more epochs than needed can result in overfitting: the model performs better on training data but worse on validation data.
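One common guard against this, as a sketch (the text does not prescribe it), is Keras's `EarlyStopping` callback, which halts training once validation loss stops improving:

```python
import tensorflow as tf

# Stop training when validation loss has not improved for 2 epochs,
# and roll back to the best weights seen
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              patience=2,
                                              restore_best_weights=True)

# Passed to fit alongside validation data, e.g.:
# model.fit(training_data, training_labels, epochs=50,
#           validation_data=(test_data, test_labels),
#           callbacks=[early_stop])
```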
In the model created above, increasing the number of convolution filters improves accuracy, while adding more max-pooling layers decreases it.
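One way to explore the effect of `num_convolutions` is to build the same architecture with several filter counts and compare model sizes (the accuracy comparison itself requires training each variant, which is omitted here):

```python
import tensorflow as tf

def build_model(num_convolutions):
    """Same architecture as above, parameterized by the filter count."""
    return tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(num_convolutions, (3, 3), activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(num_convolutions, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

# Parameter counts grow with the number of filters
param_counts = {n: build_model(n).count_params() for n in (16, 32, 64)}
print(param_counts)
```

More filters mean more capacity (and more compute per step), which is why the powers-of-2 sweep mentioned above is a convenient way to trade accuracy against cost.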
A great real-world use case of TensorFlow comes from cassava farming: a model identifies diseased cassava plants from images of their leaves, distinguishing healthy from unhealthy plants. The resulting statistics help identify which problem is most worth solving to increase productivity.
## Tools for categorizing images
Real-world images contain color channels, the object may occupy only a portion of the frame amid background data or other objects, and aspect ratios and resolutions vary. To help with all this, TensorFlow provides the ImageDataGenerator utility, which can build a well-structured classification of images for training and validation. The subdirectories automatically provide the labels.
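A sketch of that workflow (the class names `healthy`/`diseased` and the directory layout are hypothetical; newer Keras versions favor `image_dataset_from_directory`, but `ImageDataGenerator` is what the text names):

```python
import os
import numpy as np
import tensorflow as tf
from PIL import Image

# Build a tiny stand-in dataset on disk: two hypothetical classes with a
# few random images each. In practice you would point flow_from_directory
# at your real image folders.
for label in ('healthy', 'diseased'):
    d = os.path.join('data', 'train', label)
    os.makedirs(d, exist_ok=True)
    for i in range(2):
        pixels = np.random.randint(0, 256, (150, 150, 3), dtype=np.uint8)
        Image.fromarray(pixels).save(os.path.join(d, f'{i}.png'))

# Subdirectory names ('healthy', 'diseased') automatically become the labels
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1 / 255)
train_gen = datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),  # every image is resized to a common shape
    batch_size=4,
    class_mode='binary')

images, labels = next(train_gen)
print(images.shape)  # (4, 150, 150, 3)
```

`target_size` handles the varying resolutions and aspect ratios mentioned above, and `rescale` normalizes the color values, so images of different shapes and formats all arrive at the model in a uniform batch.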