Computer Vision and CNNs

Computer Vision and CNNs

Have you ever tried a computer vision project? You must've used a CNN right? And if you used a regular neural net, you must know that it sucks at doing computer vision.

But why is it so? And what is a CNN?

I'll try to answer that here.

Why CNN?


In traditional machine learning, where we work with data analysis, there are a limited number of features. And our neural networks are good at analysing those features and making predictions based on it.

But that is not the case in computer vision. In computer vision, we deal with images, and there maybe 1000 features corresponding to one image. And this is where our traditional neural networks fail.

CNN or Convolutional Neural Nets come to our rescue here. These type of neural networks are designed to work with n number of features. The convolutional neural nets (As the name suggests) perform an operation called Convolution on the image to work so efficiently.

Let's Talk about the convolution operation.

The Convolution Operation


Images are formed of pixels, and each pixel has a certain color. This color has a color encoding which ranges from 0 to 255. This encoding is also known as ByteEncoding and the Byte Encoding of an image is known as it's Byte-Array.

In the convolution operation, a Byte-array of an image is multiplied with a kernel array which yields a new byte array with the desired filter applied on that image. What is the use of applying a filter to an image you ask?

Well, there is a lot of noise in an image along with some valuable information, by applying the right filters, we can augment the valuable information and reduce the noise. con_operation.png

This valuable information then can further be used for extracting features from the image.

Convolution.jpeg

The Pooling Operation


Along with convolution, we also apply an operation called Pooling. Let's know what pooling is...

Images can range from a single-byte to couple megabytes, and most of data in an image is just noise, there is very little information to extract features from. So, after applying convolution on an image, we apply pooling on that image.

What the pooling layer does is that it reduces the size of an image while trying to keep the information intact. This helps you save training time and resources.

Conclusion


The CNN model uses the operations Convolution and Pooling and that is why they are preferred over traditional neural nets.

That was my take on Convolutional Neural Nets, if you have any suggestions, please write below :) Meanwhile please follow Me on Twitter and LinkedIn too.