Convolutional Neural Networks

Cameron Will
Published 
June 12, 2020
C

onvNets happen to have an interesting background story. They were first developed by Yann LeCun et al. in 1980’s (building on some earlier work, e.g. from Fukushima). As a fun early example see this demonstration of LeNet 1 (that was the ConvNet’s name) recognizing digits back in 1993. However, these models remained mostly ignored by the Computer Vision community because it was thought that they would not scale to “real-world” images. That turned out to be only true until about 2012, when we finally had enough compute (in form of GPUs specifically, thanks NVIDIA) and enough data (thanks ImageNet) to actually scale these models, as was first demonstrated when Alex Krizhevsky, Ilya Sutskever and Geoff Hinton won the 2012 ImageNet challenge (think: The World Cup of Computer Vision), crushing their competition (16.4% error vs. 26.2% of the second best entry).

Under the hood

So how do they work? When you peek under the hood you’ll find a very simple computational motif repeated over and over. The gif below illustrates the full computational process of a small ConvNet:

CNN

On the left we feed in the raw image pixels, which we represent as a 3-dimensional grid of numbers. For example, a 256x256 image would be represented as a 256x256x3 array (last 3 for red, green, blue). We then perform convolutions, which is a fancy way of saying that we take small filters and slide them over the image spatially. Different filters get excited over different features in the image: some might respond strongly when they see a small horizontal edge, some might respond around regions of red color, etc. If we suppose that we had 10 filters, in this way we would transform the original (256,256,3) image to a (256,256,10) “image”, where we’ve thrown away the original image information and only keep the 10 responses of our filters at every position in the image. It’s as if the three color channels (red, green, blue) were now replaced with 10 filter response channels (I’m showing these along the first column immediately on the right of the image in the gif above).

Now, I explained the first column of activations right after the image, so what’s with all the other columns that appear over time? They are the exact same operation repeated over and over, once to get each new column. The next columns will correspond to yet another set of filters being applied to the previous column’s responses, gradually detecting more and more complex visual patterns until the last set of filters is computing the probability of entire visual classes (e.g. dog/toad) in the image. Clearly, I’m skimming over some parts but that’s the basic gist: it’s just convolutions from start to end.

Visualizing what ConvNets learn

Several approaches for understanding and visualizing Convolutional Networks have been developed in the literature, partly as a response the common criticism that the learned features in a Neural Network are not interpretable. In this section we briefly survey some of these approaches and related work.

Visualizing the activations and first-layer weights

Layer Activations. The most straight-forward visualization technique is to show the activations of the network during the forward pass. For ReLU networks, the activations usually start out looking relatively blobby and dense, but as the training progresses the activations usually become more sparse and localized. One dangerous pitfall that can be easily noticed with this visualization is that some activation maps may be all zero for many different inputs, which can indicate dead filters, and can be a symptom of high learning rates.

filter


Typical-looking activations on the first CONV layer (left), and the 5th CONV layer (right) of a trained AlexNet looking at a picture of a cat. Every box shows an activation map corresponding to some filter. Notice that the activations are sparse (most values are zero, in this visualization shown in black) and mostly local.

Conv/FC Filters. The second common strategy is to visualize the weights. These are usually most interpretable on the first CONV layer which is looking directly at the raw pixel data, but it is possible to also show the filter weights deeper in the network. The weights are useful to visualize because well-trained networks usually display nice and smooth filters without any noisy patterns. Noisy patterns can be an indicator of a network that hasn’t been trained for long enough, or possibly a very low regularization strength that may have led to overfitting.

Filters


Typical-looking filters on the first CONV layer (left), and the 2nd CONV layer (right) of a trained AlexNet. Notice that the first-layer weights are very nice and smooth, indicating nicely converged network. The color/grayscale features are clustered because the AlexNet contains two separate streams of processing, and an apparent consequence of this architecture is that one stream develops high-frequency grayscale features and the other low-frequency color features. The 2nd CONV layer weights are not as interpretable, but it is apparent that they are still smooth, well-formed, and absent of noisy patterns.

Join Our Weekly
Newsletter

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Speech bubbles
Join Our Amazing
Community
THere's More

Post You mIght Also Like

All Posts
Sep
11
//
2020

Create fun and a little weirdness:

Easy Ways to Create a Fun Workplace
All Posts
Aug
17
//
2020

Embrace Change

One of our core values at Neslo Tech is to embrace and drive change. Not only is change essential for corporate growth it is a necessary skill for workers in 2020.
All Posts
Apr
1
//
2020

Setting up your environment to maximise productivity.

Working from home can be difficult when your space is undedicated, cluttered or unwelcoming. Especially when your warm comfy bed is crying out your name.
All Posts
Mar
20
//
2020

Is working remotely the future?

Working remotely is becoming increasingly common and growing faster than most realise. People from different age groups and professions are becoming more and more nomadic and aware of the benefits of choosing a place to live that fits their personal and professional needs best.
All Posts
Mar
6
//
2020

Simple strategies to get a better click-through rate on your ads.

One of the most heart-wrenching problems associated with online advertising has to be ad fatigue. Internet users have become a lot savvier and don’t like ads anymore because they are ‘intrusive’ and ‘annoying’.