I’m a bit confused: I keep reading that convolutional layers always outperform densely-connected layers, and I don’t understand why.
Can anyone explain this?
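Convolutional layers slide the same small filter over every region of the image, so a feature is detected in the same way wherever it appears. Strictly speaking this is translation equivariance (shift the input and the response shifts with it); together with pooling it gives approximate invariance to position and some tolerance to object size. This is not the case with densely-connected layers, where every input pixel has its own separate weight.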
In other words, conv layers learn to recognize parts of an image without necessarily caring where these parts are in the image. This makes the network largely invariant to object position.
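To make this concrete, here is a minimal NumPy sketch (not taken from any particular framework). The 2x2 pattern, the 8x8 images and the helper `conv2d_valid` are all made up for illustration. It slides one filter over two images that contain the same pattern at different positions, and compares that with a single dense unit whose weights were "tuned" to the first image.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation: apply the same kernel at every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 2x2 "part" placed at two different positions in an 8x8 image.
pattern = np.array([[1.0, 2.0], [3.0, 4.0]])
image_a = np.zeros((8, 8))
image_a[1:3, 1:3] = pattern
image_b = np.zeros((8, 8))
image_b[4:6, 5:7] = pattern

# Convolutional case: one shared filter that matches the pattern.
kernel = pattern.copy()
resp_a = conv2d_valid(image_a, kernel)
resp_b = conv2d_valid(image_b, kernel)

# The peak response moves with the pattern (equivariance) ...
peak_a = tuple(int(k) for k in np.unravel_index(np.argmax(resp_a), resp_a.shape))
peak_b = tuple(int(k) for k in np.unravel_index(np.argmax(resp_b), resp_b.shape))
# ... and its strength is the same in both images, so a global max
# over the feature map no longer depends on position.
print(peak_a, peak_b, resp_a.max(), resp_b.max())

# Dense case: one unit with a separate weight per pixel, matched to image_a.
dense_w = image_a.ravel()
dense_a = float(dense_w @ image_a.ravel())
dense_b = float(dense_w @ image_b.ravel())
# The unit fires on image_a but not on the shifted copy in image_b.
print(dense_a, dense_b)
```

The dense unit would need a second set of weights (and training examples showing the pattern in the new spot) to respond to `image_b`, while the conv filter gets that for free.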
In addition to that, across multiple layers, the further downstream you get, the more general the representations of an input image become. This makes the network more robust to object size (though not strictly invariant to it) and even allows objects to vary somewhat (e.g., multiple types of cars being recognized as a car without having to specify one class for each car type).
Dense layers, instead, simply see the “whole” image. In principle they can learn that objects come in different sizes and appear in different parts of the image, but because they share no weights across positions they have to learn each case separately. That takes many more parameters and many more training samples than a ConvNet needs.
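A rough back-of-the-envelope sketch of the parameter gap, assuming a 32x32 RGB input, 64 output units/filters and 3x3 kernels (all of these numbers are picked for illustration, and the helper functions are hypothetical):

```python
def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """One small kernel per filter, shared across all positions, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Assumed example sizes: a 32x32 RGB image and 64 units / filters.
height, width, channels = 32, 32, 3
n_outputs = 64

dense = dense_params(height * width * channels, n_outputs)
conv = conv_params(3, 3, channels, n_outputs)

print(dense)  # 196672
print(conv)   # 1792
```

With these assumed sizes the dense layer has roughly a hundred times more weights to fit, and the gap widens as the image gets larger, since the conv layer's parameter count does not depend on image size at all.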
That’s why ConvNets generally (though not always) perform better than networks built only from dense layers on computer vision problems.