What advantage does network depth (multiple hidden layers) offer over extreme width in a shallow network?

Answer

Deep architectures can represent many functions much more compactly, requiring fewer total parameters.

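Depth lets a network build a function as a composition of simpler ones: each layer reuses the features computed by the layer before it. For functions with this compositional structure, a deep network can reach a given accuracy with far fewer weights and biases than a very wide, shallow network, which for some function families needs exponentially many hidden units.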
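A small numerical sketch can make the parameter gap concrete. It uses a standard textbook construction (not taken from this card): a two-unit ReLU "tent" layer composed with itself `depth` times, which produces a sawtooth with 2^depth linear pieces on [0, 1]. A one-hidden-layer ReLU network on a scalar input with n units has at most n + 1 linear pieces, so it needs at least 2^depth - 1 units to match. The function names and the parameter-counting conventions below are assumptions chosen for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def tent(x):
    # One hidden layer with 2 ReLU units; maps [0, 1] onto [0, 1] as a tent.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    # Compose the same 2-unit layer `depth` times.
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth):
    # Sample on a grid 4x finer than the breakpoint spacing (1 / 2**depth)
    # and count sign changes of the slope; the sawtooth alternates up/down.
    n = 2 ** (depth + 2)
    x = np.linspace(0.0, 1.0, n + 1)
    y = deep_sawtooth(x, depth)
    slope_signs = np.sign(np.diff(y))
    return int(np.count_nonzero(np.diff(slope_signs)) + 1)


def deep_param_count(depth):
    # Assumed layout: 1 -> 2 -> 2 -> ... -> 2 -> 1, with each tent's linear
    # output folded into the next layer's weights.
    first = 2 + 2                  # 2 weights + 2 biases
    middle = (depth - 1) * (4 + 2)  # 2x2 weights + 2 biases per layer
    last = 2 + 1                   # 2 weights + 1 bias
    return first + middle + last


def shallow_min_param_count(depth):
    # Lower bound: n hidden units give at most n + 1 linear pieces,
    # so n >= 2**depth - 1; each unit has 1 weight, 1 bias, 1 output weight.
    n = 2 ** depth - 1
    return 3 * n + 1


if __name__ == "__main__":
    for d in (2, 5, 10):
        print(d, count_linear_pieces(d), deep_param_count(d),
              shallow_min_param_count(d))
```

Under these assumptions, a depth of 10 gives 1024 linear pieces with 61 parameters, while the shallow lower bound is 3070 parameters. The gap grows exponentially with depth for this particular function; it is an illustration of the compactness argument, not a claim about every target function.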

#Videos

Why Neural Networks Can Learn Any Function - YouTube

function, algorithm, neural network, approximation