Backward Feature Correction: How can Deep Learning perform Deep Learning
How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this learning process in terms of hierarchical learning. We refer to hierarchical learning as the learner learns to represent a complicated target function by decomposing it into a …