Backpropagation renders gradient descent feasible

In forward-mode differentiation, a single sweep through a computational graph yields the partial derivatives of every node with respect to only one input, so computing the full gradient of the output with respect to n parameters takes n separate sweeps; for models with millions of parameters this makes gradient descent extremely expensive. In reverse-mode differentiation, which is what backpropagation implements, a single backward sweep suffices: starting from the partial derivative of the output with respect to itself and working backwards through the graph, it yields the partial derivative of the output with respect to every node, and hence every parameter, at once.
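As a minimal sketch of the reverse-mode idea, the toy `Value` class below (a hypothetical scalar autodiff class, not from any particular library) records each node's parents and local derivatives during the forward pass, then a single backward sweep applies the chain rule to fill in the gradient of the output with respect to every node.

```python
# Toy scalar reverse-mode autodiff sketch (hypothetical `Value` class).
# One backward sweep from the output computes d(output)/d(node) for all nodes.

class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # forward value
        self.grad = 0.0                   # d(output)/d(this node), filled during backward()
        self._parents = parents           # nodes this value was computed from
        self._local_grads = local_grads   # d(this node)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically order the graph, then propagate gradients from the
        # output back through every node in a single reverse sweep.
        order, visited = [], set()
        def visit(node):
            if node not in visited:
                visited.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)

        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule

# Example: f(x, y) = x*y + x  ->  df/dx = y + 1, df/dy = x
x, y = Value(2.0), Value(3.0)
f = x * y + x
f.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

Forward mode would instead need one pass per input (here, one for x and one for y) to recover the same two partial derivatives.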

Resources

Backlinks