exploding_n_vanishing

Opinions:

  1. These two problems become worse as the number of layers in the architecture increases.
  • vanishing gradients
    Suppose we have an N-layer DNN model; then y can be written as:

y = W_N * W_{N-1} * W_{N-2} * ... * W_1 * X
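
A rough numerical sketch of why this product matters, under assumptions that are not in the notes above: the model is purely linear (no activation functions), every layer is a square matrix of the same size, and the weights are random Gaussian. In that setting the gradient of y with respect to X is just the matrix product itself, so its norm shows how a gradient signal scales with depth. The function name `product_norm` and the parameters `scale`, `dim`, and `seed` are made up for illustration.

```python
import numpy as np


def product_norm(num_layers, scale, dim=16, seed=0):
    """Spectral norm of W_N @ ... @ W_1 for random square matrices.

    Assumption (illustrative only): each W_i has i.i.d. Gaussian entries
    with standard deviation scale / sqrt(dim), so a single layer rescales
    a typical vector by roughly `scale`.
    """
    rng = np.random.default_rng(seed)
    prod = np.eye(dim)
    for _ in range(num_layers):
        w = rng.normal(0.0, scale / np.sqrt(dim), size=(dim, dim))
        prod = w @ prod  # left-multiply, matching y = W_N * ... * W_1 * X
    return np.linalg.norm(prod, ord=2)


if __name__ == "__main__":
    for depth in (1, 5, 10, 25, 50):
        small = product_norm(depth, scale=0.5)  # tends to shrink with depth
        large = product_norm(depth, scale=2.0)  # tends to grow with depth
        print(f"N={depth:3d}  scale=0.5: {small:.3e}  scale=2.0: {large:.3e}")
```

With per-layer scaling below 1 the norm of the product collapses toward zero as N grows (vanishing), and with scaling above 1 it blows up (exploding). Real networks have activations and non-square layers, so this only illustrates the trend.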