Why RL?

  • Can you explain why we sometimes do RL and sometimes supervised learning? What is the fundamental difference between the two, and which one fits which situation?
    • Common things:
      • a model acting on a given input
        • supervised: model f, acting on input x, outputs a prediction ŷ
        • RL: policy π, acting on state s, outputs action a
    • Differences:
      • labels:
        • supervised: each input x has a true label y directly associated with it
        • RL: for state s there is no “true” action a
          • there is only a reward associated with a rollout of states and actions, and this reward may arrive immediately or only much later (see the sketch below)
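
A minimal sketch of the contrast, under made-up assumptions (the toy linear model, the random data, and the constant reward are for illustration only). The supervised update has a label to compare against; the REINFORCE-style RL update only has a scalar reward for a sampled action:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 2)            # f: input -> logits over 2 classes/actions
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Supervised: every input x comes with a true label y.
x = torch.randn(8, 4)                    # batch of inputs
y = torch.randint(0, 2, (8,))            # labels directly associated with x
loss = F.cross_entropy(model(x), y)      # per-example error signal, available immediately
opt.zero_grad(); loss.backward(); opt.step()

# RL (REINFORCE-style): no true action exists for state s.
s = torch.randn(4)                       # state
dist = torch.distributions.Categorical(logits=model(s))
a = dist.sample()                        # policy samples an action; there is no label
reward = 1.0                             # scalar reward, possibly arriving much later
loss = -dist.log_prob(a) * reward        # push up log-prob of actions that earned reward
opt.zero_grad(); loss.backward(); opt.step()
```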

Why Deep Learning works now

  • dead neurons can occur: e.g. if all incoming weights and the bias of a ReLU neuron are negative and the inputs are all positive, the pre-activation is always negative, so the ReLU outputs 0, its gradient is 0, and the weights never update (sketch below)
    • see vanishing gradients
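
A tiny sketch that deliberately forces a dead ReLU (the specific weights, bias, and inputs are made up) to show that no gradient flows back:

```python
import torch

w = torch.tensor([-1.0, -2.0], requires_grad=True)  # all incoming weights negative
b = torch.tensor(-0.5, requires_grad=True)          # bias negative
x = torch.tensor([0.3, 0.7])                        # inputs all positive

pre = x @ w + b            # pre-activation: always negative in this setup
out = torch.relu(pre)      # ReLU clamps it to 0
out.backward()

print(out.item())          # 0.0
print(w.grad, b.grad)      # all zeros: no gradient flows, so the weights never update
```

Because the gradient of ReLU is 0 for negative pre-activations, gradient descent alone can never move such a neuron out of this regime.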
  • Reasons we no longer need to be as careful as before about correct initialization and vanishing/exploding gradients (a sketch combining all three follows this list):
    • Residual connections
    • Normalization layers (BatchNorm, LayerNorm)
    • Better optimizers (Adam, …)
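
A minimal sketch putting the three together, with made-up layer sizes: pre-norm residual blocks stacked deep, LayerNorm inside each block, and Adam as the optimizer:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual block: x + f(norm(x))."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)   # keeps activations well-scaled
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # The identity path gives gradients a direct route around the block.
        return x + self.ff(self.norm(x))

model = nn.Sequential(*[ResidualBlock(64) for _ in range(12)])   # deep stack
opt = torch.optim.Adam(model.parameters(), lr=1e-3)              # adaptive per-parameter steps

x = torch.randn(8, 64)
loss = model(x).pow(2).mean()   # dummy loss just to drive a backward pass
opt.zero_grad(); loss.backward(); opt.step()
# Even with default initialization, gradients reach the earliest blocks through
# the skip connections instead of vanishing through 12 multiplied layers.
```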