Can you explain why we sometimes use RL and other times supervised learning? What’s the fundamental difference between the two, and which one fits which situations?
Common things (sketch below):
in both cases, a model acts given an input
supervised: a model f, acting on input x, outputs a prediction y
RL: a policy p, acting on state s, outputs an action a
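A minimal sketch of that shared shape — the class names and the linear/argmax internals are illustrative placeholders, not any particular library’s API:

```python
import numpy as np

# Supervised model: parameters map input x -> prediction y.
class SupervisedModel:
    def __init__(self, w):
        self.w = w

    def __call__(self, x):
        return self.w @ x  # f(x) = y, here just a linear map

# RL policy: parameters map state s -> action a.
class Policy:
    def __init__(self, w):
        self.w = w

    def __call__(self, s):
        scores = self.w @ s            # score each candidate action
        return int(np.argmax(scores))  # a = argmax over action scores

f = SupervisedModel(np.ones((2, 3)))
p = Policy(np.eye(3))
print(f(np.array([1.0, 2.0, 3.0])))  # [6. 6.]
print(p(np.array([0.1, 0.9, 0.2])))  # 1
```

Structurally the two are the same kind of object; the difference is entirely in how the parameters get trained.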
Differences:
labels:
supervised: each input x has a true label y directly associated with it
RL: for a state s, there is no “true” action a
instead, a reward r is associated with a whole rollout of states and actions, and that reward may arrive immediately or only much later (see the returns sketch below)
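A sketch of the consequence: supervised learning gets a per-example target and an immediate loss, while RL has to turn delayed rewards into a per-step signal. Discounted returns are one common way to do that (function names here are illustrative):

```python
import numpy as np

# Supervised: the target y is available per example, so the loss is immediate.
def supervised_loss(y_pred, y_true):
    return float(np.mean((y_pred - y_true) ** 2))  # e.g. squared error

# RL: no per-step target, only rewards along a rollout. Fold delayed rewards
# into a per-step learning signal via discounted returns.
def discounted_returns(rewards, gamma=0.99):
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # reward now + discounted future
        returns[t] = running
    return returns

# A sparse reward arriving only at the end still credits earlier steps:
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))  # [0.970299 0.9801 0.99 1.]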
Why Deep Learning works now
dead neurons can occur, e.g. if all incoming weights and the bias of a ReLU neuron are negative while its inputs are all positive: the pre-activation is always negative, so the ReLU outputs 0, its gradient is 0, the weights never update, and the neuron never learns (demo below)
see vanishing gradients
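A minimal demonstration of a dead ReLU neuron under exactly those conditions (NumPy; the specific values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A ReLU neuron whose incoming weights and bias are all negative.
w = -np.abs(rng.normal(size=4))
b = -0.5

def relu(z):
    return np.maximum(z, 0.0)

for _ in range(5):
    x = np.abs(rng.normal(size=4))  # inputs are all positive
    z = w @ x + b                   # pre-activation: always negative
    a = relu(z)                     # output: always 0
    grad_w = (z > 0) * x            # d(relu)/dz is 0 whenever z <= 0
    print(a, grad_w)                # 0.0 and an all-zero gradient every time
```

With a zero gradient on every example, no gradient-based update can ever revive the neuron.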
Reasons why we no longer need to be as careful as before about correct inits and vanishing/exploding gradients:
better default initializations (Xavier/Glorot, He) that keep activation scale stable across layers (sketch below)
normalization layers (BatchNorm, LayerNorm) that re-standardize activations
residual/skip connections that give gradients a direct path through deep networks
ReLU-family activations, which saturate less than sigmoid/tanh
adaptive optimizers (e.g. Adam) that rescale per-parameter updates
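A minimal sketch of the initialization point, assuming a plain deep ReLU stack: a fixed small init scale makes activations vanish with depth, while He init (std = sqrt(2/fan_in)) keeps them at a stable scale (NumPy, illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_std(scale_fn, depth=50, width=256):
    """Push a random input through `depth` ReLU layers and return the
    standard deviation of the activations at the last layer."""
    x = rng.normal(size=width)
    for _ in range(depth):
        fan_in = x.shape[0]
        W = rng.normal(size=(width, fan_in)) * scale_fn(fan_in)
        x = np.maximum(W @ x, 0.0)  # ReLU
    return x.std()

naive = forward_std(lambda fan_in: 0.01)                  # fixed small scale
he = forward_std(lambda fan_in: np.sqrt(2.0 / fan_in))    # He init for ReLU

print(f"naive init: {naive:.3e}")  # activations shrink toward 0 with depth
print(f"He init:    {he:.3e}")     # activations stay at a healthy scale
```

The same variance-preservation argument is why modern frameworks ship sensible init defaults, so getting this wrong by hand is much rarer than it used to be.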