Neat tricks

  • Immediately and asynchronously prefetch the next batch while the model is doing the forward pass on the GPU
  • Pinning the batches in host memory inside the prefetching function get_batch allows us to move them to the GPU asynchronously (non_blocking=True) and faster, since pinned (page-locked) memory speeds up host-to-device copies
  • Flush the gradients as soon as we can (i.e. right after optimizer.step()); there is no need to hold that memory any longer
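The three tricks above can be sketched together in a minimal PyTorch loop. This is a hedged illustration, not the original implementation: the toy data, the embedding "model", and the batch/block sizes are all made up, and get_batch here is a stand-in for whatever sampling the real code does.

```python
import torch
import torch.nn.functional as F

def get_batch(data, batch_size, block_size, device):
    # Sample random contiguous windows as inputs x and next-token targets y
    # (hypothetical data layout: a flat 1-D tensor of token ids).
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    if device == "cuda":
        # Pinned host memory makes the H2D copy faster and lets it
        # overlap with GPU compute via non_blocking=True.
        x = x.pin_memory().to(device, non_blocking=True)
        y = y.pin_memory().to(device, non_blocking=True)
    else:
        x, y = x.to(device), y.to(device)
    return x, y

device = "cuda" if torch.cuda.is_available() else "cpu"
vocab_size = 100
data = torch.randint(0, vocab_size, (10_000,))  # fake token stream
model = torch.nn.Embedding(vocab_size, vocab_size).to(device)  # toy "model"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x, y = get_batch(data, 8, 16, device)  # fetch the very first batch
for step in range(3):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    # Kick off the (async) fetch of the next batch while the GPU is
    # still busy with backward below.
    x_next, y_next = get_batch(data, 8, 16, device)
    loss.backward()
    optimizer.step()
    # Flush gradients right after the step; set_to_none frees the memory
    # instead of filling it with zeros.
    optimizer.zero_grad(set_to_none=True)
    x, y = x_next, y_next
```

On a CPU-only machine the pinning branch is skipped, but the loop structure (prefetch next batch mid-step, zero gradients immediately after the step) stays the same.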

Loading extremely large files into memory when there is not enough RAM