Neat tricks
- Immediately (asynchronously) prefetch the next batch while the model is doing the forward pass on the GPU
- Pinning the batches to memory in the prefetching function `get_batch` allows us to move them to the GPU asynchronously (non_blocking=True) and faster (thanks to pinning) - see the sketch after this list
- Flush the gradients as soon as we can (i.e. right after optimizer.step()); no need to hold that memory any longer
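A minimal sketch of these tricks together, in the spirit of the notes above. The toy model, optimizer, hyperparameters, and random `data` array are illustrative stand-ins, not from the original code; only the pin-memory / non_blocking transfer, the prefetch placement, and the gradient flush are the point.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins (assumptions, not from the original notes): toy model, optimizer, random tokens.
vocab_size, block_size, batch_size = 1000, 64, 16
device = "cuda" if torch.cuda.is_available() else "cpu"
data = np.random.randint(0, vocab_size, size=100_000).astype(np.uint16)  # would be a memmap in practice
model = nn.Sequential(nn.Embedding(vocab_size, 128), nn.Linear(128, vocab_size)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def get_batch(data):
    # Sample a batch of (input, target) windows from the flat token stream.
    ix = torch.randint(len(data) - block_size, (batch_size,)).tolist()
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    if device == "cuda":
        # Pin the host memory so the copy to the GPU can be asynchronous (non_blocking=True).
        x = x.pin_memory().to(device, non_blocking=True)
        y = y.pin_memory().to(device, non_blocking=True)
    else:
        x, y = x.to(device), y.to(device)
    return x, y

x, y = get_batch(data)
for step in range(100):
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    # Prefetch the next batch now: the CUDA kernels above are queued asynchronously,
    # so the CPU can prepare data while the GPU is still busy.
    x, y = get_batch(data)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)  # flush gradients right after the step; no need to hold that memory
```

Prefetching right after the forward call works because CUDA launches are asynchronous from the CPU's point of view, so building the next batch overlaps with the GPU's compute.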
Loading extremely large files when there is not enough RAM
- Use `numpy.memmap` to read a file on disk and treat it as if it were in RAM. You just need to chunk the data beforehand and write it into a binary file or .txt (see the sketch after this list)
- Pinned memory helps with data transfer times when calling x.cuda() (https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/)
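A minimal sketch of the memmap approach. The file name, dtype, and random token ids are illustrative assumptions; in practice the .bin file would be written once by a preprocessing script that tokenizes the dataset in chunks.

```python
import numpy as np

# Preprocessing (done once): write token ids to a flat binary file.
# Here the ids are random stand-ins; a real script would append chunk by chunk.
tokens = np.random.randint(0, 50_000, size=1_000_000).astype(np.uint16)
tokens.tofile("train.bin")

# Training time: memory-map the file. It behaves like a normal numpy array,
# but pages are read from disk on demand instead of loading everything into RAM.
data = np.memmap("train.bin", dtype=np.uint16, mode="r")
print(data.shape, data[:10])  # indexing and slicing work as usual
```

A `get_batch`-style function like the one sketched earlier can consume this memmapped array directly, since only the sampled slices are ever materialized in memory.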