TL;DR

This post walks through transformer-based architectures applied to a variety of novel tasks:

  • Transformer for End-to-End Object Detection - 🔗 Zhu et al. (2021)
  • Transformer for 3D Object Detection - 🔗 Bhattacharyya et al. (2021)
  • Transformer for Multi-Object Tracking - 🔗 Sun et al. (2020)
  • Transformer for Lane Shape Prediction - 🔗 Liu et al. (2020)
  • Transformer for Vision-Language Modeling - 🔗 Zhang et al. (2021)
  • Transformer for Image Synthesis - 🔗 Esser et al. (2020)
  • Transformer for Music Generation - 🔗 Hsiao et al. (2021)
  • Transformer for Dance Generation with Music - 🔗 Huang et al. (2021)
  • Transformer for Point-Cloud Processing - 🔗 Guo et al. (2020)
  • Transformer for Time-Series Forecasting - 🔗 Lim et al. (2020)

When training deep learning models, I/O storage capacity and transfer bandwidth are usually the bottleneck. HDF5 is efficient when the entire dataset can be loaded into memory for training, but it is limited by system memory capacity, typically hundreds of GB at most. Memory-mapped (mmap) storage, on the other hand, allows data access beyond the system memory constraint. One popular implementation is LMDB, which provides numerous language bindings, including Python. It is therefore tempting to replace HDF5 with LMDB for loading and accessing very large datasets.

When running locally, potential memory allocation issues may not surface, because modern systems fall back to disk swap space when a process consumes more than the available memory. However, training deep models can take days or longer, so it is common to submit the training job to a high-performance computing cluster managed by a scheduler such as SLURM. A SLURM job must declare its memory usage up front, and at runtime the scheduler monitors usage via the Resident Set Size (RSS) statistic. Unfortunately, the same training process can then hit out-of-memory errors on SLURM: swap space may not be available to SLURM tasks, and mmap-based LMDB access can grow the RSS over time until it exceeds the declared limit.
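For context, the growth can be reproduced outside of a training job with a short standalone script. The sketch below only illustrates the mechanism and is not code from this post: the path `dataset.lmdb`, the helper names, and the use of `psutil` to read RSS are all assumptions. It opens a read-only LMDB environment, sweeps over all records a few times, and prints the process RSS after each pass; because the file is memory mapped, RSS climbs toward the dataset size.

```python
import os

import lmdb    # pip install lmdb
import psutil  # pip install psutil


def rss_mb() -> float:
    """Return the Resident Set Size (RSS) of the current process in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / 2**20


def sweep_lmdb(path: str, epochs: int = 3) -> None:
    """Read every record once per epoch and report RSS after each pass.

    LMDB maps the database file into the process address space, so every
    page touched while iterating stays resident until the OS reclaims it,
    and RSS keeps growing toward the total dataset size.
    """
    env = lmdb.open(path, readonly=True, lock=False, readahead=False)
    with env.begin(write=False) as txn:
        for epoch in range(epochs):
            for _key, _value in txn.cursor():
                pass  # a real data loader would decode _value into a sample here
            print(f"epoch {epoch}: RSS = {rss_mb():.1f} MB")
    env.close()


if __name__ == "__main__":
    # "dataset.lmdb" is a placeholder path, not the dataset used in this post.
    sweep_lmdb("dataset.lmdb")
```

The following pytest snippet demonstrates the same effect in practice: the task RSS keeps increasing over epochs as LMDB accesses a huge dataset: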