Common benchmarks like ResNet-50 generally have much higher throughput with large batch sizes than with batch size =1. For example, the Nvidia Tesla T4 has 4x the throughput at batch=32 than when it ...
Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3×3 convolutions across megapixel images ...