Abstract: Stochastic gradient descent has been a widely used machine learning algorithm, and interest in low-precision SGD is growing because it improves throughput and keeps efficient convergence. We ...