Abstract: 3D convolution networks play an important role in extracting spatiotemporal features in video action recognition. However, it usually brings a large number of paramters, which results in ...