# Spatial Transformer Networks

@inproceedings{Jaderberg2015SpatialTN, title={Spatial Transformer Networks}, author={Max Jaderberg and Karen Simonyan and Andrew Zisserman and Koray Kavukcuoglu}, booktitle={NIPS}, year={2015} }

Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter-efficient manner. [...] This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the…
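The module described above combines a localization network (which predicts transformation parameters), a grid generator, and a differentiable sampler. As a rough illustration only (not the authors' code; all names are made up), the grid-generation and bilinear-sampling steps can be sketched in NumPy, with the localization network that would predict `theta` omitted:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Map each output pixel to an input location via a 2x3 affine
    matrix theta, in coordinates normalised to [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    return (theta @ coords).reshape(2, H, W)                     # (x, y) rows

def bilinear_sample(img, grid):
    """Differentiably sample img (H, W) at the locations in grid (2, H, W)."""
    H, W = img.shape
    x = (grid[0] + 1) * (W - 1) / 2          # back to pixel coordinates
    y = (grid[1] + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    # Clip indices so out-of-range samples repeat the border pixel.
    x0c, x1c = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0c, y1c = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    wa, wb = (x1 - x) * (y1 - y), (x1 - x) * (y - y0)
    wc, wd = (x - x0) * (y1 - y), (x - x0) * (y - y0)
    return (wa * img[y0c, x0c] + wb * img[y1c, x0c]
            + wc * img[y0c, x1c] + wd * img[y1c, x1c])

# The identity transform reproduces the input exactly.
img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
out = bilinear_sample(img, affine_grid(identity, 4, 4))
```

Because both steps are built from differentiable operations, gradients can flow through the sampler back to the predicted `theta`, which is what lets the module train without extra supervision.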

#### Supplemental Content

- GitHub repo (via Papers with Code): a neural net training interface on TensorFlow, with a focus on speed and flexibility
- Presentation slides

#### 4,142 Citations

Spatial Transformations in Deep Neural Networks

- Computer Science
- 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)
- 2018

This paper introduces an end-to-end system that is able to learn spatial invariance, including in-plane and out-of-plane rotations, and shows that the classification score can be improved by incorporating the so-called Spatial Transformer module.

Deep Diffeomorphic Transformer Networks

- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018

This work investigates the use of flexible diffeomorphic image transformations within neural networks and demonstrates that significant performance gains can be attained over currently-used models.

A Refined Spatial Transformer Network

- Computer Science
- ICONIP
- 2018

Experimental results reveal that a module designed to estimate the difference between the ground truth and the STN output outperforms state-of-the-art methods on the cluttered MNIST handwritten-digit classification task and a planar image-alignment task.

Studying Invariances of Trained Convolutional Neural Networks

- Computer Science
- ArXiv
- 2018

A new learnable module, the Invariant Transformer Net, is introduced, which learns differentiable parameters for a set of affine transformations; this makes it possible to extract the space of transformations to which the CNN is invariant and its class predictions robust.

Volumetric Transformer Networks

- Computer Science
- ECCV
- 2020

This work proposes a loss function defined between the warped features of pairs of instances, which improves the localization ability of the VTN and consistently boosts the features' representation power, and consequently the network's accuracy, on fine-grained image recognition and instance-level image retrieval.

Spatial Transformations

- 2017

Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent translation-invariance of natural images. However, translation is just one of a myriad of useful spatial…

DeSTNet: Densely Fused Spatial Transformer Networks

- Computer Science
- BMVC
- 2018

This paper proposes the Densely Fused Spatial Transformer Network (DeSTNet), which, to the best of the authors' knowledge, is the first dense fusion pattern for combining multiple STNs, and shows how changing the connectivity pattern of multiple STNs from sequential to dense leads to more powerful alignment modules.


Exploiting Cyclic Symmetry in Convolutional Neural Networks

- Computer Science
- ICML
- 2016

This work introduces four operations that can be inserted into neural network models as layers and combined to make these models partially equivariant to rotations, enabling parameter sharing across different orientations.
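The four operations in that paper are cyclic slicing, pooling, rolling, and stacking. As a toy sketch of the slice-and-pool idea (illustrative only, not the authors' code), one shared filter can be applied to the four 90° rotations of the input and the responses max-pooled over orientations, yielding a feature map that is equivariant to 90° rotations:

```python
import numpy as np

def conv2d_valid(img, k):
    """Naive valid-mode 2-D cross-correlation with a single filter."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def cyclic_slice(img):
    """'Slice': the four 90-degree rotations of the input."""
    return [np.rot90(img, r) for r in range(4)]

def cyclic_pool(maps):
    """'Pool': undo each rotation, then max-pool over orientations."""
    aligned = [np.rot90(m, -r) for r, m in enumerate(maps)]
    return np.max(aligned, axis=0)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
k = rng.random((3, 3))
# One shared filter is applied to all four orientations, then pooled;
# rotating the input by 90 degrees just rotates the pooled output.
feat = cyclic_pool([conv2d_valid(x, k) for x in cyclic_slice(img)])
```

The parameter sharing comes from using the same `k` for every orientation: the filter bank covers four orientations at the cost of one set of weights.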

Warped Convolutions: Efficient Invariance to Spatial Transformations

- Computer Science
- ICML
- 2017

This work presents a construction that is simple and exact, yet has the same computational cost that standard convolutions enjoy: a constant image warp followed by a simple convolution, both of which are standard blocks in deep-learning toolboxes.
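The construction can be illustrated with a log-polar warp, under which rotations and scalings about the image centre become translations, so an ordinary convolution applied to the warped image is equivariant to them. A rough sketch, using nearest-neighbour sampling for brevity (the names and sampling choice here are illustrative, not the paper's implementation):

```python
import numpy as np

def logpolar_warp(img, out_h=32, out_w=32):
    """Fixed log-polar warp of a square, odd-sized image: rotations and
    scalings about the centre become translations along the angle and
    log-radius axes of the output."""
    H, W = img.shape
    cy, cx = (H - 1) / 2, (W - 1) / 2
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    radii = np.exp(np.linspace(0, np.log(min(cy, cx)), out_h))
    y = np.round(cy + radii[:, None] * np.sin(thetas)).astype(int)
    x = np.round(cx + radii[:, None] * np.cos(thetas)).astype(int)
    return img[np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)]

rng = np.random.default_rng(0)
img = rng.random((33, 33))
warped = logpolar_warp(img)
# Rotating the input by 90 degrees circularly shifts the warped image by
# a quarter of its width along the angle axis, so a standard convolution
# on `warped` sees rotation as mere translation.
warped_rot = logpolar_warp(np.rot90(img))
```

Since the warp is constant, it can be precomputed once, which is why the whole construction costs no more than a standard convolution.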

#### References

Showing 1–10 of 46 references

Transforming Auto-Encoders

- Computer Science, Mathematics
- ICANN
- 2011

It is argued that neural networks can be used to learn features that output a whole vector of instantiation parameters, and that this is a much more promising way of dealing with variations in position, orientation, scale, and lighting than the methods currently employed in the neural networks community.

Deep Symmetry Networks

- Computer Science, Mathematics
- NIPS
- 2014

Deep symmetry networks (symnets) are introduced: a generalization of convnets that forms feature maps over arbitrary symmetry groups and uses kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension.

Locally Scale-Invariant Convolutional Neural Networks

- Computer Science
- ArXiv
- 2014

A simple model is presented that allows ConvNets to learn features in a locally scale-invariant manner without increasing the number of model parameters; on a modified MNIST dataset, it is shown that when faced with scale variation, building in scale invariance allows the ConvNet to learn more discriminative features with a reduced chance of over-fitting.

Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks

- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015

An approach is presented that is able to learn part models in a completely unsupervised manner, without part annotations and even without bounding boxes during learning, by finding constellations of neural activation patterns computed using convolutional neural networks.

Very Deep Convolutional Networks for Large-Scale Image Recognition

- Computer Science
- ICLR
- 2015

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Deep Networks with Internal Selective Attention through Feedback Connections

- Computer Science
- NIPS
- 2014

DasNet harnesses the power of sequential processing to improve classification performance, by allowing the network to iteratively focus its internal attention on some of its convolutional filters.

Going deeper with convolutions

- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…

Efficient and accurate approximations of nonlinear convolutional networks

- Computer Science, Mathematics
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs), and takes the nonlinear units into account, subject to a low-rank constraint which helps to reduce the complexity of filters.

Bilinear CNN Models for Fine-Grained Visual Recognition

- Computer Science
- 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015

We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an…
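The outer-product-and-pool step is compact enough to sketch directly. Assuming two feature maps of shape (Ca, H, W) and (Cb, H, W) standing in for the two extractors' outputs (names illustrative), the pooled bilinear descriptor is:

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Outer product of two feature maps at every spatial location,
    summed (pooled) over locations: (Ca, H, W), (Cb, H, W) -> (Ca, Cb)."""
    return np.einsum('ahw,bhw->ab', fa, fb)

rng = np.random.default_rng(0)
fa = rng.random((4, 7, 7))   # stand-ins for the two CNN feature extractors
fb = rng.random((6, 7, 7))
desc = bilinear_pool(fa, fb)            # (4, 6) bilinear image descriptor
# Post-processing used in the paper: signed square root, then L2 norm.
v = np.sign(desc) * np.sqrt(np.abs(desc))
v = v / np.linalg.norm(v)
```

Because the pooling sums over all locations, the descriptor is orderless, which is what makes the representation well suited to fine-grained recognition where part positions vary.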

Two-Stream Convolutional Networks for Action Recognition in Videos

- Computer Science
- NIPS
- 2014

This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.