How one might learn structured models of objects and their parts from images is a fundamental, long-standing question in computer vision and visual perception. It is particularly relevant to unsupervised learning, where the aim is to discover parts and build structured object descriptions directly from images. We observe that objects are composed of a set of geometrically organized parts. Building on this observation, we introduce capsule networks, a neural network architecture that explicitly uses geometric relationships between parts to reason about objects. Capsule networks are designed to parse an image into a hierarchy of objects, parts and relations. Since these part-whole relationships do not depend on viewpoint, the model gains robustness to viewpoint changes. A capsule network forms capsules by grouping the neurons in each layer, with the goal that the neurons of a single capsule coherently explain a single part. By measuring agreement between these capsules and clustering them, the network then infers the objects and the hierarchy between them. We have proposed several variants of capsule networks with different clustering and image parsing mechanisms. In this talk, we will discuss the motivation and the underlying idea, and then focus on an unsupervised capsule model that achieves state-of-the-art results on several tasks.
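The idea of measuring agreement between capsules and clustering them can be illustrated with a minimal sketch of routing by agreement. It follows the style of the dynamic routing procedure from "Dynamic Routing Between Capsules" (Sabour, Frosst and Hinton, 2017). It is not the unsupervised model this talk focuses on. The function names `route` and `squash`, the toy prediction vectors and the choice of three routing iterations are illustrative assumptions. Each lower-level capsule casts a vector "vote" for each higher-level capsule. Votes that agree with the emerging output are given more weight over successive iterations, so the lower-level capsules cluster around the higher-level capsule they jointly explain.

```python
import math

# Illustrative sketch of routing by agreement (dynamic-routing style);
# names and parameters here are assumptions, not the speaker's code.


def squash(s):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(u_hat, iterations=3):
    """Route predictions from lower capsules to higher capsules.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    higher capsule j. Returns the higher-capsule outputs v[j] and the
    coupling coefficients c[i][j] used to compute them.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits, start uniform
    for _ in range(iterations):
        # Each lower capsule splits its vote across the higher capsules.
        c = [softmax(b_i) for b_i in b]
        v = []
        for j in range(n_upper):
            # Weighted sum of the votes for capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a vote agrees with the resulting output.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += dot(u_hat[i][j], v[j])
    return v, c


# Toy example: three lower capsules vote for two higher capsules.
# Their votes for capsule 0 roughly agree; their votes for capsule 1 do not.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.0, -1.0]],
    [[0.9, 0.0], [1.0, 0.0]],
]
v, c = route(u_hat)
print([round(math.sqrt(sum(x * x for x in v_j)), 3) for v_j in v])
```

In this toy run, the output for higher capsule 0 should come out longer than the output for capsule 1, because its incoming votes agree. The length of an output vector therefore acts as a rough signal that the corresponding object is present.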
Sara Sabour is a research scientist at Google Brain. She did her graduate studies under the supervision of Prof. Fleet at the University of Toronto. Prior to that, she graduated from Sharif University of Technology. Her research interests focus on machine learning and computer science. Most notably, she and Prof. Hinton have been working on a new generation of neural networks called capsule networks. Their work has been covered by several media outlets, including The New York Times.