Mehdi Noroozi

Bosch Center for Artificial Intelligence

Abstract

The complexity of an information processing task is intimately tied to the way we represent the information. One can think of feedforward neural networks as representation learning frameworks. Specifically, the role of the early layers is to provide the last layers with a representation space in which the training task is easy to solve. It turns out that the representations learned via an object classification task on large-scale annotated datasets are applicable to a variety of computer vision tasks such as object detection, visual question answering (VQA), etc. However, the annotations required for training are costly, time-consuming, and prone to error. To mitigate this problem, we can learn similar representations in a self-supervised learning setting. The idea is to train a network on pseudo tasks that employ freely available supervision signals obtained from the underlying structure of the data. Starting in 2015, the hope that self-supervised learning would outperform supervised learning was so strong that one of the brilliant computer vision researchers lost a gelato bet. As a scientist, he did not make a random bet, though; he just needed to give the community 4 more years. ImageNet classification pre-training was outperformed in 2020 by several methods on some challenging benchmarks. In this talk, we will review the main directions explored in self-supervised image and video representation learning over the past 5 years, and will discuss the recent contrastive learning based methods extensively.

Bio

Mehdi is a research scientist at the Bosch Center for Artificial Intelligence. He is interested in computer vision and machine learning in general. More specifically, he likes approaches that use large-scale unlabeled datasets to learn computer vision tasks. He received his PhD from the University of Bern. His thesis was about self-supervised visual representation learning, where the goal is to make computers understand the visual world from unlabeled data, without human annotations. He did his Master's in Computer Science at Sharif University of Technology.