PhD Student at the Visual Computing & AI Lab, Technical University of Munich, advised by Prof. Matthias Nießner.
I research feed-forward neural systems for 3D, currently focused on 4D face reconstruction and dense tracking from uncalibrated image sequences. The goal is to amortize 3D understanding into a single forward pass: fast, generalizable, and free of per-scene optimization.
Earlier, I worked on generative modeling and latent-space manipulation in GANs: interpretable direction discovery, text-driven editing, and structured ways to control image generation.
Before joining the Visual Computing & AI Lab, I completed my Bachelor's in Computer Engineering at Boğaziçi University and my Master's in Informatics at the Technical University of Munich.
The threads I've worked on, past and present.
Recovering temporally consistent face geometry and dense point correspondences from in-the-wild image sequences. The focus of Face Anything.
Networks that predict 3D representations (meshes, point clouds, Gaussians, neural fields) directly from images, with no per-scene optimization. Generalization is the goal; speed is the bonus.
Earlier work on text-driven image generation and interpretable directions in GAN latent spaces, bridging editability with structured representations.
Selected papers below; see Google Scholar for the complete list.
A transformer-based model that, in a single feed-forward pass, jointly predicts depth and canonical facial coordinates, enabling temporally consistent 4D reconstruction and dense tracking of arbitrary image sequences without per-frame optimization. The canonical-coordinate representation reframes tracking as a coordinate-prediction problem, so dense correspondences fall out of a single nearest-neighbor lookup in canonical space. The method achieves ~3× lower correspondence error than prior dynamic reconstruction methods (V-DPM, Pixel3DMM) and 16% better depth estimates than face-specific baselines (DAViD, Sapiens) on NeRSemble and VFHQ.
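The canonical-coordinate view makes dense matching almost trivial. Below is a minimal sketch, assuming a hypothetical model that outputs per-pixel canonical facial coordinates of shape (H, W, 3) for each frame; correspondences between two frames then reduce to a nearest-neighbor query in canonical space (an illustration, not the released code).

```python
import numpy as np
from scipy.spatial import cKDTree

def dense_correspondences(canon_a: np.ndarray, canon_b: np.ndarray):
    """Match each pixel in frame A to the pixel in frame B whose predicted
    canonical coordinate is closest. Inputs are (H, W, 3) canonical maps."""
    h, w, _ = canon_a.shape
    pts_a = canon_a.reshape(-1, 3)        # queries: canonical coords of frame A
    pts_b = canon_b.reshape(-1, 3)        # database: canonical coords of frame B
    tree = cKDTree(pts_b)                 # one-off KD-tree over frame B
    dist, idx = tree.query(pts_a, k=1)    # nearest neighbor per query pixel
    rows, cols = np.divmod(idx, w)        # flat index back to (row, col) in frame B
    return np.stack([rows, cols], axis=-1).reshape(h, w, 2), dist.reshape(h, w)

# Toy usage with random "canonical maps" standing in for model predictions.
a = np.random.rand(64, 64, 3).astype(np.float32)
b = np.random.rand(64, 64, 3).astype(np.float32)
matches, err = dense_correspondences(a, b)
```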
Existing NeRF-based 3D style transfer methods need extensive per-scene or per-style optimization, which limits how broadly they can be applied. G3DST renders stylized novel views from a generalizable NeRF without any test-time fitting, sharing a single learned model across scenes and styles via a hypernetwork. We additionally introduce a flow-based multi-view consistency loss that keeps stylization stable as the camera moves. Visual quality matches per-scene methods while being dramatically faster and more broadly applicable.
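As a rough illustration of the consistency idea (not the paper's exact formulation), a flow-based term can warp the stylized render of one view into another using precomputed optical flow and penalize the difference where the flow is valid; the tensor shapes and flow convention below are assumptions.

```python
import torch
import torch.nn.functional as F

def multiview_consistency_loss(stylized_i, stylized_j, flow_ij, valid_mask):
    """stylized_*: (B, 3, H, W) stylized renders of two views; flow_ij: (B, 2, H, W)
    pixel offsets mapping view-i coordinates into view j; valid_mask: (B, 1, H, W)."""
    b, _, h, w = stylized_i.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().to(stylized_i.device)   # (2, H, W)
    coords = base.unsqueeze(0) + flow_ij                                # (B, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack([x, y], dim=-1)                                  # (B, H, W, 2)
    warped_j = F.grid_sample(stylized_j, grid, align_corners=True)
    # Penalize color differences only where the flow is valid (e.g. not occluded).
    return (valid_mask * (stylized_i - warped_j).abs()).mean()
```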
Existing approaches to discovering interpretable editing directions in StyleGAN2 (supervised, unsupervised, or manual) typically surface only a handful of usable directions. We design a submodular framework that selects the most representative and diverse subset of directions inside StyleGAN2's channel-wise style space, clustering channels that perform similar manipulations into groups. Diversity is encoded directly through cluster structure and the whole problem is solved efficiently with a greedy scheme. Quantitative and qualitative experiments show the method finds more diverse and disentangled directions than prior work.
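The selection step is ordinary greedy submodular maximization. Here is a toy facility-location-style sketch over a hypothetical precomputed similarity matrix between candidate directions; the paper's actual objective and channel clustering differ in detail.

```python
import numpy as np

def greedy_select(similarity: np.ndarray, k: int):
    """Pick k candidates so that every candidate is well covered by its most
    similar selected one (facility-location objective, greedy maximization)."""
    n = similarity.shape[0]
    selected, coverage = [], np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each remaining candidate to the selection.
        gains = np.maximum(similarity, coverage[None, :]).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, similarity[best])
    return selected

# Toy usage: symmetric similarities between 100 candidate directions.
sim = np.random.rand(100, 100)
sim = (sim + sim.T) / 2
picks = greedy_select(sim, k=10)
```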
Discovering meaningful directions in GAN latent spaces usually requires either large labeled datasets or hours of preprocessing per manipulation. StyleMC is a fast text-driven image manipulation method that combines a CLIP-based loss with an identity-preservation loss to find stable global directions in StyleGAN2, taking only a few seconds of training per text prompt. There is no need for prompt engineering or per-prompt tuning, and the approach drops into any pretrained StyleGAN2 model. We outperform slower CLIP-based baselines while leaving non-target attributes intact.
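Schematically, the optimization boils down to a short loop over a handful of latents. The helpers below (`generator`, `clip_loss`, `identity_loss`) are hypothetical stand-ins for StyleGAN2, a CLIP-based text loss, and an identity network, and the hyperparameters are illustrative rather than the paper's.

```python
import torch

def find_direction(generator, clip_loss, identity_loss, w_samples,
                   dim=512, steps=50, strength=5.0, lambda_id=0.5, lr=0.1):
    """Optimize one global direction so edited images match the text prompt
    (via clip_loss) while preserving identity (via identity_loss)."""
    direction = torch.zeros(dim, requires_grad=True)
    opt = torch.optim.Adam([direction], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for w in w_samples:                              # a few sampled latents
            img_orig = generator(w)
            img_edit = generator(w + strength * direction)
            loss = loss + clip_loss(img_edit) + lambda_id * identity_loss(img_orig, img_edit)
        loss.backward()
        opt.step()
    return direction.detach()
```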
Recent CLIP-based StyleGAN editing methods such as StyleCLIP work well, but their success often hinges on careful manual prompt engineering. We propose an automatic ranking-based approach that scores candidate text-driven directions by combining two metrics: relevance (CLIP-space similarity to a target keyword) and editability (the magnitude of induced change). Given a pretrained StyleGAN model and a list of keywords, the framework outputs optimized prompts that reliably produce semantic edits without hand-tuning. Human evaluation finds it more semantically meaningful and disentangled than unsupervised baselines like SeFa and GANSpace.
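A simplified scoring sketch of the two metrics, assuming precomputed, L2-normalized CLIP embeddings; the equal weighting is illustrative, not the paper's.

```python
import numpy as np

def score_prompt(prompt_emb, keyword_emb, img_embs_before, img_embs_after, alpha=0.5):
    """Combine relevance (CLIP similarity between candidate prompt and target
    keyword) with editability (average change induced on image embeddings)."""
    relevance = float(prompt_emb @ keyword_emb)
    editability = float(np.mean(np.linalg.norm(
        np.asarray(img_embs_after) - np.asarray(img_embs_before), axis=-1)))
    return alpha * relevance + (1.0 - alpha) * editability
```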
Most methods for discovering interpretable directions in GAN latent spaces require many labeled samples, classifiers, or careful annotation. PaintInStyle finds a reusable manipulation direction from a single painted edit, for example sketching a beard or painting red lipstick on one image. The method inverts both the original and edited image, isolates channels that affect only the painted region using segmentation, and averages across random samples to reduce noise. The result is a one-shot, annotation-free pipeline for directions that generalize to any image.
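In pseudocode, the core of the pipeline looks roughly like this; `invert` and `channel_region` are hypothetical helpers standing in for a GAN inversion method and a per-channel, segmentation-based attribution, and the sketch omits the averaging over random samples described above.

```python
import numpy as np

def direction_from_painted_edit(original, painted, invert, channel_region, target_region):
    """Return a reusable style-space direction from a single painted edit."""
    s_orig = invert(original)                 # style code of the original image
    s_edit = invert(painted)                  # style code of the painted edit
    delta = s_edit - s_orig                   # raw edit in style space
    # Keep only channels whose effect is localized to the painted region.
    keep = (channel_region(s_orig) == target_region)
    return delta * keep                       # one-shot manipulation direction
```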
Most prior work on interpretable GAN directions targets concrete semantic attributes (age, gender, expression, and so on). We instead study a more abstract property: creativity. Can an image be steered to look more or less creative? Building on Artbreeder, the largest crowd-sourced AI image platform, we explore the latent dimensions of images generated there and present a framework for manipulating creativity as a direction in latent space.
Always happy to chat about feed-forward 3D, collaborations, or internships. Email is the fastest way.