PhD Student · Technical University of Munich

Hi, I'm Umut Kocasarı.

PhD Student at the Visual Computing & AI Lab, Technical University of Munich, advised by Prof. Matthias Nießner.

About

A short version.

I research feed-forward neural systems for 3D, currently focused on 4D face reconstruction and dense tracking from uncalibrated image sequences. The goal is to amortize 3D understanding into a single forward pass: fast, generalizable, and free of per-scene optimization.

Earlier, I worked on generative modeling and latent-space manipulation in GANs: discovering interpretable directions, text-driven editing, and structured ways to control image generation.

Before joining the Visual Computing & AI Lab, I completed my Bachelor's in Computer Engineering at Boğaziçi University and my Master's in Informatics at the Technical University of Munich.

  • 4D Face Reconstruction
  • Feed-forward 3D
  • Neural Rendering
  • Generative Modeling
Research

What I'm working on.

The threads I've worked on, past and present.

4D Face Reconstruction

Recovering temporally consistent face geometry and dense point correspondences from in-the-wild image sequences. This is the focus of Face Anything.

Feed-forward 3D

Networks that predict 3D representations (meshes, point clouds, Gaussians, neural fields) directly from images, with no per-scene optimization. Generalization is the goal; speed is the bonus.

Generative Modeling

Earlier work on text-driven image generation and interpretable directions in GAN latent spaces, bridging editability with structured representations.

Selected work

Publications.

Selected papers below; see Google Scholar for the complete list.

  1. GCPR 2024

    G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles

    Adil Meric, Umut Kocasarı, Matthias Nießner, Barbara Roessle

    Existing NeRF-based 3D style transfer methods need extensive per-scene or per-style optimization, which limits how broadly they can be applied. G3DST renders stylized novel views from a generalizable NeRF without any test-time fitting, sharing a single learned model across scenes and styles via a hypernetwork. We additionally introduce a flow-based multi-view consistency loss that keeps stylization stable as the camera moves. Visual quality matches per-scene methods while being dramatically faster and more broadly applicable.

  2. WACV 2023

    Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs

    Enis Simsar, Umut Kocasarı, Ezgi Gulperi Er, Pinar Yanardag

Existing approaches to discovering interpretable editing directions in StyleGAN2 (supervised, unsupervised, or manual) typically surface only a handful of usable directions. We design a submodular framework that selects the most representative and diverse subset of directions inside StyleGAN2's channel-wise style space, clustering channels that perform similar manipulations into groups. Diversity is encoded directly through the cluster structure, and the whole problem is solved efficiently with a greedy scheme. Quantitative and qualitative experiments show the method finds more diverse and disentangled directions than prior work.

  3. WACV 2022

    StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation

    Umut Kocasarı, Alara Dirik, Mert Tiftikci, Pinar Yanardag

    Discovering meaningful directions in GAN latent spaces usually requires either large labeled datasets or hours of preprocessing per manipulation. StyleMC is a fast text-driven image manipulation method that combines a CLIP-based loss with an identity-preservation loss to find stable global directions in StyleGAN2, taking only a few seconds of training per text prompt. There is no need for prompt engineering or per-prompt tuning, and the approach drops into any pretrained StyleGAN2 model. We outperform slower CLIP-based baselines while leaving non-target attributes intact.

  4. CVPR-W 2022

    Rank in Style: A Ranking-based Approach to Find Interpretable Directions

    Umut Kocasarı, Kerem Zaman, Mert Tiftikci, Enis Simsar, Pinar Yanardag

    Recent CLIP-based StyleGAN editing methods such as StyleCLIP work well, but their success often hinges on careful manual prompt engineering. We propose an automatic ranking-based approach that scores candidate text-driven directions by combining two metrics: relevance (CLIP-space similarity to a target keyword) and editability (the magnitude of induced change). Given a pretrained StyleGAN model and a list of keywords, the framework outputs optimized prompts that reliably produce semantic edits without hand-tuning. Human evaluation finds it more semantically meaningful and disentangled than unsupervised baselines like SeFa and GANSpace.

  5. CVPR-W 2022

    PaintInStyle: One-Shot Discovery of Interpretable Directions by Painting

    Berkay Doner, Elif Sema Balcioglu, Merve Rabia Barin, Umut Kocasarı, Mert Tiftikci, Pinar Yanardag

    Most methods for discovering interpretable directions in GAN latent spaces require many labeled samples, classifiers, or careful annotation. PaintInStyle finds a reusable manipulation direction from a single painted edit, for example sketching a beard or painting red lipstick on one image. The method inverts both the original and edited image, isolates channels that affect only the painted region using segmentation, and averages across random samples to reduce noise. The result is a one-shot, annotation-free pipeline for directions that generalize to any image.

  6. NeurIPS-W 2021

    Exploring Latent Dimensions of Crowd-sourced Creativity

    Umut Kocasarı, Alperen Bag, Efehan Atici, Pinar Yanardag

    Most prior work on interpretable GAN directions targets concrete semantic attributes (age, gender, expression, and so on). We instead study a more abstract property: creativity. Can an image be steered to look more or less creative? Building on Artbreeder, the largest crowd-sourced AI image platform, we explore the latent dimensions of images generated there and present a framework for manipulating creativity as a direction in latent space.

Contact

Let's talk.

Always happy to chat about feed-forward 3D, collaborations, or internships. Email is the fastest way.