Mask-Guided Discovery of Semantic Manifolds in Generative Models
Mengyu Yang, David Rokeby, Xavier Snelgrove: 2020-10-09
Accepted to the Workshop on Machine Learning for Creativity and Design at NeurIPS 2020.
Advances in Generative Adversarial Networks (GANs) have led to architectures capable of producing remarkably realistic images. StyleGAN2, for example, when trained on the FFHQ dataset, generates images of human faces from random vectors in a lower-dimensional latent space. Unfortunately, this space is entangled: translating a latent vector along its axes does not correspond to a meaningful transformation in the output space (e.g., a smiling mouth or squinting eyes). The model behaves as a black box, providing neither control over its output nor insight into the structures it has learned from the data.
However, the smoothness of the mapping from latent vectors to faces, together with empirical evidence, suggests that manifolds of meaningful transformations are in fact hidden inside the latent space, obscured because they are not axis-aligned, or even linear. Travelling along these manifolds would provide puppetry-like control over faces, while studying their geometry would provide insight into the nature of the facial variation present in the dataset, revealing and quantifying the degrees of freedom of eyes, mouths, etc.
We present a method to explore the manifolds of change of spatially localized regions of the face. Our method discovers smoothly varying sequences of latent vectors along these manifolds that are suitable for creating animations. Unlike existing disentanglement methods, which either require labelled data or explicitly alter internal model parameters, our method is an optimization-based approach guided by a custom loss function and a manually defined region of change.
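The abstract does not spell out the loss, so the following is only a minimal sketch of the general idea under stated assumptions: a toy linear "generator" (a random matrix `G` standing in for StyleGAN2), a hypothetical loss that penalizes output change outside a binary mask while pulling the energy of the change inside the mask toward a target `tau`, and plain gradient descent on a latent step `d`. The names `masked_change_loss` and `find_masked_direction` and all sizes are illustrative, not the authors' method.

```python
import numpy as np

def masked_change_loss(G, d, mask, tau=1.0, lam=1.0):
    """Hypothetical mask-guided loss for a linear toy generator x = G @ z.

    Penalizes any output change outside the mask and pulls the energy of the
    change inside the mask toward a target tau. Returns (loss, grad, outside, inside).
    """
    m = mask.ravel().astype(float)          # 1 inside the region of change, 0 outside
    r = G @ d                               # output change caused by latent step d
    outside = np.sum(((1 - m) * r) ** 2)
    inside = np.sum((m * r) ** 2)
    loss = outside + lam * (inside - tau) ** 2
    # Chain rule through r = G @ d (m is binary, so m**2 == m).
    grad_r = 2 * (1 - m) * r + 4 * lam * (inside - tau) * m * r
    return loss, G.T @ grad_r, outside, inside

def find_masked_direction(G, mask, steps=5000, lr=0.05, seed=0):
    """Plain gradient descent on the loss above, from a small random latent step."""
    rng = np.random.default_rng(seed)
    d = 0.1 * rng.standard_normal(G.shape[1])
    for _ in range(steps):
        _, grad, _, _ = masked_change_loss(G, d, mask)
        d -= lr * grad
    return d

# Toy setup (all sizes are arbitrary assumptions): 8x8 "images", 256-D latents.
rng = np.random.default_rng(42)
height, width, latent_dim = 8, 8, 256
G = rng.standard_normal((height * width, latent_dim)) / np.sqrt(latent_dim)
mask = np.zeros((height, width), dtype=bool)
mask[2:6, 2:6] = True                       # manually defined region of change

z0 = rng.standard_normal(latent_dim)
d = find_masked_direction(G, mask)
frames = [z0 + t * d for t in np.linspace(0.0, 1.0, 8)]   # latent sequence for an animation
_, _, out_e, in_e = masked_change_loss(G, d, mask)
print(f"change outside mask: {out_e:.2e}, inside mask: {in_e:.3f}")
```

Because the toy generator is linear, scaling the step gives a straight-line sequence of latents. With a real, nonlinear generator one would presumably re-optimize from each new latent (warm-starting from the previous step) so that the sequence can follow a curved manifold.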
Reaching Through the Screen: Exploring real-time video processing to allow for multi-person interaction within Zoom video conferencing sessions
David Rokeby: 2020-06-12
The COVID-19 lockdown has forced most of our contact with people outside our families onto video conferencing software such as Zoom. While these systems have been of great utility, they narrowly frame most of our social engagement through video and sound on flat screens with poor speakers. We are exploring ways to reach through these screens and engage with each other differently within these frames. Using real-time screen capture, we route the whole Zoom screen to real-time video processing software, where we separate the grid of participant images into individual video streams. These streams can then be analyzed and processed in real time in various ways to create alternative video and audio experiences. Preliminary experiments include collaging all participants into a single shared screen where they can engage and collaborate visually, and giving each participant control over part of a collective improvisation, either by controlling sounds that contribute to a shared soundscape or by controlling one joint of a computer-animated puppet that everyone can see through Zoom screen-sharing.
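The separation step could look roughly like the sketch below. It assumes the captured Zoom window has already been converted to a NumPy array and that gallery view is a uniform grid of equally sized tiles with no gaps; real layouts have borders and letterboxing that would need to be detected. `split_gallery` and `blend_collage` are hypothetical helpers, and averaging the tiles is just one very simple form of collage.

```python
import numpy as np

def split_gallery(frame, rows, cols):
    """Split a captured gallery-view frame (H x W x 3) into per-participant tiles.

    Assumes a uniform rows x cols grid with no gaps or borders; leftover pixels
    at the right/bottom edges are ignored. Tiles are returned in row-major
    order as views into the original frame.
    """
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

def blend_collage(tiles):
    """One simple 'shared screen': average all participant tiles into one image."""
    return np.mean(np.stack(tiles), axis=0).astype(np.uint8)

# Synthetic 1280x720 capture with six participants, each tile a flat grey level.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
for i in range(6):
    r, c = divmod(i, 3)
    frame[r * 360:(r + 1) * 360, c * 426:(c + 1) * 426] = i * 40

tiles = split_gallery(frame, rows=2, cols=3)
collage = blend_collage(tiles)
print(len(tiles), tiles[0].shape, collage[0, 0])   # 6 (360, 426, 3) [100 100 100]
```

Each tile can then be handed to its own analysis or effect chain, which is where per-participant processing would plug in.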
Guided Text Generation Tools for Performance
David Rokeby: 2020-03-24
Recent advances in text generation using Transformer architectures, such as OpenAI’s GPT-2, offer new possibilities for generating text in creative applications. The simplest examples involve processes like fine-tuning a GPT-2 model on the works of Shakespeare to create a system that generates properly structured and surprisingly coherent alternative Shakespeare-ish texts. But the utility of this method is limited by the standard way these systems are used: the system is given a text as a prompt and then proposes next words in sequence, and once the prompt is given, the transformer generates without further interaction. We are developing tools for exploring, visualizing and manipulating GPT-2 models in order to discover approaches that make it possible to guide or direct the output of transformers, allowing more sustained engagement beyond the choice of an initial prompt.
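One simple way to keep a performer involved after the prompt is to expose a hook that can bias the model's next-token logits at every step. The sketch below assumes a generic `next_logits` function standing in for a language model (a toy rule here, not GPT-2) and a hypothetical `guide` callback, and it uses greedy decoding so the result is deterministic. It illustrates logit biasing in general, not the specific tools described above.

```python
import numpy as np

def guided_generate(next_logits, prompt, steps, guide):
    """Greedy autoregressive decoding with a steering hook at every step.

    next_logits(tokens) -> 1-D array of vocabulary logits (in practice this
    would wrap a language model; here it is a toy).
    guide(step, tokens) -> bias array of the same shape, added before choosing
    the next token. This hook is a hypothetical interface, not a GPT-2 API.
    """
    tokens = list(prompt)
    for step in range(steps):
        logits = next_logits(tokens) + guide(step, tokens)
        tokens.append(int(np.argmax(logits)))
    return tokens

VOCAB = ["to", "be", "or", "not", "that", "is"]

def toy_logits(tokens):
    # Toy "model": always prefers the word after the last one in VOCAB.
    logits = np.zeros(len(VOCAB))
    logits[(tokens[-1] + 1) % len(VOCAB)] = 1.0
    return logits

def no_guidance(step, tokens):
    return np.zeros(len(VOCAB))

def push_is_at_step_1(step, tokens):
    # A performer intervening mid-generation: boost "is" at step 1 only.
    bias = np.zeros(len(VOCAB))
    if step == 1:
        bias[VOCAB.index("is")] = 2.0
    return bias

plain = guided_generate(toy_logits, [0], 4, no_guidance)
steered = guided_generate(toy_logits, [0], 4, push_is_at_step_1)
print(" ".join(VOCAB[t] for t in plain))    # to be or not that
print(" ".join(VOCAB[t] for t in steered))  # to be is to be
```

Because the bias is recomputed at every step, the guide can react to live input rather than being fixed when the prompt is chosen.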
Voxel-based Mapping of Space for Performance
David Rokeby: 2019-11-20
Depth cameras offer enormous potential for interactive performance. Most approaches use skeleton tracking to map joints into 3-D space. While this is effective for many applications, the skeleton data tends to be noisy and fragile, and extracting clean motion information from it is challenging. Mapping the 3-D point cloud into voxel space can yield much cleaner and more stable motion information that can be used to provide movement-based interaction possibilities for performance.
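As a minimal sketch of voxel mapping, assuming the depth camera already delivers an (N, 3) point cloud in metres: points are binned into a fixed grid, voxels holding fewer than `min_points` points are treated as noise, and motion is measured as the number of voxels that change state between frames. The grid origin, voxel size, threshold, and helper names are illustrative assumptions.

```python
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape, min_points=1):
    """Map an (N, 3) point cloud to a boolean occupancy grid.

    A voxel counts as occupied only if at least min_points points fall in it,
    which discards isolated stray points; points outside the grid are ignored.
    """
    idx = np.floor((points - origin) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    counts = np.zeros(grid_shape, dtype=int)
    np.add.at(counts, tuple(idx[keep].T), 1)
    return counts >= min_points

def voxel_motion(prev_grid, grid):
    """Number of voxels that became occupied or vacated between two frames."""
    return int(np.count_nonzero(prev_grid ^ grid))

origin = np.array([-1.0, -1.0, 0.0])   # grid corner in camera coordinates (metres, assumed)
grid_shape = (20, 20, 30)              # 2 m x 2 m x 3 m volume of 10 cm voxels
pts = np.array([[0.02, 0.03, 1.51],
                [0.04, 0.01, 1.52],
                [0.55, 0.55, 2.05]])   # the last point is an isolated outlier

a = voxelize(pts, origin, 0.1, grid_shape, min_points=2)
b = voxelize(pts + [0.01, 0.0, 0.0], origin, 0.1, grid_shape, min_points=2)  # 1 cm jitter
c = voxelize(pts + [0.30, 0.0, 0.0], origin, 0.1, grid_shape, min_points=2)  # 30 cm move
print(int(a.sum()), voxel_motion(a, b), voxel_motion(a, c))   # 1 0 2
```

The 1 cm jitter stays inside the same voxels and registers no motion, while the 30 cm move does, and the isolated outlier never appears at all. This quantization is one plausible reason voxel data can be steadier than per-joint skeleton estimates.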
Learning Mappings Between Spaces for Creative Control
Xavier Snelgrove: 2019-10-20
We propose a research program that uses machine learning techniques to find alignments between the geometric structures of manifolds of user input (e.g., pose, voice, hand gesture) and manifolds of generative models (e.g., face-generating GANs, vocal synthesis). We hypothesize that this may allow for more intuitive control of generative models, with particular application in live creative performance.
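The simplest possible alignment between two spaces is a linear one. The sketch below is not the proposed program's method: it assumes paired samples (row i of `X` recorded alongside row i of `Y`) of equal dimension and fits an orthogonal Procrustes map via the SVD. The manifolds in question are likely curved, so this would only serve as a baseline. `fit_procrustes`, `apply_alignment`, and the 4-D features are hypothetical.

```python
import numpy as np

def fit_procrustes(X, Y):
    """Orthogonal Procrustes: orthogonal R minimising ||(X - mu_x) R - (Y - mu_y)||_F.

    X: (n, k) user-input features, Y: (n, k) generative-model coordinates,
    with row i of X paired with row i of Y (an assumption of this sketch).
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - mu_x).T @ (Y - mu_y))
    return U @ Vt, mu_x, mu_y

def apply_alignment(x, R, mu_x, mu_y):
    """Map new user input into the generative model's space."""
    return (x - mu_x) @ R + mu_y

# Synthetic check: Y is a rotated, shifted copy of X, so the fit should recover it.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))                          # e.g. 4-D pose descriptors
R_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))      # random orthogonal matrix
Y = X @ R_true + np.array([1.0, 2.0, 3.0, 4.0])            # e.g. 4-D latent coordinates
R, mu_x, mu_y = fit_procrustes(X, Y)
print(np.allclose(R, R_true), np.allclose(apply_alignment(X, R, mu_x, mu_y), Y))   # True True
```

Restricting the map to a rotation preserves distances, so nearby gestures stay nearby in the generative space; a nonlinear mapping would be needed to follow curved structure.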