Israel D Gebru
I'm a Research Scientist at Meta Reality Labs Research (RLR) in Pittsburgh, PA.
My research focuses on signal processing, machine learning, and multimodal learning for immersive experiences in virtual and
augmented reality applications. Currently, I work on human motion generative models for driving Codec Avatars using various input modalities,
including natural language descriptions (text), speech and audio signals, poses from images, and Audio-Visual Language Models (AVLMs).
At RLR's Codec Avatars Labs, I've developed streaming neural audio codecs and neural rendering models for spatial audio,
room acoustics, and audio-visual novel-view 6DoF ambient sound generation. This work has significantly improved the audio-visual experience in virtual reality (VR) applications.
I've also developed several spatial audio quality metrics and created neural rendering techniques
to transform higher-order ambisonic recordings into binaural audio.
Previously, I was a PhD student at INRIA-Grenoble working with Dr. Radu Patrice Horaud. During my PhD, I was a visiting researcher at Imperial College London and interned at Meta Reality Labs (then Oculus Research). I obtained my PhD in Mathematics and
Computer Science from INRIA & Université Grenoble-Alpes, France, in 2018. Prior to that, I received a Master's degree in Telecommunication Engineering from the University of Trento, Italy, in 2013.
Email / CV / Scholar / Github / LinkedIn
Publications
Last update: March 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla, Christian Richardt, Dejan Markovic, Steven Krenn, Todd Keebler, Jacob Sandakly, Alexander Richard, Eli Shlizerman
CVPR, 2025
SoundVista is a novel method for synthesizing binaural ambient sound from arbitrary viewpoints in a scene. It leverages pre-acquired audio recordings and panoramic visual data to generate spatially accurate sound without requiring detailed knowledge of individual sound sources.
paper |
code
Real acoustic fields: An audio-visual room acoustics dataset and benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard
CVPR, 2024   (Highlight)
We present the Real Acoustic Fields (RAF) dataset that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.
paper |
dataset |
project page
Spatialization Quality Metric for Binaural Speech
Pranay Manocha, Israel D. Gebru, Anurag Kumar, Dejan Markovic, Alexander Richard
Interspeech, 2023
We introduce a novel objective metric designed to evaluate the spatialization quality (SQ) between pairs of binaural audio signals,
independent of speech content and signal duration.
paper
SAQAM: Spatial audio quality assessment metric
Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K Ithapu, Paul Calamia
Interspeech, 2022
We introduce a novel metric for evaluating both listening quality (LQ) and spatialization quality (SQ)
between pairs of binaural audio signals without relying on subjective data.
paper
End-to-end binaural speech synthesis
Wen Chin Huang, Dejan Markovic, Israel D. Gebru, Alexander Richard, Anjali Menon
Interspeech, 2022
This work presents an end-to-end framework that integrates a low-bitrate audio codec with a binaural audio decoder to accurately synthesize spatialized speech.
We use a modified vector-quantized variational autoencoder, trained with carefully designed objectives, including an adversarial loss, to improve the authenticity of the generated audio.
paper |
project page
Neural synthesis of binaural speech from mono audio
Alexander Richard, Dejan Markovic, Israel D. Gebru, Steven Krenn, Gladstone Alexander Butler, Fernando De la Torre, Yaser Sheikh
ICLR, 2021   (Outstanding Paper Award)
This work presents a novel neural rendering approach for real-time binaural sound synthesis. The proposed network converts single-channel audio into two-channel binaural sound,
conditioned on the listener's relative position and orientation to the source.
paper |
code |
project page
Implicit HRTF modeling using temporal convolutional networks
Israel D. Gebru, Dejan Marković, Alexander Richard, Steven Krenn, Gladstone A Butler, Fernando De la Torre, Yaser Sheikh
ICASSP, 2021
This work introduces a data-driven approach to implicitly learn Head-Related Transfer Functions (HRTFs) using neural networks. Traditional HRTF measurement methods are often cumbersome, requiring listener-specific recordings at numerous spatial positions within anechoic chambers.
In contrast, this study proposes capturing data in non-anechoic environments and employing a neural network to model HRTFs.
paper |
dataset
Modified version of template from here and here.