Xingrui (Ryan) Wang

I build AI systems that see, hear, and reason about the 3D world.

I am a PhD student in the Computer Science Department at Johns Hopkins University, advised by Prof. Alan Yuille. My research focuses on AI systems with 3D spatial reasoning and multimodal understanding, aligning 3D/4D knowledge with language models and fusing audio-visual modalities for both generation and understanding. Ultimately, I strive to develop AI systems that can truly understand and engage with the 4D physical world.

Before JHU, I obtained my M.S. from the University of Southern California, where I worked with Prof. Laurent Itti. Prior to USC, I received my B.S. in Statistics from Renmin University of China, supervised by Prof. Hanfang Yang. I have also conducted research internships with the GenAI group at AMD and at Samsung R&D Institute.

I am actively seeking collaborations with undergraduate and Master's students who share similar research interests. Feel free to reach out at xwang378@jhu.edu.

News

Feb 2026

Captain Safari accepted at CVPR 2026.
Jan 2026

XModBench accepted at ICLR 2026. Code release coming soon.
Oct 2025

Heading to ICCV 2025 🏖️ to present at the GEN4AVM Workshop.
Sep 2025

SpatialReasoner accepted at NeurIPS 2025.
Jun 2025

Invited talk at BEAM Workshop, CVPR 2025. (slides)
Feb 2025

Spatial457 accepted at CVPR 2025 as a highlight paper.
Jan 2025

One paper accepted at ICLR 2025.
May 2024

Introduced NS-4DPhysics and DynSuperCLEVR: a 4D neural symbolic model and benchmark for dynamic spatial reasoning.
Sep 2023

One paper accepted at NeurIPS 2023.
Aug 2023

Joined CCVL at JHU as a PhD student.
Mar 2023

SuperCLEVR selected as a highlight paper at CVPR 2023.
Feb 2023

One paper accepted at CVPR 2023.
2022

One paper accepted at ECCV 2022.

Selected Publications

Full list on Google Scholar

Research Experience

Advanced Micro Devices (AMD) — Research Intern

Jun 2024 – Sep 2025

GenAI Group · Remote, US · Advisor: Dr. Jiang Liu

Built a video generation diffusion model conditioned on audio and image inputs for dynamic motion synthesis.
Evaluated temporal alignment between given audio and the generated video sequences.

→ KeyVID (arXiv 2025) · XModBench (ICLR 2026)

University of Southern California — Research Assistant

Aug 2021 – May 2023

iLab · Los Angeles, CA · Advisor: Prof. Laurent Itti

Designed and implemented a human-centric computer vision system that decomposes visual reasoning into shape, color, and texture components.
Explored knowledge interaction between humans and interpretable CV systems using graph neural networks.

→ HVE (ECCV 2022) · HNI

Samsung R&D Institute China-Beijing (SRC-B) — Research Intern

Dec 2020 – Jun 2021

Embodied AI & Reinforcement Learning · Beijing, China · Advisor: Dr. Yang Liu

Proposed a method combining language hints with object template matching for human-guided reinforcement learning, enhancing learning efficiency.
Developed a solution for the ALFRED benchmark integrating instance segmentation and depth estimation to ground object positions on a bird's-eye-view map and generate language-guided navigation paths.

→ Embodied AI @ CVPR 2021

Peking University — Research Assistant

Sep 2019 – Feb 2020

Semantic Segmentation · Beijing, China · Advisor: Prof. Yongtao Wang

Focused on multi-scale feature learning methods for improving semantic segmentation performance.

Service

Conference Reviewer

Computer Vision

CVPR ICCV ECCV WACV BMVC

Machine Learning

NeurIPS ICLR ICML

Journal Reviewer

TMLR

Workshop Organizer

AdvML @ CVPR 2025

Teaching

EN.601.673 — Cognitive Artificial Intelligence

Fall 2024

Teaching Assistant · Johns Hopkins University · Instructor: Prof. Tianmin Shu

DSCI 552 — Machine Learning for Data Science

Spring 2022 – Spring 2023

Course Producer · University of Southern California · Instructor: Prof. Mohammad Reza Rajati