I build AI systems that see, hear, and reason about the 3D world.

I am a PhD student in the Computer Science Department at Johns Hopkins University, advised by Prof. Alan Yuille. My research focuses on AI systems with 3D spatial reasoning and multimodal understanding, aligning 3D/4D knowledge with language models and fusing audio-visual modalities for both generation and understanding. Ultimately, I strive to develop AI systems that can truly understand and engage with the 4D physical world.

Before JHU, I obtained my M.S. from the University of Southern California, where I worked with Prof. Laurent Itti. Prior to USC, I received my B.S. in Statistics from Renmin University of China, supervised by Prof. Hanfang Yang. I have also conducted research internships with the GenAI group at AMD and at Samsung R&D Institute.

I am actively seeking collaborations with undergraduate and Master's students who share similar research interests. Feel free to reach out at xwang378@jhu.edu.
News
  • Jan 2026
    XModBench accepted at ICLR 2026. Code release coming soon.
  • Oct 2025
    Heading to ICCV 2025 🏖️ to present at the GEN4AVM Workshop.
  • Sep 2025
    SpatialReasoner accepted at NeurIPS 2025.
  • Jun 2025
    Invited talk at BEAM Workshop, CVPR 2025. (slides)
  • Feb 2025
    Spatial457 accepted at CVPR 2025 as a highlight paper.
  • Jan 2025
    One paper accepted at ICLR 2025.
Selected Publications

Full list on Google Scholar

Research Experience
Advanced Micro Devices (AMD) — Research Intern
Jun 2024 – Sep 2025
GenAI Group · Remote, US · Advisor: Dr. Jiang Liu
  • Built a video generation diffusion model conditioned on audio and image inputs for dynamic motion synthesis.
  • Evaluated temporal alignment between given audio and the generated video sequences.
KeyVID (arXiv 2025) · XModBench (ICLR 2026)
University of Southern California — Research Assistant
Aug 2021 – May 2023
iLab · Los Angeles, CA · Advisor: Prof. Laurent Itti
  • Designed and implemented a human-centric computer vision system that decomposes visual reasoning into shape, color, and texture components.
  • Explored knowledge interaction between humans and interpretable CV systems using graph neural networks.
HVE (ECCV 2022) · HNI
Dec 2020 – Jun 2021
Embodied AI & Reinforcement Learning · Beijing, China · Advisor: Dr. Yang Liu
  • Proposed a method combining language hints with object template matching for human-guided reinforcement learning, enhancing learning efficiency.
  • Developed a solution for the ALFRED benchmark integrating instance segmentation and depth estimation to ground object positions on a bird's-eye-view map and generate language-guided navigation paths.
Peking University — Research Assistant
Sep 2019 – Feb 2020
Semantic Segmentation · Beijing, China · Advisor: Prof. Yongtao Wang
  • Studied multi-scale feature learning methods to improve semantic segmentation performance.
Service
Conference Reviewer
Computer Vision
CVPR ICCV ECCV WACV BMVC
Machine Learning
NeurIPS ICLR ICML
Journal Reviewer
TMLR
Workshop Organizer
Teaching
Teaching Assistant · Johns Hopkins University · Instructor: Prof. Tianmin Shu
Course Producer · University of Southern California · Instructor: Prof. Mohammad Reza Rajati