Hi! I am Haoxuan Xu (Harrison, 徐浩轩).

I am an incoming PhD student at the Hong Kong University of Science and Technology (HKUST), advised by Prof. Yuan Liu. I am currently completing my MPhil in the Systems Hub/ROAS Thrust at the Hong Kong University of Science and Technology (Guangzhou), advised by Prof. Haoang Li. Previously, I earned my undergraduate degree from the School of Information Science and Engineering at Shandong University (Chongxin College), advised by Prof. Yang Yang.

🎓 Education

HKUST
PhD in Intelligent Graphics Lab 2026.8 (Incoming)
HKUST(GZ)
MPhil in Robotics and AI 2024.9 - 2026.7 (Expected)
Shandong University
B.Eng. in Communication Engineering 2020.9 - 2024.6

💼 Experience

ACE Robotics
Research Intern advised by Liang Pan 2026.4 - Present
vivo
Research Intern advised by Shuai Ren 2026.1 - 2026.3
DJI
Image Algorithm Intern advised by Liang Yu 2023.11 - 2024.4

🚀 Research Interests

  • Embodied Intelligence
  • Computer Vision

My research interests lie in embodied intelligence and computer vision, especially perception, reasoning, and policy learning for agents operating in dynamic real-world environments.

If any of my work interests you, I am always open to discussions and collaborations. Feel free to reach out to me at hxu095 [at] connect.hkust-gz.edu.cn

📝 Publications († denotes equal contribution)

Preprint
HCSG

HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation

ArXiv Preprint

Haoxuan Xu†, Tianfu Li†, Wenbo Chen, Yi Liu, Jin Wu, Huashuo Lei, Yunfan Lou, Lujia Wang, Hesheng Wang, Haoang Li

  • Introduces a human-centric VLN framework that combines geometric human motion forecasting with semantic intention understanding for socially aware navigation.
ICML 2026
Mask World Model

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

ICML 2026

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang

  • Learns predictive world models over semantic masks rather than raw pixels, improving robot policy robustness to visual distractions and distribution shifts.
ICRA 2026

GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

ICRA 2026

Yi Liu, Haoxuan Xu, Hongbo Duan, Keyu Fan, Zhengyang Zhang, Peiyu Zhuang, Pengting Luo, Houde Liu

  • Builds a monocular 3D Gaussian Splatting SLAM system for dynamic scenes by separating static mapping cues from dynamic distractors.
Preprint
P3Nav

P3Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation

ArXiv Preprint

Tianfu Li†, Wenbo Chen†, Haoxuan Xu†, Xinhu Zheng, Haoang Li

  • Unifies perception, prediction, and planning in a single VLN network, using intermediate modules to sharpen scene understanding and boost navigation accuracy.
Preprint
Event Knowledge VLN

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos

ArXiv Preprint

Haoxuan Xu, Tianfu Li, Wenbo Chen, Yi Liu, Xingxing Zuo, Yaoxian Song, Haoang Li

  • Constructs multimodal event knowledge from real-world indoor tour videos and injects it into VLN agents for long-horizon reasoning.
Preprint
IRPO

IRPO: Boosting Image Restoration via Post-training GRPO

ArXiv Preprint

Haoxuan Xu†, Yi Liu†, Tianfu Li, Ruolin Shen, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang Li

  • Adapts GRPO-based post-training to image restoration with data-oriented sampling and reward-oriented optimization for stronger in-domain and out-of-distribution (OOD) restoration.
Journal
CDCDMA

Cross-domain Car Detection Model with Integrated Convolutional Block Attention Mechanism

Image and Vision Computing (JCR Q1, IF:4.7, CCF-C)

Haoxuan Xu†, Songning Lai†, Yang Yang

  • Proposes a complete cross-domain detection framework with an integrated CBAM architecture and GIoU loss optimization.
Journal
Earthquake Analysis

How did the Chinese Public Discuss the 2023 Türkiye-Syria Earthquake and the Humanitarian Response on Social Media? A Topical and Sentimental Analysis

International Journal of Disaster Risk Science (JCR Q1, IF:5.0)

Mengfan Shen, Haoxuan Xu, Hongbing Liu and Ziqiang Han

  • Applies topic modeling and sentiment analysis to Weibo posts, identifying key themes and public emotions during international disaster response.
Journal
Multimodal Survey

Multimodal Sentiment Analysis: A Survey

Displays (JCR Q1, IF:4.3)

Songning Lai, Xifeng Hu, Haoxuan Xu, Zhaoxia Ren and Zhi Liu

  • Provides a comprehensive overview of multimodal sentiment analysis, covering its history, datasets, advanced models, and future prospects.
Journal
MG-KG

MG-KG: Unsupervised video anomaly detection based on motion guidance and knowledge graph

Image and Vision Computing (JCR Q1, IF:4.7, CCF-C)

Qiyue Sun, Yang Yang, Haoxuan Xu, Zezhou Li, Yunxia Liu and Hongjun Wang

  • Addresses spatio-temporal linkage and interpretability in VAD by unifying motion-guided prediction with knowledge-graph retrieval.

🔭 Projects

RBM Project
Service Robot

Research and Development of Embodied AI-based Multi-terrain Service Robot

  • Used ConceptGraphs for open-vocabulary scene mapping and CLIP/GPT-4 for object retrieval.
  • Implemented optimized A* and KD-Tree for path planning.
  • Deployed on a Songling chassis for sim-to-real transfer.

🎖 Honors and Awards

  • Postgraduate Studentship (PGS) Award, HKUST(GZ)
  • First Prize, National College Student Mathematical Modeling Competition (Shandong Province)
  • Second Prize, 14th National College Student Mathematics Competition
  • Outstanding Graduate, Shandong University