Hi! I am Haoxuan Xu (Harrison, 徐浩轩).

I am an incoming PhD student at the Hong Kong University of Science and Technology (HKUST), advised by Prof. Yuan Liu. I am currently completing my MPhil in the Systems Hub / ROAS Thrust at the Hong Kong University of Science and Technology (Guangzhou), advised by Prof. Haoang Li. Previously, I earned my undergraduate degree from the School of Information Science and Engineering at Shandong University (Chongxin College), advised by Prof. Yang Yang.

🎓 Education

HKUST

PhD in Intelligent Graphics Lab 2026.8 (Incoming)

HKUST(GZ)

MPhil in Robotics and AI 2024.9 - 2026.7 (Expected)

Shandong University

B.Eng. in Communication Engineering 2020.9 - 2024.6

💼 Experience

ACE Robotics

Research Intern advised by Liang Pan 2026.4 - Present

vivo

Research Intern advised by Shuai Ren 2026.1 - 2026.3

DJI

Image Algorithm Intern advised by Liang Yu 2023.11 - 2024.4

🚀 Research Interests

Embodied Intelligence
Computer Vision

My research interests lie in embodied intelligence and computer vision, especially perception, reasoning, and policy learning for agents operating in dynamic real-world environments.

If any of my work interests you, I am always open to discussions and collaborations. Feel free to reach out to me at hxu095 [at] connect.hkust-gz.edu.cn

📝 Publications († denotes equal contribution)

Preprint

HCSG: Human-Centric Semantic-Geometric Reasoning for Vision-Language Navigation

ArXiv Preprint

Haoxuan Xu†, Tianfu Li†, Wenbo Chen, Yi Liu, Jin Wu, Huashuo Lei, Yunfan Lou, Lujia Wang, Hesheng Wang, Haoang Li

Introduces a human-centric VLN framework that combines geometric human motion forecasting with semantic intention understanding for socially aware navigation.

ICML 2026

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

ICML 2026

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang

Learns predictive world models over semantic masks rather than raw pixels, improving robot policy robustness to visual distractions and distribution shifts.

ICRA 2026

GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments

ICRA 2026

Yi Liu, Haoxuan Xu, Hongbo Duan, Keyu Fan, Zhengyang Zhang, Peiyu Zhuang, Pengting Luo, Houde Liu

Builds a monocular 3D Gaussian Splatting SLAM system for dynamic scenes by separating static mapping cues from dynamic distractors.

Preprint

P³Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation

ArXiv Preprint

Tianfu Li†, Wenbo Chen†, Haoxuan Xu†, Xinhu Zheng, Haoang Li

Unifies perception, prediction, and planning in a single VLN network, using intermediate modules to sharpen scene understanding and boost navigation accuracy.

Preprint

Enhancing Vision-Language Navigation with Multimodal Event Knowledge from Real-World Indoor Tour Videos

ArXiv Preprint

Haoxuan Xu, Tianfu Li, Wenbo Chen, Yi Liu, Xingxing Zuo, Yaoxian Song, Haoang Li

Constructs multimodal event knowledge from real-world indoor tour videos and injects it into VLN agents for long-horizon reasoning.

Preprint

IRPO: Boosting Image Restoration via Post-training GRPO

ArXiv Preprint

Haoxuan Xu†, Yi Liu†, Tianfu Li, Ruolin Shen, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang Li

Adapts GRPO-based post-training to image restoration with data-oriented sampling and reward-oriented optimization for stronger in-domain and out-of-distribution (OOD) restoration.

Journal

Cross-domain Car Detection Model with Integrated Convolutional Block Attention Mechanism

Image and Vision Computing (JCR Q1, IF: 4.7, CCF-C)

Haoxuan Xu†, Songning Lai†, Yang Yang

Proposes a complete cross-domain detection framework with an integrated CBAM architecture and GIoU loss optimization.

Journal

How did the Chinese Public Discuss the 2023 Türkiye-Syria Earthquake and the Humanitarian Response on Social Media? A Topical and Sentimental Analysis

International Journal of Disaster Risk Science (JCR Q1, IF: 5.0)

Mengfan Shen, Haoxuan Xu, Hongbing Liu, Ziqiang Han

Applies topic modeling and sentiment analysis to Weibo posts, identifying key themes and public emotions during international disaster response.

Journal

Multimodal Sentiment Analysis: A Survey

Displays (JCR Q1, IF: 4.3)

Songning Lai, Xifeng Hu, Haoxuan Xu, Zhaoxia Ren, Zhi Liu

Provides a comprehensive overview of multimodal sentiment analysis, covering its history, datasets, advanced models, and future prospects.

Journal

MG-KG: Unsupervised video anomaly detection based on motion guidance and knowledge graph

Image and Vision Computing (JCR Q1, IF: 4.7, CCF-C)

Qiyue Sun, Yang Yang, Haoxuan Xu, Zezhou Li, Yunxia Liu, Hongjun Wang

Addresses spatio-temporal linkage and interpretability in video anomaly detection by unifying motion-guided prediction with knowledge-graph retrieval.

🔭 Projects

RBM Project

Research and Development of Embodied AI-based Multi-terrain Service Robot

Used ConceptGraphs for open-vocabulary scene mapping and CLIP/GPT-4 for object retrieval.
Implemented optimized A* and KD-Tree for path planning.
Deployed on Songling chassis for sim-to-real transition.

🎖 Honors and Awards

Postgraduate Studentship (PGS) Award, HKUST(GZ)
First Prize, National College Student Mathematical Modeling Competition (Shandong Province)
Second Prize, 14th National College Student Mathematics Competition
Outstanding Graduate, Shandong University