Uni-GUI

UI-MOPD: Multi-platform On-Policy Distillation for Continual GUI Agent Learning

Niu Lian1,3*, Alan Chen4*, Zhehao Yu3, Chengzhen Duan2, Fazhan Liu2, Hui Liu2, Pei Fu2,
Jian Luan2, Yaowei Wang3,5, Shu-Tao Xia1,5, Jinpeng Wang3†

* Equal contribution    † Corresponding author

220110904@stu.hit.edu.cn (Niu Lian), wangjp26@gmail.com (Jinpeng Wang)

1Tsinghua Shenzhen International Graduate School, Tsinghua University, 2Xiaomi

3Harbin Institute of Technology, Shenzhen, 4Zhejiang University, 5Peng Cheng Laboratory

Abstract

Recent advances in multimodal foundation models and agent systems have driven GUI agents from single-platform task execution toward cross-platform interaction. However, building multi-platform GUI agents remains challenging. On one hand, high-quality and executable cross-platform interaction trajectories are still scarce, and existing data often suffer from limited platform coverage. On the other hand, different platforms exhibit distinct interaction conventions, making joint or continual training prone to behavioral pattern mixing, platform-specific capability degradation, and catastrophic forgetting. To address these challenges, we construct Uni-GUI, a high-quality cross-platform GUI interaction dataset, and propose UI-MOPD, the first method that incorporates multi-teacher on-policy distillation into continual learning for GUI agents. UI-MOPD dynamically selects a platform-specific teacher according to the current environment and transfers platform-specific behavioral priors to a shared policy through platform-conditioned distillation, enabling adaptation to new platforms while preserving capabilities on existing ones. Experiments on OSWorld and MobileWorld show that UI-MOPD achieves task success rates of 38.2% and 12.0%, respectively, demonstrating its effectiveness in balancing cross-platform capability retention and new-platform adaptation.

Motivation

Figure 1. Motivation of UI-MOPD. Naively combining desktop and mobile signals, as in model merging or mixed SFT, can mix platform-specific behavioral conventions and produce an averaged policy. UI-MOPD uses platform-conditioned routing and multi-teacher on-policy distillation to integrate platform-specific expertise into a shared GUI agent.

Method

Two-Stage Training Pipeline

Stage 1: Supervised Fine-Tuning

Fine-tune Qwen3-VL-32B-Thinking on the Uni-GUI dataset to obtain platform-specific expert teachers: a desktop teacher and a mobile teacher.

Stage 2: Multi-Teacher On-Policy Distillation

Train a shared student policy (Qwen3-VL-8B-Thinking) with reinforcement learning and platform-conditioned teacher routing for continual cross-platform learning.

Figure 2. Overview of the UI-MOPD training pipeline. Stage 1 performs supervised fine-tuning to obtain platform-specific teachers. Stage 2 applies multi-teacher on-policy distillation with platform-conditioned routing, adaptive KL masking, and structured outcome reward.

Key Components

Platform-Conditioned Routing

Routes each rollout to the corresponding platform-specific teacher based on the current environment type.
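As a minimal sketch of this routing step, assuming each rollout carries a platform tag and each teacher can be called to score an (observation, action) pair. The `Rollout` class, the `TeacherFn` signature, and the `"desktop"` / `"mobile"` keys are hypothetical names chosen for illustration, not the authors' implementation:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: a teacher maps (observation, action) to a log-probability.
TeacherFn = Callable[[str, str], float]


@dataclass
class Rollout:
    platform: str            # environment type, e.g. "desktop" or "mobile" (assumed tags)
    observations: List[str]  # placeholder for screenshots / UI states
    actions: List[str]       # actions sampled from the student policy


def route_teacher(rollout: Rollout, teachers: Dict[str, TeacherFn]) -> TeacherFn:
    """Return the teacher registered for the rollout's platform."""
    if rollout.platform not in teachers:
        raise ValueError(f"no teacher registered for platform {rollout.platform!r}")
    return teachers[rollout.platform]


def teacher_logprobs(rollout: Rollout, teachers: Dict[str, TeacherFn]) -> List[float]:
    """Score every step of the rollout with the platform-matched teacher only."""
    teacher = route_teacher(rollout, teachers)
    return [teacher(obs, act) for obs, act in zip(rollout.observations, rollout.actions)]


# Dummy teachers that return constant log-probabilities, for illustration only.
teachers = {
    "desktop": lambda obs, act: -0.5,
    "mobile": lambda obs, act: -1.0,
}
mobile_rollout = Rollout("mobile", ["screen_0", "screen_1"], ["tap", "swipe"])
print(teacher_logprobs(mobile_rollout, teachers))
```

The point of the sketch is that a mobile rollout is never scored by the desktop teacher, so the two sets of interaction conventions are not averaged within a single training signal.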

K3 Estimator

Efficient single-sample KL divergence estimator that avoids full vocabulary computation, reducing memory and compute overhead.
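The page does not spell out the formula. The name "K3" commonly refers to the estimator (r - 1) - log r, where r is the ratio of the reference probability to the sampling probability of the sampled token. The sketch below assumes that definition and assumes the quantity being estimated is KL(student || teacher) on tokens sampled from the student; the function and argument names are hypothetical:

```python
import math
from typing import Sequence


def k3_kl_estimate(student_logps: Sequence[float], teacher_logps: Sequence[float]) -> float:
    """Mean k3 estimate of KL(student || teacher) over sampled tokens.

    Both sequences hold log-probabilities of the SAME tokens, which were
    sampled from the student. Only these per-token values are needed, so no
    sum over the vocabulary is computed.
    """
    if len(student_logps) != len(teacher_logps) or len(student_logps) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s_lp, t_lp in zip(student_logps, teacher_logps):
        log_ratio = t_lp - s_lp  # log r, with r = p_teacher / p_student
        # (r - 1) - log r is non-negative for every r > 0
        total += math.expm1(log_ratio) - log_ratio
    return total / len(student_logps)


# The estimate is zero when teacher and student agree on the sampled tokens.
print(k3_kl_estimate([-1.0, -2.0], [-1.0, -2.0]))
```

Each term is non-negative, which tends to give a lower-variance signal than averaging the raw log-ratio, while needing one log-probability per sampled token from each model.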

Adaptive KL Masking

Removes the teacher KL penalty when the task reward is already sufficient, preventing over-regularization.
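One way to read this masking rule is sketched below. The page does not say how "sufficient" is decided or at what granularity the mask is applied, so the per-rollout threshold test, `reward_threshold`, and `kl_coef` are all assumptions made for illustration:

```python
from typing import List, Sequence


def masked_kl_penalty(
    per_token_kl: Sequence[float],
    task_reward: float,
    reward_threshold: float = 1.0,  # hypothetical cutoff for a "sufficient" reward
    kl_coef: float = 0.1,           # hypothetical weight on the teacher KL term
) -> List[float]:
    """Return the per-token teacher penalty for one rollout.

    If the rollout's task reward reaches the threshold, the penalty is masked
    out entirely; otherwise the scaled KL values are kept.
    """
    if task_reward >= reward_threshold:
        return [0.0 for _ in per_token_kl]
    return [kl_coef * kl for kl in per_token_kl]


# A successful rollout is not pulled toward the teacher; a failed one is.
print(masked_kl_penalty([0.2, 0.4], task_reward=1.0))
print(masked_kl_penalty([0.2, 0.4], task_reward=0.0, kl_coef=0.5))
```

Under this reading, the teacher only constrains rollouts where the student has not yet solved the task, so a student that finds a successful strategy different from the teacher's is not penalized for it.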

Uni-GUI Dataset

~160K interaction steps
~11.5K trajectories
2 platforms (desktop + mobile)

Figure 3. Overview of the Unified Cross-Platform Data Collection Harness used to build the Uni-GUI dataset.

Main Results

Table 1. Baselines and integration strategies on OSWorld and MobileWorld.

Method                            OSWorld   MobileWorld

General Models
SeedVL-1.5                        34.1%     --
Qwen3-VL-8B-Instruct              33.9%     9.4%
Qwen3-VL-8B-Thinking              33.9%     7.7%
Qwen3-VL-32B-Instruct             32.6%     9.0%
Qwen3-VL-235B-A22B-Instruct       31.6%     9.5%
Qwen3-VL-235B-A22B-Thinking       38.1%     --

GUI Models (Single-Platform)
OpenCUA-7B                        28.2%     --
OpenAI CUA o3                     31.3%     --
OpenCUA-32B                       34.8%     --

GUI Models (Multi-Platform)
UI-TARS-72B-DPO                   27.1%     --
UI-TARS-1.5-7B                    27.4%     --
GELab-Zero-4B                     31.9%     10.9%
GUI-Owl-7B                        34.9%     4.5%
GUI-Owl-32B                       --        5.5%

Integration Strategies
Mixed-SFT                         35.0%     6.4%
Model Merge (Weight Averaging)    36.5%     6.8%
Model Merge (TIES Merging)        36.8%     0%
UI-MOPD (Ours)                    38.2%     12.0%

OSWorld (Desktop): 38.2%, a +12.7% relative improvement over Qwen3-VL-8B-Thinking (33.9%)

MobileWorld (Mobile): 12.0%, a +55.8% relative improvement over Qwen3-VL-8B-Thinking (7.7%)

UI-MOPD reaches the highest task success rate in Table 1 on both benchmarks: it retains desktop capability on OSWorld (38.2%) while substantially improving mobile task success on MobileWorld (12.0%).
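The relative improvements quoted above match what one gets by taking the Qwen3-VL-8B-Thinking base model (the student's starting point) as the reference. That reading is an inference from the reported numbers, and the short check below only reproduces the arithmetic:

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# Success rates (%) copied from Table 1; the choice of baseline is an assumption.
baseline = {"OSWorld": 33.9, "MobileWorld": 7.7}  # Qwen3-VL-8B-Thinking
ui_mopd = {"OSWorld": 38.2, "MobileWorld": 12.0}  # UI-MOPD (Ours)

for bench in baseline:
    gain = relative_improvement(ui_mopd[bench], baseline[bench])
    print(f"{bench}: +{gain:.1f}%")
# prints:
# OSWorld: +12.7%
# MobileWorld: +55.8%
```

In absolute terms both gains are 4.3 percentage points; the relative figure is much larger on MobileWorld because the baseline success rate there is low.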

Teacher-Student Analysis

Table 2. Teacher-student analysis on OSWorld and MobileWorld.

Method                            OSWorld   MobileWorld

Base Models
Qwen3-VL-8B-Thinking              33.9%     7.7%
Qwen3-VL-32B-Thinking             41.0%     9.4%

Single-Platform SFT (8B)
8B SFT on OSWorld                 35.8%     0%
8B SFT on MobileWorld             35.8%     12.8%

Platform-Specific Teachers (32B)
Desktop Teacher, 32B              46.3%     --
Mobile Teacher, 32B               --        16.2%

UI-MOPD (Ours)                    38.2%     12.0%

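UI-MOPD distills knowledge from the platform-specific 32B teachers into a shared 8B student. The student surpasses both single-platform 8B SFT models on OSWorld (38.2% vs. 35.8%) and stays close to the mobile-only SFT model on MobileWorld (12.0% vs. 12.8%), whereas the desktop-only SFT model drops to 0% on MobileWorld.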

GUI Grounding & Understanding

Table 3. General GUI grounding, visual understanding, and AndroidControl results.

Model                         AndroidControl*   ScreenSpot-Pro   ScreenSpotV2   OSWorld-G
Qwen3-VL-8B-Thinking          78.73%            43.71%           91.27%         52.13%
Model Merge (TIES Merging)    74.01%            37.13%           88.60%         47.16%
UI-MOPD (Ours)                80.05%            43.14%           90.88%         52.84%

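UI-MOPD preserves GUI grounding and visual understanding while improving interactive task performance: it stays within 0.6 points of Qwen3-VL-8B-Thinking on ScreenSpot-Pro and ScreenSpotV2 and improves on AndroidControl and OSWorld-G. Static parameter merging with TIES, by contrast, loses between 2.7 and 6.6 points on every benchmark.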
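To make the comparison concrete, the per-benchmark differences against the Qwen3-VL-8B-Thinking base can be computed directly from the Table 3 values. This is only a small arithmetic check; the variable and function names are illustrative:

```python
from typing import List, Sequence

benchmarks = ["AndroidControl*", "ScreenSpot-Pro", "ScreenSpotV2", "OSWorld-G"]

# Scores (%) copied from Table 3.
base = [78.73, 43.71, 91.27, 52.13]     # Qwen3-VL-8B-Thinking
ties = [74.01, 37.13, 88.60, 47.16]     # Model Merge (TIES Merging)
ui_mopd = [80.05, 43.14, 90.88, 52.84]  # UI-MOPD (Ours)


def deltas(scores: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Point difference of each score against the reference, rounded to 2 decimals."""
    return [round(s - r, 2) for s, r in zip(scores, reference)]


for name, d_ties, d_mopd in zip(benchmarks, deltas(ties, base), deltas(ui_mopd, base)):
    print(f"{name:16s} TIES {d_ties:+.2f}   UI-MOPD {d_mopd:+.2f}")
```

UI-MOPD's differences range from -0.57 to +1.32 points, while TIES merging is lower than the base model on all four benchmarks.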

Case Studies

Desktop

Desktop Case Study

Mobile

Mobile Case Study

Citation

@article{lian2025uimopd,
  title={UI-MOPD: Multi-platform On-Policy Distillation for Continual GUI Agent Learning},
  author={Lian, Niu and Chen, Alan and Yu, Zhehao and Duan, Chengzhen and Liu, Fazhan and Liu, Hui and Fu, Pei and Luan, Jian and Wang, Yaowei and Xia, Shu-Tao and Wang, Jinpeng},
  year={2025}
}