$ whoami

Qian Liu 刘乾

Member of Technical Staff @ xAI · Singapore

At xAI I work on coding agents. My daily work is mostly debugging and data: reading failed runs, shipping better training bits, and iterating again. Before that: code pre-training and agentic RL at TikTok; Sailor LLMs and data mixture research at Sea AI Lab.

# experience

  1. xAI · Member of Technical Staff · Singapore

    Coding agents. Daily work: debugging and data.

  2. TikTok · Researcher · Singapore

    As a small research team, we focus on code pre-training, agentic reinforcement learning, and code efficiency.

  3. Sea AI Lab · Researcher · Singapore

    Led development of the Sailor LLM family — full pre-training pipeline: data crawling & cleaning, deduplication, data mixture optimization (RegMix), continual pre-training.

# education

  1. Beihang University & Microsoft Research Asia · Joint Ph.D. Program

    Ph.D. in Computer Science and Engineering.

  2. Beihang University · B.S. in Computer Science and Technology

    Ranked 7 / 233.

# research interests

# selected projects

SimpleTIR

End-to-end RL for multi-turn tool-integrated reasoning · corresponding author · ICLR 2026

Plug-and-play RL that stabilizes multi-turn TIR by filtering void turns (no code block / no final answer) during policy updates, mitigating both the distributional drift from tool feedback and gradient explosions. Starting from the Qwen2.5-7B base model (no SFT): AIME24 22.1 → 50.5, with emergent self-correction and cross-validation. Guided research direction and experimental design.
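
The filtering step can be sketched in a few lines. This is a minimal illustration, not the SimpleTIR implementation: it assumes a void turn is one with neither a Python code block nor a final answer, that final answers use a `\boxed{...}` convention, and that a trajectory containing any void turn is dropped whole before the policy update. The names `is_void_turn` and `filter_void_trajectories` are hypothetical.

```python
import re

# Build the fence marker programmatically so this snippet can itself sit in a fenced block.
FENCE = "`" * 3

# Assumption: tool calls are Python code blocks; final answers use \boxed{...}.
CODE_BLOCK = re.compile(re.escape(FENCE) + r"python\n.*?" + re.escape(FENCE), re.DOTALL)
FINAL_ANSWER = re.compile(r"\\boxed\{.+?\}")


def is_void_turn(turn_text):
    """A turn is treated as void if it has neither a code block nor a final answer."""
    has_code = CODE_BLOCK.search(turn_text) is not None
    has_answer = FINAL_ANSWER.search(turn_text) is not None
    return not has_code and not has_answer


def filter_void_trajectories(trajectories):
    """Keep only trajectories (lists of model turns) that contain no void turn."""
    return [t for t in trajectories if not any(is_void_turn(turn) for turn in t)]


# Toy batch: one healthy trajectory, one with a void first turn.
good = [
    "Let me compute.\n" + FENCE + "python\nprint(2 + 2)\n" + FENCE,
    "The answer is \\boxed{4}.",
]
bad = [
    "Let me think about this...",
    "The answer is \\boxed{5}.",
]

kept = filter_void_trajectories([good, bad])
```

Only `kept` would then contribute to the policy-gradient loss; whether filtering is applied per trajectory or at a finer granularity is a design choice this sketch does not settle.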

SimpleRL-Zoo

Zero RL for open base models · co-first author · COLM 2025

Systematic study of zero RL (RL on base models without SFT) across 10 open base models (Llama3, Mistral, Qwen2.5, …). Key recipes: format rewards and query difficulty control; first to observe verification / “aha moment” behaviors in small non-Qwen models. Open-source toolkit for zero-RL research. Co-led direction and experimental design.

RegMix

Automated data mixture optimization for pre-training · co-first author · ICLR 2025 Spotlight

Formulates mixture selection as regression: train small proxy models on random mixtures, fit a regression from mixture to metric, then extrapolate to the large run. Uses ~10% of the compute of prior methods; outperforms human expert selection and DoReMi; transfers from 1M-parameter proxies to billion-scale models. Adopted in production pre-training (e.g. Sailor). Proposed the idea and led the project.
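
The regression idea can be sketched with NumPy. This is a toy illustration under stated assumptions, not the RegMix code: `proxy_loss` is a hypothetical stand-in for training a small proxy model on a mixture and measuring its validation loss, the regressor is plain least squares, and the search over candidate mixtures is simple random sampling from the simplex.

```python
import numpy as np


def fit_mixture_regression(mixtures, metrics):
    """Least-squares fit of metric ~ mixtures @ w.

    No intercept is needed: mixture weights sum to 1, so a constant
    offset is absorbed into w.
    """
    w, *_ = np.linalg.lstsq(mixtures, metrics, rcond=None)
    return w


def best_mixture(w, n_domains, n_candidates, rng):
    """Sample candidate mixtures and return the one with the lowest predicted loss."""
    candidates = rng.dirichlet(np.ones(n_domains), size=n_candidates)
    predicted = candidates @ w
    return candidates[np.argmin(predicted)]


rng = np.random.default_rng(0)
n_domains = 3

# Hypothetical ground truth: domain 0 helps, domains 1 and 2 hurt.
true_effect = np.array([-1.0, 0.2, 0.5])


def proxy_loss(mixture):
    # Stand-in for "train a small proxy model on this mixture, measure loss".
    return 3.0 + mixture @ true_effect


# Step 1: random mixtures and their proxy metrics.
train_mixtures = rng.dirichlet(np.ones(n_domains), size=64)
losses = np.array([proxy_loss(m) for m in train_mixtures])

# Step 2: fit mixture -> metric.
w = fit_mixture_regression(train_mixtures, losses)

# Step 3: pick the mixture predicted to be best for the large run.
chosen = best_mixture(w, n_domains, 10_000, rng)
```

In this toy setup the fitted weights recover the per-domain effect exactly, and the chosen mixture puts most of its weight on the helpful domain. Real proxy metrics are noisy and not linear in the mixture, so a more flexible regressor may be preferable in practice.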

# selected publications

Full list → Google Scholar

# talks

# service & awards