Xinyi Zhao

Machine Learning Researcher · University of Washington

I recently completed my Ph.D. at the University of Washington, where I was advised by Prof. Chaoyue Zhao.

My current work sits at the intersection of large language models, agentic AI, reinforcement learning, multimodal reasoning, and robust optimization. I am especially interested in LLM agents, RAG/RL, LLM-as-a-Judge, long-context reasoning, and scalable training and evaluation pipelines.

Email · LinkedIn · Google Scholar · GitHub

Research

I work on large language models, agentic AI, reinforcement learning, and multimodal reasoning, with a focus on long-context evaluation, LLM-as-a-Judge, and scalable post-training.

Experience

Jun. 2025 - Sept. 2025, Amazon

Applied Scientist Intern

LLM-as-a-Judge

Jun. 2024 - Sept. 2024, Wyze

AI Scientist Intern

Video Anomaly Detection

Jun. 2024 - Dec. 2025, Lawrence Berkeley National Laboratory

Research Assistant

Post-Wildfire Recovery

Jun. 2022 - Sept. 2022, National Renewable Energy Laboratory

Research Intern

Hydrogen System Planning

Selected Publications

View all on Google Scholar

Data generation figure for LRBench and Judge-R1

Ablation figure for LRBench and Judge-R1

LRBench and Judge-R1: Principled Evaluation and Training of LLM-Based Judges for Long-Context Reasoning

Xinyi Zhao, Jinfeng Xiao, et al.

ACL ARR submission, 2026

[Code]

We tackle LLM reasoning evaluation with LRBench, a large-scale reasoning preference benchmark across three domains, and Judge-R1, an agentic training method that outperforms existing approaches while requiring far less training data.

SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models

Xinyi Zhao, Congjing Zhang, Pei Guo, et al.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2025

[Paper] [Code] [Data]

We present SmartHome-Bench, a large-scale benchmark tailored to smart home video anomaly detection, revealing substantial gaps in current multimodal LLM capabilities. To address this, we propose TRLC, a taxonomy-driven agentic workflow that improves detection accuracy by 11.62%.

Awards

Clean Energy Institute Graduate Fellowship, Jul. 2023
UW College of Engineering Dean's Fellowship, Sept. 2021
Tsinghua Comprehensive Scholarship, Sept. 2019
China National Scholarship (Top 0.2% of undergraduate students in China), Nov. 2015

Others

Pet

Meet my furry friend, Jupyter, born in Spokane, WA in 2025 and named after Jupyter Notebook. You can find more of him on Instagram.

Hobbies

Chinese painting has been one of my favorite hobbies for years. It gives me a slower, more reflective space that nicely balances the pace of research.