Jingyu Liu

CS PhD Student at UChicago, Part-time Student Researcher at Together AI

University of Chicago

Bio

I am a first-year PhD student at the University of Chicago, where I am fortunate to be advised by Prof. Ce Zhang. I completed my master’s degree in CS at ETH Zurich. During my master’s, I took a gap year at Meta AI as an AI resident working on LLMs and 3D computer vision. I was very fortunate to work with many talented folks and to be supervised by Barlas Oğuz, Mike Lewis, and Gabriel Synnaeve. Before my master’s, I spent a year building search engines at ByteDance as an MLE. I graduated from NYU with honors in CS and was awarded the Prize for Outstanding Performance in CS.

Through my previous work on CodeLlama and Llama 2 Long, I became very interested in AI systems, especially in developing efficient algorithms and systems for large-scale training and inference. I’m also intrigued by how we can improve model alignment and understand the science behind these foundation models.

{first_name}6 AT uchicago DOT edu

Feel free to email me about anything, especially potential collaborations!

Interests

Large language models & NLP
AI systems
Science of foundation models

Education

PhD Student in CS, 2024 - Present
University of Chicago
MS in Computer Science, 2024
ETH Zurich
BA in Computer Science with Honors, 2020
New York University

Updates

[2025.11] TiDAR at Nvidia is out! As a sequence-level hybrid model that conducts parallel diffusion drafting and autoregressive sampling in a single forward pass, TiDAR is the first architecture to close the quality gap with AR models while delivering 4.71x to 5.91x more tokens per second. Stay tuned for the SGLang inference code release.

[2025.05] We introduce HAMburger, a new model that redefines resource allocation for LLMs by generating multiple tokens per step with a single KV cache.

[2025.05] Speculative Prefill was accepted to ICML 2025! Feel free to try our code here.

[2025.03] I will join the Inference Optimization team at Nvidia as a research scientist intern in summer 2025.

[2025.02] We released new work called Speculative Prefill, which improves LLM inference time-to-first-token (TTFT) and increases maximal QPS! Feel free to check out the paper and code.

[2024.10] Our survey paper was accepted to TMLR 2025!

[2024.09] I’m starting my PhD at UChicago, working with Professor Ce Zhang.

[2024.08] Our paper was accepted to WACV 2025!

Experience

Research Scientist Intern

Nvidia

June 2025 – December 2025 Santa Clara, CA

Research on Diffusion LLMs:

Scaling different types of DLLMs
Model distillation
Developing adaptive caching mechanisms
Hybrid architecture

AI Resident

Meta AI

September 2022 – September 2023 Menlo Park, CA

Research on large language models:

CodeLlama: SOTA open-source code generation LLMs
Llama 2 Long: effective context length extension of Llama 2 up to 32K

Research on 3D computer vision:

Semantic 3D indoor scene synthesis, reasoning, and planning
Text-guided 3D human generation

Research Assistant

ETH Zurich

March 2022 – November 2022 Zurich, Switzerland

Student research assistant working on offline reinforcement learning algorithms that train with a mixture of trajectories sampled from multiple demonstrators.

Machine Learning Engineer

ByteDance

August 2020 – August 2021 Beijing, China

Worked on the search engine for Douyin’s e-commerce platform from its early stages, including the search index, data pipeline, retrieval module, and deep ranking models.

Teaching Assistant

Courant Institute, New York University

September 2018 – May 2019 New York, NY

Tutored students in computer systems organization.

Papers

TiDAR: Think in Diffusion, Talk in Autoregression

Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due …

Jingyu Liu, Xin Dong, Zhifan Ye, Rishabh Mehta, Yonggan Fu, Vartika Singh, Jan Kautz, Ce Zhang, Pavlo Molchanov

HAMburger: Accelerating LLM Inference via Token Smashing

The growing demand for efficient Large Language Model (LLM) inference requires a holistic optimization on algorithms, systems, and …

Jingyu Liu, Ce Zhang

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation

Improving time-to-first-token (TTFT) is an essentially important objective in modern large language model (LLM) inference engines. …

Jingyu Liu, Beidi Chen, Ce Zhang

How Far Are We From AGI?

The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple …

Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You

Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning

This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied agents’ abilities in interactive 3D indoor …

Rao Fu, Jingyu Liu, Xilun Chen, Yixin Nie, Wenhan Xiong

Academic Service

Reviewer for ICML 2025

Reviewer for How Far Are We From AGI @ ICLR 2024

Reviewer for Long-Context Foundation Models (LCFM) @ ICML 2024

Miscellaneous

I was first trained as a game designer at NYU Game Center during my undergrad and became increasingly interested in CS and AI. Even so, I’m still very interested in game dev, physically based rendering, and game AI.

In my free time, I enjoy playing chess (my favorite live-stream) and electric guitar (my favorite instrumental band), and I recently got obsessed with golf (a group of chilled golfers).