LLM Application & Serving · AI4Science

Jiayu Qin (秦嘉雨)

I work on efficient LLM serving (KV-cache compression/quantization, inference trade-offs) and AI4Science (equivariant graph models, molecular representation learning), with an emphasis on reproducible pipelines and rigorous evaluation.

KV-cache · Serving Optimization · PyTorch · CUDA/Triton · SE(3) Equivariance · Molecular ML

Professional Summary

I am a PhD student specializing in LLM application and serving, with additional research in AI4Science. My work emphasizes end-to-end, reproducible systems: from algorithm design to careful benchmarking and deployment-aware evaluation.

LLM Application & Serving

KV-cache compression / quantization · inference latency–accuracy trade-offs · production-style evaluation
  • Design compression and quantization methods for the KV cache to reduce memory footprint and improve throughput (see the sketch after this list).
  • Build workload-aware evaluation pipelines and ablation suites to measure latency, memory, and quality trade-offs.
  • Engineer reproducible codebases (PyTorch/CUDA/Triton) for rapid iteration and clean integration with serving stacks.
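
To make the KV-cache work concrete, here is a minimal sketch of symmetric per-channel int8 quantization of a cached key/value tensor. The function names, tensor layout, and scale granularity are illustrative assumptions, not the actual method or codebase.

    import torch

    def quantize_kv(kv: torch.Tensor, eps: float = 1e-8):
        """Symmetric per-channel int8 quantization of a KV-cache slice.

        kv: (batch, heads, seq_len, head_dim) float tensor.
        Returns int8 codes plus per-channel scales for dequantization.
        """
        # One scale per head_dim channel, shared across cached tokens.
        scale = kv.abs().amax(dim=-2, keepdim=True).clamp_min(eps) / 127.0
        codes = torch.round(kv / scale).clamp_(-127, 127).to(torch.int8)
        return codes, scale

    def dequantize_kv(codes, scale):
        # Reconstruct an approximate float cache just before attention.
        return codes.to(scale.dtype) * scale

    kv = torch.randn(1, 8, 128, 64)             # toy cache: 8 heads, 128 tokens
    codes, scale = quantize_kv(kv)
    err = (dequantize_kv(codes, scale) - kv).abs().mean()
    print(f"mean abs reconstruction error: {err:.5f}")

Dequantizing just before attention trades a small reconstruction error for roughly a 4x memory reduction on float32 caches (2x on float16).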

AI4Science

SE(3) equivariant modeling · molecular representation learning · multimodal biological knowledge
  • Develop SE(3)-equivariant sequence/graph models for molecular property prediction and structure-aware learning.
  • Study molecule–protein interaction prediction via OT-based pseudo-labeling and multimodal knowledge graphs.
  • Research robust graph contrastive learning to mitigate false positives in self-supervised molecular learning.

Research Projects

Selected projects grouped by (1) LLM application & serving and (2) AI4Science.

LLM Application & Serving

Efficiency · System-aware evaluation · Inference optimization

Dynamic KV-Cache Compression

KV-cache · low-rank compression · serving-aware benchmarks
  • Develop dynamic compression strategies for the KV cache to improve throughput under memory constraints (toy illustration below).
  • Evaluate latency/memory/quality trade-offs across sequence lengths and batch settings.
  • Implement reproducible pipelines for ablation, profiling, and deployment-relevant metrics.
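
As a toy illustration of the low-rank direction (a simplified stand-in, not the project's algorithm), one can factor a cache slice with a truncated SVD and keep only the leading factors:

    import torch

    def lowrank_compress(kv: torch.Tensor, rank: int):
        """Factor a (seq_len, head_dim) cache slice into rank-r pieces."""
        U, S, Vh = torch.linalg.svd(kv, full_matrices=False)
        # Keep only the leading factors: (seq_len, r) and (r, head_dim).
        return U[:, :rank] * S[:rank], Vh[:rank]

    def lowrank_decompress(A, B):
        return A @ B

    kv = torch.randn(2048, 64)                  # toy: 2048 cached tokens
    A, B = lowrank_compress(kv, rank=16)
    rel_err = torch.linalg.norm(kv - lowrank_decompress(A, B)) / torch.linalg.norm(kv)
    ratio = kv.numel() / (A.numel() + B.numel())
    print(f"~{ratio:.1f}x smaller, relative error {rel_err:.3f}")

Random data compresses poorly, of course; the empirically interesting question is how close real caches come to low rank at a given quality budget.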

FinTOA: Financial Topic Attention

LLM-assisted topic pipelines · interpretable forecasting · large-scale text processing
  • Construct topic-attention signals from large financial news corpora for macro/asset forecasting.
  • Build end-to-end pipelines: embedding → topic labeling/refinement → evaluation (out-of-sample R², Diebold–Mariano and Clark–West tests; see the sketch below).
  • Emphasize interpretability, reproducibility, and scalable preprocessing.
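
For reference, out-of-sample R² compares a model's squared forecast errors against a benchmark forecast, typically the expanding-window historical mean; a minimal sketch with toy numbers:

    import numpy as np

    def oos_r2(y_true, y_pred, y_bench):
        """Out-of-sample R^2 of a forecast relative to a benchmark.

        Positive values mean the model beats the benchmark (e.g. the
        expanding-window historical mean) out of sample.
        """
        sse_model = np.sum((y_true - y_pred) ** 2)
        sse_bench = np.sum((y_true - y_bench) ** 2)
        return 1.0 - sse_model / sse_bench

    y = np.array([0.5, -0.2, 0.1, 0.4])
    y_hat = np.array([0.4, -0.1, 0.0, 0.3])
    print(oos_r2(y, y_hat, np.full_like(y, y.mean())))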

AI4Science

Equivariance · Molecular ML · Learning with biological knowledge

GeoMamba-SE(3)

SE(3) equivariant modeling · Mamba-style sequence/graph architecture
  • Design SE(3)-equivariant Mamba-style models for molecular learning (equivariance sanity check sketched below).
  • Study inductive biases that improve generalization and sample efficiency.
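
A sanity check I find useful in this setting: an invariant readout must be unchanged under random rigid motions of the input coordinates. The sketch below uses a toy pairwise-distance "model" as a placeholder for an actual network:

    import torch

    def random_rotation():
        # QR of a Gaussian matrix yields a random orthogonal Q.
        Q, R = torch.linalg.qr(torch.randn(3, 3))
        if torch.linalg.det(Q) < 0:             # force det(Q) = +1
            Q[:, 0] = -Q[:, 0]
        return Q

    def check_invariance(model, pos, atol=1e-4):
        """Check f(pos @ R.T + t) == f(pos) for an invariant readout f."""
        R, t = random_rotation(), torch.randn(3)
        return torch.allclose(model(pos), model(pos @ R.T + t), atol=atol)

    # Toy invariant "model": sum of pairwise distances.
    model = lambda pos: torch.cdist(pos, pos).sum()
    print(check_invariance(model, torch.randn(10, 3)))   # expect True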

KGOT: OT Pseudo-Labeling for MPI

Molecule–protein interaction · optimal transport · multimodal knowledge graphs
  • Leverage OT-based pseudo-labeling to reduce annotation cost in MPI tasks (Sinkhorn sketch below).
  • Integrate multimodal biological knowledge graphs with strong molecular backbones.
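
The pseudo-labeling step can be illustrated with a plain entropic Sinkhorn iteration that turns a molecule–protein similarity matrix into a soft assignment plan; this is a simplified stand-in for the actual method, with uniform marginals assumed:

    import torch

    def sinkhorn(sim: torch.Tensor, n_iters: int = 50, eps: float = 0.1):
        """Entropic OT: softly match rows (molecules) to columns (proteins).

        sim: (n_mol, n_prot) similarity scores. Returns a transport plan
        whose rows/columns approach uniform marginals; its entries can be
        read as soft pseudo-labels.
        """
        K = torch.exp(sim / eps)                # Gibbs kernel
        u = torch.full((sim.size(0),), 1.0 / sim.size(0))
        v = torch.full((sim.size(1),), 1.0 / sim.size(1))
        a, b = torch.ones_like(u), torch.ones_like(v)
        for _ in range(n_iters):                # alternating scaling
            a = u / (K @ b)
            b = v / (K.T @ a)
        return a[:, None] * K * b[None, :]

    plan = sinkhorn(torch.randn(5, 7))
    print(plan.sum(dim=1))                      # rows ≈ 0.2 each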

Probability-Based Graph Contrastive Learning

Self-supervised learning · false-pair mitigation · robust objectives
  • Address false-positive pairs in graph contrastive learning with probabilistic weighting (sketched below).
  • Improve stability and performance for molecular representation learning.
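
A minimal sketch of the weighting idea, in a hypothetical form rather than the paper's exact objective: down-weight negatives in InfoNCE by an estimated probability of being a false pair, here a simple sigmoid-of-similarity heuristic:

    import torch
    import torch.nn.functional as F

    def weighted_info_nce(z1, z2, tau: float = 0.2):
        """InfoNCE over two views with probabilistic down-weighting of
        likely false negatives (high-similarity off-diagonal pairs)."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        sim = z1 @ z2.T / tau                   # (N, N) similarity logits
        n = sim.size(0)
        eye = torch.eye(n, dtype=torch.bool)
        # Stand-in estimate of P(negative is actually a false pair).
        p_false = torch.sigmoid(sim.detach()).masked_fill(eye, 0.0)
        weights = (1.0 - p_false).clamp_min(1e-8)  # positives keep weight 1
        logits = sim + torch.log(weights)       # re-weight negative terms
        return F.cross_entropy(logits, torch.arange(n))

    loss = weighted_info_nce(torch.randn(32, 128), torch.randn(32, 128))
    print(loss.item())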

Experience

Industry & academic experience focused on LLM systems, ML research, and reproducible engineering.

Research Scientist Intern · Harvard University

05/2024 – 08/2025 · Boston, MA
  • Built reproducible pipelines for research prototyping and large-scale evaluation; collaborated closely with cross-functional teams.
  • Developed ML components with careful ablations and deployment-oriented performance analysis.

PhD Researcher · University at Buffalo

2023 – Present · Buffalo, NY
  • Conduct research on LLM serving efficiency and AI4Science; publish and iterate through regular advisor feedback cycles.
  • Maintain clean, versioned codebases for experiments, figures, and benchmarking scripts.

Teaching Assistant · UB CSE

CSE 418/518 (Software Security) · grading, rubrics, and structured feedback
  • Designed grading rubrics and templates; provided structured feedback and reproducible evaluation artifacts.
  • Supported course operations via office hours, Piazza, and project review.

Technical Skills

A snapshot of tools and areas I use most frequently (grouped for quick scanning).

LLM Serving & Inference

Efficiency · memory optimization · benchmarking
KV-cache compression · Quantization (PTQ) · Profiling & benchmarking · Latency/throughput trade-offs

AI4Science & Modeling

Equivariance · graphs · molecular learning
SE(3) equivariant models · GNNs / geometric DL · Optimal transport · Molecule–protein interaction

Systems & Tooling

Engineering for iteration and reliability
Python · PyTorch · CUDA / Triton · Linux · Git · Docker

Research Stack

Reproducibility · experiments · writing
Experiment tracking · Ablations & evaluation · LaTeX · Data pipelines

Contact

Feel free to reach out for collaborations on LLM serving, efficient inference, or AI4Science.

Email

Fastest way to reach me

jiayuqin@buffalo.edu

Links

Profiles & updates