LLM Application & Serving · AI4Science

Jiayu Qin (秦嘉雨)

I work on efficient LLM serving (KV-cache compression/quantization, inference trade-offs) and AI4Science (equivariant graph models, molecular representation learning), with an emphasis on reproducible pipelines and rigorous evaluation.

KV-cache · Serving Optimization · PyTorch · CUDA/Triton · SE(3) Equivariance · Molecular ML

Professional Summary

I am a PhD student specializing in LLM application and serving, with additional research in AI4Science. My work emphasizes end-to-end, reproducible systems: from algorithm design to careful benchmarking and deployment-aware evaluation.

LLM Application & Serving

KV-cache compression / quantization · inference latency–accuracy trade-offs · production-style evaluation
  • Design compression and quantization methods for the KV cache to reduce memory footprint and improve throughput (see the sketch after this list).
  • Build workload-aware evaluation pipelines and ablation suites to measure latency, memory, and quality trade-offs.
  • Engineer reproducible codebases (PyTorch/CUDA/Triton) for rapid iteration and clean integration with serving stacks.
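
To make the KV-cache work concrete, here is a minimal sketch of symmetric per-channel int8 quantization of a cached key/value tensor. The function names, tensor layout, and scale granularity are illustrative assumptions, not the actual method or codebase.

    import torch

    def quantize_kv(kv: torch.Tensor, eps: float = 1e-8):
        """Symmetric per-channel int8 quantization of a KV-cache slice.

        kv: (batch, heads, seq_len, head_dim) float tensor.
        Returns int8 codes plus per-channel scales for dequantization.
        """
        # One scale per head_dim channel, shared across cached tokens.
        scale = kv.abs().amax(dim=-2, keepdim=True).clamp_min(eps) / 127.0
        codes = torch.round(kv / scale).clamp_(-127, 127).to(torch.int8)
        return codes, scale

    def dequantize_kv(codes, scale):
        # Reconstruct an approximate float cache just before attention.
        return codes.to(scale.dtype) * scale

    kv = torch.randn(1, 8, 128, 64)             # toy cache: 8 heads, 128 tokens
    codes, scale = quantize_kv(kv)
    err = (dequantize_kv(codes, scale) - kv).abs().mean()
    print(f"mean abs reconstruction error: {err:.5f}")

Dequantizing just before attention trades a small reconstruction error for roughly a 4x memory reduction on float32 caches (2x on float16).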

AI4Science

SE(3) equivariant modeling · molecular representation learning · multimodal biological knowledge
  • Develop SE(3)-equivariant sequence/graph models for molecular property prediction and structure-aware learning.
  • Study molecule–protein interaction prediction via OT-based pseudo-labeling and multimodal knowledge graphs.
  • Research robust graph contrastive learning to mitigate false positives in self-supervised molecular learning.

Research Projects

Selected projects grouped by (1) LLM application & serving and (2) AI4Science.

LLM Application & Serving

Efficiency · System-aware evaluation · Inference optimization

Dynamic KV-Cache Compression

KV-cache · low-rank compression · serving-aware benchmarks
  • Develop dynamic compression strategies for the KV cache to improve throughput under memory constraints (toy illustration below).
  • Evaluate latency/memory/quality trade-offs across sequence lengths and batch settings.
  • Implement reproducible pipelines for ablation, profiling, and deployment-relevant metrics.
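
As a toy illustration of the low-rank direction (a simplified stand-in, not the project's algorithm), one can factor a cache slice with a truncated SVD and keep only the leading factors:

    import torch

    def lowrank_compress(kv: torch.Tensor, rank: int):
        """Factor a (seq_len, head_dim) cache slice into rank-r pieces."""
        U, S, Vh = torch.linalg.svd(kv, full_matrices=False)
        # Keep only the leading factors: (seq_len, r) and (r, head_dim).
        return U[:, :rank] * S[:rank], Vh[:rank]

    def lowrank_decompress(A, B):
        return A @ B

    kv = torch.randn(2048, 64)                  # toy: 2048 cached tokens
    A, B = lowrank_compress(kv, rank=16)
    rel_err = torch.linalg.norm(kv - lowrank_decompress(A, B)) / torch.linalg.norm(kv)
    ratio = kv.numel() / (A.numel() + B.numel())
    print(f"~{ratio:.1f}x smaller, relative error {rel_err:.3f}")

Random data compresses poorly, of course; the empirically interesting question is how close real caches come to low rank at a given quality budget.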

FinTOA: Financial Topic Attention

LLM-assisted topic pipelines · interpretable forecasting · large-scale text processing
  • Construct topic-attention signals from large financial news corpora for macro/asset forecasting.
  • Build end-to-end pipelines: embedding → topic labeling/refinement → evaluation (out-of-sample R², Diebold–Mariano and Clark–West tests; see the sketch below).
  • Emphasize interpretability, reproducibility, and scalable preprocessing.
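
For reference, out-of-sample R² compares a model's squared forecast errors against a benchmark forecast, typically the expanding-window historical mean; a minimal sketch with toy numbers:

    import numpy as np

    def oos_r2(y_true, y_pred, y_bench):
        """Out-of-sample R^2 of a forecast relative to a benchmark.

        Positive values mean the model beats the benchmark (e.g. the
        expanding-window historical mean) out of sample.
        """
        sse_model = np.sum((y_true - y_pred) ** 2)
        sse_bench = np.sum((y_true - y_bench) ** 2)
        return 1.0 - sse_model / sse_bench

    y = np.array([0.5, -0.2, 0.1, 0.4])
    y_hat = np.array([0.4, -0.1, 0.0, 0.3])
    print(oos_r2(y, y_hat, np.full_like(y, y.mean())))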

AI4Science

Equivariance · Molecular ML · Learning with biological knowledge

GeoMamba-SE(3)

SE(3) equivariant modeling · Mamba-style sequence/graph architecture
  • Design SE(3)-equivariant Mamba-style models for molecular learning (equivariance sanity check sketched below).
  • Study inductive biases that improve generalization and sample efficiency.
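
A sanity check I find useful in this setting: an invariant readout must be unchanged under random rigid motions of the input coordinates. The sketch below uses a toy pairwise-distance "model" as a placeholder for an actual network:

    import torch

    def random_rotation():
        # QR of a Gaussian matrix yields a random orthogonal Q.
        Q, R = torch.linalg.qr(torch.randn(3, 3))
        if torch.linalg.det(Q) < 0:             # force det(Q) = +1
            Q[:, 0] = -Q[:, 0]
        return Q

    def check_invariance(model, pos, atol=1e-4):
        """Check f(pos @ R.T + t) == f(pos) for an invariant readout f."""
        R, t = random_rotation(), torch.randn(3)
        return torch.allclose(model(pos), model(pos @ R.T + t), atol=atol)

    # Toy invariant "model": sum of pairwise distances.
    model = lambda pos: torch.cdist(pos, pos).sum()
    print(check_invariance(model, torch.randn(10, 3)))   # expect True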

KGOT: OT Pseudo-Labeling for MPI

Molecule–protein interaction · optimal transport · multimodal knowledge graphs
  • Leverage OT-based pseudo-labeling to reduce annotation cost in MPI tasks (Sinkhorn sketch below).
  • Integrate multimodal biological knowledge graphs with strong molecular backbones.
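
The pseudo-labeling step can be illustrated with a plain entropic Sinkhorn iteration that turns a molecule–protein similarity matrix into a soft assignment plan; this is a simplified stand-in for the actual method, with uniform marginals assumed:

    import torch

    def sinkhorn(sim: torch.Tensor, n_iters: int = 50, eps: float = 0.1):
        """Entropic OT: softly match rows (molecules) to columns (proteins).

        sim: (n_mol, n_prot) similarity scores. Returns a transport plan
        whose rows/columns approach uniform marginals; its entries can be
        read as soft pseudo-labels.
        """
        K = torch.exp(sim / eps)                # Gibbs kernel
        u = torch.full((sim.size(0),), 1.0 / sim.size(0))
        v = torch.full((sim.size(1),), 1.0 / sim.size(1))
        a, b = torch.ones_like(u), torch.ones_like(v)
        for _ in range(n_iters):                # alternating scaling
            a = u / (K @ b)
            b = v / (K.T @ a)
        return a[:, None] * K * b[None, :]

    plan = sinkhorn(torch.randn(5, 7))
    print(plan.sum(dim=1))                      # rows ≈ 0.2 each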

Probability-Based Graph Contrastive Learning

Self-supervised learning · false-pair mitigation · robust objectives
  • Address false-positive pairs in graph contrastive learning with probabilistic weighting (sketched below).
  • Improve stability and performance for molecular representation learning.
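
A minimal sketch of the weighting idea, in a hypothetical form rather than the paper's exact objective: down-weight negatives in InfoNCE by an estimated probability of being a false pair, here a simple sigmoid-of-similarity heuristic:

    import torch
    import torch.nn.functional as F

    def weighted_info_nce(z1, z2, tau: float = 0.2):
        """InfoNCE over two views with probabilistic down-weighting of
        likely false negatives (high-similarity off-diagonal pairs)."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        sim = z1 @ z2.T / tau                   # (N, N) similarity logits
        n = sim.size(0)
        eye = torch.eye(n, dtype=torch.bool)
        # Stand-in estimate of P(negative is actually a false pair).
        p_false = torch.sigmoid(sim.detach()).masked_fill(eye, 0.0)
        weights = (1.0 - p_false).clamp_min(1e-8)  # positives keep weight 1
        logits = sim + torch.log(weights)       # re-weight negative terms
        return F.cross_entropy(logits, torch.arange(n))

    loss = weighted_info_nce(torch.randn(32, 128), torch.randn(32, 128))
    print(loss.item())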

Experience

Industry & academic experience focused on LLM systems, ML research, and reproducible engineering.

Research Scientist Intern · Harvard University

05/2024 – 08/2025 · Boston, MA
  • Built reproducible pipelines for research prototyping and large-scale evaluation; collaborated closely with cross-functional teams.
  • Developed ML components with careful ablations and deployment-oriented performance analysis.

PhD Researcher · University at Buffalo

2023 – Present · Buffalo, NY
  • Conduct research on LLM serving efficiency and AI4Science; publish and iterate through regular advisor feedback cycles.
  • Maintain clean, versioned codebases for experiments, figures, and benchmarking scripts.

Teaching Assistant · UB CSE

CSE 418/518 (Software Security) · grading, rubrics, and structured feedback
  • Designed grading rubrics and templates; provided structured feedback and reproducible evaluation artifacts.
  • Supported course operations via office hours, Piazza, and project review.

Technical Skills

A snapshot of tools and areas I use most frequently (grouped for quick scanning).

LLM Serving & Inference

Efficiency · memory optimization · benchmarking
KV-cache compression · Quantization (PTQ) · Profiling & benchmarking · Latency/throughput trade-offs

AI4Science & Modeling

Equivariance · graphs · molecular learning
SE(3) equivariant models · GNNs / geometric DL · Optimal transport · Molecule–protein interaction

Systems & Tooling

Engineering for iteration and reliability
Python · PyTorch · CUDA / Triton · Linux · Git · Docker

Research Stack

Reproducibility · experiments · writing
Experiment tracking · Ablations & evaluation · LaTeX · Data pipelines

Contact

Feel free to reach out for collaborations on LLM serving, efficient inference, or AI4Science.

Email

Fastest way to reach me

jiayuqin@buffalo.edu

Links

Profiles & updates