I work on efficient LLM serving (KV-cache compression/quantization, inference trade-offs) and AI4Science (equivariant graph models, molecular representation learning), with an emphasis on reproducible pipelines and rigorous evaluation.
I am a PhD student specializing in LLM applications and serving, with additional research in AI4Science. My work emphasizes end-to-end, reproducible systems: from algorithm design to careful benchmarking and deployment-aware evaluation.
Selected projects, grouped into (1) LLM applications & serving and (2) AI4Science.
Industry & academic experience focused on LLM systems, ML research, and reproducible engineering.
A snapshot of the tools I use and the areas I work in most frequently (grouped for quick scanning).
Feel free to reach out to collaborate on LLM serving, efficient inference, or AI4Science.
jiayuqin@buffalo.edu