Research
2026
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2026 (accepted, first author)
2024
3rd International Conference on AI for IoT (AIIoT)
[paper]
2023
IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC)
[paper]
2022
8th International Conference on Signal Processing and Communication (ICSC)
[paper]
SkipPar — Hybrid CPU-GPU LLM Training Framework
Designed a co-execution paradigm overlapping GPU forward/backward passes with concurrent CPU parameter updates. Implemented a 4-thread producer-consumer pipeline with PyTorch DDP hooks. Achieved up to 17% reduction in end-to-end training time on A100/H100 GPUs with LLaMA-2 (10B) and GPT-2 (9B).
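The overlap idea in the SkipPar entry can be illustrated with a toy producer-consumer pipeline. This is a minimal sketch only, assuming plain Python threads, a bounded queue, a quadratic loss, and plain SGD. It uses two threads rather than four and no PyTorch DDP hooks, and the function name `run_pipeline` and all parameter values are hypothetical, not taken from the project.

```python
import queue
import threading


def run_pipeline(num_steps=5, lr=0.1, depth=2):
    """Toy overlap: one thread produces gradients while another applies them."""
    params = [1.0, -2.0, 3.0]               # hypothetical parameters
    grads_q = queue.Queue(maxsize=depth)    # bounded queue provides back-pressure
    applied = []                            # step ids in the order they were applied
    sentinel = None                         # marks the end of the gradient stream

    def producer():
        # Stand-in for the GPU forward/backward pass.
        # For the loss 0.5 * p^2 the gradient is simply p.
        for step in range(num_steps):
            snapshot = list(params)         # may be slightly stale, by design
            grads_q.put((step, snapshot))
        grads_q.put(sentinel)

    def consumer():
        # Stand-in for the CPU-side parameter update (plain SGD here).
        while True:
            item = grads_q.get()
            if item is sentinel:
                break
            step, grads = item
            for i, g in enumerate(grads):
                params[i] -= lr * g
            applied.append(step)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params, applied


if __name__ == "__main__":
    final_params, order = run_pipeline()
    print(order, final_params)
```

Because the producer may read parameters before the consumer has applied the previous update, the exact final values depend on thread timing. Only the order in which steps are applied is deterministic in this sketch.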
PAC-IPV — Prefetch-Aware Cache Replacement
Extended an RRIP-based LLC replacement policy in ZSim with prefetch-aware RRPV assignment. Derived a probabilistic Markov-chain analytical model validated via Monte Carlo analysis (<1.5% error).
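To make "prefetch-aware RRPV assignment" concrete, here is a minimal sketch of a single RRIP cache set, assuming 2-bit RRPVs, demand fills inserted at RRPV 2, and prefetch fills inserted at RRPV 3. The class name `RRIPSet`, the insertion values, and the rule that prefetch hits do not promote a line are illustrative assumptions, not the policy evaluated in ZSim.

```python
class RRIPSet:
    """One cache set with 2-bit RRPVs (0 = reuse expected soon, 3 = evict first)."""

    MAX_RRPV = 3

    def __init__(self, ways, demand_insert=2, prefetch_insert=3):
        self.ways = ways
        self.demand_insert = demand_insert      # assumed RRPV for demand fills
        self.prefetch_insert = prefetch_insert  # assumed RRPV for prefetch fills
        self.lines = {}                         # tag -> RRPV

    def access(self, tag, is_prefetch=False):
        """Return True on a hit, False on a miss (the line is filled on a miss)."""
        if tag in self.lines:
            if not is_prefetch:
                self.lines[tag] = 0             # only demand hits promote the line
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = self.prefetch_insert if is_prefetch else self.demand_insert
        return False

    def _evict(self):
        # Age every line until one reaches MAX_RRPV, then evict that line.
        while True:
            victim = next(
                (t for t, r in self.lines.items() if r == self.MAX_RRPV), None
            )
            if victim is not None:
                del self.lines[victim]
                return
            for t in self.lines:
                self.lines[t] += 1


if __name__ == "__main__":
    s = RRIPSet(ways=2)
    s.access("A")
    s.access("A")
    s.access("P", is_prefetch=True)
    s.access("B")
    print(sorted(s.lines))
```

In the usage above, the prefetched line `P` is evicted before the reused demand line `A`, which is the behaviour a prefetch-aware insertion is meant to encourage.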
VAJRA — Heterogeneous GPU/FPGA Edge Cluster
Architected a heterogeneous edge cluster (Raspberry Pi 5, Intel DE10 SoC FPGAs, NVIDIA Jetson Orin) for model-parallel DNN inference. Demonstrated 400M-parameter inference using 4 GB collective cluster memory.
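Model-parallel inference on memory-constrained devices requires splitting the network so that each device's share fits in its memory. The sketch below shows one simple way to do that, a greedy assignment of consecutive layers. The function name `partition_layers`, the layer sizes, and the device capacities are placeholders for illustration and do not describe the actual cluster configuration or partitioning scheme.

```python
def partition_layers(layer_sizes_mb, device_caps_mb):
    """Greedily assign consecutive layers to devices without exceeding capacity.

    Returns one list of layer indices per device.
    Raises ValueError if the layers do not fit on the given devices.
    """
    assignment = [[] for _ in device_caps_mb]
    dev = 0    # index of the device currently being filled
    used = 0   # memory already used on that device, in MB
    for idx, size in enumerate(layer_sizes_mb):
        # Advance to the next device until this layer fits.
        while dev < len(device_caps_mb) and used + size > device_caps_mb[dev]:
            dev += 1
            used = 0
        if dev == len(device_caps_mb):
            raise ValueError(f"layer {idx} does not fit on the remaining devices")
        assignment[dev].append(idx)
        used += size
    return assignment


if __name__ == "__main__":
    # Placeholder sizes and capacities in MB.
    print(partition_layers([300, 300, 200, 400, 100], [700, 500, 600]))
```

Keeping layers contiguous means each device only forwards activations to the next one, at the cost of possibly leaving some memory unused.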
[code]
Joint Resource Allocation in Vehicular Edge Networks
Formulated a joint Markov decision process (MDP) for resource allocation and service migration across multi-access edge computing (MEC) servers. Trained a modified DDPG agent that achieved a 26.67% reduction in service violations. Published at AIIoT 2024.
[code]
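For readers unfamiliar with DDPG, which the vehicular edge entry above builds on, one of its standard ingredients is the soft (Polyak) update of the target networks. The sketch below shows only that standard step on flat lists of weights. It does not represent the modifications made in the project, and the function name `soft_update` and the value of `tau` are illustrative.

```python
def soft_update(target, online, tau=0.005):
    """Polyak averaging: new_target = tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


if __name__ == "__main__":
    target_weights = [0.0, 1.0]
    online_weights = [1.0, 1.0]
    print(soft_update(target_weights, online_weights, tau=0.1))
```

A small `tau` makes the target network track the online network slowly, which is the usual way DDPG stabilises its critic targets.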