
01

LLM Optimization Using QLoRA and AWQ

Optimized Flan-T5 (Base, Large, XL) models on the DialogSum dialogue-summarization dataset using QLoRA and AWQ: 4-bit quantization with low-rank adapters (QLoRA) for memory-efficient fine-tuning, and activation-aware weight quantization (AWQ) for inference. Achieved up to 40% GPU memory reduction with QLoRA and faster inference with AWQ, enabling efficient LLM deployment on resource-constrained systems. Sketches of both setups follow the tool list below.

  • Python
  • HuggingFace
  • Google Colab
  • Kaggle
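
A minimal QLoRA sketch of the fine-tuning setup, assuming the Hugging Face transformers, peft, bitsandbytes, and datasets libraries. The LoRA rank, alpha, dropout, and target modules here are illustrative assumptions rather than the project's exact hyperparameters.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "google/flan-t5-base"  # the project also ran -large and -xl variants

# 4-bit NF4 quantization with double quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on T5's attention projections ("q" and "v" modules);
# r, alpha, and dropout are assumed values for illustration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# DialogSum via its commonly used Hugging Face Hub mirror (an assumption)
dataset = load_dataset("knkarthick/dialogsum")
```

Training can then proceed with a standard Seq2SeqTrainer; since the frozen base weights sit in 4-bit precision and only the small adapter matrices receive gradients, this is where the GPU memory savings come from.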
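
For the AWQ side, a sketch of the standard AutoAWQ workflow. Note that AutoAWQ's high-level API targets decoder-only (causal) models, so applying it to an encoder-decoder like Flan-T5 may require additional adaptation; the checkpoint paths below are hypothetical placeholders, not the project's exact setup.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/finetuned-model"   # hypothetical merged fine-tuned checkpoint
quant_path = "path/to/awq-quantized"     # hypothetical output directory

# 4-bit weight quantization with group size 128 (common AWQ defaults)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate on sample activations, then quantize the weights in place
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The calibration pass is what makes the method "activation-aware": per-channel scales derived from observed activations protect the most salient weights during quantization, preserving accuracy while the 4-bit kernels deliver the inference speedup.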