
01

LLM Optimization Using QLoRA and AWQ

Optimized Flan-T5 (Base, Large, XL) models on the DialogSum dialogue-summarization dataset using QLoRA and AWQ: 4-bit quantization with low-rank adapters (QLoRA) for memory-efficient fine-tuning, and activation-aware weight quantization (AWQ) for inference. Achieved up to 40% GPU memory reduction with QLoRA and faster inference with AWQ, enabling efficient LLM deployment on resource-constrained systems. Sketches of both setups follow the tool list below.

  • Python
  • HuggingFace
  • Google Colab
  • Kaggle
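
A minimal QLoRA sketch of the fine-tuning setup, assuming the Hugging Face transformers, peft, bitsandbytes, and datasets libraries. The LoRA rank, alpha, dropout, and target modules here are illustrative assumptions rather than the project's exact hyperparameters.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "google/flan-t5-base"  # the project also ran -large and -xl variants

# 4-bit NF4 quantization with double quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on T5's attention projections ("q" and "v" modules);
# r, alpha, and dropout are assumed values for illustration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# DialogSum via its commonly used Hugging Face Hub mirror (an assumption)
dataset = load_dataset("knkarthick/dialogsum")
```

Training can then proceed with a standard Seq2SeqTrainer; since the frozen base weights sit in 4-bit precision and only the small adapter matrices receive gradients, this is where the GPU memory savings come from.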
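
For the AWQ side, a sketch of the standard AutoAWQ workflow. Note that AutoAWQ's high-level API targets decoder-only (causal) models, so applying it to an encoder-decoder like Flan-T5 may require additional adaptation; the checkpoint paths below are hypothetical placeholders, not the project's exact setup.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/finetuned-model"   # hypothetical merged fine-tuned checkpoint
quant_path = "path/to/awq-quantized"     # hypothetical output directory

# 4-bit weight quantization with group size 128 (common AWQ defaults)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate on sample activations, then quantize the weights in place
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The calibration pass is what makes the method "activation-aware": per-channel scales derived from observed activations protect the most salient weights during quantization, preserving accuracy while the 4-bit kernels deliver the inference speedup.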