# EcoPrompt
Cloud-native middleware that optimizes LLM prompts to reduce token usage, cost, and energy consumption without modifying existing applications.

## Project Overview
EcoPrompt acts as a middleware layer that processes and optimizes prompts before they are sent to large language models. By removing redundancy, compressing input, and avoiding repeated requests, it improves efficiency without requiring changes to upstream systems.
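Conceptually, such a middleware reduces to composing optimization stages that each take a prompt and return a (possibly shorter) prompt. A minimal sketch, with hypothetical stage names that stand in for EcoPrompt's actual components:

```python
from typing import Callable, List

# A stage maps a prompt string to an optimized prompt string.
Stage = Callable[[str], str]

def build_pipeline(stages: List[Stage]) -> Stage:
    """Compose optimization stages, applied left to right."""
    def run(prompt: str) -> str:
        for stage in stages:
            prompt = stage(prompt)
        return prompt
    return run

# Illustrative stages (assumptions, not EcoPrompt's real rule set):
def collapse_whitespace(prompt: str) -> str:
    return " ".join(prompt.split())

def truncate_history(prompt: str, limit: int = 2000) -> str:
    # Keep only the most recent `limit` characters of long conversations.
    return prompt[-limit:]

optimize = build_pipeline([collapse_whitespace, truncate_history])
```

Because each stage shares the same signature, stages can be reordered, disabled, or A/B-tested without touching upstream applications.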
## Key Highlights
- Designed a cloud-native microservices architecture for prompt optimization and analytics
- Built a multi-stage pipeline combining rule-based cleaning with T5 (LoRA) compression
- Implemented semantic caching using TF-IDF similarity to eliminate redundant LLM calls
- Deployed services with Docker and Kubernetes, including autoscaling
- Tracked token savings, cost reduction, and estimated CO₂ impact through an analytics service
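The TF-IDF semantic cache from the highlights above can be sketched in pure Python. The 0.8 similarity threshold and the smoothed-IDF weighting are illustrative assumptions; a production service would more likely use a vectorizer library:

```python
import math
import re
from collections import Counter

def _tfidf_vectors(docs):
    """Build smoothed TF-IDF weight dicts for a list of raw strings."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    n = len(docs)
    vectors = []
    for toks in tokenized:
        total = len(toks) or 1
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in Counter(toks).items()
        })
    return vectors

def _cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Serve a cached response when a new prompt is similar enough
    to one already answered, skipping the LLM call entirely."""

    def __init__(self, threshold=0.8):  # threshold is a tunable assumption
        self.threshold = threshold
        self.entries = []  # (prompt, response) pairs

    def lookup(self, prompt):
        if not self.entries:
            return None
        # Vectorize cached prompts together with the query so the
        # IDF statistics cover the whole corpus.
        vectors = _tfidf_vectors([p for p, _ in self.entries] + [prompt])
        query = vectors[-1]
        sims = [_cosine(v, query) for v in vectors[:-1]]
        best = max(range(len(sims)), key=sims.__getitem__)
        return self.entries[best][1] if sims[best] >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((prompt, response))
```

A near-duplicate prompt then hits the cache even when it is not an exact string match, which is what eliminates the redundant LLM calls.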
## Problem
LLM-based applications often send verbose and repetitive prompts, leading to unnecessary token consumption, higher API costs, and increased energy usage. At scale, this inefficiency significantly impacts system performance and cost.
## Solution
EcoPrompt introduces a multi-stage pipeline that combines rule-based cleaning, AI-based compression using T5 fine-tuned with LoRA, and semantic caching via TF-IDF similarity. This keeps prompts concise and relevant and prevents redundant requests from being processed twice.
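The rule-based cleaning stage might look like the following sketch. The filler-phrase list and the consecutive-duplicate-sentence rule are illustrative assumptions, not the project's actual rule set:

```python
import re

# Hypothetical filler phrases that add tokens without adding signal.
FILLER_PHRASES = [
    r"\bplease\b",
    r"\bkindly\b",
    r"\bcould you\b",
    r"\bi would like you to\b",
]

def clean_prompt(prompt: str) -> str:
    """Strip filler phrases, collapse whitespace, and drop
    consecutive duplicate sentences."""
    text = prompt
    for pattern in FILLER_PHRASES:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    # Deduplicate back-to-back identical sentences (case-insensitive).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    deduped = []
    for s in sentences:
        if not deduped or s.lower() != deduped[-1].lower():
            deduped.append(s)
    return " ".join(deduped)
```

Rules like these are cheap to run before the heavier T5 compression stage, so the model only sees prompts that deterministic cleanup could not shrink further.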
## Outcome
The system achieved measurable token reduction and cost savings while maintaining output quality. The project demonstrates applied skills in distributed systems, cloud-native deployment, and AI-driven prompt optimization.