NLP to SQL Queries
Turns plain-English questions into SQL queries that run against a database, served by an application deployed on AWS using EKS.
GitHub Repository: View on GitHub
Introduction
This project demonstrates how to build and deploy a FastAPI application that translates natural language questions into SQL queries and runs them against a dataset hosted in Amazon Athena. The entire pipeline is containerized, stored in Amazon ECR, and deployed on an AWS EKS cluster, accessible via a public endpoint.
Problem
Data analysts and non-technical stakeholders often struggle to query data if they don’t know SQL. A natural language interface can bridge this gap, allowing anyone to get insights from data using plain English. In this service, users can type a question in plain English (e.g., “Show me all products ordered in July”), and the system generates the corresponding SQL query, executes it against the database, and returns the results in a CSV file - all without touching SQL.
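The flow described above (question in, SQL generated, query executed, CSV returned) can be sketched as a single function. This is a minimal illustration rather than the project's actual code: `answer_as_csv`, `fake_translate`, and `fake_execute` are hypothetical names, and the two stand-in functions take the place of the OpenAI call and the Athena query so the example runs on its own.

```python
import csv
import io
from typing import Callable, Iterable, Sequence

# Hypothetical type aliases for this sketch (not taken from the project).
Translator = Callable[[str], str]                       # question -> SQL text
Executor = Callable[[str], Iterable[Sequence[object]]]  # SQL -> rows, header row first


def answer_as_csv(question: str, translate: Translator, execute: Executor) -> str:
    """Translate a question to SQL, run it, and return the rows as CSV text."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    sql = translate(question)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in execute(sql):
        writer.writerow(row)
    return buffer.getvalue()


# Stand-ins for the OpenAI call and the Athena query, for illustration only.
def fake_translate(question: str) -> str:
    return "SELECT product FROM orders WHERE month = 7"


def fake_execute(sql: str):
    return [("product",), ("widget",), ("gadget",)]


csv_text = answer_as_csv(
    "Show me all products ordered in July", fake_translate, fake_execute
)
print(csv_text)
```

Passing the translator and executor in as arguments keeps the pipeline testable without network access or cloud credentials.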
Approach
- Application Development
  - Built with FastAPI to handle HTTP requests.
  - Integrated OpenAI API to translate natural language into SQL queries.
  - Connected to Amazon Athena to execute SQL queries on S3-hosted data.
- Containerization
  - Created a multi-architecture Docker image supporting amd64 and arm64.
  - Stored images in Amazon ECR.
- Continuous Integration & Deployment
  - Used GitHub Actions to automate:
    - Building the Docker image.
    - Pushing it to ECR.
    - Deploying updates to EKS.
- Infrastructure
  - Deployed on an AWS EKS cluster.
  - Exposed via a LoadBalancer service to get a public URL.
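The Athena step in the list above typically follows a start, poll, fetch pattern. The sketch below assumes `client` behaves like `boto3.client("athena")`; the function name `run_athena_query` and its parameters are illustrative, and the database name and S3 output location must be supplied by the caller because the write-up does not state them.

```python
import time


def run_athena_query(client, sql, database, output_location,
                     poll_seconds=1.0, max_polls=60):
    """Start an Athena query, wait for it to finish, return rows as lists of strings.

    `client` is assumed to expose the same methods as boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # First page of results only; larger result sets need NextToken pagination.
    result = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

For a SELECT query the first returned row holds the column names, which makes the output straightforward to write out as CSV. Null cells arrive without a `VarCharValue` key and are mapped to empty strings here.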
Results
- Successfully deployed a public API where users can send natural language questions and receive SQL results.
- Achieved end-to-end automation from code commit to deployment.
- Validated scalability on AWS infrastructure.
Challenges
- GitHub Actions not triggering due to missing workflow events.
- Docker architecture mismatch (image built on an ARM Mac, deployed to amd64 AWS nodes).
- OpenAI API quota issues causing runtime errors.
- Kubernetes CrashLoopBackOff errors due to missing environment variables.
- Cost management: leaving the EKS cluster running accrues charges quickly.
How I Solved Them
- Added `on: push` triggers in GitHub Actions to automate builds.
- Used `--platform linux/amd64` in Docker builds to match AWS node architecture.
- Stored sensitive credentials (OpenAI API key, AWS creds) in GitHub Secrets and Kubernetes secrets.
- Implemented cleanup scripts to delete EKS clusters, ECR repos, and S3 data to avoid extra costs.
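One way to make a missing secret easier to diagnose than a bare CrashLoopBackOff is to check the environment at startup and fail with a single message that names every absent variable. The sketch below is an assumption about how that could look, not the project's code; the variable names in `REQUIRED_ENV_VARS` are hypothetical, since the write-up does not list the real ones.

```python
import os

# Assumed variable names; the project's actual names are not given in the write-up.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",
    "AWS_REGION",
    "ATHENA_DATABASE",
    "ATHENA_OUTPUT_LOCATION",
)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names in `required` that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


def check_environment(required=REQUIRED_ENV_VARS, environ=None):
    """Raise one error listing every missing variable so the pod log explains the crash."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `check_environment()` before the web server starts means the container still exits, but `kubectl logs` shows which secret was not mounted.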
Conclusion
This project demonstrates the full lifecycle of deploying an NLP-to-SQL application in the cloud, from local development to production deployment on AWS. It also highlights the importance of CI/CD, cross-platform containerization, and cost management.
Future Improvements
- Implement caching to reduce repeated OpenAI calls and save on API costs.
- Add authentication and rate-limiting for the public API.
- Expand support for multiple databases beyond Athena.
- Enhance natural language processing for more complex queries.
- Add monitoring and logging dashboards using AWS CloudWatch and Grafana.
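As an illustration of the caching idea in the list above, the sketch below keeps generated SQL in memory, keyed by a hash of the normalized question, so that repeated questions skip the translation call. `SqlCache` and its `translate` argument are hypothetical names, and an in-process dictionary is only one option: with several replicas on EKS, a shared store would be needed for the pods to benefit from each other's results.

```python
import hashlib


class SqlCache:
    """In-memory cache from normalized question text to generated SQL."""

    def __init__(self, translate):
        self._translate = translate  # callable: question -> SQL (e.g. the OpenAI call)
        self._store = {}
        self.misses = 0  # number of times the translator was actually called

    @staticmethod
    def _key(question):
        # Lowercase and collapse whitespace so trivial variations share one entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_sql(self, question):
        key = self._key(question)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._translate(question)
        return self._store[key]
```

A cache like this assumes the same question should always map to the same SQL, which stops holding if the table schema or the prompt changes; clearing the cache on deployment is a simple guard.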