My lessons learned building an NLP-to-SQL Query Engine

GitHub Repository: View on GitHub

Introduction

Over the past few weeks, I embarked on a project that combined my interests in natural language processing, cloud infrastructure, and backend engineering. The goal? To make it possible for non-technical users to query databases using simple English sentences, no SQL knowledge required.

It started as a side project and evolved into a fully deployed, production-ready cloud service with CI/CD, Kubernetes orchestration, and some fun debugging along the way.

The challenge was to create a service that:

  • Accepts a natural language question
  • Translates it into SQL using AI
  • Executes it against a live dataset
  • Returns results in a clean, accessible format

Approach (Brief)

I used OpenAI’s API to translate natural language into SQL. This step was critical — the model needed schema awareness, so I fed it the database table structures alongside the query prompt.
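
Roughly, that step looked like the sketch below. It assumes the openai Python SDK; the model name, schema snippet, and prompt wording are illustrative rather than copied from the repo.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative slice of the Northwind schema that gets sent with every request
SCHEMA = (
    "Table products(product_id INT, product_name STRING, unit_price DOUBLE)\n"
    "Table orders(order_id INT, customer_id STRING, order_date DATE)\n"
)

def to_sql(question: str) -> str:
    # Ask the model for a single SQL statement grounded in the schema above
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "You translate questions into Athena-compatible SQL. "
                "Use only these tables:\n" + SCHEMA +
                "Return one SQL statement and nothing else."
            )},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()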

FastAPI served as the application backend, exposing an /ask endpoint for clients to submit questions. Swagger docs (/docs) provided an interactive interface for testing.
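
A minimal sketch of that endpoint is below; the helper names and module path are assumptions of mine, standing in for the translation and Athena code sketched elsewhere in this post.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Hypothetical module layout; to_sql and run_on_athena are the helpers
# sketched in the surrounding sections.
from nlp_agent.engine import to_sql, run_on_athena

app = FastAPI(title="NLP-to-SQL Query Engine")

class AskRequest(BaseModel):
    question: str

@app.post("/ask")
def ask(body: AskRequest):
    try:
        sql = to_sql(body.question)  # natural language -> SQL
        return {"sql": sql, "results": run_on_athena(sql)}
    except Exception as exc:
        # Surface translation or execution failures as a 500 with the message
        raise HTTPException(status_code=500, detail=str(exc))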

The generated SQL was executed on Amazon Athena, reading from an S3-backed dataset (Northwind database). Results were streamed back in JSON format.
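
With boto3, that execution step looks roughly like this; the region, database name, and results bucket are placeholders.

import time
import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # placeholder region

def run_on_athena(sql: str) -> list[dict]:
    # Kick off the query against the Northwind tables catalogued for Athena
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "northwind"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
    )
    query_id = execution["QueryExecutionId"]

    # Poll until Athena finishes (fine for a demo; real code would cap retries)
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {state.lower()}")

    # The first row of the result set is the header; pair it with the data rows
    result = athena.get_query_results(QueryExecutionId=query_id)
    rows = [[col.get("VarCharValue") for col in r["Data"]] for r in result["ResultSet"]["Rows"]]
    return [dict(zip(rows[0], row)) for row in rows[1:]]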

The entire system was containerized with Docker, then deployed to AWS EKS (Elastic Kubernetes Service). A LoadBalancer service exposed it publicly.

CI/CD handled Docker image builds and pushes to Amazon ECR, automatically triggering new deployments on EKS. This meant every code change was deployed with minimal manual intervention.

After deployment, I had a fully functional API where you could send a POST request like:

{ "question": "Show me all products" }

and get real database results back — without writing a single line of SQL.

The service was public via an AWS Load Balancer URL, making it easy for anyone to try it out.
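
Calling it from a script looked something like this (the URL below is a placeholder for the actual Load Balancer hostname):

import requests

BASE_URL = "http://my-elb-hostname.elb.amazonaws.com"  # placeholder ELB URL

resp = requests.post(f"{BASE_URL}/ask", json={"question": "Show me all products"}, timeout=60)
resp.raise_for_status()
print(resp.json())  # generated SQL plus the rows Athena returned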


Lessons Learned

1. Authentication & Secrets

Ensuring AWS and OpenAI credentials were securely passed into the container required setting up Kubernetes secrets and GitHub Actions secrets properly.
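
Inside the container the app only ever sees environment variables, so the Kubernetes Secret, the GitHub Actions secret, and the code all have to agree on the names. A small sketch of that pattern (the variable names are illustrative):

import os

# Injected by Kubernetes from a Secret at runtime, and by GitHub Actions in CI
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
AWS_REGION = os.environ.get("AWS_REGION", "eu-west-1")

# Fail fast with a clear message if a secret was not wired up correctly
if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is missing; check the Kubernetes/GitHub Actions secrets")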

2. Config & Imports After Refactor

I had to rename my project from nlp-agent to nlp_agent (Python module names can't contain hyphens), which broke imports until I updated every reference, including in the Docker build and the GitHub Actions workflow.

3. ARM vs. AMD64 Build Issue

This was the big one. My M1 Mac (ARM64) built Docker images that failed to run on AWS EKS nodes (AMD64). The fix was to explicitly set the build platform in GitHub Actions:

docker buildx build --platform linux/amd64 ...

This ensured images ran reliably in the target cloud environment.

4. Cost Awareness

Kubernetes clusters on AWS aren’t free — so after testing, I had to tear down the cluster, ECR repos, and S3 data to avoid ongoing charges.
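
The cluster is easiest to delete with whatever tool created it; the leftover ECR repo and S3 data can be cleaned up with a short boto3 script. A sketch with placeholder resource names:

import boto3

ECR_REPO = "nlp-agent"             # placeholder repository name
DATA_BUCKET = "my-northwind-data"  # placeholder bucket name

# Delete the ECR repository, including any images still stored in it
boto3.client("ecr").delete_repository(repositoryName=ECR_REPO, force=True)

# Empty and delete the S3 bucket holding the dataset and Athena results
bucket = boto3.resource("s3").Bucket(DATA_BUCKET)
bucket.objects.all().delete()
bucket.object_versions.all().delete()  # in case versioning was enabled
bucket.delete()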


So, that’s all folks! If you’ve ever wanted to turn “English into SQL” and deploy it to the cloud, I hope this breakdown helps. And if you’re working on an M1 Mac — trust me, check your build platform before you spend hours debugging.



