
Pluggable Site RAG Agent

One-line embeddable chat widget backed by a self-hostable RAG service grounded in your own docs.

Why it matters: Lets any business drop a grounded AI assistant onto their site without integration work - the same pattern that powers customer support, internal helpdesks, and product onboarding.

Python · FastAPI · LangChain · Qdrant · Gemini

What it does

A self-hostable AI support agent that drops onto any website with a single <script> tag. The visitor gets a floating chat widget; the site owner gets answers grounded in their own documents - with citations, conversation memory, and per-session rate limiting built in.
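One way to keep per-session conversation memory within a fixed budget is to evict the oldest turns first. The sketch below is illustrative only: the `BoundedMemory` class, the `max_tokens` default, and the whitespace-based `count_tokens` helper are all assumptions, not the project's actual code, and a real deployment would count tokens with the model's own tokenizer.

```python
from collections import deque
from typing import Deque, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


class BoundedMemory:
    """Hypothetical per-session memory that stays under a token budget."""

    def __init__(self, max_tokens: int = 50) -> None:
        self.max_tokens = max_tokens
        self.turns: Deque[Turn] = deque()

    def total_tokens(self) -> int:
        return sum(count_tokens(text) for _, text in self.turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Drop the oldest turns until the budget is met,
        # but always keep the most recent turn.
        while len(self.turns) > 1 and self.total_tokens() > self.max_tokens:
            self.turns.popleft()

    def history(self) -> List[Turn]:
        return list(self.turns)


memory = BoundedMemory(max_tokens=6)
memory.add("user", "how do I reset my password")
memory.add("assistant", "open settings then security")
print(memory.history())
```

With a budget of 6 tokens, the second turn pushes the total to 10, so the first turn is evicted and only the assistant's reply remains.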

Where it applies

  • Any business that wants a knowledge-grounded chat assistant on their site without committing to a SaaS vendor or rebuilding their frontend.
  • Internal helpdesks and product onboarding flows - swap customer documents for internal runbooks and the same pipeline works.
  • Static or framework-built pages where dropping a <script> tag is the only integration the host can stomach.

How it works (high level)

A FastAPI backend exposes a single chat endpoint plus a self-installing widget script. On each turn, the service rewrites the user's question into a standalone form using prior turns, runs maximal marginal relevance (MMR) retrieval (k=5) over a Qdrant vector store, and answers with a stuff-documents chain (all retrieved chunks placed into a single prompt) over Gemini. Sessions, token-bounded conversation memory, and slowapi rate limits are layered in via FastAPI middleware. Every chat turn is recorded in LangSmith for offline review.
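The MMR retrieval step can be illustrated with a minimal, dependency-free sketch. This is not the project's code (which would go through LangChain and Qdrant). The function name `mmr_select`, the toy 2-D vectors, and the trade-off weight of 0.5 are assumptions chosen for illustration; only k=5 comes from the description above.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 5,
    lambda_mult: float = 0.5,  # assumed weight: 1.0 = pure relevance
) -> List[int]:
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: List[int] = []

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Penalize similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 is distinct.
docs = [[1.0, 0.9], [1.0, 0.89], [0.2, 1.0]]
query = [1.0, 1.0]
print(mmr_select(query, docs, k=2))                   # diversity-aware pick
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance-only pick
```

In this toy case the diversity-aware call skips the near-duplicate and returns documents 0 and 2, while the relevance-only call returns documents 0 and 1. That difference is the reason to prefer MMR over plain top-k similarity when the indexed documents contain overlapping passages.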

Outcome

A site owner runs the backend once, drops the widget script into any page, and has a working AI assistant within minutes. The widget mounts a floating iframe with a chat UI; nothing about the host page's frontend has to change.

Stack

Python · FastAPI · LangChain · Qdrant · Gemini embeddings/chat · LangSmith · vanilla-JS widget.