AI-powered coding assistants like GitHub Copilot and ChatGPT have transformed software development, but they come with high costs, API rate limits, and privacy concerns.
What if you could build your own AI coding chatbot—free and self-hosted?
1. Introduction
With ChatGPT’s strict rate limits and GitHub Copilot’s subscription pricing, many developers are looking for capable, cost-effective AI coding assistants.
This guide walks you through building a custom AI-powered chatbot using:
- ✅ DeepSeek API (affordable cloud-based AI coding)
- ✅ Ollama + DeepSeek-r1:8B (self-hosted, powerful AI model with no API costs)
- ✅ Ollama + O3-Mini (lightweight self-hosted AI for lower-end devices)
- ✅ Persistent memory for context-aware responses
- ✅ FastAPI-based chatbot with authentication, rate limiting, and error handling
By the end of this guide, you’ll have a fully functional AI-powered coding assistant that can replace GitHub Copilot—without the cost.
2. Why Avoid ChatGPT & GitHub Copilot?
🤔 Comparison of AI Coding Assistants
| Feature | ChatGPT / Copilot | DeepSeek API | Ollama + DeepSeek-r1:8B | Ollama + O3-Mini |
|---|---|---|---|---|
| Cost | 💸 Expensive | 💰 Cheaper (~$0.002 per 1K tokens) | 🆓 Free (self-hosted) | 🆓 Free (self-hosted) |
| Rate Limits | ❌ Strict | ✅ Relaxed | ✅ No limits | ✅ No limits |
| Self-Hosting | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
| Optimized for Code | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Ideal Use Case | General AI, Copilot-style | Cloud-based coding AI | Full-powered local AI | Lightweight local AI |
3. Enhancing Memory with Persistent Storage
A persistent scratchpad improves AI responses by carrying conversational context from one request to the next.
```python
import json
from datetime import datetime


class ChatMemory:
    """Stores recent chat history for context-aware AI responses."""

    def __init__(self, max_length=5, persist_to_disk=True):
        self.history = []
        self.max_length = max_length
        self.persist_to_disk = persist_to_disk
        self.file_path = "chat_history.json"
        if persist_to_disk:
            self.load_history()

    def add_entry(self, user_input, response):
        entry = {
            "timestamp": datetime.now().isoformat(),
            "user_input": user_input,
            "response": response,
        }
        self.history.append(entry)
        # Keep only the most recent max_length exchanges
        if len(self.history) > self.max_length:
            self.history.pop(0)
        if self.persist_to_disk:
            self.save_history()

    def get_context(self, window_size=3):
        """Return the latest exchanges as a prompt-ready string."""
        return "\n".join(
            f"You: {m['user_input']}\nAI: {m['response']}"
            for m in self.history[-window_size:]
        )

    def save_history(self):
        with open(self.file_path, "w") as file:
            json.dump(self.history, file, indent=4)

    def load_history(self):
        try:
            with open(self.file_path, "r") as file:
                self.history = json.load(file)
        except (FileNotFoundError, json.JSONDecodeError):
            self.history = []
```
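A quick sanity check of the class (the sample prompt and reply are just illustrative; `chat_history.json` is the file name hard-coded above):

```python
memory = ChatMemory(max_length=5)
memory.add_entry("How do I reverse a list in Python?", "Use my_list[::-1] or my_list.reverse().")
print(memory.get_context())
# You: How do I reverse a list in Python?
# AI: Use my_list[::-1] or my_list.reverse().
```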
4. Using DeepSeek API for a Cloud-Based AI Chatbot
If you want cloud-based AI coding assistance, DeepSeek API is a cost-effective alternative to OpenAI.
✅ Set Up API Key & Install Dependencies
```bash
export DEEPSEEK_API_KEY="your_api_key_here"
pip install requests python-dotenv
```
✅ Call DeepSeek API
```python
import os

import requests

DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY")


def call_deepseek(prompt, max_tokens=500):
    url = "https://api.deepseek.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
        "Content-Type": "application/json",
    }
    data = {
        "model": "deepseek-coder",
        "messages": [
            {"role": "system", "content": "You are an AI coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()  # fail fast on HTTP errors
    return response.json()["choices"][0]["message"]["content"]
```
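Assuming `DEEPSEEK_API_KEY` is set in your shell, a quick test looks like this (the prompt is arbitrary):

```python
print(call_deepseek("Write a Python one-liner that reverses a string."))
```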
5. Using Ollama for a Self-Hosted AI Chatbot (DeepSeek-r1:8B & O3-Mini)
For a fully offline coding assistant, you can use Ollama.
✅ Install Ollama & Pull AI Models
```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull deepseek-r1:8b
ollama pull o3-mini
```
✅ Run a Local AI Chatbot
```python
import subprocess


def call_ollama(prompt, model="deepseek-r1:8b"):
    # Shell out to the local Ollama CLI and capture the model's reply
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True, text=True,
    )
    return result.stdout.strip()


print(call_ollama("Write a Python function to reverse a string."))
```
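Shelling out to the CLI is the simplest route. If you would rather stay in-process, Ollama also serves a local REST API on port 11434; here is a minimal sketch using its `/api/generate` endpoint with streaming disabled:

```python
import requests


def call_ollama_http(prompt, model="deepseek-r1:8b"):
    # Ollama listens on localhost:11434 by default
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```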
6. Deploying a Secure Web API with FastAPI
This API lets clients choose among the DeepSeek API, DeepSeek-r1:8B (Ollama), and O3-Mini (a lightweight Ollama model) on a per-request basis, making it a flexible, free alternative to GitHub Copilot.
🔹 Install Required Dependencies
```bash
pip install fastapi uvicorn requests slowapi python-dotenv
```
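The implementation below reads two secrets via python-dotenv, so create a `.env` file next to your script; something like this (the values are placeholders, replace them with your own):

```bash
# .env (placeholder values; replace with your own)
DEEPSEEK_API_KEY=your_deepseek_api_key
CHATBOT_API_KEY=a_long_random_secret_for_clients
```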
🔹 Full FastAPI Implementation
```python
import os
import json
import time
import logging
import requests
import subprocess
from datetime import datetime

from fastapi import FastAPI, HTTPException, Depends, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY")
CHATBOT_API_KEY = os.getenv("CHATBOT_API_KEY")

# Initialize FastAPI with rate limiting
app = FastAPI()
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)


# Persistent Chat Memory
class ChatMemory:
    """Stores chat history for context-aware AI responses."""

    def __init__(self, max_length=5, persist_to_disk=True):
        self.history = []
        self.max_length = max_length
        self.persist_to_disk = persist_to_disk
        self.file_path = "chat_history.json"
        if persist_to_disk:
            self.load_history()

    def add_entry(self, user_input, response):
        entry = {
            "timestamp": datetime.now().isoformat(),
            "user_input": user_input,
            "response": response,
        }
        self.history.append(entry)
        if len(self.history) > self.max_length:
            self.history.pop(0)
        if self.persist_to_disk:
            self.save_history()

    def get_context(self, window_size=3):
        """Retrieve the latest conversations for context."""
        return "\n".join(
            f"You: {m['user_input']}\nAI: {m['response']}"
            for m in self.history[-window_size:]
        )

    def save_history(self):
        with open(self.file_path, "w") as file:
            json.dump(self.history, file, indent=4)

    def load_history(self):
        try:
            with open(self.file_path, "r") as file:
                self.history = json.load(file)
        except (FileNotFoundError, json.JSONDecodeError):
            self.history = []


# Initialize memory storage
chat_memory = ChatMemory()


# DeepSeek API call with retries and error handling
def call_deepseek(prompt, max_tokens=500, temperature=0.7, retries=3):
    url = "https://api.deepseek.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
        "Content-Type": "application/json",
    }
    data = {
        "model": "deepseek-coder",
        "messages": [
            {"role": "system", "content": "You are an AI coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers, json=data, timeout=10)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
        except requests.exceptions.RequestException as e:
            logging.error(f"API call failed: {e}")
            time.sleep(2 ** attempt)  # exponential backoff between retries
        except KeyError:
            logging.error("Unexpected response format")
            return "Error: Invalid response format"
    return "Error: Maximum retries exceeded"


# Ollama call (supports DeepSeek-r1:8B and O3-Mini)
def call_ollama(prompt, model="deepseek-r1:8b"):
    try:
        result = subprocess.run(
            ["ollama", "run", model, prompt],
            capture_output=True, text=True,
        )
        return result.stdout.strip()
    except Exception as e:
        return f"Error: {e}"


# API key authentication (passed as a query parameter)
def authenticate(api_key: str):
    if api_key != CHATBOT_API_KEY:
        raise HTTPException(status_code=403, detail="Invalid API Key")
    return api_key


@app.post("/chat")
@limiter.limit("5/minute")
async def chat_api(request: Request, payload: dict, api_key: str = Depends(authenticate)):
    # slowapi requires the `request: Request` argument to enforce the limit;
    # the JSON body arrives in `payload`
    if "prompt" not in payload:
        raise HTTPException(status_code=400, detail="Missing prompt in request")
    user_input = payload["prompt"]

    # Retrieve context from memory
    context = chat_memory.get_context()
    full_prompt = f"{context}\nYou: {user_input}\nAI:"

    # Select AI backend
    model_choice = payload.get("model", "deepseek-api")
    if model_choice == "deepseek-api":
        response = call_deepseek(full_prompt)
    elif model_choice == "deepseek-r1:8b":
        response = call_ollama(full_prompt, "deepseek-r1:8b")
    elif model_choice == "o3-mini":
        response = call_ollama(full_prompt, "o3-mini")
    else:
        raise HTTPException(
            status_code=400,
            detail="Invalid model choice. Choose 'deepseek-api', 'deepseek-r1:8b', or 'o3-mini'.",
        )

    # Store conversation history
    chat_memory.add_entry(user_input, response)
    return {"response": response, "timestamp": datetime.now().isoformat()}
```
7. How to Use the API (Requests & Examples)
Send a request with your `api_key` and a model selection (`deepseek-api`, `deepseek-r1:8b`, or `o3-mini`).
✅ Use DeepSeek API (Cloud-Based)
```bash
curl -X POST "http://localhost:8000/chat?api_key=your_chatbot_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain recursion in Python.", "model": "deepseek-api"}'
```
✅ Use Ollama (DeepSeek-r1:8B – Self-Hosted)
```bash
curl -X POST "http://localhost:8000/chat?api_key=your_chatbot_api_key" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain recursion in Python.", "model": "deepseek-r1:8b"}'
```
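The same request from Python, for clients that prefer `requests` over curl (`your_chatbot_api_key` stands in for the value you set in `.env`):

```python
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    params={"api_key": "your_chatbot_api_key"},  # checked by the authenticate() dependency
    json={"prompt": "Explain recursion in Python.", "model": "deepseek-r1:8b"},
)
print(resp.json()["response"])
```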
8. The Big Picture: Why This Matters
Instead of paying monthly for ChatGPT or Copilot, you can now:
- 🚀 Own your AI assistant instead of depending on external providers.
- 🛡️ Ensure privacy by keeping your code local.
- 💰 Reduce costs to nearly zero (Ollama) or get a much cheaper API alternative (DeepSeek).
- ⚡ Optimize performance by switching between cloud and local models based on workload.
9. Final Thoughts & Next Steps
✔️ What We Built
In this guide, we:
- ✅ Deployed a FastAPI-based AI coding assistant
- ✅ Integrated both self-hosted and cloud-based AI models
- ✅ Added authentication, memory, and rate limiting
- ✅ Enabled dynamic model selection
💡 What’s Next?
- 🔥 Integrate with VS Code – Use Language Server Protocol (LSP) for inline coding assistance.
- ⚡ Live Execution Support – Let the AI run & test code before sending responses.
- 📈 Fine-Tune for Specific Use Cases – Train DeepSeek models on your own datasets.
🚀 Your Turn: Build, Experiment & Customize! 🚀
Reference
For more details on the technologies used, check out these resources:
- 🔹 DeepSeek API – DeepSeek Official API Docs
- 🔹 Ollama (Self-Hosted AI Engine) – Ollama Official Website
- 🔹 FastAPI (API Framework Used in This Guide) – FastAPI Documentation
- 🔹 SlowAPI (Rate Limiting Middleware for FastAPI) – SlowAPI GitHub
- 🔹 Python Dotenv (Environment Variable Management) – Python-Dotenv Docs