Build a Custom AI Coding Chatbot with DeepSeek API & Ollama: Free GitHub Copilot Alternative

AI-powered coding assistants like GitHub Copilot and ChatGPT have transformed software development, but they come with high costs, API rate limits, and privacy concerns.

What if you could build your own AI coding chatbot—free and self-hosted?


1. Introduction

With ChatGPT’s increasing restrictions and GitHub Copilot’s pricing, developers are looking for better, cost-effective AI coding assistants.

This guide walks you through building a custom AI-powered chatbot using:

  • DeepSeek API (affordable cloud-based AI coding)
  • Ollama + DeepSeek-r1:8B (self-hosted, powerful AI model with no API costs)
  • Ollama + O3-Mini (lightweight self-hosted AI for lower-end devices)
  • Persistent memory for context-aware responses
  • FastAPI-based chatbot with authentication, rate limiting, and error handling

By the end of this guide, you’ll have a fully functional AI-powered coding assistant that can replace GitHub Copilot—without the cost.


2. Why Avoid ChatGPT & GitHub Copilot?

🤔 Comparison of AI Coding Assistants

| Feature | ChatGPT / Copilot | DeepSeek API | Ollama + DeepSeek-r1:8B | Ollama + O3-Mini |
| --- | --- | --- | --- | --- |
| Cost | 💸 Expensive | 💰 Cheaper (~$0.002 per 1K tokens) | 🆓 Free (self-hosted) | 🆓 Free (self-hosted) |
| Rate limits | ❌ Strict | ✅ Relaxed | ✅ None | ✅ None |
| Self-hosting | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
| Optimized for code | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Ideal use case | General AI, Copilot-style | Cloud-based coding AI | Full-powered local AI | Lightweight local AI |

3. Enhancing Memory with Persistent Storage

Persistent chat memory improves response quality by feeding the most recent exchanges back into each prompt; saving that history to disk keeps the context alive across restarts.

import json
from datetime import datetime

class ChatMemory:
    def __init__(self, max_length=5, persist_to_disk=True):
        self.history = []
        self.max_length = max_length
        self.file_path = "chat_history.json"
        if persist_to_disk:
            self.load_history()

    def add_entry(self, user_input, response):
        entry = {"timestamp": datetime.now().isoformat(), "user_input": user_input, "response": response}
        self.history.append(entry)
        if len(self.history) > self.max_length:
            self.history.pop(0)
        self.save_history()

    def get_context(self, window_size=3):
        return "\n".join([f"You: {m['user_input']}\nAI: {m['response']}" for m in self.history[-window_size:]])

    def save_history(self):
        with open(self.file_path, "w") as file:
            json.dump(self.history, file, indent=4)

    def load_history(self):
        try:
            with open(self.file_path, "r") as file:
                self.history = json.load(file)
        except FileNotFoundError:
            self.history = []
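To see the rolling window in action without touching disk, here is a standalone sketch of the same trimming and context-formatting logic (inlined rather than importing the class above), using a window of 2 so the eviction is visible:

```python
from datetime import datetime

history = []     # same entry structure ChatMemory keeps
MAX_LENGTH = 2   # small window so the trimming is visible

def add_entry(user_input, response):
    history.append({"timestamp": datetime.now().isoformat(),
                    "user_input": user_input, "response": response})
    if len(history) > MAX_LENGTH:
        history.pop(0)  # evict the oldest exchange

for i in range(4):
    add_entry(f"question {i}", f"answer {i}")

# Only the two most recent exchanges survive
context = "\n".join(f"You: {m['user_input']}\nAI: {m['response']}" for m in history)
print(len(history))  # 2
print(context)
```

The oldest entries are silently dropped, which keeps prompts short and token costs bounded.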

4. Using DeepSeek API for a Cloud-Based AI Chatbot

If you want cloud-based AI coding assistance, DeepSeek API is a cost-effective alternative to OpenAI.

Set Up API Key & Install Dependencies

export DEEPSEEK_API_KEY="your_api_key_here"
pip install requests python-dotenv

Call DeepSeek API

import requests
import os
import json

DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY")

def call_deepseek(prompt, max_tokens=500):
    url = "https://api.deepseek.com/v1/chat/completions"
    headers = {"Authorization": f"Bearer {DEEPSEEK_API_KEY}", "Content-Type": "application/json"}
    
    data = {
        "model": "deepseek-coder",
        "messages": [{"role": "system", "content": "You are an AI coding assistant."},
                     {"role": "user", "content": prompt}],
        "max_tokens": max_tokens
    }

    response = requests.post(url, headers=headers, json=data, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["choices"][0]["message"]["content"]
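DeepSeek's endpoint follows the OpenAI-compatible chat-completions response shape, which is why the reply lives under `choices[0].message.content`. A standalone sketch of that extraction against a canned response (the real payload carries additional fields such as `id` and `usage`):

```python
import json

# Canned response in the OpenAI-compatible shape the endpoint returns
raw = '''{
  "choices": [
    {"message": {"role": "assistant",
                 "content": "def reverse(s):\\n    return s[::-1]"}}
  ]
}'''

reply = json.loads(raw)["choices"][0]["message"]["content"]
print(reply)
```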

5. Using Ollama for a Self-Hosted AI Chatbot (DeepSeek-r1:8B & O3-Mini)

For a fully offline coding assistant, you can use Ollama.

Install Ollama & Pull AI Models

curl -fsSL https://ollama.ai/install.sh | sh
ollama pull deepseek-r1:8b
ollama pull o3-mini

Run a Local AI Chatbot

import subprocess

def call_ollama(prompt, model="deepseek-r1:8b"):
    response = subprocess.run(["ollama", "run", model, prompt], capture_output=True, text=True)
    return response.stdout.strip()

print(call_ollama("Write a Python function to reverse a string."))

6. Deploying a Secure Web API with FastAPI

This API allows users to choose between DeepSeek API, DeepSeek-r1:8B (Ollama), and O3-Mini (Ollama lightweight model) dynamically, making it a truly flexible free alternative to GitHub Copilot.

🔹 Install Required Dependencies

pip install fastapi uvicorn requests slowapi python-dotenv

🔹 Full FastAPI Implementation

import os
import json
import requests
import subprocess
import logging
import time
from datetime import datetime
from fastapi import FastAPI, HTTPException, Depends
from slowapi import Limiter
from slowapi.util import get_remote_address
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY")
CHATBOT_API_KEY = os.getenv("CHATBOT_API_KEY")

# Initialize FastAPI
app = FastAPI()
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter

# Persistent Chat Memory
class ChatMemory:
    """ Stores chat history for context-aware AI responses. """
    def __init__(self, max_length=5, persist_to_disk=True):
        self.history = []
        self.max_length = max_length
        self.file_path = "chat_history.json"
        if persist_to_disk:
            self.load_history()

    def add_entry(self, user_input, response):
        entry = {"timestamp": datetime.now().isoformat(), "user_input": user_input, "response": response}
        self.history.append(entry)
        if len(self.history) > self.max_length:
            self.history.pop(0)
        self.save_history()

    def get_context(self, window_size=3):
        """ Retrieves the latest conversations for context. """
        return "\n".join([f"You: {m['user_input']}\nAI: {m['response']}" for m in self.history[-window_size:]])

    def save_history(self):
        with open(self.file_path, "w") as file:
            json.dump(self.history, file, indent=4)

    def load_history(self):
        try:
            with open(self.file_path, "r") as file:
                self.history = json.load(file)
        except FileNotFoundError:
            self.history = []

# Initialize memory storage
chat_memory = ChatMemory()

# DeepSeek API Call with Error Handling
def call_deepseek(prompt, max_tokens=500, temperature=0.7, retries=3):
    url = "https://api.deepseek.com/v1/chat/completions"
    headers = {"Authorization": f"Bearer {DEEPSEEK_API_KEY}", "Content-Type": "application/json"}
    
    data = {
        "model": "deepseek-coder",
        "messages": [{"role": "system", "content": "You are an AI coding assistant."},
                     {"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens
    }

    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers, json=data, timeout=10)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
        except requests.exceptions.RequestException as e:
            logging.error(f"API call failed: {str(e)}")
            time.sleep(2 ** attempt)
        except KeyError:
            logging.error("Unexpected response format")
            return "Error: Invalid response format"
    return "Error: Maximum retries exceeded"
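The `time.sleep(2 ** attempt)` line gives exponential backoff: 1 s after the first failure, 2 s after the second, 4 s after the third. A quick standalone check of that worst-case schedule:

```python
retries = 3

# Delay after each failed attempt, mirroring time.sleep(2 ** attempt)
delays = [2 ** attempt for attempt in range(retries)]
print(delays)       # [1, 2, 4]
print(sum(delays))  # 7 seconds of backoff (on top of the per-request timeouts)
```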

# Ollama AI Call (Supports DeepSeek-r1:8B and O3-Mini)
def call_ollama(prompt, model="deepseek-r1:8b"):
    try:
        response = subprocess.run(["ollama", "run", model, prompt],
                                  capture_output=True, text=True, timeout=120)
        if response.returncode != 0:
            return f"Error: {response.stderr.strip()}"
        return response.stdout.strip()
    except (subprocess.TimeoutExpired, FileNotFoundError) as e:
        return f"Error: {str(e)}"

# API Key Authentication (clients send the key in the X-API-Key header)
from fastapi import Header, Request

def authenticate(x_api_key: str = Header(...)):
    if x_api_key != CHATBOT_API_KEY:
        raise HTTPException(status_code=403, detail="Invalid API Key")

@app.post("/chat")
@limiter.limit("5/minute")
async def chat_api(request: Request, _: None = Depends(authenticate)):
    # slowapi's limiter requires a `request: Request` parameter on the endpoint
    body = await request.json()
    if "prompt" not in body:
        raise HTTPException(status_code=400, detail="Missing prompt in request")

    user_input = body["prompt"]

    # Retrieve context from memory
    context = chat_memory.get_context()
    full_prompt = f"{context}\nYou: {user_input}\nAI:"

    # Select AI backend
    model_choice = body.get("model", "deepseek-api")

    if model_choice == "deepseek-api":
        response = call_deepseek(full_prompt)
    elif model_choice == "deepseek-r1:8b":
        response = call_ollama(full_prompt, "deepseek-r1:8b")
    elif model_choice == "o3-mini":
        response = call_ollama(full_prompt, "o3-mini")
    else:
        raise HTTPException(status_code=400,
                            detail="Invalid model choice. Choose 'deepseek-api', 'deepseek-r1:8b', or 'o3-mini'.")

    # Store conversation history
    chat_memory.add_entry(user_input, response)

    return {"response": response, "timestamp": datetime.now().isoformat()}

7. How to Use the API (Requests & Examples)

Send a request with model selection (deepseek-api, deepseek-r1:8b, o3-mini).

Use DeepSeek API (Cloud-Based)

curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -H "X-API-Key: your_chatbot_key" -d '{"prompt": "Explain recursion in Python.", "model": "deepseek-api"}'

Use Ollama (DeepSeek-r1:8B – Self-Hosted)

curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -H "X-API-Key: your_chatbot_key" -d '{"prompt": "Explain recursion in Python.", "model": "deepseek-r1:8b"}'

8. The Big Picture: Why This Matters

Instead of paying monthly for ChatGPT or Copilot, you can now:

  • 🚀 Own your AI assistant instead of depending on external providers.
  • 🛡️ Ensure privacy by keeping your code local.
  • 💰 Reduce costs to nearly zero (Ollama) or get a much cheaper API alternative (DeepSeek).
  • Optimize performance by switching between cloud and local models based on workload.

9. Final Thoughts & Next Steps

✔️ What We Built

In this guide, we:

  • Deployed a FastAPI-based AI coding assistant
  • Integrated both self-hosted and cloud-based AI models
  • Added authentication, memory, and rate limiting
  • Enabled dynamic model selection

💡 What’s Next?

  • 🔥 Integrate with VS Code – Use Language Server Protocol (LSP) for inline coding assistance.
  • Live Execution Support – Let the AI run & test code before sending responses.
  • 📈 Fine-Tune for Specific Use Cases – Train DeepSeek models on your own datasets.

🚀 Your Turn: Build, Experiment & Customize! 🚀

