AI & Data Infrastructure Teams

    RAG Record Manager

    Automatically monitors Google Drive, detects new or updated files, re-generates embeddings, and syncs them to Supabase for a clean, always-updated AI knowledge base.

    Problem Statement

    Maintaining an AI knowledge base manually is messy. Businesses struggle with outdated documents, duplicate records, missing or stale embeddings, no version tracking, and no automated sync between Google Drive and vector databases. Any RAG system becomes unreliable if the underlying documents aren’t kept fresh and consistent. The client needed a self-maintaining knowledge base that updates automatically whenever a document is added, edited, or replaced.

    The Solution

    Automation Overview

    We built a workflow that continuously monitors Google Drive and manages the entire lifecycle of documents used in a RAG system. It ensures only new or updated files are embedded, old vectors are deleted cleanly, fresh embeddings are created, and Supabase always holds the latest version — making the RAG knowledge base trustworthy at all times.

    Google Drive Watcher

    The workflow checks a specific Drive folder (or multiple folders) to identify new, modified, or deleted files. This turns Drive into a dynamic data source for your RAG system.
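
    As a rough illustration of this step, the sketch below compares the previous folder listing with the current one and sorts file IDs into new, modified, and deleted. The `DriveFile` type, the `diff_listing` function, and the use of a modified-time field are assumptions made for this example. The actual workflow performs this step inside n8n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveFile:
    file_id: str
    name: str
    modified_time: str  # ISO 8601 timestamp as reported by the folder listing


def diff_listing(previous: dict[str, DriveFile], current: list[DriveFile]) -> dict[str, list[str]]:
    """Classify file IDs as new, modified, or deleted since the last poll."""
    current_by_id = {f.file_id: f for f in current}
    new = [fid for fid in current_by_id if fid not in previous]
    modified = [
        fid for fid, f in current_by_id.items()
        if fid in previous and f.modified_time != previous[fid].modified_time
    ]
    deleted = [fid for fid in previous if fid not in current_by_id]
    return {"new": new, "modified": modified, "deleted": deleted}


previous = {
    "a": DriveFile("a", "pricing.pdf", "2024-01-01T10:00:00Z"),
    "b": DriveFile("b", "faq.docx", "2024-01-02T09:00:00Z"),
}
current = [
    DriveFile("a", "pricing.pdf", "2024-01-05T08:30:00Z"),     # edited since last poll
    DriveFile("c", "onboarding.txt", "2024-01-05T08:45:00Z"),  # newly added
]
changes = diff_listing(previous, current)
```

    Here `changes` lists file "c" as new, "a" as modified, and "b" as deleted. A modified timestamp is only a hint that something changed, so a content hash can still make the final decision.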

    Crypto Hash-Based Change Detection

    Each document’s content is hashed with n8n’s Crypto node, and the result is compared with the hash stored from the previous run. A mismatch means the content changed, even if the filename is the same, and triggers a full re-embedding of that file. Files whose hash is unchanged are skipped.
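
    The comparison can be sketched in a few lines. SHA-256 is an assumed choice here, since the case study does not name the algorithm. `content_hash`, `needs_reembedding`, and the `stored_hashes` mapping are also hypothetical names for this example.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the file's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def needs_reembedding(file_id: str, data: bytes, stored_hashes: dict[str, str]) -> bool:
    """True when the file is unseen or its content hash differs from the stored one."""
    return stored_hashes.get(file_id) != content_hash(data)


# Hashes saved by a previous run, keyed by file ID.
stored_hashes = {"doc-1": content_hash(b"Refund window: 30 days")}

unchanged = needs_reembedding("doc-1", b"Refund window: 30 days", stored_hashes)
edited = needs_reembedding("doc-1", b"Refund window: 14 days", stored_hashes)
unseen = needs_reembedding("doc-2", b"New document", stored_hashes)
```

    Because the decision depends only on the bytes, a renamed file with identical content is skipped, while an edited file with the same name is picked up.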

    Vector Cleanup Before Re-Embedding

    Before adding any new vectors, the system deletes old embeddings of that file, clears outdated chunks, and removes duplicate entries. This ensures the vector DB stays clean—no bloated indexes or conflicting versions.
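
    The delete-then-insert pattern is shown below with an in-memory SQLite table standing in for the Supabase table. The table layout and the `replace_chunks` function are assumptions for this example, and the embedding column is left out to keep it short.

```python
import sqlite3

# In-memory stand-in for the vector table; the real store is Supabase.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (file_id TEXT, chunk_index INTEGER, chunk_text TEXT, "
    "PRIMARY KEY (file_id, chunk_index))"
)


def replace_chunks(conn: sqlite3.Connection, file_id: str, chunks: list[str]) -> None:
    """Delete every stored chunk for the file, then insert the new set in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM chunks WHERE file_id = ?", (file_id,))
        conn.executemany(
            "INSERT INTO chunks (file_id, chunk_index, chunk_text) VALUES (?, ?, ?)",
            [(file_id, i, text) for i, text in enumerate(chunks)],
        )


replace_chunks(conn, "doc-1", ["old intro", "old pricing", "old terms"])
replace_chunks(conn, "doc-1", ["new intro", "new pricing"])  # shorter revision
rows = conn.execute(
    "SELECT chunk_index, chunk_text FROM chunks WHERE file_id = ? ORDER BY chunk_index",
    ("doc-1",),
).fetchall()
```

    After the second call only the two new chunks remain. Deleting by file first matters when a revision is shorter than its predecessor. An overwrite by chunk index alone would leave the stale third chunk behind.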

    Intelligent Document Chunking

    To prepare text for embedding, the workflow extracts text from the file, removes noisy content, and splits the result into consistently sized chunks that keep the semantic structure intact.
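
    One simple way to chunk while respecting structure is to pack whole paragraphs together up to a size limit. The sketch below does that, with whitespace normalization as a minimal stand-in for noise removal. The `chunk_text` function and the character-based limit are assumptions, not the chunking rules used in the actual workflow.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters."""
    # Split on blank lines, collapse runs of whitespace, and drop empty blocks.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A paragraph longer than the limit is cut into fixed-size slices.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


sample = "Refund policy.\n\nRefunds are issued   within 30 days.\n\n\n" + "x" * 50
chunks = chunk_text(sample, max_chars=40)
```

    With a 40-character limit the sample yields four chunks: the two short paragraphs on their own, followed by the long run split into a 40-character slice and a 10-character remainder.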

    Google Gemini Embedding Generation

    Each chunk is sent to Google Gemini to generate embeddings that capture meaning, semantic relationships, and context.
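
    The sketch below shows only the batching logic around the embedding call. The embedding function is passed in as a parameter, and `fake_embed_batch` is a made-up stand-in that returns toy vectors. It is not the Gemini API, and the batch size is an arbitrary assumption.

```python
from typing import Callable, Sequence

EmbedFn = Callable[[Sequence[str]], list[list[float]]]


def embed_chunks(chunks: Sequence[str], embed_batch: EmbedFn, batch_size: int = 16) -> list[list[float]]:
    """Embed chunks in batches, preserving order, with one vector per chunk."""
    vectors: list[list[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embedding call returned a different number of vectors than chunks")
        vectors.extend(result)
    return vectors


def fake_embed_batch(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical stand-in for the real embedding call; NOT a Gemini API."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


vectors = embed_chunks(["refund policy", "shipping", "terms of use"], fake_embed_batch, batch_size=2)
```

    Keeping the vectors in the same order as the chunks lets each one be stored later under the matching chunk index.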

    Supabase Vector Storage

    The workflow stores each embedding in Supabase together with its file name, chunk text, chunk index, hash value, and timestamp. This makes it easy to search, retrieve, or regenerate embeddings for any document.
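
    A stored row might be assembled as follows. The field names mirror the metadata listed above, but the exact column names, the `ChunkRecord` type, and the `build_rows` function are assumptions for this example, not the client’s schema.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChunkRecord:
    file_name: str
    chunk_index: int
    chunk_text: str
    content_hash: str
    embedding: list[float]
    updated_at: str  # ISO 8601 timestamp in UTC


def build_rows(file_name: str, content_hash: str, chunks: list[str],
               vectors: list[list[float]], now: datetime) -> list[dict]:
    """Pair each chunk with its vector and attach the shared file metadata."""
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one embedding vector")
    stamp = now.astimezone(timezone.utc).isoformat()
    return [
        asdict(ChunkRecord(file_name, i, text, content_hash, vector, stamp))
        for i, (text, vector) in enumerate(zip(chunks, vectors))
    ]


rows = build_rows(
    file_name="pricing.pdf",
    content_hash="example-hash",  # placeholder value for illustration
    chunks=["Plans start at $10.", "Annual billing saves 20%."],
    vectors=[[0.1, 0.2], [0.3, 0.4]],
    now=datetime(2024, 1, 5, 8, 30, tzinfo=timezone.utc),
)
```

    Storing the hash on every row means the next run can read the last known hash for a file from the same table it writes to.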

    Clean Sync & Metadata Logging

    Each run logs which files changed, which were re-embedded, chunk counts, and any errors. This creates a traceable update history.
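
    A per-run log entry could be collected in a small structure like the one below and written out as one JSON line per run. The `RunLog` type and its field names are hypothetical; they simply follow the items listed above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunLog:
    changed_files: list[str] = field(default_factory=list)
    reembedded_files: list[str] = field(default_factory=list)
    chunk_counts: dict[str, int] = field(default_factory=dict)
    errors: list[dict[str, str]] = field(default_factory=list)

    def record_success(self, file_name: str, chunk_count: int) -> None:
        self.changed_files.append(file_name)
        self.reembedded_files.append(file_name)
        self.chunk_counts[file_name] = chunk_count

    def record_error(self, file_name: str, message: str) -> None:
        # The file changed but was not re-embedded, so it stays out of reembedded_files.
        self.changed_files.append(file_name)
        self.errors.append({"file": file_name, "error": message})


log = RunLog()
log.record_success("pricing.pdf", 12)
log.record_error("faq.docx", "text extraction failed")
log_line = json.dumps(asdict(log), sort_keys=True)
```

    Comparing the changed and re-embedded lists shows at a glance which files still need attention after a run.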

    Integrations & Connected Systems

    • Google Drive: source of documents
    • Google Gemini: embedding generator
    • Supabase Vector DB: vector storage
    • n8n: document watcher, hashing, embedding pipeline
    • Crypto Node: content hashing

    Smart Logic & Reliability

    • Hashing ensures only changed files are processed
    • Duplicate embeddings prevented automatically
    • Scalable logic supports hundreds of documents
    • Handles PDFs, text files, Word docs, and more
    • Safe retries for transient errors
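
    The retry behaviour in the last bullet can be sketched as exponential backoff around a single step. `TransientError`, `with_retries`, the attempt count, and the delays are all assumptions for this example; the workflow itself relies on its own retry settings.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as timeouts or rate limits."""


def with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller log the failure
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_upload():
    """Fails twice, then succeeds, to simulate a temporary outage."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "ok"


delays = []  # records the waits instead of actually sleeping
result = with_retries(flaky_upload, sleep=delays.append)
```

    The call succeeds on the third attempt after waits of 0.5 and 1.0 seconds. Retrying is safe when the retried step is idempotent, as a delete-then-insert for one file is.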

    Before

    Files were tracked manually in Google Drive, updates were missed, embeddings went stale, and RAG results were unreliable.

    After

    Upload or update a file → embeddings are refreshed automatically in Supabase.

    Tools Used

    n8n
    Crypto hashing
    Google Gemini
    Supabase
    Document text extraction tools

    Our Process

    1. Discover: Understood gaps in the client’s knowledge base lifecycle.
    2. Design: Created a Drive-watcher → hash → embed → sync workflow.
    3. Build: Implemented chunking, re-embedding, and Supabase sync logic.
    4. Integrate: Ensured clean vector database maintenance.
    5. Deploy: Tuned hashing and chunk sizes for performance.

    Business Impact

    Fully automated knowledge base maintenance

    No outdated or duplicated embeddings

    Stronger RAG accuracy and reliability

    Faster responses for end-users

    Scalable document ingestion

    Zero manual effort required

    "The RAG Record Manager turns Google Drive into a self-maintaining data source for your vector database. It monitors updates, re-generates embeddings, cleans old vectors, and syncs new ones — keeping your knowledge base fresh and trustworthy automatically."
