RAG Record Manager
Automatically monitors Google Drive, detects new or updated files, re-generates embeddings, and syncs them to Supabase for a clean, always up-to-date AI knowledge base.
Problem Statement
The Solution
Automation Overview
We built a workflow that continuously monitors Google Drive and manages the entire lifecycle of documents used in a RAG system. It embeds only new or updated files, deletes a file's old vectors before fresh embeddings are created, and keeps the latest version of every document in Supabase, so the RAG knowledge base always reflects the current contents of Drive.
Google Drive Watcher
The workflow checks a specific Drive folder (or multiple folders) to identify new, modified, or deleted files. This turns Drive into a dynamic data source for your RAG system.
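As a rough illustration of the watcher step, the sketch below compares two folder listings and classifies files as new, modified, or deleted. The `DriveFile` shape and the `diff_snapshots` helper are hypothetical names chosen for this example, and comparing modification timestamps is an assumption. The actual workflow runs in n8n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveFile:
    # Hypothetical minimal view of one entry in a Drive folder listing.
    file_id: str
    name: str
    modified_time: str  # ISO 8601 timestamp as reported by the listing


def diff_snapshots(previous, current):
    """Compare two folder listings, each a dict keyed by file_id.

    Returns (new, modified, deleted) as sorted lists of file ids.
    """
    new = sorted(fid for fid in current if fid not in previous)
    deleted = sorted(fid for fid in previous if fid not in current)
    modified = sorted(
        fid
        for fid in current
        if fid in previous
        and current[fid].modified_time != previous[fid].modified_time
    )
    return new, modified, deleted
```

A timestamp difference only marks a file as a candidate. Whether its content really changed is decided by the hash comparison.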
Crypto Hash-Based Change Detection
Each document's content is hashed using the n8n Crypto node. The hash is compared with the previously stored hash for that file, so content changes are detected even when the filename stays the same. If the hash differs, the workflow triggers a full re-embedding of the file; if it matches, the file is skipped.
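The comparison can be sketched in a few lines. This is an illustration only: SHA-256 is an assumed algorithm (the hash function configured in the workflow is not stated here), and `content_hash`, `needs_reembedding`, and the `stored_hashes` dictionary are hypothetical names standing in for the stored hash lookup.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def needs_reembedding(file_id, data, stored_hashes):
    """Return (changed, new_hash) for a file.

    `stored_hashes` maps file_id to the hash recorded on the last run.
    A file with no stored hash counts as changed.
    """
    new_hash = content_hash(data)
    return stored_hashes.get(file_id) != new_hash, new_hash
```

Because the hash is computed over content rather than the filename, renaming a file does not trigger re-embedding, while editing it does.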
Vector Cleanup Before Re-Embedding
Before adding any new vectors, the system deletes old embeddings of that file, clears outdated chunks, and removes duplicate entries. This keeps the vector DB clean, with no bloated indexes or conflicting versions.
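The delete-then-insert ordering can be illustrated with an in-memory stand-in for the vector table. `InMemoryVectorStore` and `replace_file_vectors` are hypothetical names used only for this sketch. In the real workflow the same two steps run against Supabase.

```python
class InMemoryVectorStore:
    """Stand-in for the vector table; each row is a plain dict."""

    def __init__(self):
        self.rows = []

    def delete_by_file(self, file_id):
        """Remove every row belonging to file_id; return how many were removed."""
        before = len(self.rows)
        self.rows = [r for r in self.rows if r["file_id"] != file_id]
        return before - len(self.rows)

    def insert_many(self, rows):
        self.rows.extend(rows)


def replace_file_vectors(store, file_id, new_rows):
    """Delete all existing rows for a file, then insert the fresh ones."""
    removed = store.delete_by_file(file_id)
    store.insert_many(new_rows)
    return removed
```

Deleting by file before inserting means a document that shrinks from ten chunks to six leaves no orphaned rows behind.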
Intelligent Document Chunking
To prepare text for embedding, the workflow extracts text from the file, splits content into optimal-size chunks, removes noisy content, and keeps semantic structure intact.
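One simple way to split text while respecting paragraph boundaries is sketched below. The `chunk_text` function and the 1000-character default are assumptions for illustration, not the chunk size tuned for this project.

```python
import re


def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    # Split on blank lines, collapse runs of spaces/tabs, drop empty paragraphs.
    paragraphs = [
        re.sub(r"[ \t]+", " ", p).strip()
        for p in re.split(r"\n\s*\n", text)
    ]
    paragraphs = [p for p in paragraphs if p]

    chunks = []
    current = ""
    for para in paragraphs:
        # A paragraph longer than the limit is hard-split into fixed-size pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep packing this chunk
            else:
                chunks.append(current)  # chunk is full: start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Short paragraphs are packed together until the limit is reached, so related text tends to stay in the same chunk.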
Google Gemini Embedding Generation
Each chunk is sent to Google Gemini to generate embeddings that capture meaning, semantic relationships, and context.
Supabase Vector Storage
The workflow uploads all embeddings into Supabase with file name, chunk text, chunk index, hash value, and timestamp. This makes it easy to search, retrieve, or rerun embeddings for any document.
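A possible row shape for these records is sketched below. The column names (`file_name`, `chunk_index`, `chunk_text`, `content_hash`, `embedding`, `created_at`) and the `build_rows` helper are assumptions made for this example. The actual Supabase schema may differ.

```python
from datetime import datetime, timezone


def build_rows(file_name, content_hash, chunks, embeddings, now=None):
    """Pair each chunk with its embedding and attach file-level metadata."""
    if len(chunks) != len(embeddings):
        raise ValueError("each chunk needs exactly one embedding")
    # One shared timestamp per file so all of its chunks can be grouped by run.
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return [
        {
            "file_name": file_name,
            "chunk_index": index,
            "chunk_text": chunk,
            "content_hash": content_hash,
            "embedding": list(vector),
            "created_at": timestamp,
        }
        for index, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]
```

Storing the hash on every row lets a later run look up the last-seen hash for a file directly from the vector table.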
Clean Sync & Metadata Logging
Each run logs which files changed, which were re-embedded, chunk counts, and any errors. This creates a traceable update history.
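A per-run log along these lines could be modeled as a small record. `RunLog` and its fields are hypothetical names for this sketch. The workflow's own log format is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class RunLog:
    changed_files: list = field(default_factory=list)
    reembedded_files: list = field(default_factory=list)
    chunk_counts: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

    def record_success(self, file_name, chunk_count):
        """A changed file that was re-embedded into chunk_count chunks."""
        self.changed_files.append(file_name)
        self.reembedded_files.append(file_name)
        self.chunk_counts[file_name] = chunk_count

    def record_failure(self, file_name, error):
        """A changed file whose re-embedding failed."""
        self.changed_files.append(file_name)
        self.errors.append({"file": file_name, "error": str(error)})

    def summary(self):
        return (
            f"{len(self.changed_files)} changed, "
            f"{len(self.reembedded_files)} re-embedded, "
            f"{sum(self.chunk_counts.values())} chunks, "
            f"{len(self.errors)} errors"
        )
```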
Integrations & Connected Systems
- Google Drive: source of documents
- Google Gemini: embedding generator
- Supabase Vector DB: vector storage
- n8n: document watcher, hashing, embedding pipeline
- Crypto Node: content hashing
Smart Logic & Reliability
- Hashing ensures only changed files are processed
- Duplicate embeddings prevented automatically
- Scalable logic supports hundreds of documents
- Handles PDFs, text files, Word docs, and more
- Safe retries for transient errors
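The retry behavior in the last point might look like the sketch below, which retries with exponential backoff. `TransientError`, `with_retries`, and the default attempt count and delay are assumptions for illustration. In n8n, retries are usually configured on the node itself.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, rate limits)."""


def with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the run log
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Only errors marked as transient are retried. Anything else propagates immediately, so a malformed file fails fast instead of being retried.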
Before
Files in Google Drive were tracked manually, updates were easily forgotten, embeddings went stale, and RAG results were unreliable.
After
Upload or update a file → embeddings are refreshed automatically in Supabase.
Tools Used
Our Process
Discover
Understood gaps in the client’s knowledge base lifecycle.
Design
Created a Drive-watcher → hash → embed → sync workflow.
Build
Implemented chunking, re-embedding, and Supabase sync logic.
Integrate
Ensured clean vector database maintenance.
Deploy
Tuned hashing and chunk sizes for performance.
Business Impact
- Fully automated knowledge base maintenance
- No outdated or duplicated embeddings
- Stronger RAG accuracy and reliability
- Faster responses for end-users
- Scalable document ingestion
- Zero manual effort required
"The RAG Record Manager turns Google Drive into a self-maintaining data source for your vector database. It monitors updates, re-generates embeddings, cleans old vectors, and syncs new ones — keeping your knowledge base fresh and trustworthy automatically."
Want a system like this for your business?
Let’s build it.