Open Source

AI-Native File Database

cat + grep for any file format. Parse once, query forever. Built for LLM agents that need to read, search, and discover files without writing parsing scripts.

70% fewer tokens

30% faster

100% success rate

20+ file formats


Without YakDB

# Agent writes inline parsing script
# ~500 tokens, often fails
run_command("""python -c "
  import fitz
  doc = fitz.open('report.pdf')
  for page in doc:
      print(page.get_text())
" """)

500+ tokens per call — 79% success rate

With YakDB

# One call, always works
read("report.pdf")

# With options
read("report.pdf", pages="1-3")
search("quarterly revenue")
glob("**/*.xlsx")

~50 tokens per call — 100% success rate

3 Tools Replace Everything

Read. Search. Discover.

yakdb_read

Read any file — PDF, DOCX, XLSX, images, code. One call, structured output.

read("quarterly-report.pdf", pages="1-3")
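The `pages="1-3"` option takes a page-range string. Here is a minimal sketch of how such a spec could be expanded into page numbers. It assumes 1-based inclusive ranges, comma-separated parts, and clamping to the document length. `parse_pages` is a hypothetical helper, not part of YakDB's API.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec like "1-3,5" into sorted 1-based page numbers.

    Hypothetical helper: assumes inclusive ranges, comma-separated parts,
    and clamps ranges to the document's page count.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Clamp the upper bound so "1-999" on a 10-page PDF means pages 1..10.
        pages.update(range(start, min(end, page_count) + 1))
    return sorted(pages)


print(parse_pages("1-3", page_count=10))    # [1, 2, 3]
print(parse_pages("2, 5-6", page_count=5))  # [2, 5]
```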

yakdb_search

Full-text search across all indexed documents. Regex grep for code files.

search("revenue growth Q4")
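Full-text search is typically backed by an inverted index, which maps each token to the documents containing it. Grep instead runs a regular expression line by line. Below is a minimal in-memory sketch of both modes. It assumes lowercase word tokens and AND semantics for multi-word queries. `TinyIndex` is hypothetical and says nothing about how YakDB stores or ranks results.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; real engines add stemming and ranking.
    return [t.lower() for t in TOKEN_RE.findall(text)]


class TinyIndex:
    """Hypothetical in-memory inverted index: token -> set of document paths."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, str] = {}

    def add(self, path: str, text: str) -> None:
        self.docs[path] = text
        for token in tokenize(text):
            self.postings[token].add(path)

    def search(self, query: str) -> list[str]:
        # AND semantics: a document must contain every query token.
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)

    def grep(self, pattern: str, path: str) -> list[tuple[int, str]]:
        # Regex grep over one stored file, returning 1-based line numbers.
        rx = re.compile(pattern)
        return [
            (n, line)
            for n, line in enumerate(self.docs[path].splitlines(), start=1)
            if rx.search(line)
        ]


index = TinyIndex()
index.add("q4.md", "Revenue growth in Q4 beat forecasts.")
index.add("notes.txt", "Growth plans for next year.")
index.add("app.py", "def total_revenue():\n    return sum(REVENUE)\n")
print(index.search("revenue growth Q4"))  # ['q4.md']
print(index.grep(r"def \w+", "app.py"))   # [(1, 'def total_revenue():')]
```

A production index would add ranking, stemming, and persistent storage. SQLite's FTS5 extension is one way to get those in a local, zero-config setup.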

yakdb_glob

Discover files with glob patterns. Instant metadata and type inference.

glob("**/*.xlsx")
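Below is a minimal sketch of glob-based discovery with per-file metadata. It uses `pathlib` for the pattern match and `mimetypes` for type inference based on file extensions. `glob_with_metadata` and its result fields are hypothetical and do not describe YakDB's output format.

```python
import mimetypes
from pathlib import Path


def glob_with_metadata(root: str, pattern: str) -> list[dict]:
    """Hypothetical sketch: match files under root and attach basic metadata."""
    results = []
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        stat = path.stat()
        # Type inference from the extension only; file contents are not inspected.
        mime, _ = mimetypes.guess_type(path.name)
        results.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": stat.st_size,
            "modified": stat.st_mtime,
            "mime_type": mime or "application/octet-stream",
        })
    return results


for entry in glob_with_metadata("./my-documents", "**/*.xlsx"):
    print(entry["path"], entry["size_bytes"], entry["mime_type"])
```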

Universal Parsing

Every Format, One API

PDF, Word, PowerPoint, Excel, CSV, images, code files — YakDB parses them all server-side so your agent doesn't have to.

PDF: OCR + pages

DOCX: Headings & tables

PPTX: Slides & notes

XLSX: Structured JSON

CSV: Encoding auto-detection

Images: OCR + vision

Code: Line-numbered

Markdown: Plain text
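Server-side parsing usually means dispatching on file type. Below is a minimal sketch of three of the simpler cases listed above: CSV with an encoding fallback, line-numbered code, and Markdown as plain text. It assumes extension-based dispatch. The function names and the encoding fallback order are illustrative assumptions, not YakDB's implementation.

```python
import csv
import io
from pathlib import Path

# Fallback order for CSV decoding; latin-1 accepts any byte sequence, so it is last.
CSV_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")
CODE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java"}


def parse_csv(raw: bytes) -> dict:
    for encoding in CSV_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        rows = list(csv.reader(io.StringIO(text)))
        return {"type": "csv", "encoding": encoding, "rows": rows}
    raise ValueError("undecodable CSV")  # unreachable while latin-1 is last


def parse_code(raw: bytes) -> dict:
    # Prefix each line with its 1-based number, right-aligned to width 4.
    lines = raw.decode("utf-8", errors="replace").splitlines()
    numbered = [f"{n:>4}  {line}" for n, line in enumerate(lines, start=1)]
    return {"type": "code", "text": "\n".join(numbered)}


def parse_markdown(raw: bytes) -> dict:
    return {"type": "markdown", "text": raw.decode("utf-8", errors="replace")}


def parse_file(path: Path, raw: bytes) -> dict:
    """Hypothetical dispatcher: choose a parser by file extension."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return parse_csv(raw)
    if suffix in {".md", ".markdown"}:
        return parse_markdown(raw)
    if suffix in CODE_SUFFIXES:
        return parse_code(raw)
    # PDF, DOCX, PPTX, XLSX and images need format-specific libraries (omitted here).
    raise ValueError(f"no parser registered for {suffix!r}")
```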

Parse Once, Query Forever

Files are parsed and indexed on ingest. Subsequent reads come straight from the index: no repeated parsing, no wasted tokens.
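One common way to implement parse-once semantics is to key cached parse results by a hash of the file's bytes. A file is then re-parsed only when its content changes. Below is a minimal sketch using the standard-library `sqlite3` module, assuming a caller-supplied `parse` function. `ParseCache` is hypothetical and is not YakDB's storage schema.

```python
import hashlib
import sqlite3
from typing import Callable


class ParseCache:
    """Hypothetical parse-once cache: re-parse only when file content changes."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS parsed ("
            " path TEXT PRIMARY KEY, sha256 TEXT NOT NULL, content TEXT NOT NULL)"
        )
        self.parse_calls = 0

    def read(self, path: str, raw: bytes, parse: Callable[[bytes], str]) -> str:
        digest = hashlib.sha256(raw).hexdigest()
        row = self.conn.execute(
            "SELECT sha256, content FROM parsed WHERE path = ?", (path,)
        ).fetchone()
        if row is not None and row[0] == digest:
            return row[1]  # cache hit: a lookup, no parsing
        self.parse_calls += 1
        content = parse(raw)
        self.conn.execute(
            "INSERT OR REPLACE INTO parsed (path, sha256, content) VALUES (?, ?, ?)",
            (path, digest, content),
        )
        self.conn.commit()
        return content


cache = ParseCache()
to_upper = lambda raw: raw.decode("utf-8").upper()  # stand-in for a real parser
cache.read("a.txt", b"hello", to_upper)
cache.read("a.txt", b"hello", to_upper)  # same bytes: served from the cache
print(cache.parse_calls)  # 1
```

Because the cache key is a content hash, a file whose timestamp changes while its bytes stay the same is not re-parsed. A real indexer would likely compare size and modification time first, so unchanged files need not be read at all.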

Dual-Mode Storage

SQLite for zero-config local use. PostgreSQL for shared team deployments. Same API, same tools.

Real-Time Directory Watch

Point YakDB at a folder and it auto-indexes new and modified files. No manual re-ingestion.
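Directory watching can be built on OS change notifications (the third-party `watchdog` package wraps them) or on periodic polling. Below is a minimal polling sketch that uses only the standard library. It treats a file as changed if it is new or if its modification time or size differs from the last scan. `snapshot`, `changed_files`, and `watch` are hypothetical. How YakDB itself detects changes is not stated here.

```python
import time
from pathlib import Path
from typing import Callable

Snapshot = dict[str, tuple[int, int]]


def snapshot(root: str) -> Snapshot:
    # Map each file to (modification time in ns, size in bytes).
    snap: Snapshot = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            snap[str(path)] = (st.st_mtime_ns, st.st_size)
    return snap


def changed_files(before: Snapshot, after: Snapshot) -> list[str]:
    # New files, plus files whose mtime or size differs from the last scan.
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


def watch(root: str, on_change: Callable[[list[str]], None], interval: float = 2.0) -> None:
    """Hypothetical polling loop; runs until interrupted."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        changed = changed_files(previous, current)
        if changed:
            on_change(changed)  # e.g. re-index these paths
        previous = current
```

This sketch does not report deleted files. Comparing `before.keys() - after.keys()` would catch them.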

Open Source (AGPL)

Fully open source. Inspect, extend, self-host. Community contributions welcome.

Get Started in 30 Seconds

Three Commands

# Install

$ pip install "yakdb[cli]"

# Index a workspace

$ yakdb index ./my-documents

# Start MCP server for your agent

$ yakdb serve-mcp
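The Model Context Protocol is built on JSON-RPC 2.0. An MCP client invokes a server tool with a `tools/call` request that carries the tool name and its arguments. The sketch below shows the shape of that request for `yakdb_read`. It assumes argument names `path` and `pages`, since YakDB's actual input schema is not documented here. Clients normally discover the schema via `tools/list`.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP tools/call request (JSON-RPC 2.0) as a JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names are assumptions; check the schema the server returns from tools/list.
print(tool_call_request(1, "yakdb_read", {"path": "report.pdf", "pages": "1-3"}))
```

In practice, the MCP client library in the agent framework builds these messages after the `initialize` handshake. The sketch only illustrates their shape.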

Benchmarks

Tested Across 4 Models, 24 Document Tasks

Metric          Without YakDB   With YakDB   Change
Token usage     100%            27–45%       55–73% fewer tokens
Task time       100%            36–58%       42–64% faster
Success rate    79%             100%         +21 percentage points
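The savings figures follow directly from the relative values, with the baseline (without YakDB) normalized to 100%. Below is a small sketch of that arithmetic. It also converts the task-time ratio into a speedup factor, which is derived here and not stated in the benchmark itself.

```python
def percent_saved(relative_percent: float) -> float:
    # Share of the baseline (100%) that is no longer spent.
    return 100.0 - relative_percent


def speedup(relative_time_percent: float) -> float:
    # Baseline time divided by new time, e.g. 50% of the time -> 2x speedup.
    return 100.0 / relative_time_percent


# Token usage: 27-45% of baseline.
print(percent_saved(45), percent_saved(27))  # 55.0 73.0
# Task time: 36-58% of baseline.
print(percent_saved(58), percent_saved(36))  # 42.0 64.0
print(round(speedup(58), 2), round(speedup(36), 2))  # 1.72 2.78
# Success rate: 79% -> 100% is a gain of 21 percentage points.
print(100 - 79)  # 21
```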

Built for OpenYak. Works Everywhere.

YakDB ships as OpenYak's native file layer — but it's a standalone tool. Use it with any MCP-compatible agent, via REST API, or as a Python library.

