CoDynamics Lab Corporation — Proprietary Inference Layer
Direct & Aggressive
The end of RAG. Document memory at the model level.
Performance-Led
Kill the context window. 40× faster document intelligence.
Economic-Led
Stop re-reading. Compile once, query forever for 97% less.
Clarity-Led
LATCH: document memory without the RAG artifacts.
Portability-Led
Compile once. Ship the .latchdoc file. Query anywhere in 1.6ms.
I couldn't pick one tagline, because for most enterprise workloads all five are true. LATCH is a fundamental shift in how models remember documents. Not RAG. Not prompt compression. Something new.
Self-hosted by default. Private by default. LATCH is an inference engine for large private document sets running on your own infrastructure. It is not a hosted chatbot, and your source documents do not need to leave your environment.
Seeing is believing
Same model. Same query. Same documents.
Qwen 2.5 14B — Standard Baseline
23.10s
Illustrative Time to First Token
● Baseline capture
Standard Qwen 2.5 14B hits the memory wall quickly on this corpus. On an A100 or H100 80GB, only a small subset of the document set fits before VRAM is exhausted, so the baseline cannot sustain the same multi-document workload that LATCH compiles and serves on the same class of GPU.
Qwen 2.5 14B — LATCH Compiled
0.11s
Time to First Token
● Live capture
Live LATCH room capture showing compiled cross-document inference. This is currently the only public demo video on the page.

Benchmarked on DOJ antitrust brief, SEC 10-K, credit agreement, commercial lease, and NIST AI RMF on NVIDIA H100 80GB.
Standard Qwen 2.5 14B cannot hold this document load in memory on an 80GB A100/H100 without exhausting VRAM. LATCH changes the memory path so the same GPU class can answer against the compiled corpus directly.

Hard numbers

Built to be measured, not marketed.

Every claim below was benchmarked on real enterprise documents on NVIDIA H100 80GB with vLLM serving infrastructure.

0.11s
TTFT
vs 23.1s baseline cold start
210×
Faster Cold Start
Time-to-first-token speedup
1.6ms
Cache Reload
From .latch file on disk
91.7%
Multi-Doc Pass
11/12 benchmark gates
97%
Cost Reduction
Amortized after 25 queries
50%
Less VRAM
More instances per node
4
Model Families
Qwen · Mistral · Llama · DeepSeek
5.2×
End-to-End Speedup
Full query cycle improvement
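The headline ratios follow directly from the raw timings. A quick sanity check, using only the numbers published above:

```python
# Sanity-check the headline speedup from the published timings.
baseline_ttft = 23.1   # seconds, standard Qwen 2.5 14B cold start
latch_ttft = 0.11      # seconds, LATCH compiled

speedup = baseline_ttft / latch_ttft
print(f"Cold-start TTFT speedup: {speedup:.0f}x")
```

The 23.1s-to-0.11s improvement is where the 210× figure comes from; the 5.2× figure covers the full query cycle, not just time to first token.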
How it works

Compile once. Query indefinitely.

LATCH intercepts the standard inference path and replaces per-query document processing with a persistent representation. The economics compound with every additional query.

01
Compile
Documents processed through proprietary compilation. Persistent representation saved to disk as .latch or .latchdoc.
02
Query
Each query runs against compiled memory. Sub-200ms response. No raw document re-processing. Ever.
03
Amortize
Compilation cost paid once. Shared across teams, workflows, and time. More queries = lower unit cost.
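The amortization step above can be sketched as a one-line cost model. The dollar figures below are hypothetical placeholders, not published pricing; only the shape of the curve matters: the one-time compile cost is spread across every query, so unit cost falls as query volume grows.

```python
# Illustrative amortization model for "compile once, query indefinitely".
# COMPILE and QUERY are hypothetical costs, not LATCH pricing.

def amortized_cost_per_query(compile_cost: float,
                             query_cost: float,
                             num_queries: int) -> float:
    """One-time compile cost spread over all queries, plus marginal cost."""
    return compile_cost / num_queries + query_cost

COMPILE = 1.00   # hypothetical one-time compile cost per corpus
QUERY = 0.01     # hypothetical marginal cost per compiled query

for n in (1, 25, 1000):
    print(f"{n:>5} queries -> ${amortized_cost_per_query(COMPILE, QUERY, n):.4f}/query")
```

More queries, lower unit cost: the compile cost term shrinks toward zero while the marginal query cost stays flat.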
Portable document memory

Stop indexing. Start shipping .latch files.

Your compiled corpus becomes a tangible, portable asset. Save it. Transfer it. Load it in 1.6ms. No re-computation. No extra cost.

.latch
Privacy-First Variant
A lightweight binary of your compiled corpus. Contains only the model-level memory — no source text included.
  • Compiled document intelligence
  • 1.6ms reload from disk
  • Share analysis without exposing source docs
  • Smallest possible file size
.latchdoc
Full Intelligence Package
Everything in .latch, plus embedded raw text for full-text search and automatic quality fallback. The smart default.
  • Everything in .latch
  • Ctrl+F / needle-in-haystack search
  • Automatic fallback for edge-case queries
  • Negligible size overhead vs .latch
1. Compile on your H100
2. Save as .latchdoc
3. Ship to your team
4. They query in 0.11s with zero recompute
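Because the compiled artifact is a plain file, the "ship to your team" step can lean on ordinary file tooling. A minimal sketch, publishing a SHA-256 checksum alongside the artifact so recipients can verify the .latchdoc arrived intact before loading it (the file name here is hypothetical):

```python
# Verify a compiled .latch/.latchdoc artifact before loading it.
# The artifact name is a hypothetical example, not a required layout.
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a compiled artifact, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

artifact = Path("acme-10k.latchdoc")   # hypothetical compiled corpus
if artifact.exists():
    print(artifact.name, checksum(artifact))
```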
Quickstart

Show the operator path, not just the benchmark.

Not everyone knows how to provision a GPU room correctly on day one, so the operating model is spelled out here: start the Docker container, then either call the API directly or open the LATCH Console from your own machine.

Terminal 1 · API Flow
Start LATCH, verify readiness, upload a document, and send a query over the standard local API surface.
$ docker run --gpus all -p 8091:8091 codynamics/latch:latest
[latch] runtime starting on http://0.0.0.0:8091
[latch] status=loading profile=cdlac_latch_qwen14b_locked_20260317
[latch] warmup complete status=ready

$ curl -s http://127.0.0.1:8091/health | jq '.status, .default_memory_tokens'
"ready"
1024

$ curl -s http://127.0.0.1:8091/compile_file \
  -H 'Content-Type: application/json' \
  -d '{"filename":"acme-10k.pdf","content_base64":"<base64>"}'
{ "doc_id":"doc_6f3a59b3f8", "status":"ready", "tokens":95030 }

$ curl -s http://127.0.0.1:8091/query \
  -H 'Content-Type: application/json' \
  -d '{"query":"Summarize the company in 3 bullets.","doc_ids":["doc_6f3a59b3f8"]}'
{ "results":[{ "answer":"Acme provides cloud learning software..." }] }
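The same API flow can be driven from Python instead of curl. A thin stdlib-only sketch over the endpoints shown in the transcript above (/compile_file and /query, with the same JSON shapes); error handling is deliberately minimal:

```python
# Thin Python wrapper over the local LATCH API transcript above.
# Endpoints and JSON shapes mirror the curl examples; this is a sketch,
# not an official client.
import base64
import json
from urllib import request

BASE = "http://127.0.0.1:8091"

def _post(path: str, payload: dict) -> dict:
    """POST a JSON body to the local LATCH runtime and decode the reply."""
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def compile_payload(filename: str, raw: bytes) -> dict:
    """Build the /compile_file body: file content is base64-encoded."""
    return {"filename": filename,
            "content_base64": base64.b64encode(raw).decode()}

def query_payload(query: str, doc_ids: list[str]) -> dict:
    """Build the /query body for one or more compiled documents."""
    return {"query": query, "doc_ids": doc_ids}

if __name__ == "__main__":
    doc = _post("/compile_file",
                compile_payload("acme-10k.pdf", open("acme-10k.pdf", "rb").read()))
    out = _post("/query",
                query_payload("Summarize the company in 3 bullets.",
                              [doc["doc_id"]]))
    print(out["results"][0]["answer"])
```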
Terminal 2 · Console Flow
Use the same Docker runtime, then open the local LATCH Console in a browser to upload files, query, inspect telemetry, and manage workspaces.
$ docker run --gpus all -p 8091:8091 codynamics/latch:latest
[latch] runtime starting on http://0.0.0.0:8091
[latch] UI available at http://127.0.0.1:8091/

$ curl -s http://127.0.0.1:8091/health | jq '.ready, .service_rev'
true
"latch_product_nomount_20260325"

$ python3 -m webbrowser http://127.0.0.1:8091/
Opening CDLaC-LATCH Console...

# In the console UI:
1. Upload PDFs, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, JSON, or XML
2. Adjust compile/query controls and inspect runtime defaults from /health
3. Run prompts, inspect telemetry, and save or load workspaces
LATCH occupies a new category

Not an optimization. A replacement.

Every alternative re-reads, re-chunks, or re-embeds on every query. LATCH doesn't.

Capability | Full-Context | RAG | KV Cache | LATCH
Compile once, reuse forever | ✗ | ✗ | ✗ | ✓
Cross-document reasoning | ✓ | Limited | ✗ | ✓
Sub-200ms TTFT | ✗ | ✗ | Partial | ✓
Cost amortization over queries | ✗ | ✗ | Limited | ✓
Persistent on disk | ✗ | Embeddings only | Session-bound | ✓
Model-agnostic | ✓ | ✓ | Per-model | ✓
No chunking artifacts | ✓ | ✗ | ✓ | ✓
Portable binary format | ✗ | ✗ | ✗ | ✓ .latch/.latchdoc
Get started

Your documents. Your infrastructure.
Sub-200ms answers.

Self-Hosted
Run on your own GPU
$79
One-time license for a self-hosted, privacy-first inference engine. Docker image, license key, and one-line deploy. Your documents stay on your infrastructure. OpenAI-format API compatible.
Buy on Gumroad →
Initial image pull: the first download is over 40GB and typically takes about 20 minutes, depending on your link and host.
First startup: after `docker run`, expect roughly 7 to 8 minutes for model warmup before the console reports ready.
Direct support: this is built and supported by one developer. I genuinely appreciate the business and I respond to emails as quickly as I can.

Early Adopter Pricing & Update Policy

Free updates included. Your license covers all runtime updates, bug fixes, and feature additions through the current major version. We expect to ship minor updates regularly throughout the year, and your `docker compose pull` or image pull path will always pick up the latest v1 runtime.

Next major release: when LATCH v2.0 ships, it will be a new purchase. Existing v1 customers receive 50% off automatically and we will reach out directly.

First 100 customers: the first 100 license holders receive v2 at no charge. Current license count is not displayed publicly; email mike@codynamicslab.com if you want your position confirmed.

Hosted Access
Managed GPU runtime
Coming Soon
For teams that want LATCH on hosted infrastructure without owning the GPU room themselves. Same product direction, managed delivery.
Notify Me
We are keeping the first release self-hosted. Managed hosted access will follow after the operator and licensing path hardens.
Investor Portal
Open the private materials
Restricted Access
Use your investor code to open the private pitch portal. The public landing page still does not link the console directly.


Existing investor codes and portal destinations are preserved from the prior landing page.