    $ docker run --gpus all -p 8091:8091 codynamics/latch:latest
    [latch] runtime starting on http://0.0.0.0:8091
    [latch] status=loading profile=cdlac_latch_qwen14b_locked_20260317
    [latch] warmup complete status=ready

    $ curl -s http://127.0.0.1:8091/health | jq '.status, .default_memory_tokens'
    "ready"
    1024

    $ curl -s http://127.0.0.1:8091/compile_file \
        -H 'Content-Type: application/json' \
        -d '{"filename":"acme-10k.pdf","content_base64":"<base64>"}'
    { "doc_id":"doc_6f3a59b3f8", "status":"ready", "tokens":95030 }

    $ curl -s http://127.0.0.1:8091/query \
        -H 'Content-Type: application/json' \
        -d '{"query":"Summarize the company in 3 bullets.","doc_ids":["doc_6f3a59b3f8"]}'
    { "results":[{ "answer":"Acme provides cloud learning software..." }] }
LATCH — Compiled Document Memory for LLMs
Benchmarked on a DOJ antitrust brief, an SEC 10-K, a credit agreement, a commercial lease, and the NIST AI RMF, running on an NVIDIA H100 80GB.
Standard Qwen 2.5 14B cannot hold this document load in memory on an 80GB A100 or H100 without running out of VRAM. LATCH changes the memory path so the same GPU class can answer against the compiled corpus directly.
Built to be measured, not marketed.
Every claim below was benchmarked on real enterprise documents, on an NVIDIA H100 80GB with vLLM serving infrastructure.
Compile once. Query indefinitely.
LATCH intercepts the standard inference path and replaces per-query document processing with a persistent representation. The economics compound with every additional query.
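From a client's point of view, the compile-once, query-many pattern might look like the sketch below. It assumes only the `/compile_file` and `/query` endpoints and the JSON fields visible in the terminal example at the top of this page; the helper names (`build_compile_request`, `build_query_request`, `post_json`, `compile_then_query`) are hypothetical and are not part of any published LATCH client.

```python
import base64
import json
import urllib.request

# Address taken from the terminal example; adjust for your own deployment.
BASE_URL = "http://127.0.0.1:8091"


def build_compile_request(filename: str, data: bytes) -> dict:
    """Build a /compile_file body with the fields shown in the curl example."""
    return {
        "filename": filename,
        "content_base64": base64.b64encode(data).decode("ascii"),
    }


def build_query_request(query: str, doc_ids: list[str]) -> dict:
    """Build a /query body with the fields shown in the curl example."""
    return {"query": query, "doc_ids": list(doc_ids)}


def post_json(path: str, body: dict, base_url: str = BASE_URL) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compile_then_query(filename, data, questions, post=post_json):
    """Compile a document once, then reuse its doc_id for every question."""
    doc = post("/compile_file", build_compile_request(filename, data))
    doc_id = doc["doc_id"]
    answers = []
    for question in questions:
        result = post("/query", build_query_request(question, [doc_id]))
        # Assumes the response shape from the example: results[0].answer
        answers.append(result["results"][0]["answer"])
    return answers
```

The `post` parameter is injectable so the flow can be exercised without a running container. The one-time compile call sits outside the loop, so each additional question costs only a `/query` request.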
Stop indexing. Start shipping .latch files.
Compilation is now a tangible, portable asset. Save it. Transfer it. Load it in 1.6ms. No computation. No extra cost. No resources spent repeatedly converting native files into text.
- Compiled document intelligence
- 1.6ms reload from disk
- Share analysis without exposing source docs
- Smallest possible file size
- Everything in .latch
- Ctrl+F / needle-in-haystack search
- Automatic fallback for edge-case queries
- Negligible size overhead vs .latch
Docker API and integrated Console flow example
Start the Docker container, then either call the API directly or open the LATCH Console from your own machine.
    $ docker run --gpus all -p 8091:8091 codynamics/latch:latest
    [latch] runtime starting on http://0.0.0.0:8091
    [latch] UI available at http://127.0.0.1:8091/

    $ curl -s http://127.0.0.1:8091/health | jq '.ready, .service_rev'
    true
    "latch_product_nomount_20260325"

    $ python3 -m webbrowser http://127.0.0.1:8091/
    Opening CDLaC-LATCH Console...

    # In the console UI:
    1. Upload PDFs, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, JSON, or XML
    2. Adjust compile/query controls and inspect runtime defaults from /health
    3. Run prompts, inspect telemetry, and save or load workspaces
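A script that starts the container will usually want to wait for the runtime to finish warming up before sending work. The sketch below polls `/health` until it reports ready. The two terminal examples on this page show different readiness fields (`status: "ready"` in one, `ready: true` in the other), so the check accepts either. The function names, the 120-second timeout, and the 2-second interval are illustrative assumptions, not documented defaults.

```python
import json
import time
import urllib.request


def is_ready(health: dict) -> bool:
    """Return True if a /health payload reports a ready runtime.

    Accepts either field seen in the examples: ready=true or status="ready".
    """
    return health.get("ready") is True or health.get("status") == "ready"


def fetch_health(base_url: str = "http://127.0.0.1:8091") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/health") as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, timeout_s=120.0, interval_s=2.0,
                     sleep=time.sleep) -> bool:
    """Poll until the runtime is ready; return False if the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_ready(fetch()):
                return True
        except OSError:
            # urllib's URLError subclasses OSError; the container may not be
            # accepting connections yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` right after `docker run` and before the first `/compile_file` request avoids sending work while the runtime still reports `status=loading`.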
Start with evaluation. Upgrade when it becomes a business.
The public checkout is for evaluation and individual operators. Commercial deployment and OEM redistribution are available, but handled directly so terms and support match the actual use case.
- Scope: personal or evaluation use only.
- Activations: up to 3 activations tied to one user.
- Not included: third-party production deployment, SaaS resale, OEM embedding, or redistribution.
- Terms: purchase constitutes acceptance of the LATCH EULA.
- Scope: internal deployment or company-operated production service.
- Delivery: annual commercial license with direct support and updates.
- Restriction: no redistribution of the Docker image or runtime to third parties.
- Fit: teams using LATCH as infrastructure, not as shrink-wrapped software.
- Scope: OEM, embedded delivery, or multi-customer deployment rights.
- Structure: custom commercial agreement tied to deployment model.
- Support: negotiated support, entitlement scope, and operational terms.
- Fit: software vendors and enterprise distribution partners.
LATCH is under active development.
Planned capabilities include expanded model support, managed deployment options, and additional document-intelligence features. Commercial customers receive updates throughout the licensed term, and the self-hosted runtime will continue to improve as the operator path hardens.
Not an optimization. A replacement.
Every alternative re-reads, re-chunks, or re-embeds on every query. LATCH doesn't.
| Capability | Full-Context | RAG | KV Cache | LATCH |
|---|---|---|---|---|
| Compile once, reuse forever | ✗ | ✗ | ✗ | ✓ |
| Cross-document reasoning | ✓ | Limited | ✓ | ✓ |
| Sub-200ms TTFT | ✗ | ✗ | Partial | ✓ |
| Cost amortization over queries | ✗ | ✗ | Limited | ✓ |
| Persistent on disk | ✗ | Embeddings only | Session-bound | ✓ |
| Model-agnostic | ✓ | ✓ | Per-model | ✓ |
| No chunking artifacts | ✓ | ✗ | ✓ | ✓ |
| Portable binary format | ✗ | ✗ | ✗ | ✓ .latch/.latchdoc |
Your documents. Your infrastructure.
Sub-200ms answers.
Early Adopter Pricing & Update Policy
Free updates included. Your license covers all runtime updates, bug fixes, and feature additions through the current major version. We expect to ship minor updates regularly throughout the year, and your `docker compose pull` or image pull path will always pick up the latest v1 runtime.
Next major release: when LATCH v2.0 ships, it will be a new purchase. Existing v1 customers receive 50% off automatically and we will reach out directly.
First 100 customers: the first 100 license holders receive v2 at no charge. Current license count is not displayed publicly; email mike@codynamicslab.com if you want your position confirmed.
