CoDynamics Lab Corporation — Proprietary Inference Layer
Direct & Aggressive
The end of RAG. Document memory at the model level.
Performance-Led
Kill the context window. 40× faster document intelligence.
Economic-Led
Stop re-reading. Compile once, query forever for 97% less.
Clarity-Led
LATCH: document memory without the RAG artifacts.
Portability-Led
Compile once. Ship the .latchdoc file. Query anywhere in 1.6ms.
I couldn't pick one tagline, because for most enterprise workloads all five are true. LATCH is a fundamental shift in how models remember documents. Not RAG. Not prompt compression. Something new.
Self-hosted by default. Private by default. LATCH is an inference engine for large private document sets running on your own infrastructure. It is not a hosted chatbot, and your source documents do not need to leave your environment.
Seeing is believing
Same model. Same query. Same documents.
Qwen 2.5 14B — Standard Baseline
23.10s
Illustrative Time to First Token
● Baseline capture
Standard Qwen 2.5 14B hits the memory wall quickly on this corpus. On an A100 or H100 80GB, only a small subset of the document set fits before VRAM is exhausted, so the baseline cannot sustain the same multi-document workload that LATCH compiles and serves on the same class of GPU.
Qwen 2.5 14B — LATCH Compiled
0.11s
Time to First Token
● Live capture
Live LATCH room capture showing compiled cross-document inference. This is currently the only public demo video on the page.

Benchmarked on DOJ antitrust brief, SEC 10-K, credit agreement, commercial lease, and NIST AI RMF on NVIDIA H100 80GB.
Standard Qwen 2.5 14B cannot hold this document load in memory on an 80GB A100/H100 without exhausting VRAM. LATCH changes the memory path so the same GPU class can answer against the compiled corpus directly.

Hard numbers

Built to be measured, not marketed.

Every claim below was benchmarked on real enterprise documents on NVIDIA H100 80GB with vLLM serving infrastructure.

0.11s
TTFT
vs 23.1s baseline cold start
210×
Faster Cold Start
Time-to-first-token speedup
1.6ms
Cache Reload
From .latch file on disk
91.7%
Multi-Doc Pass
11/12 benchmark gates
97%
Cost Reduction
Amortized after 25 queries
50%
Less VRAM
More instances per node
4
Model Families
Qwen · Mistral · Llama · DeepSeek
5.2×
End-to-End Speedup
Full query cycle improvement
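The headline ratios follow directly from the raw timings. A quick sanity check, using only the numbers published above:

```python
# Sanity-check the headline speedup from the published timings.
baseline_ttft = 23.1   # seconds, standard Qwen 2.5 14B cold start
latch_ttft = 0.11      # seconds, LATCH compiled

speedup = baseline_ttft / latch_ttft
print(f"Cold-start TTFT speedup: {speedup:.0f}x")
```

The 23.1s-to-0.11s improvement is where the 210× figure comes from; the 5.2× figure covers the full query cycle, not just time to first token.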
How it works

Compile once. Query indefinitely.

LATCH intercepts the standard inference path and replaces per-query document processing with a persistent representation. The economics compound with every additional query.

01
Compile
Documents processed through proprietary compilation. Persistent representation saved to disk as .latch or .latchdoc.
02
Query
Each query runs against compiled memory. Sub-200ms response. No raw document re-processing. Ever.
03
Amortize
Compilation cost paid once. Shared across teams, workflows, and time. More queries = lower unit cost.
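The amortization step above can be sketched as a one-line cost model. The dollar figures below are hypothetical placeholders, not published pricing; only the shape of the curve matters: the one-time compile cost is spread across every query, so unit cost falls as query volume grows.

```python
# Illustrative amortization model for "compile once, query indefinitely".
# COMPILE and QUERY are hypothetical costs, not LATCH pricing.

def amortized_cost_per_query(compile_cost: float,
                             query_cost: float,
                             num_queries: int) -> float:
    """One-time compile cost spread over all queries, plus marginal cost."""
    return compile_cost / num_queries + query_cost

COMPILE = 1.00   # hypothetical one-time compile cost per corpus
QUERY = 0.01     # hypothetical marginal cost per compiled query

for n in (1, 25, 1000):
    print(f"{n:>5} queries -> ${amortized_cost_per_query(COMPILE, QUERY, n):.4f}/query")
```

More queries, lower unit cost: the compile cost term shrinks toward zero while the marginal query cost stays flat.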
Portable document memory

Stop indexing. Start shipping .latch files.

Your compiled corpus becomes a tangible, portable asset. Save it. Transfer it. Load it in 1.6ms. No re-computation. No extra cost.

.latch
Privacy-First Variant
A lightweight binary of your compiled corpus. Contains only the model-level memory — no source text included.
  • Compiled document intelligence
  • 1.6ms reload from disk
  • Share analysis without exposing source docs
  • Smallest possible file size
.latchdoc
Full Intelligence Package
Everything in .latch, plus embedded raw text for full-text search and automatic quality fallback. The smart default.
  • Everything in .latch
  • Ctrl+F / needle-in-haystack search
  • Automatic fallback for edge-case queries
  • Negligible size overhead vs .latch
1. Compile on your H100
2. Save as .latchdoc
3. Ship to your team
4. They query in 0.11s with zero recompute
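Because the compiled artifact is a plain file, the "ship to your team" step can lean on ordinary file tooling. A minimal sketch, publishing a SHA-256 checksum alongside the artifact so recipients can verify the .latchdoc arrived intact before loading it (the file name here is hypothetical):

```python
# Verify a compiled .latch/.latchdoc artifact before loading it.
# The artifact name is a hypothetical example, not a required layout.
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a compiled artifact, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

artifact = Path("acme-10k.latchdoc")   # hypothetical compiled corpus
if artifact.exists():
    print(artifact.name, checksum(artifact))
```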
Quickstart

Show the operator path, not just the benchmark.

Not everyone knows how to provision a GPU room correctly on day one, so the operating model is spelled out here: start the Docker container, then either call the API directly or open the LATCH Console from your own machine.

Terminal 1 · API Flow
Start LATCH, verify readiness, upload a document, and send a query over the standard local API surface.
$ docker run --gpus all -p 8091:8091 codynamics/latch:latest
[latch] runtime starting on http://0.0.0.0:8091
[latch] status=loading profile=cdlac_latch_qwen14b_locked_20260317
[latch] warmup complete status=ready

$ curl -s http://127.0.0.1:8091/health | jq '.status, .default_memory_tokens'
"ready"
1024

$ curl -s http://127.0.0.1:8091/compile_file \
  -H 'Content-Type: application/json' \
  -d '{"filename":"acme-10k.pdf","content_base64":"<base64>"}'
{ "doc_id":"doc_6f3a59b3f8", "status":"ready", "tokens":95030 }

$ curl -s http://127.0.0.1:8091/query \
  -H 'Content-Type: application/json' \
  -d '{"query":"Summarize the company in 3 bullets.","doc_ids":["doc_6f3a59b3f8"]}'
{ "results":[{ "answer":"Acme provides cloud learning software..." }] }
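The same API flow can be driven from Python instead of curl. A thin stdlib-only sketch over the endpoints shown in the transcript above (/compile_file and /query, with the same JSON shapes); error handling is deliberately minimal:

```python
# Thin Python wrapper over the local LATCH API transcript above.
# Endpoints and JSON shapes mirror the curl examples; this is a sketch,
# not an official client.
import base64
import json
from urllib import request

BASE = "http://127.0.0.1:8091"

def _post(path: str, payload: dict) -> dict:
    """POST a JSON body to the local LATCH runtime and decode the reply."""
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

def compile_payload(filename: str, raw: bytes) -> dict:
    """Build the /compile_file body: file content is base64-encoded."""
    return {"filename": filename,
            "content_base64": base64.b64encode(raw).decode()}

def query_payload(query: str, doc_ids: list[str]) -> dict:
    """Build the /query body for one or more compiled documents."""
    return {"query": query, "doc_ids": doc_ids}

if __name__ == "__main__":
    doc = _post("/compile_file",
                compile_payload("acme-10k.pdf", open("acme-10k.pdf", "rb").read()))
    out = _post("/query",
                query_payload("Summarize the company in 3 bullets.",
                              [doc["doc_id"]]))
    print(out["results"][0]["answer"])
```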
Terminal 2 · Console Flow
Use the same Docker runtime, then open the local LATCH Console in a browser to upload files, query, inspect telemetry, and manage workspaces.
$ docker run --gpus all -p 8091:8091 codynamics/latch:latest
[latch] runtime starting on http://0.0.0.0:8091
[latch] UI available at http://127.0.0.1:8091/

$ curl -s http://127.0.0.1:8091/health | jq '.ready, .service_rev'
true
"latch_product_nomount_20260325"

$ python3 -m webbrowser http://127.0.0.1:8091/
Opening CDLaC-LATCH Console...

# In the console UI:
1. Upload PDFs, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, JSON, or XML
2. Adjust compile/query controls and inspect runtime defaults from /health
3. Run prompts, inspect telemetry, and save or load workspaces
LATCH occupies a new category

Not an optimization. A replacement.

Every alternative re-reads, re-chunks, or re-embeds on every query. LATCH doesn't.

Capability | Full-Context | RAG | KV Cache | LATCH
Compile once, reuse forever | ✗ | ✗ | ✗ | ✓
Cross-document reasoning | ✓ | Limited | ✗ | ✓
Sub-200ms TTFT | ✗ | ✗ | Partial | ✓
Cost amortization over queries | ✗ | ✗ | Limited | ✓
Persistent on disk | ✗ | Embeddings only | Session-bound | ✓
Model-agnostic | ✓ | ✓ | Per-model | ✓
No chunking artifacts | ✓ | ✗ | ✓ | ✓
Portable binary format | ✗ | ✗ | ✗ | ✓ .latch/.latchdoc
Get started

Your documents. Your infrastructure.
Sub-200ms answers.

Self-Hosted
Run on your own GPU
$79
One-time license for a self-hosted, privacy-first inference engine. Docker image, license key, and one-line deploy. Your documents stay on your infrastructure. OpenAI-format API compatible.
Buy on Gumroad →
Initial image pull: the first download is over 40GB and typically takes about 20 minutes, depending on your link and host.
First startup: after `docker run`, expect roughly 7 to 8 minutes for model warmup before the console reports ready.
Direct support: this is built and supported by one developer. I genuinely appreciate the business and I respond to emails as quickly as I can.

Early Adopter Pricing & Update Policy

Free updates included. Your license covers all runtime updates, bug fixes, and feature additions through the current major version. We expect to ship minor updates regularly throughout the year, and your `docker compose pull` or image pull path will always pick up the latest v1 runtime.

Next major release: when LATCH v2.0 ships, it will be a new purchase. Existing v1 customers receive 50% off automatically and we will reach out directly.

First 100 customers: the first 100 license holders receive v2 at no charge. Current license count is not displayed publicly; email mike@codynamicslab.com if you want your position confirmed.

Hosted Access
Managed GPU runtime
Coming Soon
For teams that want LATCH on hosted infrastructure without owning the GPU room themselves. Same product direction, managed delivery.
Notify Me
We are keeping the first release self-hosted. Managed hosted access will follow after the operator and licensing path hardens.
Investor Portal
Open the private materials
Restricted Access
Use your investor code to open the private pitch portal. The public landing page still does not link the console directly.


Existing investor codes and portal destinations are preserved from the prior landing page.