The `tabular-review` skill is the firm-owned, open-model replacement for the Harvey/Legora "review grid". Documents are rows, fields are columns, and every cell is an independent, source-cited extraction: a short `value` plus, behind a click, a `reason`, one verbatim `quote`, and the cited PDF `page`. The orchestrating agent never reads the documents itself. It resolves the row set, resolves the columns at the branch point (user-specified, a loaded `doctype-<type>` skill, or ask-the-user with a suggested starter set), fans out one `document-extractor` subagent per file in parallel, collects each row's JSON (never dropping a file silently), and runs the builder to emit a single self-contained HTML artifact the firm keeps. Use it when building or running multi-document review/diligence grids, or when extending the doctype-skill column conventions.
Tabular review: rows × columns, fanned out
name: tabular-review
description: >-
  Firm-owned tabular document review, the open-model replacement for Harvey/Legora "review grids". Extract a defined set of fields (columns) across a set of documents (rows) and produce an interactive, source-cited review table. Use whenever the user wants to review/compare/extract across MANY documents at once: due-diligence pulls, NDA/contract abstraction, lease abstraction, "make a table of X across these files", "review these contracts for Y", "build a review grid", or "extract these terms from every document".
Tabular Review
This skill is how this firm runs tabular document review — the workflow Harvey and Legora call a "review grid": documents are rows, the fields you care about are columns, and every cell is an independent, source-cited extraction. Unlike the SaaS versions, this runs on the firm's own models and infrastructure, the column logic lives in firm-owned doctype skills, and the output is a self-contained artifact the firm keeps.
You are the orchestrator. You do not read the documents yourself. You define the
grid, fan out one document-extractor subagent per document, then assemble the results
into one HTML artifact.
The shape of the job
                 Parties   Term     Governing law   Assignment   ...   ← columns (fields)
acme_nda.pdf     [cell]    [cell]   [cell]          [cell]
beta_msa.pdf     [cell]    [cell]   [cell]          [cell]             ← rows (documents)
...
Each cell = a short value (what shows in the grid) plus, behind a click, a longer
reason, one verbatim quote sentence, and the cited PDF page rendered with that
sentence highlighted. Every value is grounded in a quote from that document or it is
"Not found". No hallucinated cells.
Workflow
1. Resolve the document set (rows)
Find the files to review. They may be attached, referenced by @path, named in the
prompt, or sitting in a folder ("review the NDAs in ./ndas"). Use glob/list to
expand folders. Confirm the list with the user if it's ambiguous or large (>~20).
Sniff the document type of each file (from filename and, if cheap, a first-page peek). You'll use this both to pick columns and to tell each extractor what it's looking at. A set can be mixed (some NDAs, some leases) — that's fine; group by type.
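A minimal sketch of the filename-based sniffing and grouping described in this step, assuming a small keyword table. `TYPE_KEYWORDS`, `sniff_doc_type`, and `group_by_type` are hypothetical names for illustration, not part of the skill, and a real pass might also peek at the first page.

```python
from pathlib import Path

# Hypothetical keyword table: filename fragments mapped to a document type.
TYPE_KEYWORDS = {
    "nda": "NDA",
    "lease": "Commercial Lease",
    "msa": "Services / MSA / SOW",
    "sow": "Services / MSA / SOW",
}


def sniff_doc_type(path: Path) -> str:
    """Guess a document type from the filename alone; 'unknown' if nothing matches."""
    name = path.stem.lower()
    for fragment, doc_type in TYPE_KEYWORDS.items():
        if fragment in name:
            return doc_type
    return "unknown"


def group_by_type(paths: list[Path]) -> dict[str, list[Path]]:
    """Group files by sniffed type so a mixed set can be handled type by type."""
    groups: dict[str, list[Path]] = {}
    for path in paths:
        groups.setdefault(sniff_doc_type(path), []).append(path)
    return groups
```

A folder reference such as "./ndas" could be expanded with `sorted(Path("./ndas").glob("*.pdf"))` before grouping. Substring matching is crude (it would also match "agenda"), so treat the result as a first guess to confirm, not a classification.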
2. Resolve the columns (fields) — THIS IS THE BRANCH POINT
Columns can come from three places, in priority order:
1. The user already specified them. ("Extract party names, term, and governing law.") Use those verbatim; only add a column if you ask first.
2. A loaded doctype skill. Look for a skill named doctype-<type> (e.g. doctype-nda, doctype-commercial-lease). If one matches the documents, load it with the skill tool and use its recommended columns as the default set. List available skills first if unsure what exists.
3. Neither → ASK THE USER. Do not invent a column set silently. Detect the doc types, then ask what to extract and propose a starter set based on those types, using the suggestion library below. Make it a one-tap decision: offer the suggested columns and let them add/remove. Example:
These look like mutual NDAs. What should I pull into the review table? A common starting set for NDAs: Parties · Effective date · Term · Purpose · Definition of Confidential Information · Exclusions · Permitted disclosures · Return/destruction · Governing law · Term of confidentiality. Want this set, a subset, or your own columns?
If the documents are mixed types, suggest the union and note which columns apply to which type.
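The priority order and the mixed-type union can be sketched as follows. This is an illustration only: `pick_column_source` and `union_with_types` are hypothetical helpers, and the starter sets passed in would come from a doctype skill or the suggestion library.

```python
from typing import Optional


def pick_column_source(user_columns: Optional[list], skill_columns: Optional[list]) -> str:
    """Apply the priority order: user-specified, then doctype skill, then ask."""
    if user_columns:
        return "user"
    if skill_columns:
        return "doctype-skill"
    return "ask-user"


def union_with_types(starter_sets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Ordered union of columns, each mapped to the doc types it applies to."""
    applies: dict[str, list[str]] = {}
    for doc_type, columns in starter_sets.items():
        for column in columns:
            applies.setdefault(column, []).append(doc_type)
    return applies
```

For a set with NDAs and leases, `union_with_types` keeps one "Term" column and records that it applies to both types, which is the note the user needs when choosing columns.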
Normalize the final columns into objects you'll pass down and render:
{ key, label, question, hint? } — key is a short slug (governing_law), label is
the header (Governing law), question is the precise instruction the extractor
answers, hint is optional (format/where to look).
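A small sketch of that normalization, assuming columns arrive as human-readable labels. `slugify` and `normalize_column` are hypothetical names, and the fallback question wording is an assumption, not something the skill prescribes.

```python
import re
from typing import Optional


def slugify(label: str) -> str:
    """Turn a header like 'Governing law' into a short key like 'governing_law'."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def normalize_column(label: str, question: Optional[str] = None,
                     hint: Optional[str] = None) -> dict:
    """Build the { key, label, question, hint? } object passed to extractors."""
    column = {
        "key": slugify(label),
        "label": label,
        # Assumption: fall back to a generic question when none is supplied.
        "question": question or f"What does the document say about: {label}?",
    }
    if hint:
        column["hint"] = hint  # hint stays optional, so omit the field when empty
    return column
```

A precise, column-specific question will usually extract better than the generic fallback, so the fallback is best treated as a placeholder to refine.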
3. Fan out — one subagent per document, in parallel
For each document, spawn the document-extractor subagent with the Task tool.
Issue the calls in parallel (multiple Task calls in a single turn) so the grid fills
concurrently — this is the whole point of the fan-out.
Default granularity is one extractor per file: it reads the document once and fills
that file's entire row across all columns. (This is cheaper and more consistent than one
subagent per cell, because the document is read a single time.) Escalate to a
per-cell extractor only for a column that is high-stakes and came back low
confidence or conflicted — re-run just that (file, column) pair with a focused prompt.
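One way to pick the (file, column) pairs worth a focused re-run, sketched under assumptions: the caller supplies the set of high-stakes column keys, and each extractor result carries a `cells` list with `key` and `confidence`. How a "conflicted" cell is represented is not specified here, so this hypothetical `cells_to_escalate` helper only checks for low confidence.

```python
def cells_to_escalate(results: list[dict], high_stakes: set[str]) -> list[tuple[str, str]]:
    """Return (file, column key) pairs that merit a per-cell re-run."""
    pairs = []
    for result in results:
        for cell in result["cells"]:
            # Only escalate columns the caller marked as high-stakes.
            if cell["key"] in high_stakes and cell.get("confidence") == "low":
                pairs.append((result["file"], cell["key"]))
    return pairs
```

Keeping the high-stakes set small preserves the cost advantage of reading each document once.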
Give each extractor exactly this:
FILE: <path to the one document>
DOC_TYPE: <NDA | Commercial Lease | ... | unknown>
COLUMNS:
1. key: parties
question: Who are the parties to this agreement (full legal names)?
hint: Check the preamble and signature block.
2. key: governing_law
question: Which jurisdiction's law governs?
... (every column, with key + question, hints where useful)
Return ONLY the strict JSON object defined in your instructions.
Each extractor returns a JSON object: { file, title, docType, summary, cells: [ {key, value, reason, quote, page, location, confidence} ] }. Remember: value is short (it's
a table cell), reason is the longer sidebar explanation, quote is one verbatim
sentence, and page is the 1-based page that sentence is on.
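The extractor prompt shown in this step can be assembled mechanically from the normalized columns. A minimal sketch, with `build_extractor_prompt` as a hypothetical helper name:

```python
def build_extractor_prompt(file: str, doc_type: str, columns: list[dict]) -> str:
    """Render the FILE / DOC_TYPE / COLUMNS prompt for one document-extractor call."""
    lines = [f"FILE: {file}", f"DOC_TYPE: {doc_type}", "COLUMNS:"]
    for number, column in enumerate(columns, start=1):
        lines.append(f"{number}. key: {column['key']}")
        lines.append(f"   question: {column['question']}")
        if column.get("hint"):  # hints are optional, so skip the line when absent
            lines.append(f"   hint: {column['hint']}")
    lines.append("Return ONLY the strict JSON object defined in your instructions.")
    return "\n".join(lines)
```

Generating every prompt from the same column list keeps keys and questions identical across rows, which makes the resulting cells comparable.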
4. Collect and assemble the data file
Parse each extractor's JSON (read the last ```json block). If one fails to parse or the
subagent errored, keep the row with that file's cells set to value: "Error",
confidence: "low" — never drop a document silently; the grid must account for every
file. Convert each extractor's cells array into a map keyed by key.
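A hedged sketch of this collection step: take the last fenced json block from a reply, convert the cells array to a map, and fall back to an error row on any failure. `collect_row` and `error_row` are hypothetical names; the error cell fields follow the rule above.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime so the marker is not written literally
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def error_row(file: str, column_keys: list[str]) -> dict:
    """Keep the document in the grid with every cell marked as an error."""
    cells = {key: {"value": "Error", "confidence": "low"} for key in column_keys}
    return {"file": file, "cells": cells}


def collect_row(file: str, reply: str, column_keys: list[str]) -> dict:
    """Parse one extractor reply into a row whose cells are keyed by column key."""
    blocks = JSON_BLOCK.findall(reply)
    if not blocks:
        return error_row(file, column_keys)
    try:
        result = json.loads(blocks[-1])  # the last json block wins
        cells = {
            cell["key"]: {k: v for k, v in cell.items() if k != "key"}
            for cell in result["cells"]
        }
    except (json.JSONDecodeError, KeyError, TypeError):
        return error_row(file, column_keys)
    result["cells"] = cells
    result.setdefault("file", file)
    return result
```

Because every failure path returns `error_row`, the number of rows always equals the number of input files.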
Write a data file <matter-slug>-review.data.json in the workspace with this shape.
Give each row BOTH paths so the viewer works in the app and when opened from disk:
- file: the workspace-relative path (e.g. ndas/acme.pdf). The in-app viewer asks the app for this file over the bridge, so it must be workspace-relative.
- fileAbs: the absolute path on disk (e.g. /Users/.../ndas/acme.pdf). Used for the "open the PDF at this page" link when the saved .html is opened standalone.
{
"matter": "<short matter/review name>",
"generatedAt": "<current ISO 8601 timestamp>",
"columns": [ { "key": "...", "label": "...", "question": "..." } ],
"rows": [
{
"file": "ndas/acme.pdf", "fileAbs": "/abs/path/to/ndas/acme.pdf",
"title": "...", "docType": "...", "summary": "...",
"cells": {
"<key>": { "value": "<short>", "reason": "<longer>", "quote": "<one sentence>", "page": 3, "location": "§7.2", "confidence": "high" }
}
}
]
}
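Assembling that shape might look like the following sketch. `add_paths` and `build_data` are hypothetical helpers, and the sketch assumes each collected row already carries a workspace-relative `file`.

```python
from datetime import datetime, timezone
from pathlib import Path


def add_paths(row: dict, workspace: Path) -> dict:
    """Give a row both the workspace-relative path and the absolute path."""
    relative = Path(row["file"])
    return {
        **row,
        "file": relative.as_posix(),
        "fileAbs": str((workspace / relative).resolve()),
    }


def build_data(matter: str, columns: list[dict], rows: list[dict], workspace: Path) -> dict:
    """Build the top-level object for <matter-slug>-review.data.json."""
    return {
        "matter": matter,
        "generatedAt": datetime.now(timezone.utc).isoformat(),  # ISO 8601 timestamp
        "columns": columns,
        "rows": [add_paths(row, workspace) for row in rows],
    }
```

The result can be written with `json.dumps(data, indent=2)` to the data file name given above.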
5. Build the artifact (run the builder — do NOT hand-write the HTML)
Run the bundled builder. It injects a pdf.js viewer + the Eigenwelt theme + your JSON and
writes a .html artifact. No PDF is embedded — the artifact loads each source PDF
dynamically from its local file at view time and highlights the quote string on the
cited page. So the artifact is a small, fixed size no matter how many or how large the
source documents are.
node .opencode/skills/tabular-review/assets/build-review.mjs "<matter-slug>-review.data.json" --out "<matter-slug>-review.html"
(If your CWD is the skill dir, adjust the path; the builder also accepts --template and
--vendor overrides.) The builder is deterministic — you just produce good JSON.
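If the builder is invoked from a script instead of a shell, the same command can be expressed as an argument list. This sketch only wraps the command shown above; `builder_command` and `run_builder` are hypothetical names, and it assumes `node` is on the PATH and the CWD is the workspace root.

```python
import subprocess

BUILDER = ".opencode/skills/tabular-review/assets/build-review.mjs"


def builder_command(matter_slug: str) -> list[str]:
    """Argument list equivalent to the builder command line for one matter."""
    return [
        "node", BUILDER,
        f"{matter_slug}-review.data.json",
        "--out", f"{matter_slug}-review.html",
    ]


def run_builder(matter_slug: str) -> None:
    """Run the builder and raise if it exits with a non-zero status."""
    subprocess.run(builder_command(matter_slug), check=True)
```

Passing a list instead of a single shell string avoids quoting problems when a matter slug contains spaces.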
Writing the .html makes the app surface it automatically as a previewable HTML artifact
(sandboxed iframe, scripts enabled): the grid shows the short value per cell; clicking a
cell opens a liquid-glass sidebar with the reason, the verbatim quote, and a pdf.js
viewer that opens the source PDF to the cited page with the sentence highlighted (with
page navigation). Filter and CSV export are built in.
How the viewer reads the local PDF (no embedding):
- In the app: the viewer asks the app for the file bytes over a postMessage bridge using row.file (workspace-relative), and renders + highlights inline.
- Opened standalone from disk (file://): browsers block pages from auto-loading local files, so the viewer shows an "Open page N in the PDF" link (built from row.fileAbs) that opens it in the native viewer, plus a file picker to load it into the highlighting viewer. This is why fileAbs matters: without it that link is dead.
- No poppler or other system tools are required: pdf.js renders in the browser.
6. Summarize in chat
After building the artifact, give a short readout: how many documents × columns, the name
of the artifact file, and — most useful to a lawyer — the exceptions: cells that came
back low confidence, conflicts, "Not found" where you'd expect a value, and any
reason worth surfacing (auto-renewals, unusual carve-outs, missing signatures). The
table is for scanning; your summary is for triage.
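The mechanical part of that readout can be pulled straight from the data file. In this sketch `find_exceptions` and `readout` are hypothetical helpers; it flags every "Not found" and "Error" cell plus any low-confidence cell, leaving the judgment about which gaps are surprising to the orchestrator.

```python
def find_exceptions(data: dict) -> list[tuple[str, str, str]]:
    """List (file, column key, issue) for cells a reviewer should look at first."""
    exceptions = []
    for row in data["rows"]:
        for key, cell in row["cells"].items():
            if cell.get("value") in ("Not found", "Error"):
                exceptions.append((row["file"], key, cell["value"]))
            elif cell.get("confidence") == "low":
                exceptions.append((row["file"], key, "low confidence"))
    return exceptions


def readout(data: dict, artifact: str) -> str:
    """Short chat summary: grid size, artifact name, then one line per exception."""
    header = f"{len(data['rows'])} documents x {len(data['columns'])} columns -> {artifact}"
    lines = [header]
    lines += [f"- {file} / {key}: {issue}" for file, key, issue in find_exceptions(data)]
    return "\n".join(lines)
```

Substantive observations such as auto-renewals or unusual carve-outs still have to come from reading the reason fields; this only covers the countable exceptions.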
Column suggestion library (used in step 2.3 when no doctype skill is loaded)
Starter columns by document type. Offer these as the proposed set, then let the user
edit. Prefer a loaded doctype-* skill over this list when one exists.
- NDA / confidentiality agreement — Parties · Mutual or one-way · Effective date · Term · Purpose · Definition of Confidential Information · Exclusions · Permitted disclosures · Return/destruction · Term of confidentiality · Governing law · Injunctive relief.
- Services / MSA / SOW — Parties · Effective date · Term & renewal · Services/scope · Fees & payment terms · Termination rights · Liability cap · Indemnification · IP ownership · Warranties · Governing law.
- Employment agreement — Employee · Employer · Start date · Title/role · Compensation · At-will vs term · Non-compete · Non-solicit · Confidentiality · Severance · Governing law.
- Commercial lease — Landlord · Tenant · Premises · Commencement date · Term · Base rent · Escalations · Renewal options · Security deposit · Permitted use · Assignment/sublease · Maintenance (CAM) · Governing law.
- Purchase / M&A agreement (SPA/APA) — Buyer · Seller · Target/assets · Purchase price · Closing date · Conditions to closing · Reps & warranties survival · Indemnification cap/basket · Non-compete · Governing law.
- Loan / credit agreement — Borrower · Lender · Principal · Interest rate · Maturity · Repayment schedule · Collateral/security · Financial covenants · Events of default · Governing law.
- Unknown / mixed — Document type · Parties · Effective date · Term · Key obligations · Termination · Governing law · Notable risks. (Then refine with the user.)
The doctype-skill convention
A doctype skill is a normal skill named doctype-<type> whose job is to define the
review columns (and where to look) for one kind of document. When the firm reviews a new
document type often, capture its column logic as a doctype-* skill so this orchestrator
can load it automatically instead of asking every time. Each one should provide a
## Columns section: a list of key, label, question, and a where to look hint.
Example packs (doctype-nda, doctype-commercial-lease, …) live in the LegalWork
Hub — a firm installs the ones it needs from Settings → Extensions → Skills. If none
is installed, the suggestion library above is the fallback.
Notes & guardrails
- Open models, firm-owned. Don't hardcode a model — extractors inherit the firm's configured model. The value here is that the column logic and corrections stay in the firm's skills and artifacts.
- Never fabricate a cell. A blank, source-cited grid beats a confident wrong one. This is the one bar that matters; everything else is convenience.
- Account for every file. Each input document is exactly one row, even on error.
- Scale check. Many files × parallel subagents is fine, but if the set is very large (say >30), confirm scope with the user and consider batching.
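For the scale check, batching can be as simple as slicing the file list. A minimal sketch, where `batches` is a hypothetical helper and the default size of 30 is only an assumption borrowed from the threshold mentioned above:

```python
def batches(files: list, size: int = 30) -> list[list]:
    """Split the file list into consecutive batches of at most `size` files."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [files[start:start + size] for start in range(0, len(files), size)]
```

Each batch would be fanned out in parallel and its rows appended to the same data file, so the grid still accounts for every input document.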