apps/server/resources/core-opencode/skills/tabular-review/SKILL.md

The `tabular-review` skill is the firm-owned, open-model replacement for the Harvey/Legora "review grid". Documents are rows, fields are columns, and every cell is an independent, source-cited extraction (a short `value`, a click-through `reason`, one verbatim `quote`, and the cited PDF `page`). The orchestrating agent never reads the documents itself. It resolves the row set, resolves the columns at the branch point (user-specified, a loaded `doctype-<type>` skill, or ask-the-user with a suggested starter set), fans out one `document-extractor` subagent per file in parallel, collects each row's JSON (never dropping a file silently), and runs the builder to emit a single self-contained HTML artifact the firm keeps. Use it when building or running multi-document review/diligence grids, or when extending the doctype-skill column conventions.



name: tabular-review
description: >-
  Firm-owned tabular document review, the open-model replacement for Harvey/Legora "review grids". Extract a defined set of fields (columns) across a set of documents (rows) and produce an interactive, source-cited review table. Use whenever the user wants to review/compare/extract across MANY documents at once: due-diligence pulls, NDA/contract abstraction, lease abstraction, "make a table of X across these files", "review these contracts for Y", "build a review grid", or "extract these terms from every document".

Tabular Review

This skill is how this firm runs tabular document review — the workflow Harvey and Legora call a "review grid": documents are rows, the fields you care about are columns, and every cell is an independent, source-cited extraction. Unlike the SaaS versions, this runs on the firm's own models and infrastructure, the column logic lives in firm-owned doctype skills, and the output is a self-contained artifact the firm keeps.

You are the orchestrator. You do not read the documents yourself. You define the grid, fan out one document-extractor subagent per document, then assemble the results into one HTML artifact.

The shape of the job

                 Parties     Term      Governing law   Assignment ...   ← columns (fields)
  acme_nda.pdf   [cell]      [cell]    [cell]          [cell]
  beta_msa.pdf   [cell]      [cell]    [cell]          [cell]      ← rows (documents)
  ...

Each cell = a short value (what shows in the grid) plus, behind a click, a longer reason, one verbatim quote sentence, and the cited PDF page rendered with that sentence highlighted. Every value is grounded in a quote from that document or it is "Not found". No hallucinated cells.

Workflow

1. Resolve the document set (rows)

Find the files to review. They may be attached, referenced by @path, named in the prompt, or sitting in a folder ("review the NDAs in ./ndas"). Use glob/list to expand folders. Confirm the list with the user if it's ambiguous or large (>~20).

Sniff the document type of each file (from filename and, if cheap, a first-page peek). You'll use this both to pick columns and to tell each extractor what it's looking at. A set can be mixed (some NDAs, some leases) — that's fine; group by type.
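A minimal sketch of the filename-only part of this step, in Python. The keyword table, the function names, and the `*.pdf` pattern are assumptions made for illustration; the skill itself does not prescribe them, and a first-page peek would still be needed for files whose names say nothing.

```python
import re
from pathlib import Path

# Hypothetical keyword table (an assumption, not part of the skill):
# whole filename token -> document type.
TYPE_KEYWORDS = {
    "nda": "NDA",
    "confidentiality": "NDA",
    "lease": "Commercial Lease",
    "msa": "Services / MSA / SOW",
    "sow": "Services / MSA / SOW",
}

def expand_folder(folder, pattern="*.pdf"):
    """List matching files under a folder, sorted so the row order is stable."""
    return sorted(p.as_posix() for p in Path(folder).rglob(pattern))

def sniff_doc_type(path):
    """Guess the type from whole filename tokens; 'unknown' when nothing matches."""
    tokens = re.split(r"[^a-z0-9]+", Path(path).stem.lower())
    for token in tokens:
        if token in TYPE_KEYWORDS:
            return TYPE_KEYWORDS[token]
    return "unknown"

def group_by_type(paths):
    """Group files by sniffed type so a mixed set can get per-type columns."""
    groups = {}
    for path in paths:
        groups.setdefault(sniff_doc_type(path), []).append(path)
    return groups

groups = group_by_type(["ndas/acme_nda.pdf", "ndas/beta_msa.pdf", "misc/calendar.pdf"])
```

Matching on whole tokens rather than substrings is deliberate: a substring test would classify `calendar.pdf` as an NDA because the name happens to contain "nda".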

2. Resolve the columns (fields) — THIS IS THE BRANCH POINT

Columns can come from three places, in priority order:

1. The user already specified them. ("Extract party names, term, and governing law.") Use those verbatim; only add a column if you ask first.
2. A loaded doctype skill. Look for a skill named doctype-<type> (e.g. doctype-nda, doctype-commercial-lease). If one matches the documents, load it with the skill tool and use its recommended columns as the default set. List available skills first if unsure what exists.
3. Neither → ASK THE USER. Do not invent a column set silently. Detect the doc types, then ask what to extract and propose a starter set based on those types, using the suggestion library below. Make it a one-tap decision: offer the suggested columns and let them add/remove. Example:

These look like mutual NDAs. What should I pull into the review table? A common starting set for NDAs: Parties · Effective date · Term · Purpose · Definition of Confidential Information · Exclusions · Permitted disclosures · Return/destruction · Governing law · Term of confidentiality. Want this set, a subset, or your own columns?

If the documents are mixed types, suggest the union and note which columns apply to which type.
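One way to compute that union while remembering which type each column applies to is sketched below in Python. The function name and the `appliesTo` field are hypothetical; the skill only asks that the applicability be noted when the set is proposed.

```python
def union_columns(starter_sets):
    """Merge per-type column labels in first-seen order, recording which types use each."""
    applies_to = {}
    for doc_type, labels in starter_sets.items():
        for label in labels:
            types = applies_to.setdefault(label, [])
            if doc_type not in types:
                types.append(doc_type)
    return [{"label": label, "appliesTo": types} for label, types in applies_to.items()]

# Abbreviated starter sets, for illustration only.
merged = union_columns({
    "NDA": ["Parties", "Term", "Governing law"],
    "Commercial lease": ["Landlord", "Tenant", "Term", "Governing law"],
})
```

Shared columns such as Term and Governing law appear once and list both types; Landlord and Tenant list only the lease type.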

Normalize the final columns into objects you'll pass down and render: { key, label, question, hint? } — key is a short slug (governing_law), label is the header (Governing law), question is the precise instruction the extractor answers, hint is optional (format/where to look).
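As an illustration of that normalization, here is a small Python sketch. The `slugify` rule (lowercase, non-alphanumerics collapsed to underscores) and the duplicate-key check are assumptions; only the `{ key, label, question, hint? }` shape comes from the text above.

```python
import re

def slugify(label):
    """Turn a header such as 'Governing law' into a key such as 'governing_law'."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")

def make_column(label, question, hint=None):
    """Build one column object; 'hint' is included only when provided."""
    column = {"key": slugify(label), "label": label, "question": question}
    if hint:
        column["hint"] = hint
    return column

def normalize_columns(specs):
    """Build all columns and reject duplicate keys, which would collide in the cell map."""
    columns = [make_column(*spec) for spec in specs]
    keys = [c["key"] for c in columns]
    duplicates = sorted({k for k in keys if keys.count(k) > 1})
    if duplicates:
        raise ValueError(f"duplicate column keys: {duplicates}")
    return columns

columns = normalize_columns([
    ("Parties", "Who are the parties to this agreement (full legal names)?",
     "Check the preamble and signature block."),
    ("Governing law", "Which jurisdiction's law governs?"),
])
```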

3. Fan out — one subagent per document, in parallel

For each document, spawn the document-extractor subagent with the Task tool. Issue the calls in parallel (multiple Task calls in a single turn) so the grid fills concurrently — this is the whole point of the fan-out.

Default granularity is one extractor per file: it reads the document once and fills that file's entire row across all columns. (This is cheaper and more consistent than one subagent per cell, because the document is read a single time.) Escalate to a per-cell extractor only for a column that is high-stakes and came back low confidence or conflicted — re-run just that (file, column) pair with a focused prompt.
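The escalation rule can be sketched as a filter over collected rows. This Python example assumes rows already have their cells as a map keyed by column key (the shape used in the data file in step 4), and that the caller supplies the set of high-stakes keys; which columns count as high-stakes is a judgment the skill leaves to the orchestrator. Detecting "conflicted" cells is not covered, since the extractor schema shown here has no explicit conflict flag.

```python
def cells_to_escalate(rows, high_stakes_keys):
    """Return (file, key) pairs that merit a focused per-cell re-run."""
    pairs = []
    for row in rows:
        for key, cell in row["cells"].items():
            if key in high_stakes_keys and cell.get("confidence") == "low":
                pairs.append((row["file"], key))
    return pairs

# Illustrative rows; values are made up for the example.
sample_rows = [
    {"file": "ndas/acme.pdf", "cells": {
        "term": {"value": "2 years", "confidence": "high"},
        "governing_law": {"value": "Delaware", "confidence": "low"},
    }},
    {"file": "ndas/beta.pdf", "cells": {
        "term": {"value": "Not found", "confidence": "low"},
        "governing_law": {"value": "New York", "confidence": "high"},
    }},
]
rerun = cells_to_escalate(sample_rows, {"governing_law"})
```

Only `("ndas/acme.pdf", "governing_law")` is selected: the low-confidence `term` cell in the second file is skipped because `term` was not marked high-stakes.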

Give each extractor exactly this:

FILE: <path to the one document>
DOC_TYPE: <NDA | Commercial Lease | ... | unknown>
COLUMNS:
  1. key: parties
     question: Who are the parties to this agreement (full legal names)?
     hint: Check the preamble and signature block.
  2. key: governing_law
     question: Which jurisdiction's law governs?
  ... (every column, with key + question, hints where useful)

Return ONLY the strict JSON object defined in your instructions.

Each extractor returns a JSON object: { file, title, docType, summary, cells: [ {key, value, reason, quote, page, location, confidence} ] }. Remember: value is short (it's a table cell), reason is the longer sidebar explanation, quote is one verbatim sentence, and page is the 1-based page that sentence is on.
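The extractor prompt above is regular enough to render from the normalized columns. A minimal Python sketch, assuming columns are dicts with `key`, `question`, and an optional `hint`; the function name is hypothetical.

```python
def render_extractor_prompt(file, doc_type, columns):
    """Render the per-document prompt: header lines, numbered columns, closing instruction."""
    lines = [f"FILE: {file}", f"DOC_TYPE: {doc_type}", "COLUMNS:"]
    for number, column in enumerate(columns, start=1):
        lines.append(f"  {number}. key: {column['key']}")
        lines.append(f"     question: {column['question']}")
        if column.get("hint"):  # hints are optional, so emit the line only when present
            lines.append(f"     hint: {column['hint']}")
    lines.append("")
    lines.append("Return ONLY the strict JSON object defined in your instructions.")
    return "\n".join(lines)

prompt = render_extractor_prompt(
    "ndas/acme.pdf",
    "NDA",
    [
        {"key": "parties",
         "question": "Who are the parties to this agreement (full legal names)?",
         "hint": "Check the preamble and signature block."},
        {"key": "governing_law", "question": "Which jurisdiction's law governs?"},
    ],
)
```

Rendering the prompt from one function keeps every extractor's instructions identical apart from the file and type, which helps rows stay comparable.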

4. Collect and assemble the data file

Parse each extractor's JSON (read the last ```json block). If one fails to parse or the subagent errored, keep the row with that file's cells set to value: "Error", confidence: "low" — never drop a document silently; the grid must account for every file. Convert each extractor's cells array into a map keyed by key.
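The parse-or-fallback rule might look like the following Python sketch. The regular expression, the function name, and the placeholder `title`/`docType`/`summary` values on the error row are assumptions; the text above only fixes the error cells (`value: "Error"`, `confidence: "low"`) and the array-to-map conversion.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example can itself sit inside a fenced block
JSON_BLOCK = re.compile(FENCE + r"json\s*\n(.*?)" + FENCE, re.DOTALL)

def parse_extractor_output(file, text, column_keys):
    """Return exactly one row for `file`, using an Error row if the output is unusable."""
    try:
        row = json.loads(JSON_BLOCK.findall(text)[-1])  # last json block wins
        # Convert the cells array into a map keyed by column key.
        cells = {
            cell["key"]: {k: v for k, v in cell.items() if k != "key"}
            for cell in row["cells"]
        }
        return {**row, "file": file, "cells": cells}
    except (IndexError, KeyError, TypeError, ValueError):
        # No block, malformed JSON, or wrong shape: keep the row, mark every cell.
        return {
            "file": file,
            "title": file,                    # placeholder, an assumption
            "docType": "unknown",             # placeholder, an assumption
            "summary": "Extraction failed.",  # placeholder, an assumption
            "cells": {key: {"value": "Error", "confidence": "low"} for key in column_keys},
        }
```

Because the fallback is built from the requested column keys rather than from the failed output, an errored document still fills a complete row in the grid.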

Write a data file <matter-slug>-review.data.json in the workspace with this shape. Give each row BOTH paths so the viewer works in the app and when opened from disk:

file — the workspace-relative path (e.g. ndas/acme.pdf). The in-app viewer asks the app for this file over the bridge, so it must be workspace-relative.
fileAbs — the absolute path on disk (e.g. /Users/.../ndas/acme.pdf). Used for the "open the PDF at this page" link when the saved .html is opened standalone.

{
  "matter": "<short matter/review name>",
  "generatedAt": "<current ISO 8601 timestamp>",
  "columns": [ { "key": "...", "label": "...", "question": "..." } ],
  "rows": [
    {
      "file": "ndas/acme.pdf", "fileAbs": "/abs/path/to/ndas/acme.pdf",
      "title": "...", "docType": "...", "summary": "...",
      "cells": {
        "<key>": { "value": "<short>", "reason": "<longer>", "quote": "<one sentence>", "page": 3, "location": "§7.2", "confidence": "high" }
      }
    }
  ]
}
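A sketch of assembling this file in Python, assuming each collected row carries a workspace-relative `file` and the caller knows the workspace root. The function names are hypothetical; the field names follow the shape above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def build_review_data(matter, columns, rows, workspace):
    """Assemble the data-file dict, giving every row both `file` and `fileAbs`."""
    root = Path(workspace).resolve()
    out_rows = []
    for row in rows:
        abs_path = (root / row["file"]).resolve()
        # relative_to raises ValueError if the file lies outside the workspace,
        # which would break the in-app viewer's workspace-relative lookup.
        rel_path = abs_path.relative_to(root).as_posix()
        out_rows.append({**row, "file": rel_path, "fileAbs": str(abs_path)})
    return {
        "matter": matter,
        "generatedAt": datetime.now(timezone.utc).isoformat(),
        "columns": [{k: c[k] for k in ("key", "label", "question")} for c in columns],
        "rows": out_rows,
    }

def write_review_data(data, matter_slug, workspace):
    """Write <matter-slug>-review.data.json into the workspace and return its path."""
    path = Path(workspace) / f"{matter_slug}-review.data.json"
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Deriving `fileAbs` from the same resolved path as `file` keeps the two from drifting apart, so the standalone "open the PDF" link and the in-app bridge always point at the same document.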

5. Build the artifact (run the builder — do NOT hand-write the HTML)

Run the bundled builder. It injects a pdf.js viewer + the Eigenwelt theme + your JSON and writes a .html artifact. No PDF is embedded: the artifact loads each source PDF dynamically from its local file at view time and highlights the quote string on the cited page. The artifact therefore stays small no matter how many or how large the source documents are; only the injected JSON grows with the grid.

node .opencode/skills/tabular-review/assets/build-review.mjs "<matter-slug>-review.data.json" --out "<matter-slug>-review.html"

(If your CWD is the skill dir, adjust the path; the builder also accepts --template and --vendor overrides.) The builder is deterministic — you just produce good JSON.
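If the build is driven from a script rather than typed by hand, the invocation can be wrapped as in this Python sketch. It uses only the path and the `--out` flag shown above; the wrapper names are hypothetical, and it assumes `node` is on the PATH and the CWD is the workspace root.

```python
import subprocess

BUILDER = ".opencode/skills/tabular-review/assets/build-review.mjs"

def builder_command(matter_slug, builder=BUILDER):
    """Return the argv list for the builder; list form avoids shell-quoting issues."""
    return [
        "node",
        builder,
        f"{matter_slug}-review.data.json",
        "--out",
        f"{matter_slug}-review.html",
    ]

def run_builder(matter_slug):
    """Run the builder and raise CalledProcessError if it exits non-zero."""
    subprocess.run(builder_command(matter_slug), check=True)
```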

Writing the .html makes the app surface it automatically as a previewable HTML artifact (sandboxed iframe, scripts enabled): the grid shows the short value per cell; clicking a cell opens a liquid-glass sidebar with the reason, the verbatim quote, and a pdf.js viewer that opens the source PDF to the cited page with the sentence highlighted (with page navigation). Filter and CSV export are built in.

How the viewer reads the local PDF (no embedding):

In the app: the viewer asks the app for the file bytes over a postMessage bridge using row.file (workspace-relative), and renders + highlights inline.
Opened standalone from disk (file://): browsers block pages from auto-loading local files, so the viewer shows an Open page N in the PDF link (built from row.fileAbs) that opens it in the native viewer, plus a file picker to load it into the highlighting viewer. This is why fileAbs matters — without it that link is dead.
No poppler or other system tools are required — pdf.js renders in the browser.

6. Summarize in chat

After building the artifact, give a short readout: how many documents × columns, the name of the artifact file, and — most useful to a lawyer — the exceptions: cells that came back low confidence, conflicts, "Not found" where you'd expect a value, and any reason worth surfacing (auto-renewals, unusual carve-outs, missing signatures). The table is for scanning; your summary is for triage.
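The mechanical part of that readout (counts plus flagged cells) can be pulled from the data file, as in this Python sketch. The flag labels and function names are assumptions; judging which reasons are worth surfacing still needs a read of the cells themselves.

```python
def collect_exceptions(data):
    """List (file, key, flags) for every cell that is an error, not found, or low confidence."""
    exceptions = []
    for row in data["rows"]:
        for key, cell in row["cells"].items():
            flags = []
            if cell.get("value") == "Error":
                flags.append("error")
            elif cell.get("value") == "Not found":
                flags.append("not found")
            if cell.get("confidence") == "low":
                flags.append("low confidence")
            if flags:
                exceptions.append((row["file"], key, flags))
    return exceptions

def readout(data, artifact):
    """Format a short chat summary: grid size, artifact name, then one line per exception."""
    lines = [f"{len(data['rows'])} documents x {len(data['columns'])} columns in {artifact}"]
    for file, key, flags in collect_exceptions(data):
        lines.append(f"- {file} / {key}: {', '.join(flags)}")
    return "\n".join(lines)
```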

Column suggestion library (used in step 2.3 when no doctype skill is loaded)

Starter columns by document type. Offer these as the proposed set, then let the user edit. Prefer a loaded doctype-* skill over this list when one exists.

NDA / confidentiality agreement — Parties · Mutual or one-way · Effective date · Term · Purpose · Definition of Confidential Information · Exclusions · Permitted disclosures · Return/destruction · Term of confidentiality · Governing law · Injunctive relief.
Services / MSA / SOW — Parties · Effective date · Term & renewal · Services/scope · Fees & payment terms · Termination rights · Liability cap · Indemnification · IP ownership · Warranties · Governing law.
Employment agreement — Employee · Employer · Start date · Title/role · Compensation · At-will vs term · Non-compete · Non-solicit · Confidentiality · Severance · Governing law.
Commercial lease — Landlord · Tenant · Premises · Commencement date · Term · Base rent · Escalations · Renewal options · Security deposit · Permitted use · Assignment/sublease · Maintenance (CAM) · Governing law.
Purchase / M&A agreement (SPA/APA) — Buyer · Seller · Target/assets · Purchase price · Closing date · Conditions to closing · Reps & warranties survival · Indemnification cap/basket · Non-compete · Governing law.
Loan / credit agreement — Borrower · Lender · Principal · Interest rate · Maturity · Repayment schedule · Collateral/security · Financial covenants · Events of default · Governing law.
Unknown / mixed — Document type · Parties · Effective date · Term · Key obligations · Termination · Governing law · Notable risks. (Then refine with the user.)

The doctype-skill convention

A doctype skill is a normal skill named doctype-<type> whose job is to define the review columns (and where to look) for one kind of document. When the firm reviews a new document type often, capture its column logic as a doctype-* skill so this orchestrator can load it automatically instead of asking every time. Each one should provide a ## Columns section: a list of key, label, question, and a where to look hint. Example packs (doctype-nda, doctype-commercial-lease, …) live in the LegalWork Hub — a firm installs the ones it needs from Settings → Extensions → Skills. If none is installed, the suggestion library above is the fallback.

Notes & guardrails

Open models, firm-owned. Don't hardcode a model — extractors inherit the firm's configured model. The value here is that the column logic and corrections stay in the firm's skills and artifacts.
Never fabricate a cell. A blank, source-cited grid beats a confident wrong one. This is the one bar that matters; everything else is convenience.
Account for every file. Each input document is exactly one row, even on error.
Scale check. Many files × parallel subagents is fine, but if the set is very large (say >30), confirm scope with the user and consider batching.
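For the batching mentioned in the scale check, a minimal Python sketch follows. The batch size of 10 is an arbitrary assumption; the skill does not specify one.

```python
def batches(files, size=10):
    """Split the file list into consecutive batches of at most `size`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [files[i:i + size] for i in range(0, len(files), size)]

# 35 hypothetical files become batches of 10, 10, 10, and 5.
plan = batches([f"contracts/doc_{n:02d}.pdf" for n in range(35)])
```

Each batch is fanned out in parallel and collected before the next starts, so every file still ends up as exactly one row.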