SEC Insider Trades — EDGAR Form 4 Buys & Sells as JSON
Design notes for a deterministic Actor that turns SEC EDGAR Form 3/4/5 ownership filings into clean, decoded insider-transaction JSON for AI agents.
Design notes for sec-insider-trades, a deterministic insider-trading Actor on the Apify Store. Give it a ticker or a CIK; it reads the company’s recent Form 3/4/5 ownership filings from SEC EDGAR, parses each one’s structured XML, decodes the cryptic transaction codes, and returns one clean JSON record per filing — who traded, whether they bought or sold, how many shares, at what price, what they hold now, and when.
The Apify Store page covers the input schema, current pricing, and how to try it. This page is for the design questions — why the Actor is shaped this way, what I deliberately left out, and where the genuinely fiddly parts are.
What this is
Corporate insiders (officers, directors, and beneficial owners of more than 10% of a class of stock) are required to report their transactions in their own company’s securities to the SEC. A Form 4 is due by the end of the second business day after a trade; a Form 3 is the initial statement when someone becomes an insider; a Form 5 is an annual catch-up for certain exempt transactions. Collectively these are the canonical record of insider buying and selling, and they’re public the moment they’re filed.
The data is free, and it’s already machine-readable — SEC stores each ownership form as XML. So, as with most of my SEC tools, the problem this Actor solves isn’t access. It’s that the raw document is genuinely awkward to consume from agent-callable code. A single Form 4 nests transaction amounts several elements deep, splits a trade’s direction across a one-letter transaction code and a separate acquired-or-disposed flag, and puts options and RSUs in an entirely separate “derivative” table with its own schema. Multiply that by the dozens of filings a large company generates and you’re writing and maintaining an EDGAR parser before you can ask a single useful question.
The decision the Actor makes on each filing is narrow and mechanical: locate the issuer and the reporting owner, walk both the non-derivative and derivative tables, and for every transaction pull the security title, date, shares, price, acquired/disposed direction, and post-transaction holdings — verbatim — then translate the SEC transaction code into a readable label. The output is one record per filing: the reporting owner and their relationship to the company, the list of transactions, and a small per-filing summary (transaction count, total acquired value, total disposed value, net share change) so a caller can scan a list of filings without doing arithmetic.
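The walk described above can be sketched with the standard library. This is an illustration under stated assumptions, not the Actor's own code: the element paths follow the SEC ownership XML layout as I understand it, the `SAMPLE` document is a trimmed, hand-written stand-in whose values mirror the example record under "How to use it", and the helper names (`_text`, `_number`, `parse_ownership`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for a Form 4 body (illustrative only).
SAMPLE = """<ownershipDocument>
  <issuer>
    <issuerName>Apple Inc.</issuerName>
    <issuerTradingSymbol>AAPL</issuerTradingSymbol>
  </issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>LEVINSON ARTHUR D</rptOwnerName></reportingOwnerId>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <securityTitle><value>Common Stock</value></securityTitle>
      <transactionDate><value>2026-05-27</value></transactionDate>
      <transactionCoding><transactionCode>S</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>50000</value></transactionShares>
        <transactionPricePerShare><value>311.02</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
      <postTransactionAmounts>
        <sharesOwnedFollowingTransaction><value>3764576</value></sharesOwnedFollowingTransaction>
      </postTransactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""


def _text(node, path):
    """Return the stripped text at `path` under `node`, or None if absent or empty."""
    found = node.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip() or None


def _number(node, path):
    """Return the value at `path` as a float, or None if it is missing."""
    raw = _text(node, path)
    return float(raw) if raw is not None else None


def parse_ownership(xml_text):
    """Walk both transaction tables and copy each row's fields verbatim."""
    root = ET.fromstring(xml_text)
    transactions = []
    for table, row, is_derivative in (
        ("nonDerivativeTable", "nonDerivativeTransaction", False),
        ("derivativeTable", "derivativeTransaction", True),
    ):
        # A missing table simply yields no rows.
        for tx in root.findall(f"{table}/{row}"):
            amounts = "transactionAmounts"
            transactions.append({
                "securityTitle": _text(tx, "securityTitle/value"),
                "transactionDate": _text(tx, "transactionDate/value"),
                "transactionCode": _text(tx, "transactionCoding/transactionCode"),
                "acquiredDisposed": _text(tx, f"{amounts}/transactionAcquiredDisposedCode/value"),
                "shares": _number(tx, f"{amounts}/transactionShares/value"),
                "pricePerShare": _number(tx, f"{amounts}/transactionPricePerShare/value"),
                "sharesOwnedAfter": _number(
                    tx, "postTransactionAmounts/sharesOwnedFollowingTransaction/value"),
                "isDerivative": is_derivative,
            })
    return {
        "issuerName": _text(root, "issuer/issuerName"),
        "issuerTicker": _text(root, "issuer/issuerTradingSymbol"),
        "ownerName": _text(root, "reportingOwner/reportingOwnerId/rptOwnerName"),
        "transactions": transactions,
    }


record = parse_ownership(SAMPLE)
```

Real filings add footnotes, multiple reporting owners, and holdings-only rows, which this sketch ignores.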
It is not a scraper — the data comes from SEC’s official submissions API and the EDGAR document archive, not a rendered page. And it is not an analyst: it returns the transactions as filed, with no scoring, no sentiment, and no recommendation. What it owns is the parse and the decode.
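For a sense of what the filing-list side looks like: the submissions API serves one JSON document per company at `https://data.sec.gov/submissions/CIK##########.json` (the CIK zero-padded to ten digits), and SEC asks automated clients to send a descriptive `User-Agent`. The sketch below skips the network call and works on an already-loaded dictionary. It assumes the `filings.recent` block is a set of parallel arrays, which is how I understand the response; the function name, its parameters, and the sample data are hypothetical.

```python
OWNERSHIP_FORMS = frozenset({"3", "4", "5"})


def list_ownership_filings(submissions, forms=OWNERSHIP_FORMS,
                           since_date=None, max_filings=None):
    """Pick ownership filings out of a submissions-API payload.

    `submissions` is the parsed JSON for one company. The `recent` block is
    assumed to hold parallel arrays, one entry per filing.
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["accessionNumber"], recent["form"],
               recent["filingDate"], recent["primaryDocument"])
    selected = []
    for accession, form, filing_date, primary_document in rows:
        if form not in forms:
            continue  # exact match only, so amendments such as "4/A" are skipped here
        if since_date is not None and filing_date < since_date:
            continue  # ISO dates compare correctly as plain strings
        selected.append({
            "accessionNumber": accession,
            "formType": form,
            "filingDate": filing_date,
            "primaryDocument": primary_document,
        })
        if max_filings is not None and len(selected) >= max_filings:
            break
    return selected


# Made-up payload with placeholder accession numbers, for illustration only.
SAMPLE_SUBMISSIONS = {
    "filings": {
        "recent": {
            "accessionNumber": ["0000000000-26-000003", "0000000000-26-000002",
                                "0000000000-25-000001"],
            "form": ["4", "8-K", "4"],
            "filingDate": ["2026-05-29", "2026-02-10", "2025-11-03"],
            "primaryDocument": ["xslF345X05/form4.xml", "report.htm",
                                "xslF345X05/form4.xml"],
        }
    }
}

filings = list_ownership_filings(SAMPLE_SUBMISSIONS, since_date="2026-01-01")
```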
Why I built it this way
Deterministic, with no model in the path
This is the same call I made on the financials side, and for the same reason: these are numbers people act on. Share counts, prices, and resulting holdings are copied straight out of the XML; the only interpretation is mapping documented SEC transaction codes (P, S, A, M, F, G, and the rest of the Form 4/5 table) to human-readable types. There is no language model anywhere in the extraction, so there’s nothing that can emit a plausible-but-wrong number into a field someone might trade on. A pleasant side effect is that the cost floor is almost nothing — SEC’s APIs are free and keyless, there’s no proxy and no inference bill — so the price reflects the normalization, not a per-call model cost.
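The decode step is small enough to show as a lookup table. The sketch below covers only the six codes named above; the label wording is my own paraphrase of the SEC code descriptions and may not match the strings the Actor emits, and `decode` is a hypothetical helper.

```python
# Partial table: only the codes named in the text, with paraphrased labels.
TRANSACTION_LABELS = {
    "P": "Open-market or private purchase",
    "S": "Open-market or private sale",
    "A": "Grant, award, or other acquisition",
    "M": "Exercise or conversion of a derivative security",
    "F": "Shares withheld or delivered to pay exercise price or tax",
    "G": "Bona fide gift",
}

DIRECTIONS = {"A": "acquired", "D": "disposed"}


def decode(transaction_code, acquired_disposed):
    """Translate a transaction code and A/D flag into readable strings.

    Unknown codes are passed through visibly instead of being guessed at.
    """
    label = TRANSACTION_LABELS.get(transaction_code, f"Other (code {transaction_code})")
    direction = DIRECTIONS.get(acquired_disposed, "unknown")
    return label, direction


sale = decode("S", "D")
unknown = decode("Z", "A")
```

Keeping the code and the acquired/disposed flag as separate inputs matters because the code alone does not always fix the direction of a trade.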
One record per filing, not per transaction
I went back and forth here. A flat per-transaction dataset is the most directly queryable shape — “show me every open-market purchase over $1M” is a trivial filter. But a filing is the natural unit of the underlying event: one insider, reporting on one date, often with several related lines (an option exercise plus the sales that funded it, say). Flattening loses that grouping, and it makes billing strange — a single filing could fire ten charges.
So each record is a filing, with its transactions as a nested list and a summary that already separates acquired from disposed value. A caller who wants flat rows can explode the list in two lines; a caller who wants “what did this insider do on this date” gets it intact. And billing stays legible: one filing, one charge, and maxFilings is therefore also the spend cap.
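Exploding the nested list into flat rows could look like the sketch below. It assumes records shaped like the example output further down this page; `explode` and the sample records (placeholder tickers and names) are hypothetical. The last lines run the "open-market purchase over $1M" filter mentioned earlier.

```python
def explode(records):
    """Return one flat row per transaction, carrying filing-level context along."""
    return [
        {
            "issuerTicker": record.get("issuerTicker"),
            "filingDate": record.get("filingDate"),
            "ownerName": (record.get("reportingOwner") or {}).get("name"),
            **tx,
        }
        for record in records
        for tx in record.get("transactions") or []
    ]


# Made-up records using the field names from the example output.
records = [
    {
        "issuerTicker": "XYZ",
        "filingDate": "2026-03-03",
        "reportingOwner": {"name": "OWNER A"},
        "transactions": [
            {"transactionCode": "M", "shares": 10000, "transactionValue": 250000.0},
            {"transactionCode": "S", "shares": 10000, "transactionValue": 1200000.0},
        ],
    },
    {
        "issuerTicker": "XYZ",
        "filingDate": "2026-03-05",
        "reportingOwner": {"name": "OWNER B"},
        "transactions": [
            {"transactionCode": "P", "shares": 20000, "transactionValue": 2400000.0},
        ],
    },
]

rows = explode(records)

# Every purchase (code P) worth more than $1M.
big_buys = [
    row for row in rows
    if row.get("transactionCode") == "P"
    and (row.get("transactionValue") or 0) > 1_000_000
]
```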
Charge per filing, because cost scales per filing
The cost driver is one document fetch per filing (plus one cheap filing-list fetch per company). Pricing per filing keeps the bill aligned with the work and gives the caller a hard, predictable cap: set maxFilings and you’ve set your maximum charge. A flat per-company price would have undercharged a deep 100-filing pull and capped the upside on exactly the queries that deliver the most data. Failed lookups and unparseable documents are never charged — you pay for filings you actually get.
Going to the primary source instead of a finance site
It would have been faster to scrape a third-party site that already aggregates insider trades. I didn’t, because that inherits someone else’s layout changes, rate limits, and coverage gaps, and it puts a re-publisher between the agent and the authoritative record. Reading EDGAR directly means the only thing that can break the Actor is SEC changing its own format, and every record links back to the exact source document for verification. For an agent doing investment research, that provenance is the point.
The fiddly part: finding the raw XML
The genuinely annoying bit — the one I’d warn anyone reproducing this about — is that the filing-list feed points at the rendered viewer path for each document (an XSL-styled URL), not the raw machine-readable XML. The raw document lives at the same accession folder under the document’s own basename, and the filename varies by filing agent (form4.xml for some, wk-form4_….xml or tm…_4seq1.xml for others). Stripping the viewer directory off the path and fetching the basename resolves the raw XML across every filing agent I tested. It’s a small thing, but it’s exactly the kind of undocumented detail that makes “just parse Form 4” a half-day instead of ten minutes.
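As a sketch of that path fix, assuming the filing list gives you a CIK, a dashed accession number, and a primary-document path with the viewer directory in front: the function name and the `xslF345X05/` prefix in the usage line are illustrative, and the archive URL layout is inferred from the `sourceFilingUrl` in the example output.

```python
import posixpath

ARCHIVE_ROOT = "https://www.sec.gov/Archives/edgar/data"


def raw_xml_url(cik, accession_number, primary_document):
    """Build the raw XML URL from the parts the filing list provides.

    The accession folder is the accession number without dashes, and the raw
    document is the basename of the (viewer-prefixed) primary document path.
    """
    folder = accession_number.replace("-", "")
    basename = posixpath.basename(primary_document)  # drops the viewer directory
    return f"{ARCHIVE_ROOT}/{int(cik)}/{folder}/{basename}"


url = raw_xml_url("0000320193", "0001140361-26-023363", "xslF345X05/form4.xml")
```

Because only the basename is kept, the same function handles agent-specific filenames without a lookup table.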
How to use it
A realistic call: monitor a small watchlist for recent insider activity, newest first, since the start of the year.
{
  "identifiers": ["AAPL", "NVDA", "MSFT"],
  "formTypes": ["4"],
  "maxFilings": 15,
  "sinceDate": "2026-01-01"
}
The same call from Python, using the Apify client:
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("shelvick/sec-insider-trades").call(
    run_input={
        "identifiers": ["AAPL", "NVDA", "MSFT"],
        "formTypes": ["4"],
        "maxFilings": 15,
        "sinceDate": "2026-01-01",
    }
)

# Each dataset item is one filing; print a one-line digest of its summary.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    s = item.get("summary") or {}
    print(item["issuerTicker"], item["reportingOwner"]["relationship"],
          "net", s.get("netShares"), "disposed $", s.get("totalDisposedValue"))
A single filing comes back like this (an Apple director’s open-market sale):
{
  "identifier": "AAPL",
  "status": "completed",
  "issuerName": "Apple Inc.",
  "issuerTicker": "AAPL",
  "formType": "4",
  "filingDate": "2026-05-29",
  "periodOfReport": "2026-05-27",
  "reportingOwner": {
    "name": "LEVINSON ARTHUR D", "isDirector": true, "relationship": "Director"
  },
  "transactions": [
    {
      "securityTitle": "Common Stock",
      "transactionCode": "S",
      "transactionType": "Open-market sale",
      "acquiredDisposed": "D",
      "shares": 50000,
      "pricePerShare": 311.02,
      "transactionValue": 15551000.0,
      "sharesOwnedAfter": 3764576,
      "isDerivative": false
    }
  ],
  "summary": {
    "transactionCount": 1, "totalAcquiredValue": 0.0,
    "totalDisposedValue": 15551000.0, "netShares": -50000
  },
  "sourceFilingUrl": "https://www.sec.gov/Archives/edgar/data/320193/000114036126023363/form4.xml"
}
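The summary block can be re-derived from the transaction list, which is a handy sanity check on the caller's side. The sketch below is one plausible rule, not necessarily the Actor's: it treats every row flagged `A` as acquired and every row flagged `D` as disposed, and it does not distinguish derivative rows, which the real summary may handle differently. `summarize` is a hypothetical helper.

```python
def summarize(transactions):
    """Recompute per-filing aggregates from a list of transaction dicts."""
    acquired_value = 0.0
    disposed_value = 0.0
    net_shares = 0
    for tx in transactions:
        shares = tx.get("shares") or 0
        value = tx.get("transactionValue") or 0.0
        if tx.get("acquiredDisposed") == "A":
            acquired_value += value
            net_shares += shares
        elif tx.get("acquiredDisposed") == "D":
            disposed_value += value
            net_shares -= shares
    return {
        "transactionCount": len(transactions),
        "totalAcquiredValue": acquired_value,
        "totalDisposedValue": disposed_value,
        "netShares": net_shares,
    }


# The single transaction from the example record above.
example_transactions = [
    {"acquiredDisposed": "D", "shares": 50000, "pricePerShare": 311.02,
     "transactionValue": 15551000.0},
]

summary = summarize(example_transactions)
```

For the example record this reproduces the `summary` values shown in the JSON.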
If you’re calling from an MCP-enabled agent, the same call is available as a tool on mcp.apify.com — the input schema is self-documenting, so the model can construct the call from the tool description, and you can pay per call over x402 (USDC on Base) or Skyfire managed tokens. This is the path I most expect for an investment-research agent that wants to check insider activity on a name mid-conversation.
How it compares to the alternatives
| Approach | Normalized fields | Codes decoded | Per-filing aggregates | Derivatives | Agent-callable |
|---|---|---|---|---|---|
| Raw EDGAR XML / full-text search | — | — | — | raw only | — (you parse it) |
| Generic web scraper on a finance site | partial | sometimes | — | inconsistent | brittle |
| SEC Insider Trades | yes | yes | yes | yes | yes |
The honest framing: raw EDGAR is the authoritative source and you can absolutely parse it yourself — this Actor is for when you’d rather not own the parser, the transaction-code table, and the derivative-table edge cases, and you want a stable JSON contract you can point an agent at. A third-party finance site is the alternative when you don’t need provenance and don’t mind a re-publisher’s coverage gaps; going to the primary source is the better fit when the link back to the official filing matters.
Pricing model
Pay-per-event, billed only on success: one charge per insider filing fetched and parsed, after that record is pushed to the dataset. Companies that don’t resolve, and filings that can’t be retrieved or parsed, are free. Because billing is per filing, the maxFilings input is also the spend cap — a quick latest-activity scan is a handful of charges, and a deep multi-company pull scales linearly within your run’s max-charge setting.
Current per-event rates are on the Apify Store Pricing tab.
Open questions / future work
- A flat per-transaction view. The per-filing record is the right default, but I can see wanting an output mode that explodes transactions into one row each for direct filtering. If enough callers want it, it’s an easy addition rather than a rewrite.
- Deeper history. Today the Actor reads the recent EDGAR filing window (the latest ~1000 filings per company). Reaching the full multi-decade archive means paging the older filing shards — worth doing if the demand is for long backtests rather than current monitoring.
- Amendment reconciliation. Amended filings currently come back as their own records. A future option could link an amendment to the filing it supersedes, or collapse them, instead of leaving that to the caller.
- Cross-company cluster detection. The raw material for spotting “several insiders bought this week” is all here, but the clustering itself is left to the caller. Whether that belongs in this Actor or a thin layer on top of it is still open.
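To make the last item concrete, a caller-side cluster check could be as small as the sketch below. Everything in it is an assumption for illustration: the seven-day window, the three-insider threshold, the choice to count only code `P` purchases and to date them by `periodOfReport`, and the placeholder tickers and owner names.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_clusters(records, window_days=7, min_insiders=3):
    """Flag tickers where several distinct insiders bought within one window."""
    buys = defaultdict(list)  # ticker -> [(trade date, owner name)]
    for record in records:
        transactions = record.get("transactions") or []
        if not any(tx.get("transactionCode") == "P" for tx in transactions):
            continue  # keep only filings that contain a purchase
        when = date.fromisoformat(record["periodOfReport"])
        buys[record["issuerTicker"]].append((when, record["reportingOwner"]["name"]))

    clusters = []
    for ticker, events in buys.items():
        events.sort()
        for i, (start, _) in enumerate(events):
            end = start + timedelta(days=window_days)
            owners = {name for when, name in events[i:] if when < end}
            if len(owners) >= min_insiders:
                clusters.append({
                    "ticker": ticker,
                    "windowStart": start.isoformat(),
                    "insiders": sorted(owners),
                })
                break  # report only the first qualifying window per ticker
    return clusters


def _filing(ticker, day, owner, code):
    """Build a minimal made-up record for the demo below."""
    return {"issuerTicker": ticker, "periodOfReport": day,
            "reportingOwner": {"name": owner},
            "transactions": [{"transactionCode": code}]}


sample = [
    _filing("XYZ", "2026-03-02", "OWNER A", "P"),
    _filing("XYZ", "2026-03-04", "OWNER B", "P"),
    _filing("XYZ", "2026-03-06", "OWNER C", "P"),
    _filing("XYZ", "2026-03-05", "OWNER D", "S"),
    _filing("ABC", "2026-03-03", "OWNER E", "P"),
]

clusters = find_clusters(sample)
```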