US County Property Records — Owner, Value, Tax & Sales as JSON
Design notes for an Actor that turns an address or parcel ID into a normalized US property record — owner, assessed value, tax and sale history — across many counties behind one schema.
US County Property Records — Owner, Value, Tax & Sales as JSON
Design notes for county-property-records, an Apify Actor that turns a street address or an assessor parcel ID into a normalized property record — owner, parcel ID, assessed and market value, tax history, and sale history — sourced from public county assessor and recorder records and returned in one schema across many counties. It’s built to be called by an AI agent: one tool, one output shape, no per-county special-casing.
What this is
US property data is public, but it’s fragmented across roughly 3,000 counties, each with its own assessor and recorder, its own portal, its own field names. For a human that’s an afternoon of clicking; for an agent it’s 3,000 integrations. The value isn’t in any one county — it’s in collapsing that heterogeneity into a single, predictable contract.
This Actor does exactly that. You pass an address ("425 Addison Rd, Riverside, IL 60546") or a parcel lookup ("IL/Cook/15362060520000"); it resolves the right county, fetches that county’s public record, and returns a PropertyRecord with a fixed set of fields: owner_name, parcel_id, assessed_value, market_value, land_value, improvement_value, tax_year, tax_history, last_sale_date, last_sale_price, sale_history, and characteristics. The decision the Actor makes on every call — which county, which source, how to map that source’s columns onto the unified schema — is the whole product. The caller never sees it.
It’s deliberately scoped to public-record data: assessor valuations, recorded ownership, recorded sales. Not for-sale listings, not estimated “what’s it worth” numbers, not MLS feeds. That keeps it on clean, reusable ground and makes the output something you can build on without a data-licensing problem.
Why I built it this way
One schema is the entire point
A scraper for a single county is a commodity — it solves a tiny slice of the problem and breaks the day that county redesigns its site. The defensible thing is the unification: one call, many counties, identical output. So the architecture is a registry of per-county adapters behind a single resolve(lookup) -> PropertyRecord contract. Adding a county never changes the call you make or the shape you parse; it just lights up another value in the coverage set.
Per-county sources differ; the output must not
Counties don’t agree on how to publish. Some expose clean government open-data APIs (Socrata, Carto, ArcGIS) with owner, value, and sales in queryable tables. Others only have a rendered web portal. So adapters come in two flavors: a config-driven open-data adapter (a new clean county is a handful of lines — endpoint plus a column map) and bespoke adapters for the messier sources. The internal path varies wildly; the external contract is byte-identical. That split is what lets coverage grow cheaply without the caller ever feeling it.
Best-effort field fill, surfaced honestly
Here’s the tradeoff I decided to make visible rather than hide: not every county publishes every field. One county exposes a decade of recorded sales but no building characteristics; another has rich characteristics but only a current valuation. Rather than pretend uniformity, the schema is constant and fields are null where a county doesn’t publish them. When a field lives in a separate source for a county — for instance, sales recorded in a different dataset than the assessment roll — a small “enricher” fetches it and merges it in, best-effort: if that secondary source is down, the record still returns with everything else rather than failing. “Best effort” means actually try the extra source, then degrade gracefully — not “return less and call it a day.”
Pay only for data you get
Coverage is partial and always will be (it grows continuously). Charging for a lookup in a county we don’t cover yet — or one that doesn’t resolve — would punish callers for our gaps. So billing is push-then-charge on success only: a record is charged after it’s pushed, and not_covered / failed results come back labeled and free. An agent can fire a batch of mixed addresses and pay only for the ones that returned data.
Coverage is a fact to query, not a list to recite
I deliberately don’t enumerate every supported county in the docs — a wall of county names is thin content that goes stale instantly and helps no one. Instead, each run writes the live supported-county set to the key-value store (COVERAGE). An agent that needs to know “do you cover this county” asks the manifest, which is always current, instead of trusting prose that drifts.
How to use it
A mixed batch — one address, one known parcel ID, history included:
{
"addresses": ["1060 Wakeling St, Philadelphia, PA 19124"],
"parcelLookups": ["IL/Cook/15362060520000"],
"includeHistory": true
}
from apify_client import ApifyClient
client = ApifyClient("YOUR_TOKEN")
run = client.actor("shelvick/county-property-records").call(run_input={
"addresses": ["1060 Wakeling St, Philadelphia, PA 19124"],
"parcelLookups": ["IL/Cook/15362060520000"],
})
for r in client.dataset(run["defaultDatasetId"]).iterate_items():
print(r["county"], r["owner_name"], r["assessed_value"], r["last_sale_price"])
A resolved record:
{
"query": "1060 Wakeling St, Philadelphia, PA 19124",
"status": "completed",
"county": "Philadelphia",
"state": "PA",
"parcel_id": "234151600",
"owner_name": "HANRAHAN NOEL MARIE",
"market_value": 272800,
"last_sale_date": "2026-03-17",
"last_sale_price": 165000,
"characteristics": { "year_built": 1945, "building_sqft": 1688 },
"source_url": "https://atlas.phila.gov/234151600"
}
If you’re calling from an MCP-enabled agent, the same call works as a tool on the Apify MCP server — the input schema is self-describing, so the model can construct the call from the tool description, and it’s billable per call over x402 or Skyfire. This is the path I actually optimize for: an agent that needs owner-and-value-by-address shouldn’t have to know which county portal to scrape.
How it compares
| Approach | Many counties, one schema | Public-record sourced | Pay-per-use, no minimum | Agent-callable |
|---|---|---|---|---|
| Single-county scraper | — | varies | OK | varies |
| Enterprise property-data platforms | OK | OK | — (subscription + contract) | — |
| county-property-records | OK | OK | OK | OK |
The single-county scraper makes sense when you only ever need that one county. The enterprise platform makes sense when you need nationwide coverage with a contract and a procurement cycle. This Actor is for the case in between — you need several counties, you want public-record data you can reuse, you want to pay per record with no commitment, and you want an agent to call it directly.
Pricing model
Pay-per-event, on success only. One charge per resolved property record, after it’s pushed to the dataset; not_covered and failed results are free; the per-run Actor-start event is amortized across the batch. The model rewards breadth — a big batch costs only for the records that came back with data.
Current per-record rates are on the Apify Store Pricing tab.
Open questions / future work
- Coverage expansion cadence. The clean open-data tier makes adding a county nearly free (a config row); the rendered-portal tier is more work per county. The open question is ordering — population, real-estate activity, and source tractability all pull differently.
- Address → parcel resolution. Address matching is the main source of
failedresults on covered counties. A geocoding/normalization layer (directionals, suffix variants, unit handling) would lift the hit rate; today exact-ish matching plus optional county/state hints is the floor. - History depth. Where a county only publishes recent sales, full historical chains may need a recorder/deeds source as a second enricher. Worth it for counties where that depth is the use case.
- Characteristics coverage. Building characteristics live in a separate roll for some counties; whether to add those as enrichers depends on demand for the physical-attributes fields versus the valuation/ownership core.