Business For Sale Aggregator

Design notes for business-for-sale-aggregator, an Apify Actor that merges business-for-sale and small-M&A listings from several broker marketplaces into one normalized, deduplicated deal feed. Give it filters — sector, location, asking-price range, minimum revenue — and it returns one structured record per listing: asking price (USD), revenue, cash flow, sector, location, broker, and a link back to the source.

The Apify Store page covers the schema, current pricing, and how to try it. This page is for the design questions — why aggregation (not another single-site scraper) is the whole point, why the parser leans on structured data instead of scraping cards, and the deliberately narrow posture on what gets extracted.

What this is

Deal sourcing in the lower-middle market means searching a dozen broker marketplaces, each with its own layout, currency, and price formatting, then merging the results by hand. This Actor does the merge: one run, multiple marketplaces (currently DealStream, BizQuest, and BusinessesForSale), one normalized schema out.

Each output record carries the structured facts an acquirer screens on — asking price normalized to USD, annual revenue, cash flow / SDE, sector, location, broker — plus a short summary and the canonical URL of the full listing on its source. Filters apply across every source at once, and listings are deduplicated by source and id. It’s built to be consumed by an LLM or a pipeline: every field is structured, so a downstream agent can rank, cluster, or threshold the feed without re-parsing prose.
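As a rough illustration of the "dedup by source and id, then one filter pass" behavior described above, here is a minimal sketch. It is not the Actor's code: the field names `source` and `id` are assumptions made for the example (only `askingPriceUsd` appears in the usage example on this page), and dropping listings with a null USD price when a price ceiling is set is likewise an assumed policy.

```python
from typing import Iterable, Optional


def dedupe_and_filter(
    listings: Iterable[dict],
    max_asking_price_usd: Optional[float] = None,
) -> list[dict]:
    """Keep the first record per (source, id), applying one shared price filter."""
    seen: set[tuple[str, str]] = set()
    kept: list[dict] = []
    for listing in listings:
        # "source" and "id" are hypothetical field names for this sketch.
        key = (listing["source"], str(listing["id"]))
        if key in seen:
            continue  # exact duplicate from the same marketplace
        seen.add(key)
        if max_asking_price_usd is not None:
            price = listing.get("askingPriceUsd")
            # Assumption: a null USD price cannot satisfy a USD ceiling.
            if price is None or price > max_asking_price_usd:
                continue
        kept.append(listing)
    return kept
```

Because the key includes the source, the same id on two marketplaces yields two records; that matches exact source+id dedup and is why cross-listed businesses are not collapsed by this step.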

Why I built it this way

Aggregation is the product, not the scrape

There are plenty of single-site scrapers for individual business-for-sale marketplaces. They hand you one site in that site’s raw shape, and you still own the hard part: reconciling different field names, normalizing mixed currencies, and deduplicating the same business that’s cross-listed on three marketplaces. That reconciliation layer — consistent schema, USD-normalized prices, cross-source dedup, one filter pass — is the tedious, brittle work that’s genuinely worth automating, and it’s exactly what a single-site scraper leaves on your plate. So the Actor is scoped as the layer above the scrape: breadth across sources plus normalization, not depth on any one site.

Facts, not expression; link, don’t rehost

The Actor extracts only the factual fields of a listing — price, revenue, cash flow, sector, location, broker — plus a short summary, and then links to the source for the full write-up. It never reproduces a seller’s full marketing description or photos, and it never authenticates or touches gated/members-only content; it reads the same public search results a browser would. That posture is deliberate: factual data points (a price, a location) are exactly what a deal-screening feed needs, while the long-form copy and imagery are the source’s to host. Linking back, rather than rehosting, keeps the Actor a pointer to deals — and sends traffic to the marketplaces it reads.

Parse structured data, not rendered cards

Where a marketplace publishes its listings as structured schema.org Product metadata in the page, the Actor parses that rather than scraping the visual cards. Structured metadata is far more stable than CSS-class-based card scraping: it usually survives visual redesigns, and it yields clean typed fields (price, currency, address) instead of regex-on-rendered-text. v1 ships the sources that expose this structured form and that the fetch layer can reach cleanly; marketplaces that render listings entirely client-side, or that gate them, are deferred rather than scraped fragilely.
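To make the idea concrete, here is a minimal sketch of pulling schema.org Product records out of a page using only the standard library. It assumes the metadata is embedded as JSON-LD in `<script type="application/ld+json">` tags, which is one common encoding but not necessarily what every source uses. The `extract_products` helper and its output keys are hypothetical, not the Actor's actual parser.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_products(html: str) -> list[dict]:
    """Return typed fields for each schema.org Product found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed metadata rather than guessing
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offer = item.get("offers") or {}
            if isinstance(offer, list):
                offer = offer[0] if offer else {}
            if not isinstance(offer, dict):
                offer = {}
            products.append({
                "title": item.get("name"),
                "price": offer.get("price"),
                "currency": offer.get("priceCurrency"),
                "url": item.get("url"),
            })
    return products
```

Nothing here depends on CSS classes or card layout, which is the stability argument in a nutshell: a redesign that keeps the metadata leaves this code untouched.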

Currency honesty

Asking prices are normalized to USD — but only when the source actually quotes USD. A non-USD listing returns a null USD price rather than a number that silently pretends a £750,000 figure is $750,000. It’s better to return “price not in USD yet” than a wrong number; cross-currency conversion is planned, and until it lands the feed never mislabels a foreign-currency figure.
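The rule fits in a few lines. The sketch below is illustrative only: `asking_price_usd` is a hypothetical helper, and treating a missing currency code the same as a non-USD one is an assumption that follows from "only when the source actually quotes USD".

```python
from typing import Optional, Union


def asking_price_usd(
    price: Union[str, int, float, None],
    currency: Optional[str],
) -> Optional[float]:
    """Return the price as a USD float, or None if it is not quoted in USD."""
    if price is None or currency is None:
        return None  # assumption: no stated currency means no USD claim
    if currency.strip().upper() != "USD":
        return None  # never relabel a foreign-currency figure as dollars
    try:
        return float(str(price).replace(",", ""))
    except ValueError:
        return None  # unparseable price text stays null


print(asking_price_usd("750000", "GBP"))  # None
print(asking_price_usd("750000", "USD"))  # 750000.0
```

A downstream consumer can then treat `None` as "unknown in USD" and fall back to the source link, instead of ranking on a number that was never in dollars.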

How to use it

Minimal call — everything but the filters is optional:

{
  "sectors": ["software"],
  "maxAskingPriceUsd": 2000000,
  "maxListings": 50
}

From the Apify API client for Python (the apify-client package):

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("<APIFY_TOKEN>")

# Start the Actor and wait for the run to finish.
run = client.actor("shelvick/business-for-sale-aggregator").call(
    run_input={
        "sources": ["bizquest", "businessesforsale", "dealstream"],
        "sectors": ["ecommerce"],
        "maxAskingPriceUsd": 5000000,
        "maxListings": 100,
    }
)

# Read the run's default dataset; askingPriceUsd is None for non-USD listings.
for listing in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(listing["title"], listing["askingPriceUsd"], listing["sourceUrl"])

It’s also discoverable through the Apify MCP server as shelvick/business-for-sale-aggregator, so an agent can call it as a tool, and via the REST run-sync-get-dataset-items endpoint for synchronous use.
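For the synchronous REST route, a standard-library sketch might look like the following. It relies on two details of the Apify API as I understand them: Actors are addressed in URL paths as `username~actor-name`, and the token can be sent as a Bearer header. The helper names and the client-side timeout are hypothetical choices for this example.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# In API paths the slash in "username/actor-name" is written as a tilde.
ACTOR_PATH = "shelvick~business-for-sale-aggregator"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_listings(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of listings."""
    request = build_sync_request(token, run_input)
    # The timeout here is an arbitrary client-side choice for this sketch.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_listings("<APIFY_TOKEN>", {"sectors": ["software"], "maxListings": 50})` would return the records directly, with no separate dataset lookup; long-running searches are a better fit for the asynchronous client call shown earlier.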

How it compares to single-site scrapers

	Single-site scraper	Business For Sale Aggregator
Sources per run	one marketplace	multiple marketplaces
Output shape	each site’s raw fields	one normalized schema
Price	as-displayed, mixed currencies	USD-normalized (non-USD explicit)
Dedup across the merged feed	none	exact source+id (fuzzy cross-listing match planned)
Filtering	per site	unified across sources
Intended job	scrape one site	screen the whole field

The single-site scrapers aren’t wrong — they’re a different job. If you only ever look at one marketplace and want it raw, use one of those. This Actor earns its place only when you want the whole field reconciled into one feed.

Pricing model

Pay-per-event, billed only on success: a single flat fee per run that returns at least one listing, charged after the records are pushed. The fee is the same whether the run returns one listing or hundreds — so broad, multi-source searches stay predictable — and runs that fail or return nothing are never charged.

I picked a flat per-run fee over per-listing for three reasons. The cost of a run is essentially fixed (a handful of fetches; the listing count barely moves it), so per-run billing matches the cost shape. A single known price is trivial for an agent to budget or pre-authorize, where a per-listing total forces it to guess the result count before calling. And a flat fee rewards the breadth that’s the whole point of an aggregator instead of taxing it — going wider doesn’t cost the caller more. The Apify Store Pricing tab is authoritative for the current per-run rate and any subscriber discounts.
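The billing rule itself is small enough to state as code. This is a sketch of the rule as described above, not the Actor's billing implementation; `flat_fee_due` and its parameters are hypothetical, and the fee is passed in rather than hard-coded because the Store Pricing tab is the authority on the actual rate.

```python
def flat_fee_due(run_succeeded: bool, listings_pushed: int, fee_per_run: float) -> float:
    """Return the charge for one run under flat, success-only billing."""
    # Charge once, and only after at least one record was pushed.
    if run_succeeded and listings_pushed >= 1:
        return fee_per_run
    # Failed or empty runs cost nothing.
    return 0.0
```

Note that `listings_pushed` only gates the charge and never scales it, which is what lets an agent pre-authorize a run without guessing the result count.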

Open questions / future work

More sources. v1 covers three marketplaces. The aggregation value grows with coverage; adding marketplaces that render listings client-side (or sit behind heavier defenses) is the main roadmap item.
Currency conversion. Non-USD listings currently return a null USD price. A dated FX layer would let them carry a converted figure (clearly flagged as converted).
Cross-source fuzzy dedup. Today’s dedup is exact source+id. The same business cross-listed under different ids on different marketplaces should collapse to one record — a fuzzy match on title + price + location.
Pagination depth. v1 reads the first search page per source; deeper paging would raise recall on broad queries at a proportional cost.
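As a starting point for the fuzzy dedup item above, here is one way a title + price + location match could be sketched with the standard library. None of this exists in v1: the thresholds are placeholder values that would need tuning against real cross-listings, and the field names `source` and `location` are assumptions for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def _norm(text: Optional[str]) -> str:
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join((text or "").lower().split())


def likely_same_business(
    a: dict,
    b: dict,
    title_threshold: float = 0.85,  # hypothetical similarity cutoff
    price_tolerance: float = 0.02,  # hypothetical: prices within 2%
) -> bool:
    """Guess whether two listings from different sources describe one business."""
    if a["source"] == b["source"]:
        return False  # same-source duplicates are handled by exact source+id
    if _norm(a.get("location")) != _norm(b.get("location")):
        return False
    price_a, price_b = a.get("askingPriceUsd"), b.get("askingPriceUsd")
    if price_a is None or price_b is None:
        return False  # without comparable USD prices, do not merge
    if abs(price_a - price_b) > price_tolerance * max(price_a, price_b):
        return False
    similarity = SequenceMatcher(
        None, _norm(a.get("title")), _norm(b.get("title"))
    ).ratio()
    return similarity >= title_threshold
```

Requiring all three signals to agree biases the match toward false negatives, which seems like the safer failure mode for a deal feed: a missed merge shows a duplicate, while a wrong merge hides a listing.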