Local Market Saturation & Competitor Density Analyzer

Design notes for an agent-callable Actor that turns Google Maps local listings into an aggregated competitive-landscape report and a transparent 0-1 saturation score.

Design notes for local-market-saturation, an agent-callable market-analysis Actor on the Apify Store. Give it a business category and a location — coffee shops in Austin, TX — and it pulls the matching businesses from Google Maps local listings and returns an aggregated competitive-landscape report: how many competitors there are, how good and how established they are, what they charge, and a single 0-1 saturation score with its reasoning attached.

The Apify Store page covers the input schema, current pricing, and how to try it. This page is for the design questions — why the Actor returns an aggregate rather than a list, why the saturation score is deliberately transparent, and what I left out on purpose.

What this is

The decision the Actor makes is narrow and specific: for one market, fetch the top-ranked local results, then turn them into the answer shape a competitive-landscape question actually wants. Not a list of pins — a structured read. Competitor count and density. The distribution of star ratings (are incumbents strong or mediocre?). The distribution of review counts (is the field entrenched or nascent?). The price-tier mix. The fraction of competitors that are both well-rated and review-deep — the ones that are genuinely hard to displace. And on top of all that, a saturation score from 0 to 1, banded low / moderate / high.

It is not a scraper in the usual sense — it doesn’t hand you the businesses. It’s an aggregator: the output is statistics plus a small most-reviewed-incumbents sample (name, rating, review count) for context, and nothing else. No addresses, no phone numbers, no review text. That boundary is a deliberate design choice I’ll come back to.

The intended caller is an AI agent: “my user is thinking about opening a bakery in Logan Square — how saturated is that market?” or “rank these five metros by HVAC competition.” The Actor is the tool that answers that in one call, returning something an agent can reason over directly rather than a payload it has to crunch first.

Why I built it this way

Aggregate-out, not list-out

The raw material here — local business listings — is among the most heavily scraped data on the web. There is no shortage of tools that will hand you a list of every coffee shop in a city with its rating and review count. So returning another list would add nothing.

The value isn’t the pins; it’s the aggregation. “How saturated is this market” is not answered by 60 rows — someone still has to bucket the ratings, sum the reviews, weigh how entrenched the incumbents are, and turn that into something comparable across locations. That work is the product. So the Actor returns the result of the aggregation and not its inputs: distributions, fractions, a score. An agent gets an answer it can act on, not a dataset it has to process.
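The bucketing and weighing described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's code: the `Listing` type, the whole-star buckets, the output key names other than `competitorCount`, and the "strong incumbent" thresholds (4.5 stars, 200 reviews) are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Listing:
    # Hypothetical per-business record; field names are assumptions.
    name: str
    rating: float
    review_count: int
    price_tier: Optional[str] = None  # e.g. "$" or "$$"; None when not shown


def aggregate(listings, strong_rating=4.5, strong_reviews=200):
    """Reduce a list of listings to counts, distributions, and fractions."""
    if not listings:
        return None  # nothing usable to report on
    # Bucket ratings by whole star: a 4.7 lands in the "4" bucket.
    rating_buckets = Counter(str(int(b.rating)) for b in listings)
    price_mix = Counter(b.price_tier or "unknown" for b in listings)
    # "Strong" incumbents are both well rated and review-deep.
    strong = sum(
        1 for b in listings
        if b.rating >= strong_rating and b.review_count >= strong_reviews
    )
    return {
        "competitorCount": len(listings),
        "ratingBuckets": dict(rating_buckets),
        "priceTierMix": dict(price_mix),
        "medianReviewCount": median(b.review_count for b in listings),
        "totalReviews": sum(b.review_count for b in listings),
        "strongIncumbentFraction": strong / len(listings),
    }
```

The function returns only derived statistics; the individual listings do not survive the call.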

This also keeps the tool on the right side of a line I care about. A list of local businesses with contact details is a lead list. An aggregate count of how many there are, with rating and review distributions, is market research. By returning only aggregates plus a five-row most-reviewed sample — names, ratings, review counts, nothing contactable — the Actor stays a market-analysis tool and never becomes a lead-generation scraper. No addresses, no phones, no copied review text. That’s a constraint baked into the output shape, not a promise in the docs.
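One way to bake that constraint into the output shape is to build each sample row from an allow-list of fields, so anything else on the input record cannot leak through. A sketch under assumptions: listings arrive as dicts, and the `reviewCount` key and the function name are illustrative rather than the Actor's actual schema.

```python
def most_reviewed_sample(listings, size=5):
    """Return the `size` most-reviewed listings, stripped to three fields."""
    top = sorted(listings, key=lambda b: b["reviewCount"], reverse=True)[:size]
    # Each row is rebuilt from an explicit allow-list, so addresses, phone
    # numbers, or review text present on the input are never copied out.
    return [
        {"name": b["name"], "rating": b["rating"], "reviewCount": b["reviewCount"]}
        for b in top
    ]
```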

The saturation score is transparent on purpose

It would be easy to emit a single saturation number and call it proprietary magic. I deliberately didn’t. The score is a documented composite — a weighted blend of competitor density, incumbent entrenchment (the share of competitors that are both highly rated and review-deep), and review depth (how mature the median competitor is) — and every one of those component signals is returned next to the score, along with the exact formula and the reference constants used to normalize it.
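To make the shape of such a composite concrete, here is a minimal sketch. The weights, the reference constants, the band cutoffs, and the `components` / `formula` / `referenceConstants` key names are all placeholders chosen for the example; the Actor returns its own formula and constants with each record, and those are the ones to rely on.

```python
def saturation_score(competitor_count, strong_fraction, median_reviews,
                     weights=(0.4, 0.4, 0.2),   # hypothetical weights
                     ref_count=60,              # hypothetical: count treated as "full"
                     ref_median_reviews=500):   # hypothetical: median treated as "mature"
    """Blend three normalized signals into a 0-1 score with a band label."""
    def clamp(x):
        return max(0.0, min(1.0, x))

    components = {
        "density": clamp(competitor_count / ref_count),
        "entrenchment": clamp(strong_fraction),
        "reviewDepth": clamp(median_reviews / ref_median_reviews),
    }
    w_density, w_entrench, w_depth = weights
    score = (w_density * components["density"]
             + w_entrench * components["entrenchment"]
             + w_depth * components["reviewDepth"])
    # Assumed band cutoffs at thirds of the 0-1 range.
    band = "low" if score < 1 / 3 else "moderate" if score < 2 / 3 else "high"
    return {
        "score": round(score, 2),
        "band": band,
        "components": components,
        "formula": "w_density*density + w_entrench*entrenchment + w_depth*reviewDepth",
        "referenceConstants": {
            "weights": list(weights),
            "refCount": ref_count,
            "refMedianReviews": ref_median_reviews,
        },
    }
```

Returning the components and constants alongside the score is what lets a caller reproduce the number instead of taking it on faith.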

Two reasons. First, a market-entry decision is consequential enough that a black-box score is worse than useless — a caller needs to see why a market scored 0.74, and to disagree. If you think entrenchment should matter more than raw density, the components are right there to re-weight. Second, transparency is the honest posture given what the score actually is: a heuristic over a sample, not a law of nature.
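Re-weighting on the caller's side could look like the sketch below. It assumes the components arrive as a dict of 0-1 values; the key names `density`, `entrenchment`, and `reviewDepth` are illustrative, so substitute whatever names the returned record actually uses.

```python
def reweight(components, weights):
    """Recompute a 0-1 score from returned components using the caller's weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total means the weights need not sum to 1.
    return sum(components[name] * w for name, w in weights.items()) / total


# Example: a caller who thinks entrenchment matters three times as much as
# either of the other two signals.
components = {"density": 1.0, "entrenchment": 0.5, "reviewDepth": 0.5}
custom = reweight(components, {"density": 1, "entrenchment": 3, "reviewDepth": 1})
```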

It’s a heuristic over a sample — and the copy says so

This is the design decision I’d defend most carefully, because it’s about not overclaiming. The textbook notion of retail saturation is an index against population and spend — businesses per capita, demand per square foot. This Actor does not have a population join, so it does not compute that. What it has is Google’s ranked local results, which is a sample of the competitive set, not a census of every business, and which carries ratings and review counts that stand in for incumbent strength.

So the saturation score is explicitly a Maps-derived heuristic, and the output says so in a standing notice on every record. The competitor count is labeled as a lower bound when the sample is capped. I would rather ship a clearly-bounded estimate that a caller can calibrate than a confident-looking number that implies a precision the data can’t support. The whole point of returning the score’s components is so the estimate can be audited rather than trusted.
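The lower-bound labeling comes down to one comparison: if the sample filled the cap, there may be more competitors than were counted. A small sketch, where `maxPlaces` is the input from the usage example and the `isLowerBound` and `display` keys are hypothetical names:

```python
def count_report(parsed_count, max_places):
    """Flag the competitor count as a lower bound when the sample hit the cap."""
    capped = parsed_count >= max_places
    return {
        "competitorCount": parsed_count,
        "isLowerBound": capped,
        # "60+" signals "at least 60"; an uncapped sample shows the plain count.
        "display": f"{parsed_count}+" if capped else str(parsed_count),
    }
```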

Sourcing decided by a spike, not a guess

Before committing, I ran a spike to settle how to actually get clean per-business data — name, rating, review count, price — reliably enough that the aggregation isn’t built on sand. The answer was the local-finder list view rather than the map application: it exposes each result as one clean, structured label that parses deterministically into the fields the aggregation needs, and it clears on the cheap datacenter fetch tier most of the time, with a residential fallback only when the source pushes back. Garbage-in would poison every downstream number, so the data layer got validated first and the rest of the design followed from it. The single place that depends on the source’s formatting is one small parser — the one thing to maintain as the source drifts, and the kind of ongoing maintenance that’s the actual moat for a tool like this.
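The cheap-tier-first fetch with a fallback can be sketched as a loop over ordered tiers. This shows the general pattern only: the fetch functions and the usability check are injected by the caller, and nothing here reflects the Actor's actual proxy setup or how it detects pushback.

```python
def fetch_with_fallback(url, tiers, is_usable):
    """Try each (tier_name, fetch) pair in order; return the first usable body.

    `tiers` is ordered cheapest first, e.g. datacenter before residential.
    `is_usable` decides whether a response is parseable or a block page.
    """
    errors = []
    for tier_name, fetch in tiers:
        try:
            body = fetch(url)
        except Exception as exc:  # network error, timeout, etc.
            errors.append(f"{tier_name}: {exc}")
            continue
        if is_usable(body):
            return tier_name, body
        errors.append(f"{tier_name}: unusable response")
    raise RuntimeError("all fetch tiers failed: " + "; ".join(errors))
```

Returning the tier name with the body makes it possible to track how often the expensive tier was needed.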

Aggregate-only, so the cost stays honest

Because the output is an aggregate and not a stored list, and because there’s no language model in the path, the cost per market is small and dominated by the fetch — so the price reflects the analysis, not an inference bill or a per-row storage tax. Markets are processed concurrently, which keeps batch runs fast and the compute cost negligible. The economics and the product shape reinforce each other: an aggregate is cheaper to produce and more useful to an agent than the list it’s derived from.
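Processing markets concurrently with a bounded fan-out might look like this. It is a sketch using `asyncio`: `analyze_one` is a hypothetical coroutine standing in for the per-market fetch and aggregation, and the concurrency limit of 5 is an arbitrary example value.

```python
import asyncio


async def analyze_all(markets, analyze_one, max_concurrency=5):
    """Run analyze_one over all markets, at most max_concurrency at a time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(market):
        async with sem:
            try:
                return await analyze_one(market)
            except Exception:
                # One failed market should not sink the whole batch.
                return None

    # gather preserves input order, so results line up with `markets`.
    return await asyncio.gather(*(guarded(m) for m in markets))
```

A `None` entry marks a market that produced no report, which is also the natural signal for not charging it.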

How to use it

A realistic call compares one category across several candidate locations:

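from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("shelvick/local-market-saturation").call(
    run_input={
        # One entry per market (category x location) to compare.
        "markets": [
            {"category": "coffee shops", "location": "Austin, TX"},
            {"category": "coffee shops", "location": "Portland, OR"},
            {"category": "coffee shops", "location": "Boise, ID"},
        ],
        # Cap on the number of listings sampled per market.
        "maxPlaces": 60,
    }
)

# The run's default dataset holds one record per market that produced a report.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    s = item["saturation"]
    print(item["location"], item["competitorCount"], s["score"], s["band"])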

You get one record per market: the counts and distributions, the incumbent-strength fractions, the most-reviewed sample, and the saturation score with its components. To an MCP-enabled agent, the Actor appears as a tool on mcp.apify.com with a self-documenting input schema, and each call is paid for via x402 USDC on Base or Skyfire managed tokens. It’s the kind of tool an agent reaches for when a user asks “is this market too crowded” and the agent needs a structured answer rather than a scraping project.
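For the "rank these metros" style of question, the per-market records can be sorted directly on the score. A small sketch that relies only on the `location` and `saturation` fields used in the call above; the function name is illustrative.

```python
def rank_by_saturation(records):
    """Order market records from least to most saturated."""
    ranked = sorted(records, key=lambda r: r["saturation"]["score"])
    return [
        (r["location"], r["saturation"]["score"], r["saturation"]["band"])
        for r in ranked
    ]
```

Passing `list(client.dataset(run["defaultDatasetId"]).iterate_items())` as `records` gives the ranking for a finished run.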

How it compares to alternatives

Approach | Output | Aggregated | Saturation score | Per-business PII | Agent-callable
--- | --- | --- | --- | --- | ---
Raw places / Maps scraper | list of pins | No (you aggregate) | No | often yes | partial
Counting pins by hand | a rough number | by hand | No | n/a | No
Enterprise location-intelligence platform | dashboards, foot-traffic, forecasts | Yes | proprietary | varies | No (seat-licensed)
Local Market Saturation Analyzer | aggregate report per market | Yes | Yes, transparent | No (aggregate-only) | Yes

A raw scraper gives you the soup and leaves the cooking to you. An enterprise platform is a powerful, expensive, seat-licensed dashboard built for retail-expansion teams. This sits deliberately in between: a cheap, callable, aggregate market read with a score you can audit — the thing an agent or a solo operator wants when the enterprise platform is overkill and the raw list is underbaked.

Pricing model

Pay-per-event, billed only on success: one charge per market (category × location) that produces a report, after the report is pushed. A market that returns no usable results is never charged, and multi-market runs are billed per completed market, capped by maxTotalChargeUsd. Because the output is an aggregate with no model and no stored list behind it, the underlying cost is small and the price reflects the aggregation and the score, not infrastructure. Current per-event rates are on the Apify Store Pricing tab.
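The charging rule can be illustrated with a short sketch. It models only the rule as described here (charge per completed market, skip markets with no report, stop at the cap) and is not Apify's billing implementation; the function name and the per-market price passed in are placeholders, since real rates live on the Store page.

```python
from decimal import Decimal


def billable_markets(reports, price_usd, max_total_charge_usd):
    """Return (charged market ids, total) for (market_id, report_or_None) pairs."""
    price = Decimal(str(price_usd))
    cap = Decimal(str(max_total_charge_usd))
    charged, total = [], Decimal("0")
    for market_id, report in reports:
        if report is None:
            continue  # no usable results: never charged
        if total + price > cap:
            break  # the next charge would exceed the cap
        charged.append(market_id)
        total += price
    return charged, total
```

`Decimal` is used so that repeated small charges add up exactly rather than drifting with float rounding.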

Open questions / future work

  • A population join for a true saturation index. The current score is a heuristic over the competitive sample. Joining population and spend data would turn it into a real per-capita saturation index — the most valuable next investment, and the one that would let the score drop the “heuristic” caveat for some categories.
  • Calibrating the residential-fetch rate. How often the source forces the expensive fetch tier is the dominant cost lever and is only knowable from real traffic; the score weights and the fetch path will both be tuned once there’s production data.
  • Sub-area density. Today a “location” is a named place. Splitting a metro into sub-areas (ZIP or neighborhood) would turn the single density figure into a heat-map of where a category is under- vs over-served — the question site-selection callers really want answered.
  • More signals. “Recently opened” / “permanently closed” markers and rating-trend-over-time are visible in the source and would sharpen the entrenchment read. They would be added deliberately, and only where they can be derived without guessing.