The Situation

The marketing team had all the tools. Salesforce for CRM. Marketo for automation. DemandBase for ABM and intent data. LinkedIn Ads for paid targeting. Four platforms, significant annual spend, and a dashboard that could tell you anything you wanted to know.

Except it couldn't tell you which accounts were actually likely to buy.

The propensity model existed—technically. But sales didn't trust it. "Your 'hot' leads are garbage," was the polite version of the feedback. Marketing would point to the data showing high intent scores. Sales would point to the conversations that went nowhere.

Both were right. The model was mathematically sound. It was also practically useless.

I was brought in to fix demand gen, but within the first two weeks, it became clear that the problem wasn't campaigns or content or targeting. The problem was data.

The Insight

Propensity models are only as good as the data feeding them. Garbage in, garbage out—everyone knows this. But the failure mode I found wasn't garbage data. It was inconsistent data.

The four systems had been implemented at different times by different teams with different naming conventions:

  • Salesforce had "IBM Corporation" as an account. Marketo had "IBM" and "IBM Corp" as separate records. DemandBase had "International Business Machines."
  • Industry classifications didn't match. Salesforce used a custom picklist. DemandBase used SIC codes.
  • The same contact might exist in three systems with three different titles.
  • Intent signals from DemandBase weren't consistently flowing to Salesforce.

The model was mathematically weighting data—but it was doing math on records that weren't actually the same entities.

The insight: The prediction problem was actually a data governance problem.

[Figure: before-and-after comparison of four marketing systems' mismatched data records being cleaned, via domain-based matching and standardization, into aligned records]

The System

Layer 1: Data Audit and Mapping

First, we mapped every field across all four systems. The output was brutal:

  • 47 fields that should have been synchronized
  • 23 fields with naming inconsistencies
  • 12 fields with conflicting data types
  • 3 systems with no automated sync
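The audit itself can be sketched in a few lines. This is a minimal, hypothetical illustration of the approach, not the actual tooling we used; the field names and inventories below are invented:

```python
# Sketch of the Layer 1 audit: inventory each system's fields and flag
# canonical fields whose local names or data types disagree across systems.
# Field names and types here are illustrative, not the real schemas.
from collections import defaultdict

# Per-system inventories: canonical_field -> (local_name, data_type)
systems = {
    "salesforce": {"industry": ("Industry__c", "picklist"),
                   "employees": ("NumberOfEmployees", "int")},
    "marketo":    {"industry": ("industry", "string"),
                   "employees": ("numEmployees", "string")},
    "demandbase": {"industry": ("industry_sic", "string"),
                   "employees": ("employee_count", "int")},
}

def audit(systems):
    """Return canonical fields whose local names or types diverge."""
    issues = defaultdict(list)
    fields = {f for inv in systems.values() for f in inv}
    for field in fields:
        names = {inv[field][0] for inv in systems.values() if field in inv}
        types = {inv[field][1] for inv in systems.values() if field in inv}
        if len(names) > 1:
            issues[field].append("naming inconsistency")
        if len(types) > 1:
            issues[field].append("type conflict")
    return dict(issues)
```

Running this over the real field maps is what produced the counts above: every field with more than one local name or more than one type surfaces automatically.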

Layer 2: Standardization Rules

Company Identification

  • Primary key: Domain name (most reliable unique identifier)
  • Secondary matching: Legal entity name with fuzzy matching
  • Standardization: All company names normalized to D&B format
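The matching rules above can be sketched as follows. This is a simplified illustration using Python's standard library; the thresholds and the `same_account` / `extract_domain` helpers are assumptions for the example, not the production implementation:

```python
# Sketch of domain-first account matching: primary key is the normalized
# domain, fallback is a fuzzy match on the legal entity name.
# Threshold (0.85) and helper names are illustrative assumptions.
from difflib import SequenceMatcher
from urllib.parse import urlparse

def extract_domain(url_or_email: str) -> str:
    """Normalize a website URL or email address to a bare domain."""
    if "@" in url_or_email:
        domain = url_or_email.rsplit("@", 1)[1]
    else:
        parsed = urlparse(url_or_email if "//" in url_or_email
                          else "//" + url_or_email)
        domain = parsed.netloc or parsed.path
    return domain.lower().removeprefix("www.")

def same_account(a: dict, b: dict, name_threshold: float = 0.85) -> bool:
    """Primary: exact domain match. Secondary: fuzzy legal-name match."""
    if a.get("domain") and a["domain"] == b.get("domain"):
        return True
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= name_threshold

sf = {"name": "IBM Corporation", "domain": extract_domain("www.ibm.com")}
db = {"name": "International Business Machines",
      "domain": extract_domain("https://ibm.com")}
print(same_account(sf, db))  # True: both normalize to the domain "ibm.com"
```

This is why domain wins as the primary key: "IBM", "IBM Corp", and "International Business Machines" all collapse to `ibm.com` without any fuzzy matching at all.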

Industry Taxonomy

  • Standard: Adopted DemandBase industry categories as source of truth
  • Mapping: Built translation tables from SIC codes and Salesforce picklists
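A translation table of this kind is straightforward. The sketch below shows the shape of it; the specific SIC codes, picklist values, and category labels are examples, not the real mapping:

```python
# Illustrative translation tables mapping SIC codes and Salesforce picklist
# values onto the DemandBase categories chosen as source of truth.
# All codes and labels below are invented examples.
SIC_TO_CATEGORY = {
    "6021": "Banking & Financial Services",  # national commercial banks
    "5411": "Retail",                        # grocery stores
    "7372": "Software & Technology",         # prepackaged software
}
SF_PICKLIST_TO_CATEGORY = {
    "Finance": "Banking & Financial Services",
    "Consumer Retail": "Retail",
    "Tech": "Software & Technology",
}

def normalize_industry(record: dict):
    """Resolve a record's industry to the standard taxonomy, favoring SIC."""
    if record.get("sic_code") in SIC_TO_CATEGORY:
        return SIC_TO_CATEGORY[record["sic_code"]]
    return SF_PICKLIST_TO_CATEGORY.get(record.get("sf_industry"))
```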

Layer 3: Integration Architecture

DemandBase (Intent Data)
    |
    |---> Salesforce Account (nightly batch + real-time for surges)
    |        |
    |        \---> Marketo (account-level intent visible in automation)
    |
LinkedIn Ads
    |
    |---> Marketo (engagement tracking)
    |        |
    |        \---> Salesforce Lead/Contact
    |
Marketo (Behavioral Data)
    |
    \---> Salesforce (bi-directional sync, 15-minute intervals)

Layer 4: The Propensity Model

With clean, synchronized data, we rebuilt the scoring model:

Input Signals (weighted by predictive power):

  • Firmographic: Industry alignment, employee count, tech stack, geography
  • Behavioral: Website visits by page value, content downloads, email engagement
  • Intent: DemandBase surge scores, keyword research, competitor signals
  • Historical: Similarity to closed-won accounts

  Score Range   Classification        Sales Action
  80-100        Sales-Ready           Immediate outreach, high priority
  60-79         Marketing Qualified   Accelerated nurture, SDR qualification
  40-59         Engaged               Standard nurture, monitor for surge
  20-39         Aware                 Light-touch digital, brand awareness
  0-19          Cold                  Exclude from active campaigns

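The scoring pass reduces to a weighted sum over normalized signal groups, mapped onto the tiers above. This sketch assumes each group is pre-normalized to 0-100; the weights here are invented for illustration, not the calibrated values:

```python
# Hypothetical sketch of the rebuilt propensity score: weighted sum of
# normalized signal groups, then mapped onto the five action tiers.
# Weights are illustrative, not the calibrated production values.
WEIGHTS = {"firmographic": 0.25, "behavioral": 0.30,
           "intent": 0.30, "historical": 0.15}

TIERS = [  # (minimum score, classification)
    (80, "Sales-Ready"),
    (60, "Marketing Qualified"),
    (40, "Engaged"),
    (20, "Aware"),
    (0,  "Cold"),
]

def propensity_score(signals: dict) -> float:
    """Weighted sum; each signal group is pre-normalized to 0-100."""
    return sum(WEIGHTS[group] * signals.get(group, 0.0) for group in WEIGHTS)

def classify(score: float) -> str:
    """Map a 0-100 score onto its action tier."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    return "Cold"

account = {"firmographic": 90, "behavioral": 70, "intent": 85, "historical": 60}
score = propensity_score(account)  # 0.25*90 + 0.30*70 + 0.30*85 + 0.15*60 = 78.0
print(score, classify(score))      # 78.0 Marketing Qualified
```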
[Figure: propensity score gauge showing five tiers from 0 to 100 with corresponding sales actions, plus vertical calibration adjustments for banking and retail segments]

Layer 5: Vertical Segmentation

Rather than one model to rule them all, we calibrated for high-value verticals:

Banking/Financial Services: Heavier weight on compliance content, longer lookback window

Retail: Seasonal adjustment, customer experience content weight, peak period multiplier
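In practice, vertical calibration can be as simple as per-segment overrides layered on base parameters. The sketch below mirrors the adjustments described above, but every number in it is an invented placeholder:

```python
# Illustrative vertical calibration: per-segment overrides applied on top
# of base model parameters. Multipliers and lookback windows are invented
# placeholders, not the calibrated values.
BASE_PARAMS = {"compliance_content": 1.0, "cx_content": 1.0,
               "lookback_days": 90, "peak_multiplier": 1.0}

VERTICAL_OVERRIDES = {
    # Banking: heavier compliance-content weight, longer lookback window
    "banking": {"compliance_content": 1.5, "lookback_days": 180},
    # Retail: CX-content weight up, seasonal peak-period multiplier
    "retail":  {"cx_content": 1.4, "peak_multiplier": 1.25},
}

def calibrated_params(vertical: str) -> dict:
    """Merge base parameters with the vertical's overrides, if any."""
    params = dict(BASE_PARAMS)
    params.update(VERTICAL_OVERRIDES.get(vertical, {}))
    return params
```

Segments without an override simply fall back to the base model, so adding a new vertical is a dictionary entry rather than a new model.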

The Takeaway

Most companies don't have a prediction problem. They have a data problem masquerading as a prediction problem.

1. Audit before you build. We lost a month before I realized the model wasn't the problem. That month would have been saved by starting with a data audit.
2. Domain is your friend. Email domain as the primary key for matching solved most of our account matching challenges.
3. Vertical calibration beats universal models. A model trained on all accounts will be mediocre at predicting any specific segment.