Account scoring as infrastructure: how I built it at Default

Key Takeaways

1. Buying signals are easy to get; unifying them per account is the real work, and that takes a data layer you own.
2. Two axes: Fit (right buyer) and Timing (active now) combine into four tiers a rep can act on.
3. The richest signals are inferred from job-description language, not looked up: ops buildouts, pain language, AI initiatives.
4. Storing outcomes with the score they converted at turns scoring into a closed loop and builds the training set for a learned model.
5. All-in cost is under $0.75 per account scored across 5,800+ accounts.
6. For a BDR working the model's rankings, the touchpoints it takes to book a meeting have been cut by 2x.

Every Monday, a BDR opens Salesforce to thousands of accounts and the same question: which of these is actually worth calling this week?

Until recently, the honest answer was "we don't know."

So reps spent hours a day researching accounts by hand, opening tabs and building lists just to find the ones worth a dial.

The signals that would answer it (hiring patterns, funding rounds, website visits, ad engagement, executive moves, the language inside job descriptions) are easy to access. Apollo sells them. Clearbit sells them. RB2B sells them.

Unfortunately, these signals live in isolation, each in its own tool and format, with its own idea of what an account even is, and nothing pulls them together.

I built an account scoring system to unify every signal per account into one score, so a rep opens Salesforce to a ranked list of who to call and why, with the research already done.

I'm Nandika, Founding Growth Engineer at Default. I joined about six months ago, brought on to treat outbound like a product and approach go-to-market problems from an engineering perspective: to build GTM infrastructure.

Account scoring is our first big project. It was scoped as a 6 to 12 month project with a dedicated engineering team. It took about a month, with no engineers. Just me, my manager Lev as a sounding board, and Claude Code in the loop for schema design, migrations, and the middle of the build.

The bottleneck on building real GTM infrastructure has moved. It used to be engineering capacity: a team of engineers and a long timeline.

What takes the work now is deciding what to build, and staying close enough to the sales team to know whether it's right.

I first walked through this system in a talk at a GTM event. This post is the written version, with the parts I could not fit on stage.

My talk on the system.

[Figure: four signal sources (Hiring: job postings; Funding: new raise; Visits: site + ads; News: M&A, moves) feeding into a single "Scored" account.]

Scattered, commoditized signals, unified per account into one score the sales team can act on.

The problem: signals everywhere, unified nowhere

At any moment, a small subset of your TAM is producing signals that say "we're getting ready to buy." The hard part was correlating them, against every account, in one place, continuously.

BDRs, AEs, and mass-email infrastructure all hit the same wall: nothing in the stack fully answers "who's in-market right now?" So the work was to find the signals, structure them against every account, and turn that into a priority the sales team can act on.

The model pulls every signal we care about on a company, unifies it per account, stores it in a structured database, and serves it back to the sales team.

What I decided: Fit × Timing

I wanted the model to be simple enough to understand in a single sentence, so I settled on two axes.

Fit: is this the right kind of buyer? Firmographic and technographic shape: employee count, vertical, funding and growth stage, tech stack. Fit changes slowly. It's the floor.

Timing: is this account active right now? Change signals that say they're getting ready to buy. Timing decays. A signal that was hot last month and has gone quiet drops out of the score.
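One way to picture the decay, as a minimal sketch: only signals observed inside a freshness window count toward timing. The 30-day window, the point values, and the names `TimingSignal` and `timing_score` are assumptions for illustration; the post does not spell out the actual decay rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical freshness window; the real decay rule is not specified in the post.
FRESHNESS_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class TimingSignal:
    kind: str          # e.g. "funding", "site_visit"
    points: int        # illustrative weight, not a production value
    observed_on: date


def timing_score(signals: list[TimingSignal], today: date) -> int:
    """Sum the points of signals that are still inside the freshness window."""
    return sum(s.points for s in signals if today - s.observed_on <= FRESHNESS_WINDOW)


signals = [
    TimingSignal("funding", 12, date(2025, 3, 1)),
    TimingSignal("site_visit", 6, date(2025, 1, 10)),
]
# The January visit is older than the window, so only the funding signal counts.
print(timing_score(signals, today=date(2025, 3, 15)))  # 12
```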

The two combine into four quadrants:

P1: high fit, high timing. Call now.
P2: high fit, decent timing. Multi-touch, nurture.
P3: lower fit or weak timing. Automated, low-touch sequences.
P4: low fit, low timing. Ignore.

[Figure: the Fit × Timing quadrant. Low fit, high timing: Investigate. High fit, high timing: P1, call now. Low fit, low timing: Deprioritize. High fit, decent timing: Nurture.]

Fit answers who the buyer is. Timing answers whether they are active now. The two combine into a tier the sales team can act on.

Under the hood the cutoffs are explicit. An account is P1 at fit ≥ 50 and timing ≥ 22, P2 at fit ≥ 50 and timing 16 to 21, and everything weaker falls to P3 or P4. I added one override after looking at the early output: any account with fit ≥ 80 gets bumped up a tier. If the company is an obviously great buyer, it gets surfaced despite the timing.
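As a rough sketch, not the production code, those cutoffs could be expressed like this. The P1 and P2 thresholds and the fit ≥ 80 override come from the paragraph above. The split between P3 and P4 is not stated, so the rule used here (P4 only when both fit and timing are below their floors) is an assumption, as are the function names.

```python
FIT_MIN = 50        # from the post: P1 and P2 both require fit >= 50
TIMING_P1 = 22      # from the post: P1 requires timing >= 22
TIMING_P2 = 16      # from the post: P2 covers timing 16 to 21
FIT_OVERRIDE = 80   # from the post: fit >= 80 bumps the account up one tier


def base_tier(fit: int, timing: int) -> int:
    """Tier number (1 is best) before the high-fit override."""
    if fit >= FIT_MIN and timing >= TIMING_P1:
        return 1
    if fit >= FIT_MIN and timing >= TIMING_P2:
        return 2
    # Assumption: P4 means both axes are weak; anything else lands in P3.
    if fit < FIT_MIN and timing < TIMING_P2:
        return 4
    return 3


def tier(fit: int, timing: int) -> str:
    """Apply the override: an obviously great buyer moves up one tier."""
    t = base_tier(fit, timing)
    if fit >= FIT_OVERRIDE:
        t = max(1, t - 1)
    return f"P{t}"


print(tier(89, 30))  # P1: high fit, high timing
print(tier(85, 10))  # P2: would be P3 on timing alone, bumped by the override
print(tier(60, 10))  # P3: good fit, weak timing
print(tier(30, 5))   # P4: low fit, low timing
```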

I score accounts on a monthly basis. Real-time enrichment across the full base would cost far more than it's worth, and the signals that matter don't change by the hour.

Signals as inference

Most of the firmographic data is a lookup. The third-party signals are not. They're inferred. I store a large set of job descriptions in a table and run inference over them to derive signals the raw data doesn't state outright.

Example job description: Director, Developer GTM

"We're building out our GTM operations and need an owner who can scale outbound. You'll own our AI roadmap for sales, replacing the work we do today stitching tools together by hand."

Signals inferred by Sonnet 4.6:

ops_buildout: 3 GTM-ops roles in a quarter
ai_initiative: "own our AI roadmap"
pain_language: "stitching tools by hand"

Sonnet 4.6 reads the job description and emits structured, scored signals. Every one links back to the exact text that triggered it.

A company posting three "Director, Developer GTM" roles in a quarter is an ops-buildout signal. Job-description language about a specific pain we solve becomes a pain-language signal. JD copy referencing an AI push becomes an AI-initiative signal. On top of that, web detection surfaces funding, M&A, and transformation announcements (a new VP of a new region, say). Each becomes its own scored signal with a record of exactly what triggered it.

The inference runs on Sonnet 4.6. The classification rules are explicit and the matching is deterministic, so for any given score the contributing signals are queryable end to end. The sales team can click into the underlying job posting and confirm the system is reading real intent, not hallucinating it. That visibility was important for adoption.
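To make the "links back to the exact text" idea concrete, here is a minimal sketch of a signal record that carries its evidence, plus a check that the quoted evidence really appears in the source posting. The `InferredSignal` shape, the field names, and the `is_grounded` check are hypothetical; the post does not describe the actual schema or validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferredSignal:
    account_id: str
    kind: str               # e.g. "ai_initiative", "pain_language"
    evidence: str           # the quoted text that triggered the signal
    source_posting_id: str  # which job posting the quote came from


def is_grounded(signal: InferredSignal, postings: dict[str, str]) -> bool:
    """True only if the quoted evidence appears verbatim in the cited posting."""
    text = postings.get(signal.source_posting_id, "")
    return signal.evidence.lower() in text.lower()


postings = {
    "jd-1": (
        "We're building out our GTM operations and need an owner who can scale "
        "outbound. You'll own our AI roadmap for sales, replacing the work we do "
        "today stitching tools together by hand."
    )
}

grounded = InferredSignal("acct-1", "ai_initiative", "own our AI roadmap", "jd-1")
invented = InferredSignal("acct-1", "pain_language", "drowning in spreadsheets", "jd-1")

print(is_grounded(grounded, postings))  # True
print(is_grounded(invented, postings))  # False
```

A check like this is one cheap guard against a model quoting text that is not in the posting: a signal whose evidence cannot be found in its source can be dropped before it ever reaches a score.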

How it works

The system runs on a few primitives: Claude Code to build it, a stack of enrichment providers to pull data in, and a Supabase Postgres database (12 tables) underneath, orchestrated with GitHub Actions.

[Figure: system architecture.]

Inputs: bulk import, web app, Salesforce sync, direct signals
Enrichment: firmographic, technographic, JD + news inference
Storage: Supabase Postgres, 12 tables, quarterly history
Scoring + output: Fit × Timing, Salesforce fields, web app

Outcomes feed back into the model to power the future ML loop.

Four inputs, one enrichment waterfall, one system of record, one score.

How accounts enter the model

Accounts enter the system four ways:

Bulk CSV import, to bootstrap the base.
One-off scoring on demand through the web app.
Salesforce sync, pulling accounts already in the CRM.
Direct interaction with Default (a website visit, form fill, email reply, LinkedIn engagement, or ad click), which creates the account if it doesn't exist yet.
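The four entry paths above can all funnel into a single create-if-missing step. The sketch below assumes accounts are keyed on a normalized website domain, which is a common choice but not something the post confirms; the source labels and function names are also illustrative.

```python
from dataclasses import dataclass, field

# Labels for the four entry paths described above (names are illustrative).
SOURCES = {"bulk_csv", "web_app", "salesforce_sync", "direct_signal"}


@dataclass
class Account:
    domain: str
    sources: set[str] = field(default_factory=set)


def normalize_domain(raw: str) -> str:
    """Reduce a URL or domain to a bare lowercase domain (assumed account key)."""
    d = raw.strip().lower()
    for prefix in ("https://", "http://"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    d = d.split("/")[0]
    return d[4:] if d.startswith("www.") else d


def upsert_account(store: dict[str, Account], raw_domain: str, source: str) -> Account:
    """Create the account if it doesn't exist yet, then record how it arrived."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    key = normalize_domain(raw_domain)
    account = store.setdefault(key, Account(domain=key))
    account.sources.add(source)
    return account


store: dict[str, Account] = {}
upsert_account(store, "https://www.Example.com/pricing", "direct_signal")
upsert_account(store, "example.com", "salesforce_sync")
print(len(store))  # 1: both paths resolve to the same account
```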

Enrichment

Enrichment runs as a waterfall across providers: firmographic and technographic data first, then the job-description and news pipeline that feeds the inference layer. Every provider call is logged, with credit-burn alerts wired to Slack so a runaway query gets caught before it becomes costly.
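A minimal sketch of that pattern, assuming "waterfall" means trying providers in order and stopping at the first hit. The provider names, credit costs, and budget are placeholders, and the alert is a plain callback rather than a real Slack integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Lookup = Callable[[str], Optional[dict]]


@dataclass
class Waterfall:
    providers: list[tuple[str, Lookup, int]]  # (name, lookup function, credits per call)
    credit_budget: int
    alert: Callable[[str], None]              # stand-in for a Slack notification
    call_log: list[dict] = field(default_factory=list)
    credits_used: int = 0
    alerted: bool = False

    def enrich(self, domain: str) -> Optional[dict]:
        """Try each provider in order, logging every call, and stop at the first hit."""
        for name, lookup, cost in self.providers:
            result = lookup(domain)
            self.credits_used += cost
            self.call_log.append(
                {"provider": name, "domain": domain, "hit": result is not None, "credits": cost}
            )
            # Fire the credit-burn alert once, the first time the budget is exceeded.
            if self.credits_used > self.credit_budget and not self.alerted:
                self.alerted = True
                self.alert(f"credit burn: {self.credits_used} used, budget {self.credit_budget}")
            if result is not None:
                return result
        return None


alerts: list[str] = []
waterfall = Waterfall(
    providers=[
        ("provider_a", lambda d: None, 1),                                # miss
        ("provider_b", lambda d: {"domain": d, "employees": 500}, 2),     # hit
    ],
    credit_budget=2,
    alert=alerts.append,
)
record = waterfall.enrich("example.com")
print(record, len(waterfall.call_log), alerts)
```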

Storage

Storage is where I spent the most design time. Alongside the live tables sit quarterly history tables, so the model can read direction, not just state. "500 employees today, 3,000 two years ago" is a different story than "500 and growing 40% a year," and I wanted the model to see both.
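To show why history matters, here is a small sketch that turns dated headcount snapshots into an annualized growth rate. The function name and the snapshot format are assumptions, and the 255-employee starting point is a made-up number chosen so that the second company grows about 40% a year.

```python
from datetime import date


def annualized_growth(snapshots: list[tuple[date, int]]) -> float:
    """Compound annual growth rate between the earliest and latest snapshot."""
    ordered = sorted(snapshots)
    (start_day, start_count), (end_day, end_count) = ordered[0], ordered[-1]
    years = (end_day - start_day).days / 365.25
    if years <= 0 or start_count <= 0:
        raise ValueError("need two dated snapshots with a positive starting count")
    return (end_count / start_count) ** (1 / years) - 1


# Both companies have 500 employees today, but they are moving in opposite directions.
shrinking = annualized_growth([(date(2023, 1, 1), 3000), (date(2025, 1, 1), 500)])
growing = annualized_growth([(date(2023, 1, 1), 255), (date(2025, 1, 1), 500)])

print(f"{shrinking:+.0%}")  # -59%
print(f"{growing:+.0%}")    # +40%
```

A live table alone would show the same "500 employees" for both; only the history rows separate them.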

Output

Scores write back to Salesforce custom fields and surface in the web app, where the GTM team can drill into why any account scored the way it did.

How the sales team uses it

Adoption was a design constraint from the start, so the system meets the sales team where they already work rather than asking them to come to it.

In the CRM

Scores write back to Salesforce as custom fields: tier, fit, and timing, right on the account record. The sales team never has to leave Salesforce to know where an account stands. Each member of the team gets a focused book of P1s and P2s, so instead of working through thousands of accounts they have a high-intent short list.

In the dialer

Alongside the score, the system pushes an account summary and a contact summary into the dialer. When the sales team picks up the phone or opens a draft, they already have the context: the firmographic shape, the signals that fired, what the company is hiring for, what pain showed up in their job posts. That context feeds both the call and the email, so outreach is personalized without anyone doing manual research first. The dialer reads the same Salesforce fields, so the brief a rep sees on a call is the account summary the model wrote back.

In the web app

For anyone who wants to go deeper, the web app exposes the full breakdown: every contributing rule, every signal, the source it came from, and the underlying job postings. It's the place to vet the score.

[Figure: the web app's full scorecard for Profound, a retro-game interface showing the P1 score (fit 89, timing 30), the Fit by Timing battle map, and the full stats sheet of every rule that contributed.]

A note on the interface

Finding good accounts is a kind of foraging: sifting a lot of noise for the few worth your time. I took inspiration from retro games to make that hunt feel amusing instead of like one more dashboard, so scoring an account is genuinely a little fun.

The throughline: the sales team doesn't adopt a new tool. The scoring meets them in the three places they already work.

What changed

The premise is that every touchpoint should be a higher-quality touchpoint, because the sales team isn't spending time on bad-fit accounts.

The sales team spends far less time on manual research. The account and contact summaries do the prep work that used to consume the first part of every call.
The metric I watch is outreach-per-booking: how many dials and emails it takes to book a qualified meeting. For a rep working the model's rankings, that has more than halved since launch, from around 640 touches per qualified opportunity to about 275, while their opportunity-conversion rate more than doubled. The time goes to better-fit accounts.
Pipeline is concentrating where the model points. The share of new business opportunities created from P1 and P2 accounts has grown from about a quarter at the start of the year to around 40 percent by spring, instead of effort spreading across the whole TAM.
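For reference, the arithmetic behind the outreach-per-booking figure, using the approximate before and after numbers quoted above:

```python
touches_before = 640  # approximate touches per qualified opportunity before launch
touches_after = 275   # approximate touches per qualified opportunity after launch

factor = touches_before / touches_after
reduction = 1 - touches_after / touches_before

print(f"{factor:.2f}x fewer touches, a {reduction:.0%} reduction")
# 2.33x fewer touches, a 57% reduction
```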

By the numbers

5,800+ accounts scored, under $0.75 each, all-in
Where qualified accounts land: ~9% P1 · ~11% P2
Outreach per booking: 2x fewer touches to book a meeting for the team working the model's rankings
8,200+ unified signals
24,000+ job descriptions inferred
~300K scoring records logged
12 Postgres tables

Closing the loop

Every outreach outcome (calls, emails, meetings booked, opportunities created) is stored from day one, and stored with the score at the time of the outcome: at what score was that meeting booked, at what score did that account convert, how many touchpoints it took.

[Figure: the feedback loop. Score (Fit × Timing), then Outreach (calls + email), then Outcome (meetings, opps), then Refine (labeled data), and back to Score.]

Every outcome carries the score it converted at, so the model learns.

It's the model's report card: I can ask whether P1s actually convert better than P3s instead of assuming it. And it builds the labeled dataset I'll need to graduate the model from heuristic weights to a gradient-boosted tree. It would have been easy to skip, since the payoff is months out, but storing inputs and outputs together from the start is the only way that future ever exists. The end state is a closed loop: scoring drives outreach, outcomes refine scoring, the model improves on its own.
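A sketch of what "stored with the score at the time of the outcome" could look like, and how it answers the report-card question. The `Outcome` fields and the `meeting_rate_by_tier` helper are assumptions, and the sample rows are made up purely to exercise the code; they are not real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    account_id: str
    kind: str               # "call", "email", "meeting_booked", "opportunity_created"
    tier_at_outcome: str    # snapshot of the tier when the outcome happened
    fit_at_outcome: int     # snapshot of the fit score
    timing_at_outcome: int  # snapshot of the timing score


def meeting_rate_by_tier(outcomes: list[Outcome]) -> dict[str, float]:
    """Meetings booked per touch (call or email), grouped by tier at the time."""
    touches: dict[str, int] = defaultdict(int)
    meetings: dict[str, int] = defaultdict(int)
    for o in outcomes:
        if o.kind in ("call", "email"):
            touches[o.tier_at_outcome] += 1
        elif o.kind == "meeting_booked":
            meetings[o.tier_at_outcome] += 1
    return {t: meetings[t] / touches[t] for t in touches}


# Made-up rows for illustration only.
sample = [
    Outcome("a1", "call", "P1", 89, 30),
    Outcome("a1", "email", "P1", 89, 30),
    Outcome("a1", "meeting_booked", "P1", 89, 30),
    Outcome("b2", "call", "P3", 45, 18),
    Outcome("b2", "email", "P3", 45, 18),
    Outcome("b2", "call", "P3", 45, 18),
    Outcome("b2", "email", "P3", 45, 18),
]
print(meeting_rate_by_tier(sample))  # {'P1': 0.5, 'P3': 0.0}
```

Because the tier and scores are copied onto each outcome row, a later re-score does not rewrite history, and the same rows double as labeled training examples.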

Why I didn't just buy Clay

I used to be a Clay power user, so before going all-in on an engineering project I had to ask the obvious question: why not just use Clay? The cost was roughly comparable and the signals were all there.

I didn't arrive at this from theory. I had already tried to do exactly this in Clay: segmenting accounts, joining sources, and building scoring and tiering into tables and workflows. It is all doable, but extremely tedious, and every personalized-outreach or account-scoring build I did that way was painful.

Clay is excellent for one-time enrichment workflows. It isn't built to structure and continuously score your entire TAM. After a few thousand rows it starts to strain, and you're clicking through a UI to get the output you need. I needed a system of record: audit logs, snapshots, history, everything joinable, to build on top of.

The deeper principle: signals are commoditized, but the unified data model that correlates every signal to every account, with history and attribution back to outcomes, is infrastructure you have to build.

Own your data before you buy agents. Agents built on your own niche data infrastructure will outperform generic lead-conversion tools, because the data is specific to exactly what you care about.

What's next

From here it is a loop of iteration. Every outreach outcome feeds back into the model, so each run sharpens the next.

Near term: collect more signals and more labeled outcomes, and build a genuinely great outbound agent on top of the data.

As that labeled set grows, especially after our launch, I will move scoring from heuristic weights to a learned model that predicts a good account from real conversions. The data is specific enough to us that an agent built on it will beat anything generic.

[Figure: roadmap in three stages.]

Now: live in production

Heuristic Fit × Timing scoring
Ranked books in the CRM, dialer, and Slack
Under $0.75 per account, all-in

Sharpen the loop

More signals and labeled outcomes
A great outbound agent on the data
Tighter scoring-to-outcome feedback

Later: learn from outcomes

Heuristics to a learned model post-launch
Predict good accounts from real conversions
Agent-driven outbound at scale

Scoring drives outreach, outcomes refine scoring, and the model improves with every run.

It always comes back to the data layer

Account scoring started with a simple goal: a better way to decide who to reach out to, and why, rooted in data instead of instinct.

The lesson is that the systems that work start with unified data, strong context, and one clear job.

Without context, a model is guessing, and context lives in records that are labeled, joinable, and history-bearing. Every score in this system is grounded in one.

That's the same idea behind Default's platform for running GTM on a unified, identity-resolved, history-aware revenue graph.

If you're building anything that asks AI to make a judgment about a company, the data layer is the work.

