May 6, 2026

Running candidate background research with parallel Claude Code agents

I built a parallel-agent workflow to make candidate background research faster. Sixty-four candidates later, the speedup is the least of it: running the investigations side by side turned a noisy classifier into a confident one, and then into a way to rank the candidates who are real.

Speed was the goal. The workflow made the research deeper instead. The speedup is real: about five hours for the first ten candidates this spring, against the fifteen to twenty-five hours the serial version would have taken. The part I didn’t plan for is more useful than the speedup, and it only got clearer as the sample grew.

The setup. I’m hiring for a senior mobile role. Sixty-four candidates now, across five waves (three rounds in the spring, then two larger sweeps as the applications piled up), plus a handful of one-offs in between. Each one gets real public-footprint work before a calendar invite goes out: resume cross-reference, GitHub search, App Store verification, conference and podcast checks, employer-city sanity tests, and a Sherlock OSINT sweep for username discovery across 400+ sites. Twenty-five minutes per candidate, done seriously. Done serially, the arithmetic doesn’t survive contact with a normal week. Done in parallel through Claude Code, it does.

# The shape of the workflow

For each candidate I spin up one general-purpose agent with `run_in_background: true`.¹ Each agent gets the same brief: candidate identity, resume path, the full application content, the curated Sherlock dump, the role-specific signals to chase, and the sub-page template to fill out. I launch the whole wave in a single message and let them work.

One message fans out to an agent per candidate, each running in the background. The pages fan back in to a single recap.

The brief is the most-asked-for artifact when I describe this. Here it is, lightly redacted:

```
Background research on a job candidate. Public internet only. ~25 min budget.
Role: [Senior X Engineer at $COMPANY] -- [1-line role context].
Be candid -- surface signal, not be diplomatic.

# Candidate
- Name: ...
- Email: ...
- LinkedIn: ...
- Status: ...
- Resume: [path] ([size] KB)

# Application content
**Brief intro:** [paste]
**Deep-dive:** [paste]

# Role-specific research signals
[Strong / yellow / red flags -- pulled from the role's hiring rubric]

# Research targets
1. LinkedIn -- WebFetch (usually auth-walled; supplement with Google snippets)
2. GitHub -- handle variants; check for relevant language footprint
3. Personal site / blog / portfolio
4. Social: Twitter/X, Bluesky, Mastodon, Threads
5. Medium / Substack / dev.to
6. Conference talks, podcasts
7. App Store / Play Store -- verify any claimed apps + developer name
8. Employer verification (do the offices in claimed cities actually exist?)
9. Targeted verification of yellow flags from triage

# Sherlock results
Curated hits at [path]

# Time budget
~25 min. If you can't verify a yellow flag in 10 min, document the gap.

# Output format
[Notion sub-page template -- sources reviewed, confirmed claims with
confidence tags, surfaced (not in resume), flagged for interview, 3-5
interview questions, internet personality, public footprint summary]
```
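
Since only the candidate fields change from one brief to the next, the per-candidate step amounts to filling a template. Here is a minimal Python sketch of that step, assuming a hypothetical `Candidate` record and a template trimmed to the header and Sherlock sections (the field names mirror the brief; the other sections would follow unchanged). It only renders text; handing each brief to an agent still happens in Claude Code.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical, trimmed-down brief: only the per-candidate parts are templated.
BRIEF = Template(
    "Background research on a job candidate. Public internet only. ~25 min budget.\n"
    "Role: $role\n"
    "\n"
    "# Candidate\n"
    "- Name: $name\n"
    "- Email: $email\n"
    "- LinkedIn: $linkedin\n"
    "- Resume: $resume_path ($resume_kb KB)\n"
    "\n"
    "# Sherlock results\n"
    "Curated hits at $sherlock_path\n"
)


@dataclass
class Candidate:
    """Hypothetical candidate record; fields mirror the brief's Candidate section."""
    name: str
    email: str
    linkedin: str
    resume_path: str
    resume_kb: float
    sherlock_path: str


def render_brief(role: str, c: Candidate) -> str:
    """Fill the shared template for one candidate."""
    return BRIEF.substitute(
        role=role,
        name=c.name,
        email=c.email,
        linkedin=c.linkedin,
        resume_path=c.resume_path,
        resume_kb=f"{c.resume_kb:.1f}",
        sherlock_path=c.sherlock_path,
    )


def render_wave(role: str, wave: list[Candidate]) -> dict[str, str]:
    """One brief per candidate, keyed by name: one agent gets one brief."""
    return {c.name: render_brief(role, c) for c in wave}
```

`Template.substitute` (unlike `safe_substitute`) raises `KeyError` when the template names a field the code doesn't supply, so a template edit that adds a placeholder can't silently ship a brief with a raw `$placeholder` in it.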

While the agents run, I curate the raw Sherlock output (50-100 hits per username, maybe 5% actionable) and prep interview kits for candidates I already know will advance. Twenty minutes later, a wave of structured research pages exists. I read them in order, write a one-page recap with verdicts, and we move.

Round one took two hours for four candidates. Round two: ninety minutes for three. Round three: seventy-five minutes for three. The later sweeps ran bigger: seventeen candidates in an afternoon, then thirty-four in a long morning. The agents have not gotten faster. The brief, and the triage in front of it, have gotten tighter.

# What the signals look like together

Running a wave of agents at once surfaced something I didn’t plan for. Call it cluster detection. Any single AI-tell on a resume is noise — plenty of explanations besides “AI wrote this.” But lay a wave of investigations side by side in one comparison table and the same handful of signals fire on the same resumes. That cluster is the fingerprint. At ten candidates it was suggestive. At sixty-four it is the most reliable thing the workflow does.

Six signals show up reliably across the pass-grade resumes, the ones that ended in a no. The first four were there from the start; the last two are what the bigger sample surfaced.²

| Signal | What it looks like | The read |
|---|---|---|
| 1. Resume under 10 KB | Resumes exported from real document editors run 100-200 KB; minimal AI templates land at 8-9 KB. The fastest first pass, basically an `ls -la`. | A flag, never a verdict. One real candidate’s spartan one-pager was 5.9 KB and verified clean on every other axis. |
| 2. Verbatim JD vocabulary | Bespoke phrases from the job description reproduced word-for-word in the cover letter, in the order the JD lists them. | Authentic candidates rearrange and rephrase. |
| 3. GitHub absence at senior level | No public artifact after a claimed decade: no dotfiles repo, no CocoaPod, no gist, no Stack Overflow answer. | Loud, with one real exception: a decade inside a company that owns the work product. Verify the employer first. |
| 4. Claims that fail employer-by-employer checks | A regional office that isn’t in that city; a claimed App Store app missing under the named publisher; a platform the employer doesn’t work on. | About five minutes each to verify if you go looking. |
| 5. A GitHub born for the application | A “ten-year engineer” whose entire footprint is timestamped to the week they applied (portfolio repos six to nine days old). | Presence, freshly manufactured. Check creation dates, not just whether the handle exists. |
| 6. A footprint stitched from more than one person | Commits authored under a different name and email than the candidate; an “App Store link” that resolves to a different developer’s app. | The inverse of absence, and the signal I’d least want to miss, because it’s deliberate. |
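
Three of the six signals are mechanical enough to pre-compute before anyone reads a research page: the resume size, verbatim JD vocabulary, and commits under someone else’s identity. A minimal Python sketch, assuming you have already picked out the JD’s bespoke phrases by hand (in JD order) and captured `git log --format='%an <%ae>'` from the candidate’s repos. Apart from the 10 KB line from the table, the thresholds are illustrative, not tuned, and every function returns a flag for a human to weigh, not a verdict.

```python
import os
import re

SMALL_RESUME_BYTES = 10 * 1024  # the table's "under 10 KB" line


def small_resume(path: str) -> bool:
    """Size check, the ls -la pass. A flag, never a verdict."""
    return os.path.getsize(path) < SMALL_RESUME_BYTES


def jd_echo(jd_phrases: list[str], letter: str, min_hits: int = 3) -> bool:
    """Bespoke JD phrases reproduced verbatim, in the JD's own order.

    jd_phrases is hand-picked and listed in the order the JD uses them.
    min_hits is an illustrative threshold.
    """
    text = " ".join(letter.lower().split())  # normalise case and whitespace
    positions = []
    for phrase in jd_phrases:
        idx = text.find(" ".join(phrase.lower().split()))
        if idx >= 0:
            positions.append(idx)
    # Enough verbatim hits, and they appear in the letter in JD order.
    return len(positions) >= min_hits and positions == sorted(positions)


def foreign_authors(git_log: str, names: set[str], emails: set[str]) -> set[str]:
    """Commit identities matching neither a known name nor a known email.

    git_log is the text printed by: git log --format='%an <%ae>'
    """
    known_names = {n.lower() for n in names}
    known_emails = {e.lower() for e in emails}
    odd = set()
    for line in git_log.splitlines():
        m = re.fullmatch(r"(.*) <(.*)>", line.strip())
        if not m:
            continue  # skip blank or malformed lines
        name, email = m.group(1), m.group(2)
        if name.lower() not in known_names and email.lower() not in known_emails:
            odd.add(f"{name} <{email}>")
    return odd
```

Treating an identity as familiar when either the name or the email matches keeps the usual noise (a work email on one machine, a shortened name on another) out of the flag; an identity that matches neither is the one worth asking about.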

Two signals on the same resume are worth a closer look. Three make it a routing decision, and at this sample size routing got more granular than pass or fail. More on that below.

One tell is an accident. The same cluster on three resumes, surfaced by agents that never compared notes, is a method.

# Cluster detection isn’t only useful against AI fakes

The parallel approach also catches things that have nothing to do with AI generation.

One candidate’s cover letter framed him as an actively shipping engineer. A local press hit told a different story: he had stood up a full-time retail business more than a year earlier and was running it as his primary occupation. Both things were true. The application just neglected one of them. A single researcher might have missed the press hit entirely. The parallel agent flagged it because it was looking for that exact kind of cross-reference, on a brief written in advance.

The inverse case was more interesting. Another candidate’s application was thin: a one-sentence intro, an empty deep-dive box. On its own it read as low effort. The resume plus the public footprint surfaced a senior mobile developer with directly relevant streaming and TV-platform experience, plus a prior research role building communication technology adjacent to the product we were hiring for. He does not market himself well. Triage alone would have killed him in round one. The agent surfaced the gap between the thin application and the dense reality.

A third real-candidate case only showed up at scale: the right engineer for the wrong role. A senior Android engineer, genuinely senior, applying to an iOS-first opening. A full-stack web engineer with a real shipping history and no mobile surface area at all. Nothing about these is fake. The footprint verifies cleanly. They are simply the wrong shape, and at ten candidates I didn’t see enough of them to name the category. At sixty-four it’s a routing bucket of its own.

# When three signals fire on a real candidate

The obvious counterargument. A candidate’s resume is compactly formatted because they prefer one-pagers. They have no GitHub because they spent a decade inside a company that owns the work product. They quoted the JD in their cover letter because they read it carefully. Three signals fire. You miss a real candidate.

The answer is the cluster threshold, plus the verification step. Three yellow flags don’t sink the candidate; they downgrade them to an aggressive phone screen. The screen probes the specific signals that fired. If the employer claim verifies, the GitHub absence has a believable cause, and the quoted JD phrasing turns out to be the candidate’s own framing of a hard problem we both care about, all three signals dissolve and the candidate advances. Cluster detection is a routing decision, not a rejection decision.

At ten candidates, “route, don’t reject” was a binary: aggressive screen, or advance. At sixty-four it fanned out into five buckets. The most recent sweep of thirty-four sorted like this:

| Bucket | Count | Next step |
|---|---|---|
| Interview-grade | 11 | Straight to a screen |
| Screen-grade | 8 | Aggressive phone screen on the flagged signals |
| Right person, wrong shape | 4 | Real and senior, wrong stack for this role |
| Verify-first | 5 | A conversation before anything advances |
| Pass | 6 | The only bucket that’s a no |

Verify-first is the interesting middle: a real person with real code whose resume embellishes hard enough to need a conversation first — a team hackathon rebranded as a solo award, repos backdated to look older than they are. The buckets earn their keep because they route to different next steps, and only the bottom one is a no.
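
The five buckets behave like a routing table: each one maps to a different next step. A minimal Python sketch, assuming a hypothetical `Findings` summary distilled from each research page. The field names and the order of the checks are my illustration, not a rubric from this workflow; only the bucket names, the next steps, and the three-signal threshold come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Bucket(Enum):
    INTERVIEW = "Interview-grade"
    SCREEN = "Screen-grade"
    WRONG_SHAPE = "Right person, wrong shape"
    VERIFY_FIRST = "Verify-first"
    PASS = "Pass"


# Next steps, copied from the bucket table.
NEXT_STEP = {
    Bucket.INTERVIEW: "Straight to a screen",
    Bucket.SCREEN: "Aggressive phone screen on the flagged signals",
    Bucket.WRONG_SHAPE: "Real and senior, wrong stack for this role",
    Bucket.VERIFY_FIRST: "A conversation before anything advances",
    Bucket.PASS: "The only bucket that's a no",
}


@dataclass
class Findings:
    """Hypothetical summary of one agent's research page."""
    signals: set[str] = field(default_factory=set)  # which of the six signals fired
    fabricated: bool = False     # a core claim failed verification outright
    embellished: bool = False    # real person, real code, resume overstates it
    stack_fits_role: bool = True


def route(f: Findings) -> Bucket:
    # Hard evidence decides first; the signal count only ever routes to a screen.
    if f.fabricated:
        return Bucket.PASS
    if f.embellished:
        return Bucket.VERIFY_FIRST
    if not f.stack_fits_role:
        return Bucket.WRONG_SHAPE
    if len(f.signals) >= 3:  # a cluster: aggressive screen, not a rejection
        return Bucket.SCREEN
    return Bucket.INTERVIEW
```

Checking for a failed claim before counting signals is what keeps a cluster from ever becoming an automatic no: a resume with five signals and no failed claim still lands in Screen-grade.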

Every candidate the workflow flagged hardest also failed the screen on independent grounds. That sample is still small, so zero false positives on it doesn’t generalize, but the asymmetry I care about does: the false-negative cost (interviewing a candidate whose claims don’t survive contact) was the cost I was trying to avoid in the first place.

# What this workflow doesn’t do

OSINT sweeps across 400+ public sites surface personal information that has no business in a hiring decision: dating profiles, religious community forums, family-status signals, political affiliations. The curation step discards all of it. The agent brief is scoped to the candidate’s professional surface area — the work they’ve built, the venues they’ve spoken at, the public artifacts they own. Personal-life surface area is filtered out before anything reaches the hiring panel. The discipline matters legally (EEOC-protected attributes can’t enter the evaluation file) and operationally (hiring decisions should turn on the work, not the person).

# Sherlock and the curation discipline

I lean on Sherlock for username discovery across 400+ public sites in one pass.³ What I do not do is treat raw Sherlock hits as evidence. A common username generates fifty to a hundred matches, most of them different people. The signal lives in the curated set: which platforms had a profile that actually matched the candidate’s job history, location, and stated interests. The real curation rate hovers around five percent; the other ninety-five percent is collision.

Sherlock surfaces. I verify. The split is the whole point.
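
The split can be made explicit: the code sorts, a person decides. A minimal Python sketch, assuming Sherlock’s plain-text report lists one profile URL per line (anything not starting with `http` is skipped), that each profile’s text has been fetched separately, and that you keep a hypothetical hostname-to-category map for the personal-life sites the curation step discards. The two-attribute threshold is illustrative.

```python
from urllib.parse import urlparse

# Personal-life categories the curation step discards outright, matched or not.
EXCLUDED_CATEGORIES = {"dating", "religion", "politics", "family"}


def profile_urls(sherlock_txt: str) -> list[str]:
    """Profile URLs from a Sherlock text report, assumed to be one per line."""
    return [
        line.strip()
        for line in sherlock_txt.splitlines()
        if line.strip().startswith("http")
    ]


def curate(
    urls: list[str],
    site_category: dict[str, str],  # hypothetical: hostname -> category
    profile_text: dict[str, str],   # url -> text fetched from that profile
    attributes: list[str],          # employer, city, stated interests, ...
    min_matches: int = 2,           # illustrative threshold
) -> list[str]:
    """Shortlist hits for human review; everything else is treated as collision."""
    shortlist = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if site_category.get(host) in EXCLUDED_CATEGORIES:
            continue  # never scored, never forwarded
        text = profile_text.get(url, "").lower()
        matches = sum(1 for attr in attributes if attr.lower() in text)
        if matches >= min_matches:
            shortlist.append(url)
    return shortlist
```

The exclusion runs before any scoring on purpose: a dating profile that happens to mention the candidate’s employer and city is the strongest match in the dump and still must not reach the brief.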

# Cost, and the arms race

Five hours for the first ten. Sixty-four candidates in, the per-candidate cost keeps falling, because the brief is tighter and a triage step now runs first. The agent token cost still lands well under one senior-engineer-hour per wave. I haven’t tallied it more precisely than that.

It’s an arms race. AI-generated resumes exist because candidates are running their own parallel-agent workflows, tailoring ten applications to ten roles in the time it used to take to write one. The defender side of that equation is what this post describes. Same underlying capability, different ends.

Signal five is that arms race made visible. A year ago the tell was no GitHub. Now it’s a GitHub with a birthday three days before the application. The move is the same one it has always been: check the timestamps, not just the artifact.
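
Checking the timestamps takes one API call per artifact. GitHub’s REST API reports `created_at` on the user object (`GET /users/{username}`) and on each repository (`GET /users/{username}/repos`, which is paginated). A minimal Python sketch of the comparison, taking those timestamps as strings; the 14-day window is a hypothetical cutoff, not one from the workflow.

```python
from datetime import datetime, timedelta


def parse_gh(ts: str) -> datetime:
    """Parse a GitHub REST timestamp such as '2026-04-07T09:00:00Z'."""
    # fromisoformat only accepts a trailing 'Z' from Python 3.11, so normalise it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def born_for_application(
    account_created: str,
    repos_created: list[str],
    applied: str,
    window_days: int = 14,  # hypothetical "freshly manufactured" cutoff
) -> bool:
    """True if the account, or every public repo on it, appeared within
    window_days before the application date."""
    cutoff = parse_gh(applied) - timedelta(days=window_days)
    account_fresh = parse_gh(account_created) >= cutoff
    repos_fresh = bool(repos_created) and all(
        parse_gh(ts) >= cutoff for ts in repos_created
    )
    return account_fresh or repos_fresh
```

A repository’s `created_at` is recorded by GitHub when the repo is created, while commit dates come from the committer’s own machine, so backdated commits don’t move it. That makes it the timestamp to trust when the two disagree.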

# At scale, it stops catching and starts ranking

Here’s the shift the bigger sample forced. At ten candidates the workflow was a fake detector. At sixty-four, most of the fakes get caught in the first five minutes of triage, and what’s left is the harder, more useful problem: ranking the real ones.

Most real senior iOS candidates are solid generalists. They’ve shipped apps, their employers check out, their GitHub has the right sediment. What separates them is a rare signal the role actually needs: for this one, media and streaming depth. AVFoundation, HLS, the living-room screen. A decade of competent iOS work with zero AVFoundation is common. The few candidates with real tvOS or streaming experience (one who shipped a four-stream Twitch app for Apple TV, one with a decade at a streaming platform and an Apple TV launch to show for it) rank well above an equally real generalist. Once the fakes are gone, the workflow’s job is to surface that rare signal and rank on it.
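
Once the fakes are gone, ranking is a sort on evidence of the rare signal. A deliberately crude Python sketch, assuming each verified candidate’s research page has been reduced to a short text summary; the terms and weights in `RARE_SIGNAL` are hypothetical, and keyword hits only decide which page gets read first.

```python
import re

# Hypothetical terms and weights for a media/streaming-heavy iOS role.
RARE_SIGNAL = {
    "avfoundation": 3,
    "hls": 3,
    "tvos": 4,
    "apple tv": 4,
    "streaming": 2,
}


def rare_score(evidence: str) -> int:
    """Sum the weights of rare-signal terms that appear as whole words."""
    text = evidence.lower()
    return sum(
        weight
        for term, weight in RARE_SIGNAL.items()
        if re.search(rf"\b{re.escape(term)}\b", text)
    )


def rank(verified: dict[str, str]) -> list[tuple[str, int]]:
    """Order verified candidates by rare-signal score, highest first.

    sorted() stays stable with reverse=True, so ties keep their input order.
    """
    scored = [(name, rare_score(evidence)) for name, evidence in verified.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Whole-word matching keeps "hls" from firing inside unrelated tokens, but the score is still only a reading order: the equally real generalist at the bottom is a candidate, not a reject.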

AI fluency is the same story one level up. It has become a resume keyword the way “agile” was: most candidates claim a CLAUDE.md, skills files, a daily Claude-and-Cursor habit, with no public artifact behind it. The rare one has an actual config file checked into a repo you can read. Treat the claim like any other — verify the artifact, not the vocabulary.⁴
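
Verifying the artifact can be as simple as looking for it in a clone of the candidate’s repo. A minimal Python sketch, assuming Claude Code’s conventions of a `CLAUDE.md` file and a `.claude/` directory for project-level configuration such as skills; other assistants’ config files would need their own patterns.

```python
from pathlib import Path


def ai_config_artifacts(repo: Path) -> list[str]:
    """List AI-assistant config files checked into a cloned repo (repo-relative)."""
    found: set[Path] = set()
    # CLAUDE.md files anywhere in the working tree, ignoring git internals.
    for p in repo.rglob("CLAUDE.md"):
        if ".git" not in p.relative_to(repo).parts:
            found.add(p)
    # Any file under a top-level .claude/ directory (settings, skills, agents).
    claude_dir = repo / ".claude"
    if claude_dir.is_dir():
        found.update(p for p in claude_dir.rglob("*") if p.is_file())
    return sorted(p.relative_to(repo).as_posix() for p in found)
```

An empty list doesn’t disprove the claim (the config may live in a private repo), but a non-empty one turns a resume keyword into something you can open and read before the interview.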

# What stays when the model changes

Triage moved earlier, and it’s no longer hypothetical. A five-minute application-triage checklist — file size, JD-vocabulary check, App Store link verification, and now a glance at GitHub creation dates — runs before any agent launches. The agents point only at candidates who survive it, not at fabricated ones who’d waste the depth.

The compounding return is something else. The brief is the artifact; the agents are fungible — Claude Code today, whatever Anthropic ships next year, whatever Cursor or OpenAI builds for the same job. What survives a model change, a tool change, a team-member change is the brief. The agents just render it.

Hiring depth at speed used to cost a week. Now it costs an afternoon. And somewhere past the fortieth candidate, the harder problem took over: not which resumes are fake, but which of the real ones can actually do the rare part of the job.⁵

Notes

1. Claude Code’s subagents and the `run_in_background` pattern are documented in the official SDK: docs.claude.com. The mechanic is straightforward; the discipline is in the brief.
2. GitHub’s Octoverse reports cover public-footprint baselines for staff-level engineers across roles. The short version: most career-stage engineers leave some public artifact, even when the bulk of their work is private. The exception class (engineers who worked for a decade inside a company that owns their work product) is real and deserves the verification step before treating absence as evidence.
3. Sherlock Project: github.com/sherlock-project/sherlock. OSINT username enumeration across hundreds of public platforms.
4. A companion post, “A writing-voice guide for your AI assistants,” makes the same bet one level up: the reusable instruction is the artifact, the model is just the renderer.
5. Two follow-ups pick up where this leaves off: “Discovery is cheap, curation is the cost,” on swapping the OSINT engine and what it revealed about the pool, and “The interview after the research,” on turning the ranked pool into a 30-minute screen that grades itself.