See the thinking behind the code

Promptster shows how your candidates actually build.

We sit inside the candidate's real coding session, sign the transcript, and hand you the evidence — so you can hire on signal, not theater.

The problem · as of 2026

The interview no longer measures the job.

Two candidates ship code that passes the same tests. One pasted what the model gave them; the other actually reasoned about the problem. Take-homes can’t tell them apart.

Candidate A · Senior · 6 yrs · 00:42
  1. 00:05 · prompt · “write a stripe webhook handler in node, return 200 on success”
  2. 00:11 · ai paste · routes/webhook.ts · +47 −0 · pasted verbatim
  3. 00:42 · submit · submitted; no retry test, no duplicate-event test
Promptster signal: Prompt-proxy
Candidate B · Senior · 6 yrs · 06:20
  1. 00:09 · read · scanned spec: at-least-once delivery, signed payloads
  2. 00:38 · plan · idempotency key: event.id · DB unique constraint
  3. 01:05 · prompt · “verify signature first, then idempotency-key lookup”
  4. 01:48 · edit · routes/webhook.ts · +52 −0
  5. 02:30 · edit · migrations/…add_webhook_events_unique.sql · +8 −0
  6. 03:12 · test · replayed same event twice; second is a no-op ✓
  7. 03:55 · prompt · “what if Stripe retries before our DB commit lands?”
  8. 04:30 · edit · wrap in transaction · row-level lock on event id
  9. 05:10 · test · concurrent replay (race): one side effect only ✓
  10. 05:48 · revert · dropped the early-200 return (it breaks at-least-once)
  11. 06:20 · submit · submitted with notes on retries + reconciliation
Promptster signal: Architect

Same output. Different humans. Today’s interview only sees the green check.
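Candidate B's central move, recording each Stripe event ID and treating a repeat delivery as a no-op, is what makes an at-least-once webhook safe to replay. Below is a minimal bash sketch of that idea only, assuming an in-memory associative array as a stand-in for the database unique constraint. The names `handle_event`, `processed_events`, and `evt_123` are hypothetical, not Stripe's or Promptster's API. It needs bash 4+ and deliberately ignores the concurrent-retry race that Candidate B closed with a transaction and a row-level lock.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: make at-least-once delivery idempotent by keying on event id.
# The associative array stands in for a DB unique constraint (assumption, in-memory only).
declare -A processed_events=()
side_effects=0

handle_event() {
  local event_id="$1"
  # Duplicate delivery: this id was already recorded, so do nothing.
  if [[ -n "${processed_events[$event_id]+seen}" ]]; then
    echo "duplicate ${event_id}: no-op"
    return 0
  fi
  processed_events["$event_id"]=1
  side_effects=$((side_effects + 1))   # the real side effect would happen here
  echo "processed ${event_id}"
}

# Replaying the same event twice should produce exactly one side effect.
handle_event evt_123
handle_event evt_123
echo "side effects: ${side_effects}"
```

Running it prints `processed evt_123`, then `duplicate evt_123: no-op`, then `side effects: 1`. In a real handler the record and the side effect would commit together in one transaction, which is the gap Candidate B's 03:55 prompt goes after.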

opportunity cost · estimated

One bad senior hire isn’t a misstep. It’s a quarter.

  1. 6+ mo · Lead time burned per bad senior hire. Roadmap slip from a hire who can't actually orchestrate.
  2. ~$200K · Fully loaded cost of one mis-hire. Comp, onboarding, ramp, severance, opportunity. Estimated.
  3. 47 hrs · Engineering time per completed interview loop. Panel × prep × debrief, summed across the team.
  4. 3–6 mo · Median time to detect a mis-hire. Then you run the loop again. And again.

Every false negative costs another loop. Every false positive costs the team’s lead time. Today’s interview can’t tell which is which.

How it works · 3 steps · ≈ 90 seconds of setup

Candidates work in their real editor. You get the signed transcript.

01

Your team emails the candidate a key.

You email the candidate one assessment key. No account, no portal — they install the CLI and type the key. Telemetry starts on their consent — never before.

# candidate
curl -fsSL https://get.promptster.ai | sh
promptster start PST-8R2M-4KVN
02

They work normally.

Claude Code — the tool they use every day. Same prompts, same tool calls, same flow. A lightweight Go binary routes Claude through our proxy and registers native hooks. Zero impact on their workflow.

● telemetry · active · proxy + hooks
prompts · responses · tool calls · file diffs
× keystrokes · × clipboard · × webcam · × screen
03

You open the case file in your browser.

A signed transcript with line-level attribution; a score calibrated against our reference cohort today and tuned to your funnel as you onboard; and a copy-paste-ready brief for your hiring panel. Every claim is replayable.

# reviewer · in your browser
app.promptster.ai/sessions/PST-8R2M-4KVN
78/100 · top 28% · signed ✓
Insight, not surveillance

Everything a reviewer needs. Nothing a candidate should dread.

Scoring · the model

A classifier, not a chatbot grading another chatbot.

Every session runs through a classifier we trained specifically on AI-tool orchestration: the things seniors do differently with Claude Code — scoping before writing, naming tradeoffs, adversarial prompts, self-correction, and knowing when not to delegate.

The output is a calibrated percentile, not a 1–5 vibes score. You see the factors, the weights, and the moments in the replay that moved the score.

  • Trained on real session telemetry — prompts, tool calls, and decision points, not surface-level diff stats.
  • Calibrated against our reference cohort today, and tuned to your funnel as you onboard. In the example below, a 78 means the top 28% of the cohort we've graded so far, not a number on a rubric a chatbot made up.
  • Per-factor breakdown linked to replay timestamps. The score is auditable — every claim is a moment you can scrub back to.
Orchestration classifier · v0.4 · PST-3K9X-7FQM
78/100 · top 28%
Scoping before writing · weight 22% · 92
Tradeoff articulation · weight 18% · 84
Adversarial prompting · weight 18% · 71
Self-correction rate · weight 16% · 66
Edge-case ownership · weight 14% · 48
Tool-call sequencing · weight 12% · 79
One factor flagged: edge-case ownership. At replay 44:55, the candidate soft-delegated to the agent.
v0.4 · weights locked · calibration ongoing with design partners
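To make the card's weights concrete, here is a sketch that folds the six factor scores into one number, assuming a plain weighted mean. That aggregation is our assumption for illustration; the page does not say how the classifier combines factors, and `weighted_score` is a hypothetical name. With the card's values the weights sum to 100% and the plain weighted mean comes out at 74.90, not the 78 shown, so read this only as a picture of how the weights trade off, not as a reproduction of the headline score.

```shell
#!/usr/bin/env bash
# Illustrative only: combine per-factor scores as a plain weighted mean.
# Input lines: "<weight percent> <factor score>". The aggregation rule is an assumption.
weighted_score() {
  awk '
    { total_w += $1; acc += $1 * $2 }
    END { if (total_w > 0) printf "weights=%d%% score=%.2f\n", total_w, acc / total_w }'
}

# Weights and scores copied from the v0.4 card above.
weighted_score <<'EOF'
22 92
18 84
18 71
16 66
14 48
12 79
EOF
```

This prints `weights=100% score=74.90`. Edge-case ownership (48) pulls the mean down by less than scoping (92) pushes it up, because its weight is 14% against 22%.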
Why we're not HackerRank

A different category, not a prettier leaderboard.

The comparison below covers what each platform actually measures, not which dashboard ships the slickest charts. HackerRank and CodeSignal grade the artifact. We grade the engineer behind it.

Comparison of HackerRank, CodeSignal, and Promptster across six dimensions of technical hiring assessment.
| Dimension | HackerRank (legacy assessment) | CodeSignal (legacy assessment) | Promptster (process telemetry) |
|---|---|---|---|
| AI-tool posture | Detect-and-block. Cat-and-mouse with copy-paste. | Detect-and-block. Lockdown + plagiarism checks. | Embrace. Measure how the candidate orchestrates the AI on purpose. |
| What you actually get | A final diff and a pass/fail score. | A final diff plus a percentile on an IQ-style scale. | Process telemetry: prompts, tool calls, attribution, decision points. |
| Candidate environment | Sandboxed editor in a locked-down browser. | Sandboxed editor in a locked-down browser. | Their own editor, their own repo, Claude Code. The job, basically. |
| Evidence behind the decision | The score. Trust it. | The score and a percentile. Trust them. | Replayable session, line-level attribution, signed transcript. |
| Auditability of the score | Black-box rubric; proprietary. | Black-box rubric; proprietary. | Open rubric. Every rationale links back to a moment in the replay. |
| Skills it actually measures | Algorithm trivia and timed problem-solving. | Algorithms plus IQ-style cognitive proxies. | Orchestration, judgment, AI-tool fluency: the work they'd actually do. |
Design partner program · by invitation

Twelve teams get founding spots.

Paarth, founder
Promptster · building it with you

I'm an engineer. I heard “everyone is cheating” at a career fair from a recruiter who wasn't even being quiet about it, then sat my own proctored online assessment two weeks later and watched it become a test of whether I'd be the chump. UT Austin → Promptster: I built it after interviewing at 30+ companies and watching the take-home stop telling anyone anything.

If you hire 5+ engineers a year and your current take-home can't tell paste from craft, I'd love to build this with you. Twelve teams on the founding cohort — small enough that I'm in the room for every one of them.

Here's what that actually looks like: a weekly 45-minute call with me, on the record, walking through your replays together — what the transcript is telling you that the diff isn't. Your founding price stays locked through 2028, even when we raise. Features you ask for ship under your team's name on the changelog.

Book a 15-min intake and we'll talk through your roles. If it isn't a fit, I'll say so on the call.

  • Assessments custom-built to your taste — we source and tailor problems to your stack, your seniority bar, and the bugs you actually ship. You don't write the questions; we do, and you sign off.
  • Weekly founder session — 45 min, on the record, about your roles and what the transcript is actually telling you.
  • Everything unlocked — OSS issue library, hiring brief, interview probes, calibrated percentiles. No gating.
  • Founding price — locked in. If we raise, you don't.
  • Name on the changelog — features you ask for ship under your team's name.
founding price · locked · billed monthly
$499 → $199/seat/mo
≈ $3.6K/seat saved per year
Cancel anytime — they don't
price lock
$199 holds through 2028 — even when we raise.
Book a 15-min intake
15-min intake · 14-day free trial on all plans
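The “≈ $3.6K/seat saved per year” line is plain arithmetic on the two monthly figures on the card: the $300 gap between $499 and $199, times twelve months. A shell sketch of the same calculation (variable names are ours):

```shell
#!/usr/bin/env bash
# Reproduces the card's "≈ $3.6K/seat saved per year" figure.
higher_price=499    # $/seat/mo, the $499 figure shown on the card
founding_price=199  # $/seat/mo, the locked founding price
months=12
annual_saving=$(( (higher_price - founding_price) * months ))
echo "saved per seat per year: \$${annual_saving}"   # prints: saved per seat per year: $3600
```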
FAQ · the objections worth answering

Honest answers, not marketing answers.

  • What if my engineers use Cursor, Windsurf, Codex, or Copilot?
    We built the Claude Code adapter first because it has the cleanest tool-call telemetry of any agent — prompts, tool calls, file diffs, all structured. Codex and Cursor adapters are next on the roadmap, with Windsurf and Copilot following. Design partners get to influence the order. If your team is split across multiple agents today, tell us on the intake call — we'll be honest about timelines.
  • What if a candidate refuses to be recorded?
    Consent-first by design. The session opens with an explicit screen listing every event type we capture (prompts, tool calls, file diffs) and every type we don't (keystrokes, screen, clipboard, webcam, browser history). If the candidate declines, no session runs and nothing is captured. You'll see the decline in your dashboard so you can offer an alternative.
  • What does the candidate actually see during the session?
    Their normal editor, their repo, their flow. No browser lockdown, no proctoring, no second window watching them. Telemetry runs in the background via a Claude Code API proxy and native hooks. At the end they get their own debrief — what went well, one thing to watch — whether or not your team advances them.
  • Do I have to write my own assessment problems?
    No. We custom-build and source assessments to your taste — your stack, your seniority bar, the kinds of bugs and decisions your team actually ships. You give us the role and a few representative problems you've seen in your codebase; we draft the assessment, you sign off. Design partners get this included; off-the-shelf problem packs are coming for everyone else.
  • How long does setup take?
    About 15 minutes during the intake call. We generate an assessment key, you email it to candidates, they install the CLI with one curl command. Sessions stream back to your dashboard. No integration with your ATS required — though we can wire one up if you want.
  • How do I compare scores across different roles?
    Per-role classifiers (backend, frontend, ML, SRE, infra) with a shared calibration model. Percentiles are within-role, not cross-role — a 78 in backend means top 28% of the backend cohort, not 78th percentile against every engineer who's ever taken an assessment. The factors and weights are visible in the dashboard so you can audit how each role is scored.
  • What happens after the design-partner period ends?
    Founding price ($199/seat/mo) stays locked through 2028. You keep access to everything you had during the design-partner period — no feature claw-back, no bait-and-switch. If we raise list prices, you don't.
  • What's the pricing post-2028?
    Public list price hasn't been set yet — we'll know more once we've calibrated against real funnels. Design partners get a 90-day heads-up before any price change applies to them. Grandfathered teams keep their founding rate. We'd rather be quiet about future pricing than make a promise we'll have to walk back.
  • Is this spyware?
    No. We don't capture keystrokes, screen, clipboard, webcam, browser history, or biometrics. The diff between what we capture and what we don't is in our public repo — you can read it before you ship it to candidates. Process telemetry is the same shape of data you'd put in a PR description: what was prompted, what was tried, what was tested. Candidate-positive by design.
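The FAQ answer on comparing roles says percentiles are within-role: a 78 in backend is ranked only against the backend cohort. Here is a minimal sketch of that kind of ranking, assuming “top N%” means the share of a role's cohort scoring at or above the candidate. That is one common convention; the page doesn't give Promptster's exact definition. The `top_percent` helper and the cohort scores are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "top N%" = share of a cohort scoring at or above a candidate.
# Usage: top_percent CANDIDATE_SCORE COHORT_SCORE...
top_percent() {
  local score="$1"; shift
  printf '%s\n' "$@" | awk -v s="$score" '
    { n++; if ($1 + 0 >= s + 0) at_or_above++ }
    END { if (n > 0) printf "%.0f\n", 100 * at_or_above / n }'
}

# Made-up backend cohort of 25 scores; 7 of them are 78 or higher.
backend_cohort=( $(seq 50 67) 78 80 83 85 88 91 95 )
top_percent 78 "${backend_cohort[@]}"   # prints 28
```

Ranking the same 78 against a different role's cohort would generally give a different percentile, which is why the scores aren't compared across roles.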
On the record · signed · replayable

Read the process, not just the commit.

Twelve founding teams will ship this with us. If you hire 5+ engineers a year and your current take-home can't tell paste from craft, we should talk.

Founding seat · $499 → $199/seat/mo · locked through 2028 · 1 of 12 claimed
Claude Code today · Codex + Cursor adapters next