See the thinking behind the code

Promptster shows how your candidates actually build.

We sit inside the candidate's real coding session, sign the transcript, and hand you the evidence — so you can hire on signal, not theater.

Book a 15-min intake · See a real session

promptster — auth-middleware — morgan.r@candidate.io

COMPLETED

--- a/src/middleware/auth.ts
+++ b/src/middleware/auth.ts
@@ -1,8 +1,56 @@
 import { NextRequest, NextResponse } from "next/server"
 import { verifyToken } from "@/lib/jwt"
 
-export async function authMiddleware(req: NextRequest) {
-  const token = req.headers.get("authorization")
-  return NextResponse.next()
+export async function authMiddleware(
+  req: NextRequest
+): Promise<NextResponse> {
+  const token = req.headers.get("authorization")
+    ?.replace("Bearer ", "")
+
+  if (!token) {
+    return NextResponse.json({ error: "Unauthorized" }, { status: 401 })
+  }
+
+  const payload = await verifyToken(token)
+  if (!payload) {
+    return NextResponse.json({ error: "Invalid token" }, { status: 401 })
+  }

Timeline · prompt · tool call · file diff · command · 0:00 to 2:40

file diff · src/middleware/auth.ts (+48 −0) · t=0:41

The problem · as of 2026

The interview no longer measures the job.

Two candidates ship code that passes the same tests. One pasted what the model gave them; the other actually reasoned about the problem. Take-homes can’t tell them apart.

submission · both candidates

Passed · seed tests · green CI · same files touched

legacy verdict

Candidate A · Senior · 6 yrs

00:42

00:05 · prompt · “write a stripe webhook handler in node, return 200 on success”
00:11 · ai paste · routes/webhook.ts · +47 −0 · pasted verbatim
00:42 · submit · submitted with no retry test and no duplicate-event test

Promptster signal · Prompt-proxy

Candidate B · Senior · 6 yrs

06:20

00:09 · read · scanned spec: at-least-once delivery, signed payloads
00:38 · plan · idempotency key: event.id · DB unique constraint
01:05 · prompt · “verify signature first, then idempotency-key lookup”
01:48 · edit · routes/webhook.ts · +52 −0
02:30 · edit · migrations/…add_webhook_events_unique.sql · +8 −0
03:12 · test · replayed same event twice; second is a no-op ✓
03:55 · prompt · “what if Stripe retries before our DB commit lands?”
04:30 · edit · wrap in transaction · row-level lock on event id
05:10 · test · concurrent replay (race); one side effect only ✓
05:48 · revert · dropped an early-200 return, which breaks at-least-once delivery
06:20 · submit · submitted with notes on retries + reconciliation

Promptster signal · Architect

Same output. Different humans. Today’s interview only sees the green check.
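Candidate B's approach (verify the signature first, then dedupe on `event.id`) can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not Promptster's or Stripe's code: it assumes a Stripe-style `t=<timestamp>,v1=<hex>` signature header computed as HMAC-SHA256 over the timestamp, a dot, and the raw body; it uses an in-memory `Set` as a stand-in for the unique-constrained `webhook_events` table; and it skips the timestamp-tolerance check a real handler should add. `verifySignature` and `handleWebhook` are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical stand-in for a webhook_events table with a UNIQUE constraint on event id.
const processedEventIds = new Set<string>();

// Verify an assumed Stripe-style header "t=<unix>,v1=<hex>" where
// v1 = HMAC-SHA256(secret, `${t}.${rawBody}`). No timestamp tolerance check here.
export function verifySignature(rawBody: string, header: string, secret: string): boolean {
  const parts = new Map<string, string>();
  for (const pair of header.split(",")) {
    const i = pair.indexOf("=");
    if (i > 0) parts.set(pair.slice(0, i).trim(), pair.slice(i + 1).trim());
  }
  const t = parts.get("t");
  const v1 = parts.get("v1");
  if (!t || !v1) return false;
  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest();
  const received = Buffer.from(v1, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

export interface WebhookResult {
  status: number;
  duplicate: boolean;
}

export function handleWebhook(
  rawBody: string,
  header: string,
  secret: string,
  sideEffect: (eventId: string) => void,
): WebhookResult {
  // 1. Signature first: unsigned or tampered payloads never reach the idempotency check.
  if (!verifySignature(rawBody, header, secret)) return { status: 400, duplicate: false };
  const event = JSON.parse(rawBody) as { id: string };
  // 2. Idempotency-key lookup: a retried delivery is acknowledged but not re-processed.
  if (processedEventIds.has(event.id)) return { status: 200, duplicate: true };
  // 3. Record the id, then run the side effect. With a real database, both belong in
  //    one transaction so a concurrent retry cannot slip in between.
  processedEventIds.add(event.id);
  sideEffect(event.id);
  return { status: 200, duplicate: false };
}
```

Replaying the same signed event returns 200 both times, but the side effect runs once, which is the no-op Candidate B tested at 03:12. The single-process `Set` avoids the race only because this handler is synchronous; against a real database, the lookup, insert, and side effect need the transaction and row-level lock from 04:30.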

opportunity cost · estimated

One bad senior hire isn’t a misstep. It’s a quarter.

6+ mo · Lead time burned per bad senior hire. Roadmap slip from a hire who can't actually orchestrate.
~$200K · Fully-loaded cost of one mis-hire. Comp, onboarding, ramp, severance, opportunity. Estimated.
47 hrs · Engineering time per closed-loop interview. Panel × prep × debrief, summed across the team.
3–6 mo · Median time-to-detect a mis-hire. Then you run the loop again. And again.

Every false negative costs another loop. Every false positive costs the team’s lead time. Today’s interview can’t tell which is which.

How it works · 3 steps · ≈ 90 seconds of setup

Candidates work in their real editor. You get the signed transcript.

Your team emails the candidate a key.

You email the candidate one assessment key. No account, no portal — they install the CLI and type the key. Telemetry starts on their consent — never before.

# candidate

❯ curl -fsSL https://get.promptster.ai | sh

❯ promptster start PST-8R2M-4KVN

They work normally.

Claude Code — the tool they use every day. Same prompts, same tool calls, same flow. A lightweight Go binary routes Claude through our proxy and registers native hooks. Zero impact on their workflow.

● telemetry · active · proxy + hooks

prompts · responses · tool calls · file diffs

× keystrokes · × clipboard · × webcam · × screen
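The capture boundary above (consent first, then an allowlist of event kinds) can be sketched as a small recorder. The event shape, the kind names, and the `TelemetryRecorder` class are hypothetical, chosen for illustration; the page does not publish Promptster's schema, and the real proxy and hooks ship as a Go binary rather than TypeScript.

```typescript
// Hypothetical event kinds, mirroring the captured list above.
type CapturedKind = "prompt" | "response" | "tool_call" | "file_diff";

interface TelemetryEvent {
  kind: string; // e.g. "prompt", or a never-captured kind like "keystroke"
  at: number; // seconds since session start
  payload: unknown;
}

const CAPTURED_KINDS: readonly CapturedKind[] = ["prompt", "response", "tool_call", "file_diff"];
const CAPTURED = new Set<string>(CAPTURED_KINDS);

class TelemetryRecorder {
  private consented = false;
  private readonly events: TelemetryEvent[] = [];

  // Called only after the candidate accepts the consent screen.
  grantConsent(): void {
    this.consented = true;
  }

  // Returns true if the event was kept in the transcript.
  record(event: TelemetryEvent): boolean {
    if (!this.consented) return false; // nothing is stored before consent
    if (!CAPTURED.has(event.kind)) return false; // keystrokes, clipboard, webcam, screen are dropped
    this.events.push(event);
    return true;
  }

  transcript(): readonly TelemetryEvent[] {
    return this.events;
  }
}
```

Keeping the allowlist in one constant makes it easy to audit: any kind not named in `CAPTURED_KINDS` never reaches the transcript, whatever the hooks emit.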

You open the case file in your browser.

A signed transcript with line-level attribution, a calibrated score against our reference cohort today — tuning to your funnel as you onboard — and a copy-paste-ready brief for your hiring panel. Every claim is replayable.

# reviewer · in your browser

app.promptster.ai / sessions / PST-8R2M-4KVN

→ 78/100 · top 28% · signed ✓
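One way a "signed" transcript could work: a sketch that signs the serialized transcript with Ed25519 using Node's built-in `crypto` module, so any later edit is detectable. The page does not name its signature scheme or serialization; Ed25519, plain `JSON.stringify`, and the `signTranscript`/`verifyTranscript` helpers are all assumptions.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Assumption: the transcript is serialized with JSON.stringify. A production system
// would want a canonical encoding so key order cannot change the signed bytes.
function serialize(transcript: object): Buffer {
  return Buffer.from(JSON.stringify(transcript), "utf8");
}

// Ed25519 hashes internally, so the algorithm argument to sign/verify is null.
function signTranscript(transcript: object, privateKey: KeyObject): string {
  return sign(null, serialize(transcript), privateKey).toString("base64");
}

function verifyTranscript(transcript: object, signatureB64: string, publicKey: KeyObject): boolean {
  return verify(null, serialize(transcript), publicKey, Buffer.from(signatureB64, "base64"));
}

// Hypothetical: in a real deployment the private key stays server-side and
// reviewers verify against the published public key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const transcript = {
  session: "PST-8R2M-4KVN",
  events: [{ kind: "file_diff", at: 41, path: "src/middleware/auth.ts" }],
};
const signature = signTranscript(transcript, privateKey);
console.log(verifyTranscript(transcript, signature, publicKey)); // true
console.log(verifyTranscript({ ...transcript, session: "PST-0000-0000" }, signature, publicKey)); // false
```

Changing any byte of the serialized transcript, including a single event, makes verification fail.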

Insight, not surveillance

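Everything a reviewer needs. Nothing a candidate should dread.

The summary you actually read.

Line-by-line, human or AI.

Scrub the whole session.

One paragraph, copy-paste.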

Scoring · the model

A classifier — not a chatbot
grading another chatbot.

Every session runs through a classifier we trained specifically on AI-tool orchestration: the things seniors do differently with Claude Code — scoping before writing, naming tradeoffs, adversarial prompts, self-correction, and knowing when not to delegate.

The output is a calibrated percentile, not a 1–5 vibes score. You see the factors, the weights, and the moments in the replay that moved the score.

Trained on real session telemetry — prompts, tool calls, and decision points, not surface-level diff stats.
Calibrated against our reference cohort today — tunes to your funnel as you onboard. A 78 means top 28% of the cohort we've graded so far, not a number on a rubric a chatbot made up.
Per-factor breakdown linked to replay timestamps. The score is auditable — every claim is a moment you can scrub back to.
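A within-role "top X%" figure can be computed as the share of that role's cohort scoring at or above the candidate. This is a minimal sketch that assumes raw scores per role are available; the page does not describe how Promptster's calibration model produces its percentiles, and `topPercent`/`topPercentInRole` are hypothetical names.

```typescript
// Share of the cohort (in percent, rounded) scoring at or above `score`.
export function topPercent(score: number, cohort: readonly number[]): number {
  if (cohort.length === 0) throw new Error("empty cohort");
  const atOrAbove = cohort.filter((s) => s >= score).length;
  return Math.round((100 * atOrAbove) / cohort.length);
}

// Percentiles are within-role: only the candidate's own role cohort is consulted.
export function topPercentInRole(
  role: string,
  score: number,
  cohorts: ReadonlyMap<string, readonly number[]>,
): number {
  const cohort = cohorts.get(role);
  if (!cohort) throw new Error(`no cohort for role: ${role}`);
  return topPercent(score, cohort);
}

// Illustrative cohorts, not real data.
const cohorts = new Map<string, readonly number[]>([
  ["backend", [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]],
  ["frontend", [50, 60, 70, 80, 90]],
]);

console.log(topPercentInRole("backend", 80, cohorts)); // 30
console.log(topPercentInRole("frontend", 80, cohorts)); // 40
```

The same raw score lands at a different percentile in each role, which is why percentiles are compared within a role rather than across roles.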

Orchestration classifier · v0.4 · PST-3K9X-7FQM

78/100 · top 28%

Scoping before writing · w 22% · 92

Tradeoff articulation · w 18% · 84

Adversarial prompting · w 18% · 71

Self-correction rate · w 16% · 66

Edge-case ownership · w 14% · 48

Tool-call sequencing · w 12% · 79

One factor flagged: edge-case ownership. Replay 44:55 — candidate soft-delegated to the agent.

v0.4 · weights locked · calibration ongoing with design partners
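Taking the card's weights and factor scores at face value, a plain weighted mean works out as follows. This assumes the composite is a linear combination of the six factors, which the page does not state:

```latex
\begin{aligned}
\sum_i w_i &= 0.22 + 0.18 + 0.18 + 0.16 + 0.14 + 0.12 = 1.00 \\
\sum_i w_i s_i &= 0.22(92) + 0.18(84) + 0.18(71) + 0.16(66) + 0.14(48) + 0.12(79) \\
&= 20.24 + 15.12 + 12.78 + 10.56 + 6.72 + 9.48 = 74.90
\end{aligned}
```

Because 74.9 is not the displayed 78, the v0.4 score is not simply this weighted mean; the page does not say what else enters it.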

Why we're not HackerRank

A different category, not a
prettier leaderboard.

The comparison below is on what each platform actually measures, not which dashboard ships the slickest charts. HackerRank and CodeSignal grade the artifact. We grade the engineer behind it.

Comparison of HackerRank, CodeSignal, and Promptster across six dimensions of technical hiring assessment.
Dimension	HackerRank (legacy assessment)	CodeSignal (legacy assessment)	Promptster (process telemetry)
AI-tool posture	Detect-and-block. Cat-and-mouse with copy-paste.	Detect-and-block. Lockdown + plagiarism checks.	Embrace. Measure how the candidate orchestrates the AI on purpose.
What you actually get	A final diff and a pass/fail score.	A final diff plus a percentile on an IQ-style scale.	Process telemetry — prompts, tool calls, attribution, decision points.
Candidate environment	Sandboxed editor in a locked-down browser.	Sandboxed editor in a locked-down browser.	Their own editor, their own repo, Claude Code. The job, basically.
Evidence behind the decision	The score. Trust it.	The score and a percentile. Trust them.	Replayable session, line-level attribution, signed transcript.
Auditability of the score	Black-box rubric — proprietary.	Black-box rubric — proprietary.	Open rubric. Every rationale links back to a moment in the replay.
Skills it actually measures	Algorithm trivia and timed problem-solving.	Algorithms plus IQ-style cognitive proxies.	Orchestration, judgment, AI-tool fluency — the work they'd actually do.


Design partner program · by invitation

Twelve teams get
founding spots.

Paarth, founder

Promptster · building it with you

I'm an engineer. Heard “everyone is cheating” at a career fair from a recruiter who wasn't even being quiet about it, then sat my own proctored OA two weeks later and watched the test become a test of whether I'd be the chump. UT-Austin → built Promptster after interviewing at 30+ companies and watching the take-home stop telling anyone anything.

If you hire 5+ engineers a year and your current take-home can't tell paste from craft, I'd love to build this with you. Twelve teams on the founding cohort — small enough that I'm in the room for every one of them.

Here's what that actually looks like: a weekly 45-minute call with me, on the record, walking through your replays together to see what the transcript is telling you that the diff isn't. Your founding price stays locked through 2028, even when we raise list prices. Features you ask for ship under your team's name on the changelog.

Book a 15-min intake and we'll talk through your roles. If it isn't a fit, I'll say so on the call.

Assessments custom-built to your taste — we source and tailor problems to your stack, your seniority bar, and the bugs you actually ship. You don't write the questions; we do, and you sign off.
Weekly founder session — 45 min, on the record, about your roles and what the transcript is actually telling you.
Everything unlocked — OSS issue library, hiring brief, interview probes, calibrated percentiles. No gating.
Founding price — locked in. If we raise, you don't.
Name on the changelog — features you ask for ship under your team's name.

founding price · locked · billed monthly

$499 → $199/seat/mo

≈ $3.6K/seat saved per year

Cancel anytime — they don't

price lock

$199 holds through 2028 — even when we raise.

Book a 15-min intake

15-min intake · 14-day free trial on all plans

FAQ · the objections worth answering

Honest answers,
not marketing answers.

What if my engineers use Cursor, Windsurf, Codex, or Copilot?
We built the Claude Code adapter first because it has the cleanest tool-call telemetry of any agent — prompts, tool calls, file diffs, all structured. Codex and Cursor adapters are next on the roadmap, with Windsurf and Copilot following. Design partners get to influence the order. If your team is split across multiple agents today, tell us on the intake call — we'll be honest about timelines.
What if a candidate refuses to be recorded?
Consent-first by design. The session opens with an explicit screen listing every event type we capture (prompts, tool calls, file diffs) and every type we don't (keystrokes, screen, clipboard, webcam, browser history). If the candidate declines, no session runs and nothing is captured. You'll see the decline in your dashboard so you can offer an alternative.
What does the candidate actually see during the session?
Their normal editor, their repo, their flow. No browser lockdown, no proctoring, no second window watching them. Telemetry runs in the background via a Claude Code API proxy and native hooks. At the end they get their own debrief — what went well, one thing to watch — whether or not your team advances them.
Do I have to write my own assessment problems?
No. We custom-build and source assessments to your taste — your stack, your seniority bar, the kinds of bugs and decisions your team actually ships. You give us the role and a few representative problems you've seen in your codebase; we draft the assessment, you sign off. Design partners get this included; off-the-shelf problem packs are coming for everyone else.
How long does setup take?
About 15 minutes during the intake call. We generate an assessment key, you email it to candidates, they install the CLI with one curl command. Sessions stream back to your dashboard. No integration with your ATS required — though we can wire one up if you want.
How do I compare scores across different roles?
Per-role classifiers (backend, frontend, ML, SRE, infra) with a shared calibration model. Percentiles are within-role, not cross-role — a 78 in backend means top 28% of the backend cohort, not 78th percentile against every engineer who's ever taken an assessment. The factors and weights are visible in the dashboard so you can audit how each role is scored.
What happens after the design-partner period ends?
Founding price ($199/seat/mo) stays locked through 2028. You keep access to everything you had during the design-partner period — no feature claw-back, no bait-and-switch. If we raise list prices, you don't.
What's the pricing post-2028?
Public list price hasn't been set yet — we'll know more once we've calibrated against real funnels. Design partners get a 90-day heads-up before any price change applies to them. Grandfathered teams keep their founding rate. We'd rather be quiet about future pricing than make a promise we'll have to walk back.
Is this spyware?
No. We don't capture keystrokes, screen, clipboard, webcam, browser history, or biometrics. The diff between what we capture and what we don't is in our public repo — you can read it before you ship it to candidates. Process telemetry is the same shape of data you'd put in a PR description: what was prompted, what was tried, what was tested. Candidate-positive by design.

Read the process,
not just the commit.

Twelve founding teams will ship this with us. If you hire 5+ engineers a year and your current take-home can't tell paste from craft, we should talk.

Founding seat · $499 → $199/seat/mo · locked through 2028 · 1 of 12 claimed

Book a 15-min intake · See how it works

Claude Code today · Codex + Cursor adapters next
