April 18, 2026

CodeSignal watches a screen recording. Promptster reads the event log.

CodeSignal just launched agentic coding assessments. Their own cheating page tells you why a browser IDE is the wrong surface to measure AI collaboration.

Paarth Jamdagneya

positioning · codesignal · ai-collaboration

On April 2, 2026, CodeSignal launched "Agentic Coding Assessments" — a product that lets candidates use Claude Code or Codex to build a solution, then explain their reasoning to a human reviewer.

It sounds like what we do. It isn't.

Their own cheating page gives away the game

Quoted directly from CodeSignal's Cheating & Fraud page:

Desktop-based AI coding assistants operate outside the browser sandbox, meaning CodeSignal has no authority or technical means to monitor other software running on a candidate's machine.

Every CodeSignal assessment happens inside a hosted browser IDE. Every Claude Code session and every MCP call a candidate makes on their own machine is invisible to CodeSignal. It can record what's typed into its web editor and what shows up in a chat pane bolted to the side. Everything else is outside the sandbox.

That's not a gap they can close with a feature update. It's the ceiling of the architecture.

What a "session replay" actually contains

CodeSignal's reviewer UI surfaces two things from an agentic assessment:

A chat transcript. What the candidate typed at the AI, and what the AI replied.
A keystroke-level playback. Variable-speed scrubbing, skipping idle time, like watching a screen recording.

That's a movie. You can't filter it. You can't search it. You can't jump to the moment the candidate pushed back on a bad suggestion, because "pushback" isn't a concept the capture layer understands.

What Promptster captures

When a candidate runs `promptster start PST-XXXX`, we install hooks directly into Claude Code on the candidate's own machine. Every event is normalized into a structured type:

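```typescript
// One normalized entry in a candidate's session log.
type TimelineEvent =
  | { kind: "prompt"; text: string; turnId: string } // what the candidate asked the agent
  | { kind: "file_diff"; path: string; before: string; after: string } // a file write, both versions
  | { kind: "command"; source: "agent" | "terminal"; argv: string[] } // run by the agent or in the terminal
  | { kind: "decision_event"; rationale: string; references: string[] } // a recorded decision and why
  | { kind: "mcp_call"; server: string; tool: string; args: unknown }; // a tool call on an MCP server
```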
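To make the normalization step concrete, here is a minimal sketch of mapping one raw hook payload onto the `TimelineEvent` union. It is an illustration, not Promptster's capture code. The `HookPayload` shape is an assumption loosely modeled on the JSON Claude Code passes to hook commands (`hook_event_name`, `prompt`, `tool_name`, `tool_input`), the `mcp__<server>__<tool>` naming for MCP tools is likewise assumed, and `normalize` and `turnId` are hypothetical names. Only prompts, `Bash` commands, and MCP calls are handled; anything else returns `null`.

```typescript
// The TimelineEvent union, repeated so this block compiles on its own.
type TimelineEvent =
  | { kind: "prompt"; text: string; turnId: string }
  | { kind: "file_diff"; path: string; before: string; after: string }
  | { kind: "command"; source: "agent" | "terminal"; argv: string[] }
  | { kind: "decision_event"; rationale: string; references: string[] }
  | { kind: "mcp_call"; server: string; tool: string; args: unknown };

// Hypothetical raw hook payload. Field names are an assumption, loosely
// modeled on the JSON Claude Code passes to hook commands on stdin.
interface HookPayload {
  hook_event_name: string;
  prompt?: string;
  tool_name?: string;
  tool_input?: Record<string, unknown>;
}

// Map one raw payload to a normalized event, or null if it is not handled here.
// `turnId` is supplied by the caller (hypothetical, e.g. a per-prompt counter).
function normalize(p: HookPayload, turnId: string): TimelineEvent | null {
  // A submitted prompt becomes a "prompt" event.
  if (p.hook_event_name === "UserPromptSubmit" && typeof p.prompt === "string") {
    return { kind: "prompt", text: p.prompt, turnId };
  }
  // Everything handled below is a completed tool call.
  if (p.hook_event_name !== "PostToolUse" || !p.tool_name) return null;

  // Assumed naming convention for MCP tools: mcp__<server>__<tool>.
  if (p.tool_name.startsWith("mcp__")) {
    const [, server, ...rest] = p.tool_name.split("__");
    return { kind: "mcp_call", server, tool: rest.join("__"), args: p.tool_input };
  }

  // Shell commands the agent ran through its Bash tool.
  const command = p.tool_input?.command;
  if (p.tool_name === "Bash" && typeof command === "string") {
    // Naive whitespace split; quoted arguments would need a shell-aware parser.
    return { kind: "command", source: "agent", argv: command.trim().split(/\s+/) };
  }
  return null;
}
```

The payoff of normalizing at capture time is that everything downstream (search, filters, integrity checks) switches on `kind` instead of re-parsing raw hook JSON.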

The reviewer dashboard reads from that event log. You can filter to decision events. You can search prompts across every session the candidate ran. You can diff the first file write against the final submission. You can replay the exact sequence of tool calls that produced a change: not a video of it, but the events themselves.
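Because the log is typed data, the reviewer queries described above reduce to ordinary functions over an array. A minimal sketch, assuming a hypothetical `LoggedEvent` wrapper that tags each event with its session ID and a capture timestamp `at` in milliseconds. It also assumes the last recorded write to a path stands in for the final submission; a real dashboard might compare against the submitted repository instead. The function names are illustrative, not Promptster's API.

```typescript
type TimelineEvent =
  | { kind: "prompt"; text: string; turnId: string }
  | { kind: "file_diff"; path: string; before: string; after: string }
  | { kind: "command"; source: "agent" | "terminal"; argv: string[] }
  | { kind: "decision_event"; rationale: string; references: string[] }
  | { kind: "mcp_call"; server: string; tool: string; args: unknown };

// Hypothetical wrapper: each event tagged with its session and capture time (ms).
interface LoggedEvent {
  sessionId: string;
  at: number;
  event: TimelineEvent;
}

// Sort a copy of the log by capture time, leaving the input untouched.
function byTime(log: LoggedEvent[]): LoggedEvent[] {
  return log.slice().sort((a, b) => a.at - b.at);
}

// Filter to decision events, in capture order.
function decisions(log: LoggedEvent[]): LoggedEvent[] {
  return byTime(log).filter((e) => e.event.kind === "decision_event");
}

// Case-insensitive substring search over prompt text, across all sessions.
function searchPrompts(log: LoggedEvent[], query: string): LoggedEvent[] {
  const q = query.toLowerCase();
  return byTime(log).filter(
    (e) => e.event.kind === "prompt" && e.event.text.toLowerCase().includes(q),
  );
}

// Contents after the first recorded write to `path` vs. after the last one.
// Assumes the last write stands in for the final submission.
function firstVsFinal(log: LoggedEvent[], path: string): { first: string; final: string } | null {
  const writes: string[] = [];
  for (const e of byTime(log)) {
    if (e.event.kind === "file_diff" && e.event.path === path) writes.push(e.event.after);
  }
  if (writes.length === 0) return null;
  return { first: writes[0], final: writes[writes.length - 1] };
}
```

None of these queries is possible against a screen recording, because the recording has no `kind` field to filter on.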

The comparison that matters

| | CodeSignal Agentic | Promptster |
|---|---|---|
| Candidate environment | Hosted browser IDE | Candidate's real machine |
| Claude Code visibility | "No technical means to monitor" | Hook-level capture |
| Capture fidelity | Chat transcript + keystrokes | Normalized events: prompt, diff, command, decision, MCP |
| Reviewer UI | Video playback | Structured timeline, searchable, filterable |
| Code correctness | Human-graded | Automated: test suites must pass (23 verified OSS bugs, each with a `brokenSha`) |
| Explain phase | "Human reviewer", format undisclosed | `promptster explain` inline, auto-scored by rubric |
| AI-collab rubric | Not published | Published, scored, weighted |
| Cheating heuristics | Typing linearity, paste events | Prompt↔diff contradiction, code provenance, git-state anomalies |

Why this matters for your hiring loop

Every cheating signal CodeSignal trained on — "unusually linear typing," "minimal pauses between characters," "high paste volume" — is a description of good agentic coding. A senior engineer collaborating with Claude Code looks, from keystroke telemetry, exactly like someone pasting an answer.

The integrity story has to change. Ours is: we don't flag fast typing. We flag contradictions between the candidate's prompt history and the final diff. We flag code with no attributable provenance. We flag git-state anomalies. None of that is measurable from a browser sandbox.
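To show why a provenance check needs the event log rather than keystroke timing, here is a deliberately crude sketch: flag every non-blank line of the submitted file that appears neither in the original file nor in any recorded write to that path. This is a hypothetical heuristic for illustration, not Promptster's actual scoring; `unattributedLines` and its parameters are invented names.

```typescript
type TimelineEvent =
  | { kind: "prompt"; text: string; turnId: string }
  | { kind: "file_diff"; path: string; before: string; after: string }
  | { kind: "command"; source: "agent" | "terminal"; argv: string[] }
  | { kind: "decision_event"; rationale: string; references: string[] }
  | { kind: "mcp_call"; server: string; tool: string; args: unknown };

// Hypothetical heuristic: return the non-blank lines of the submitted file
// that appear neither in the original file nor in any recorded write to it.
function unattributedLines(
  path: string,
  original: string,
  submitted: string,
  events: TimelineEvent[],
): string[] {
  // Every line we can account for, compared with surrounding whitespace trimmed.
  const known = new Set<string>(original.split("\n").map((l) => l.trim()));
  for (const e of events) {
    if (e.kind === "file_diff" && e.path === path) {
      for (const line of e.after.split("\n")) known.add(line.trim());
    }
  }
  // Anything left over has no recorded origin in this session.
  return submitted
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0 && !known.has(l));
}
```

Line-set membership misses code that was moved or lightly edited, and it cannot say where an unmatched line came from. It only illustrates that provenance is a query over captured writes, which a keystroke stream does not contain.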

The one-liner