April 19, 2026

The code is no longer the signal

LeetCode screens and take-homes measured one thing: whether a candidate could write code alone. Claude Code made that question useless. Here's what we need to measure instead — and why no existing hiring tool can.

Paarth Jamdagneya

hiring, ai-collaboration, positioning

Sometime in the last eighteen months, the default way technical hiring worked stopped working.

For twenty years, the proxy was simple: can this person write code by themselves, under mild time pressure, in a sterile environment? LeetCode screens, take-home assignments, whiteboard interviews — different formats, same measurement. Nobody thought it was a great proxy. The classic criticisms applied: it rewarded pattern-matching over systems thinking, it filtered out senior engineers who'd forgotten the trick to reversing a linked list in thirty seconds, it mapped poorly to the actual job. But it was a tolerable bad proxy, because at least it measured one thing consistently: what the candidate could produce with their own hands, on their own.

Then Claude Code happened.

The break was sudden and total

This isn't a gradual shift. It's a phase change.

Agentic coding tools can read a repository, plan changes across files, run tests, iterate on failures, and ship a working PR in the time it takes a human to read the problem statement. CodeSignal's own March 2026 survey found 91% of engineers use agentic AI tools at work, and 75% shipped AI-generated production code in the last six months.

The companies setting the norms are already moving. Meta piloted AI-enabled coding interviews in October 2025. Shopify tells candidates to use whatever tools they want. The debate at the top of the market isn't whether AI belongs in the interview — it's how to grade the candidate who's obviously already using it.

If you run a LeetCode screen today, you're not measuring the candidate's ability to write code. You're measuring whether they decided to cheat. Either they used AI and solved the problem instantly, in which case the signal is noise, or they didn't, and you reject them for not using the tools their future coworkers use every day. The signal is inverted.

Take-homes aren't much better. A three-hour project a candidate could grind through in 2021 can be dispatched in thirty minutes with a half-decent prompt. The final diff tells you almost nothing about the person who submitted it.

The three wrong responses

Every technical hiring team is responding to this. Most of the responses are wrong.

Ban AI. Impossible to enforce, and the detectors are broken. HackerRank ships an AI-plagiarism layer. CodeSignal publishes "suspicion scores" from typing linearity, paste frequency, and pause patterns. Every one of those signals was trained on a pre-agentic world. Fast, linear typing with frequent pastes is what good agentic coding looks like — the cheaters and the serious engineers now throw the same signature. You cannot separate them with keystroke statistics.

Detect AI. Same failure, worse side effect: you punish your best candidates. A senior engineer pairing effectively with Claude Code types exactly like someone pasting an answer. You're running a filter that correlates with skill.

Skip take-homes entirely. Shift everything to live on-sites. But live interviews are expensive — five engineers, two hours each, per candidate — and they still don't tell you what you want to know. Does this person make good decisions when they're alone with an AI and a codebase? You can't observe that in a Zoom room.

None of these responses change the underlying problem: the thing we used to measure is no longer what we want to know.

What we actually want to know

We never actually cared whether a candidate could write code alone. We cared whether they'd be a good engineer on the team — which means: do they make good decisions under uncertainty, do they notice when something is wrong, do they push back when they should, do they know when to test, when to ship, and when to stop.

Those questions didn't use to be separable from "can you code." Writing the code was the medium through which we observed the judgment. Now that AI writes most of the code, the judgment has to be observed directly.

What does good AI-collaboration actually look like?

Prompt quality. Can the candidate specify a problem precisely? Do they provide context? Do they know what to ask for?
Pushback. When the AI suggests something wrong, do they catch it? Do they argue with it?
Verification. Do they run the tests? Read the diff? Spot-check the output, or accept it blindly?
Debug arcs. When something breaks, can they actually drive a fix — or do they spin in circles re-asking the AI the same question?
Decision points. The moments where the candidate chose one approach over another. Can they articulate why?

These aren't things you can grade from a final pull request. They live in the process — the prompts, the tool calls, the file reads, the terminal commands, the back-and-forth with the model.

To hire for them, you need observability into the workflow.

No existing tool captures this

Here's what every assessment platform does today:

HackerRank, CodeSignal, Codility run candidates in a hosted browser IDE with a chat pane bolted on. They capture keystrokes and a transcript with their own built-in AI. They have no visibility into Claude Code, Copilot, or anything running on the candidate's actual machine. CodeSignal's own Cheating & Fraud page admits this directly:

"Desktop-based AI coding assistants operate outside the browser sandbox, meaning CodeSignal has no authority or technical means to monitor other software running on a candidate's machine."

CoderPad, Qualified, TestGorilla use the same sandbox model with thinner AI integration.
Karat and CodeInterview run human-led live interviews. No instrumentation. No scalable signal.

The architecture is wrong. A browser IDE with a keystroke recorder captures a movie of the candidate typing. It does not capture what happened. You cannot filter it. You cannot search across it. You cannot jump to the moment the candidate pushed back on a bad suggestion, because "pushback" is not a concept the capture layer understands.

What you need is an event log.

Observability, for hiring

When an engineer works, they produce structured events. A prompt sent to an AI. A tool call the AI executed. A file read. A file written. A diff applied. A command run. A decision made. These aren't keystrokes — they're semantic events. Each one answers a question a recruiter actually has.

Promptster captures that log.

We install hooks directly into Claude Code on the candidate's own laptop. Every event is normalized and streamed to our backend: prompt, file_diff, command, decision_event, mcp_call. The candidate works in their real environment: their editor, their dotfiles, their model preferences, their MCP servers. We don't swap that environment for a browser sandbox. We instrument the environment the candidate already has.
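
To make "normalized" concrete, here is a minimal sketch of what one such event could look like. Only the five kind names come from the list above. The field names (session_id, ts, payload), the shape of the raw hook dictionary, and the normalize helper are assumptions made for illustration, not Promptster's actual schema or the Claude Code hook format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# The five event kinds named in the post. Everything else in this
# sketch (field names, raw payload keys) is a hypothetical shape.
EVENT_KINDS = {"prompt", "file_diff", "command", "decision_event", "mcp_call"}


@dataclass(frozen=True)
class SessionEvent:
    session_id: str
    kind: str
    timestamp: datetime
    payload: dict[str, Any] = field(default_factory=dict)


def normalize(raw: dict[str, Any]) -> SessionEvent:
    """Turn a raw hook dictionary (assumed keys) into a SessionEvent."""
    kind = raw.get("kind")
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    # Assumption: the hook reports epoch seconds under the key "ts".
    ts = datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc)
    # Anything that is not envelope metadata is kept as the payload.
    envelope = {"kind", "ts", "session_id"}
    payload = {k: v for k, v in raw.items() if k not in envelope}
    return SessionEvent(str(raw["session_id"]), kind, ts, payload)
```

Rejecting unknown kinds at the boundary is the design choice that matters here: a reviewer can only filter on concepts the capture layer actually understands.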

The reviewer dashboard reads from that event log. Filter sessions by decision type. Search prompts across every candidate you've evaluated. Diff the first file write against the final submission. Replay the exact sequence of tool calls that led to a change — not a video of typing, the events themselves.
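
The two queries described above, filtering by event kind and searching prompts across candidates, reduce to very little code once the data is an event log rather than a video. The sketch below assumes a simplified Event record with a single text field and uses made-up sample sessions; the names Event, filter_by_kind, and search_prompts are hypothetical, not a real dashboard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: str
    kind: str   # e.g. "prompt", "command", "decision_event"
    text: str   # simplified: one free-text field per event


def filter_by_kind(events: list[Event], kind: str) -> list[Event]:
    """Return only the events of one kind, in their original order."""
    return [e for e in events if e.kind == kind]


def search_prompts(events: list[Event], needle: str) -> dict[str, list[str]]:
    """Case-insensitive search over prompt text, grouped by session."""
    needle = needle.lower()
    hits: dict[str, list[str]] = {}
    for e in events:
        if e.kind == "prompt" and needle in e.text.lower():
            hits.setdefault(e.session_id, []).append(e.text)
    return hits


# Made-up sample log covering two candidate sessions.
LOG = [
    Event("cand-a", "prompt", "Why does the retry test fail on timeout?"),
    Event("cand-a", "command", "pytest tests/test_retry.py"),
    Event("cand-a", "decision_event", "rejected suggested global sleep"),
    Event("cand-b", "prompt", "fix the bug"),
    Event("cand-b", "prompt", "the test still fails, fix it"),
]

decisions = filter_by_kind(LOG, "decision_event")
test_prompts = search_prompts(LOG, "test")
```

With the sample log, decisions holds the single pushback event from cand-a, and test_prompts maps each session to the prompts that mention tests.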

Correctness gets verified with real test suites. We maintain a library of OSS bugs, each pinned to a specific commit where the tests fail and where they pass once the correct fix is applied. Ground truth, not human review.
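
The fail-before, pass-after rule can be sketched as a small check over test results. This is an illustration under stated assumptions: results are given as a mapping from test name to pass/fail, target_tests names the tests that reproduce the bug, and the no-regressions condition is an extra assumption added here rather than something the description above specifies. The function name verifies_fix is hypothetical.

```python
def verifies_fix(
    before: dict[str, bool],
    after: dict[str, bool],
    target_tests: set[str],
) -> bool:
    """Check a submission against a pinned bug.

    before: test results at the pinned (buggy) commit
    after:  test results with the candidate's change applied
    target_tests: tests that are expected to reproduce the bug
    """
    if not target_tests:
        return False  # nothing to verify against

    # The pinned commit must actually reproduce the bug.
    reproduces = all(before.get(t) is False for t in target_tests)
    # The candidate's change must make every target test pass.
    fixed = all(after.get(t) is True for t in target_tests)
    # Assumption: tests that passed before must still pass afterwards.
    no_regressions = all(
        after.get(name, False) for name, passed in before.items() if passed
    )
    return reproduces and fixed and no_regressions
```

A test that is missing from the after results counts as failing, so deleting an inconvenient test does not register as a fix.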

The new proxy

The old proxy was: can you write code alone?

The new proxy is: how do you work?

This isn't a skill a browser IDE can measure. It isn't a signal a keystroke recorder can extract. It's an observability problem — and observability has been a solved category in every other part of software for a decade. Hiring is just the last one to catch on.

Promptster is the observability layer for hiring engineers in the agentic era. If you care what a candidate will actually be like on your team, you have to stop grading the output and start reading the log.

If you want to see what a structured agentic session looks like in a recruiter UI, book a demo.