Now Accepting Early Access

The litmus test for how engineers actually work

Put engineers in a real environment — browser, AI, terminal, everything — and see how they actually work. The AI then evaluates how they use AI.

The way we measure engineering skill is stuck in 2015.

01

Engineers use AI daily, but we don’t measure it

Every engineer ships with AI. But interviews, assessments, and training programs still pretend it doesn’t exist.

02

We test the wrong things

Leetcode measures memorization. Take-homes measure free time. Neither measures how someone actually builds software.

03

No visibility into process

You see the final output, never the journey. Did they plan first? Did they verify AI output? Did they debug systematically?

04

Work trials are the gold standard, but don’t scale

The best signal is watching someone do the work. For hiring, training, or self-assessment — we make that possible at scale.

“We don't just check if the code works. We measure how they think.”
What We Measure

The Skills That Actually Matter

Verification Depth

Do they look past surface-level correctness? We track if they test edge cases, check for scale issues, and validate AI-generated code.

Architectural Reasoning

Do they understand the system as a whole? We measure how they reason about code organization, dependencies, and maintainability.

Spec-Driven Approach

Do they plan before coding? We detect if they write specs, define requirements, or break down problems before prompting AI.

AI Collaboration Skill

Do they treat AI as an oracle or an intern? We score prompt quality, context-setting, output verification, and iteration on AI suggestions.

Debugging Methodology

Systematic or random? We track how they diagnose issues—do they read errors, isolate variables, or just re-prompt AI hoping for a fix?

Quality Gate Awareness

Do they set up linting, type checking, error handling, and testing? Strong engineers establish guardrails before writing code.

AI-Scored Evaluation

Define the skills that matter. AI scores every session against your criteria — for hiring, training, or self-assessment.

Cloud Sandboxes
Live Video Calls

Every Interaction Captured

Every keystroke, AI prompt, terminal command, and file change is recorded. See not just what was built, but how the engineer thinks.

Live & Replay

See the process, not just the output.

Whether you're evaluating a candidate, training a team, or improving your own workflow — Litmus captures the full picture. Drop into live sessions, replay recordings, and review AI-scored diffs.

Live Drop-in Calls
Session Timeline
Code Diff Review
+ export const useScale = (node) => {
+ const [scale, setScale] = useState(1);
- function handleResize() {
+ useEffect(() => {
+ const obs = new ResizeObserver...
Baseline vs Submission Diff
SESSION REVIEW
AI SCORED: RUBRIC-BASED EVALUATION
The Platform

How It Works

01. Set Up the Environment

Connect a GitHub repo or choose a template. Define what skills matter. Share a single link — for a candidate, a team member, or yourself.

const session = {
  repo: 'your-org/your-repo', // GitHub repo or template to load
  criteria: customRubric,     // the skills you defined
  mode: 'hire'                // or 'train' or 'practice'
};
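
As a rough illustration of what validating that setup could look like, here is a minimal sketch. It is not the Litmus API: createSession, MODES, and the shape of each criteria entry are hypothetical names chosen for this example, and the only values taken from the page are the repo placeholder and the three modes.

```javascript
// Hypothetical helper: names and fields are illustrative, not the Litmus API.
const MODES = ['hire', 'train', 'practice'];

function createSession({ repo, criteria, mode }) {
  // Expect the GitHub-style "owner/name" form used in the snippet above.
  if (!/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error(`Expected repo as "owner/name", got "${repo}"`);
  }
  // A session runs in exactly one mode.
  if (!MODES.includes(mode)) {
    throw new Error(`Unknown mode "${mode}"; expected one of ${MODES.join(', ')}`);
  }
  // The rubric must name at least one skill to score against.
  if (!Array.isArray(criteria) || criteria.length === 0) {
    throw new Error('criteria must be a non-empty array');
  }
  return Object.freeze({ repo, criteria, mode });
}

const session = createSession({
  repo: 'your-org/your-repo',
  criteria: [{ skill: 'Verification Depth', weight: 2 }], // assumed shape
  mode: 'hire',
});
```

Picking a single mode per session keeps the later scoring step unambiguous about who the feedback is for.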

02. Build in a Real Environment

A browser-based IDE with AI assistant, terminal, and live preview. No local setup. Build, debug, and refactor real code — the way you actually work.

await sandbox.start(candidate);
// Recording every interaction...
// AI assistant available...

03. Get the Full Picture

AI scores every session against your criteria. Review the code diff, recording timeline, and behavioral signals — whether you're hiring, coaching, or self-improving.

return {
  score: 87,
  diff: baselineVsSubmission,
  timeline: recordingEvents
};
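
One way a single score could be derived from a rubric is a weighted average of per-skill ratings. The sketch below assumes that approach. The page does not say how Litmus actually computes its score, so scoreSession, the weights, and the ratings are all made up for illustration. Only the skill names come from the page.

```javascript
// Hypothetical scoring: a weighted average of 0-100 ratings per skill.
// This is an assumption for illustration, not how Litmus computes its score.
function scoreSession(rubric, ratings) {
  let weighted = 0;
  let totalWeight = 0;
  for (const { skill, weight } of rubric) {
    const rating = ratings[skill];
    if (typeof rating !== 'number' || rating < 0 || rating > 100) {
      throw new Error(`Missing or out-of-range rating for "${skill}"`);
    }
    weighted += rating * weight;
    totalWeight += weight;
  }
  if (totalWeight === 0) throw new Error('rubric has no weight');
  return Math.round(weighted / totalWeight);
}

// Made-up weights and ratings.
const rubric = [
  { skill: 'Verification Depth', weight: 2 },
  { skill: 'Debugging Methodology', weight: 1 },
  { skill: 'AI Collaboration Skill', weight: 1 },
];
const ratings = {
  'Verification Depth': 90,
  'Debugging Methodology': 80,
  'AI Collaboration Skill': 88,
};

const score = scoreSession(rubric, ratings); // (180 + 80 + 88) / 4 = 87
```

Weighting lets a team state which skills matter most for a role without changing how individual sessions are rated.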
Without Litmus
  • You see the final output, never the process behind it

  • No way to measure how engineers actually use AI

  • Assessments and training feel artificial, nothing like real work

With Litmus
  • Full visibility into how engineers think, prompt, debug, and build

  • AI evaluates AI collaboration, the skill that matters most now

  • Real environment, real code, real tools: for hiring, training, or practice

One Platform, Three Perspectives

It works three ways.

For Recruiters

Evaluate candidates on real problems at scale. Jump into any live session. Stop guessing—see exactly how they think.

For Companies

Train your existing engineers on AI collaboration. See where your team’s gaps actually are, not where you assume they are.

For Candidates

Practice on real problems and get AI feedback on your own process. Know where you stand before the interview.

Litmus tests what actually matters.

Still early — looking for recruiters, companies, and candidates to try this out.
