Put engineers in a real environment — browser, AI, terminal, everything — and see how they actually work. The AI then evaluates how they use AI.
The way we measure engineering skill is stuck in 2015.
Engineers use AI daily, but we don’t measure it
Every engineer ships with AI. But interviews, assessments, and training programs still pretend it doesn’t exist.
We test the wrong things
Leetcode measures memorization. Take-homes measure free time. Neither measures how someone actually builds software.
No visibility into process
You see the final output, never the journey. Did they plan first? Did they verify AI output? Did they debug systematically?
Work trials are the gold standard, but they don’t scale
The best signal is watching someone do the work. We make that possible at scale for hiring, training, and self-assessment.
“We don't just check if the code works. We measure how they think.”
The Skills That Actually Matter
Verification Depth
Do they look past surface-level correctness? We track whether they test edge cases, check for scale issues, and validate AI-generated code.
Architectural Reasoning
Do they understand the system as a whole? We measure how they reason about code organization, dependencies, and maintainability.
Spec-Driven Approach
Do they plan before coding? We detect whether they write specs, define requirements, or break down problems before prompting AI.
AI Collaboration Skill
Do they treat AI as an oracle or an intern? We score prompt quality, context-setting, output verification, and iteration on AI suggestions.
Debugging Methodology
Systematic or random? We track how they diagnose issues—do they read errors, isolate variables, or just re-prompt AI hoping for a fix?
Quality Gate Awareness
Do they set up linting, type checking, error handling, and testing? Strong engineers establish guardrails before writing code.
AI-Scored Evaluation
Define the skills that matter. AI scores every session against your criteria — for hiring, training, or self-assessment.
Cloud Sandboxes
Live Video Calls
Every Interaction Captured
Every keystroke, AI prompt, terminal command, and file change is recorded. See not just what was built, but how the engineer thinks.
See the process, not just the output.
Whether you're evaluating a candidate, training a team, or improving your own workflow — Litmus captures the full picture. Drop into live sessions, replay recordings, and review AI-scored diffs.
RUBRIC-BASED EVALUATION
How It Works
01. Set Up the Environment
Connect a GitHub repo or choose a template. Define what skills matter. Share a single link — for a candidate, a team member, or yourself.
const session = {  // variable name is a placeholder
  repo: 'your-org/your-repo',
  criteria: customRubric,
  mode: 'hire | train | practice'  // pick one: 'hire', 'train', or 'practice'
};
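The `customRubric` value in the snippet above is not defined on this page. As a rough sketch only, and assuming a rubric is a weighted list of the six skills named earlier, it could look something like the TypeScript below. The type names (`Criterion`, `SessionConfig`, `SessionMode`), the `sessionConfig` variable, and every weight are hypothetical illustrations, not Litmus's actual schema.

```typescript
// Hypothetical shapes for illustration only; not Litmus's real schema.
type SessionMode = "hire" | "train" | "practice";

interface Criterion {
  skill: string;  // skill name taken from the list on this page
  weight: number; // relative importance; the values below are made up
}

// One entry per skill named on this page. Weights are assumptions.
const customRubric: Criterion[] = [
  { skill: "Verification Depth", weight: 0.25 },
  { skill: "Architectural Reasoning", weight: 0.15 },
  { skill: "Spec-Driven Approach", weight: 0.15 },
  { skill: "AI Collaboration Skill", weight: 0.2 },
  { skill: "Debugging Methodology", weight: 0.15 },
  { skill: "Quality Gate Awareness", weight: 0.1 },
];

interface SessionConfig {
  repo: string;
  criteria: Criterion[];
  mode: SessionMode;
}

const sessionConfig: SessionConfig = {
  repo: "your-org/your-repo",
  criteria: customRubric,
  mode: "hire",
};

// Sanity check: weights should sum to 1 so scores stay on one scale.
const totalWeight = customRubric.reduce((sum, c) => sum + c.weight, 0);
```

Typing `mode` as a union rather than a free-form string means a typo such as `"hiring"` fails at compile time instead of at session start.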
02. Build in a Real Environment
A browser-based IDE with AI assistant, terminal, and live preview. No local setup. Build, debug, and refactor real code — the way you actually work.
// Recording every interaction...
// AI assistant available...
03. Get the Full Picture
AI scores every session against your criteria. Review the code diff, recording timeline, and behavioral signals — whether you're hiring, coaching, or self-improving.
const result = {  // variable name is a placeholder
  score: 87,
  diff: baselineVsSubmission,
  timeline: recordingEvents
};
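The page does not say how a single `score` is derived or what a timeline entry contains. The sketch below is one possible reading, assuming the overall score is a weighted average of per-skill scores and that each recorded interaction is a tagged event. `RecordingEvent`, `CriterionScore`, `overallScore`, and `countByKind` are hypothetical names introduced for illustration, and all sample data is made up.

```typescript
// Hypothetical event shape covering the four interaction types the page lists.
type RecordingEvent =
  | { kind: "keystroke"; at: number }
  | { kind: "ai_prompt"; at: number; prompt: string }
  | { kind: "terminal_command"; at: number; command: string }
  | { kind: "file_change"; at: number; path: string };

// Hypothetical per-skill result; score is assumed to be on a 0-100 scale.
interface CriterionScore {
  skill: string;
  weight: number;
  score: number;
}

// Weighted average of per-skill scores, rounded to the nearest integer.
function overallScore(scores: CriterionScore[]): number {
  const totalWeight = scores.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight === 0) return 0; // avoid dividing by zero on an empty rubric
  const weighted = scores.reduce((sum, s) => sum + s.score * s.weight, 0);
  return Math.round(weighted / totalWeight);
}

// Counts events per kind, e.g. to see how often AI was prompted in a session.
function countByKind(events: RecordingEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.kind] = (counts[e.kind] ?? 0) + 1;
  }
  return counts;
}

// Made-up sample data: (90 * 2 + 81 * 1) / 3 = 87
const exampleScore = overallScore([
  { skill: "Verification Depth", weight: 2, score: 90 },
  { skill: "AI Collaboration Skill", weight: 1, score: 81 },
]);

const exampleCounts = countByKind([
  { kind: "ai_prompt", at: 0, prompt: "Explain this stack trace" },
  { kind: "terminal_command", at: 5, command: "npm test" },
  { kind: "ai_prompt", at: 9, prompt: "Suggest a fix" },
]);
```

A discriminated union on `kind` keeps each event's payload type-checked, so a replay view could switch on `kind` without casting.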
Without Litmus
- You see the final output, never the process behind it
- No way to measure how engineers actually use AI
- Assessments and training feel artificial, nothing like real work
With Litmus
- Full visibility into how engineers think, prompt, debug, and build
- AI evaluates AI collaboration, the skill that matters most now
- Real environment, real code, real tools for hiring, training, or practice
It works three ways.
For Recruiters
Evaluate candidates on real problems at scale. Jump into any live session. Stop guessing—see exactly how they think.
For Companies
Train your existing engineers on AI collaboration. See where your team’s gaps actually are, not where you assume they are.
For Candidates
Practice on real problems and get AI feedback on your own process. Know where you stand before the interview.
Litmus tests what actually matters.
Still early — looking for recruiters, companies, and candidates to try this out.
