LITMUS | The Intellectual Architect

Now Accepting Early Access

The litmus test for how engineers actually work

Put engineers in a real environment — browser, AI, terminal, everything — and see how they actually work. The AI then evaluates how they use AI.

Get Early Access | See How It Works

The way we measure engineering skill is stuck in 2015.

Engineers use AI daily, but we don’t measure it

Every engineer ships with AI. But interviews, assessments, and training programs still pretend it doesn’t exist.

We test the wrong things

Leetcode measures memorization. Take-homes measure free time. Neither measures how someone actually builds software.

No visibility into process

You see the final output, never the journey. Did they plan first? Did they verify AI output? Did they debug systematically?

Work trials are the gold standard, but don’t scale

The best signal is watching someone do the work. For hiring, training, or self-assessment — we make that possible at scale.

“We don't just check if the code works. We measure how they think.”

What We Measure

The Skills That Actually Matter

Verification Depth

Do they look past surface-level correctness? We track if they test edge cases, check for scale issues, and validate AI-generated code.

Architectural Reasoning

Do they understand the system as a whole? We measure how they reason about code organization, dependencies, and maintainability.

Spec-Driven Approach

Do they plan before coding? We detect if they write specs, define requirements, or break down problems before prompting AI.

AI Collaboration Skill

Do they treat AI as an oracle or an intern? We score prompt quality, context-setting, output verification, and iteration on AI suggestions.

Debugging Methodology

Systematic or random? We track how they diagnose issues—do they read errors, isolate variables, or just re-prompt AI hoping for a fix?

Quality Gate Awareness

Do they set up linting, type checking, error handling, and testing? Strong engineers establish guardrails before writing code.

AI-Scored Evaluation

Define the skills that matter. AI scores every session against your criteria — for hiring, training, or self-assessment.

Cloud Sandboxes

Live Video Calls

Every Interaction Captured

Every keystroke, AI prompt, terminal command, and file change is recorded. See not just what was built, but how the engineer thinks.

Live & Replay

See the process, not just the output.

Whether you're evaluating a candidate, training a team, or improving your own workflow — Litmus captures the full picture. Drop into live sessions, replay recordings, and review AI-scored diffs.

Live Drop-in Calls

Session Timeline

Code Diff Review

+ export const useScale = (node) => {
+   const [scale, setScale] = useState(1);
- function handleResize() {
+   useEffect(() => {
+     const obs = new ResizeObserver...
Baseline vs Submission Diff

SESSION REVIEW

AI SCORED: RUBRIC-BASED EVALUATION

The Platform

How It Works

01. Set Up the Environment

Connect a GitHub repo or choose a template. Define what skills matter. Share a single link — for a candidate, a team member, or yourself.

const session = {
  repo: 'your-org/your-repo',
  criteria: customRubric, // the skills you defined
  mode: 'hire', // or 'train' or 'practice'
};

02. Build in a Real Environment

A browser-based IDE with AI assistant, terminal, and live preview. No local setup. Build, debug, and refactor real code — the way you actually work.

await sandbox.start(candidate);
// Recording every interaction...
// AI assistant available...

03. Get the Full Picture

AI scores every session against your criteria. Review the code diff, recording timeline, and behavioral signals — whether you're hiring, coaching, or self-improving.

return {
  score: 87,
  diff: baselineVsSubmission,
  timeline: recordingEvents
};
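
To make "AI scores every session against your criteria" concrete, here is a minimal sketch of how a rubric could roll per-skill scores up into a single number. Everything in it is an assumption for illustration: the criterion keys (borrowed from the skills listed under What We Measure), the weights, and the scoreSession helper are hypothetical and do not describe the actual Litmus API or scoring method.

```javascript
// Hypothetical sketch: these names and weights are illustrative only,
// not part of any published Litmus API.
const customRubric = {
  verificationDepth: 0.2,
  architecturalReasoning: 0.2,
  specDrivenApproach: 0.15,
  aiCollaboration: 0.2,
  debuggingMethodology: 0.15,
  qualityGateAwareness: 0.1,
};

// Combine per-criterion scores (each 0-100) into one weighted score.
// Dividing by the total weight means the weights need not sum to exactly 1.
function scoreSession(rubric, criterionScores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [criterion, weight] of Object.entries(rubric)) {
    const score = criterionScores[criterion];
    if (typeof score !== 'number') {
      throw new Error(`Missing score for criterion: ${criterion}`);
    }
    weighted += score * weight;
    totalWeight += weight;
  }
  return Math.round(weighted / totalWeight);
}

// Example usage with made-up per-skill scores.
const overall = scoreSession(customRubric, {
  verificationDepth: 90,
  architecturalReasoning: 85,
  specDrivenApproach: 70,
  aiCollaboration: 95,
  debuggingMethodology: 80,
  qualityGateAwareness: 75,
});
console.log(overall);
```

A weighted average is only one possible design. It keeps each skill visible in the result and lets a team emphasize, say, verification over planning by changing a single number.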

Without Litmus

You see the final output, never the process behind it
No way to measure how engineers actually use AI
Assessments and training feel artificial, nothing like real work

With Litmus

Full visibility into how engineers think, prompt, debug, and build
AI evaluates AI collaboration, the skill that matters most now
Real environment, real code, real tools, for hiring, training, or practice

One Platform, Three Perspectives

It works three ways.

For Recruiters

Evaluate candidates on real problems at scale. Jump into any live session. Stop guessing—see exactly how they think.

For Companies

Train your existing engineers on AI collaboration. See where your team’s gaps actually are, not where you assume they are.

For Candidates

Practice on real problems and get AI feedback on your own process. Know where you stand before the interview.

Litmus tests what actually matters.

Still early — looking for recruiters, companies, and candidates to try this out.
