Four files. Zero hand-holding. Here's exactly what's in them and why.
Claude Code out of the box writes mediocre tests. Not because it's bad at coding — because it doesn't know your standards. It guesses. And in QA, guessing is expensive.
I spent the first few weeks fighting it. I'd ask for a Playwright test, get something that looked reasonable, run it — and watch it fall apart. Wrong locators. No label assertions. No data stored for review page checks. Every session felt like onboarding a new junior engineer who forgot everything overnight.
Then I stopped trying to fix the output and started fixing the input.
The problem with vanilla Claude Code
When Claude Code has no context about your project, it fills the gaps with assumptions. For QA automation, those assumptions are almost always wrong:
- It writes tests without asking what they should verify
- It picks locators based on what looks stable, not what is stable
- It has no concept of your team's standards — no POM conventions, no assertion patterns, no debugging protocol
- Every new conversation resets to zero
The result is code that runs once and breaks on the second attempt. Or code that technically passes but misses half the acceptance criteria.
The fix isn't better prompting. It's persistent, structured instructions that Claude Code reads before it does anything.
Two layers: CLAUDE.md and Skills
Claude Code supports two types of instruction files:
CLAUDE.md lives in the root of your project. Claude Code reads it automatically at the start of every session. Think of it as your onboarding document for the AI — project structure, conventions, critical rules, things that should never happen.
Skills (.claude/skills/*.md) are task-specific workflows. They're not loaded automatically — Claude Code picks them up when the task matches the skill's description. Each skill is a step-by-step protocol for a specific job: create a test, fix a failing test, write a POM class.
CLAUDE.md defines what your project is and what the rules are. Skills define how to execute specific tasks.
Together, they turn a capable but directionless AI into something that actually behaves like a senior engineer on your team.
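For reference, a skill file is just markdown with a small frontmatter block that Claude Code matches against the task at hand. A minimal sketch (the description wording here is illustrative, not copied from my repo):

```markdown
---
name: create-e2e
description: Workflow for writing a new Playwright E2E test.
  Use when asked to create, scaffold, or automate an end-to-end test.
---

(step-by-step protocol goes here; the real ones are covered below)
```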

My CLAUDE.md — the key sections
I won't paste the whole file (link to the GitHub repo below), but here are the sections that actually changed how Claude Code behaves.
Authentication rules
My project uses Playwright's storageState to handle auth — the setup project runs once, saves session cookies to disk, and every test picks them up automatically.
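If you haven't seen the pattern, here's a minimal sketch of how it looks in playwright.config.ts (file paths and project names are illustrative, not copied from my setup):

```ts
// playwright.config.ts (sketch): the setup project logs in once and saves
// session state; every other project reuses it via storageState.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    // Runs first, performs login, writes playwright/.auth/user.json
    { name: 'setup', testMatch: /auth\.setup\.ts/ },
    {
      name: 'chromium',
      dependencies: ['setup'],
      // Every browser context starts pre-authenticated
      use: { storageState: 'playwright/.auth/user.json' },
    },
  ],
});
```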
Without explicit instruction, Claude Code would add login calls inside individual tests. That breaks the entire auth model. So my CLAUDE.md has a critical rule:
DO NOT call the programmatic login helper inside individual tests. Each browser context already has the session loaded from storageState.
One rule. Saves hours of debugging auth failures that look like flakiness.
E2E writing standards — the 5-point checklist
This is the most valuable part of the file. Every test Claude Code writes must satisfy all five:
- Label visibility checks — assert every field label is visible before interacting with it
- Scoped locators — if a label appears more than once in the DOM, scope the locator to its parent container
- Store entered data — assign every value typed into a form to a variable, then reuse it for assertions
- Review page assertions — on any confirmation or summary screen, assert all stored values are displayed
- Step-by-step assertions — after each significant action, assert the expected outcome before moving on
None of these are rocket science. But without explicit instruction, Claude Code skips most of them. With them in CLAUDE.md, they show up consistently — every test, every session.
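To make the checklist concrete, here's a toy test that satisfies all five points. The form, its fields, and the URL are hypothetical; the shape is what matters:

```ts
import { test, expect } from '@playwright/test';

test('create contact shows entered data on the review page', async ({ page }) => {
  await page.goto('/contacts/new');

  // (1) Assert the label is visible before interacting
  await expect(page.getByText('Email', { exact: true }).first()).toBeVisible();

  // (2) "Email" appears twice on this page, so scope to its parent section
  const contactSection = page.getByRole('group', { name: 'Contact details' });

  // (3) Store the entered value in a variable for later assertions
  const email = 'qa@example.com';
  await contactSection.getByLabel('Email').fill(email);

  // (5) Assert the outcome of each significant action before moving on
  await expect(contactSection.getByLabel('Email')).toHaveValue(email);

  await page.getByRole('button', { name: 'Continue' }).click();

  // (4) On the review screen, assert the stored value is displayed
  await expect(page.getByRole('heading', { name: 'Review' })).toBeVisible();
  await expect(page.getByText(email)).toBeVisible();
});
```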
DOM recon before writing code
This one is counterintuitive. Before writing any test code, Claude Code must:
1. Navigate to the target page
2. Catalog all form fields, labels, roles, and parent containers
3. Identify duplicate labels and plan a scoping strategy
4. Map the full user flow, including what appears on review screens
Then write the code.
It slows the first step down slightly. It eliminates two or three debugging cycles entirely.
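You can approximate the recon step yourself with a throwaway script. This sketch (the URL is a placeholder, and the real skill does its recon through Claude Code's browser tooling, not this file) lists every label on a page and flags duplicates that will need scoped locators:

```ts
// recon.ts: a throwaway sketch, not part of the template.
import { chromium } from '@playwright/test';

async function recon(url: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // Count occurrences of each label text on the page
  const counts = new Map<string, number>();
  for (const text of await page.locator('label').allTextContents()) {
    const label = text.trim();
    counts.set(label, (counts.get(label) ?? 0) + 1);
  }

  // Duplicates are the ones that need a scoped locator strategy
  for (const [label, count] of counts) {
    console.log(count > 1 ? `DUPLICATE x${count}: ${label}` : label);
  }
  await browser.close();
}

// Hypothetical target page
recon('https://app.example.com/contacts/new').catch(console.error);
```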
My three Skills
create-e2e — test creation workflow
Five steps, in order, no skipping:
1. Gather requirements — ask for the test ID and fetch acceptance criteria before writing anything
2. DOM recon — inspect the live page, catalog all interactive elements, identify duplicate labels
3. Write the test — using POM, with the 5-point checklist enforced
4. Run and validate — run only the new test, fix failures before presenting
5. Self-check — Claude reviews its own output against the checklist before showing it to me. In my experience, most people skip this step when writing skills. It catches the most issues.
fix-test-loop — autonomous debugging
This skill implements a self-healing loop:
MAX_ITERATIONS = 10
while iteration < MAX_ITERATIONS:
  1. Run the failing test
  2. If passing → done, report success
  3. If failing → diagnose root cause, apply fix, re-run only the failing test
  4. Log: which test failed, root cause, what changed
The diagnosis phase covers four failure categories: selector/locator issues, timing and race conditions, data/state problems, and environment/build issues. For each category, there's a specific investigation approach.

What makes this skill valuable isn't the loop — it's the hard stops. Claude Code will not retry on:
- Missing environment variables
- Network unreachable
- TypeScript compilation errors
- Test infrastructure failure
Teaching an AI agent when to stop is as important as teaching it what to do.
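The skill states these stops in plain English, but if you rendered the triage as code it would look something like this (the error patterns are illustrative, not from the skill file):

```ts
// Illustrative only: Claude Code applies these stops by reading test output,
// not by running this function.
const HARD_STOPS: RegExp[] = [
  /environment variable .* is not (set|defined)/i,   // missing env vars (message format assumed)
  /net::ERR_(NAME_NOT_RESOLVED|CONNECTION_REFUSED)/, // network unreachable
  /error TS\d{4}/,                                   // TypeScript compilation errors
];

function shouldKeepRetrying(testOutput: string, iteration: number): boolean {
  const MAX_ITERATIONS = 10;
  if (iteration >= MAX_ITERATIONS) return false;
  // Any hard-stop match means the failure isn't fixable by editing the test
  return !HARD_STOPS.some((pattern) => pattern.test(testOutput));
}
```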
page-objects — POM consistency
A focused skill with one core rule: check existing POMs first, extend them, don't duplicate.
It also enforces a locator priority order:
- getByRole() — semantic, always preferred
- getByText() / getByLabel()
- locator() with CSS or XPath — only as a last resort
And a class structure pattern — constructor initializes all locators as typed properties, methods are async and include their own assertions. Every POM Claude Code writes in this project follows the same shape.
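Here's that shape, sketched for a hypothetical login page (class and field names are mine, not from the template):

```ts
import { expect, type Locator, type Page } from '@playwright/test';

export class LoginPage {
  // Constructor initializes all locators as typed properties
  readonly emailInput: Locator;
  readonly passwordInput: Locator;
  readonly signInButton: Locator;

  constructor(private readonly page: Page) {
    this.emailInput = page.getByLabel('Email');
    this.passwordInput = page.getByLabel('Password');
    this.signInButton = page.getByRole('button', { name: 'Sign in' });
  }

  // Methods are async and carry their own assertions
  async login(email: string, password: string): Promise<void> {
    await expect(this.emailInput).toBeVisible();
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.signInButton.click();
    await expect(this.page).toHaveURL(/dashboard/);
  }
}
```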
Without this skill, Claude would invent a new locator for the same button in every test that touches it. Three tests, three different selectors, all fragile in different ways. This skill forces it to check whether the element already exists in a POM before creating anything new.
Before and after
Before these files existed, I spent 20 to 30 minutes per session correcting things that should have been right the first time — wrong locator strategies, missing assertions, login calls in the wrong place. Most tests needed at least two correction cycles before they were usable. Each session started from zero.
After: Claude Code opens the project, reads the context, and produces tests that match team standards without me explaining them again. Most tests pass on the first run. The fix-test-loop skill means I can hand off a failing test and come back to a green one.
The 20 minutes I used to spend correcting AI output now go into reviewing what the tests actually verify.
The template
I'm publishing a stripped, company-agnostic version of all four files as a GitHub template:
github.com/dusan011/claude-code-qa-setup
What's in the repo:
claude-code-qa-setup/
├── CLAUDE.md
├── .claude/
│   └── skills/
│       ├── create-e2e.md
│       ├── fix-test-loop.md
│       └── page-objects.md
└── README.md
Fork it, adapt it to your project, remove what doesn't apply. The structure matters more than the specific content.
If you haven't read how I used Claude Code to fix real UI bugs — that's Post #1.
One thing before you go
These files are not magic. They're instructions. Claude Code will still make mistakes — but they become fewer, more predictable, and easier to fix because the baseline is consistent.
The real shift is this: instead of prompting your way through every session, you invest once in defining your standards, and the AI applies them every time.
What does your CLAUDE.md look like? I'd genuinely like to know how other QA engineers are solving this — drop a comment or reach out on LinkedIn.
If this was useful, I publish practical posts about AI in QA workflows at dusanpetrovic.dev. No fluff, no generic AI takes — just what's actually working.
Next post: I'll walk through the create-e2e skill in action — from a real Jira ticket to a finished Playwright test, with the full Claude Code session shown. If you don't want to miss it, subscribe below.