AI-Powered QA Testing with playwright-cli and GitHub Copilot

Most AI-assisted QA workflows assume you have access to everything: Playwright MCP configured in VS Code, Copilot Vision enabled, the embedded browser panel working. In an enterprise environment, those assumptions often don’t hold. Security policies restrict which tools can connect to which services. Features get disabled. The standard setup isn’t available. This post documents an approach that works within those constraints. It combines playwright-cli for browser interaction, GitHub Copilot CLI for the agent loop, and a plain natural-language prompt describing what to test. No MCP. No generated test files. No vision model. Just a coding agent running shell commands against a real browser. ...

April 9, 2026 · 6 min · Tyler

How to Design RAG Eval Test Cases

Building a working RAG pipeline is easy. Knowing whether it will keep working after you change something is harder, and most projects skip that part entirely. Here the focus is designing an eval harness that catches real problems, using the Anthropic docs RAG agent as the example.

What an eval harness does

An eval harness is a script that runs a fixed set of test cases against your pipeline and produces a pass/fail score. Run it before and after a change: if the score drops, the change broke something; if it improves, the change helped. ...

January 24, 2026 · 7 min · Tyler