Synthetic User Testing: What AI can and can't tell you about your UX

It’s a familiar story: a software project runs out of time, and "nice-to-have" tasks get cut from the scope. Sadly, user testing is usually the first to go. It is incredibly valuable, but setting up proper human testing scenarios has historically been judged too time-consuming to keep.

That's when Synthetic User Testing comes into play.

Synthetic User Testing is an AI-assisted UX evaluation method in which a language model, given browser access via tools like Chrome DevTools MCP, navigates a working prototype and reports friction points, accessibility violations, and goal-completion outcomes — without involving human participants.

Think of it as a pre-flight check for your UI. While it isn't a total replacement for human testing, it allows designers to rapidly test complex user journeys and validate interactions early in the prototyping phase.

 


What is Synthetic User Testing good for?

Synthetic User Testing is most effective for functional validation, accessibility auditing, and catching obvious UX friction points early in the design process - tasks where machine speed and consistency outperform the time cost of recruiting human participants.

Validate user flows

Ensure that both the happy paths work and that the unhappy paths can be resolved by your personas. Functional testing, automated. Gather basic functional feedback for a multitude of scenarios early on, then translate this into actual test cases afterwards.

Accessibility and performance audits

Catch WCAG violations during the development phase, before they become expensive to fix.

Rapid iterations

Test a large number of iterations of a landing page's layout or structure to see which solution fits best for a specific persona - quickly, and at scale.

Catch obvious UX friction points

The synthetic tests will surface the obvious friction, leaving the in-depth, nuanced testing to actual humans.

What is Synthetic User Testing not useful for?

Synthetic User Testing cannot replicate emotional depth, real-world unpredictability, or the final human judgment that determines whether a product truly makes sense to the people using it.

Emotional depth

While AI can simulate a persona, it can't truly map the frustrations and emotions involved while interacting with a system.

Real-world chaos

Synthetic testing happens in a safe environment. There is no chaos: everything is predictable and predetermined. It does not simulate real-world conditions.

Final decisions

This is a very inhuman way to test human software. It tells us that the system works functionally, but it doesn't explain whether it actually makes sense to a real person.

How does Synthetic User Testing compare to regular user testing?

The limitations and advantages of synthetic user testing follow directly from the inherent qualities of machines versus humans. Fitts set out these properties in 1951:

Human strengths | Machine strengths
Detection & perception | Speed & power
Judgment & induction | Computation & replication
Improvisation | Simultaneous operations
Long-term memory | Short-term memory

Source: Fitts, 1951

The main advantage to a designer is catching UX friction points rapidly during the prototyping phase - a phase that starts earlier and earlier in projects. The tests can provide a quick, validated answer from different perspectives, answering not just 'does this work?' but 'does this interaction work for this persona in this scenario?'.

How do you set up a Synthetic User Testing proof of concept?

A Synthetic User Testing proof of concept requires three components: an LLM with browser access, a working software prototype to be tested, and configured browser MCPs - such as chrome-devtools-mcp and playwright-mcp - that give the agent the ability to navigate the prototype.

I was tasked with evaluating whether this approach would make sense in Sandfield's workflow. Our LLM licensing supports GitHub Copilot and Claude Code, so we used both interchangeably throughout the project. We initially tested the concept on an existing prototype for one of our clients, but for this blog post we re-ran it on our Origin website.

To be able to talk to the browsers, we'll use chrome-devtools-mcp and playwright-mcp. The Chrome MCP allows for easy navigation in the browser, while the Playwright MCP adds automated quality testing and regression testing to the mix - more on that later.
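As a rough illustration of what "setting up the MCPs" can look like, here is a minimal sketch that builds a server configuration and prints it as JSON. It assumes both servers are launched through npx, that the Playwright server is published as @playwright/mcp, and that the client reads an mcpServers map. The file name and top-level key differ between clients, so check the documentation for your own tooling before copying it.

```typescript
// Shape used by many MCP clients: a map of server label -> launch command.
// Assumption: your client expects an "mcpServers" key; some use a different one.
interface McpServer {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServer>;
}

// Assumption: both servers are started on demand through npx.
// The labels "chrome-devtools" and "playwright" are just names for the client UI.
const mcpConfig: McpConfig = {
  mcpServers: {
    "chrome-devtools": { command: "npx", args: ["-y", "chrome-devtools-mcp@latest"] },
    playwright: { command: "npx", args: ["-y", "@playwright/mcp@latest"] },
  },
};

// Serialise to JSON that can be pasted into the client's MCP config file.
const mcpConfigJson: string = JSON.stringify(mcpConfig, null, 2);
console.log(mcpConfigJson);
```

Keeping both servers in one file means the agent can use the same session for browsing and for the follow-up QA work.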

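After creating a folder for the project and setting up the MCPs, it's time to test the waters. Here's a folder structure that makes sense:

  • requirements/ - functional requirements for the project

  • personas/ - one .md file per persona

  • reports/ - the Markdown report for each test run

  • tests/personas/ - Playwright .spec.ts files written from report findings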

The requirements folder is where functional requirements live. The more context the LLM understands, the better it can fine-tune the personas to the project and ensure the scenarios being tested are correct and useful. In most cases, we'll create our own personas - but it doesn't hurt to have the LLM generate several personas from the requirements alone. After all, the agent might surface creative ideas we'd never thought of.

The reports folder holds the final Markdown reports for each test run, highlighting UX friction points, successes, failures, and WCAG audit results.

The tests folder is the place for Playwright. Since Synthetic User Testing focuses on persona and scenario-based testing, it's useful to connect the scenarios that have been created to Playwright: automated quality assurance testing.

What does a Synthetic User Testing session look like in practice?

To show this in action, we ran a session on our own Origin website, which is currently undergoing a revamp, so the findings are directly useful.

Defining the persona

We created a project called 'Origin Supply Chain' and used three example personas. One of them is Marcus:

Marcus

Role: IT Systems Manager

Company: Multi-entity logistics group

Location: Auckland, New Zealand

Who they are

Marcus is 38, IT Systems Manager at a multi-entity logistics group that runs SAP for finance and a legacy WMS built in-house. He's been pulled into a TMS evaluation by the ops team and his job is to answer two questions before the exec team signs off: can this integrate with what we already have, and is it secure enough to pass our ISO compliance audit? He's not the decision-maker, but he's the veto. A bad outcome for Marcus is recommending a vendor who later turns out to have a "custom integration project" price tag attached, or a security posture that fails the audit.

Goals for this session

  1. Understand what integration options Origin offers — specifically whether SAP and existing WMS connections are standard or bespoke

  2. Find Origin's security posture, certifications, and data residency information — he needs to complete a vendor risk assessment form

  3. Determine whether customers get API access or if everything goes through a managed integration service

What matters to them

  • Technical specifics, not marketing language: "seamless integration" means nothing; supported protocols, connectors, and EDI formats mean everything

  • Security documentation: ISO certifications, data residency, penetration testing cadence — he'll need to screenshot these for the audit

  • Honest scoping of custom work: What's in the standard product vs. what requires a project? He's been burned by scope creep before

  • The ability to do his own research: He'll avoid sales calls until he has a clear picture; a demo request is a last resort

Behaviours to simulate

  • Go directly to the Products > Integration page first

  • Look for protocol/standard/connector lists on integration pages

  • Navigate to About > Security looking for certification badges or download links

  • Search for API documentation or developer resources

  • Check footer and nav for a "developers" or "docs" link

  • Read carefully for "managed service" vs. "self-service" language around integration

Success criteria

  • Finds the Integration product page and understands the service model

  • Locates the Security page and identifies any certifications

  • Gets a clear answer on whether API access is available to customers

  • Can partially fill out a vendor risk assessment

Red flags to watch for

  • Integration page is vague about protocols and connectors

  • No security page or certification information

  • No distinction between what's standard and what requires scoping

  • API or developer documentation absent or behind a contact wall

  • Security certification links broken or generic
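The persona above lives in a Markdown file, but its sections map naturally onto a typed structure. The sketch below is one hypothetical way to model it and to check that no section is empty before a run starts. The Persona interface, the field names, and the missingSections helper are illustrative assumptions, not part of our actual setup, and the Marcus entries are shortened from the text above.

```typescript
// Hypothetical model of a persona file; field names mirror the headings above.
interface Persona {
  name: string;
  role: string;
  company: string;
  location: string;
  goals: string[];
  behaviours: string[];
  successCriteria: string[];
  redFlags: string[];
}

// Condensed version of Marcus, taken from the persona description.
const marcus: Persona = {
  name: "Marcus",
  role: "IT Systems Manager",
  company: "Multi-entity logistics group",
  location: "Auckland, New Zealand",
  goals: [
    "Understand whether SAP and WMS connections are standard or bespoke",
    "Find security posture, certifications, and data residency information",
    "Determine whether customers get API access",
  ],
  behaviours: ["Go directly to the Products > Integration page first"],
  successCriteria: ["Locates the Security page and identifies any certifications"],
  redFlags: ["Integration page is vague about protocols and connectors"],
};

// Returns the names of list sections that are still empty, so an incomplete
// persona can be caught before it is handed to the agent.
function missingSections(persona: Persona): string[] {
  const sections: Array<[string, string[]]> = [
    ["goals", persona.goals],
    ["behaviours", persona.behaviours],
    ["successCriteria", persona.successCriteria],
    ["redFlags", persona.redFlags],
  ];
  return sections.filter(([, items]) => items.length === 0).map(([label]) => label);
}

console.log(missingSections(marcus));
```

A check like this is cheap, and it guards the part of the process that most affects result quality: the persona itself.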

 

We’ll test whether Marcus can reach his goals: learning whether SAP and WMS connections are a ‘default’ or a ‘bespoke’ option, completing a vendor risk assessment, and finding out whether our API is accessible. With specific goals in hand, let’s put our current website to the test.

MCP set up

Before starting the test, it's worth confirming that both MCPs are running correctly so the agent can browse through the project. With chrome-devtools and playwright both confirmed as live, the agent has access to both - meaning we can perform the user testing process and follow up with automated QA testing.

Approach

Since working with LLMs is inherently non-deterministic, we need to provide sufficient guidelines and structure to ensure consistent output. To do this, we created two markdown files.

 

Both files contain instructions regarding the workflow. The synthetic user testing process is highly structured to ensure consistent, actionable results. It begins by defining a clear persona and real-world task scenarios (Goals). The core execution involves a "think-aloud" session, where the agent narrates the persona's inner monologue while navigating the product at both desktop and mobile viewports.

Friction is classified by severity (Goal-blocking 🔴, Goal-friction 🟠) and documented in a report that prioritises actionable recommendations. The process concludes by generating two separate engineering artefacts: Playwright tests for automated regression coverage on critical findings, and an automated accessibility audit using axe-core across all tested pages.
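To make the severity classification concrete, here is a small sketch of how findings could be typed and ordered so that goal-blocking issues always lead the report. The Finding shape and the helper names are assumptions for illustration. The G1 entry reflects the outcome of the Marcus run, while the G3 severity is assigned here purely as an example.

```typescript
// The two severity levels used in the report.
type Severity = "goal-blocking" | "goal-friction";

// Hypothetical shape of a single finding.
interface Finding {
  goal: string; // goal label, e.g. "G1"
  severity: Severity;
  summary: string;
}

const SEVERITY_ICON: Record<Severity, string> = {
  "goal-blocking": "🔴",
  "goal-friction": "🟠",
};

// Lower rank sorts first, so goal-blocking findings lead the report.
const SEVERITY_RANK: Record<Severity, number> = {
  "goal-blocking": 0,
  "goal-friction": 1,
};

// Returns a sorted copy; the input array is left untouched.
function prioritise(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
}

function formatFinding(finding: Finding): string {
  return `${SEVERITY_ICON[finding.severity]} ${finding.goal}: ${finding.summary}`;
}

const sampleFindings: Finding[] = [
  { goal: "G1", severity: "goal-friction", summary: "Unclear whether SAP is a standard connector" },
  // Severity below is illustrative only.
  { goal: "G3", severity: "goal-blocking", summary: "API access is not mentioned on the site" },
];

for (const finding of prioritise(sampleFindings)) {
  console.log(formatFinding(finding));
}
```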

The think-aloud method

The most useful insights from user tests often come to light using the think-aloud method: a simple approach that asks participants to verbalise their thoughts while using a product. The markdown file stresses the importance of first-person, present-tense narration - including honesty about confusion and reasoning for decision-making.

Workflow summary

  1. Persona defined (.md file in personas/)

  2. Chrome MCP live browser exploration, goal-first

  3. Friction report written to reports/

  4. Playwright .spec.ts written from report findings, saved to tests/personas/

  5. Accessibility audit run
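The last step of the workflow produces a list of accessibility violations. As a hedged sketch, the function below tallies affected elements by impact level so the report can lead with the most serious problems. The Violation type is a simplified subset of what an axe-core run returns (rule id, impact, affected nodes), and the sample data is invented for illustration rather than taken from the Origin audit.

```typescript
// Impact levels reported by axe-core for violations.
type Impact = "minor" | "moderate" | "serious" | "critical";

// Simplified subset of an axe-core violation entry (assumption: impact is always set).
interface Violation {
  id: string; // rule id, e.g. "color-contrast"
  impact: Impact;
  nodes: unknown[]; // one entry per affected element
}

// Counts affected elements per impact level across all violations.
function countByImpact(violations: Violation[]): Record<Impact, number> {
  const counts: Record<Impact, number> = { critical: 0, serious: 0, moderate: 0, minor: 0 };
  for (const violation of violations) {
    counts[violation.impact] += violation.nodes.length;
  }
  return counts;
}

// Invented sample data, not results from a real audit.
const sampleViolations: Violation[] = [
  { id: "color-contrast", impact: "serious", nodes: [{}, {}] },
  { id: "image-alt", impact: "critical", nodes: [{}] },
  { id: "label", impact: "serious", nodes: [{}] },
];

console.log(countByImpact(sampleViolations));
```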

What did the Synthetic User Testing run reveal?

The synthetic run for Marcus, the IT Systems Manager, provided immediate, valuable clarity on the self-service evaluation journey. While he ultimately needed a sales call to complete his vendor audit, the test identified several strong points and key areas for improvement.

On the positive side, Marcus easily found our verifiable ISO 27001:2022 certification. However, friction arose in two areas: the website was ambiguous about whether SAP integration was a standard connector or required a bespoke project, and the "fully managed service" messaging wrongly suggested that customers get no API access at all.

The agent did its job well - it recognised the earlier report without being specifically instructed to, and decided to start a fresh session. The browser opened at two different screen resolutions. The first issue came to light quickly: we do not mention API access on our website. The agent then moved on to the second goal - the security audit - found the security page, and cleared the ISO 27001 goal after independently verifying the JASANZ register link.

Within five minutes, the report was written.

Think-aloud excerpt: Goal 1

Goal 1: Understand what integration options Origin offers - specifically whether SAP and existing WMS connections are standard or bespoke

Outcome: 🟠 Goal-friction

"OK, I'm on the homepage. 'Logistics software for operators who refuse to compromise' - fine, I'm not here for the marketing. Let me find Integration." Marcus scans the nav - Products is right there. He opens the dropdown.

"Integration - yes, and the description says 'connect your systems, partners, and customers with our fully managed supply chain integration service.' Fully managed. I'll note that." He clicks Integration.

He reaches the connector ecosystem list. "Internal Systems: ERP (SAP, NetSuite), WMS, TMS, Forwarding. Good - SAP is there by name. WMS too. But it's a bullet point, not a connector catalogue. I don't know if SAP is a standard connector or if 'we've done it before, let's scope your project' is the actual answer."

He scrolls to the bottom. Contact form. "Of course. Nothing else here - no spec sheet, no connector list, no pricing. I'll look at Crossfire." He navigates to Crossfire's site, finds the SAP connector listed. "So SAP is a real connector, not a hypothetical. But I'm now on a completely different website and I still don't know what's pre-built versus what Origin would want to scope as a project."

Recommendations

Based on the Marcus audit, here are the prioritised fixes:

High impact

  • Add a "What's included vs. what's scoped" section to the Integration product page — even a simple table listing standard connectors vs. connectors requiring a project conversation would let evaluators like Marcus answer their core question without booking a demo

  • Add Security to the About dropdown nav — it currently exists only in the footer, meaning any user who navigates About > [looking for Security] hits a dead end and may not find the page at all

  • Specify the AWS data region on the Security page — "hosted in Amazon Web Services" with no region is an automatic audit gap for any enterprise vendor assessment in NZ or AU

Medium impact

  • Surface Crossfire's protocol details (REST, SOAP, JSON, XML, webhooks, EDI) on Origin's own Integration page — currently Marcus must navigate off-site and dig into Crossfire's FAQ to find protocol specifics that would answer G1 and G3 directly

  • Clarify on the Integration page that customers can self-manage API keys via the Crossfire Customer Portal — the "fully managed service" framing implies a black box when in fact there is a customer-accessible API layer

  • Add a NZ Privacy Act / Australian Privacy Act reference to the Security page alongside GDPR — the current compliance section cites GDPR but neither regional framework despite serving exclusively Australasian enterprise clients

Low impact

  • Replace "For details of our ISO 27001 certification, please contact us" with an unambiguous CTA — the link beside it already goes to the live JASANZ register, making the "contact us" instruction confusing and underselling the fact that the cert is independently verifiable right now

  • Add a named pen testing vendor and annual/biannual cadence detail to the Environment section — "frequent pen testing" is too vague for a vendor risk assessment form

What did we learn from running the proof of concept?

Fine-tuning of reporting is ongoing

The initial test runs quickly showed that LLM agents need ongoing refinement to ensure output is consistently useful. While the think-aloud method immediately pinpointed friction, raw outputs sometimes lacked the structured clarity needed for a busy engineering team. The lesson: continually fine-tune the agent's prompts to produce not just data, but highly specific and actionable analysis - so every test run delivers maximum value for sprint planning and design review.

A dedicated recommendations section changes everything

A key structural improvement was adding a 'Recommendations' section at the top of the final report. Initially, findings were buried deep within the goal-specific outcomes, making it difficult for busy stakeholders to grasp the high-impact fixes quickly. By introducing prioritised, action-oriented items - such as "Add Security to the About dropdown nav" - we created an easily scannable list that transformed the report from an audit document into a sprint-ready playbook.
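One way to enforce that structure is to assemble the report in code, so the recommendations always come first regardless of the order in which the agent produced them. The sketch below assumes a Markdown report with hypothetical section names. The Recommendation type and both render functions are illustrative, and the sample actions are shortened from the Marcus report.

```typescript
type Priority = "high" | "medium" | "low";

// Hypothetical shape of one recommendation in the report.
interface Recommendation {
  priority: Priority;
  action: string;
}

// Fixed display order: high impact first.
const PRIORITY_HEADINGS: Array<[Priority, string]> = [
  ["high", "High impact"],
  ["medium", "Medium impact"],
  ["low", "Low impact"],
];

// Groups recommendations under impact headings, skipping empty groups.
function renderRecommendations(recs: Recommendation[]): string {
  const lines: string[] = ["## Recommendations"];
  for (const [priority, heading] of PRIORITY_HEADINGS) {
    const matching = recs.filter((rec) => rec.priority === priority);
    if (matching.length === 0) continue;
    lines.push("", `### ${heading}`);
    for (const rec of matching) lines.push(`- ${rec.action}`);
  }
  return lines.join("\n");
}

// Puts the recommendations ahead of the detailed goal outcomes.
function renderReport(title: string, recs: Recommendation[], goalOutcomes: string): string {
  return [`# ${title}`, "", renderRecommendations(recs), "", "## Goal outcomes", "", goalOutcomes].join("\n");
}

const sampleRecs: Recommendation[] = [
  { priority: "low", action: "Name the pen testing vendor and cadence" },
  { priority: "high", action: "Add Security to the About dropdown nav" },
  { priority: "high", action: "Specify the AWS data region on the Security page" },
];

const sampleReport: string = renderReport("Marcus audit", sampleRecs, "G1: goal-friction");
console.log(sampleReport);
```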

Persona quality determines result quality

The utility of synthetic user testing is highly dependent on matching digital personas to real-life users and specific scenarios. Generalised testing yields general, low-value feedback. When we apply highly specific goals - like Marcus's need to find "AWS data residency" or "standard vs. bespoke integration" - the results immediately become powerful. This underscores the need for rigorous persona engineering at the outset of every project, ensuring scenarios are not just functional checks, but true reflections of actual high-value user behaviour.

When should design teams use Synthetic User Testing?

Synthetic User Testing is most valuable during the prototyping phase, before real-world user acceptance testing - when catching obvious friction early saves the most time and cost.

Its core value lies in providing a pre-flight check for your UX and UI by rapidly testing complex user journeys, validating flows, and catching obvious friction points in a fraction of the time. For Sandfield, this method is useful both internally and externally: it accelerates our design process by providing quick, validated answers from different perspectives during prototyping, and it ensures we deliver higher quality, functionally sound products to our clients through automated validation and accessibility audits.

The specific findings from the Marcus audit - run on our current Origin website - are now being actioned to directly inform the new website revamp, ensuring our new design addresses critical evaluator needs like security detail and integration clarity. This systematic approach ensures that by the time a product reaches real-world User Acceptance Testing, the majority of obvious friction has been smoothed out.

Furthermore, by using Playwright to turn key friction points into automated QA tests, we build a robust regression suite into the project from the very start.

Frequently asked questions

Is Synthetic User Testing a replacement for real user testing?

No. Think of it as the pre-flight check before you bring in the pilots. It catches the obvious friction so your real user sessions can be dedicated to exploring more complex, high-impact challenges.

Which tools do you need for Synthetic User Testing?

Any LLM with browser tool support works. We used both GitHub Copilot and Claude Code, configured with chrome-devtools-mcp and playwright-mcp.

How long does a Synthetic User Testing session take?

From prompt to full Markdown report, the Marcus session took under five minutes. Writing Playwright specs from the findings adds more time, but the initial audit is very fast. The key to executing this quickly is having the right persona and testing criteria set up before starting the test run.

How is Synthetic User Testing different from automated QA testing?

Automated QA testing checks whether functionality works. Synthetic User Testing checks whether a specific persona can achieve their goals. It's scenario-based and judgment-driven, not simply pass/fail.

When should you run Synthetic User Testing?

Ideally during the prototyping phase, early and often. The earlier friction is caught, the cheaper it is to fix - and the more focused your real user testing sessions can be.

Posted by Max van IJsselmuiden

Max van IJsselmuiden is our Senior UX/UI Designer and leads our design research and prototyping methods, helping our clients bring their innovative ideas to life. Max brings a wealth of design expertise to the Sandfield team, ensuring our digital solutions are both intuitive and deeply user-focused. When not delivering innovation to our customers, Max can be found embarking on motorcycle camping adventures, playing football, writing, or diving into video games.
