Modernization Made Easy: How AI-Powered UI Tests are Revolutionizing User Experience

The Make-or-Break Factor in Modernization: Ensuring a Stellar User Experience

Embarking on a modernization journey for legacy systems is a critical undertaking. While the allure of cutting-edge technology, enhanced scalability, and improved efficiency is undeniable, there’s one element that often determines the ultimate success or failure: the user interface.

Think about it. No matter how robust and lightning-fast the backend becomes, it’s the front-end that users interact with daily. It’s where trust is forged, and where adoption flourishes or falters. This is precisely why, when we talk about modernizing complex systems – whether it’s a grand Hyper Velocity Engineering initiative or a strategic shift from monoliths to microservices – ensuring a seamless user experience is paramount. This article delves into how we’ve been leveraging AI-powered tooling, specifically Stagehand in conjunction with Microsoft’s Playwright framework, to achieve just that.

Why Your Modernization Project Lives and Dies by UI Testing

At its core, modernization is about breathing new life into outdated systems. It’s about upgrading to contemporary technologies, frameworks, and architectures that can better serve today’s and tomorrow’s business demands. This often involves migrating applications from their aging platforms to the flexibility of the cloud, adopting agile modern frameworks, or deconstructing massive monolithic applications into smaller, more manageable microservices. The goal? To unlock greater scalability, build more resilient systems, and foster agility within your operations.

But here’s the crux of the challenge: while the technical underpinnings are being overhauled, what truly resonates with the end-user is their day-to-day interaction with the system. If familiar functionalities suddenly go haywire – perhaps navigation becomes a maze, data retrieval is a chore, or forms refuse to cooperate – it can swiftly erode the trust painstakingly built. This can, in turn, significantly slow down the adoption of the new, improved system.

This is where the unsung hero, UI testing, steps onto the stage. It acts as a vigilant guardian of the user experience, ensuring that users can continue to accomplish their tasks with ease and consistency. More importantly, it acts as an early warning system, catching those tricky integration issues that might slip through the cracks of other testing methodologies. In essence, UI testing is the bridge that ensures modernization delivers its promised benefits without sacrificing usability or customer confidence.

And this is precisely where the emergence of AI-powered testing tools has become a game-changer. These tools offer efficient ways to build and maintain comprehensive UI test coverage with less manual effort than traditional, hand-scripted approaches.

Our Evolution: A Tale of Playwright, Stagehand, and the Quest for Efficiency

Our journey into the realm of UI testing for modernization projects has been an evolutionary one, marked by a strategic adoption of cutting-edge tools.

Phase 1: The Playwright Foundation

Initially, our approach to UI testing was grounded in traditional Playwright scripting, with tests written and maintained by hand. Playwright is an exceptionally capable framework and the approach was effective, but it was inherently time-consuming. It also demanded intimate knowledge of element selectors and intricate user flows.

Phase 2: Embracing the AI Revolution with Stagehand

Seeking greater efficiency and a lower barrier to entry for test creation, we transitioned to Stagehand, an innovative automation tool powered by Large Language Models (LLMs). The promise of Stagehand was compelling: the ability to generate UI tests through simple, natural language instructions. Imagine describing a user scenario in plain English, and having an AI engine automatically translate that into executable test logic. This dramatically simplified the complexity of test creation.

Phase 3: The Hybrid Harmony – Stagehand Integrated with Playwright

As we explored Stagehand’s capabilities, a powerful synergy emerged. We discovered that Stagehand became most potent when integrated back into the Playwright ecosystem. The team learned to leverage Stagehand’s AI-generated test logic and then seamlessly convert it back into standard Playwright code.

This hybrid approach offered the best of all worlds. We gained the remarkable speed and simplicity of AI-driven test generation, while retaining the rock-solid reliability, sophisticated debugging capabilities, and the extensive ecosystem support that Playwright is renowned for.

This evolution reflects a broader trend: teams move from purely manual methods towards AI-powered generation, and ultimately settle on a hybrid model that blends the strengths of both. For our complex modernization projects, that hybrid model proved invaluable.

Understanding the Core Components: Playwright and Stagehand

To fully appreciate our journey, it’s essential to understand the tools at play.

What is Playwright?

Playwright is an open-source end-to-end testing framework developed by Microsoft. Its primary function is to enable reliable automation of modern web applications across Chromium, Firefox, and WebKit. For modernization projects, Playwright’s strength lies in its ability to validate user workflows consistently. It automatically waits for elements to be ready before interacting with them and provides detailed debugging information when tests fail. This makes it an indispensable tool when the goal is to migrate legacy systems while ensuring that the user experience remains unbroken.

Why Playwright Shines in Dev Containers

Adopting a Dev Container-based approach proved incredibly beneficial for rapidly bootstrapping our local development environments. This facilitated fast iteration cycles by ensuring that our environments were not only reproducible but also easily shareable across our distributed teams. This is particularly advantageous in modernization projects, where team members might be working with diverse legacy system configurations and require a consistent testing backdrop. This benefit, however, holds true regardless of the specific UI testing framework employed.

Yet, the introduction of Dev Containers also presented unique challenges, especially for browser-based UI testing. These hurdles necessitated careful consideration to ensure our tests remained reliable and our development workflow stayed as smooth as possible.

Effortless Test Generation with Playwright’s codegen

Playwright offers a built-in codegen feature, a powerful tool that allows you to record your interactions with an application instead of painstakingly writing every line of code manually.

The process is elegantly simple:

Initiate Recording: Run the command npx playwright codegen https://your-app-url.com.
Interact Naturally: This command opens a browser window where you can interact with your application as you normally would – clicking buttons, typing in fields, navigating through pages.
Automatic Scripting: As you perform these actions, Playwright intelligently auto-generates the corresponding test script in real-time.
Save and Refine: You can then save this generated test script and further refine it for your specific needs.

Ensuring Browser Visibility and Smooth Test Recording: The X11 Forwarding Setup

To run Playwright tests in ‘headed’ mode (where you can visually see the browser executing the tests), record tests using codegen, or debug tests visually, you’ll need to set up X11 forwarding. This might sound technical, but it’s crucial for a seamless visual testing experience.

WSL (Windows Subsystem for Linux): Fortunately, this often works out of the box, requiring no additional setup.
macOS: You’ll need to install XQuartz. You can do this via Homebrew (brew install --cask xquartz) or by downloading and installing it directly from https://www.xquartz.org/.
- Important: After installation, it’s a good idea to restart your computer.
Configure X11 Forwarding: Once XQuartz is set up, you’ll need to enable local connections to the X server. You can typically do this by running the following command in your terminal:
```
xhost +localhost
```

Deep Dives: Playwright’s Reporting and Tracing Capabilities

Playwright’s reporting and tracing features are invaluable for understanding precisely what happened during a test run. They provide deep insights into test execution, making debugging a far more efficient process.

HTML Reports: With the HTML reporter enabled (the configuration scaffolded by npm init playwright@latest turns it on), Playwright generates a detailed HTML report after every test run. The report displays test results and per-test durations and, when screenshots are configured, the screenshots captured at the point of failure.
- Example: Imagine a failed login test. The HTML report will clearly highlight the failing test case in red, present a screenshot of the login page exactly when it failed, and provide a stack trace pinpointing the exact assertion that didn’t pass. This visual clarity dramatically speeds up the identification and communication of test failures during those crucial modernization sprints.
Trace Viewer: This built-in tool is a detective’s dream. When tracing is enabled, it records every action performed during a test, including DOM snapshots, network requests, console logs, and timeline events. When a test fails, you can use the trace viewer to replay the exact sequence of events and inspect the application’s state at any given moment.
- Example: Picture a complex checkout flow. The trace viewer allows you to meticulously step through each browser action – clicks, navigations, data inputs – and examine the DOM, network calls, and console logs at each stage. This capability is absolutely critical for diagnosing those intricate integration issues that inevitably surface when migrating established legacy workflows to brand-new, modern architectures.
Performance Metrics: Beyond functional results, the reports include execution times for individual tests and test steps. This helps in identifying slow tests and steps that may point to performance bottlenecks in the application or in the test suite itself.
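The reporting and tracing features described above are controlled through the Playwright configuration file. The following is a minimal sketch of such a configuration. The specific values (one retry, on-first-retry, only-on-failure) are illustrative assumptions rather than the settings used in our project.

```typescript
// playwright.config.ts (sketch): HTML report, traces, and failure screenshots.
// A plain object keeps the example dependency-free; Playwright's defineConfig
// helper can wrap it to get type checking.
const config = {
  // Write an HTML report after each run.
  reporter: 'html',
  // Retry failed tests once, so that 'on-first-retry' tracing has a retry to record.
  retries: 1,
  use: {
    // Record a trace only when a failed test is retried, to limit overhead.
    trace: 'on-first-retry',
    // Capture a screenshot when a test fails.
    screenshot: 'only-on-failure',
  },
};

export default config;
```

After a run, npx playwright show-report opens the HTML report, and npx playwright show-trace followed by the path to a recorded trace file opens the Trace Viewer.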

The AI Surge: Unpacking Stagehand

Stagehand represents a significant leap forward in simplifying web testing through the power of AI.

What is Stagehand?

Stagehand is an AI-powered browser automation tool designed to make web testing incredibly accessible. It empowers users to write tests using intuitive, natural language instructions. Built upon the robust foundation of Playwright, Stagehand leverages advanced Large Language Models (LLMs) to interpret human-readable commands and translate them automatically into functional browser automation code.

Instead of wrestling with complex CSS selectors or intricate interaction logic, you can simply articulate your testing needs. Commands like ‘click the login button’ or ‘fill out the registration form with valid data’ are understood by Stagehand’s AI engine, which then handles the technical implementation behind the scenes.

This innovative approach drastically lowers the barrier to entry for UI testing. It makes the process accessible to team members who might not possess deep automation expertise, such as business analysts and QA professionals. Stagehand is particularly adept at accelerating modernization projects where stakeholders need to rapidly validate that critical workflows remain intact post-system changes, without requiring them to delve into the complexities of DOM manipulation or CSS selectors.

Stagehand’s Intuitive Core Functions

Stagehand offers three fundamental operations that make its usage remarkably straightforward:

ACT – Performing User Interactions: This function allows you to direct user interactions on the page using natural language. Forget about painstakingly locating specific CSS selectors. You simply describe the action, and Stagehand’s AI intelligently identifies the correct elements and executes the interaction.
EXTRACT – Retrieving Webpage Data: This capability lets you fetch data from a webpage based on descriptive instructions. Instead of manually querying DOM elements, you simply state what information you need, and Stagehand returns it in a usable format.
OBSERVE – Discovering Available Actions: This function inspects the current page and returns the elements or actions that match a natural language description, without executing them. It is useful for previewing what Stagehand would interact with, or for confirming that an expected element is present before proceeding.

Here’s a glimpse of Stagehand in action:

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand();
await stagehand.init();

// The AI methods are called on the page object here, matching the later
// examples in this post; their location varies between Stagehand versions.
const page = stagehand.page;

// ACT: Perform actions using natural language
await page.act("click the login button");
await page.act("type 'user@example.com' in the email field");

// EXTRACT: Get information from the page
const userInfo = await page.extract("get the user's name from the profile section");

// OBSERVE: List candidate elements or actions without executing them
const candidates = await page.observe("find the navigation links on the dashboard");

The Azure OpenAI Integration Challenge: Bridging the Gap

One notable challenge we encountered with Stagehand, at the time of writing this post, was its lack of direct, out-of-the-box integration with Azure OpenAI. For organizations that rely on Azure OpenAI for its enterprise-grade security, compliance, and data governance features, this posed a hurdle.

To overcome this, our team implemented a custom LLMClient. This client was designed to interface with enterprise Azure OpenAI deployments, allowing Stagehand to leverage these powerful backend services. The implementation involves creating a class that extends Stagehand’s LLMClient and implements its createChatCompletion method.

import { Stagehand, LLMClient, AvailableModel, CreateChatCompletionOptions } from "@browserbasehq/stagehand";
import { ChatCompletion } from "openai/resources/chat/completions";
import { LanguageModel } from "ai"; // LanguageModel is a type from the AI SDK core package
import { createAzure } from "@ai-sdk/azure"; // Assuming you use the AI SDK's Azure OpenAI provider

export class AISdkClient extends LLMClient {
  private model: LanguageModel;

  constructor({ model }: { model: LanguageModel }) {
    super(model.modelId as AvailableModel);
    this.model = model;
  }

  async createChatCompletion<T = ChatCompletion>({
    options,
  }: CreateChatCompletionOptions): Promise<T> {
    // Format messages and handle Azure-specific logic here.
    // This would involve constructing the correct payload for Azure OpenAI.
    // ... implementation details specific to Azure OpenAI API interaction
    throw new Error("createChatCompletion is left unimplemented in this excerpt");
  }
}

// Custom Azure OpenAI client configuration (placeholder resource and deployment names)
const azure = createAzure({
  resourceName: "your-azure-endpoint", // i.e. https://your-azure-endpoint.openai.azure.com
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  apiVersion: "2024-02-15-preview",
});
const azureModel = azure("your-model"); // the Azure deployment name

const aiSdkClient = new AISdkClient({ model: azureModel });

const stagehand = new Stagehand({
  llmClient: aiSdkClient, // Use our custom client
  // other configuration options
});

This custom integration, while requiring additional setup and configuration, proved essential for unlocking the full potential of Stagehand within our enterprise Azure environments.

Embracing the Hybrid Approach: A Smarter Way to Test

Our exploration led us to a powerful hybrid strategy that optimizes both speed and reliability.

Experimenting with a Stagehand-First Approach

We embarked on an experiment with a "Stagehand-first" methodology. The primary goal was to accelerate the test creation process and, crucially, democratize test writing across the entire team. The workflow begins with individuals crafting tests using Stagehand’s intuitive natural language format.

This approach empowers team members without extensive coding backgrounds – such as business analysts and QA professionals – to contribute comprehensive test scenarios. Once these Stagehand tests are validated and confirmed to be working correctly, the team then converts them into standard Playwright code for seamless integration into our production environments.

This methodology offers several significant advantages:

Accelerated Test Creation: Natural language tests are inherently quicker to write and modify, leading to faster validation cycles.
Enhanced Team Accessibility: Individuals without deep coding expertise can now actively contribute to test coverage, fostering a collaborative testing culture.
Cost Optimization: Once converted to Playwright, the tests run without continuous LLM API calls, significantly reducing operational costs.
Seamless Maintainability: Standard Playwright code integrates effortlessly with our existing CI/CD pipelines, ensuring smooth deployment and maintenance.

The Architecture: A Visual Flow from Stagehand to Playwright

To illustrate this hybrid workflow, consider this architectural representation:

graph LR
    A[Business User/QA Engineer] --> B{Writes natural language tests in Stagehand};
    B --> C{Validates tests in Stagehand};
    C --> D{Converts to Playwright Code};
    D --> E[Production CI/CD Pipeline];

    F[Natural Lang. Test Scenarios] --> G[Stagehand Test Execution];
    G --> H[Quick Validation & Iteration];
    H --> I[Playwright Test Code Generation];
    I --> E;

This visual demonstrates how we maximize team productivity while ensuring that our production-ready test automation for modernization projects is both cost-effective and highly reliable.

Streamlining Conversion: Stagehand to Playwright with GitHub Copilot

While the speed and accessibility of Stagehand are undeniable, running LLM-driven tests on every execution can incur substantial costs due to token consumption. Therefore, we employ Stagehand strategically, particularly for its ability to enable non-technical team members with deep product knowledge to create tests. This significantly lowers the barrier to contribution.
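To make the cost argument concrete, here is a back-of-the-envelope sketch of how token consumption scales when every CI run goes through the LLM. Every number in it is a hypothetical placeholder, not a measurement from our project, and the function name and input fields are ours.

```typescript
// All figures are hypothetical placeholders, chosen only to show the arithmetic.
interface LlmCostInputs {
  llmCallsPerTest: number;        // act/extract/observe calls in one test
  tokensPerCall: number;          // prompt plus completion tokens per call
  pricePerMillionTokens: number;  // blended price in USD
  testsInSuite: number;
  runsPerMonth: number;           // e.g. CI executions of the full suite
}

// Monthly cost = tokens per suite run * runs per month * price per token.
function monthlyLlmCostUsd(i: LlmCostInputs): number {
  const tokensPerRun = i.llmCallsPerTest * i.tokensPerCall * i.testsInSuite;
  return (tokensPerRun * i.runsPerMonth * i.pricePerMillionTokens) / 1_000_000;
}

const stagehandEveryRun = monthlyLlmCostUsd({
  llmCallsPerTest: 10,
  tokensPerCall: 5_000,
  pricePerMillionTokens: 2,
  testsInSuite: 50,
  runsPerMonth: 200,
});

console.log(stagehandEveryRun); // prints 1000
```

Under these assumed inputs the suite consumes 2.5 million tokens per run. Once the tests are converted to plain Playwright, the per-run LLM cost drops to zero and only the one-time authoring cost remains.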

Our process for converting Stagehand tests to Playwright code has been greatly streamlined using GitHub Copilot. This is achieved through the use of carefully crafted prompts and by leveraging detailed information logs generated during Stagehand test execution.

By enabling verbose logging in Stagehand, we can capture the underlying browser interactions and element identifications performed by the AI. These logs, combined with XPath validations, provide GitHub Copilot with the essential context needed to generate accurate Playwright test code. In essence, Stagehand acts similarly to Playwright’s codegen feature, but instead of just raw clicks and inputs, it produces more structured, semantically clear steps that translate into robust, repeatable Playwright tests.
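As a rough illustration of how logged interactions can become structured input for the conversion, the sketch below parses log lines into a list of actions. The log entry shape used here is hypothetical and deliberately simplified. Stagehand’s real log output is structured differently and varies by version, so the field names would need adapting.

```typescript
// Hypothetical, simplified log entry. Real Stagehand logs use a different structure.
interface LoggedAction {
  instruction: string;       // the natural language step
  method: 'click' | 'fill';  // the browser action that was performed
  xpath: string;             // the element the AI resolved
  value?: string;            // the text typed, for fill actions
}

// Type guard: accept only entries that carry everything the conversion needs.
function isLoggedAction(entry: unknown): entry is LoggedAction {
  if (typeof entry !== 'object' || entry === null) return false;
  const record = entry as Record<string, unknown>;
  return (
    typeof record.instruction === 'string' &&
    (record.method === 'click' || record.method === 'fill') &&
    typeof record.xpath === 'string' &&
    (record.value === undefined || typeof record.value === 'string')
  );
}

// Keep the JSON lines that describe an action; skip everything else.
function parseActionLogs(lines: string[]): LoggedAction[] {
  const actions: LoggedAction[] = [];
  for (const line of lines) {
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // not JSON, e.g. a plain status message
    }
    if (isLoggedAction(entry)) actions.push(entry);
  }
  return actions;
}
```

Filtering strictly at this stage keeps noise out of the context handed to GitHub Copilot, which matters because the quality of the generated code depends on the quality of that context.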

While direct recording-to-test creation is feasible, this Stagehand-first approach offers a superior balance, providing accessibility for non-technical contributors while simultaneously yielding stronger automation outcomes.

A Concrete Example: Stagehand to Playwright Conversion in Action

Let’s walk through a real-world example to illustrate the conversion process:

Stagehand Test Code Snippet:

await test.step('Navigate to application', async () => {
  console.log("Navigating to SpringDashboard application...");
  await page.goto("http://localhost:8080/dashboard");
  await page.waitForTimeout(2000);
});

await test.step('Perform login', async () => {
  const currentUrl = page.url();
  console.log("Current URL:", currentUrl);

  // If not already on login page, try to find and click login link
  if (!currentUrl.includes('login') && !currentUrl.includes('Login')) {
    console.log("Looking for login link...");
    await page.act("Click on the login link or navigate to login page");
    await page.waitForTimeout(1000);
  }

  // Perform login
  console.log(`Attempting to login with username: ${TEST_USER.username}`);
  await page.act(`Enter '${TEST_USER.username}' in the username field`);
  await page.waitForTimeout(500);
  console.log("Entering password...");
  await page.act(`Enter '${TEST_USER.password}' in the password field`);
  await page.waitForTimeout(500);
  console.log("Clicking login button...");
  await page.act("Click the login or submit button");

  // Wait for navigation after login
  await page.waitForTimeout(3000);
});

await test.step('Verify login success', async () => {
  const postLoginUrl = page.url();
  console.log("URL after login attempt:", postLoginUrl);

Converted Playwright Code Snippet:

test('Complete CRUD workflow: Login, Create and Delete Customer', async ({ page }) => {
  await test.step('Navigate to application', async () => {
    console.log("Navigating to SpringDashboard application...");
    await page.goto("http://localhost:8080/dashboard");
    await page.waitForTimeout(2000);
  });

  await test.step('Perform login', async () => {
    // Wait for login page to load
    await page.waitForURL('**/showMyLoginPage');
    console.log("Current URL:", page.url());

    // Fill username using exact selector from Stagehand logs
    console.log(`Entering username: ${TEST_USER.username}`);
    await page.locator('xpath=/html[1]/body[1]/div[1]/div[1]/div[1]/form[1]/div[2]/input[1]').fill(TEST_USER.username);

    // Fill password using exact selector from Stagehand logs
    console.log("Entering password...");
    await page.locator('xpath=/html[1]/body[1]/div[1]/div[1]/div[1]/form[1]/div[3]/input[1]').fill(TEST_USER.password);

    // Click login button using exact selector from Stagehand logs
    console.log("Clicking login button...");
    await page.locator('xpath=/html[1]/body[1]/div[1]/div[1]/div[1]/form[1]/div[4]/button[1]').click();

    // Wait for navigation after login
    await page.waitForURL('**/customer/list');
  });

Exploring Semantic Selectors: A Deeper Dive

While the XPath approach proved reliable for conversion, the team also explored using Playwright’s semantic selectors, like getByRole, for more maintainable and readable tests. This approach leverages accessibility attributes and semantic naming conventions, offering a more robust alternative to brittle XPath selectors.

However, this semantic selection strategy, while excellent for developer-written tests, requires a deeper understanding of accessibility attributes and naming patterns. This makes it less suitable for automated conversion directly from Stagehand logs, as Stagehand’s generated logs might not always capture the precise semantic information needed for these advanced selectors.

An example of using semantic selectors:

await test.step('Navigate to application', async () => {
  console.log("Navigating to SpringDashboard application...");
  await page.goto("http://localhost:8080/dashboard");
  await page.waitForLoadState('networkidle');
});

await test.step('Perform login', async () => {
  console.log(`Attempting to login with username: ${TEST_USER.username}`);

  // Wait for login page to load
  await expect(page).toHaveURL(/.*login.*/i);

  // Fill username - using getByRole for semantic selection
  const usernameField = page.getByRole('textbox', { name: /username/i })
    .or(page.locator('input[name="username"]'))
    .or(page.locator('input[type="text"]').first());
  await usernameField.fill(TEST_USER.username);

  // Fill password - password inputs do not expose the textbox role,
  // so select by label instead of getByRole('textbox')
  const passwordField = page.getByLabel(/password/i)
    .or(page.locator('input[name="password"]'))
    .or(page.locator('input[type="password"]'));
  await passwordField.fill(TEST_USER.password);
});

This is the core trade-off: semantic selectors produce more maintainable and readable tests, but they are harder to derive automatically from Stagehand logs than the XPath selectors shown earlier.
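To illustrate what automated conversion would need in order to emit semantic selectors, here is a sketch that derives a locator expression from accessibility metadata and falls back to XPath when that metadata is missing. The ElementInfo shape and the suggestLocator function are hypothetical. They describe information we would want the logs to carry, not something Stagehand is known to provide.

```typescript
// Hypothetical element metadata that a conversion step would need as input.
interface ElementInfo {
  tag: string;              // e.g. 'input', 'button', 'a'
  inputType?: string;       // value of the type attribute, for inputs
  accessibleName?: string;  // label text or aria-label, if known
  xpath: string;            // fallback selector taken from the logs
}

// Returns the source text of a Playwright locator expression for the element.
function suggestLocator(el: ElementInfo): string {
  const name = el.accessibleName;
  if (!name) {
    // Without an accessible name there is nothing semantic to target.
    return `page.locator(${JSON.stringify('xpath=' + el.xpath)})`;
  }
  const quoted = JSON.stringify(name);
  if (el.tag === 'button') return `page.getByRole('button', { name: ${quoted} })`;
  if (el.tag === 'a') return `page.getByRole('link', { name: ${quoted} })`;
  if (el.tag === 'input') {
    const isPlainText =
      el.inputType === undefined || el.inputType === 'text' || el.inputType === 'email';
    // Password and other input types do not expose the textbox role; use the label.
    return isPlainText
      ? `page.getByRole('textbox', { name: ${quoted} })`
      : `page.getByLabel(${quoted})`;
  }
  return `page.getByText(${quoted})`;
}

console.log(suggestLocator({ tag: 'button', accessibleName: 'Login', xpath: '/html[1]/body[1]/button[1]' }));
// prints: page.getByRole('button', { name: "Login" })
```

The fallback branch shows why XPath remains the default output of log-driven conversion: it is the one piece of information that is always available.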

The Conversion Process Unveiled

The conversion process using Stagehand logs and GitHub Copilot typically involves these steps:

Run Stagehand with Verbose Logging: Execute Stagehand tests with detailed logging enabled to capture all browser interactions and element information.
Extract Logged Data: Parse the logs to extract valuable information, including XPath selectors and element validation patterns.
Prompt GitHub Copilot: Feed the extracted logs and validation requirements into GitHub Copilot using a specific prompt.
Generate Playwright Code: Allow Copilot to generate equivalent Playwright code that precisely replicates the behavior of the original Stagehand test.

This systematic approach ensures that the converted Playwright tests maintain the same level of reliability and accuracy as the original Stagehand tests, while simultaneously offering the performance and cost advantages of native Playwright execution.
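The prompting step can be sketched as a small function that assembles the extracted steps into a single block of text for GitHub Copilot. The ConversionStep shape, the function name, and the instruction wording are illustrative assumptions, not the exact prompt or tooling our team used.

```typescript
// Hypothetical shape of one step extracted from the Stagehand logs.
interface ConversionStep {
  instruction: string;  // the original natural language step
  action: string;       // e.g. 'click' or 'fill'
  xpath: string;        // selector captured in the logs
}

// Assemble the extracted steps into one prompt string.
function buildConversionPrompt(testName: string, steps: ConversionStep[]): string {
  const stepLines = steps.map(
    (s, i) => `${i + 1}. "${s.instruction}" -> ${s.action} on xpath=${s.xpath}`
  );
  const lines = [
    `Generate a Playwright test named "${testName}".`,
    'Use the element selectors captured in the Stagehand logs below.',
    'Wrap each step in test.step and wait on URLs or elements instead of fixed timeouts.',
    '',
    'Captured steps:',
  ].concat(stepLines);
  return lines.join('\n');
}

const examplePrompt = buildConversionPrompt('Login', [
  {
    instruction: 'Click the login or submit button',
    action: 'click',
    xpath: '/html[1]/body[1]/div[1]/div[1]/div[1]/form[1]/div[4]/button[1]',
  },
]);
console.log(examplePrompt);
```

Numbering the steps and pairing each instruction with its selector gives Copilot an explicit mapping from intent to element, which is the context the conversion depends on.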

The GitHub Copilot Conversion Experience: Remarkable Accuracy

Our experience with GitHub Copilot for this conversion has been overwhelmingly positive. A typical prompt used was:

“Generate Playwright tests using the logging detailed information by running Stagehand tests. Convert the natural language actions to specific element selectors and interactions based on the captured browser automation logs.”

The results? The generated Playwright code was approximately 85% correct, requiring only minor adjustments and fine-tuning. The element selectors, test logic, and overall flow were accurately translated from the Stagehand logs, showcasing Copilot’s impressive contextual understanding.

Evaluating the Stagehand-to-Playwright Approach: The Good, the Bad, and the Nuanced

Like any powerful methodology, the Stagehand-to-Playwright hybrid approach comes with its own set of advantages and challenges.

Pros:

Rapid Prototyping: Natural language tests allow for extremely quick validation of test scenarios, accelerating the feedback loop.
Democratized Testing: This approach empowers individuals without coding expertise to contribute significantly to test coverage, fostering a more inclusive testing environment.
Cost Optimization: Production-ready Playwright tests run without ongoing LLM API costs, leading to substantial savings.
Faster Development Cycles: The time from conceiving a test scenario to its full implementation is dramatically reduced.
Improved Collaboration: Business stakeholders can directly contribute to defining and validating test scenarios, bridging the gap between business needs and technical implementation.

Cons and Challenges:

Additional Conversion Step: The process necessitates a manual or semi-automated conversion step from Stagehand to Playwright code.
Learning Curve: Teams need to develop a foundational understanding of both Stagehand and Playwright to leverage the full benefits of the hybrid approach.
Maintenance Overhead: Changes made in one tool might require corresponding updates in the other, adding a layer of maintenance complexity.
Limited for Complex Scenarios: Some advanced testing patterns, particularly those involving dialogs outside the standard DOM, might not translate perfectly or easily, requiring more manual intervention.

Technical Considerations: Navigating the Nuances

XPath Considerations: Precision vs. Fragility

Advantages: XPath selectors, derived from Stagehand logs, offer precise element targeting and flexibility in traversing complex DOM structures. They also provide a clear path from natural language intent to an exact element selection.
Disadvantages: XPath selectors can be quite fragile and prone to breaking with even minor UI changes. They can also be slower than CSS selectors and are often difficult to read and modify when they become long and complex.
Recommendation: While XPath selectors derived from Stagehand logs are excellent as a starting point for conversion, it’s best practice to refactor them to more robust CSS selectors or data attributes for production tests whenever feasible. This enhances maintainability and reduces the likelihood of test failures due to DOM structural changes.
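One way to act on this recommendation is to flag the converted selectors that most need refactoring. The sketch below is a simple heuristic of our own devising, not a Playwright or Stagehand feature: it treats an absolute XPath with several positional indexes as brittle. The threshold of three indexes is an arbitrary assumption.

```typescript
// Heuristic: absolute paths from /html with many positional indexes
// tend to break when the page layout changes.
function isBrittleXPath(xpath: string): boolean {
  const isAbsolute = /^\/html\b/.test(xpath);
  const positionalSteps = (xpath.match(/\[\d+\]/g) ?? []).length;
  return isAbsolute && positionalSteps >= 3; // threshold is an arbitrary choice
}

// Scan test source text for xpath= selectors and return the brittle ones.
// The pattern stops at the first quote character, which is sufficient for
// the absolute paths this check is looking for.
function findBrittleXPaths(testSource: string): string[] {
  const pattern = /xpath=([^'"`]+)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(testSource)) !== null) {
    if (isBrittleXPath(match[1])) found.push(match[1]);
  }
  return found;
}

const sampleSource =
  "await page.locator('xpath=/html[1]/body[1]/div[1]/form[1]/div[2]/input[1]').fill(name);";
console.log(findBrittleXPaths(sampleSource).length); // prints 1
```

A check like this could run as part of code review for converted tests, so that refactoring effort goes to the selectors most likely to fail first.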

Alternative Prototype Approach: Stagehand in Playwright Record Mode

As a prototype, we also explored running Stagehand directly within Playwright’s record mode. The idea was to automatically capture the generated test code. However, this approach presented challenges, primarily due to spawning two browser instances simultaneously, which complicated environment management.

While this method shows promise for achieving direct code generation without a manual conversion step, it necessitates significant one-time engineering effort. This effort is required to properly coordinate the dual-instance setup and to manage the synchronization between Stagehand’s AI-driven actions and Playwright’s recording mechanism. Despite the added complexity, this remains a valuable area for future exploration, especially for teams aiming to fully automate the conversion process.

Conclusion: A Smarter Path to Seamless User Experiences

The hybrid Stagehand-to-Playwright approach presents a compelling and highly effective solution for modernization projects. It directly addresses the critical needs for rapid test creation and broad team accessibility, which are often the cornerstones of successful digital transformations.

While this methodology introduces a degree of complexity, the profound benefits of democratized test writing – empowering more team members to contribute – coupled with significant cost optimization, make it an incredibly valuable strategy. For teams striving to accelerate their UI testing workflows while steadfastly maintaining production-ready automation, this hybrid approach offers a clear and intelligent path forward, ensuring that the user experience remains not just functional, but truly exceptional, throughout the modernization journey.