
Project Overview
Context
Optimizely is a B2B SaaS company that provides digital experience optimization tools. Its Feature Experimentation (FX) platform helps businesses run A/B tests without modifying source code.
However, non-technical users found the platform unintuitive and difficult to navigate, which risked slowing adoption.
As part of a client-sponsored project through UW HCDE, I worked with a 5-person team to investigate these challenges. Partnering closely with Optimizely’s product manager, we aligned on business goals and identified usability barriers in the A/B test setup flow.
Our goal was to uncover where non-technical users struggled most and deliver actionable design recommendations to improve satisfaction and drive adoption.
My Contribution
Research leadership: Designed the test protocol, moderated usability sessions, and synthesized findings.
Design exploration: Ideated and illustrated solutions based on research insights to support new user adoption and satisfaction.
Client collaboration: Acted as the main point of contact and presented the final report to stakeholders.
Client
Optimizely
Role
UX Researcher & Designer
Timeline
8 Weeks (2025)
Team
4 Researchers
1 Product Manager
Skills
User Test Plan & Kit Development
Remote Usability Test Moderation
Data Analysis (Quant + Qual)
Client Communication
Impact
14 issues identified
From moderated usability tests and affinity mapping, we uncovered 14 issues across global and feature-level UX, categorized into 5 areas of improvement.
Prioritized insights & designs
We delivered clear, prioritized insights and design recommendations backed by data. The product team plans to use them in future redesigns.
Design Exploration Preview
Problem
A core product, but low satisfaction
While FX is one of Optimizely's most-used products, it was consistently described as confusing, especially by less technical users such as product managers and marketers. Many struggled to navigate the tool and launch A/B tests with confidence.
400+
users
FX is one of the most-used screens in the product.
90%
usability pain points
In a recent product satisfaction survey, 90% of feedback on FX focused on poor usability.
Project Goal
Uncover friction points and improve adoption
Our goal was to identify specific usability pain points and propose clear design fixes to improve adoption, reduce support needs, and help users feel more confident using FX.
Defining the research scope
After aligning with the client on the business needs and problems, I facilitated the scoping discussion, narrowing our focus to usability issues in the FX interface, especially for non-technical users. We organized our research around the following key questions:
Key research questions
How intuitive is FX for setting up and managing A/B tests?
What challenges and frustrations do users face when navigating and using FX?
How well do users understand the purpose and functionality of the FX interface while running A/B tests?
Understanding the Users
Our research target
We focused on non-technical users with little to no FX experience but some familiarity with A/B testing. This profile reflects FX’s target audience—business roles expected to run tests without engineering help. Prior A/B testing knowledge helped us ensure issues we uncovered were due to the interface, not domain unfamiliarity.
Understanding the product
Familiarizing with the FX platform
To prepare for the user study, we onboarded ourselves through a tool walkthrough and an interaction-mapping exercise.
FX Interaction Map
As a non-technical first-time user, I faced a key obstacle during onboarding: the FX interface felt complex, and I lacked domain knowledge in A/B testing. Concepts like “flags,” “variants,” and “variables” were unfamiliar. Even understanding how a typical A/B test runs wasn’t straightforward.
To overcome this, I reviewed developer documentation, initiated team discussions, and asked clarifying questions in client meetings.
This fast-tracked my understanding of both the platform and its terminology. More importantly, it helped me build empathy for our target users, many of whom would face similar hurdles. It also shaped how I approached task design later in the study.
Plan and Conduct the Study
Design broad, scenario-driven tasks to reflect real workflows
I collaborated with the PM to design scenario-driven tasks that mimicked real-world use. Because FX has no standard “happy path,” I avoided prescriptive task flows. Instead, I made tasks broad enough for users to explore naturally, as they would in real use.
Conduct remote usability test
Moderated usability tests with qualitative + quantitative data
We ran remote moderated usability tests with 7 participants, ranging from product managers to content strategists. I moderated 3 sessions.
Each session followed a consistent flow and combined both quantitative metrics and qualitative observations. This helped us uncover pain points, measure severity, and understand user behavior in context.
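To illustrate the kind of quantitative signal we collected alongside the qualitative notes, the TypeScript sketch below aggregates per-task results into a completion rate and average time on task. The task names and numbers are placeholders, not study data.

```typescript
// Hypothetical per-task session results, used only to illustrate how
// quantitative metrics can be summarized alongside qualitative notes.
interface TaskResult {
  task: string;
  completed: boolean;
  timeOnTaskSec: number;
}

const results: TaskResult[] = [
  { task: "Create an A/B test", completed: true, timeOnTaskSec: 310 },
  { task: "Create an A/B test", completed: false, timeOnTaskSec: 540 },
  { task: "Adjust traffic allocation", completed: true, timeOnTaskSec: 120 },
];

// Summarize completion rate and average time for one task.
function summarize(task: string) {
  const rows = results.filter((r) => r.task === task);
  const completionRate = rows.filter((r) => r.completed).length / rows.length;
  const avgTimeSec = rows.reduce((sum, r) => sum + r.timeOnTaskSec, 0) / rows.length;
  return { completionRate, avgTimeSec };
}

console.log(summarize("Create an A/B test"));
// -> { completionRate: 0.5, avgTimeSec: 425 }
```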
Research Findings
Research Findings & Prioritization
14 challenges identified, prioritized by user severity and product impact
Through the research, we identified 14 usability issues spanning both global and feature-specific aspects. They fall into five categories: 1) information hierarchy, 2) functionality, 3) status visibility, 4) affordance, and 5) terminology.
I then rated each finding on two dimensions, helping Optimizely prioritize its efforts across the issues:
Severity → captures the immediate pain for the user during tasks (how hard it is to succeed).
Impact → captures the longer-term product and business consequences (trust, misuse, adoption).
Severity x Impact Prioritization Matrix
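To show how the two dimensions combine, here is a minimal TypeScript sketch that buckets findings into priority tiers from their severity and impact ratings. The issue names echo findings described in this study, but the scores and tier labels are hypothetical, not the actual study ratings.

```typescript
// Minimal sketch of severity x impact prioritization.
// Ratings and tier names are hypothetical, for illustration only.
type Rating = 1 | 2 | 3; // 1 = low, 2 = medium, 3 = high

interface Finding {
  issue: string;
  severity: Rating; // immediate pain for the user during tasks
  impact: Rating;   // longer-term product and business consequences
}

const findings: Finding[] = [
  { issue: "Unexplained terminology (e.g. 'flag')", severity: 3, impact: 3 },
  { issue: "Environment label easy to overlook", severity: 2, impact: 3 },
  { issue: "Concluded tests mixed with active ones", severity: 2, impact: 2 },
];

// Map each finding onto a quadrant of the matrix.
function tier(f: Finding): string {
  if (f.severity >= 3 && f.impact >= 3) return "Fix first";
  if (f.severity >= 3 || f.impact >= 3) return "Plan next";
  return "Monitor";
}

for (const f of findings) {
  console.log(`${tier(f).padEnd(9)} | ${f.issue}`);
}
```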
Our research surfaced system-level challenges and opportunities. Below are some of the design solutions I explored to address them. They show how targeted improvements can reduce friction across major flows, support onboarding, and improve usability for both new and returning users.
1. Make system language accessible.
Across the platform, users struggled with abstract system terms like “flag” and “rule”. Without context, they hesitated or asked for clarification before proceeding. This lack of clarity slowed setup and created avoidable errors for non-technical users.
Hoverhelp tooltip, with documentation links embedded.
I proposed a low-effort, short-term solution: surfacing definitions directly in the product. Instead of inline explanatory text, which would clutter the UI, I added hoverhelp tooltips. This gives clarity when needed while keeping the layout clean for advanced users.
Current design
❌ Technical jargon with no explanation.
❌ Users must leave the workflow to check documentation.
New design
✅ Terms explained in context with hoverhelp tooltips.
✅ Keeps layout clean while supporting novices when needed.
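To make the hoverhelp pattern concrete, here is a minimal React/TypeScript sketch: the definition appears on hover or focus, with a link out to the documentation. The component name, copy, and URL are illustrative placeholders, not Optimizely's implementation.

```tsx
import React, { useState } from "react";

interface HoverHelpProps {
  term: string;       // the system term shown in the UI, e.g. "Flag"
  definition: string; // short plain-language explanation
  docsUrl: string;    // link into the relevant documentation
}

// Hoverhelp tooltip: shows a definition on hover/focus,
// with an embedded link for users who want the full docs.
export function HoverHelp({ term, definition, docsUrl }: HoverHelpProps) {
  const [open, setOpen] = useState(false);

  return (
    <span
      tabIndex={0}
      onMouseEnter={() => setOpen(true)}
      onMouseLeave={() => setOpen(false)}
      onFocus={() => setOpen(true)}
      onBlur={() => setOpen(false)}
      style={{ position: "relative", borderBottom: "1px dotted" }}
    >
      {term}
      {open && (
        <span role="tooltip" style={{ position: "absolute", top: "1.5em", padding: 8, background: "#fff" }}>
          {definition}{" "}
          <a href={docsUrl} target="_blank" rel="noreferrer">
            Learn more
          </a>
        </span>
      )}
    </span>
  );
}

// Hypothetical usage:
// <HoverHelp
//   term="Flag"
//   definition="A switch that turns a feature on or off for a group of users."
//   docsUrl="https://example.com/docs/flags"  // placeholder URL
// />
```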
2. Strengthen hierarchy and visual clarity.
Weak hierarchy and subtle cues caused users to miss critical context. For example, environment labels were overlooked, leading to misconfigurations. Active and concluded tests were mixed together, creating clutter. CTAs and status indicators also lacked visual emphasis.
Restructured hierarchy – clearer sections and emphasized indicators.
In the redesigned ruleset panel, I separated active and concluded experiments into distinct sections, made the environment label more prominent at the top, and emphasized CTAs and status chips. These changes reduce clutter, prevent environment errors, and support faster scanning.
Current ruleset panel design
❌ Active and concluded experiments mixed together, creating clutter.
❌ Environment labels subtle and easily overlooked.
❌ CTAs and status indicators visually understated.
New ruleset panel design
✅ Environment context is prominent at the top and less likely to be missed.
✅ CTAs and status chips emphasized for quick recognition.
✅ Active and concluded sections are clearly separated, making navigation easier.
3. Provide robust onboarding guidance to lower the learning curve.
First-time and non-technical users struggled with multi-step workflows such as creating experiments or interpreting traffic controls. The existing onboarding relied on static checklists and documentation, which failed to explain interactions in context or prevent early mistakes.
Blank state prompts paired with a guided walkthrough.
To better support first-time and non-technical users, I designed two complementary onboarding approaches. A blank state prompt in the empty panel would guide users toward their first action. An interactive learn-by-doing walkthrough would let them practice setting up a dummy experiment step by step. Each tooltip anchors to real controls, explains the action, and only progresses once completed. Together, these approaches reduce early errors, clarify key concepts, and build user confidence faster.
Current onboarding design
❌ Static checklists offered little real guidance.
❌ Reading documentation felt effortful and disconnected from the task.
New onboarding design
✅ Interactive walkthrough clarifies concepts, lowers learning curve, and accelerates first success.
✅ Blank state prompt directs users to their first key action towards their goal.
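A simplified TypeScript sketch of the walkthrough logic is shown below. The step names, selectors, and completion checks are hypothetical, but it illustrates the core idea: each tooltip anchors to a real control and the walkthrough advances only once the user has actually completed the step.

```typescript
// Sketch of a learn-by-doing walkthrough. Steps, selectors, and
// completion checks are hypothetical placeholders.
interface WalkthroughStep {
  anchorSelector: string;    // control the tooltip points at
  instruction: string;       // what the user should do
  isComplete: () => boolean; // checked against real product state
}

const steps: WalkthroughStep[] = [
  {
    anchorSelector: "#create-flag-button",
    instruction: "Create a flag to hold your experiment.",
    isComplete: () => document.querySelectorAll(".flag-row").length > 0,
  },
  {
    anchorSelector: "#add-variation-button",
    instruction: "Add a variation to compare against the original.",
    isComplete: () => document.querySelectorAll(".variation-row").length >= 2,
  },
];

let current = 0;

// Call whenever product state changes (e.g. after a save).
function advanceIfReady(showTooltip: (step: WalkthroughStep) => void) {
  while (current < steps.length && steps[current].isComplete()) {
    current += 1; // only progress once the step's action is done
  }
  if (current < steps.length) {
    showTooltip(steps[current]);
  }
}
```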
Next Steps
Internal Review & Design Implementation
Based on the prioritized findings, the Optimizely PM and Design team will evaluate solutions in light of engineering effort and business goals. The proposed designs serve as references and a solid foundation for implementation, to be adapted and expanded through internal review.
Reflection
1. Always run pilot tests when possible.
Running a pilot test with a peer (who matched the target participant profile) surfaced critical issues early. It helped us refine task flow, adjust timing, and clarify instructions. If resources allowed, I would also pilot with a real user or client to ensure the study design directly supports research goals.
2. Align more actively with the client on recruitment.
In this study, the client managed recruitment, and some participants only partially matched our criteria. For future projects, I would be more involved in recruitment to ensure consistency and participant quality. This would strengthen the reliability of findings.
3. Plan session logistics with more buffer.
Some sessions were scheduled last-minute, leaving little time to refine tasks or align expectations with the client. This led to issues, such as discovering task gaps during live sessions. In the future, I would build in more time between tests, review tasks jointly with the client, and validate them in a pilot before full rollout.