
A/B Testing: What Hypothesis Should You Test First?

October 26, 2017

If you’re a regular reader of conversion optimization blogs, you know that the process goes something like this:

  1. Research.
  2. Hypothesize.
  3. Test.
  4. Lather, rinse, repeat!

But it’s not that simple — especially when you get to the “hypothesize” part.

The process of optimizing a website begins with research, the aim of which is to discover problems and issues on the site. However, not every issue is equally important.

Once you complete the research, a tough task lies ahead. You’ll have a bunch of data, containing a virtually infinite number of different ideas and insights. Your job? To sift ever-so-carefully through your research, note the website problems it indicates, and try to come up with ideas to solve them.

(Creating a hypothesis can be daunting on its own. Fortunately, we’ve got you covered there too. Just check out our hypothesis creation guide, and you’ll be on your way.)

So, now you’ve got your hypotheses. You’ve analyzed the data and come up with ideas that you think could explain your visitors’ behavior. And you’re ready to frame tests based on this knowledge.

But which hypothesis should you test first?

You’ve got to play favorites

Unless you’re dealing with a highly optimized site, you’ll end up with a number of different hypotheses. Those may range from simple — say, changing the copy of your call to action — to complicated, like changing the structure of a funnel or redesigning a product page.

As you can see from the two examples above, some hypotheses require more effort than others, while some have a bigger impact.

To avoid testing hypotheses at random, regardless of their importance or the effort required to actually translate them into page treatments, you have to rank and differentiate them.

From the outset of conversion optimization as a field, practitioners have sought a systematic way to solve this issue: a way to evaluate and rank hypotheses, especially in cases where it’s inadvisable or impossible to run more than one test at a time.

A/B Testing Hypothesis – To do list

Optimizers needed a way to sort their hypotheses according to a set of criteria that allows for quick and easy selection of what to implement first. To meet this need, several frameworks for hypothesis prioritization and ranking have been put forth over the years.

Before we get into those, let’s briefly go through why it’s important to prioritize your hypotheses.

Why prioritizing your hypotheses matters

As we’ve said, not all hypotheses are equally important, nor do they all require the same effort to implement. This means that, without any evaluation criteria, we could end up testing a hypothesis that has limited potential impact but requires a ton of time and effort to implement.

Meanwhile, another hypothesis that would result in immediate improvement — and which could have been implemented quickly and easily — gets neglected.

Such an unstructured approach may eventually lead to you (or your client) becoming disappointed by the limited results after all that time and effort, and simply giving up on optimization altogether. To avoid this admittedly sad outcome, the process of creating hypotheses must be structured instead of haphazard.

After all, structure permeates the entire optimization effort. So why would hypothesizing be any different?

VWO points out that structure is what separates the amateurs from the real pros:

Often, organizations do not have a structured approach to conversion rate optimization (CRO) and arbitrarily pick out a hypothesis from the pool of options they have.

However, organizations following a structured approach to CRO realize the need for a robust prioritization framework. Let’s first look at why your optimization process needs a prioritization framework in the first place.

A properly prioritized hypothesis list helps you get more meaningful wins sooner, and leads to the establishment of a successful testing program in the long run.

Let’s look at the criteria that should influence your hypothesis ranking.

Factors that influence ranking

Impact or importance

Think back to our earlier example, where we had the choice to prioritize a copy change on a call to action OR redesign an entire product page.

Obviously, one of the first criteria you take into account should be a given hypothesis’ impact or importance. “Impact or importance” simply means how critical the problem is in relation to the big picture.

Now, the order of the items in the navigation menu probably won’t have that much of an impact on overall conversions. On the other hand, product page design changes can have a much bigger impact — and problems in this area will likely be pretty important.

Looks like we’ve found the first criterion we can use to rank hypotheses. The higher the potential conversion impact of a discovered problem, the higher the related hypothesis should rank.

Time & effort required to implement the test treatment

Our second possible consideration would be the effort necessary to implement the solution to the problem. For example, a complete product page redesign would require an extensive effort by developers and designers, while a call-to-action copy rewrite could be completed in far less time.

As the effort required to implement a solution increases, the ranking of the hypothesis decreases.

Time is also inextricably linked to effort. If we need weeks to implement a solution, then it’s possible that that time could be spent more efficiently and usefully on testing something else (while the first test is being prepared and implemented).

The time factor can also help us prioritize in another way — but more on that when the time comes. 😉

Potential ROI

One final factor that needs to be considered is the potential to achieve a positive effect, or return on investment (ROI). This factor seems obvious, but it’s sometimes overlooked… to the detriment of the testing program.

Or, as Optimizely puts it:

The most fundamental theory of prioritization is to try to assign criteria that proxies what the return on investment (ROI) of a test will be.

Return on Investment is, at the most basic level, a ratio between the cost of doing something (the Investment) and the expected revenue/conversions/leads generated/downloads/video-views/pages viewed/articles shared (the Return) of that something.

The effect is an indicator of the success of the solution. For example, “By how much did X Solution lift the KPI we tested for?”

In layman’s terms, this translates to “Don’t be the guy stuck testing low-impact stuff”.

ROI is often closely linked to importance, since there’s a direct correlation between the size of a problem and the quality of its solutions or treatments (especially in the case of so-called conversion killers).
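To make that ratio concrete, here’s a minimal sketch of how you might estimate the expected ROI of a candidate test before running it. The `estimated_test_roi` helper and the numbers are purely illustrative assumptions, not part of Optimizely’s (or anyone else’s) framework.

```python
def estimated_test_roi(expected_lift, baseline_monthly_revenue, implementation_cost):
    """Rough, illustrative ROI estimate for a candidate test.

    expected_lift            -- estimated relative conversion lift, e.g. 0.05 for +5%
    baseline_monthly_revenue -- revenue currently flowing through the tested page or funnel
    implementation_cost      -- design, development, and tooling cost of the treatment
    """
    expected_return = expected_lift * baseline_monthly_revenue  # the "Return"
    return expected_return / implementation_cost                # ratio of Return to Investment


# Example: an estimated +5% lift on $40,000/month of revenue, against a $2,000 build
print(estimated_test_roi(0.05, 40_000, 2_000))  # -> 1.0, i.e. the treatment pays for itself in a month
```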

These are just about all the important factors that need to be considered. Armed with these criteria, we can review our list of hypotheses and rank them, right?

Not so fast, sport.

A/B Testing Hypothesis – Planning

How to evaluate and balance these factors

The tricky part now is assigning value to each factor. It might seem straightforward, but let’s quickly see how it looks in practice.

Here is a list of problems identified on a real website (derived from qualitative and quantitative research, of course):

  1. High shipping costs
  2. Funnel data is wrong
  3. “Enhanced ecommerce” features of Google Analytics not implemented
  4. Visitors have no idea how to pick the best product
  5. Prospects worried about product quality
  6. Navigation is a mix of transactional and informational menu items
  7. Dutch shows up on English-language site
  8. Prices shown in Euros when not in a Euro country
  9. Header is too tall
  10. Copy is vague and/or lacks value proposition
  11. Site search users (10%) convert ~4-5x better than other visitors
  12. Not enough people served by live chat
  13. Information under Info tabs might help increase user motivation
  14. Site feels untrustworthy
  15. Page loads too slow
  16. Sale prices not visible prominently, so visitors miss them
  17. Target audience (non-US citizens) do not use ZIP code

There are 17 issues altogether. Note that not all of them need a hypothesis to be solved. Some, like “Page loads too slow,” can be improved upon immediately.

Others, like “Funnel data is wrong,” are instrumentation or tool issues. These also don’t need a hypothesis to be solved.

So let’s concentrate on the issues that require a hypothesis to devise a solution.

A/B Testing Hypothesis – Data Analysis

Here’s the same list, condensed to the six problems that don’t have obvious or technical solutions, and thus need hypotheses.

Using the criteria we established, we’ll prioritize two of the six and see what happens.

  1. High shipping costs
  2. Visitors have no idea how to pick the best product
  3. Visitors worried about product quality
  4. Vague and lacking value proposition
  5. Site search users (10%) convert ~4-5x better than other visitors
  6. Site feels untrustworthy

Let’s go with “Site feels untrustworthy” and “Visitors have no idea how to pick the best product”.

How to solve for “My visitors think they can’t trust my site”

Obviously, trust is an important issue in ecommerce, and solving it has an effect on all visitors. So it ranks pretty high on the scale of importance. That’s criterion #1 (impact or importance).

But how do we improve trust? By improving design, copy and trust indicators.

For the sake of simplicity, let’s say the design of this site is decent, and the copy is acceptable. This helps us narrow our focus to a hypothesis that the trust issue can be solved by adding more trust indicators, such as security seals and social proof.

A/B Testing Hypothesis – Trust

Obviously, adding these indicators can be done relatively easily and with little effort. Collecting the reviews and social proof may take some time for sites that don’t have them on hand — but the effort to enable that collection is also not too taxing for developers and designers. That’s criterion #2 (time and effort).

If trust is very low, a treatment that successfully boosted it would have a huge impact. It would require some customer and visitor research (surveys or interviews, for example) to ascertain just how big a problem it is and what effect a solution could have. That’s criterion #3 (potential ROI).

How to solve for “I can’t compare products”

The other issue is easier to define. Through user testing, we have discovered visitors may be confused by the variety of products available on the site. If we offered a way to allow visitors to compare products, it would be easier for them to decide and buy a product.

The impact of this change, in all likelihood, will not encompass the entire website or all prospective customers — since some visitors absolutely arrive at the site with a clear idea of what to buy. That’s criterion #1 (impact or importance).

The effort necessary to implement the solution would likely be somewhat intensive, since both designers and developers would have to be involved. That’s criterion #2 (time and effort). Once implemented, the potential impact of the solution would also be limited. That’s criterion #3 (potential ROI).

Using the ranking method we elaborated above, the two hypotheses/issues would rank like this:

Issue                      | Importance | Effort | Impact
Site feels untrustworthy   | High       | Low    | High
No way to compare products | Medium     | Medium | Low

As you can see, we’ve used “Low,” “Medium,” and “High” to describe the issues. This is hardly conducive to actual ranking.

But if we translate these descriptors to numbers on a scale of 1 to 3, we can sum those numbers up and get an average score. The higher number would signify a higher ranking — so, for example, in the case of “Effort,” the higher number would indicate easier implementation.

Issue                      | Importance | Effort | Impact | Score
Site feels untrustworthy   | 3          | 3      | 3      | 3
No way to compare products | 2          | 2      | 1      | 1.7

Obviously, the first hypothesis has a higher numerical rank than the second one, and would thus be higher on the priority list for implementation. Once we use this methodology on all of the 17 issues listed above, we’ll have a prioritized list of hypotheses.
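If you’re working with more than a handful of hypotheses, it can help to tally the scores programmatically. Here’s a minimal sketch in Python that mirrors the table above; the descriptor-to-number mapping and the variable names are just illustrative (note that the Effort scale is inverted, so “Low” effort scores a 3).

```python
# Descriptors mapped to the 1-3 scale used above; higher is always better,
# so the Effort scale is inverted ("Low" effort = easiest = 3).
IMPORTANCE = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3}
EFFORT = {"High": 1, "Medium": 2, "Low": 3}

hypotheses = {
    "Site feels untrustworthy": {"importance": "High", "effort": "Low", "impact": "High"},
    "No way to compare products": {"importance": "Medium", "effort": "Medium", "impact": "Low"},
}

def score(ratings):
    """Average of the three criteria on the 1-3 scale."""
    values = [IMPORTANCE[ratings["importance"]], EFFORT[ratings["effort"]], IMPACT[ratings["impact"]]]
    return sum(values) / len(values)

# Highest score first = first on the testing roadmap
for name, ratings in sorted(hypotheses.items(), key=lambda item: score(item[1]), reverse=True):
    print(f"{name}: {score(ratings):.1f}")
# Site feels untrustworthy: 3.0
# No way to compare products: 1.7
```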

There are many ranking methodologies based on variations of the above, such as PIE (which ranks potential, importance, and ease), ICE (which ranks impact, confidence, and ease), and so on.

All of them offer an answer to the question “What should we test first?”

So the issue is solved. Or is it? You may have already spotted a problem with this method.

All of the ranking methods above depend on highly subjective opinions. For example, the impact of a solution cannot be objectively, accurately estimated. So, while you can certainly rely on a prioritization framework, you still can’t be sure whether you’re prioritizing the right things.

Ultimately, devising a virtually objective ranking is the only way to solve this issue. And rankings can be objective only if they’re tied to real, data-driven inputs.

The best methodology currently available is PXL, developed by ConversionXL. This methodology is highly customizable and can be expanded to include a number of indicators, such as position of content or issue on a page, how soon visitors notice the issue, and other numerically expressed indicators.

ConversionXL Prioritization
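For illustration only, here’s a rough sketch of how a PXL-style sheet might be tallied: a set of mostly yes/no questions answered from your research data, plus an ease-of-implementation rating, summed into a single score. The specific questions and weights below are placeholders loosely based on the indicators mentioned above; the real PXL template from ConversionXL defines its own criteria.

```python
# Placeholder criteria for a PXL-style scoring sheet (not the official ConversionXL template).
# Each question is answered True/False from research data, then tallied into one score.
PLACEHOLDER_CRITERIA = [
    "Change is above the fold",
    "Issue is noticeable within the first 5 seconds on the page",
    "Issue affects a high-traffic page",
    "Hypothesis is supported by qualitative research",
    "Hypothesis is supported by quantitative (analytics) data",
]

def pxl_style_score(answers, ease_of_implementation):
    """answers: criterion -> True/False; ease_of_implementation: 1 (hard) to 3 (easy)."""
    data_score = sum(1 for criterion in PLACEHOLDER_CRITERIA if answers.get(criterion))
    return data_score + ease_of_implementation

answers = {
    "Change is above the fold": True,
    "Issue is noticeable within the first 5 seconds on the page": True,
    "Hypothesis is supported by qualitative research": True,
}
print(pxl_style_score(answers, ease_of_implementation=3))  # -> 6
```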

Here at Objeqt, we prefer this methodology, since it enables us to quickly create a list of hypotheses and structure a testing program. And as you know, only a structured program can lead to optimal results and make good use of your time!

Published by

Edin is a Senior CRO Consultant who is into Google Analytics and testing (any A/B testing tool, really) and likes to write about it. You can follow Edin on Twitter.