
Ecommerce A/B Testing: How To Fail Your Way To Success

July 6, 2017

The website’s landing page wasn’t converting as well as the owner had hoped, so he decided a revamp was in order. He changed the images and, more importantly, the copy. But he didn’t just switch from old to new (he was smarter than that!). He tested his old copy against the new copy.

After sending 10,000 clicks to the new landing page, he found that his shiny new copy produced an 82% bounce rate, and of the mere 100 visitors who clicked the buy button, not one completed a purchase. People left as soon as they arrived.

It turned out that his original copy wasn’t bad after all – though it could benefit from a little polish (with changes A/B tested). But with this “failed” experiment, he learned a lot about his audience, what they look for, what they value, and the ideas that speak best to them. It wasn’t a wasted effort at all.

A/B testing never is.

Your E-Commerce Website is a Workhorse – A/B Testing Ensures You’re Ploughing in the Right Direction

Every e-commerce business owner has had to make a change, an update, a re-brand or refresh at some point, and committing to a change can be quite traumatic. This is especially true if you already have a reasonably well-performing website.

You don’t want to “rock the boat.”

You don’t want to risk revenue.

You should be willing to test – even if it fails, you learn something (image source)

But at the back of your mind there’s this voice whispering “No risk, no reward.”

Don’t listen to that voice – it doesn’t know what it’s talking about. When we test proposed changes, the risk becomes minuscule and the reward much greater, because we can tweak, monitor, and change until a landing page, product page, or sales page takes off.

But the key to reducing risk and reaping those rewards is to stop guessing – and start testing.

Without testing – if you just make changes blindly – you risk decreasing performance and even alienating your existing customer base.

“Experiment” Sounds Risky – But it’s Riskier if You Don’t

If you make a change and it results in worse performance, as in the example above, not only could you lose revenue, but you will lose prospects who won’t come back. The stakes are high. And within your company, you may have to fight vested interests and the perception that the effort put into designing and implementing the new elements of the website was wasted. After all, without testing, the appearance and elements of your website are up to individual interpretation and taste, which means you’re working with opinions rather than facts (and people are opinionated!).

But let’s look at the best-case scenario of making a change without testing.

Let’s say you knocked it out of the park – the change you made had a positive effect.

How can you be sure that doing something else wouldn’t have resulted in even bigger benefits?
Maybe you increased revenue by 4% and are feeling pretty good about yourself. Until you realize that you could have achieved a 6% increase in revenue if you had done something just a bit different.

When you commit to testing – not just testing one change, but evolving into a “testing culture” – you’ll find that every decision you make is grounded in research and objective data. There are so many tools available to help you find out how visitors arrive at your site, which pages they find most interesting, what content most influences their decision to purchase, and so on.

Strong research is where we always begin – but it’s not enough in itself. Frequently, the data we collect points to a number of possible solutions and a nearly infinite number of variations you could try to improve the performance and conversion rate of a website.

How can we decide which one is the best? How can we find the variation that delivers results faster? The answer: Experimentation through testing.

When you want to find the best way – not just a better way – to improve your website, split testing is how you find it.

Some things you can test include:

  • Placement, prominence, shape and color of CTA or “Add to Cart” buttons
  • Color scheme of your website
  • Size and readability of font
  • Different types of images (or video) on product pages
  • Placement of reviews or testimonials
  • Landing page/product page copy

And so much more.

Each of these could be changed in any number of ways. And you can change a single element, or a combination of elements.

Sure, you could implement a change and see how it goes, accepting any positive outcome as a success.

Or you could expose your visitors to multiple versions simultaneously, compare the results, and then choose the best performing one.

A/B testing to optimize conversions: what a test looks like

Clearly, this is a better option. But it’s easy to make a change and see how it goes. Showing multiple versions to different visitors? How does that even work?

How Visitors See Different Versions of the Same Website (we are truly living in the future)

Split testing enables you to direct different parts of your audience to two or more versions of the same website element (a page or even a path) simultaneously so you can compare the results.

In real time.

You start by forming an educated guess (a hypothesis) about what change will yield a positive result. Then you test that guess against the existing version – or against other guesses, if you’re running a multivariate test (i.e., more than two versions at once).

Coming up with a strong hypothesis is a science by itself (and we’ve written a post explaining exactly how to construct a hypothesis to get the best results here).

Once you have a viable hypothesis, the next step is to transform it into an actual design. That design is then turned into an alternative version of the web page you want to change.

The decisive part comes next. To test the proposed variation and compare results, the two versions need to be shown simultaneously to equal, random portions of your website’s visitors so you can tell which performs better. You then run the test until it reaches what is called statistical significance, which means that enough people, over enough time, have seen each version to yield a decisive result.
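In practice, your testing tool handles this split for you, but here is a minimal sketch of the idea in Python (the assign_variant helper and the visitor IDs are hypothetical): hashing a visitor ID together with an experiment name gives an effectively random 50/50 split, while making sure a returning visitor always sees the same version.

```python
import hashlib

def assign_variant(visitor_id, variants=("control", "variation"), experiment="new-product-page"):
    """Deterministically bucket a visitor into one of the variants.

    Hashing the visitor ID with an experiment-specific name makes the split
    effectively random across visitors, while a returning visitor always
    lands in the same bucket.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# A 50/50 split between the control and the proposed variation
print(assign_variant("visitor-12345"))  # e.g. "variation"
print(assign_variant("visitor-12345"))  # same visitor, same variant every time
```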

When you do it right, coming up with the hypothesis is, by far, the most demanding part of the conversion optimization process.

The rest is easy – there are so many tools available for split testing. The hard part is figuring out what tests to run and when to call the results conclusive.

But, you do have to have a little knowledge of statistics to really understand your results.

Statistics 101: What you need to know

A/B testing is, at its core, a statistical exercise. While running the tests themselves does not require advanced knowledge of statistics, it is important to set up the right conditions so that your tests are statistically meaningful.

The first thing to keep in mind is that a statistically significant test requires a large enough sample of visitors. When the baseline conversion rate is low and the improvement you are looking for is small, even a simple A/B test can require on the order of 100,000 visits a month to yield statistically significant results. The more variants you test, the larger the sample you need (because you have to send equal numbers of people to each one). For multivariate tests, the required sample size increases exponentially with the number of elements you combine.

The sample size required for significance depends on three main elements:

  1. Baseline rate
  2. Minimum detectable effect
  3. Significance level

Baseline rate

The baseline rate is the current value of the indicator we aim to improve, for example the conversion rate. It’s the existing number we’re comparing the variant against when we test.

Minimum detectable effect

The minimum detectable effect is the smallest improvement over the baseline that you want to be able to detect in an experiment. We use the minimum detectable effect (MDE) to prioritize experiments based on ROI, because it functions as an estimate of the effort involved versus the impact of an individual experiment. It’s basically a way to see the relationship between effort and impact (or cost versus value).

Usually, the improvement we’re looking for is in the conversion rate of our web page. For example, when we set up the test, we might postulate that the expected increase will be at least 10% relative to the existing conversion rate. The test is then run to find out whether that increase shows up in the tested sample.

The objective of sampling is to give you as accurate a representation of the general population as possible

The larger the effect, the smaller the sample we need to reach a definitive result. The smaller the effect, the larger the sample needs to be to detect a significant change.

To put this in practical terms, let’s examine what happens if we conduct an experiment on a website with a conversion rate of 2%. After conducting research, we find that the design of the product pages is hurting conversions: the add-to-cart button is not prominent enough, the product images are low quality, and the product descriptions are too long and hard for prospects to understand. For all of these reasons, the conversion rate suffers.

To solve these problems, we hypothesize a solution: changing the page layout, shortening the description copy, and providing high-quality images of the products. The new layout is due to be put to the test. Before starting the test, however, we need to decide how many observations we need to make the experiment valid.

After reviewing the quantitative research, we realized that in addition to the 2% of visitors who bought the product, at least 1% more said the major impediment to their conversion was a lack of clear information about the product. We therefore hypothesized that improving the product copy would increase conversions on the website by 1 percentage point – a 50% relative lift.

That means the minimum detectable effect we are aiming for is 50%. This information lets us calculate the required sample size.

There are many sample size calculators available online, and Evan Miller’s is just one of them. Plug in the data from our example and this is what you get:

A sample size calculator output

As you can see, we need 3,292 observations per variation (6,584 in total) to reach a significance level of 95%, usually the minimum acceptable level. And with that we arrive at the significance level, another major factor in the sample size calculation.
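If you would rather compute this yourself, the same calculation can be approximated in Python with the statsmodels library (a sketch, assuming a 2% baseline, a 50% relative lift, 95% significance and 80% power; the exact figure differs slightly from one calculator to another depending on the approximation used):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.02              # current conversion rate: 2%
mde_relative = 0.50          # minimum detectable effect: 50% relative lift
target = baseline * (1 + mde_relative)   # 3%

# Cohen's h effect size for comparing the two proportions
effect_size = proportion_effectsize(baseline, target)

# Visitors needed per variation for a two-sided test at alpha = 0.05
# (95% significance) with 80% power
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, ratio=1.0
)
print(round(n_per_variation))  # a few thousand observations per variation
```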

Significance level

The significance level represents the degree of certainty that the result we are seeing (an improvement, hopefully) is due to the change we made and not pure chance. Generally, 95% is considered the minimum acceptable value. The higher you set it, the larger the sample size required.
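Once the test has collected its sample, the same significance idea applies to reading the results. As an illustration (the conversion counts below are made up), a two-proportion z-test from statsmodels returns a p-value; a p-value below 0.05 corresponds to the 95% significance level discussed above:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results once each variation has reached its required sample
conversions = [66, 99]       # control, variation
visitors = [3292, 3292]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 means the observed lift is significant at the 95% level
```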

Conducting a test

In order to conduct a test, as we have already mentioned, you need a properly formed hypothesis. Only a valid hypothesis can lead to a meaningful test.

Proper hypothesis creation requires a great deal of work and we have already covered this in one of our previous posts.

So let’s cover some basics of how to conduct a successful test.

Once you have decided what to test, created the basic design(s) and calculated the sample size you need for the test, it’s time to actually DO IT.

This will involve creating an experiment in your tool of choice (one of the most popular is Optimizely, but there are also Visual Website Optimizer, SiteGainer and Google Optimize) and launching it. The experiment runs within the tool until the specified conditions of sample size and statistical significance are reached.

To ensure that your tests are effective, we recommend letting them run for full weeks and at least two buying cycles to eliminate any outliers. You don’t want spikes or slumps in traffic due to weekends or holidays to affect your tests. And, of course, if you don’t have high enough traffic, that’s also going to affect your timeline (and whether you’re ready to take on A/B testing at all).
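Before launching, a quick back-of-the-envelope estimate tells you roughly how long the test will have to run. A small sketch (the daily traffic figure is an assumption) that rounds the estimate up to full weeks:

```python
import math

required_per_variation = 3292   # from the sample size calculation above
num_variations = 2              # control plus one variation
daily_visitors = 800            # assumed eligible traffic per day

total_needed = required_per_variation * num_variations
days = math.ceil(total_needed / daily_visitors)
weeks = math.ceil(days / 7)     # run full weeks to smooth out weekday/weekend swings

print(f"about {days} days of traffic, so plan for {weeks} full week(s)")
```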

As a rule, the vast majority of tests should not run for longer than one month. Test results depend on the randomness of the sample, and many people (up to 30%, according to this study) delete their cookies, which causes them to be recounted as unique visitors. This results in sample pollution.

When your sample is no longer random, without you or your testing software knowing it, the results will not be reliable and you may get a false positive. This would render your test a failure because the result is not a true reflection of the actual performance of the variation you tested.

Once you have a conclusive result, backed by a sufficient sample size and duration, you will have either confirmed or disproven your initial hypothesis. If the proposed variation performed better than the unchanged page (the control), we call it a winning test. You can implement it immediately, though you may want to phase out the old page gradually if you are not 100% confident in the result.

We recommend directing all traffic to the winning variation, since leaving any part of the traffic on the original page will likely result in conversion being lower than it could have been (with loss of revenue as the ultimate consequence).

Conclusion: Testing isn’t something you do once – it’s how you grow

You can make a change by guessing. You can make a change after research and gathering qualitative and quantitative data – but even that’s still guessing (though better guessing) unless you actually test your results.

Testing brings the scientific method into optimizing your e-commerce site, which comes with a long list of benefits.

You can explain why you’ve implemented each change and prove that it works.

You have a rational, impartial way to decide on design and usability questions.

And, basically, you can eliminate a lot of back-and-forth around the conference table that is the inevitable result of guesswork!

The best part, possibly, is that unless you run your test the wrong way, you really can’t fail, because you will always learn something valuable to use in your next iteration.

In short, experimenting with content variations helps to reduce or eliminate many of the obstacles and risks inherent in changing an e-commerce website and allows you to optimize the performance using a sound statistical and scientific foundation.

But before you sign up for your new A/B testing tool, remember this: The process of testing cannot be conducted without first gathering and analyzing data on the current performance of the website. This combination of analysis and experimentation provides the foundation for managing and maintaining the constant long term growth and development of your business.

Testing is not a one-off way to make or save some money. It is an essential ingredient in the long-term growth of your company. Having a testing mindset in your organization will ensure your ability to adjust to changes in your business environment and client expectations.

Methodical testing has helped giants like Amazon and Google grow from small businesses into the colossal multinational companies they are today.

Amazon in June 2007

Ten years of constant improvement have made Amazon into this:

Amazon in June 2017

Is your company next? If you’d like some guidance getting started, let us know – we’ll be happy to help.

Published by

Edin is a Senior CRO Consultant. Edin is into Google Analytics and testing (any A/B testing tool really) and likes to write about it. You can follow Edin on Twitter.