When a Test Rollout Shows Different Results

I received this question: "We ran a campaign where we observed an uplift after changing the button text. When we rolled it out with a 5% default / 95% change split, the new data shows the change underperforming. Have you observed situations like this before?"

Here's my response: Let's assume everything was done by the book in terms of test setup and analysis. Even so, there is still at least a 1-in-20 chance that the initial "winner" was not really a winner but only appeared to be one due to sampling error (assuming you ran a statistical test and declared winners when p < 0.05). I say "at least 1 in 20" because these statistical tests are designed around the idea that there is a stable underlying "truth." In the natural sciences there usually is a stable underlying truth rooted in the laws of nature. In marketing scenarios, however, the underlying truth can fluctuate a great deal: the population of site visitors changes from day to day due to seasonality, promotions, competitive influences, and so on.
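To make the 1-in-20 point concrete, here is a rough simulation sketch of my own (an illustration, not part of the original article): run many A/A tests, where both recipes share the same true conversion rate, and count how often a two-sided z-test still declares a "winner" at p < 0.05. All names and parameters here are invented for the example.

```javascript
// Simulate A/A tests: both recipes have the SAME true conversion rate,
// so every "significant" result is a false positive caused purely by
// sampling error. With p < 0.05 we expect roughly 1 in 20.
function simulateAA(trueRate, nPerRecipe, numTests) {
  let falsePositives = 0;
  for (let t = 0; t < numTests; t++) {
    let convA = 0, convB = 0;
    for (let i = 0; i < nPerRecipe; i++) {
      if (Math.random() < trueRate) convA++;
      if (Math.random() < trueRate) convB++;
    }
    const pA = convA / nPerRecipe;
    const pB = convB / nPerRecipe;
    const pPool = (convA + convB) / (2 * nPerRecipe);
    const se = Math.sqrt(2 * pPool * (1 - pPool) / nPerRecipe);
    // Two-sided test at the 5% level: |z| > 1.96 counts as "significant"
    if (se > 0 && Math.abs(pA - pB) / se > 1.96) falsePositives++;
  }
  return falsePositives / numTests;
}

// e.g. a 5% conversion rate, 2,000 visitors per recipe, 2,000 tests:
const fpRate = simulateAA(0.05, 2000, 2000);
```

The observed false positive rate should hover around 0.05, and as the article notes, that figure is a floor: if the underlying conversion rate drifts between the test and the rollout, the real-world error rate can be higher.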
Running Multiple Tests at a Time

The most dogged misperception about split testing is that you should only run one test at a time. Suppose two tests are conceived by two different people in your organization, one on the home page and one in the checkout, and someone says, "We can't run them at the same time because we won't get clean results." Should you believe them? It depends. Most of the time you can run tests simultaneously without worrying, and having that flexibility lets you learn more quickly. There are scenarios where you have to be careful, however. Let's set up one of those examples. Test #113: Home Page. Let's say that, tested in isolation, Test #113 shows a higher conversion rate with Style Blue, and in isolation Test #114 shows a higher conversion rate with Style Purple.
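As a sketch of why simultaneous tests are usually safe, here is a hypothetical assignment snippet (names and function invented for illustration): each test splits traffic with its own independent coin flip, so every combination of recipes receives roughly an equal share of traffic and neither test systematically biases the other.

```javascript
// Hypothetical sketch: each test assigns the visitor independently.
// With two independent 50/50 splits, each of the four combinations
// (Blue+Purple, Blue+control, control+Purple, control+control)
// gets roughly a quarter of the traffic.
function assignRecipe() {
  return Math.random() < 0.5 ? "control" : "treatment";
}

// One visitor's assignments across both running tests:
const visitor = {
  test113: assignRecipe(), // Home Page: Style Blue vs. control
  test114: assignRecipe(), // Checkout: Style Purple vs. control
};
```

The careful-scenario caveat in the article is exactly the case this randomization cannot fix: when the treatments *interact* (e.g., Blue and Purple clash when seen together), the winning combination may differ from the winners of each test run in isolation.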
Estimating Sample Sizes by Experiment Effect Size

If you are running an A/B test there is an inherent tradeoff between how long you run the test and how precise your answer will be. The reference chart below should give you some guidance. Find the baseline conversion rate on the left-hand side that is closest to your conversion rate. Then select the column that approximates how big an impact you guess your experimental version will have on your conversion rate. Next, locate the number of visitors needed PER recipe for a two-recipe test in the first grid of numbers. If you'd rather estimate based on the number of conversions in the baseline recipe, look at the second set of numbers; for the number of conversions in the challenger, see the third set. (Reference chart: click to open a larger image in a new window.) Overall, the smaller the effect size you need to detect, the more traffic you need. If you don't have that traffic volume, you'll only be able to detect larger effects. Also, the smaller your baseline conversion rate, the more traffic you need to detect the same effect size.
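Since the chart itself isn't reproduced here, a common way to approximate such numbers is the standard two-proportion sample size formula. This is a minimal sketch, assuming 95% confidence (two-sided) and 80% power; the chart may have been built with different assumptions.

```javascript
// Standard two-proportion sample size approximation:
// n per recipe = (z_alpha + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
function sampleSizePerRecipe(baselineRate, relativeLift) {
  const zAlpha = 1.96; // 95% confidence, two-sided
  const zBeta = 0.84;  // 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(
    Math.pow(zAlpha + zBeta, 2) * variance / Math.pow(p2 - p1, 2)
  );
}

// e.g. a 5% baseline, hoping to detect a 10% relative lift:
const n = sampleSizePerRecipe(0.05, 0.10);
```

The formula makes both rules of thumb visible: the required n grows with the inverse square of the effect size (halve the lift, quadruple the traffic), and a smaller baseline rate shrinks the denominator faster than the variance term, pushing n up.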
That's Totally Random!

Actually, no, it's not. When you run a test and split the traffic "evenly," it's unlikely you will get exactly the same number of visitors in each recipe. The most common method of splitting traffic is via a random number stored in a cookie; "round robin" and "time splitting" methods are used as well. Assuming you are using a random number generator, there are two main sources of variance from a completely even split. The first is that the random number generator isn't completely random. In this simulator we stick to a common method of generating a fairly random number, which does a pretty good job, as you can see when you increase the sample size per round. We must admit, however, that computers can't generate something purely random, but they can get very close. In JavaScript we flip a coin like this:

randomnumber = Math.floor(Math.random() * 2) + 1;

The remaining variance is due to "sampling error": the variation that occurs because you are only sampling some members of a larger population, and those samples may not be perfect representations of that population. This is also known as statistical noise. In the simulator you can adjust the sample size per round and visually see the kinds of variation you can expect at that sample size. You may try my random number generator simulation, which will open in a new window, to play with the sample size and see the impact on the variance that is due mainly to sampling error.
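The simulator's core idea can be sketched in a few lines using the same coin flip as above (the function name and parameters here are my own, not from the simulator's actual code): flip the coin for every visitor in a sample and see how far the split drifts from a perfect 50/50.

```javascript
// Split `sampleSize` visitors with the coin-flip method and return
// the proportion that landed in recipe A. Sampling error is the gap
// between this proportion and the ideal 0.5.
function simulateSplit(sampleSize) {
  let recipeA = 0;
  for (let i = 0; i < sampleSize; i++) {
    // the same coin flip shown above
    const randomnumber = Math.floor(Math.random() * 2) + 1;
    if (randomnumber === 1) recipeA++;
  }
  return recipeA / sampleSize;
}

// Larger samples cluster much more tightly around 0.5:
const small = simulateSplit(100);
const large = simulateSplit(100000);
```

Running this repeatedly shows the effect described in the article: with 100 visitors per round, splits like 43/57 are routine, while with 100,000 the proportion rarely strays more than a fraction of a percent from even.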
Sample Size Calculator & Statistical Power

Calculating sample size is an important first step in running an A/B split test. Many people skip this step or do not understand the purpose it serves. Let's explore why to calculate sample size and how to do it.

What is Statistical Power?
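Statistical power is conventionally defined as the probability that a test detects a true effect of a given size. As a hedged illustration (one common textbook approach via the normal approximation, not necessarily the calculator's own method, and with invented function names), power for a two-proportion test can be computed like this:

```javascript
// Abramowitz & Stegun 26.2.17 approximation of the standard normal CDF
function normalCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp(-z * z / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 +
            t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Power of a two-sided two-proportion z-test at alpha = 0.05:
// the chance of reaching |z| > 1.96 when the true rates are p1 and p2.
function power(p1, p2, nPerRecipe, zAlpha = 1.96) {
  const se = Math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / nPerRecipe);
  return normalCdf(Math.abs(p2 - p1) / se - zAlpha);
}

// e.g. distinguishing 5% from 5.5% with 31,000 visitors per recipe:
const pw = power(0.05, 0.055, 31000);
```

Underpowered tests are exactly the danger the article warns about: with too small a sample, a real winner often fails to reach significance, and the "winners" that do emerge are disproportionately flukes.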
What to Consider When Building Your Own A/B Test System

Sometimes Adobe Test & Target is the right tool for your A/B testing needs. Or maybe Google Website Optimizer fits the need (and budget) better. How about SiteSpect or Optimost? But what if your particular business requires you to build your own A/B testing or MVT framework from scratch? That's the situation that confronted me during the past year. It can feel a little daunting, but what I've learned is that if you stick with it, it can be worth the effort to code your own website optimization system. Why would you want to build your own website optimization platform? When you're optimizing a landing page for leads or sales, and those leads or sales happen right away, an off-the-shelf solution may be just perfect. Adobe Test & Target will track how many people come into each test recipe and how many of them convert. You've got a conversion rate on your control and your test treatments. Voilà, you've got a working A/B testing solution. For subscription or software-as-a-service (SaaS) businesses, this may not be so simple. A given recipe may outperform control on some
Trader Joe's Gives a Lesson in Measuring A/B Test Success
“Are all these signs handmade?” I asked the three friendly-looking employees behind the Customer Service desk.
