## Calculating Statistically Significant Split Tests

You may wonder how statistical significance is calculated, or whether it can be trusted.

In Google Analytics experiments, it is reported for you as part of the experiment results.

Since watching the talented Jason Cohen present on false positives in testing, I prefer to use a simpler method.

**The A/B Hamster Method!**

Assuming the control and variation have the same sample size (number of visits):

| Version | Visits | Conversion Rate | Conversions |
|---|---|---|---|
| Control | 1000 | 5.00% | 50 |
| Variation | 1000 | 6.00% | 60 |

* Define **N** as the total number of conversions. In the example above, **N = 110** (50 + 60).

* Define **D** as half the difference between the winner's and the loser's conversions. Using the data above, **D = 5** ((60 - 50) / 2).

* The test result is statistically significant if **D²** is bigger than **N**. Here, **D² = 25** (5 x 5) and **N = 110**, so this is not a statistically significant result.
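The rule is simple enough to express in a few lines of code. Here is a minimal Python sketch; the function name `hamster_significant` is my own (hypothetical), and, like the method itself, it assumes both versions received the same number of visits.

```python
def hamster_significant(control_conversions: int, variation_conversions: int) -> bool:
    """Return True if the A/B Hamster rule D^2 > N holds.

    Assumes the control and variation received the same number of visits.
    """
    # N: total conversions across both versions
    n = control_conversions + variation_conversions
    # D: half the difference between the winner and the loser
    d = abs(variation_conversions - control_conversions) / 2
    return d * d > n


# Example data from the first table: 50 vs 60 conversions on 1000 visits each
print(hamster_significant(50, 60))  # False: D^2 = 25 is not bigger than N = 110
```

Taking the absolute difference means it doesn't matter which argument is the winner; only the size of the gap is compared against the total.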

So… what? Run it for longer? Sure… let's run it for another week (you should always run tests for at least a week, to cover a full seven days of weekly seasonality):

| Version | Visits | Conversion Rate | Conversions |
|---|---|---|---|
| Control | 2000 | 4.84% | 97 |
| Variation | 2000 | 6.43% | 129 |

So, we have slight changes in those conversion rates, but more importantly:

* **N = 226** (97 + 129)

* **D² = 256** (D = (129 - 97) / 2 = 16, and 16 x 16 = 256)

So... D² is greater than N, and therefore the result is now statistically significant. Stop the test, we have a winner!
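If you'd like to sanity-check the hamster rule against a more conventional calculation, a pooled two-proportion z-test is a standard alternative. Note that this is a different technique from the hamster method, swapped in here only for comparison. It's a sketch using a normal approximation (reasonable at these sample sizes); the function names are hypothetical.

```python
import math


def two_proportion_z(conv_a: int, visits_a: int, conv_b: int, visits_b: int) -> float:
    """Pooled two-proportion z statistic for version B versus version A."""
    p_a = conv_a / visits_a
    p_b = conv_b / visits_b
    # Pooled conversion rate, assuming no real difference between versions
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    return (p_b - p_a) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# (label, control conversions, control visits, variation conversions, variation visits)
experiments = [
    ("After one week", 50, 1000, 60, 1000),
    ("After two weeks", 97, 2000, 129, 2000),
]
for label, conv_a, visits_a, conv_b, visits_b in experiments:
    z = two_proportion_z(conv_a, visits_a, conv_b, visits_b)
    print(f"{label}: z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

With the example data, z comes out at roughly 0.98 after one week and roughly 2.19 after two weeks, so the conventional test agrees with the hamster rule at the usual two-sided 95% level.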

I love this method: it really helps people without a statistics background see how confidence thresholds are calculated. Satisfying D² > N is equivalent to a z-score above 2, which corresponds to roughly 95.4% (two-sided) confidence, so you get a little extra reassurance beyond the usual 95% threshold!
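Why does D² > N line up with roughly 95% confidence? Here is a sketch of the reasoning, assuming that if there is no real difference, each of the N conversions is equally likely to land in either version (a fair approximation when both versions get equal traffic and conversion rates are low). Let A be the winner's conversions and B the loser's:

```latex
\begin{aligned}
A &\sim \operatorname{Binomial}\!\left(N, \tfrac{1}{2}\right), \qquad B = N - A \\
\operatorname{Var}(A - B) &= \operatorname{Var}(2A - N) = 4 \cdot N \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = N \\
z &= \frac{A - B}{\sqrt{N}} \\
D^2 > N &\iff \left(\frac{A - B}{2}\right)^2 > N \iff (A - B)^2 > 4N \iff |z| > 2 \\
P(|Z| \le 2) &\approx 0.954
\end{aligned}
```

For the two-week data, z = 32 / √226 ≈ 2.13, just past the threshold of 2.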

**Summary-Arium:**

* If D² is greater than N, you can be confident you've found a winner in your split test.

* Give split tests time to reach a conclusion, and run them for at least a full week.

**A Nod to…**

Jason Cohen and the A/B Hamster method for determining statistical significance