6/7/2023

Rapid A/B-testing with Sequential Analysis

A common issue with classical A/B-tests, especially when you want to be able to detect small differences, is that the sample size needed can be prohibitively large. In many cases it can take several weeks, months or even years to collect enough data to conclude a test. In this post I’ll introduce a little-known test that in many cases drastically reduces the number of samples needed, namely the Sequential Generalized Likelihood Ratio Test.

The Sequential Generalized Likelihood Ratio test (or sequential GLR test for short) is surprisingly little known outside of statistical clinical research. Unlike classical fixed-sample-size tests, where significance is only checked after all samples have been collected, this test continuously checks for significance at every new sample and stops as soon as a significant result is detected, while still guaranteeing the same type-1 and type-2 error rates as the fixed-sample-size test. This means the test could be stopped after as few as a handful of samples if a strong effect is present.

Despite this very nice property, I couldn’t find any public implementation of this test, so I’ve created a node.js implementation, SeGLiR, which can easily be used in web application A/B testing. I’ll give a brief example of usage below, but to give you some idea of the potential savings, I’ll first show you a comparison of the sample size needed for a fixed-sample-size test versus the sequential GLR test.

The test I’ll compare is the comparison-of-proportions test, which is commonly used in A/B-testing to compare conversion rates. We compare the tests at the same levels, α-level 0.05 and β-level 0.10, and say that we want to detect a difference between proportions larger than 0.01 (in sequential analysis this is usually called an “indifference region” of size 0.01). Note that the expected sample size for the sequential GLR test varies depending on the true proportions p₁ and p₂, so we compare the sample sizes at different true proportions. We’ll first look at the case where the expected sample size for the sequential GLR test is at its worst, namely when the proportions are close to 0.5.

In this case, the expected sample size of the sequential GLR test is much smaller for almost any value of the true difference. The test will stop especially early when there is a large difference between the proportions, so if there is a significant advantage in choosing one of the alternatives, this can be acted upon as early as possible.

Let’s take a closer look at the expected sample size when the differences between the true proportions are small. The only case where the sample size for the sequential GLR test can be expected to be larger is when the true difference between p₁ and p₂ is just below 0.01, i.e. the smallest difference we were interested in detecting. However, this is only the case when the proportions are close to 0.5.

What about when p₁ and p₂ are farther from 0.5? Actually, as the true p₁ and p₂ get closer to either 0 or 1, the expected sample size will always be smaller than that of the fixed-sample-size test.

These are expected sample sizes, so to be sure that the test doesn’t often require a much larger sample, let’s also take a look at the more extreme outcomes, for instance the 5th and 95th percentiles (with p₁ and p₂ close to 0.5 as earlier). For most of the true differences the sample size is still lower than for the fixed-sample-size test, except for differences below 0.015.
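
To put a rough number on the fixed-sample-size baseline in this comparison (α-level 0.05, β-level 0.10, smallest difference of interest 0.01), here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. It is not taken from SeGLiR: the function name `fixedSampleSize` is hypothetical, and the sketch assumes a two-sided test with equal group sizes, which the post does not specify.

```javascript
// Approximate per-group sample size for a fixed-sample-size comparison of
// two proportions, using the standard normal-approximation formula.
// Assumptions (not from the post): two-sided test, equal group sizes.

// Standard normal quantiles, hardcoded to avoid an external dependency.
const Z_ALPHA = 1.959964; // quantile at 1 - alpha/2, for alpha = 0.05
const Z_BETA = 1.281552; // quantile at 1 - beta, for beta = 0.10

// Hypothetical helper: p1 and p2 are the true proportions to tell apart.
function fixedSampleSize(p1, p2, zAlpha = Z_ALPHA, zBeta = Z_BETA) {
  const diff = p1 - p2;
  if (diff === 0) {
    throw new RangeError("p1 and p2 must differ");
  }
  // Sum of the Bernoulli variances of the two groups.
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  // Round up, since a fraction of a sample cannot be collected.
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (diff * diff));
}

// Proportions close to 0.5, where the variance is largest.
console.log(fixedSampleSize(0.5, 0.51));
// Proportions farther from 0.5 need fewer samples for the same difference.
console.log(fixedSampleSize(0.1, 0.11));
```

Under these assumptions, a difference of 0.01 around 0.5 needs roughly 52,500 samples per group, which illustrates why a fixed-sample-size test can take so long to conclude.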