The ultimate question in applied research is: does event A cause event B? The search for causal relationships has pushed applied scientists to rely more and more on field experiments. To be able to detect a treatment effect (the causal impact) when setting up an experiment, researchers first need to determine the required sample size. Stata 15's `power` command gives users tremendous flexibility in computing sample size and power of a test, and in graphing the results. In the following example we compute the sample size for a specific power of the test; we could also have computed different powers for a specific sample size.

Assume that we want to estimate the impact of an innovative teaching program on General Educational Development (GED) completion rates in a young population (between the ages of 17 and 25). To do so, we need to run a randomized controlled trial and randomly assign participants to the innovative program. We know from the literature that the proportion of the population in question with a GED is around 66%. Experts in the education field expect the new program to increase GED completion rates by 20 percentage points (to 86%). How many study participants do we need to test this hypothesis?

Using the command below, Stata identifies the total number of participants needed in the study (and also the number in each group: treatment and control).

`power twoproportions 0.66 0.86, power(0.8) alpha(0.05)`

The `twoproportions` method is used because we are comparing two proportions: 0.66 is the proportion of the population with a GED, and 0.86 is what we expect that proportion to be after completing the program. `power()` is the power of the test and `alpha()` is the significance level of the test; both values here are Stata's defaults.

```
Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test
Ho: p2 = p1  versus  Ha: p2 != p1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    0.2000  (difference)
           p1 =    0.6600
           p2 =    0.8600

Estimated sample sizes:

            N =       142
  N per group =        71
```

The result of the power analysis tells us that we need to recruit 142 participants into the study: we enroll 71 in the new program and rely on the remaining 71 as a control group.
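As a sanity check, the calculation behind this result can be sketched outside Stata. The snippet below is a minimal Python implementation of the standard normal-approximation sample-size formula for a two-sided two-sample proportions test; it is not Stata's internal code, but for these inputs it reproduces the same answer.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-sample proportions test
    (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_b = NormalDist().inv_cdf(power)           # quantile for the target power
    pbar = (p1 + p2) / 2                        # pooled proportion under Ho
    numerator = (z_a * sqrt(2 * pbar * (1 - pbar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

print(n_per_group(0.66, 0.86))  # → 71, i.e. N = 142 in total
```

The chi-squared test for two proportions is equivalent to the two-sided z-test, which is why this approximation lands on the same numbers as Stata here.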

Let us say we now want to relax the assumption of a power of 0.8 and allow four different values, ranging from a low of 0.6 to a high of 0.9.

`power twoproportions 0.66 0.86, power(0.6 0.7 0.8 0.9) alpha(0.05)`

```
Performing iteration ...

Estimated sample sizes for a two-sample proportions test
Pearson's chi-squared test
Ho: p2 = p1  versus  Ha: p2 != p1

  +-----------------------------------------------------------------+
  | alpha   power       N      N1      N2   delta      p1      p2   |
  |-----------------------------------------------------------------|
  |   .05      .6      90      45      45      .2     .66     .86   |
  |   .05      .7     112      56      56      .2     .66     .86   |
  |   .05      .8     142      71      71      .2     .66     .86   |
  |   .05      .9     188      94      94      .2     .66     .86   |
  +-----------------------------------------------------------------+
```

The command gives us the sample sizes for the four different scenarios, from a low of 90 participants to a high of 188. As expected, the higher the power of the test, the larger the required sample size.
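The pattern in this table can be checked with a short Python sketch of the normal-approximation sample-size formula for two proportions, looping over the four power values (an approximation, not Stata's own routine, though the numbers agree here):

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    # Normal-approximation per-group sample size, two-sided test.
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * pbar * (1 - pbar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

for pw in (0.6, 0.7, 0.8, 0.9):
    n1 = n_per_group(0.66, 0.86, power=pw)
    print(f"power={pw:.1f}  N={2 * n1:3d}  per group={n1}")
# per-group sizes come out as 45, 56, 71, 94 — matching the Stata table
```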

We can graph the table above using the command below:

`power twoproportions 0.66 0.86, power(0.6 0.7 0.8 0.9) alpha(0.05) graph`

In some cases, the sample size has been predetermined. For a variety of reasons (program eligibility, financial constraints, etc.), the number of participants in a study may be known and fixed. In the example above, assume the number of participants eligible for the innovative education program is 120. How much power do we have with this sample size across a number of different effect magnitudes? As a reminder, the power of a test is the probability of making the correct decision (in other words, rejecting the null hypothesis) when the null hypothesis is actually false.

`power twoproportions 0.66 (0.71 0.76 0.81 0.86), n(120) alpha(0.05) graph`

In the command above, we fix the sample size at 120 and suggest four different effect sizes, each increasing by 5 percentage points from the initial 66% GED completion rate.

The resulting graph suggests low power, with the highest being 73% for an effect of a 20-percentage-point increase in the GED completion rate. This is not surprising, given that the first command told us we needed 142 participants to detect the same effect with a power of 80%. With fewer participants, we expect lower power.
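That relationship can also be verified numerically. The sketch below inverts the usual normal-approximation formula to solve for power given a fixed sample of 60 per group (approximate figures, not Stata's exact iterative solution):

```python
from math import sqrt
from statistics import NormalDist

def power_two_props(p1, p2, n_per_group, alpha=0.05):
    # Approximate power of a two-sided two-sample proportions test
    # with n_per_group observations in each arm (normal approximation).
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    pbar = (p1 + p2) / 2
    z = (abs(p2 - p1) * sqrt(n_per_group)
         - z_a * sqrt(2 * pbar * (1 - pbar))) / sqrt(p1 * (1 - p1)
                                                     + p2 * (1 - p2))
    return NormalDist().cdf(z)

# 120 participants total -> 60 per group; effect sizes from the command above
for p2 in (0.71, 0.76, 0.81, 0.86):
    print(f"p2={p2:.2f}  power={power_two_props(0.66, p2, 60):.2f}")
```

For p2 = 0.86 this gives roughly 0.73, in line with the highest power shown in the graph.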