Power Curve in R
Power curves are line plots that show how changes in variables such as effect size and sample size affect the power of a statistical test.
In this tutorial, we will generate and interpret power curves.
We can use the pwr package to perform statistical power analysis in R.
This package provides power analyses for many experiment and study types. They share a common approach: supply three of the four parameters (sample size, effect size, significance level, and power) and the package calculates the fourth.
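As a quick illustration of the "three in, one out" approach, here is a minimal sketch that leaves out the sample size and lets pwr.t.test solve for it. The input values (d = 0.5, 80% power, α = 0.05) are assumptions chosen for illustration and do not come from the exercises in this tutorial.

```r
library(pwr)

# Supply effect size, significance level, and power; n is left out,
# so pwr.t.test solves for the sample size per group.
res <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
                  type = "two.sample")

# Round up, since a fraction of a participant cannot be recruited.
n_per_group <- ceiling(res$n)
n_per_group

# The same function solves for power when n is supplied instead.
achieved_power <- pwr.t.test(n = n_per_group, d = 0.5,
                             sig.level = 0.05,
                             type = "two.sample")$power
achieved_power
```

Whichever argument is omitted is the one the function computes, which is why the same call pattern is reused throughout this tutorial.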
library(tidyverse)
## -- Attaching packages ------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.0     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
library(pwr)
What is power?
The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis when it is false. Power is affected by the sample size, the size of the true difference (the effect size), the variability of the data, and the significance level of the test.
In R, it is fairly straightforward to perform a power analysis for a t-test using the pwr package's pwr.t.test function. The examples in this tutorial use the two-sample version (type="two.sample").
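To make the definition of power concrete, the sketch below estimates it by simulation: it repeatedly draws two samples whose means really do differ, runs a t-test on each pair, and counts how often the null hypothesis is rejected. The seed, the number of repetitions, and the normal data with unit standard deviation are all assumptions made for illustration.

```r
library(pwr)

set.seed(42)      # arbitrary seed, only for reproducibility
n     <- 25       # sample size per group
d     <- 0.8      # true standardized difference between group means
alpha <- 0.05     # significance level
reps  <- 2000     # number of simulated experiments

# Each replicate: draw two groups, test them, record whether H0 is rejected.
rejections <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = d, sd = 1)
  t.test(x, y, var.equal = TRUE)$p.value < alpha
})

# Power is the long-run rejection rate when the effect is real.
simulated_power <- mean(rejections)

# Analytic value from pwr for comparison.
analytic_power <- pwr.t.test(n = n, d = d, sig.level = alpha,
                             type = "two.sample")$power

c(simulated = simulated_power, analytic = analytic_power)
```

The two numbers should agree up to simulation noise, which shrinks as reps grows.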
1. Generate and interpret the power curve for a t-test with a fixed sample size of 25 per group.
Let's create a function that generates the power curve for a two-sample t-test at a given sample size, with α = 5% (significance level).
power.curve <- function(n){
  cd <- seq(.1,1.5,.1)  # vector of effect sizes (Cohen's d)
  samp.out <- NULL
  # compute the power at each effect size for the given per-group n
  for(i in seq_along(cd)){
    power <- pwr.t.test(d=cd[i],n=n,sig.level=.05,type="two.sample")$power
    power <- data.frame(effect.size=cd[i],power=power)
    samp.out <- rbind(samp.out,power)
  }
  # plot power against effect size; the dashed line marks 80% power
  ggplot(samp.out, aes(effect.size,power))+
    geom_line() +
    geom_point() +
    theme_minimal() +
    geom_hline(yintercept = .8,lty=2, color='blue') +
    labs(title=paste0("t-test Power Curve for n=", n),
         x="Cohen's d",
         y="Power")
}
Call function to generate Power curve for Sample Size 25
n <- 25
power.curve(n)
Interpretation: In this plot, the power curve for a sample size of 25 per group shows that the test reaches a power of about 0.8 at an effect size (Cohen's d) of about 0.8. As the effect size approaches 0, the power of the test decreases toward α (the significance level), which is 0.05 for this analysis.
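The two readings taken from the plot can also be checked numerically. The sketch below solves for the effect size that gives exactly 80% power at n = 25, and evaluates the power at a near-zero effect size; the value 0.001 is an arbitrary stand-in for "close to 0".

```r
library(pwr)

# Effect size at which the n = 25 curve crosses the 80% line.
d_for_80 <- pwr.t.test(n = 25, power = 0.80, sig.level = 0.05,
                       type = "two.sample")$d

# Power when the true effect is almost zero: it should sit near alpha.
power_near_zero <- pwr.t.test(n = 25, d = 0.001, sig.level = 0.05,
                              type = "two.sample")$power

round(c(d_for_80 = d_for_80, power_near_zero = power_near_zero), 3)
```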
2. Generate and interpret the power curve for a t-test with a fixed sample size of 100 per group.
Call function (created to solve problem 1) for Sample Size 100
n <- 100
power.curve(n)
Interpretation: In this plot, the power curve for a sample size of 100 per group shows that the test reaches a power of about 0.8 at an effect size (Cohen's d) of about 0.4. As the effect size approaches 0, the power of the test decreases toward α (the significance level), which is 0.05 for this analysis.
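Comparing the two sample sizes is easier when both curves share one plot. The following is one possible sketch, not part of the original exercises: it builds a grid of effect sizes and sample sizes with expand.grid, fills in the power with mapply, and colors the lines by sample size. The names grid and p are chosen here for illustration.

```r
library(pwr)
library(ggplot2)

# Every combination of effect size and per-group sample size.
grid <- expand.grid(effect.size = seq(0.1, 1.5, 0.1), n = c(25, 100))

# Power for each row of the grid.
grid$power <- mapply(
  function(d, n) pwr.t.test(d = d, n = n, sig.level = 0.05,
                            type = "two.sample")$power,
  grid$effect.size, grid$n
)

# One line per sample size; the dashed line marks 80% power.
p <- ggplot(grid, aes(effect.size, power, color = factor(n))) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.8, linetype = 2, color = "blue") +
  theme_minimal() +
  labs(title = "t-test power curves by sample size",
       x = "Cohen's d", y = "Power", color = "n per group")
p
```

The n = 100 curve lies above the n = 25 curve at every effect size, so the larger study crosses the 80% line at a smaller effect.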
3. Generate and interpret the power curve for a 2 proportion test with a fixed sample size of 30 per group
Create a function that generates the power curve for a two-proportion test at a given sample size.
power.p.curve <- function(n){
  cd <- seq(.1,1.5,.1)  # vector of effect sizes (Cohen's h)
  samp.p.out <- NULL
  # compute the power at each effect size for the given per-group n
  for(i in seq_along(cd)){
    power <- pwr.2p.test(h=cd[i],n=n,sig.level=.05)$power
    power <- data.frame(effect.size=cd[i],power=power)
    samp.p.out <- rbind(samp.p.out,power)
  }
  # plot power against effect size; the dashed line marks 80% power
  ggplot(samp.p.out, aes(effect.size,power))+
    geom_line() +
    geom_point() +
    theme_minimal() +
    geom_hline(yintercept = .8,lty=2, color='blue') +
    labs(title=paste0("2 proportion test power curve for n=", n),
         subtitle = "Two proportions",
         x="Cohen's h",
         y="Power")
}
Call the function to generate the power curve for the 2 proportion test with a sample size of 30
n <- 30
power.p.curve(n)
Interpretation: In this plot, the power curve for a sample size of 30 per group shows that the 2 proportion test reaches a power of about 0.8 at an effect size (Cohen's h) of about 0.71. As the effect size approaches 0, the power of the test decreases toward α (the significance level), which is 0.05 for this analysis.
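For a 2 proportion test, the effect size passed to pwr.2p.test is Cohen's h, which is a difference of arcsine-transformed proportions rather than a raw difference. The sketch below uses the pwr helper ES.h to convert a pair of proportions into h; the proportions 0.75 and 0.50 are hypothetical values picked for illustration.

```r
library(pwr)

# Hypothetical proportions in the two groups.
p1 <- 0.75
p2 <- 0.50

# Cohen's h via the helper in pwr ...
h <- ES.h(p1, p2)

# ... which matches the arcsine formula written out by hand.
h_manual <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

# Power of the 2 proportion test for this h with 30 per group.
power_30 <- pwr.2p.test(h = h, n = 30, sig.level = 0.05)$power

round(c(h = h, h_manual = h_manual, power = power_30), 3)
```

Reading the x-axis in terms of h this way shows which pairs of proportions correspond to a given point on the curve.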
4. Generate and interpret the power curve for a 2 proportion test with a fixed sample size of 50 per group.
Call function (created to solve problem 3) for sample size 50
n <- 50
power.p.curve(n)
Interpretation: In this plot, the power curve for a sample size of 50 per group shows that the 2 proportion test reaches a power of about 0.8 at an effect size (Cohen's h) of about 0.55. As the effect size approaches 0, the power of the test decreases toward α (the significance level), which is 0.05 for this analysis.
The plots for problems 5–7 are slightly different since we have fixed power at 80%. Think about what values you will use for the x-axis and which values you will use for the y-axis.
5. Generate and interpret the power curve for a t-test with a fixed sample size of 50 per group, power of 80% for values of the significance level between 0.01 and 0.10.
Here, we are asked to generate the 80% power curve for significance levels between 0.01 and 0.10 with a sample size of 50 per group. What we need to visualize is the detectable effect size at each significance level.
- X axis — Effect Size
- Y axis — Significance level
sig.level.list <- seq(.01,0.10,.01) # vector of significance levels
samp.out <- NULL
for(i in 1:length(sig.level.list)){
eff.xxx <- pwr.t.test(power=.80, sig.level= sig.level.list[i], n=50)$d
eff.xxx <- data.frame(sig.level=sig.level.list[i],effect.size=eff.xxx)
samp.out <- rbind(samp.out,eff.xxx)
}
ggplot(samp.out, aes(effect.size,sig.level))+
geom_line() + theme_bw() +
geom_hline(yintercept = .05,lty=2, color='blue') +
theme_minimal() +
geom_point() +
labs(title="Significance level vs effect Size, power=0.80, n=50",
y="Significance Level",
x="Cohen's d")
Interpretation: In this plot, the 80% power curve for a sample size of 50 per group shows that the t-test can detect an effect size (Cohen's d) of about 0.57 at a significance level of 0.05, which is considered a medium effect. To detect a smaller effect with the same power, we would need a larger sample size.
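The link between sample size and the smallest detectable effect can be sketched directly by solving for d at several sample sizes. The sizes 50, 100, and 200 are assumptions chosen for illustration; only n = 50 comes from the exercise.

```r
library(pwr)

# Per-group sample sizes to compare (100 and 200 are illustrative).
sizes <- c(50, 100, 200)

# Smallest effect size detectable with 80% power at alpha = 0.05.
min_d <- sapply(sizes, function(n) {
  pwr.t.test(n = n, power = 0.80, sig.level = 0.05,
             type = "two.sample")$d
})

data.frame(n = sizes, min.effect.size = round(min_d, 2))
```

Each increase in sample size lowers the detectable effect, but with diminishing returns.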
6. Generate and interpret the power curve for a two proportion test with a fixed sample size of 60 per group, power of 80% for values of the significance level between 0.01 and 0.10.
sig.level.list <- seq(.01,0.10,.01) # vector of significance levels
samp.p.out <- NULL
# solve for the detectable effect size (Cohen's h) at each significance level
for(i in seq_along(sig.level.list)){
  eff.xxx <- pwr.2p.test(power=.80, sig.level= sig.level.list[i], n=60)$h
  eff.xxx <- data.frame(sig.level=sig.level.list[i],effect.size=eff.xxx)
  samp.p.out <- rbind(samp.p.out,eff.xxx)
}
ggplot(samp.p.out, aes(effect.size,sig.level))+
  geom_line() +
  theme_minimal() +
  geom_hline(yintercept = .05,lty=2, color='blue') +
  geom_point() +
  labs(title="Significance level vs effect Size, power=0.80, n=60",
       subtitle = "Two proportions",
       y="Significance Level",
       x="Cohen's h")
Interpretation: In this plot, the 80% power curve for a sample size of 60 per group shows that the 2 proportion test can detect an effect size (Cohen's h) of about 0.51 at a significance level of 0.05, which is considered a medium effect. To detect a smaller effect with the same power, we would need a larger sample size.
7. Generate and interpret the power curve for a t-test with power of 80%, effect size of 0.7 for values of the significance level between 0.01 and 0.10.
sig.level.list <- seq(.01,0.10,.01) # vector of significance levels
samp.out <- NULL
# solve for the required sample size per group at each significance level
for(i in seq_along(sig.level.list)){
  n.xxx <- pwr.t.test(power=.80, sig.level= sig.level.list[i], d=.7)$n
  n.xxx <- data.frame(sig.level=sig.level.list[i],sample.size=n.xxx)
  samp.out <- rbind(samp.out,n.xxx)
}
ggplot(samp.out, aes(sample.size,sig.level))+
geom_line() + theme_bw() +
theme_minimal() +
geom_hline(yintercept = .05,lty=2, color='blue') +
geom_point() +
labs(title="t-test Power Curve for Significance level vs Sample Size, power=0.80, effect=.7",
y="Significance Level",
x="Sample size")
Interpretation: As you can see, for a specified power of 0.80 and an effect size of 0.7, the required sample size per group rises from about 26 to about 50 as alpha (the significance level) drops from 0.10 to 0.01; at alpha = 0.05 it is about 33. This means that if we want a lower risk of rejecting the null hypothesis when it is actually true, we will need a larger sample size.
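The trade-off can also be read as a small table rather than from the plot. This sketch solves for the per-group sample size at three significance levels picked from the range used above; rounding up with ceiling is a choice made here, since fractional participants are not possible.

```r
library(pwr)

# Three significance levels from the 0.01 to 0.10 range.
alphas <- c(0.10, 0.05, 0.01)

# Required sample size per group for d = 0.7 and 80% power.
n_needed <- sapply(alphas, function(a) {
  pwr.t.test(d = 0.7, power = 0.80, sig.level = a,
             type = "two.sample")$n
})

data.frame(sig.level = alphas, n.per.group = ceiling(n_needed))
```

A stricter significance level demands more data for the same power and effect size.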