Title: | Design and Analysis of Replication Studies |
---|---|
Description: | Provides utilities for the design and analysis of replication studies. Features both traditional methods based on statistical significance and more recent methods such as the sceptical p-value; Held L. (2020) <doi:10.1111/rssa.12493>, Held et al. (2022) <doi:10.1214/21-AOAS1502>, Micheloud et al. (2023) <doi:10.1111/stan.12312>. Also provides related methods including the harmonic mean chi-squared test; Held, L. (2020) <doi:10.1111/rssc.12410>, and intrinsic credibility; Held, L. (2019) <doi:10.1098/rsos.181534>. Contains datasets from five large-scale replication projects. |
Authors: | Leonhard Held [aut] |
Maintainer: | Samuel Pawel <[email protected]> |
License: | GPL (>=2) |
Version: | 1.3.3 |
Built: | 2025-02-19 05:15:43 UTC |
Source: | https://github.com/crsuzh/replicationsuccess |
Convert between estimates, z-values, p-values, and confidence intervals
ci2se(lower, upper, conf.level = 0.95, ratio = FALSE)
ci2estimate(lower, upper, ratio = FALSE, antilog = FALSE)
ci2z(lower, upper, conf.level = 0.95, ratio = FALSE)
ci2p(lower, upper, conf.level = 0.95, ratio = FALSE, alternative = "two.sided")
z2p(z, alternative = "two.sided")
p2z(p, alternative = "two.sided")
lower |
Numeric vector of lower confidence interval bounds. |
upper |
Numeric vector of upper confidence interval bounds. |
conf.level |
The confidence level of the confidence intervals. Default is 0.95. |
ratio |
Indicates whether the confidence interval is for a
ratio, e.g. an odds ratio, relative risk or hazard ratio.
If |
antilog |
Indicates whether the estimate is reported on the ratio scale.
Only applies if |
alternative |
Direction of the alternative of the p-value. Either "two.sided" (default), "one.sided", "less", or "greater". If "one.sided" or "two.sided" is specified, the z-value is assumed to be positive. |
z |
Numeric vector of z-values. |
p |
Numeric vector of p-values. |
z2p is vectorized over all arguments.
p2z is vectorized over all arguments.
ci2se returns a numeric vector of standard errors.
ci2estimate returns a numeric vector of parameter estimates.
ci2z returns a numeric vector of z-values.
ci2p returns a numeric vector of p-values.
z2p returns a numeric vector of p-values. The dimension of the output depends on the input. In general, the output will be an array of dimension c(nrow(z), ncol(z), length(alternative)). If any of these dimensions is 1, it will be dropped.
p2z returns a numeric vector of z-values. The dimension of the output depends on the input. In general, the output will be an array of dimension c(nrow(p), ncol(p), length(alternative)). If any of these dimensions is 1, it will be dropped.
ci2se(lower = 1, upper = 3)
ci2se(lower = 1, upper = 3, ratio = TRUE)
ci2se(lower = 1, upper = 3, conf.level = 0.9)
ci2estimate(lower = 1, upper = 3)
ci2estimate(lower = 1, upper = 3, ratio = TRUE)
ci2estimate(lower = 1, upper = 3, ratio = TRUE, antilog = TRUE)
ci2z(lower = 1, upper = 3)
ci2z(lower = 1, upper = 3, ratio = TRUE)
ci2z(lower = 1, upper = 3, conf.level = 0.9)
ci2p(lower = 1, upper = 3)
ci2p(lower = 1, upper = 3, alternative = "one.sided")
z2p(z = c(1, 2, 5))
z2p(z = c(1, 2, 5), alternative = "less")
z2p(z = c(1, 2, 5), alternative = "greater")
z <- seq(-3, 3, by = 0.01)
plot(z, z2p(z), type = "l", xlab = "z", ylab = "p", ylim = c(0, 1))
lines(z, z2p(z, alternative = "greater"), lty = 2)
legend("topright", c("two-sided", "greater"), lty = c(1, 2), bty = "n")
p2z(p = c(0.005, 0.01, 0.05))
p2z(p = c(0.005, 0.01, 0.05), alternative = "greater")
p2z(p = c(0.005, 0.01, 0.05), alternative = "less")
p <- seq(0.001, 0.05, 0.0001)
plot(p, p2z(p), type = "l", ylim = c(0, 3.5), ylab = "z")
lines(p, p2z(p, alternative = "greater"), lty = 2)
legend("bottomleft", c("two-sided", "greater"), lty = c(1, 2), bty = "n")
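The conversions above can be illustrated with a small base-R sketch. It assumes the usual normal (Wald) approximation, with ratios handled on the log scale; the `*_sketch` names are hypothetical and are not part of the package, whose internals may differ in detail.

```r
# Base-R sketch of Wald-type conversions (assumed normal approximation).
# The *_sketch functions are illustrative only, not the package's code.
ci2se_sketch <- function(lower, upper, conf.level = 0.95, ratio = FALSE) {
  if (ratio) {
    # confidence intervals for ratios are symmetric on the log scale
    lower <- log(lower)
    upper <- log(upper)
  }
  (upper - lower) / (2 * qnorm(1 - (1 - conf.level) / 2))
}

ci2estimate_sketch <- function(lower, upper, ratio = FALSE) {
  # midpoint of the interval (on the log scale for ratios)
  if (ratio) (log(lower) + log(upper)) / 2 else (lower + upper) / 2
}

z2p_two_sided_sketch <- function(z) {
  2 * pnorm(abs(z), lower.tail = FALSE)
}

se  <- ci2se_sketch(lower = 1, upper = 3)
est <- ci2estimate_sketch(lower = 1, upper = 3)
z   <- est / se
p   <- z2p_two_sided_sketch(z)
print(c(estimate = est, se = se, z = z, p = p))
```

For a 95% interval from 1 to 3 the estimate is the midpoint 2, and the z-value is twice the 97.5% normal quantile, because the half-width of the interval equals one unit.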
The minimum relative effect size (replication to original) to achieve replication success with the sceptical p-value is computed based on the result of the original study and the corresponding variance ratio.
effectSizeReplicationSuccess( zo, c = 1, level = 0.025, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled") )
zo |
Numeric vector of z-values from original studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
Threshold for the calibrated sceptical p-value. Default is 0.025. |
alternative |
Specifies if level is one-sided ("one.sided", default) or two-sided ("two.sided"). |
type |
Type of recalibration. Can be either "golden" (default),
"nominal" (no recalibration), or "controlled". "golden" ensures that for an
original study just significant at the specified |
effectSizeReplicationSuccess is the vectorized version of the internal function .effectSizeReplicationSuccess_. Vectorize is used to vectorize the function.
The minimum relative effect size to achieve replication success with the sceptical p-value.
Leonhard Held, Charlotte Micheloud, Samuel Pawel, Florian Gerber
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
sampleSizeReplicationSuccess, levelSceptical
po <- c(0.001, 0.002, 0.01, 0.02, 0.025)
zo <- p2z(po, alternative = "one.sided")
effectSizeReplicationSuccess(zo = zo, c = 1, level = 0.025, alternative = "one.sided", type = "golden")
effectSizeReplicationSuccess(zo = zo, c = 10, level = 0.025, alternative = "one.sided", type = "golden")
effectSizeReplicationSuccess(zo = zo, c = 10, level = 0.025, alternative = "one.sided", type = "controlled")
effectSizeReplicationSuccess(zo = zo, c = 2, level = 0.025, alternative = "one.sided", type = "nominal")
effectSizeReplicationSuccess(zo = zo, c = 2, level = 0.05, alternative = "two.sided", type = "nominal")
The minimum relative effect size (replication to original) to achieve significance of the replication study is computed based on the result of the original study and the corresponding variance ratio.
effectSizeSignificance( zo, c = 1, level = 0.025, alternative = c("one.sided", "two.sided") )
zo |
Numeric vector of z-values from original studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
Significance level. Default is 0.025. |
alternative |
Specifies if the significance level is "one.sided" (default) or "two.sided". If the significance level is one-sided, then effect size calculations are based on a one-sided assessment of significance in the direction of the original effect estimate. |
effectSizeSignificance is the vectorized version of the internal function .effectSizeSignificance_. Vectorize is used to vectorize the function.
The minimum relative effect size to achieve significance in the replication study.
Charlotte Micheloud, Samuel Pawel, Florian Gerber
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
po <- c(0.001, 0.002, 0.01, 0.02, 0.025)
zo <- p2z(po, alternative = "one.sided")
effectSizeSignificance(zo = zo, c = 1, level = 0.025, alternative = "one.sided")
effectSizeSignificance(zo = zo, c = 1, level = 0.05, alternative = "two.sided")
effectSizeSignificance(zo = zo, c = 50, level = 0.025, alternative = "one.sided")
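The idea behind the minimum relative effect size for significance can be sketched in base R. Writing d for the relative effect size (replication to original) and c for the variance ratio, the replication z-value is zr = d * zo * sqrt(c), so one-sided significance at a given level requires d to be at least z_level / (zo * sqrt(c)). The function below is a hypothetical illustration of that inequality for the one-sided case, not the package's implementation.

```r
# Sketch: smallest relative effect size d such that the replication is
# one-sided significant, using zr = d * zo * sqrt(c).
# effect_size_significance_sketch is a hypothetical name.
effect_size_significance_sketch <- function(zo, c = 1, level = 0.025) {
  z_alpha <- qnorm(1 - level)       # one-sided critical value
  z_alpha / (abs(zo) * sqrt(c))     # d must be at least this large
}

po <- c(0.001, 0.002, 0.01, 0.02, 0.025)
zo <- qnorm(1 - po)                 # one-sided original p-values to z-values
d_equal  <- effect_size_significance_sketch(zo = zo, c = 1)
d_larger <- effect_size_significance_sketch(zo = zo, c = 4)
print(rbind(po = po, d_equal = d_equal, d_larger = d_larger))
```

With an original study that is just significant (po = 0.025) and equal precision (c = 1), the replication estimate must be at least as large as the original one (d = 1). Quadrupling the replication sample size halves the required relative effect size.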
p-values and confidence intervals from the harmonic mean chi-squared test.
hMeanChiSq(z, w = rep(1, length(z)), alternative = c("greater", "less", "two.sided", "none"), bound = FALSE)
hMeanChiSqMu(thetahat, se, w = rep(1, length(thetahat)), mu = 0, alternative = c("greater", "less", "two.sided", "none"), bound = FALSE)
hMeanChiSqCI(thetahat, se, w = rep(1, length(thetahat)), alternative = c("two.sided", "greater", "less", "none"), conf.level = 0.95)
z |
Numeric vector of z-values. |
w |
Numeric vector of weights. Defaults to a vector of 1s (equal weights). |
alternative |
Either "greater" (default), "less", "two.sided", or "none". Specifies the alternative to be considered in the computation of the p-value. |
bound |
If |
thetahat |
Numeric vector of parameter estimates. |
se |
Numeric vector of standard errors. |
mu |
The null hypothesis value. Defaults to 0. |
conf.level |
Numeric vector specifying the confidence level of the confidence interval. Defaults to 0.95. |
hMeanChiSq: returns the p-values from the harmonic mean chi-squared test based on the study-specific z-values.
hMeanChiSqMu: returns the p-value from the harmonic mean chi-squared test based on study-specific estimates and standard errors.
hMeanChiSqCI: returns a list containing confidence interval(s) obtained by inverting the harmonic mean chi-squared test based on study-specific estimates and standard errors. The list contains:
CI |
Confidence interval(s). |
If the alternative is "none", the list also contains:
gamma |
Local minima of the p-value function between the thetahats. |
Leonhard Held, Florian Gerber
Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410
## Example from Fisher (1999) as discussed in Held (2020)
pvalues <- c(0.0245, 0.1305, 0.00025, 0.2575, 0.128)
lower <- c(0.04, 0.21, 0.12, 0.07, 0.41)
upper <- c(1.14, 1.54, 0.60, 3.75, 1.27)
se <- ci2se(lower = lower, upper = upper, ratio = TRUE)
thetahat <- ci2estimate(lower = lower, upper = upper, ratio = TRUE)
## hMeanChiSq() --------
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), alternative = "less")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), alternative = "two.sided")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), alternative = "none")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), w = 1 / se^2, alternative = "less")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), w = 1 / se^2, alternative = "two.sided")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"), w = 1 / se^2, alternative = "none")
## hMeanChiSqMu() --------
hMeanChiSqMu(thetahat = thetahat, se = se, alternative = "two.sided")
hMeanChiSqMu(thetahat = thetahat, se = se, w = 1 / se^2, alternative = "two.sided")
hMeanChiSqMu(thetahat = thetahat, se = se, alternative = "two.sided", mu = -0.1)
## hMeanChiSqCI() --------
## two-sided
CI1 <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2, alternative = "two.sided")
CI2 <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2, alternative = "two.sided", conf.level = 0.99875)
## one-sided
CI1b <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2, alternative = "less", conf.level = 0.975)
CI2b <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2, alternative = "less", conf.level = 1 - 0.025^2)
## confidence intervals on hazard ratio scale
print(exp(CI1$CI), digits = 2)
print(exp(CI2$CI), digits = 2)
print(exp(CI1b$CI), digits = 2)
print(exp(CI2b$CI), digits = 2)
## example with confidence region consisting of disjunct intervals
thetahat2 <- c(-3.7, 2.1, 2.5)
se2 <- c(1.5, 2.2, 3.1)
conf.level <- 0.95; alpha <- 1 - conf.level
muSeq <- seq(-7, 6, length.out = 1000)
pValueSeq <- hMeanChiSqMu(thetahat = thetahat2, se = se2, alternative = "none", mu = muSeq)
(hm <- hMeanChiSqCI(thetahat = thetahat2, se = se2, alternative = "none"))
plot(x = muSeq, y = pValueSeq, type = "l", panel.first = grid(lty = 1), xlab = expression(mu), ylab = "p-value")
abline(v = thetahat2, h = alpha, lty = 2)
arrows(x0 = hm$CI[, 1], x1 = hm$CI[, 2], y0 = alpha, y1 = alpha, col = "darkgreen", lwd = 3, angle = 90, code = 3)
points(hm$gamma, col = "red", pch = 19, cex = 2)
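A minimal base-R sketch of the test statistic may help to see what the p-values above are based on. It assumes, following my reading of Held (2020), that the statistic is the weighted harmonic mean of the squared z-values, referred to a chi-squared distribution with one degree of freedom, and that the one-sided p-value is obtained by dividing by 2^n when all z-values point in the tested direction. The name `hmean_chisq_sketch` and the NaN returned for mixed directions are illustrative choices, not the package's behaviour.

```r
# Sketch of the harmonic mean chi-squared test (assumed form, see lead-in).
# hmean_chisq_sketch is a hypothetical name, not a package function.
hmean_chisq_sketch <- function(z, w = rep(1, length(z)),
                               alternative = c("greater", "none")) {
  alternative <- match.arg(alternative)
  n <- length(z)
  # weighted harmonic mean of the squared z-values, scaled by the weights
  stat <- sum(sqrt(w))^2 / sum(w / z^2)
  p <- pchisq(stat, df = 1, lower.tail = FALSE)
  if (alternative == "greater") {
    # one-sided version: only defined here if all effects are positive
    p <- if (all(z >= 0)) p / 2^n else NaN
  }
  list(statistic = stat, p.value = p)
}

res_none    <- hmean_chisq_sketch(z = c(2, 2), alternative = "none")
res_greater <- hmean_chisq_sketch(z = c(2, 2), alternative = "greater")
print(res_none)
print(res_greater)
```

Because the harmonic mean is dominated by the smallest squared z-value, a single unconvincing study pulls the statistic down strongly, which is the property that makes the test suitable for requiring evidence from every study.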
The replication success level is computed based on the specified alternative and recalibration type.
levelSceptical( level, c = NA, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled") )
level |
Threshold for the calibrated sceptical p-value. Default is 0.025. |
c |
The variance ratio. Only required when type = "controlled". |
alternative |
Specifies if level is one-sided ("one.sided", default) or two-sided ("two.sided"). |
type |
Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration),
or "controlled". "golden" ensures that for an original study just significant at
the specified |
levelSceptical is the vectorized version of the internal function .levelSceptical_. Vectorize is used to vectorize the function.
Replication success levels
Leonhard Held
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics, 16, 706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
levelSceptical(level = 0.025, alternative = "one.sided", type = "nominal")
levelSceptical(level = 0.025, alternative = "one.sided", type = "controlled", c = 1)
levelSceptical(level = 0.025, alternative = "one.sided", type = "golden")
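As a rough illustration of the recalibration, the sketch below computes the "nominal" and "golden" levels for a one-sided threshold in base R. It assumes, based on my reading of Held, Micheloud and Pawel (2022), that the golden level divides the critical z-value by the square root of the golden ratio; the "controlled" type depends on the variance ratio and is not covered. `level_sceptical_sketch` is a hypothetical name.

```r
# Sketch of the one-sided replication success level for two recalibration
# types (assumed formulas, see lead-in). Not the package's implementation.
level_sceptical_sketch <- function(level = 0.025,
                                   type = c("golden", "nominal")) {
  type <- match.arg(type)
  if (type == "nominal") {
    return(level)                   # no recalibration
  }
  phi <- (1 + sqrt(5)) / 2          # golden ratio
  z_alpha <- qnorm(1 - level)
  pnorm(z_alpha / sqrt(phi), lower.tail = FALSE)
}

lev_nominal <- level_sceptical_sketch(0.025, type = "nominal")
lev_golden  <- level_sceptical_sketch(0.025, type = "golden")
print(c(nominal = lev_nominal, golden = lev_golden))
```

Under this assumption the golden level for a threshold of 0.025 is roughly 0.062, so it is less stringent than the nominal level.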
pBox computes Box's tail probabilities based on the z-values of the original and the replication study, the corresponding variance ratio, and the significance level.
pBox(zo, zr, c, level = 0.05, alternative = c("two.sided", "one.sided"))
zBox(zo, zr, c, level = 0.05, alternative = c("two.sided", "one.sided"))
zo |
Numeric vector of z-values from the original studies. |
zr |
Numeric vector of z-values from replication studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
Numeric vector of significance levels. Default is 0.05. |
alternative |
Either "two.sided" (default) or "one.sided". Specifies whether two-sided or one-sided Box's tail probabilities are computed. |
pBox quantifies the conflict between the sceptical prior that would render the original study non-significant and the result from the replication study. If the original study was not significant at level level, the sceptical prior does not exist and pBox cannot be calculated.
pBox returns Box's tail probabilities.
zBox returns the z-values used in pBox.
Leonhard Held
Box, G.E.P. (1980). Sampling and Bayes' inference in scientific modelling and robustness (with discussion). Journal of the Royal Statistical Society, Series A, 143, 383-430.
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
pBox(zo = p2z(0.01), zr = p2z(0.02), c = 2)
pBox(zo = p2z(0.02), zr = p2z(0.01), c = 1/2)
pBox(zo = p2z(0.02, alternative = "one.sided"), zr = p2z(0.01, alternative = "one.sided"), c = 1/2, alternative = "one.sided")
The combined p-value with Edgington's method is computed based on the one-sided p-values (or the corresponding z-values) of the original and replication study, and the ratio of the weight of the replication study over the weight of the original study.
pEdgington(zo = NULL, zr = NULL, po = NULL, pr = NULL, r = 1)
zo |
A vector of z-values from original studies. |
zr |
A vector of z-values from replication studies. |
po |
A vector of one-sided original p-values. |
pr |
A vector of one-sided replication p-values. |
r |
Numeric vector of ratios of replication to original weight. |
Either zo and zr, or po and pr, must be specified.
Edgington's p-value
Charlotte Micheloud, Leonhard Held, Samuel Pawel
Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149
## examples from paper
pEdgington(po = 0.026, pr = 0.001)
pEdgington(po = 0.024, pr = 0.024)
## using z-values
pEdgington(zo = 1.91, zr = 1.95)
## using combination of z-value and p-value
pEdgington(zo = 1.91, pr = 0.024)
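For equal weights (r = 1), Edgington's combined p-value can be sketched directly in base R: under the null hypothesis the sum of two independent uniform p-values has a triangular distribution, and the combined p-value is its cumulative distribution function evaluated at po + pr. The weighted case (r different from 1) is not covered here, and `edgington_sketch` is a hypothetical name rather than the package's code.

```r
# Sketch of Edgington's method for two one-sided p-values with equal
# weights. The CDF of the sum of two independent U(0, 1) variables is
# s^2 / 2 for s <= 1 and 1 - (2 - s)^2 / 2 for 1 < s <= 2.
edgington_sketch <- function(po, pr) {
  s <- po + pr
  ifelse(s <= 1, s^2 / 2, 1 - (2 - s)^2 / 2)
}

p1 <- edgington_sketch(po = 0.026, pr = 0.001)  # 0.0003645
p2 <- edgington_sketch(po = 0.024, pr = 0.024)  # 0.001152
print(c(p1, p2))
```

Note that the first pair yields a small combined p-value even though the original p-value of 0.026 is not significant at the one-sided 0.025 level, since only the sum of the two p-values enters the calculation.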
Computes the p-value for intrinsic credibility.
pIntrinsic( p = z2p(z, alternative = alternative), z = NULL, alternative = c("two.sided", "one.sided", "less", "greater"), type = c("Held", "Matthews") )
p |
Numeric vector of p-values. |
z |
Numeric vector of z-values. Default is NULL. |
alternative |
Either "two.sided" (default) or "one.sided". Specifies if the p-value is two-sided or one-sided. If the p-value is one-sided, then a one-sided p-value for intrinsic credibility is computed. |
type |
Type of intrinsic p-value. Default is "Held" as in Held (2019). The other option is "Matthews" as in Matthews (2018). |
p-values for intrinsic credibility.
Leonhard Held
Matthews, R. A. J. (2018). Beyond 'significance': principles and practice of the analysis of credibility. Royal Society Open Science, 5, 171047. doi:10.1098/rsos.171047
Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534
p <- c(0.005, 0.01, 0.05)
pIntrinsic(p = p)
pIntrinsic(p = p, type = "Matthews")
pIntrinsic(p = p, alternative = "one.sided")
pIntrinsic(p = p, alternative = "one.sided", type = "Matthews")
pIntrinsic(z = 2)
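The "Held" type for two-sided p-values can be sketched in base R under the assumption, taken from my reading of Held (2019), that the p-value for intrinsic credibility is obtained by dividing the z-value by the square root of 2 before converting back to a two-sided p-value. The "Matthews" type and the one-sided variants are not covered, and `p_intrinsic_held_sketch` is a hypothetical name.

```r
# Sketch of the two-sided p-value for intrinsic credibility, type "Held"
# (assumed form, see lead-in). Not the package's implementation.
p_intrinsic_held_sketch <- function(p) {
  z <- qnorm(p / 2, lower.tail = FALSE)       # two-sided p-value to |z|
  2 * pnorm(z / sqrt(2), lower.tail = FALSE)  # shrink z, convert back
}

p  <- c(0.005, 0.01, 0.05)
pI <- p_intrinsic_held_sketch(p)
print(rbind(p = p, pIntrinsic = pI))
```

Under this assumption the intrinsic p-value is always larger than the ordinary one, and it falls below 0.05 only when the ordinary two-sided p-value is below about 0.0056, which is in line with the argument for p < 0.005 in the title of Held (2019).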
The power with Edgington's method is computed based on the result of the original study (z-value or one-sided p-value), the corresponding variance ratio, and the ratio of the weight of the replication study over the weight of the original study.
powerEdgington( zo = NULL, po = NULL, r = 1, c = 1, level = 0.025, designPrior = "conditional", shrinkage = 0 )
zo |
Numeric vector of z-values from original studies. |
po |
Numeric vector of original one-sided p-values. |
r |
Numeric vector of ratios of replication to original weight. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
One-sided significance level. Default is 0.025. |
designPrior |
Either "conditional" (default) or "predictive". |
shrinkage |
Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. |
Either zo or po must be specified.
The power with Edgington's method
Charlotte Micheloud, Leonhard Held, Samuel Pawel
Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149
powerEdgington(po = 0.025, level = 0.025, c = 1.4)
Computes the power for replication success with the sceptical p-value based on the result of the original study, the corresponding variance ratio, and the design prior.
powerReplicationSuccess( zo, c = 1, level = 0.025, designPrior = c("conditional", "predictive", "EB"), alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled"), shrinkage = 0, h = 0, strict = FALSE )
zo |
Numeric vector of z-values from original studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
Threshold for the calibrated sceptical p-value. Default is 0.025. |
designPrior |
Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution, where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator). |
alternative |
Specifies if level is one-sided ("one.sided", default) or two-sided ("two.sided"). |
type |
Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration),
or "controlled". "golden" ensures that for an original study just significant at
the specified |
shrinkage |
Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. |
h |
Numeric vector of relative heterogeneity variances, i.e., the ratios
of the heterogeneity variance to the variance of the original effect
estimate. Default is 0 (no heterogeneity). Is only taken into account
when |
strict |
Logical vector indicating whether the probability for replication success in the opposite direction of the original effect estimate should also be taken into account. Default is FALSE. |
powerReplicationSuccess is the vectorized version of the internal function .powerReplicationSuccess_. Vectorize is used to vectorize the function.
The power for replication success with the sceptical p-value
Leonhard Held, Charlotte Micheloud, Samuel Pawel
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
sampleSizeReplicationSuccess, pSceptical, levelSceptical
## larger sample size in replication (c > 1)
powerReplicationSuccess(zo = p2z(0.005), c = 2, level = 0.025, designPrior = "conditional")
powerReplicationSuccess(zo = p2z(0.005), c = 2, level = 0.025, designPrior = "predictive")
## smaller sample size in replication (c < 1)
powerReplicationSuccess(zo = p2z(0.005), c = 1/2, level = 0.025, designPrior = "conditional")
powerReplicationSuccess(zo = p2z(0.005), c = 1/2, level = 0.025, designPrior = "predictive")
powerReplicationSuccess(zo = p2z(0.00005), c = 2, level = 0.05, alternative = "two.sided", strict = TRUE, shrinkage = 0.9)
powerReplicationSuccess(zo = p2z(0.00005), c = 2, level = 0.05, alternative = "two.sided", strict = FALSE, shrinkage = 0.9)
The power for significance is computed based on the result of the original study, the corresponding variance ratio, and the design prior.
powerSignificance( zo, c = 1, level = 0.025, designPrior = c("conditional", "predictive", "EB"), alternative = c("one.sided", "two.sided"), h = 0, shrinkage = 0, strict = FALSE )
zo |
Numeric vector of z-values from original studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
level |
Significance level. Default is 0.025. |
designPrior |
Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution, where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator). |
alternative |
Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided. If the significance level is one-sided, then power calculations are based on a one-sided assessment of significance in the direction of the original effect estimates. |
h |
The relative between-study heterogeneity, i.e., the ratio of the heterogeneity
variance to the variance of the original effect estimate.
Default is 0 (no heterogeneity).
Is only taken into account when |
shrinkage |
Numeric vector with values in [0,1). Defaults to 0.
Specifies the shrinkage of the original effect estimate towards zero, e.g.,
the effect is shrunken by a factor of 25% for |
strict |
Logical vector indicating whether the probability for significance
in the opposite direction of the original effect estimate should also be
taken into account. Default is |
powerSignificance is the vectorized version of the internal function .powerSignificance_. Vectorize is used to vectorize the function.
The probability that a replication study yields a significant effect estimate in the specified direction.
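As a rough illustration of this probability, the following base R sketch uses the standard normal-theory expressions for the conditional and predictive design priors with a one-sided level. It is a sketch under stated assumptions, not the package's internal code: the helper name power_sig_sketch is hypothetical, heterogeneity is set to zero (h = 0), the original effect is assumed to be positive, and the "EB" prior and the strict option are not covered.

```r
# Hypothetical helper: power for significance under a one-sided level,
# assuming no heterogeneity (h = 0) and a positive original effect estimate.
power_sig_sketch <- function(zo, c = 1, level = 0.025,
                             designPrior = c("conditional", "predictive"),
                             shrinkage = 0) {
  designPrior <- match.arg(designPrior)
  za <- qnorm(1 - level)                # critical value for the replication z-value
  mu <- (1 - shrinkage) * zo * sqrt(c)  # expected replication z-value
  # conditional: the (shrunken) original estimate is treated as the true effect;
  # predictive: uncertainty of the original estimate inflates the variance to c + 1
  sigma <- if (designPrior == "conditional") 1 else sqrt(c + 1)
  pnorm(za, mean = mu, sd = sigma, lower.tail = FALSE)
}

zo <- qnorm(1 - 0.005 / 2)  # z-value of a two-sided original p-value of 0.005
power_sig_sketch(zo, c = 2, designPrior = "conditional")
power_sig_sketch(zo, c = 2, designPrior = "predictive")
```

Under these assumptions the predictive power is pulled towards 50% relative to the conditional power, because the uncertainty of the original estimate is taken into account.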
Leonhard Held, Samuel Pawel, Charlotte Micheloud, Florian Gerber
Goodman, S. N. (1992). A comment on replication, p-values and evidence, Statistics in Medicine, 11, 875–879. doi:10.1002/sim.4780110705
Senn, S. (2002). Letter to the Editor, Statistics in Medicine, 21, 2437–2444.
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science. 37:369-379. doi:10.1214/21-STS828
sampleSizeSignificance, powerSignificanceInterim
powerSignificance(zo = p2z(0.005), c = 2)
powerSignificance(zo = p2z(0.005), c = 2, designPrior = "predictive")
powerSignificance(zo = p2z(0.005), c = 2, alternative = "two.sided")
powerSignificance(zo = -3, c = 2, designPrior = "predictive", alternative = "one.sided")
powerSignificance(zo = p2z(0.005), c = 1/2)
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive")
powerSignificance(zo = p2z(0.005), c = 1/2, alternative = "two.sided")
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive", alternative = "two.sided")
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive", alternative = "one.sided", h = 0.5, shrinkage = 0.5)
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "EB", alternative = "two.sided", h = 0.5)

# power as function of original p-value
po <- seq(0.0001, 0.06, 0.0001)
plot(po, powerSignificance(zo = p2z(po), designPrior = "conditional"),
     type = "l", ylim = c(0, 1), lwd = 1.5, las = 1, ylab = "Power",
     xlab = expression(italic(p)[o]))
lines(po, powerSignificance(zo = p2z(po), designPrior = "predictive"), lwd = 2, lty = 2)
lines(po, powerSignificance(zo = p2z(po), designPrior = "EB"), lwd = 1.5, lty = 3)
legend("topright", legend = c("conditional", "predictive", "EB"),
       title = "Design prior", lty = c(1, 2, 3), lwd = 1.5, bty = "n")
Computes the power of a replication study taking into account data from an interim analysis.
powerSignificanceInterim( zo, zi, c = 1, f = 1/2, level = 0.025, designPrior = c("conditional", "informed predictive", "predictive"), analysisPrior = c("flat", "original"), alternative = c("one.sided", "two.sided"), shrinkage = 0 )
zo |
Numeric vector of z-values from original studies. |
zi |
Numeric vector of z-values from interim analyses of replication studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. Default is 1. |
f |
Fraction of the replication study already completed. Default is 0.5. |
level |
Significance level. Default is 0.025. |
designPrior |
Either "conditional" (default), "informed predictive", or "predictive". "informed predictive" refers to an informative normal prior coming from the original study. "predictive" refers to a flat prior. |
analysisPrior |
Either "flat" (default) or "original". |
alternative |
Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided. |
shrinkage |
Numeric vector with values in [0,1). Defaults to 0.
Specifies the shrinkage of the original effect estimate towards zero, e.g.,
the effect is shrunken by a factor of 25% for |
This is an extension of powerSignificance() and adapts the ‘interim power’ from section 6.6.3 of Spiegelhalter et al. (2004) to the setting of replication studies.
powerSignificanceInterim is the vectorized version of .powerSignificanceInterim_. Vectorize is used to vectorize the function.
The probability of statistical significance in the specified direction at the end of the replication study given the data collected so far in the replication study.
Charlotte Micheloud
Spiegelhalter, D. J., Abrams, K. R., and Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation, volume 13. John Wiley & Sons
Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science, 37, 369-379. doi:10.1214/21-STS828
sampleSizeSignificance, powerSignificance
powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2, designPrior = "conditional", analysisPrior = "flat")
powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2, designPrior = "informed predictive", analysisPrior = "flat")
powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2, designPrior = "predictive", analysisPrior = "flat")
powerSignificanceInterim(zo = 2, zi = -2, c = 1, f = 1/2, designPrior = "conditional", analysisPrior = "flat")
powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2, designPrior = "conditional", analysisPrior = "flat", shrinkage = 0.25)
The project power of the sceptical p-value is computed for a specified level, the relative variance, significance level and power for a standard significance test of the original study, and the alternative hypothesis.
PPpSceptical( level, c, alpha, power, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled") )
level |
Threshold for the calibrated sceptical p-value. Default is 0.025. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
alpha |
Significance level for a standard significance test in the original study. Default is 0.025. |
power |
Power to detect the assumed effect with a standard significance test in the original study. |
alternative |
Specifies if |
type |
Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled". |
PPpSceptical is the vectorized version of the internal function .PPpSceptical_. Vectorize is used to vectorize the function.
The project power of the sceptical p-value
Leonhard Held, Samuel Pawel
Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Maca, J., Gallo, P., Branson, M., and Maurer, W. (2002). Reconsidering some aspects of the two-trials paradigm. Journal of Biopharmaceutical Statistics, 12, 107-119. doi:10.1081/bip-120006450
pSceptical, levelSceptical, T1EpSceptical
## compare project power for different recalibration types
types <- c("nominal", "golden", "controlled")
c <- seq(0.4, 5, by = 0.01)
alpha <- 0.025
power <- 0.9
pp <- sapply(X = types, FUN = function(t) {
  PPpSceptical(type = t, c = c, alpha, power, alternative = "one.sided", level = 0.025)
})

## compute project power of 2 trials rule
za <- qnorm(p = 1 - alpha)
mu <- za + qnorm(p = power)
pp2TR <- power * pnorm(q = za, mean = sqrt(c) * mu, lower.tail = FALSE)

matplot(x = c, y = pp * 100, type = "l", lty = 1, lwd = 2, las = 1, log = "x",
        xlab = bquote(italic(c)), ylab = "Project power (%)",
        xlim = c(0.4, 5), ylim = c(0, 100))
lines(x = c, y = pp2TR * 100, col = length(types) + 1, lwd = 2)
abline(v = 1, lty = 2)
abline(h = 90, lty = 2, col = "lightgrey")
legend("bottomright", legend = c(types, "2TR"), lty = 1, lwd = 2,
       col = seq(1, length(types) + 1))
Computes a prediction interval for the effect estimate of the replication study.
predictionInterval( thetao, seo, ser, tau = 0, conf.level = 0.95, designPrior = "predictive" )
thetao |
Numeric vector of effect estimates from original studies. |
seo |
Numeric vector of standard errors of the original effect estimates. |
ser |
Numeric vector of standard errors of the replication effect estimates. |
tau |
Between-study heterogeneity standard error.
Default is |
conf.level |
The confidence level of the prediction intervals. Default is 0.95. |
designPrior |
Either "predictive" (default), "conditional", or "EB". If "EB", the contribution of the original study to the predictive distribution is shrunken towards zero based on the evidence in the original study (with empirical Bayes). |
This function computes a prediction interval and a mean estimate under a specified predictive distribution of the replication effect estimate. Setting designPrior = "conditional" is not recommended since this ignores the uncertainty of the original effect estimate. See Patil, Peng, and Leek (2016) and Pawel and Held (2020) for details.
predictionInterval is the vectorized version of .predictionInterval_. Vectorize is used to vectorize the function.
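To make the "predictive" case concrete, the following base R sketch computes such an interval by hand. It assumes that the replication estimate is predicted by a normal distribution centred at the original estimate with variance seo^2 + ser^2 + 2 * tau^2; the helper name pred_interval_sketch is hypothetical, and the "conditional" and "EB" options are not covered.

```r
# Hypothetical helper: prediction interval under the "predictive" design prior,
# assuming replication estimate ~ N(thetao, seo^2 + ser^2 + 2 * tau^2).
pred_interval_sketch <- function(thetao, seo, ser, tau = 0, conf.level = 0.95) {
  sd_pred <- sqrt(seo^2 + ser^2 + 2 * tau^2)  # predictive standard deviation
  z <- qnorm(1 - (1 - conf.level) / 2)        # two-sided normal quantile
  data.frame(lower = thetao - z * sd_pred,
             mean = thetao,
             upper = thetao + z * sd_pred)
}

pi_sketch <- pred_interval_sketch(thetao = c(1.5, 2, 5), seo = 1, ser = 0.5)
pi_sketch
```

The interval is wider than a confidence interval based on ser alone because it also carries the uncertainty of the original estimate.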
A data frame with the following columns
lower |
Lower limit of prediction interval, |
mean |
Mean of predictive distribution, |
upper |
Upper limit of prediction interval. |
Samuel Pawel
Patil, P., Peng, R. D., Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science, 11, 539-544. doi:10.1177/1745691616646366
Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416
predictionInterval(thetao = c(1.5, 2, 5), seo = 1, ser = 0.5, designPrior = "EB")

# compute prediction intervals for replication projects
data("RProjects", package = "ReplicationSuccess")
parOld <- par(mfrow = c(2, 2))
for (p in unique(RProjects$project)) {
  data_project <- subset(RProjects, project == p)
  PI <- predictionInterval(thetao = data_project$fiso, seo = data_project$se_fiso,
                           ser = data_project$se_fisr)
  PI <- tanh(PI) # transforming back to correlation scale
  within <- (data_project$rr < PI$upper) & (data_project$rr > PI$lower)
  coverage <- mean(within)
  color <- ifelse(within == TRUE, "#333333B3", "#8B0000B3")
  study <- seq(1, nrow(data_project))
  plot(data_project$rr, study, col = color, pch = 20, xlim = c(-0.5, 1),
       xlab = expression(italic(r)[r]),
       main = paste0(p, ": ", round(coverage*100, 1), "% coverage"))
  arrows(PI$lower, study, PI$upper, study, length = 0.02, angle = 90, code = 3, col = color)
  abline(v = 0, lty = 3)
}
par(parOld)
Computes the probability that a replication study yields an effect estimate in the same direction as in the original study.
pReplicate( po = NULL, zo = p2z(p = po, alternative = alternative), c, alternative = "two.sided" )
po |
Numeric vector of p-values from the original study, default is |
zo |
Numeric vector of z-values from the original study.
Is calculated from |
c |
The ratio of the variances of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
alternative |
Either "two.sided" (default) or "one.sided". Specifies whether the p-value is two-sided or one-sided. |
This extends the statistic p_rep ("the probability of replicating an effect") by Killeen (2005) to the case of possibly unequal sample sizes, see also Senn (2002).
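A minimal base R sketch of this quantity follows. It assumes a flat prior for the effect, so that the replication estimate is predicted with variance equal to the sum of the two squared standard errors; the helper name p_rep_sketch is hypothetical and only takes z-values as input.

```r
# Hypothetical helper: probability that the replication estimate has the same
# sign as the original estimate, assuming a flat prior for the effect.
p_rep_sketch <- function(zo, c) {
  # predictive variance relative to the original variance is 1 + 1/c
  pnorm(zo / sqrt(1 + 1 / c))
}

p_rep_sketch(zo = c(2, 3, 4), c = 1)  # equal sample sizes
p_rep_sketch(zo = c(2, 3, 4), c = 2)  # replication twice as large
```

Under this assumption the probability increases with c, but it stays below pnorm(zo) because the uncertainty of the original estimate never vanishes.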
The probability that a replication study yields an effect estimate in the same direction as in the original study.
Leonhard Held
Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353. doi:10.1111/j.0956-7976.2005.01538.x
Senn, S. (2002). Letter to the Editor, Statistics in Medicine, 21, 2437–2444.
Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534
pReplicate(po = c(0.05, 0.01, 0.001), c = 1)
pReplicate(po = c(0.05, 0.01, 0.001), c = 2)
pReplicate(po = c(0.05, 0.01, 0.001), c = 2, alternative = "one.sided")
pReplicate(zo = c(2, 3, 4), c = 1)
Data from "High Replicability of Newly-Discovered Social-behavioral Findings is Achievable" by Protzko et al. (2020). The variables are as follows:
experiment: Experiment name
type: Type of study, either "original", "self-replication", or "external-replication"
lab: The lab which conducted the study, either 1, 2, 3, or 4.
smd: Standardized mean difference effect estimate
se: Standard error of standardized mean difference effect estimate
n: Total sample size of the study
data("protzko2020")
A data frame with 80 rows and 6 variables
This data set originates from a prospective replication project involving four laboratories. Each of them conducted four original studies, and for each original study a replication study was carried out within the same lab (self-replication) and by the other three labs (external-replication). Most studies used simple between-subject designs with two groups and a continuous outcome, so that for each study an estimate of the standardized mean difference (SMD) could be computed from the group means, group standard deviations, and group sample sizes. For studies with covariate adjustment and/or binary outcomes, effect size transformations as described in the supplementary material of Protzko et al. (2020) were used to obtain effect estimates and standard errors on the SMD scale. The data set is licensed under a CC-By Attribution 4.0 International license, see https://creativecommons.org/licenses/by/4.0/ for the terms of reuse.
The relevant files were downloaded from https://osf.io/42ef9/ on January 24, 2022. The R markdown script "Decline effects main analysis.Rmd" was executed and the relevant variables from the objects "ES_experiments" and "decline_effects" were saved.
Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., ... Schooler, J. (2020, September 10). High Replicability of Newly-Discovered Social-behavioral Findings is Achievable. doi:10.31234/osf.io/n2a9x
Protzko, J., Berent, M., Buttrick, N., DeBell, M., Roeder, S. S., Walleczek, J., ... Nosek, B. A. (2021, January 5). Results & Data. Retrieved from https://osf.io/42ef9/
data("protzko2020", package = "ReplicationSuccess")

## forestplots of effect estimates
graphics.off()
parOld <- par(mar = c(5, 8, 4, 2), mfrow = c(4, 4))
experiments <- unique(protzko2020$experiment)
for (ex in experiments) {
  ## compute CIs
  dat <- subset(protzko2020, experiment == ex)
  za <- qnorm(p = 0.975)
  plotDF <- data.frame(lower = dat$smd - za*dat$se, est = dat$smd, upper = dat$smd + za*dat$se)
  colpalette <- c("#000000", "#1B9E77", "#D95F02")
  cols <- colpalette[dat$type]
  yseq <- seq(1, nrow(dat))
  ## forestplot
  plot(x = plotDF$est, y = yseq, xlim = c(-0.15, 0.8),
       ylim = c(0.8*min(yseq), 1.05*max(yseq)), type = "n", yaxt = "n",
       xlab = "Effect estimate (SMD)", ylab = "")
  abline(v = 0, col = "#0000004D")
  arrows(x0 = plotDF$lower, x1 = plotDF$upper, y0 = yseq, angle = 90, code = 3,
         length = 0.05, col = cols)
  points(y = yseq, x = plotDF$est, pch = 20, lwd = 2, col = cols)
  axis(side = 2, at = yseq, las = 1, labels = dat$type, cex.axis = 0.85)
  title(main = ex)
}
par(parOld)
Computes sceptical p-values and z-values based on the z-values of the original and the replication study and the corresponding variance ratios. If specified, the sceptical p-values are recalibrated.
pSceptical( zo, zr, c, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled") )
zSceptical(zo, zr, c)
zo |
Numeric vector of z-values from original studies. |
zr |
Numeric vector of z-values from replication studies. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
alternative |
Either "one.sided" (default) or "two.sided". If "one.sided", the sceptical p-value is based on a one-sided assessment of replication success in the direction of the original effect estimate. If "two.sided", the sceptical p-value is based on a two-sided assessment of replication success regardless of the direction of the original and replication effect estimate. |
type |
Type of recalibration. Can be either "golden" (default),
"nominal", or "controlled". Setting |
pSceptical is the vectorized version of the internal function .pSceptical_. Vectorize is used to vectorize the function.
pSceptical returns the sceptical p-value. zSceptical returns the z-value of the sceptical p-value.
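The nominal (uncalibrated) sceptical z-value can be sketched in base R from its defining equation in Held (2020), (zo^2/zS^2 - 1) * (zr^2/zS^2 - 1) = c, which has a closed-form solution for zS^2. The helper name z_sceptical_sketch is hypothetical, no recalibration is applied, and the final line assumes that both effect estimates point in the same direction.

```r
# Hypothetical helper: nominal sceptical z-value, obtained by solving
# (zo^2 / zS^2 - 1) * (zr^2 / zS^2 - 1) = c for zS^2.
z_sceptical_sketch <- function(zo, zr, c) {
  zA2 <- (zo^2 + zr^2) / 2          # arithmetic mean of the squared z-values
  zH2 <- 2 / (1 / zo^2 + 1 / zr^2)  # harmonic mean of the squared z-values
  zS2 <- ifelse(c == 1,
                zH2 / 2,            # the quadratic degenerates for c = 1
                (sqrt(zA2 * (zA2 + (c - 1) * zH2)) - zA2) / (c - 1))
  sqrt(zS2)
}

zS <- z_sceptical_sketch(zo = 2, zr = 3, c = 2)
# left-hand side of the defining equation, which should reproduce c
(2^2 / zS^2 - 1) * (3^2 / zS^2 - 1)
# nominal one-sided sceptical p-value (same direction of both estimates assumed)
pnorm(zS, lower.tail = FALSE)
```

Because the defining equation is symmetric in zo and zr, swapping the two z-values at a fixed c leaves the result unchanged.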
Leonhard Held
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
sampleSizeReplicationSuccess, powerReplicationSuccess, levelSceptical
## no recalibration (type = "nominal") as in Held (2020)
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided", type = "nominal")
## recalibration with golden level as in Held, Micheloud, Pawel (2022)
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided", type = "golden")
## two-sided p-values 0.01 and 0.02, relative sample size 2
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided")
## reverse the studies
pSceptical(zo = p2z(0.02), zr = p2z(0.01), c = 1/2, alternative = "one.sided")
## both p-values 0.01, relative sample size 2
pSceptical(zo = p2z(0.01), zr = p2z(0.01), c = 2, alternative = "two.sided")
zSceptical(zo = 2, zr = 3, c = 2)
zSceptical(zo = 3, zr = 2, c = 2)
Necessary or sufficient bounds for significance of the harmonic mean chi-squared test are computed for n one-sided p-values.
pvalueBound(alpha, n, type = c("necessary", "sufficient"))
alpha |
Numeric vector specifying the significance level. |
n |
The number of p-values. |
type |
Either "necessary" (default) or "sufficient". If "necessary", the necessary bounds are computed. If "sufficient", the sufficient bounds are computed. |
The bound for the p-values.
Leonhard Held
Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410
pvalueBound(alpha = 0.025^2, n = 2, type = "necessary")
pvalueBound(alpha = 0.025^2, n = 2, type = "sufficient")
Computes the p-value from a meta-analytic Q-test to assess compatibility between the original and replication effect estimates.
Qtest(thetao, thetar, seo, ser)
thetao |
Numeric vector of effect estimates from original studies. |
thetar |
Numeric vector of effect estimates from replication studies. |
seo |
Numeric vector of standard errors of the original effect estimates. |
ser |
Numeric vector of standard errors of the replication effect estimates. |
This function computes the p-value from a meta-analytic Q-test assessing compatibility between the original and replication effect estimates. Rejecting compatibility when the p-value is smaller than alpha is equivalent to rejecting compatibility based on a (1 - alpha) prediction interval.
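The equivalence with a prediction interval can be checked by hand in base R. The sketch below assumes the usual two-study Q-statistic, Q = (thetao - thetar)^2 / (seo^2 + ser^2), referred to a chi-squared distribution with one degree of freedom, and a prediction interval without heterogeneity; the helper name q_test_sketch is hypothetical.

```r
# Hypothetical helper: Q-test p-value for one original/replication pair,
# assuming Q = (thetao - thetar)^2 / (seo^2 + ser^2) with 1 degree of freedom.
q_test_sketch <- function(thetao, thetar, seo, ser) {
  Q <- (thetao - thetar)^2 / (seo^2 + ser^2)
  pchisq(Q, df = 1, lower.tail = FALSE)
}

p_q <- q_test_sketch(thetao = 2, thetar = 0.5, seo = 1, ser = 0.5)
p_q

# 95% prediction interval around the original estimate (no heterogeneity)
alpha <- 0.05
half_width <- qnorm(1 - alpha / 2) * sqrt(1^2 + 0.5^2)
inside_pi <- abs(0.5 - 2) < half_width

# both criteria should agree on compatibility
p_q > alpha
inside_pi
```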
p-value from Q-test.
Samuel Pawel
Hedges, L. V., Schauer, J. M. (2019). More Than One Replication Study Is Needed for Unambiguous Tests of Replication. Journal of Educational and Behavioral Statistics, 44, 543-570. doi:10.3102/1076998619852953
Qtest(thetao = 2, thetar = 0.5, seo = 1, ser = 0.5)
Data from Reproducibility Project Psychology (RPP), Experimental Economics Replication Project (EERP), Social Sciences Replication Project (SSRP), and Experimental Philosophy Replicability Project (EPRP). The variables are as follows:
study: Study identifier, usually names of authors from original study
project: Name of replication project
ro: Effect estimate of original study on correlation scale
rr: Effect estimate of replication study on correlation scale
fiso: Effect estimate of original study transformed to Fisher-z scale
fisr: Effect estimate of replication study transformed to Fisher-z scale
se_fiso: Standard error of Fisher-z transformed effect estimate of original study
se_fisr: Standard error of Fisher-z transformed effect estimate of replication study
po: Two-sided p-value from significance test of effect estimate from original study
pr: Two-sided p-value from significance test of effect estimate from replication study
po1: One-sided p-value from significance test of effect estimate from original study (in the direction of the original effect estimate)
pr1: One-sided p-value from significance test of effect estimate from replication study (in the direction of the original effect estimate)
pm_belief: Peer belief about whether replication effect estimate will achieve statistical significance elicited through prediction market (only available for EERP and SSRP)
no: Sample size in original study
nr: Sample size in replication study
data(RProjects)
A data frame with 143 rows and 15 variables
Two-sided p-values were calculated assuming normality of Fisher-z transformed effect estimates. From the RPP only the meta-analytic subset is included, which consists of 73 out of 100 study pairs for which the standard error of the z-transformed correlation coefficient can be computed. For the RPP, sample sizes were recalculated from the reported standard errors of Fisher z-transformed correlation coefficients. From the EPRP only 31 out of 40 study pairs are included, namely those for which the effective sample sizes of both the original and the replication study are available. For more details about how the data were preprocessed, see the sources below and supplement S1 of Pawel and Held (2020).
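The relationship between correlations, Fisher-z values, standard errors, and sample sizes can be sketched in base R as follows. This assumes the usual approximation se = 1/sqrt(n - 3) for a Fisher z-transformed correlation, which may differ from the exact preprocessing of individual projects; the helper names and the numeric values are illustrative and not taken from the data set.

```r
# Hypothetical helpers, assuming the usual approximation se = 1 / sqrt(n - 3)
# for the standard error of a Fisher z-transformed correlation.
fisher_z <- function(r) atanh(r)
se_fisher_z <- function(n) 1 / sqrt(n - 3)
n_from_se <- function(se) 1 / se^2 + 3

r <- 0.3   # illustrative correlation
n <- 50    # illustrative sample size

z <- fisher_z(r) / se_fisher_z(n)                     # z-value on the Fisher-z scale
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value under normality
p_two_sided

n_from_se(se_fisher_z(n))  # recovers the sample size from the standard error
tanh(fisher_z(r))          # back-transformation to the correlation scale
```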
RPP: The source files were downloaded from https://github.com/CenterForOpenScience/rpp/. The "masterscript.R" file was executed and the relevant variables were extracted from the generated "final" object (standard errors of Fisher-z transformed correlations) and "MASTER" object (everything else). The data set is licensed under a CC0 1.0 Universal license, see https://creativecommons.org/publicdomain/zero/1.0/ for the terms of reuse.
EERP: The source files were downloaded from https://osf.io/pnwuz/. The required data were then manually extracted from the code in the files "effectdata.py" (sample sizes) and "create_studydetails.do" (everything else). Data regarding the prediction market and survey beliefs were manually extracted from table S3 of the supplementary materials of the EERP. The authors of this R package have been granted permission to share this data set by the coordinators of the EERP.
SSRP: The relevant variables were extracted from the file "D3 - ReplicationResults.csv" downloaded from https://osf.io/abu7k. For replications which underwent only the first stage, the data from the first stage were taken as the data for the replication study. For the replications which reached the second stage, the pooled data from both stages were taken as the data for the replication study. Data regarding survey and prediction market beliefs were extracted from the "D6 - MeanPeerBeliefs.csv" file, which was downloaded from https://osf.io/vr6p8/. The data set is licensed under a CC0 1.0 Universal license, see https://creativecommons.org/publicdomain/zero/1.0/ for the terms of reuse.
EPRP: Data were taken from the "XPhiReplicability_CompleteData.csv" file, which was downloaded from https://osf.io/4ewkh/. The authors of this R package have been granted permission to share this data set by the coordinators of the EPRP.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., ... Hang, W. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351, 1433-1436. doi:10.1126/science.aaf0918
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637-644. doi:10.1038/s41562-018-0399-z
Cova, F., Strickland, B., Abatista, A., Allard, A., Andow, J., Attie, M., ... Zhou, X. (2018). Estimating the reproducibility of experimental philosophy. Review of Philosophy and Psychology. doi:10.1007/s13164-018-0400-9
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. doi:10.1126/science.aac4716
Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416
data("RProjects", package = "ReplicationSuccess")

## Computing key quantities
RProjects$zo <- RProjects$fiso/RProjects$se_fiso
RProjects$zr <- RProjects$fisr/RProjects$se_fisr
RProjects$c <- RProjects$se_fiso^2/RProjects$se_fisr^2

## Computing one-sided p-values for alternative = "greater"
RProjects$po1 <- z2p(z = RProjects$zo, alternative = "greater")
RProjects$pr1 <- z2p(z = RProjects$zr, alternative = "greater")

## Plots of effect estimates
parOld <- par(mfrow = c(2, 2))
for (p in unique(RProjects$project)) {
  data_project <- subset(RProjects, project == p)
  plot(rr ~ ro, data = data_project, ylim = c(-0.5, 1), xlim = c(-0.5, 1),
       main = p, xlab = expression(italic(r)[o]), ylab = expression(italic(r)[r]))
  abline(h = 0, lty = 2)
  abline(a = 0, b = 1, col = "grey")
}
par(parOld)

## Plots of peer beliefs
RProjects$significant <- factor(RProjects$pr < 0.05, levels = c(FALSE, TRUE),
                                labels = c("no", "yes"))
parOld <- par(mfrow = c(1, 2))
for (p in c("Experimental Economics", "Social Sciences")) {
  data_project <- subset(RProjects, project == p)
  boxplot(pm_belief ~ significant, data = data_project, ylim = c(0, 1),
          main = p, xlab = "Replication effect significant", ylab = "Peer belief")
  stripchart(pm_belief ~ significant, data = data_project, vertical = TRUE,
             add = TRUE, pch = 1, method = "jitter")
}
par(parOld)

## Computing the sceptical p-value
ps <- with(RProjects, pSceptical(zo = fiso/se_fiso, zr = fisr/se_fisr,
                                 c = se_fiso^2/se_fisr^2))
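The key quantities computed in the example above can also be derived from correlations and sample sizes alone. The following is a minimal base-R sketch with made-up numbers: the values of `ro`, `rr`, `no` and `nr` are hypothetical and not taken from `RProjects`, and the standard error is assumed to follow the usual Fisher-z approximation 1/sqrt(n - 3).

```r
## Hypothetical original and replication results (not from RProjects)
ro <- 0.5; no <- 50    # original correlation and sample size
rr <- 0.3; nr <- 100   # replication correlation and sample size

## Fisher z-transformation and approximate standard errors (assumed form)
fiso <- atanh(ro); se_fiso <- 1/sqrt(no - 3)
fisr <- atanh(rr); se_fisr <- 1/sqrt(nr - 3)

## z-values and variance ratio
zo <- fiso/se_fiso
zr <- fisr/se_fisr
c <- se_fiso^2/se_fisr^2   # equals (nr - 3)/(no - 3) under this approximation

## One-sided p-values for alternative = "greater"
## (upper tail of the standard normal distribution)
po1 <- pnorm(zo, lower.tail = FALSE)
pr1 <- pnorm(zr, lower.tail = FALSE)
```

Under this approximation the variance ratio `c` depends only on the two sample sizes, which is why it is often read as a relative sample size.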
The relative sample size to achieve replication success with Edgington's method is computed based on the z-value (or one-sided p-value) of the original study, the significance level, the ratio of the weight of the replication study over the weight of the original study, the design prior and the power.
sampleSizeEdgington( zo = NULL, po = NULL, r = 1, power, level = 0.025, designPrior = "conditional", shrinkage = 0 )
zo |
Numeric vector of z-values from original studies. |
po |
Numeric vector of one-sided p-values from original studies. |
r |
Numeric vector of ratios of replication to original weight. |
power |
Power to achieve replication success. |
level |
One-sided significance level. Default is 0.025. |
designPrior |
Either "conditional" (default) or "predictive". |
shrinkage |
Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. |
Either zo or po must be specified.
The relative sample size to achieve replication success with Edgington's method. If it is impossible to achieve the desired power for the specified inputs, NaN is returned.
Charlotte Micheloud, Leonhard Held, Samuel Pawel
Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149
## partially recreate Figure 5 from paper
poseq <- exp(seq(log(0.00001), log(0.025), length.out = 100))
cseq <- sampleSizeEdgington(po = poseq, power = 0.8)
cseqSig <- sampleSizeSignificance(zo = p2z(p = poseq, alternative = "one.sided"),
                                  power = 0.8)
plot(poseq, cseq/cseqSig, log = "x", xlim = c(0.00001, 0.035), ylim = c(0.9, 1.3),
     type = "l", las = 1, xlab = "Original p-value", ylab = "Sample size ratio")
The relative sample size to achieve replication success is computed based on the z-value of the original study, the type of recalibration, the power and the design prior.
sampleSizeReplicationSuccess( zo, power = NA, level = 0.025, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled"), designPrior = c("conditional", "predictive", "EB"), shrinkage = 0, h = 0 )
zo |
Numeric vector of z-values from original studies. |
power |
The power to achieve replication success. |
level |
Threshold for the calibrated sceptical p-value. Default is 0.025. |
alternative |
Specifies if |
type |
Type of recalibration. Can be either "golden" (default),
"nominal" (no recalibration), or "controlled". "golden" ensures that for
an original study just significant at the specified |
designPrior |
Is only taken into account when |
shrinkage |
Is only taken into account when |
h |
Is only taken into account when |
sampleSizeReplicationSuccess is the vectorized version of the internal function .sampleSizeReplicationSuccess_. Vectorize is used to vectorize the function.
The relative sample size for replication success. If it is impossible to achieve the desired power for the specified inputs, NaN is returned.
Leonhard Held, Charlotte Micheloud, Samuel Pawel, Florian Gerber
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
pSceptical, powerReplicationSuccess, levelSceptical
## based on power
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "golden")
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "golden", designPrior = "predictive")
The relative sample size to achieve significance of the replication study is computed based on the z-value of the original study, the significance level and the power.
sampleSizeSignificance( zo, power = NA, level = 0.025, alternative = c("one.sided", "two.sided"), designPrior = c("conditional", "predictive", "EB"), h = 0, shrinkage = 0 )
zo |
A vector of z-values from original studies. |
power |
The power to achieve replication success. |
level |
Significance level. Default is 0.025. |
alternative |
Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided. If the significance level is one-sided, then sample size calculations are based on a one-sided assessment of significance in the direction of the original effect estimate. |
designPrior |
Is only taken into account when |
h |
Is only taken into account when |
shrinkage |
Is only taken into account when |
sampleSizeSignificance is the vectorized version of the internal function .sampleSizeSignificance_. Vectorize is used to vectorize the function.
The relative sample size to achieve significance in the specified direction. If it is impossible to achieve the desired power for the specified inputs, NaN is returned.
Leonhard Held, Samuel Pawel, Charlotte Micheloud, Florian Gerber
Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493
Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science. 37:369-379. doi:10.1214/21-STS828
sampleSizeSignificance(zo = p2z(0.005), power = 0.8)
sampleSizeSignificance(zo = p2z(0.005, alternative = "two.sided"), power = 0.8)
sampleSizeSignificance(zo = p2z(0.005), power = 0.8, designPrior = "predictive")
sampleSizeSignificance(zo = 3, power = 0.8, designPrior = "predictive",
                       shrinkage = 0.5, h = 0.25)
sampleSizeSignificance(zo = 3, power = 0.8, designPrior = "EB", h = 0.5)

# sample size to achieve 0.8 power as function of original p-value
zo <- p2z(seq(0.0001, 0.05, 0.0001))
oldPar <- par(mfrow = c(1,2))
plot(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "conditional", power = 0.8),
     type = "l", ylim = c(0.5, 10), log = "y", lwd = 1.5,
     ylab = "Relative sample size", xlab = expression(italic(p)[o]), las = 1)
lines(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "predictive", power = 0.8),
      lwd = 2, lty = 2)
lines(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "EB", power = 0.8),
      lwd = 1.5, lty = 3)
legend("topleft", legend = c("conditional", "predictive", "EB"),
       title = "Design prior", lty = c(1, 2, 3), lwd = 1.5, bty = "n")
par(oldPar)
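To make the role of the design prior concrete, here is a minimal base-R sketch of the relative sample size calculation for a one-sided significance level, without shrinkage and with h = 0. The helper names `ss_conditional` and `ss_predictive` are hypothetical and are not part of the package. The power formulas are the standard ones for the conditional and predictive design priors, and the sketch is not guaranteed to match the package's internal implementation in every case.

```r
## Conditional design prior: power = pnorm(zo*sqrt(c) - z_level),
## which can be solved for the relative sample size c in closed form
ss_conditional <- function(zo, power, level = 0.025) {
  ((qnorm(1 - level) + qnorm(power))/zo)^2
}

## Predictive design prior: power = pnorm((zo*sqrt(c) - z_level)/sqrt(1 + c)),
## solved numerically; NaN if the power is not reached on the search interval
ss_predictive <- function(zo, power, level = 0.025) {
  f <- function(relSize) {
    pnorm((zo*sqrt(relSize) - qnorm(1 - level))/sqrt(1 + relSize)) - power
  }
  if (f(1e4) < 0) return(NaN)
  uniroot(f, interval = c(1e-4, 1e4), tol = 1e-10)$root
}

zo <- qnorm(1 - 0.005/2)   # z-value for a two-sided original p-value of 0.005
c_cond <- ss_conditional(zo, power = 0.8)
c_pred <- ss_predictive(zo, power = 0.8)
```

The predictive design prior accounts for the uncertainty of the original estimate, so `c_pred` is larger than `c_cond` for the same target power in this example.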
Data from the Social Sciences Replication Project (SSRP) including the details of the interim analysis. The variables are as follows:
study
Study identifier, usually names of authors from original study
ro
Effect estimate of original study on correlation scale
ri
Effect estimate of replication study at the interim analysis on correlation scale
rr
Effect estimate of replication study at the final analysis on correlation scale
fiso
Effect estimate of original study transformed to Fisher-z scale
fisi
Effect estimate of replication study at the interim analysis transformed to Fisher-z scale
fisr
Effect estimate of replication study at the final analysis transformed to Fisher-z scale
se_fiso
Standard error of Fisher-z transformed effect estimate of original study
se_fisi
Standard error of Fisher-z transformed effect estimate of replication study at the interim analysis
se_fisr
Standard error of Fisher-z transformed effect estimate of replication study at the final analysis
no
Sample size in original study
ni
Sample size in replication study at the interim analysis
nr
Sample size in replication study at the final analysis
po
Two-sided p-value from significance test of effect estimate from original study
pi
Two-sided p-value from significance test of effect estimate from replication study at the interim analysis
pr
Two-sided p-value from significance test of effect estimate from replication study at the final analysis
n75
Sample size calculated to have 90% power in replication study to detect 75% of the original effect size (expressed as the correlation coefficient r)
n50
Sample size calculated to have 90% power in replication study to detect 50% of the original effect size (expressed as the correlation coefficient r)
data(SSRP)
A data frame with 21 rows and 18 variables
Two-sided p-values were calculated assuming normality of Fisher-z transformed effect estimates. A two-stage procedure was used for the replications. In stage 1, the authors had 90% power to detect 75% of the original effect size at the 5% significance level in a two-sided test. If the original result replicated in stage 1 (two-sided p-value < 0.05 and effect in the same direction as in the original study), the data collection was stopped. If not, a second data collection was carried out in stage 2 to have 90% power to detect 50% of the original effect size for the first and the second data collections pooled. n75 and n50 are the planned sample sizes calculated to reach 90% power in stage 1 and 2, respectively. They sometimes differ from the sample sizes that were actually collected (ni and nr, respectively). See the supplementary information of Camerer et al. (2018) for details.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637-644. doi:10.1038/s41562-018-0399-z
# plot of the sample sizes
plot(ni ~ no, data = SSRP, ylim = c(0, 2500), xlim = c(0, 400),
     xlab = expression(n[o]), ylab = expression(n[i]))
abline(a = 0, b = 1, col = "grey")

plot(nr ~ no, data = SSRP, ylim = c(0, 2500), xlim = c(0, 400),
     xlab = expression(n[o]), ylab = expression(n[r]))
abline(a = 0, b = 1, col = "grey")
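The two-stage planning described for this data set can be illustrated with a rough base-R sketch. The helper `planned_n` is hypothetical and uses the common Fisher-z approximation for the sample size of a two-sided test of a correlation. It is not necessarily the calculation used by the SSRP, so its output need not match the `n75` and `n50` columns.

```r
## Hypothetical helper: approximate sample size to detect a fraction of the
## original correlation ro with a two-sided test (Fisher-z approximation)
planned_n <- function(ro, fraction, power = 0.9, level = 0.05) {
  target <- atanh(fraction*ro)   # effect to detect on the Fisher-z scale
  ceiling(((qnorm(1 - level/2) + qnorm(power))/target)^2 + 3)
}

ro <- 0.4                              # hypothetical original correlation
n75 <- planned_n(ro, fraction = 0.75)  # stage 1: detect 75% of ro
n50 <- planned_n(ro, fraction = 0.50)  # stages 1 and 2 pooled: detect 50% of ro
```

Halving the target effect instead of taking 75% of it more than doubles the planned sample size, which is why the second stage is only run when stage 1 does not replicate.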
The overall type-I error rate of the sceptical p-value is computed for a specified level, the relative variance, and the alternative hypothesis.
T1EpSceptical( level, c, alternative = c("one.sided", "two.sided"), type = c("golden", "nominal", "controlled") )
level |
Threshold for the calibrated sceptical p-value. |
c |
Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. |
alternative |
Specifies if |
type |
Type of recalibration. Recalibration type can be either "golden" (default), "nominal" (no recalibration), or "controlled". |
T1EpSceptical is the vectorized version of the internal function .T1EpSceptical_. Vectorize is used to vectorize the function.
The overall type-I error rate.
Leonhard Held, Samuel Pawel
Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410
Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502
Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312
pSceptical, levelSceptical, PPpSceptical
## compare type-I error rate for different recalibration types
types <- c("nominal", "golden", "controlled")
c <- seq(0.2, 5, by = 0.05)
t1 <- sapply(X = types, FUN = function(t) {
  T1EpSceptical(type = t, c = c, alternative = "one.sided", level = 0.025)
})
matplot(
  x = c, y = t1*100, type = "l", lty = 1, lwd = 2, las = 1, log = "x",
  xlab = bquote(italic(c)), ylab = "Type-I error (%)", xlim = c(0.2, 5)
)
legend("topright", legend = types, lty = 1, lwd = 2, col = seq_along(types))
Computes the p-value threshold for intrinsic credibility.
thresholdIntrinsic( alpha, alternative = c("two.sided", "one.sided"), type = c("Held", "Matthews") )
alpha |
Numeric vector of intrinsic credibility levels. |
alternative |
Either "two.sided" (default) or "one.sided". Specifies if the threshold is for one-sided or two-sided p-values. |
type |
Either "Held" (default) or "Matthews". Type of intrinsic p-value threshold, see Held (2019) and Matthews (2018) for more information. |
The threshold for intrinsic credibility.
Leonhard Held
Matthews, R. A. J. (2018). Beyond 'significance': principles and practice of the analysis of credibility. Royal Society Open Science, 5, 171047. doi:10.1098/rsos.171047
Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534
thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05))
thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05), alternative = "one.sided")
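As a rough illustration of where such thresholds come from, the two-sided case can be sketched in base R. The helpers `threshold_held` and `threshold_matthews` are hypothetical names, not package functions. They assume that the z-value corresponding to alpha is scaled by sqrt(2) for the "Held" type and by the square root of the golden ratio for the "Matthews" type, which reproduces the thresholds of about 0.0056 and 0.013 discussed in Held (2019) for alpha = 0.05.

```r
## Two-sided p-value thresholds for intrinsic credibility at level alpha
## (assumed functional forms, see the lead-in above)
threshold_held <- function(alpha) {
  z <- qnorm(1 - alpha/2)
  2*pnorm(sqrt(2)*z, lower.tail = FALSE)
}

threshold_matthews <- function(alpha) {
  z <- qnorm(1 - alpha/2)
  golden <- (1 + sqrt(5))/2   # golden ratio
  2*pnorm(sqrt(golden)*z, lower.tail = FALSE)
}

threshold_held(0.05)       # approximately 0.0056
threshold_matthews(0.05)   # approximately 0.013
```

Both thresholds are stricter than alpha itself, with the "Held" type being the more stringent of the two.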