Package 'ReplicationSuccess'

Title: Design and Analysis of Replication Studies
Description: Provides utilities for the design and analysis of replication studies. Features both traditional methods based on statistical significance and more recent methods such as the sceptical p-value; Held L. (2020) <doi:10.1111/rssa.12493>, Held et al. (2022) <doi:10.1214/21-AOAS1502>, Micheloud et al. (2023) <doi:10.1111/stan.12312>. Also provides related methods including the harmonic mean chi-squared test; Held, L. (2020) <doi:10.1111/rssc.12410>, and intrinsic credibility; Held, L. (2019) <doi:10.1098/rsos.181534>. Contains datasets from five large-scale replication projects.
Authors: Leonhard Held [aut], Samuel Pawel [cre], Charlotte Micheloud [aut], Florian Gerber [aut], Felix Hofmann [aut]
Maintainer: Samuel Pawel <[email protected]>
License: GPL (>=2)
Version: 1.3.3
Built: 2025-02-19 05:15:43 UTC
Source: https://github.com/crsuzh/replicationsuccess

Help Index


Convert between estimates, z-values, p-values, and confidence intervals

Description

Convert between estimates, z-values, p-values, and confidence intervals

Usage

ci2se(lower, upper, conf.level = 0.95, ratio = FALSE)

ci2estimate(lower, upper, ratio = FALSE, antilog = FALSE)

ci2z(lower, upper, conf.level = 0.95, ratio = FALSE)

ci2p(lower, upper, conf.level = 0.95, ratio = FALSE, alternative = "two.sided")

z2p(z, alternative = "two.sided")

p2z(p, alternative = "two.sided")

Arguments

lower

Numeric vector of lower confidence interval bounds.

upper

Numeric vector of upper confidence interval bounds.

conf.level

The confidence level of the confidence intervals. Default is 0.95.

ratio

Indicates whether the confidence interval is for a ratio, e.g. an odds ratio, relative risk or hazard ratio. If TRUE, the standard error of the log ratio is computed. Defaults to FALSE.

antilog

Indicates whether the estimate is reported on the ratio scale. Only applies if ratio = TRUE. Defaults to FALSE.

alternative

Direction of the alternative of the p-value. Either "two.sided" (default), "one.sided", "less", or "greater". If "one.sided" or "two.sided" is specified, the z-value is assumed to be positive.

z

Numeric vector of z-values.

p

Numeric vector of p-values.

Details

z2p and p2z are vectorized over all arguments.

Value

ci2se returns a numeric vector of standard errors.

ci2estimate returns a numeric vector of parameter estimates.

ci2z returns a numeric vector of z-values.

ci2p returns a numeric vector of p-values.

z2p returns p-values. The dimension of the output depends on the input: in general, the output is an array of dimension c(nrow(z), ncol(z), length(alternative)); any of these dimensions equal to 1 is dropped, so that scalar or vector input gives a numeric vector of p-values.

p2z returns z-values. The dimension of the output depends on the input: in general, the output is an array of dimension c(nrow(p), ncol(p), length(alternative)); any of these dimensions equal to 1 is dropped, so that scalar or vector input gives a numeric vector of z-values.
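
For illustration, a short sketch of the dimension behaviour described above (assuming the package is loaded; the matrix input is an assumed usage pattern suggested by the documented dimensions, not an additional documented feature):

zmat <- matrix(c(1, 2, 3, 4), nrow = 2)
str(z2p(z = zmat, alternative = c("two.sided", "greater")))  ## 2 x 2 x 2 array
str(z2p(z = 2))                                              ## single p-value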

Examples

ci2se(lower = 1, upper = 3)
ci2se(lower = 1, upper = 3, ratio = TRUE)
ci2se(lower = 1, upper = 3, conf.level = 0.9)

ci2estimate(lower = 1, upper = 3)
ci2estimate(lower = 1, upper = 3, ratio = TRUE)
ci2estimate(lower = 1, upper = 3, ratio = TRUE, antilog = TRUE)

ci2z(lower = 1, upper = 3)
ci2z(lower = 1, upper = 3, ratio = TRUE)
ci2z(lower = 1, upper = 3, conf.level = 0.9)

ci2p(lower = 1, upper = 3)
ci2p(lower = 1, upper = 3, alternative = "one.sided")

z2p(z = c(1, 2, 5))
z2p(z = c(1, 2, 5), alternative = "less")
z2p(z = c(1, 2, 5), alternative = "greater")
z <- seq(-3, 3, by = 0.01)
plot(z, z2p(z), type = "l", xlab = "z", ylab = "p", ylim = c(0, 1))
lines(z, z2p(z, alternative = "greater"), lty = 2)
legend("topright", c("two-sided", "greater"), lty = c(1, 2), bty = "n")

p2z(p = c(0.005, 0.01, 0.05))
p2z(p = c(0.005, 0.01, 0.05), alternative = "greater")
p2z(p = c(0.005, 0.01, 0.05), alternative = "less")
p <- seq(0.001, 0.05, 0.0001)
plot(p, p2z(p), type = "l", ylim = c(0, 3.5), ylab = "z")
lines(p, p2z(p, alternative = "greater"), lty = 2)
legend("bottomleft", c("two-sided", "greater"), lty = c(1, 2), bty = "n")

Computes the minimum relative effect size to achieve replication success with the sceptical p-value

Description

The minimum relative effect size (replication to original) to achieve replication success with the sceptical p-value is computed based on the result of the original study and the corresponding variance ratio.

Usage

effectSizeReplicationSuccess(
  zo,
  c = 1,
  level = 0.025,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled")
)

Arguments

zo

Numeric vector of z-values from original studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

alternative

Specifies if level is "one.sided" (default) or "two.sided". If "one.sided", then effect size calculations are based on a one-sided assessment of replication success in the direction of the original effect estimate.

type

Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled". "golden" ensures that for an original study just significant at the specified level, replication success is only possible for replication effect estimates larger than the original one. "controlled" ensures exact overall Type-I error control at level level^2.

Details

effectSizeReplicationSuccess is the vectorized version of the internal function .effectSizeReplicationSuccess_ (vectorized via Vectorize).

Value

The minimum relative effect size to achieve replication success with the sceptical p-value.
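
As an informal check, the sketch below plugs the minimum relative effect size back into pSceptical; it assumes that the relative effect size equals zr / (zo * sqrt(c)), so the resulting sceptical p-value should be approximately equal to the chosen level:

zo <- p2z(0.001, alternative = "one.sided")
c <- 2
dmin <- effectSizeReplicationSuccess(zo = zo, c = c, level = 0.025,
                                     alternative = "one.sided", type = "golden")
zr <- dmin * zo * sqrt(c)  ## implied replication z-value at the minimum effect size
pSceptical(zo = zo, zr = zr, c = c, alternative = "one.sided", type = "golden")
## should be approximately 0.025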

Author(s)

Leonhard Held, Charlotte Micheloud, Samuel Pawel, Florian Gerber

References

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

See Also

sampleSizeReplicationSuccess, levelSceptical

Examples

po <- c(0.001, 0.002, 0.01, 0.02, 0.025)
zo <- p2z(po, alternative = "one.sided")

effectSizeReplicationSuccess(zo = zo, c = 1, level = 0.025,
                             alternative = "one.sided", type = "golden")

effectSizeReplicationSuccess(zo = zo, c = 10, level = 0.025,
                             alternative = "one.sided", type = "golden")
effectSizeReplicationSuccess(zo = zo, c = 10, level = 0.025,
                             alternative = "one.sided", type = "controlled")
effectSizeReplicationSuccess(zo = zo, c= 2, level = 0.025,
                             alternative = "one.sided", type = "nominal")

effectSizeReplicationSuccess(zo = zo, c = 2, level = 0.05,
                             alternative = "two.sided", type = "nominal")

Computes the minimum relative effect size to achieve significance of the replication study

Description

The minimum relative effect size (replication to original) to achieve significance of the replication study is computed based on the result of the original study and the corresponding variance ratio.

Usage

effectSizeSignificance(
  zo,
  c = 1,
  level = 0.025,
  alternative = c("one.sided", "two.sided")
)

Arguments

zo

Numeric vector of z-values from original studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

Significance level. Default is 0.025.

alternative

Specifies if the significance level is "one.sided" (default) or "two.sided". If the significance level is one-sided, then effect size calculations are based on a one-sided assessment of significance in the direction of the original effect estimate.

Details

effectSizeSignificance is the vectorized version of the internal function .effectSizeSignificance_ (vectorized via Vectorize).

Value

The minimum relative effect size to achieve significance in the replication study.

Author(s)

Charlotte Micheloud, Samuel Pawel, Florian Gerber

References

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

See Also

effectSizeReplicationSuccess

Examples

po <- c(0.001, 0.002, 0.01, 0.02, 0.025)
zo <- p2z(po, alternative = "one.sided")

effectSizeSignificance(zo = zo, c = 1, level = 0.025,
                       alternative = "one.sided")

effectSizeSignificance(zo = zo, c = 1, level = 0.05,
                       alternative = "two.sided")

effectSizeSignificance(zo = zo, c = 50, level = 0.025,
                       alternative = "one.sided")

Harmonic mean chi-squared test

Description

p-values and confidence intervals from the harmonic mean chi-squared test.

Usage

hMeanChiSq(
  z,
  w = rep(1, length(z)),
  alternative = c("greater", "less", "two.sided", "none"),
  bound = FALSE
)

hMeanChiSqMu(
  thetahat,
  se,
  w = rep(1, length(thetahat)),
  mu = 0,
  alternative = c("greater", "less", "two.sided", "none"),
  bound = FALSE
)

hMeanChiSqCI(
  thetahat,
  se,
  w = rep(1, length(thetahat)),
  alternative = c("two.sided", "greater", "less", "none"),
  conf.level = 0.95
)

Arguments

z

Numeric vector of z-values.

w

Numeric vector of weights.

alternative

Either "greater" (default), "less", "two.sided", or "none". Specifies the alternative to be considered in the computation of the p-value.

bound

If FALSE (default), p-values that cannot be computed are reported as NaN. If TRUE, they are reported as "> bound".

thetahat

Numeric vector of parameter estimates.

se

Numeric vector of standard errors.

mu

The null hypothesis value. Defaults to 0.

conf.level

The confidence level of the confidence interval(s). Defaults to 0.95.

Value

hMeanChiSq: returns the p-values from the harmonic mean chi-squared test based on the study-specific z-values.

hMeanChiSqMu: returns the p-value from the harmonic mean chi-squared test based on study-specific estimates and standard errors.

hMeanChiSqCI: returns a list containing confidence interval(s) obtained by inverting the harmonic mean chi-squared test based on study-specific estimates and standard errors. The list contains:

CI

Confidence interval(s).



If the alternative is "none", the list also contains:

gamma

Local minima of the p-value function between the thetahats.

Author(s)

Leonhard Held, Florian Gerber

References

Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410

Examples

## Example from Fisher (1999) as discussed in Held (2020)
pvalues <- c(0.0245, 0.1305, 0.00025, 0.2575, 0.128)
lower <- c(0.04, 0.21, 0.12, 0.07, 0.41)
upper <- c(1.14, 1.54, 0.60, 3.75, 1.27)
se <- ci2se(lower = lower, upper = upper, ratio = TRUE)
thetahat <- ci2estimate(lower = lower, upper = upper, ratio = TRUE)

## hMeanChiSq() --------
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           alternative = "less")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           alternative = "two.sided")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           alternative = "none")

hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           w = 1 / se^2, alternative = "less")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           w = 1 / se^2, alternative = "two.sided")
hMeanChiSq(z = p2z(p = pvalues, alternative = "less"),
           w = 1 / se^2, alternative = "none")


## hMeanChiSqMu() --------
hMeanChiSqMu(thetahat = thetahat, se = se, alternative = "two.sided")
hMeanChiSqMu(thetahat = thetahat, se = se, w = 1 / se^2,
             alternative = "two.sided")
hMeanChiSqMu(thetahat = thetahat, se = se, alternative = "two.sided",
             mu = -0.1)

## hMeanChiSqCI() --------
## two-sided
CI1 <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2,
                    alternative = "two.sided")
CI2 <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2,
                    alternative = "two.sided", conf.level = 0.99875)
## one-sided
CI1b <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2,
                     alternative = "less", conf.level = 0.975)
CI2b <- hMeanChiSqCI(thetahat = thetahat, se = se, w = 1 / se^2,
                     alternative = "less", conf.level = 1 - 0.025^2)

## confidence intervals on hazard ratio scale
print(exp(CI1$CI), digits = 2)
print(exp(CI2$CI), digits = 2)
print(exp(CI1b$CI), digits = 2)
print(exp(CI2b$CI), digits = 2)


## example with confidence region consisting of disjoint intervals
thetahat2 <- c(-3.7, 2.1, 2.5)
se2 <- c(1.5, 2.2, 3.1)
conf.level <- 0.95; alpha <- 1 - conf.level
muSeq <- seq(-7, 6, length.out = 1000)
pValueSeq <- hMeanChiSqMu(thetahat = thetahat2, se = se2,
                          alternative = "none", mu = muSeq)
(hm <- hMeanChiSqCI(thetahat = thetahat2, se = se2, alternative = "none"))

plot(x = muSeq, y = pValueSeq, type = "l", panel.first = grid(lty = 1),
     xlab = expression(mu), ylab = "p-value")
abline(v = thetahat2, h = alpha, lty = 2)
arrows(x0 = hm$CI[, 1], x1 = hm$CI[, 2], y0 = alpha,
       y1 = alpha, col = "darkgreen", lwd = 3, angle = 90, code = 3)
points(hm$gamma, col = "red", pch = 19, cex = 2)

Computes the replication success level

Description

The replication success level is computed based on the specified alternative and recalibration type.

Usage

levelSceptical(
  level,
  c = NA,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled")
)

Arguments

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

c

The variance ratio. Only required when type = "controlled".

alternative

Specifies if level is "one.sided" (default) or "two.sided". If "one.sided", then a one-sided replication success level is computed.

type

Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled". "golden" ensures that for an original study just significant at the specified level, replication success is only possible for replication effect estimates larger than the original one. "controlled" ensures exact overall Type-I error control at level level^2.

Details

levelSceptical is the vectorized version of the internal function .levelSceptical_ (vectorized via Vectorize).

Value

Replication success levels
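
The following sketch illustrates how the replication success level can be used; it assumes that thresholding the nominal sceptical p-value at levelSceptical() agrees with thresholding the recalibrated sceptical p-value at level (this equivalence is an assumption of the sketch, not a documented guarantee):

zo <- p2z(0.002, alternative = "one.sided")
zr <- p2z(0.01, alternative = "one.sided")
psNominal <- pSceptical(zo = zo, zr = zr, c = 1, alternative = "one.sided",
                        type = "nominal")
psGolden <- pSceptical(zo = zo, zr = zr, c = 1, alternative = "one.sided",
                       type = "golden")
## both comparisons should lead to the same conclusion
psNominal <= levelSceptical(level = 0.025, alternative = "one.sided", type = "golden")
psGolden <= 0.025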

Author(s)

Leonhard Held

References

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics, 16, 706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

Examples

levelSceptical(level = 0.025, alternative = "one.sided", type = "nominal")
levelSceptical(
  level = 0.025,
  alternative = "one.sided",
  type = "controlled",
  c = 1
)
levelSceptical(level = 0.025, alternative = "one.sided", type = "golden")

Computes Box's tail probability

Description

pBox computes Box's tail probabilities based on the z-values of the original and the replication study, the corresponding variance ratio, and the significance level.

Usage

pBox(zo, zr, c, level = 0.05, alternative = c("two.sided", "one.sided"))

zBox(zo, zr, c, level = 0.05, alternative = c("two.sided", "one.sided"))

Arguments

zo

Numeric vector of z-values from the original studies.

zr

Numeric vector of z-values from replication studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

Numeric vector of significance levels. Default is 0.05.

alternative

Either "two.sided" (default) or "one.sided". Specifies whether two-sided or one-sided Box's tail probabilities are computed.

Details

pBox quantifies the conflict between the sceptical prior that would render the original study non-significant and the result from the replication study. If the original study was not significant at level level, the sceptical prior does not exist and pBox cannot be calculated.
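
A minimal additional sketch (assuming the package is loaded) contrasting an original study that is only just significant at the default level with a clearly significant one; per the paragraph above, pBox is only defined when the original study is significant at level:

pBox(zo = p2z(0.049), zr = p2z(0.01), c = 1)  ## original just significant at 0.05
pBox(zo = p2z(0.001), zr = p2z(0.01), c = 1)  ## original clearly significant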

Value

pBox returns Box's tail probabilities.

zBox returns the z-values used in pBox.

Author(s)

Leonhard Held

References

Box, G.E.P. (1980). Sampling and Bayes' inference in scientific modelling and robustness (with discussion). Journal of the Royal Statistical Society, Series A, 143, 383-430.

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Examples

pBox(zo = p2z(0.01), zr = p2z(0.02), c = 2)
pBox(zo = p2z(0.02), zr = p2z(0.01), c = 1/2)
pBox(zo = p2z(0.02, alternative = "one.sided"),
     zr = p2z(0.01, alternative = "one.sided"),
     c = 1/2, alternative = "one.sided")

Computes Edgington's p-value

Description

The combined p-value with Edgington's method is computed based on the one-sided p-values (or the corresponding z-values) of the original and replication study, and the ratio of the weight of the replication study over the weight of the original study.

Usage

pEdgington(zo = NULL, zr = NULL, po = NULL, pr = NULL, r = 1)

Arguments

zo

A vector of z-values from original studies.

zr

A vector of z-values from replication studies.

po

A vector of one-sided original p-values.

pr

A vector of one-sided replication p-values.

r

Numeric vector of ratios of replication to original weight.

Details

Either zo and zr, or po and pr, must be specified.

Value

Edgington's p-value

Author(s)

Charlotte Micheloud, Leonhard Held, Samuel Pawel

References

Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149

Examples

## examples from paper
pEdgington(po = 0.026, pr = 0.001)
pEdgington(po = 0.024, pr = 0.024)

## using z-values
pEdgington(zo = 1.91, zr = 1.95)
## using combination of z-value and p-value
pEdgington(zo = 1.91, pr = 0.024)

Computes the p-value for intrinsic credibility

Description

Computes the p-value for intrinsic credibility

Usage

pIntrinsic(
  p = z2p(z, alternative = alternative),
  z = NULL,
  alternative = c("two.sided", "one.sided", "less", "greater"),
  type = c("Held", "Matthews")
)

Arguments

p

numeric vector of p-values.

z

numeric vector of z-values. Default is NULL.

alternative

Either "two.sided" (default) or "one.sided". Specifies if the p-value is two-sided or one-sided. If the p-value is one-sided, then a one-sided p-value for intrinsic credibility is computed.

type

Type of intrinsic p-value. Default is "Held" as in Held (2019). The other option is "Matthews" as in Matthews (2018).

Value

p-values for intrinsic credibility.

Author(s)

Leonhard Held

References

Matthews, R. A. J. (2018). Beyond 'significance': principles and practice of the analysis of credibility. Royal Society Open Science, 5, 171047. doi:10.1098/rsos.171047

Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534

Examples

p <- c(0.005, 0.01, 0.05)
pIntrinsic(p = p)
pIntrinsic(p = p, type = "Matthews")
pIntrinsic(p = p, alternative = "one.sided")
pIntrinsic(p = p, alternative = "one.sided", type = "Matthews")

pIntrinsic(z = 2)

Computes the power for replication success with Edgington's method

Description

The power with Edgington's method is computed based on the result of the original study (z-value or one-sided p-value), the corresponding variance ratio, and the ratio of the weight of the replication study over the weight of the original study.

Usage

powerEdgington(
  zo = NULL,
  po = NULL,
  r = 1,
  c = 1,
  level = 0.025,
  designPrior = "conditional",
  shrinkage = 0
)

Arguments

zo

Numeric vector of z-values from original studies.

po

Numeric vector of original one-sided p-values.

r

Numeric vector of ratios of replication to original weight.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

One-sided significance level. Default is 0.025.

designPrior

Either "conditional" (default) or "predictive".

shrinkage

Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25.

Details

Either zo or po must be specified.
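
As a small illustration of this interchangeability (a sketch, assuming p2z with alternative = "one.sided" gives the z-value corresponding to the one-sided p-value), the two calls below should return the same power:

powerEdgington(po = 0.025, c = 1.4)
powerEdgington(zo = p2z(p = 0.025, alternative = "one.sided"), c = 1.4)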

Value

The power with Edgington's method

Author(s)

Charlotte Micheloud, Leonhard Held, Samuel Pawel

References

Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149

Examples

powerEdgington(po = 0.025, level = 0.025, c = 1.4)

Computes the power for replication success with the sceptical p-value

Description

Computes the power for replication success with the sceptical p-value based on the result of the original study, the corresponding variance ratio, and the design prior.

Usage

powerReplicationSuccess(
  zo,
  c = 1,
  level = 0.025,
  designPrior = c("conditional", "predictive", "EB"),
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled"),
  shrinkage = 0,
  h = 0,
  strict = FALSE
)

Arguments

zo

Numeric vector of z-values from original studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

designPrior

Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution, where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator).

alternative

Specifies if level is "one.sided" (default) or "two.sided". If "one.sided" then power calculations are based on a one-sided assessment of replication success in the direction of the original effect estimates.

type

Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled". "golden" ensures that for an original study just significant at the specified level, replication success is only possible for replication effect estimates larger than the original one. "controlled" ensures exact overall Type-I error control at level level^2.

shrinkage

Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. Is only taken into account if the designPrior is "conditional" or "predictive".

h

Numeric vector of relative heterogeneity variances i.e., the ratios of the heterogeneity variance to the variance of the original effect estimate. Default is 0 (no heterogeneity). Is only taken into account when designPrior = "predictive" or designPrior = "EB".

strict

Logical vector indicating whether the probability for replication success in the opposite direction of the original effect estimate should also be taken into account. Default is FALSE. Only taken into account when alternative = "two.sided".

Details

powerReplicationSuccess is the vectorized version of the internal function .powerReplicationSuccess_ (vectorized via Vectorize).

Value

The power for replication success with the sceptical p-value

Author(s)

Leonhard Held, Charlotte Micheloud, Samuel Pawel

References

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

See Also

sampleSizeReplicationSuccess, pSceptical, levelSceptical

Examples

## larger sample size in replication (c > 1)
powerReplicationSuccess(zo = p2z(0.005), c = 2, level = 0.025, designPrior = "conditional")
powerReplicationSuccess(zo = p2z(0.005), c = 2, level = 0.025, designPrior = "predictive")

## smaller sample size in replication (c < 1)
powerReplicationSuccess(zo = p2z(0.005), c = 1/2, level = 0.025, designPrior = "conditional")
powerReplicationSuccess(zo = p2z(0.005), c = 1/2, level = 0.025, designPrior = "predictive")

powerReplicationSuccess(zo = p2z(0.00005), c = 2, level = 0.05,
                        alternative = "two.sided",  strict = TRUE, shrinkage = 0.9)
powerReplicationSuccess(zo = p2z(0.00005), c = 2, level = 0.05,
                        alternative = "two.sided", strict = FALSE, shrinkage = 0.9)

Computes the power for significance

Description

The power for significance is computed based on the result of the original study, the corresponding variance ratio, and the design prior.

Usage

powerSignificance(
  zo,
  c = 1,
  level = 0.025,
  designPrior = c("conditional", "predictive", "EB"),
  alternative = c("one.sided", "two.sided"),
  h = 0,
  shrinkage = 0,
  strict = FALSE
)

Arguments

zo

Numeric vector of z-values from original studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

level

Significance level. Default is 0.025.

designPrior

Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution, where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator).

alternative

Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided. If the significance level is one-sided, then power calculations are based on a one-sided assessment of significance in the direction of the original effect estimates.

h

The relative between-study heterogeneity, i.e., the ratio of the heterogeneity variance to the variance of the original effect estimate. Default is 0 (no heterogeneity). Is only taken into account when designPrior = "predictive" or designPrior = "EB".

shrinkage

Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. Is only taken into account if the designPrior is "conditional" or "predictive".

strict

Logical vector indicating whether the probability for significance in the opposite direction of the original effect estimate should also be taken into account. Default is FALSE. Only taken into account when alternative = "two.sided".

Details

powerSignificance is the vectorized version of the internal function .powerSignificance_ (vectorized via Vectorize).

Value

The probability that a replication study yields a significant effect estimate in the specified direction.

Author(s)

Leonhard Held, Samuel Pawel, Charlotte Micheloud, Florian Gerber

References

Goodman, S. N. (1992). A comment on replication, p-values and evidence, Statistics in Medicine, 11, 875–879. doi:10.1002/sim.4780110705

Senn, S. (2002). Letter to the Editor, Statistics in Medicine, 21, 2437–2444.

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science. 37:369-379. doi:10.1214/21-STS828

See Also

sampleSizeSignificance, powerSignificanceInterim

Examples

powerSignificance(zo = p2z(0.005), c = 2)
powerSignificance(zo = p2z(0.005), c = 2, designPrior = "predictive")
powerSignificance(zo = p2z(0.005), c = 2, alternative = "two.sided")
powerSignificance(zo = -3, c = 2, designPrior = "predictive",
                  alternative = "one.sided")
powerSignificance(zo = p2z(0.005), c = 1/2)
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive")
powerSignificance(zo = p2z(0.005), c = 1/2, alternative = "two.sided")
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive",
                  alternative = "two.sided")
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "predictive",
                  alternative = "one.sided", h = 0.5, shrinkage = 0.5)
powerSignificance(zo = p2z(0.005), c = 1/2, designPrior = "EB",
                  alternative = "two.sided", h = 0.5)

# power as function of original p-value
po <- seq(0.0001, 0.06, 0.0001)
plot(po, powerSignificance(zo = p2z(po), designPrior = "conditional"),
     type = "l", ylim = c(0, 1), lwd = 1.5, las = 1, ylab = "Power",
     xlab = expression(italic(p)[o]))
lines(po, powerSignificance(zo = p2z(po), designPrior = "predictive"),
      lwd = 2, lty = 2)
lines(po, powerSignificance(zo = p2z(po), designPrior = "EB"),
      lwd = 1.5, lty = 3)
legend("topright", legend = c("conditional", "predictive", "EB"),
       title = "Design prior", lty = c(1, 2, 3), lwd = 1.5, bty = "n")

Interim power of a replication study

Description

Computes the power of a replication study taking into account data from an interim analysis.

Usage

powerSignificanceInterim(
  zo,
  zi,
  c = 1,
  f = 1/2,
  level = 0.025,
  designPrior = c("conditional", "informed predictive", "predictive"),
  analysisPrior = c("flat", "original"),
  alternative = c("one.sided", "two.sided"),
  shrinkage = 0
)

Arguments

zo

Numeric vector of z-values from original studies.

zi

Numeric vector of z-values from interim analyses of replication studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study. Default is 1.

f

Fraction of the replication study already completed. Default is 0.5.

level

Significance level. Default is 0.025.

designPrior

Either "conditional" (default), "informed predictive", or "predictive". "informed predictive" refers to an informative normal prior coming from the original study. "predictive" refers to a flat prior.

analysisPrior

Either "flat" (default) or "original".

alternative

Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided.

shrinkage

Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage=0.25.

Details

This is an extension of powerSignificance() and adapts the ‘interim power’ from section 6.6.3 of Spiegelhalter et al. (2004) to the setting of replication studies.

powerSignificanceInterim is the vectorized version of the internal function .powerSignificanceInterim_ (vectorized via Vectorize).
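
A minimal sketch (assuming the package is loaded) of how the interim power changes with the fraction f of the replication study already completed, keeping the interim z-value fixed:

powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 0.1,
                         designPrior = "conditional", analysisPrior = "flat")
powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 0.9,
                         designPrior = "conditional", analysisPrior = "flat")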

Value

The probability of statistical significance in the specified direction at the end of the replication study given the data collected so far in the replication study.

Author(s)

Charlotte Micheloud

References

Spiegelhalter, D. J., Abrams, K. R., and Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation, volume 13. John Wiley & Sons

Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science, 37, 369-379. doi:10.1214/21-STS828

See Also

sampleSizeSignificance, powerSignificance

Examples

powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2,
                         designPrior = "conditional",
                         analysisPrior = "flat")

powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2,
                         designPrior = "informed predictive",
                         analysisPrior = "flat")

powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2,
                         designPrior = "predictive",
                         analysisPrior = "flat")

powerSignificanceInterim(zo = 2, zi = -2, c = 1, f = 1/2,
                         designPrior = "conditional",
                         analysisPrior = "flat")

powerSignificanceInterim(zo = 2, zi = 2, c = 1, f = 1/2,
                         designPrior = "conditional",
                         analysisPrior = "flat",
                         shrinkage = 0.25)

Computes the project power of the sceptical p-value

Description

The project power of the sceptical p-value is computed based on the threshold for the calibrated sceptical p-value, the variance ratio, the significance level and power of a standard significance test in the original study, and the alternative.

Usage

PPpSceptical(
  level,
  c,
  alpha,
  power,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled")
)

Arguments

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

alpha

Significance level for a standard significance test in the original study. Default is 0.025.

power

Power to detect the assumed effect with a standard significance test in the original study.

alternative

Specifies if level and alpha are "one.sided" (default) or "two.sided".

type

Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled".

Details

PPpSceptical is the vectorized version of the internal function .PPpSceptical_ (vectorized via Vectorize).

Value

The project power of the sceptical p-value

Author(s)

Leonhard Held, Samuel Pawel

References

Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Maca, J., Gallo, P., Branson, M., and Maurer, W. (2002). Reconsidering some aspects of the two-trials paradigm. Journal of Biopharmaceutical Statistics, 12, 107-119. doi:10.1081/bip-120006450

See Also

pSceptical, levelSceptical, T1EpSceptical

Examples

## compare project power for different recalibration types
types <- c("nominal", "golden", "controlled")
c <- seq(0.4, 5, by = 0.01)
alpha <- 0.025
power <- 0.9
pp <- sapply(X = types, FUN = function(t) {
  PPpSceptical(type = t, c = c, alpha = alpha, power = power,
               alternative = "one.sided", level = 0.025)
})

## compute project power of 2 trials rule
za <- qnorm(p = 1 - alpha)
mu <- za + qnorm(p = power)
pp2TR <- power * pnorm(q = za, mean = sqrt(c) * mu, lower.tail = FALSE)

matplot(x = c, y = pp * 100, type = "l", lty = 1, lwd = 2, las = 1, log = "x",
        xlab = bquote(italic(c)), ylab = "Project power (%)", xlim = c(0.4, 5),
        ylim = c(0, 100))
lines(x = c, y = pp2TR * 100, col = length(types) + 1, lwd = 2)
abline(v = 1, lty = 2)
abline(h = 90, lty = 2, col = "lightgrey")
legend("bottomright", legend = c(types, "2TR"), lty = 1, lwd = 2,
       col = seq(1, length(types) + 1))

Prediction interval for effect estimate of replication study

Description

Computes a prediction interval for the effect estimate of the replication study.

Usage

predictionInterval(
  thetao,
  seo,
  ser,
  tau = 0,
  conf.level = 0.95,
  designPrior = "predictive"
)

Arguments

thetao

Numeric vector of effect estimates from original studies.

seo

Numeric vector of standard errors of the original effect estimates.

ser

Numeric vector of standard errors of the replication effect estimates.

tau

Between-study heterogeneity standard error. Default is 0 (no heterogeneity). Is only taken into account when designPrior is "predictive" or "EB".

conf.level

The confidence level of the prediction intervals. Default is 0.95.

designPrior

Either "predictive" (default), "conditional", or "EB". If "EB", the contribution of the original study to the predictive distribution is shrunken towards zero based on the evidence in the original study (with empirical Bayes).

Details

This function computes a prediction interval and a mean estimate under a specified predictive distribution of the replication effect estimate. Setting designPrior = "conditional" is not recommended since this ignores the uncertainty of the original effect estimate. See Patil, Peng, and Leek (2016) and Pawel and Held (2020) for details.

predictionInterval is the vectorized version of the internal function .predictionInterval_ (vectorized via Vectorize).
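
A minimal sketch of the point made above: with the same inputs, the conditional interval should be narrower than the predictive one because it ignores the uncertainty of the original estimate (the numbers below are illustrative only):

predictionInterval(thetao = 0.2, seo = 0.05, ser = 0.05, designPrior = "predictive")
predictionInterval(thetao = 0.2, seo = 0.05, ser = 0.05, designPrior = "conditional")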

Value

A data frame with the following columns

lower

Lower limit of prediction interval,

mean

Mean of predictive distribution,

upper

Upper limit of prediction interval.

Author(s)

Samuel Pawel

References

Patil, P., Peng, R. D., Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science, 11, 539-544. doi:10.1177/1745691616646366

Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416

Examples

predictionInterval(thetao = c(1.5, 2, 5), seo = 1, ser = 0.5, designPrior = "EB")

# compute prediction intervals for replication projects
data("RProjects", package = "ReplicationSuccess")
parOld <- par(mfrow = c(2, 2))
for (p in unique(RProjects$project)) {
  data_project <- subset(RProjects, project == p)
  PI <- predictionInterval(thetao = data_project$fiso, seo = data_project$se_fiso,
                           ser = data_project$se_fisr)
  PI <- tanh(PI) # transforming back to correlation scale
  within <- (data_project$rr < PI$upper) & (data_project$rr > PI$lower)
  coverage <- mean(within)
  color <- ifelse(within == TRUE, "#333333B3", "#8B0000B3")
  study <- seq(1, nrow(data_project))
  plot(data_project$rr, study, col = color, pch = 20,
       xlim = c(-0.5, 1), xlab = expression(italic(r)[r]),
       main = paste0(p, ": ", round(coverage*100, 1), "% coverage"))
  arrows(PI$lower, study, PI$upper, study, length = 0.02, angle = 90,
         code = 3, col = color)
  abline(v = 0, lty = 3)
}
par(parOld)

Probability of replicating an effect by Killeen (2005)

Description

Computes the probability that a replication study yields an effect estimate in the same direction as in the original study.

Usage

pReplicate(
  po = NULL,
  zo = p2z(p = po, alternative = alternative),
  c,
  alternative = "two.sided"
)

Arguments

po

Numeric vector of p-values from the original study, default is NULL.

zo

Numeric vector of z-values from the original study. Is calculated from po, if necessary.

c

The ratio of the variances of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

alternative

Either "two.sided" (default) or "one.sided". Specifies whether the p-value is two-sided or one-sided.

Details

This extends the statistic p_rep ("the probability of replicating an effect") by Killeen (2005) to the case of possibly unequal sample sizes; see also Senn (2002).

Value

The probability that a replication study yields an effect estimate in the same direction as in the original study.

Author(s)

Leonhard Held

References

Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353. doi:10.1111/j.0956-7976.2005.01538.x

Senn, S. (2002). Letter to the Editor, Statistics in Medicine, 21, 2437–2444.

Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534

Examples

pReplicate(po = c(0.05, 0.01, 0.001), c = 1)
pReplicate(po = c(0.05, 0.01, 0.001), c = 2)
pReplicate(po = c(0.05, 0.01, 0.001), c = 2, alternative = "one.sided")
pReplicate(zo = c(2, 3, 4), c = 1)

Data from Protzko et al. (2020)

Description

Data from "High Replicability of Newly-Discovered Social-behavioral Findings is Achievable" by Protzko et al. (2020). The variables are as follows:

experiment

Experiment name

type

Type of study, either "original", "self-replication", or "external-replication"

lab

The lab which conducted the study, either 1, 2, 3, or 4.

smd

Standardized mean difference effect estimate

se

Standard error of standardized mean difference effect estimate

n

Total sample size of the study

Usage

data("protzko2020")

Format

A data frame with 80 rows and 6 variables

Details

This data set originates from a prospective replication project involving four laboratories. Each of them conducted four original studies and for each original study a replication study was carried out within the same lab (self-replication) and by the other three labs (external-replication). Most studies used simple between-subject designs with two groups and a continuous outcome so that for each study, an estimate of the standardized mean difference (SMD) could be computed from the group means, group standard deviations, and group sample sizes. For studies with covariate adjustment and/or binary outcomes, effect size transformations as described in the supplementary material of Protzko (2020) were used to obtain effect estimates and standard errors on SMD scale. The data set is licensed under a CC-By Attribution 4.0 International license, see https://creativecommons.org/licenses/by/4.0/ for the terms of reuse.
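
A minimal sketch (using the variable names listed above) that computes z-values and two-sided p-values for all studies in the data set:

data("protzko2020", package = "ReplicationSuccess")
protzko2020$z <- protzko2020$smd / protzko2020$se
protzko2020$p <- z2p(z = protzko2020$z, alternative = "two.sided")
head(protzko2020[, c("experiment", "type", "z", "p")])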

Source

The relevant files were downloaded from https://osf.io/42ef9/ on January 24, 2022. The R markdown script "Decline effects main analysis.Rmd" was executed and the relevant variables from the objects "ES_experiments" and "decline_effects" were saved.

References

Protzko, J., Krosnick, J., Nelson, L. D., Nosek, B. A., Axt, J., Berent, M., ... Schooler, J. (2020, September 10). High Replicability of Newly-Discovered Social-behavioral Findings is Achievable. doi:10.31234/osf.io/n2a9x

Protzko, J., Berent, M., Buttrick, N., DeBell, M., Roeder, S. S., Walleczek, J., ... Nosek, B. A. (2021, January 5). Results & Data. Retrieved from https://osf.io/42ef9/

Examples

data("protzko2020", package = "ReplicationSuccess")

## forestplots of effect estimates
graphics.off()
parOld <- par(mar = c(5, 8, 4, 2), mfrow = c(4, 4))
experiments <- unique(protzko2020$experiment)
for (ex in experiments) {
  ## compute CIs
  dat <- subset(protzko2020, experiment == ex)
  za <- qnorm(p = 0.975)
  plotDF <- data.frame(lower = dat$smd - za*dat$se,
                       est = dat$smd,
                       upper = dat$smd + za*dat$se)
colpalette <- c("#000000", "#1B9E77", "#D95F02")
cols <- colpalette[dat$type]
yseq <- seq(1, nrow(dat))

## forestplot
plot(x = plotDF$est, y = yseq, xlim = c(-0.15, 0.8),
     ylim = c(0.8*min(yseq), 1.05*max(yseq)), type = "n",
     yaxt = "n", xlab = "Effect estimate (SMD)", ylab = "")
abline(v = 0, col = "#0000004D")
arrows(x0 = plotDF$lower, x1 = plotDF$upper, y0 = yseq, angle = 90,
       code = 3, length = 0.05, col = cols)
points(y = yseq, x = plotDF$est, pch = 20, lwd = 2, col = cols)
axis(side = 2, at = yseq, las = 1, labels = dat$type, cex.axis = 0.85)
title(main = ex)
}
par(parOld)

Computes the sceptical p-value and z-value

Description

Computes sceptical p-values and z-values based on the z-values of the original and the replication study and the corresponding variance ratios. If specified, the sceptical p-values are recalibrated.

Usage

pSceptical(
  zo,
  zr,
  c,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled")
)

zSceptical(zo, zr, c)

Arguments

zo

Numeric vector of z-values from original studies.

zr

Numeric vector of z-values from replication studies.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

alternative

Either "one.sided" (default) or "two.sided". If "one.sided", the sceptical p-value is based on a one-sided assessment of replication success in the direction of the original effect estimate. If "two.sided", the sceptical p-value is based on a two-sided assessment of replication success regardless of the direction of the original and replication effect estimate.

type

Type of recalibration. Can be either "golden" (default), "nominal", or "controlled". Setting type to "nominal" corresponds to no recalibration as in Held et al. (2020). A recalibration is applied if type is "controlled", or "golden", and the sceptical p-value can then be interpreted on the same scale as an ordinary p-value (e.g., a one-sided sceptical p-value can be thresholded at the conventional 0.025 level). "golden" ensures that for an original study just significant at the specified level, replication success is only possible if the replication effect estimate is at least as large as the original one. "controlled" ensures exact overall Type-I error control at level level^2.

Details

pSceptical is the vectorized version of the internal function .pSceptical_ (vectorized via Vectorize).
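
A minimal sketch of the thresholding property mentioned under type: with a recalibrated ("golden") sceptical p-value, replication success at the one-sided 2.5% level can be assessed by a direct comparison (this is only an illustration, not an additional feature of the function):

ps <- pSceptical(zo = p2z(0.001, alternative = "one.sided"),
                 zr = p2z(0.01, alternative = "one.sided"),
                 c = 1, alternative = "one.sided", type = "golden")
ps
ps < 0.025  ## replication success at the one-sided 2.5% level?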

Value

pSceptical returns the sceptical p-value.

zSceptical returns the z-value of the sceptical p-value.

Author(s)

Leonhard Held

References

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

See Also

sampleSizeReplicationSuccess, powerReplicationSuccess, levelSceptical

Examples

## no recalibration (type = "nominal") as in Held (2020)
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided",
           type = "nominal")

## recalibration with golden level as in Held, Micheloud, Pawel (2020)
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided",
           type = "golden")

## two-sided p-values 0.01 and 0.02, relative sample size 2
pSceptical(zo = p2z(0.01), zr = p2z(0.02), c = 2, alternative = "one.sided")
## reverse the studies
pSceptical(
  zo = p2z(0.02),
  zr = p2z(0.01),
  c = 1/2,
  alternative = "one.sided"
)
## both p-values 0.01, relative sample size 2
pSceptical(zo = p2z(0.01), zr = p2z(0.01), c = 2, alternative = "two.sided")

zSceptical(zo = 2, zr = 3, c = 2)
zSceptical(zo = 3, zr = 2, c = 2)

Bound for the p-values entering the harmonic mean chi-squared test

Description

Necessary or sufficient bounds for significance of the harmonic mean chi-squared test are computed for n one-sided p-values.

Usage

pvalueBound(alpha, n, type = c("necessary", "sufficient"))

Arguments

alpha

Numeric vector specifying the significance level.

n

The number of p-values.

type

Either "necessary" (default) or "sufficient". If "necessary", the necessary bounds are computed. If "sufficient", the sufficient bounds are computed.

Value

The bound for the p-values.

Author(s)

Leonhard Held

References

Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410

See Also

hMeanChiSq

Examples

pvalueBound(alpha = 0.025^2, n = 2, type = "necessary")
pvalueBound(alpha = 0.025^2, n = 2, type = "sufficient")

Q-test to assess compatibility between original and replication effect estimate

Description

Computes p-value from meta-analytic Q-test to assess compatibility between original and replication effect estimate.

Usage

Qtest(thetao, thetar, seo, ser)

Arguments

thetao

Numeric vector of effect estimates from original studies.

thetar

Numeric vector of effect estimates from replication studies.

seo

Numeric vector of standard errors of the original effect estimates.

ser

Numeric vector of standard errors of the replication effect estimates.

Details

This function computes the p-value from a meta-analytic Q-test assessing compatibility between original and replication effect estimates. Rejecting compatibility when the p-value is smaller than alpha is equivalent to rejecting compatibility based on a (1 - alpha) prediction interval.
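
A minimal sketch of this correspondence (assuming the default predictive prediction interval from predictionInterval() is the relevant one): the Q-test p-value for the values below can be compared with whether the replication estimate falls inside the 95% prediction interval:

thetao <- 2; thetar <- 0.5; seo <- 1; ser <- 0.5
Qtest(thetao = thetao, thetar = thetar, seo = seo, ser = ser)
pi95 <- predictionInterval(thetao = thetao, seo = seo, ser = ser)
pi95  ## does thetar lie within the interval?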

Value

p-value from Q-test.

Author(s)

Samuel Pawel

References

Hedges, L. V., Schauer, J. M. (2019). More Than One Replication Study Is Needed for Unambiguous Tests of Replication. Journal of Educational and Behavioral Statistics, 44, 543-570. doi:10.3102/1076998619852953

See Also

predictionInterval

Examples

Qtest(thetao = 2, thetar = 0.5, seo = 1, ser = 0.5)

Data from four large-scale replication projects

Description

Data from the Reproducibility Project: Psychology (RPP), the Experimental Economics Replication Project (EERP), the Social Sciences Replication Project (SSRP), and the Experimental Philosophy Replicability Project (EPRP). The variables are as follows:

study

Study identifier, usually names of authors from original study

project

Name of replication project

ro

Effect estimate of original study on correlation scale

rr

Effect estimate of replication study on correlation scale

fiso

Effect estimate of original study transformed to Fisher-z scale

fisr

Effect estimate of replication study transformed to Fisher-z scale

se_fiso

Standard error of Fisher-z transformed effect estimate of original study

se_fisr

Standard error of Fisher-z transformed effect estimate of replication study

po

Two-sided p-value from significance test of effect estimate from original study

pr

Two-sided p-value from significance test of effect estimate from replication study

po1

One-sided p-value from significance test of effect estimate from original study (in the direction of the original effect estimate)

pr1

One-sided p-value from significance test of effect estimate from replication study (in the direction of the original effect estimate)

pm_belief

Peer belief about whether replication effect estimate will achieve statistical significance elicited through prediction market (only available for EERP and SSRP)

no

Sample size in original study

nr

Sample size in replication study

Usage

data(RProjects)

Format

A data frame with 143 rows and 15 variables

Details

Two-sided p-values were calculated assuming normality of Fisher-z transformed effect estimates. From the RPP, only the meta-analytic subset is included, which consists of 73 out of 100 study pairs for which the standard error of the z-transformed correlation coefficient can be computed. For the RPP, sample sizes were recalculated from the reported standard errors of Fisher z-transformed correlation coefficients. From the EPRP, only the 31 out of 40 study pairs are included for which effective sample sizes of both the original and the replication study are available. For more details about how the data were preprocessed, see the source below and supplement S1 of Pawel and Held (2020).

Source

RPP: The source files were downloaded from https://github.com/CenterForOpenScience/rpp/. The "masterscript.R" file was executed and the relevant variables were extracted from the generated "final" object (standard errors of Fisher-z transformed correlations) and "MASTER" object (everything else). The data set is licensed under a CC0 1.0 Universal license, see https://creativecommons.org/publicdomain/zero/1.0/ for the terms of reuse.

EERP: The source files were downloaded from https://osf.io/pnwuz/. The required data were then manually extracted from the code in the files "effectdata.py" (sample sizes) and "create_studydetails.do" (everything else). Data regarding the prediction market and survey beliefs were manually extracted from table S3 of the supplementary materials of the EERP. The authors of this R package have been granted permission to share this data set by the coordinators of the EERP.

SSRP: The relevant variables were extracted from the file "D3 - ReplicationResults.csv" downloaded from https://osf.io/abu7k. For replications which underwent only the first stage, the data from the first stage were taken as the data for the replication study. For the replications which reached the second stage, the pooled data from both stages were taken as the data for the replication study. Data regarding survey and prediction market beliefs were extracted from the "D6 - MeanPeerBeliefs.csv" file, which was downloaded from https://osf.io/vr6p8/. The data set is licensed under a CC0 1.0 Universal license, see https://creativecommons.org/publicdomain/zero/1.0/ for the terms of reuse.

EPRP: Data were taken from the "XPhiReplicability_CompleteData.csv" file, which was downloaded from https://osf.io/4ewkh/. The authors of this R package have been granted permission to share this data set by the coordinators of the EPRP.

References

Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351, 1433-1436. doi:10.1126/science.aaf0918

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637-644. doi:10.1038/s41562-018-0399-z

Cova, F., Strickland, B., Abatista, A., Allard, A., Andow, J., Attie, M., ... Zhou, X. (2018). Estimating the reproducibility of experimental philosophy. Review of Philosophy and Psychology. doi:10.1007/s13164-018-0400-9

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, aac4716. doi:10.1126/science.aac4716

Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416

See Also

SSRP

Examples

data("RProjects", package = "ReplicationSuccess")

## Computing key quantities
RProjects$zo <- RProjects$fiso/RProjects$se_fiso
RProjects$zr <- RProjects$fisr/RProjects$se_fisr
RProjects$c <- RProjects$se_fiso^2/RProjects$se_fisr^2

## Computing one-sided p-values for alternative = "greater"
RProjects$po1 <- z2p(z = RProjects$zo, alternative = "greater")
RProjects$pr1 <- z2p(z = RProjects$zr, alternative = "greater")

## Plots of effect estimates
parOld <- par(mfrow = c(2, 2))
for (p in unique(RProjects$project)) {
  data_project <- subset(RProjects, project == p)
  plot(rr ~ ro, data = data_project, ylim = c(-0.5, 1),
       xlim = c(-0.5, 1), main = p, xlab = expression(italic(r)[o]),
       ylab = expression(italic(r)[r]))
  abline(h = 0, lty = 2)
  abline(a = 0, b = 1, col = "grey")
}
par(parOld)

## Plots of peer beliefs
RProjects$significant <- factor(RProjects$pr < 0.05,
                                levels = c(FALSE, TRUE),
                                labels = c("no", "yes"))
parOld <- par(mfrow = c(1, 2))
for (p in c("Experimental Economics", "Social Sciences")) {
  data_project <- subset(RProjects, project == p)
  boxplot(pm_belief ~ significant, data = data_project, ylim = c(0, 1),
          main = p, xlab = "Replication effect significant", ylab = "Peer belief")
  stripchart(pm_belief ~ significant, data = data_project, vertical = TRUE,
             add = TRUE, pch = 1, method = "jitter")
}
par(parOld)

## Computing the sceptical p-value
ps <- with(RProjects, pSceptical(zo = fiso/se_fiso,
                                 zr = fisr/se_fisr,
                                 c = se_fiso^2/se_fisr^2))
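
## Illustrative sketch: back-calculate the sample sizes implied by the
## Fisher-z standard errors (cf. Details above); this assumes the usual
## approximation se = 1/sqrt(n - 3) for a Fisher-z transformed correlation
no_approx <- 1/RProjects$se_fiso^2 + 3
nr_approx <- 1/RProjects$se_fisr^2 + 3
summary(no_approx)
summary(nr_approx)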

Computes the required relative sample size to achieve replication success with Edgington's method based on power

Description

The relative sample size to achieve replication success with Edgington's method is computed based on the z-value (or one-sided p-value) of the original study, the significance level, the ratio of the weight of the replication study over the weight of the original study, the design prior and the power.

Usage

sampleSizeEdgington(
  zo = NULL,
  po = NULL,
  r = 1,
  power,
  level = 0.025,
  designPrior = "conditional",
  shrinkage = 0
)

Arguments

zo

Numeric vector of z-values from original studies.

po

Numeric vector of one-sided p-values from original studies.

r

Numeric vector of ratios of replication to original weight.

power

Power to achieve replication success.

level

One-sided significance level. Default is 0.025.

designPrior

Either "conditional" (default) or "predictive".

shrinkage

Numeric vector with values in [0,1). Defaults to 0. Specifies the shrinkage of the original effect estimate towards zero, e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25. Is only taken into account if the designPrior is "conditional" or "predictive".

Details

Either zo or po must be specified.

Value

The relative sample size to achieve replication success with Edgington's method. If the desired power cannot be achieved with the specified inputs, NaN is returned.

Author(s)

Charlotte Micheloud, Leonhard Held, Samuel Pawel

References

Held, L., Pawel, S., Micheloud, C. (2024). The assessment of replicability using the sum of p-values. Royal Society Open Science. 11(8):240149. doi:10.1098/rsos.240149

Examples

## partially recreate Figure 5 from paper
poseq <- exp(seq(log(0.00001), log(0.025), length.out = 100))
cseq <- sampleSizeEdgington(po = poseq, power = 0.8)
cseqSig <- sampleSizeSignificance(zo = p2z(p = poseq, alternative = "one.sided"),
                                  power = 0.8)
plot(poseq, cseq/cseqSig, log = "x", xlim = c(0.00001, 0.035), ylim = c(0.9, 1.3),
     type = "l", las = 1, xlab = "Original p-value", ylab = "Sample size ratio")
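
## Further illustrative calls (arbitrary inputs): starting from a z-value,
## giving the replication twice the weight of the original (r = 2), and
## using a predictive design prior
sampleSizeEdgington(zo = 2.5, power = 0.8)
sampleSizeEdgington(zo = 2.5, power = 0.8, r = 2)
sampleSizeEdgington(zo = 2.5, power = 0.8, designPrior = "predictive")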

Computes the required relative sample size to achieve replication success with the sceptical p-value

Description

The relative sample size to achieve replication success is computed based on the z-value of the original study, the type of recalibration, the power and the design prior.

Usage

sampleSizeReplicationSuccess(
  zo,
  power = NA,
  level = 0.025,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled"),
  designPrior = c("conditional", "predictive", "EB"),
  shrinkage = 0,
  h = 0
)

Arguments

zo

Numeric vector of z-values from original studies.

power

The power to achieve replication success.

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

alternative

Specifies if level is "one.sided" (default) or "two.sided". If "one.sided" then sample size calculations are based on a one-sided assessment of replication success in the direction of the original effect estimates.

type

Type of recalibration. Can be either "golden" (default), "nominal" (no recalibration), or "controlled". "golden" ensures that for an original study just significant at the specified level, replication success is only possible for replication effect estimates larger than the original one. "controlled" ensures exact overall Type-I error control at level level^2.

designPrior

Is only taken into account when power is specified. Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator).

shrinkage

Is only taken into account when power is specified. A number in [0,1) with default 0. Specifies the shrinkage of the original effect estimate towards zero (e.g., the effect is shrunken by a factor of 25% for shrinkage = 0.25). Is only taken into account when the designPrior is "conditional" or "predictive".

h

Is only taken into account when power is specified and designPrior is "predictive" or "EB". The relative between-study heterogeneity, i.e., the ratio of the heterogeneity variance to the variance of the original effect estimate. Default is 0 (no heterogeneity).

Details

sampleSizeReplicationSuccess is the vectorized version of the internal function .sampleSizeReplicationSuccess_. Vectorize is used to vectorize the function.

Value

The relative sample size for replication success. If the desired power cannot be achieved with the specified inputs, NaN is returned.

Author(s)

Leonhard Held, Charlotte Micheloud, Samuel Pawel, Florian Gerber

References

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

See Also

pSceptical, powerReplicationSuccess, levelSceptical

Examples

## based on power
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "golden")
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "golden", designPrior = "predictive")
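
## Further illustrative calls (arbitrary inputs) with the "controlled"
## recalibration and with shrinkage and heterogeneity under a predictive
## design prior
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "controlled")
sampleSizeReplicationSuccess(zo = p2z(0.0025), power = 0.8, level = 0.025,
                             type = "golden", designPrior = "predictive",
                             shrinkage = 0.25, h = 0.1)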

Computes the required relative sample size to achieve significance based on power

Description

The relative sample size to achieve significance of the replication study is computed based on the z-value of the original study, the significance level and the power.

Usage

sampleSizeSignificance(
  zo,
  power = NA,
  level = 0.025,
  alternative = c("one.sided", "two.sided"),
  designPrior = c("conditional", "predictive", "EB"),
  h = 0,
  shrinkage = 0
)

Arguments

zo

A vector of z-values from original studies.

power

The power to achieve replication success.

level

Significance level. Default is 0.025.

alternative

Either "one.sided" (default) or "two.sided". Specifies if the significance level is one-sided or two-sided. If the significance level is one-sided, then sample size calculations are based on a one-sided assessment of significance in the direction of the original effect estimate.

designPrior

Is only taken into account when power is specified. Either "conditional" (default), "predictive", or "EB". If "EB", the power is computed under a predictive distribution where the contribution of the original study is shrunken towards zero based on the evidence in the original study (with an empirical Bayes shrinkage estimator).

h

Is only taken into account when power is specified and designPrior is "predictive" or "EB". The relative between-study heterogeneity, i.e., the ratio of the heterogeneity variance to the variance of the original effect estimate. Default is 0 (no heterogeneity).

shrinkage

Is only taken into account when power is specified. A number in [0,1) with default 0. Specifies the shrinkage of the original effect towards zero (e.g., shrinkage = 0.25 implies shrinkage by a factor of 25%). Is only taken into account when designPrior is "conditional" or "predictive".

Details

sampleSizeSignificance is the vectorized version of .sampleSizeSignificance_. Vectorize is used to vectorize the function.

Value

The relative sample size to achieve significance in the specified direction. If the desired power cannot be achieved with the specified inputs, NaN is returned.

Author(s)

Leonhard Held, Samuel Pawel, Charlotte Micheloud, Florian Gerber

References

Held, L. (2020). A new standard for the analysis and design of replication studies (with discussion). Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 431-448. doi:10.1111/rssa.12493

Pawel, S., Held, L. (2020). Probabilistic forecasting of replication studies. PLOS ONE. 15, e0231416. doi:10.1371/journal.pone.0231416

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Held, L. (2022). Power Calculations for Replication Studies. Statistical Science. 37:369-379. doi:10.1214/21-STS828

See Also

powerSignificance

Examples

sampleSizeSignificance(zo = p2z(0.005), power = 0.8)
sampleSizeSignificance(zo = p2z(0.005, alternative = "two.sided"), power = 0.8)
sampleSizeSignificance(zo = p2z(0.005), power = 0.8, designPrior = "predictive")

sampleSizeSignificance(zo = 3, power = 0.8, designPrior = "predictive",
                       shrinkage = 0.5, h = 0.25)
sampleSizeSignificance(zo = 3, power = 0.8, designPrior = "EB",  h = 0.5)

# sample size to achieve 0.8 power as a function of the original p-value
zo <- p2z(seq(0.0001, 0.05, 0.0001))
oldPar <- par(mfrow = c(1,2))
plot(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "conditional", power = 0.8),
     type = "l", ylim = c(0.5, 10), log = "y", lwd = 1.5, ylab = "Relative sample size",
     xlab = expression(italic(p)[o]), las = 1)
lines(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "predictive", power = 0.8),
      lwd = 2, lty = 2)
lines(z2p(zo), sampleSizeSignificance(zo = zo, designPrior = "EB", power = 0.8),
      lwd = 1.5, lty = 3)
legend("topleft", legend = c("conditional", "predictive", "EB"),
       title = "Design prior", lty = c(1, 2, 3), lwd = 1.5, bty = "n")

par(oldPar)
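
## Illustrative check (sketch): under the conditional design prior the
## relative sample size has the closed form
## c = ((qnorm(1 - level) + qnorm(power))/zo)^2,
## which should agree with the function output up to numerical tolerance
((qnorm(1 - 0.025) + qnorm(0.8))/p2z(0.005))^2
sampleSizeSignificance(zo = p2z(0.005), power = 0.8, level = 0.025)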

Data from the Social Sciences Replication Project

Description

Data from the Social Sciences Replication Project (SSRP) including the details of the interim analysis. The variables are as follows:

study

Study identifier, usually names of authors from original study

ro

Effect estimate of original study on correlation scale

ri

Effect estimate of replication study at the interim analysis on correlation scale

rr

Effect estimate of replication study at the final analysis on correlation scale

fiso

Effect estimate of original study transformed to Fisher-z scale

fisi

Effect estimate of replication study at the interim analysis transformed to Fisher-z scale

fisr

Effect estimate of replication study at the final analysis transformed to Fisher-z scale

se_fiso

Standard error of Fisher-z transformed effect estimate of original study

se_fisi

Standard error of Fisher-z transformed effect estimate of replication study at the interim analysis

se_fisr

Standard error of Fisher-z transformed effect estimate of replication study at the final analysis

no

Sample size in original study

ni

Sample size in replication study at the interim analysis

nr

Sample size in replication study at the final analysis

po

Two-sided p-value from significance test of effect estimate from original study

pi

Two-sided p-value from significance test of effect estimate from replication study at the interim analysis

pr

Two-sided p-value from significance test of effect estimate from replication study at the final analysis

n75

Sample size calculated to have 90% power in replication study to detect 75% of the original effect size (expressed as the correlation coefficient r)

n50

Sample size calculated to have 90% power in replication study to detect 50% of the original effect size (expressed as the correlation coefficient r)

Usage

data(SSRP)

Format

A data frame with 21 rows and 18 variables

Details

Two-sided p-values were calculated assuming normality of Fisher-z transformed effect estimates. A two-stage procedure was used for the replications. In stage 1, the authors had 90% power to detect 75% of the original effect size at the 5% significance level in a two-sided test. If the original result replicated in stage 1 (two-sided p-value < 0.05 and effect in the same direction as in the original study), the data collection was stopped. If not, a second data collection was carried out in stage 2 to have 90% power to detect 50% of the original effect size for the first and second data collections pooled. n75 and n50 are the planned sample sizes calculated to reach 90% power in stages 1 and 2, respectively. They sometimes differ from the sample sizes that were actually collected (ni and nr, respectively). See the supplementary information of Camerer et al. (2018) for details.

Source

https://osf.io/abu7k

References

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., ... Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637-644. doi:10.1038/s41562-018-0399-z

See Also

RProjects

Examples

# plot of the sample sizes
plot(ni ~ no, data = SSRP, ylim = c(0, 2500), xlim = c(0, 400),
     xlab = expression(n[o]), ylab = expression(n[i]))
abline(a = 0, b = 1, col = "grey")


plot(nr ~ no, data = SSRP, ylim = c(0, 2500), xlim = c(0, 400),
     xlab = expression(n[o]), ylab = expression(n[r]))
abline(a = 0, b = 1, col = "grey")
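
## Illustrative sketch: count how many replications already met the stage-1
## criterion described under Details (two-sided p-value < 0.05 at the interim
## analysis and effect in the same direction as the original)
stage1success <- with(SSRP, pi < 0.05 & sign(fisi) == sign(fiso))
table(stage1success)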

Compute overall type-I error rate of the sceptical p-value

Description

The overall type-I error rate of the sceptical p-value is computed for a specified level, the relative variance, and the alternative hypothesis.

Usage

T1EpSceptical(
  level,
  c,
  alternative = c("one.sided", "two.sided"),
  type = c("golden", "nominal", "controlled")
)

Arguments

level

Threshold for the calibrated sceptical p-value. Default is 0.025.

c

Numeric vector of variance ratios of the original and replication effect estimates. This is usually the ratio of the sample size of the replication study to the sample size of the original study.

alternative

Specifies if level is "one.sided" (default) or "two.sided".

type

Type of recalibration. Recalibration type can be either "golden" (default), "nominal" (no recalibration), or "controlled".

Details

T1EpSceptical is the vectorized version of the internal function .T1EpSceptical_. Vectorize is used to vectorize the function.

Value

The overall type-I error rate.

Author(s)

Leonhard Held, Samuel Pawel

References

Held, L. (2020). The harmonic mean chi-squared test to substantiate scientific findings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69, 697-708. doi:10.1111/rssc.12410

Held, L., Micheloud, C., Pawel, S. (2022). The assessment of replication success based on relative effect size. The Annals of Applied Statistics. 16:706-720. doi:10.1214/21-AOAS1502

Micheloud, C., Balabdaoui, F., Held, L. (2023). Assessing replicability with the sceptical p-value: Type-I error control and sample size planning. Statistica Neerlandica. doi:10.1111/stan.12312

See Also

pSceptical, levelSceptical, PPpSceptical

Examples

## compare type-I error rate for different recalibration types
types <- c("nominal", "golden", "controlled")
c <- seq(0.2, 5, by = 0.05)
t1 <- sapply(X = types, FUN = function(t) {
  T1EpSceptical(type = t, c = c, alternative = "one.sided", level = 0.025)
})
matplot(
  x = c, y = t1*100, type = "l", lty = 1, lwd = 2, las = 1, log = "x",
  xlab = bquote(italic(c)), ylab = "Type-I error (%)",
  xlim = c(0.2, 5)
)
legend("topright", legend = types, lty = 1, lwd = 2, col = seq_along(types))
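
## Illustrative single values for c = 1 (equal original and replication
## sample sizes), comparing the three recalibration types
T1EpSceptical(level = 0.025, c = 1, alternative = "one.sided", type = "nominal")
T1EpSceptical(level = 0.025, c = 1, alternative = "one.sided", type = "golden")
T1EpSceptical(level = 0.025, c = 1, alternative = "one.sided", type = "controlled")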

Computes the p-value threshold for intrinsic credibility

Description

Computes the p-value threshold for intrinsic credibility

Usage

thresholdIntrinsic(
  alpha,
  alternative = c("two.sided", "one.sided"),
  type = c("Held", "Matthews")
)

Arguments

alpha

Numeric vector of intrinsic credibility levels.

alternative

Either "two.sided" (default) or "one.sided". Specifies if the threshold is for one-sided or two-sided p-values.

type

Either "Held" (default) or "Matthews". Type of intrinsic p-value threshold, see Held (2019) and Matthews (2018) for more information.

Value

The threshold for intrinsic credibility.

Author(s)

Leonhard Held

References

Matthews, R. A. J. (2018). Beyond 'significance': principles and practice of the analysis of credibility. Royal Society Open Science, 5, 171047. doi:10.1098/rsos.171047

Held, L. (2019). The assessment of intrinsic credibility and a new argument for p < 0.005. Royal Society Open Science, 6, 181534. doi:10.1098/rsos.181534

Examples

thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05))
thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05), alternative = "one.sided")
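
## Illustrative sketch: thresholds based on the approach of Matthews (2018)
thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05), type = "Matthews")
thresholdIntrinsic(alpha = c(0.005, 0.01, 0.05), alternative = "one.sided",
                   type = "Matthews")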