Package 'depower'

Title: Power Analysis for Differential Expression Studies
Description: Provides a convenient framework to simulate, test, power, and visualize data for differential expression studies with lognormal or negative binomial outcomes. Supported designs are two-sample comparisons of independent or dependent outcomes. Power may be summarized in the context of controlling the per-family error rate or family-wise error rate. Negative binomial methods are described in Yu, Fernandez, and Brock (2017) <doi:10.1186/s12859-017-1648-2> and Yu, Fernandez, and Brock (2020) <doi:10.1186/s12859-020-3541-7>.
Authors: Brett Klamer [aut, cre] , Lianbo Yu [aut]
Maintainer: Brett Klamer <[email protected]>
License: MIT + file LICENSE
Version: 2024.12.4.999
Built: 2024-12-09 20:25:23 UTC
Source: https://bitbucket.org/bklamer/depower

Help Index


GLM for NB ratio of means

Description

Generalized linear model for two independent negative binomial outcomes.

Usage

glm_nb(data, equal_dispersion = FALSE, test = "wald", ci_level = NULL, ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from group 1 and the second element is the vector of negative binomial values from group 2. NAs are silently excluded. The default output from sim_nb().

equal_dispersion

(Scalar logical: FALSE)
If TRUE, the model is fit assuming both groups have the same population dispersion parameter. If FALSE (default), the model is fit assuming different dispersions.

test

(String: "wald"; ⁠"c("wald", "lrt")⁠)
The statistical method used for the test results. test = "wald" performs a Wald test and optionally the Wald confidence intervals. test = "lrt" performs a likelihood ratio test and optionally the profile likelihood confidence intervals.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level. Profile likelihood intervals are computationally intensive, so intervals from test = "lrt" may be slow.

...

Optional arguments passed to glmmTMB::glmmTMB().

Details

Uses glmmTMB::glmmTMB() in the form

glmmTMB(
  formula = value ~ condition,
  data = data,
  dispformula = ~ condition,
  family = nbinom2
)

to model independent negative binomial outcomes X1NB(μ,θ1)X_1 \sim \text{NB}(\mu, \theta_1) and X2NB(rμ,θ2)X_2 \sim \text{NB}(r\mu, \theta_2) where μ\mu is the mean of group 1, rr is the ratio of the means of group 2 with respect to group 1, θ1\theta_1 is the dispersion parameter of group 1, and θ2\theta_2 is the dispersion parameter of group 2.

The hypotheses for the LRT and Wald test of rr are

Hnull:log(r)=0Halt:log(r)0\begin{aligned} H_{null} &: log(r) = 0 \\ H_{alt} &: log(r) \neq 0 \end{aligned}

where r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for group 2 with respect to group 1 and log(rnull)=0log(r_{null}) = 0 assumes the population means are identical.

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (group 2 / group 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean1 Estimated mean of group 1.
5 1 estimate Point estimate.
5 2 lower Confidence interval lower bound.
5 3 upper Confidence interval upper bound.
6 mean2 Estimated mean of group 2.
6 1 estimate Point estimate.
6 2 lower Confidence interval lower bound.
6 3 upper Confidence interval upper bound.
7 dispersion1 Estimated dispersion of group 1.
7 1 estimate Point estimate.
7 2 lower Confidence interval lower bound.
7 3 upper Confidence interval upper bound.
8 dispersion2 Estimated dispersion of group 2.
8 1 estimate Point estimate.
8 2 lower Confidence interval lower bound.
8 3 upper Confidence interval upper bound.
9 n1 Sample size of group 1.
10 n2 Sample size of group 2.
11 method Method used for the results.
12 test Type of hypothesis test.
13 alternative The alternative hypothesis.
14 equal_dispersion Whether or not equal dispersions were assumed.
15 ci_level Confidence level of the intervals.
16 hessian Information about the Hessian matrix.
17 convergence Information about convergence.

References

Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.

Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.

See Also

glmmTMB::glmmTMB()

Examples

#----------------------------------------------------------------------------
# glm_nb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
)

lrt <- glm_nb(d, equal_dispersion = FALSE, test = "lrt", ci_level = 0.95)
lrt

wald <- glm_nb(d, equal_dispersion = FALSE, test = "wald", ci_level = 0.95)
wald

#----------------------------------------------------------------------------
# Compare results to manual calculation of chi-square statistic
#----------------------------------------------------------------------------
# Use the same data, but as a data frame instead of list
set.seed(1234)
d <- sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8,
  return_type = "data.frame"
)

mod_alt <- glmmTMB::glmmTMB(
  formula = value ~ condition,
  data = d,
  dispformula = ~ condition,
  family = glmmTMB::nbinom2,
)
mod_null <- glmmTMB::glmmTMB(
  formula = value ~ 1,
  data = d,
  dispformula = ~ condition,
  family = glmmTMB::nbinom2,
)

lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt)))
lrt_chisq
wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2
wald_chisq

anova(mod_null, mod_alt)

GLMM for BNB ratio of means

Description

Generalized linear mixed model for bivariate negative binomial outcomes.

Usage

glmm_bnb(data, test = "wald", ci_level = NULL, ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from sample 1 and the second element is the vector of negative binomial values from sample 2. Each vector must be sorted by the subject/item index and must be the same sample size. NAs are silently excluded. The default output from sim_bnb().

test

(String: "wald"; ⁠"c("wald", "lrt")⁠)
The statistical method used for the test results. test = "wald" performs a Wald test and optionally the Wald confidence intervals. test = "lrt" performs a likelihood ratio test and optionally the profile likelihood confidence intervals (means and ratio). The Wald interval is always used for the dispersion and standard deviation of the item (subject) random intercept.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level. Profile likelihood intervals are computationally intensive, so intervals from test = "lrt" may be slow.

...

Optional arguments passed to glmmTMB::glmmTMB().

Details

Uses glmmTMB::glmmTMB() in the form

glmmTMB(
  formula = value ~ condition + (1 | item),
  data = data,
  dispformula = ~ 1,
  family = nbinom2
)

to model dependent negative binomial outcomes X1,X2BNB(μ,r,θ)X_1, X_2 \sim \text{BNB}(\mu, r, \theta) where μ\mu is the mean of sample 1, rr is the ratio of the means of sample 2 with respect to sample 1, and θ\theta is the dispersion parameter.

The hypotheses for the LRT and Wald test of rr are

Hnull:log(r)=0Halt:log(r)0\begin{aligned} H_{null} &: log(r) = 0 \\ H_{alt} &: log(r) \neq 0 \end{aligned}

where r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for sample 2 with respect to sample 1 and log(rnull)=0log(r_{null}) = 0 assumes the population means are identical.

When simulating data from sim_bnb(), the mean is a function of the item (subject) random effect which in turn is a function of the dispersion parameter. Thus, glmm_bnb() has biased mean and dispersion estimates. The bias increases as the dispersion parameter gets smaller and decreases as the dispersion parameter gets larger. However, estimates of the ratio and standard deviation of the random intercept tend to be accurate. The p-value for glmm_bnb() is generally overconservative compared to glmm_poisson(), wald_test_bnb() and lrt_bnb(). In summary, the negative binomial mixed-effects model fit by glmm_bnb() is not recommended for the BNB data simulated by sim_bnb(). Instead, wald_test_bnb() or lrt_bnb() should typically be used instead.

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (sample 2 / sample 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean1 Estimated mean of sample 1.
5 1 estimate Point estimate.
5 2 lower Confidence interval lower bound.
5 3 upper Confidence interval upper bound.
6 mean2 Estimated mean of sample 2.
6 1 estimate Point estimate.
6 2 lower Confidence interval lower bound.
6 3 upper Confidence interval upper bound.
7 dispersion Estimated dispersion.
7 1 estimate Point estimate.
7 2 lower Confidence interval lower bound.
7 3 upper Confidence interval upper bound.
8 item_sd Estimated standard deviation of the item (subject) random intercept.
8 1 estimate Point estimate.
8 2 lower Confidence interval lower bound.
8 3 upper Confidence interval upper bound.
9 n1 Sample size of sample 1.
10 n2 Sample size of sample 2.
11 method Method used for the results.
12 test Type of hypothesis test.
13 alternative The alternative hypothesis.
14 ci_level Confidence level of the interval.
15 hessian Information about the Hessian matrix.
16 convergence Information about convergence.

References

Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.

Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.

See Also

glmmTMB::glmmTMB()

Examples

#----------------------------------------------------------------------------
# glmm_bnb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
)

lrt <- glmm_bnb(d, test = "lrt")
lrt

wald <- glmm_bnb(d, test = "wald", ci_level = 0.95)
wald

#----------------------------------------------------------------------------
# Compare results to manual calculation of chi-square statistic
#----------------------------------------------------------------------------
# Use the same data, but as a data frame instead of list
set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2,
  return_type = "data.frame"
)

mod_alt <- glmmTMB::glmmTMB(
  formula = value ~ condition + (1 | item),
  data = d,
  dispformula = ~ 1,
  family = glmmTMB::nbinom2,
)
mod_null <- glmmTMB::glmmTMB(
  formula = value ~ 1 + (1 | item),
  data = d,
  dispformula = ~ 1,
  family = glmmTMB::nbinom2,
)

lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt)))
lrt_chisq
wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2
wald_chisq

anova(mod_null, mod_alt)

GLMM for Poisson ratio of means

Description

Generalized linear mixed model for two dependent Poisson outcomes.

Usage

glmm_poisson(data, test = "wald", ci_level = NULL, ...)

Arguments

data

(list)
A list whose first element is the vector of Poisson values from sample 1 and the second element is the vector of Poisson values from sample 2. Each vector must be sorted by the subject/item index and must be the same sample size. NAs are silently excluded. The default output from sim_bnb().

test

(String: "wald"; ⁠"c("wald", "lrt")⁠)
The statistical method used for the test results. test = "wald" performs a Wald test and optionally the Wald confidence intervals. test = "lrt" performs a likelihood ratio test and optionally the profile likelihood confidence intervals (means and ratio). The Wald interval is always used for the standard deviation of the item (subject) random intercept.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level. Profile likelihood intervals are computationally intensive, so intervals from test = "lrt" may be slow.

...

Optional arguments passed to glmmTMB::glmmTMB().

Details

Uses glmmTMB::glmmTMB() in the form

glmmTMB(
  formula = value ~ condition + (1 | item),
  data = data,
  family = stats::poisson
)

to model dependent Poisson outcomes X1Poisson(μ)X_1 \sim \text{Poisson}(\mu) and X2Poisson(rμ)X_2 \sim \text{Poisson}(r \mu) where μ\mu is the mean of sample 1 and rr is the ratio of the means of sample 2 with respect to sample 1.

The hypotheses for the LRT and Wald test of rr are

Hnull:log(r)=0Halt:log(r)0\begin{aligned} H_{null} &: log(r) = 0 \\ H_{alt} &: log(r) \neq 0 \end{aligned}

where r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for sample 2 with respect to sample 1 and log(rnull)=0log(r_{null}) = 0 assumes the population means are identical.

When simulating data from sim_bnb(), the mean is a function of the item (subject) random effect which in turn is a function of the dispersion parameter. Thus, glmm_poisson() has biased mean estimates. The bias increases as the dispersion parameter gets smaller and decreases as the dispersion parameter gets larger. However, estimates of the ratio and standard deviation of the random intercept tend to be accurate. In summary, the Poisson mixed-effects model fit by glmm_poisson() is not recommended for the BNB data simulated by sim_bnb(). Instead, wald_test_bnb() or lrt_bnb() should typically be used instead.

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (sample 2 / sample 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean1 Estimated mean of sample 1.
5 1 estimate Point estimate.
5 2 lower Confidence interval lower bound.
5 3 upper Confidence interval upper bound.
6 mean2 Estimated mean of sample 2.
6 1 estimate Point estimate.
6 2 lower Confidence interval lower bound.
6 3 upper Confidence interval upper bound.
7 item_sd Estimated standard deviation of the item (subject) random intercept.
7 1 estimate Point estimate.
7 2 lower Confidence interval lower bound.
7 3 upper Confidence interval upper bound.
8 n1 Sample size of sample 1.
9 n2 Sample size of sample 2.
10 method Method used for the results.
11 test Type of hypothesis test.
12 alternative The alternative hypothesis.
13 ci_level Confidence level of the interval.
14 hessian Information about the Hessian matrix.
15 convergence Information about convergence.

References

Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.

Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.

See Also

glmmTMB::glmmTMB()

Examples

#----------------------------------------------------------------------------
# glmm_poisson() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
)

lrt <- glmm_poisson(d, test = "lrt")
lrt

wald <- glmm_poisson(d, test = "wald", ci_level = 0.95)
wald

#----------------------------------------------------------------------------
# Compare results to manual calculation of chi-square statistic
#----------------------------------------------------------------------------
# Use the same data, but as a data frame instead of list
set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2,
  return_type = "data.frame"
)

mod_alt <- glmmTMB::glmmTMB(
  formula = value ~ condition + (1 | item),
  data = d,
  family = stats::poisson,
)
mod_null <- glmmTMB::glmmTMB(
  formula = value ~ 1 + (1 | item),
  data = d,
  family = stats::poisson,
)

lrt_chisq <- as.numeric(-2 * (logLik(mod_null) - logLik(mod_alt)))
lrt_chisq
wald_chisq <- summary(mod_alt)$coefficients$cond["condition2", "z value"]^2
wald_chisq

anova(mod_null, mod_alt)

Likelihood ratio test for BNB ratio of means

Description

Likelihood ratio test for the ratio of means from bivariate negative binomial outcomes.

Usage

lrt_bnb(data, ratio_null = 1, ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from sample 1 and the second element is the vector of negative binomial values from sample 2. Each vector must be sorted by the subject/item index and must be the same sample size. NAs are silently excluded. The default output from sim_bnb().

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (sample 2 / sample 1). Typically, ratio_null = 1 (no difference). See 'Details' for additional information.

...

Optional arguments passed to the MLE function mle_bnb().

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1G=gPoisson(μg)X_1 \mid G = g \sim \text{Poisson}(\mu g) and X2G=gPoisson(rμg)X_2 \mid G = g \sim \text{Poisson}(r \mu g) where GGamma(θ,θ1)G \sim \text{Gamma}(\theta, \theta^{-1}) is the random item (subject) effect. Then X1,X2BNB(μ,r,θ)X_1, X_2 \sim \text{BNB}(\mu, r, \theta) is the joint distribution where X1X_1 and X2X_2 are dependent (though conditionally independent), X1X_1 is the count outcome for sample 1 of the items (subjects), X2X_2 is the count outcome for sample 2 of the items (subjects), μ\mu is the conditional mean of sample 1, rr is the ratio of the conditional means of sample 2 with respect to sample 1, and θ\theta is the gamma distribution shape parameter which controls the dispersion and the correlation between sample 1 and 2.

The hypotheses for the LRT of rr are

Hnull:r=rnullHalt:rrnull\begin{aligned} H_{null} &: r = r_{null} \\ H_{alt} &: r \neq r_{null} \end{aligned}

where r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for sample 2 with respect to sample 1 and rnullr_{null} is a constant for the assumed null population ratio of means (typically rnull=1r_{null} = 1).

The LRT statistic is

λ=2lnsupΘnullL(r,μ,θ)supΘL(r,μ,θ)=2[lnsupΘnullL(r,μ,θ)lnsupΘL(r,μ,θ)]=2(l(rnull,μ~,θ~)l(r^,μ^,θ^))\begin{aligned} \lambda &= -2 \ln \frac{\text{sup}_{\Theta_{null}} L(r, \mu, \theta)}{\text{sup}_{\Theta} L(r, \mu, \theta)} \\ &= -2 \left[ \ln \text{sup}_{\Theta_{null}} L(r, \mu, \theta) - \ln \text{sup}_{\Theta} L(r, \mu, \theta) \right] \\ &= -2(l(r_{null}, \tilde{\mu}, \tilde{\theta}) - l(\hat{r}, \hat{\mu}, \hat{\theta})) \end{aligned}

Under HnullH_{null}, the LRT test statistic is asymptotically distributed as χ12\chi^2_1. The approximate level α\alpha test rejects HnullH_{null} if λχ12(1α)\lambda \geq \chi^2_1(1 - \alpha). Note that the asymptotic critical value is known to underestimate the exact critical value. Hence, the nominal significance level may not be achieved for small sample sizes (possibly n10n \leq 10 or n50n \leq 50).

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (sample 2 / sample 1).
5 alternative Point estimates under the alternative hypothesis.
5 1 mean1 Estimated mean of sample 1.
5 2 mean2 Estimated mean of sample 2.
5 3 dispersion Estimated dispersion.
6 null Point estimates under the null hypothesis.
6 1 mean1 Estimated mean of sample 1.
6 2 mean2 Estimated mean of sample 2.
6 3 dispersion Estimated dispersion.
7 n1 The sample size of sample 1.
8 n2 The sample size of sample 2.
9 method Method used for the results.
10 ratio_null Assumed population ratio of means.
11 mle_code Integer indicating why the optimization process terminated.
12 mle_message Information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Examples

#----------------------------------------------------------------------------
# lrt_bnb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
) |>
  lrt_bnb()

Likelihood ratio test for NB ratio of means

Description

Likelihood ratio test for the ratio of means from two independent negative binomial outcomes.

Usage

lrt_nb(data, equal_dispersion = FALSE, ratio_null = 1, ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from group 1 and the second element is the vector of negative binomial values from group 2. NAs are silently excluded. The default output from sim_nb().

equal_dispersion

(Scalar logical: FALSE)
If TRUE, the LRT is calculated assuming both groups have the same population dispersion parameter. If FALSE (default), the LRT is calculated assuming different dispersions.

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (group 2 / group 1). Typically ratio_null = 1 (no difference). See 'Details' for additional information.

...

Optional arguments passed to the MLE function mle_nb().

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1NB(μ,θ1)X_1 \sim NB(\mu, \theta_1) and X2NB(rμ,θ2)X_2 \sim NB(r\mu, \theta_2) where X1X_1 and X2X_2 are independent, X1X_1 is the count outcome for items in group 1, X2X_2 is the count outcome for items in group 2, μ\mu is the arithmetic mean count in group 1, rr is the ratio of arithmetic means for group 2 with respect to group 1, θ1\theta_1 is the dispersion parameter of group 1, and θ2\theta_2 is the dispersion parameter of group 2.

The hypotheses for the LRT of rr are

Hnull:r=rnullHalt:rrnull\begin{aligned} H_{null} &: r = r_{null} \\ H_{alt} &: r \neq r_{null} \end{aligned}

where r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for group 2 with respect to group 1 and rnullr_{null} is a constant for the assumed null population ratio of means (typically rnull=1r_{null} = 1).

The LRT statistic is

λ=2lnsupΘnullL(r,μ,θ1,θ2)supΘL(r,μ,θ1,θ2)=2[lnsupΘnullL(r,μ,θ1,θ2)lnsupΘL(r,μ,θ1,θ2)]=2(l(rnull,μ~,θ~1,θ~2)l(r^,μ^,θ^1,θ^2))\begin{aligned} \lambda &= -2 \ln \frac{\text{sup}_{\Theta_{null}} L(r, \mu, \theta_1, \theta_2)}{\text{sup}_{\Theta} L(r, \mu, \theta_1, \theta_2)} \\ &= -2 \left[ \ln \text{sup}_{\Theta_{null}} L(r, \mu, \theta_1, \theta_2) - \ln \text{sup}_{\Theta} L(r, \mu, \theta_1, \theta_2) \right] \\ &= -2(l(r_{null}, \tilde{\mu}, \tilde{\theta}_1, \tilde{\theta}_2) - l(\hat{r}, \hat{\mu}, \hat{\theta}_1, \hat{\theta}_2)) \end{aligned}

Under HnullH_{null}, the LRT test statistic is asymptotically distributed as χ12\chi^2_1. The approximate level α\alpha test rejects HnullH_{null} if λχ12(1α)\lambda \geq \chi^2_1(1 - \alpha). Note that the asymptotic critical value is known to underestimate the exact critical value. Hence, the nominal significance level may not be achieved for small sample sizes (possibly n10n \leq 10 or n50n \leq 50).

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (group 2 / group 1).
5 alternative Point estimates under the alternative hypothesis.
5 1 mean1 Estimated mean of group 1.
5 2 mean2 Estimated mean of group 2.
5 3 dispersion1 Estimated dispersion of group 1.
5 4 dispersion2 Estimated dispersion of group 2.
6 null Point estimates under the null hypothesis.
6 1 mean1 Estimated mean of group 1.
6 2 mean2 Estimated mean of group 2.
6 3 dispersion1 Estimated dispersion of group 1.
6 4 dispersion2 Estimated dispersion of group 2.
7 n1 Sample size of group 1.
8 n2 Sample size of group 2.
9 method Method used for the results.
10 equal_dispersion Whether or not equal dispersions were assumed.
11 ratio_null Assumed population ratio of means.
12 mle_code Integer indicating why the optimization process terminated.
13 mle_message Information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Examples

#----------------------------------------------------------------------------
# lrt_nb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
) |>
  lrt_nb()

MLE for BNB

Description

Maximum likelihood estimates (MLE) for bivariate negative binomial outcomes.

Usage

mle_bnb_null(data, ratio_null = 1, method = "nlm_constrained", ...)

mle_bnb_alt(data, method = "nlm_constrained", ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from sample 1 and the second element is the vector of negative binomial values from sample 2. Each vector must be sorted by the subject/item index and must be the same sample size. NAs are silently excluded. The default output from sim_bnb().

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (sample 2 / sample 1). Typically ratio_null = 1 (no difference).

method

(string: "nlm_constrained")
The optimization method. Must choose one of "nlm", "nlm_constrained", "optim", or "optim_constrained". The default bounds for constrained optimization are ⁠[1e-03, 1e06]⁠.

...

Optional arguments passed to the optimization method.

Details

These functions are primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1G=gPoisson(μg)X_1 \mid G = g \sim \text{Poisson}(\mu g) and X2G=gPoisson(rμg)X_2 \mid G = g \sim \text{Poisson}(r \mu g) where GGamma(θ,θ1)G \sim \text{Gamma}(\theta, \theta^{-1}) is the random item (subject) effect. Then X1,X2BNB(μ,r,θ)X_1, X_2 \sim \text{BNB}(\mu, r, \theta) is the joint distribution where X1X_1 and X2X_2 are dependent (though conditionally independent), X1X_1 is the count outcome for sample 1 of the items (subjects), X2X_2 is the count outcome for sample 2 of the items (subjects), μ\mu is the conditional mean of sample 1, rr is the ratio of the conditional means of sample 2 with respect to sample 1, and θ\theta is the gamma distribution shape parameter which controls the dispersion and the correlation between sample 1 and 2.

The MLEs of rr and μ\mu are r^=xˉ2xˉ1\hat{r} = \frac{\bar{x}_2}{\bar{x}_1} and μ^=xˉ1\hat{\mu} = \bar{x}_1. The MLE of θ\theta is found by maximizing the profile log-likelihood l(r^,μ^,θ)l(\hat{r}, \hat{\mu}, \theta) with respect to θ\theta. When r=rnullr = r_{null} is known, the MLE of μ\mu is μ~=xˉ1+xˉ21+rnull\tilde{\mu} = \frac{\bar{x}_1 + \bar{x}_2}{1 + r_{null}} and θ~\tilde{\theta} is obtained by maximizing the profile log-likelihood l(rnull,μ~,θ)l(r_{null}, \tilde{\mu}, \theta) with respect to θ\theta.

The backend method for numerical optimization is controlled by argument method which refers to stats::nlm(), stats::nlminb(), or stats::optim(). If you would like to see warnings from the optimizer, include argument warnings = TRUE.

Value

  • For mle_bnb_alt, a list with the following elements:

    Slot Name Description
    1 mean1 MLE for mean of sample 1.
    2 mean2 MLE for mean of sample 2.
    3 ratio MLE for ratio of means.
    4 dispersion MLE for BNB dispersion.
    5 nll Minimum of negative log-likelihood.
    6 nparams Number of estimated parameters.
    7 n1 Sample size of sample 1.
    8 n2 Sample size of sample 2.
    9 method Method used for the results.
    10 mle_method Method used for optimization.
    11 mle_code Integer indicating why the optimization process terminated.
    12 mle_message Additional information from the optimizer.
  • For mle_bnb_null, a list with the following elements:

    Slot Name Description
    1 mean1 MLE for mean of sample 1.
    2 mean2 MLE for mean of sample 2.
    3 ratio_null Population ratio of means assumed for null hypothesis. mean2 = mean1 * ratio_null.
    4 dispersion MLE for BNB dispersion.
    5 nll Minimum of negative log-likelihood.
    6 nparams Number of estimated parameters.
    7 n1 Sample size of sample 1.
    8 n2 Sample size of sample 2.
    9 method Method used for the results.
    10 mle_method Method used for optimization.
    11 mle_code Integer indicating why the optimization process terminated.
    12 mle_message Additional information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

See Also

sim_bnb(), nll_bnb

Examples

#----------------------------------------------------------------------------
# mle_bnb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
)

mle_alt <- d |>
  mle_bnb_alt()

mle_null <- d |>
  mle_bnb_null()

mle_alt
mle_null

MLE for NB

Description

Maximum likelihood estimates (MLE) for two independent negative binomial outcomes.

Usage

mle_nb_null(
  data,
  equal_dispersion = FALSE,
  ratio_null = 1,
  method = "nlm_constrained",
  ...
)

mle_nb_alt(data, equal_dispersion = FALSE, method = "nlm_constrained", ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from group 1 and the second element is the vector of negative binomial values from group 2. NAs are silently excluded. The default output from sim_nb().

equal_dispersion

(Scalar logical: FALSE)
If TRUE, the MLEs are calculated assuming both groups have the same population dispersion parameter. If FALSE (default), the MLEs are calculated assuming different dispersions.

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (group 2 / group 1). Typically ratio_null = 1 (no difference).

method

(string: "nlm_constrained")
The optimization method. Must choose one of "nlm", "nlm_constrained", "optim", or "optim_constrained". The default bounds for constrained optimization are ⁠[1e-03, 1e06]⁠.

...

Optional arguments passed to the optimization method.

Details

These functions are primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1NB(μ,θ1)X_1 \sim \text{NB}(\mu, \theta_1) and X2NB(rμ,θ2)X_2 \sim \text{NB}(r\mu, \theta_2), where X1X_1 and X2X_2 are independent, X1X_1 is the count outcome for items in group 1, X2X_2 is the count outcome for items in group 2, μ\mu is the arithmetic mean count in group 1, rr is the ratio of arithmetic means for group 2 with respect to group 1, θ1\theta_1 is the dispersion parameter of group 1, and θ2\theta_2 is the dispersion parameter of group 2.

The MLEs of rr and μ\mu are r^=xˉ2xˉ1\hat{r} = \frac{\bar{x}_2}{\bar{x}_1} and μ^=xˉ1\hat{\mu} = \bar{x}_1. The MLEs of θ1\theta_1 and θ2\theta_2 are found by maximizing the profile log-likelihood l(r^,μ^,θ1,θ2)l(\hat{r}, \hat{\mu}, \theta_1, \theta_2) with respect to θ1\theta_1 and θ2\theta_2. When r=rnullr = r_{null} is known, the MLE of μ\mu is μ~=n1xˉ1+n2xˉ2n1+n2\tilde{\mu} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} and θ~1\tilde{\theta}_1 and θ~2\tilde{\theta}_2 are obtained by maximizing the profile log-likelihood l(rnull,μ~,θ1,θ2)l(r_{null}, \tilde{\mu}, \theta_1, \theta_2).

The backend method for numerical optimization is controlled by argument method which refers to stats::nlm(), stats::nlminb(), or stats::optim(). If you would like to see warnings from the optimizer, include argument warnings = TRUE.

Value

  • For mle_nb_alt(), a list with the following elements:

    Slot Name Description
    1 mean1 MLE for mean of group 1.
    2 mean2 MLE for mean of group 2.
    3 ratio MLE for ratio of means.
    4 dispersion1 MLE for dispersion of group 1.
    5 dispersion2 MLE for dispersion of group 2.
    6 equal_dispersion Were equal dispersions assumed.
    7 n1 Sample size of group 1.
    8 n2 Sample size of group 2.
    9 nll Minimum of negative log-likelihood.
    10 nparams Number of estimated parameters.
    11 method Method used for the results.
    12 mle_method Method used for optimization.
    13 mle_code Integer indicating why the optimization process terminated.
    14 mle_message Additional information from the optimizer.
  • For mle_nb_null(), a list with the following elements:

    Slot Name Description
    1 mean1 MLE for mean of group 1.
    2 mean2 MLE for mean of group 2.
    3 ratio_null Population ratio of means assumed for null hypothesis. mean2 = mean1 * ratio_null.
    4 dispersion1 MLE for dispersion of group 1.
    5 dispersion2 MLE for dispersion of group 2.
    6 equal_dispersion Were equal dispersions assumed.
    7 n1 Sample size of group 1.
    8 n2 Sample size of group 2.
    9 nll Minimum of negative log-likelihood.
    10 nparams Number of estimated parameters.
    11 method Method used for the results.
    12 mle_method Method used for optimization.
    13 mle_code Integer indicating why the optimization process terminated.
    14 mle_message Additional information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

See Also

sim_nb(), nll_nb

Examples

#----------------------------------------------------------------------------
# mle_nb() examples
#----------------------------------------------------------------------------
library(depower)

d <- sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
)

mle_alt <- d |>
  mle_nb_alt()

mle_null <- d |>
  mle_nb_null()

mle_alt
mle_null

Negative log-likelihood for BNB

Description

The negative log-likelihood for bivariate negative binomial outcomes.

Usage

nll_bnb_null(param, value1, value2, ratio_null)

nll_bnb_alt(param, value1, value2)

Arguments

param

(numeric)
The vector of initial values for BNB parameters. Must be in the following order for each scenario:

  • Null: c(mean1, dispersion)

  • Alternative: c(mean1, mean2, dispersion)

for samples 1 and 2.

value1

(numeric)
The vector of BNB values from sample 1. Must be sorted by the subject/item index. Must not contain NAs.

value2

(numeric)
The vector of BNB values from sample 2. Must be sorted by the subject/item index. Must not contain NAs.

ratio_null

(Scalar numeric: ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (sample 2 / sample 1). Typically ratio_null = 1 (no difference).

Details

These functions are primarily designed for speed in simulation. Arguments are not checked.

Suppose X1G=gPoisson(μg)X_1 \mid G = g \sim \text{Poisson}(\mu g) and X2G=gPoisson(rμg)X_2 \mid G = g \sim \text{Poisson}(r \mu g) where GGamma(θ,θ1)G \sim \text{Gamma}(\theta, \theta^{-1}) is the random item (subject) effect. Then X1,X2BNB(μ,r,θ)X_1, X_2 \sim \text{BNB}(\mu, r, \theta) is the joint distribution where X1X_1 and X2X_2 are dependent (though conditionally independent), X1X_1 is the count outcome for sample 1 of the items (subjects), X2X_2 is the count outcome for sample 2 of the items (subjects), μ\mu is the conditional mean of sample 1, rr is the ratio of the conditional means of sample 2 with respect to sample 1, and θ\theta is the gamma distribution shape parameter which controls the dispersion and the correlation between sample 1 and 2.

The likelihood is

L(r,μ,θX1,X2)=(θθΓ(θ))n×μx1i+x2ii=1nx1i!rx2ii=1nx2i!×i=1nΓ(x1i+x2i+θ)(μ+rμ+θ)x1i+x2i+θ\begin{aligned} L(r, \mu, \theta \mid X_1, X_2) = & \left( \frac{\theta^{\theta}}{\Gamma(\theta)} \right)^{n} \times \\ & \frac{\mu^{\sum{x_{1i}} + \sum{x_{2i}}}}{\prod_{i=1}^{n} x_{1i}!} \frac{r^{\sum{x_{2i}}}}{\prod_{i=1}^{n} x_{2i}!} \times \\ & \frac{\prod_{i = 1}^{n} \Gamma(x_{1i} + x_{2i} + \theta)}{(\mu + r \mu + \theta)^{\sum x_{1i} + x_{2i} + \theta}} \end{aligned}

and the parameter space is Θ={(r,μ,θ):r,μ,θ>0}\Theta = \left\{ (r, \mu, \theta) : r, \mu, \theta > 0 \right\}. The log-likelihood is

l(r,μ,θ)= n[θlnθlnΓ(θ)]+n(xˉ1+xˉ2)ln(μ)+nxˉ2lnr+i=1nlnΓ(x1i+x2i+θ)n(xˉ1+xˉ2+θ)ln(μ+rμ+θ)i=1nlnx1i!i=1nlnx2i!\begin{aligned} l(r, \mu, \theta) = \ & n \left[ \theta \ln \theta - \ln \Gamma(\theta) \right] + \\ & n (\bar{x}_1 + \bar{x}_2) \ln(\mu) + n \bar{x}_2 \ln r + \\ & \sum_{i=1}^{n}{\ln \Gamma(x_{1i} + x_{2i} + \theta)} - \\ & n (\bar{x}_1 + \bar{x}_2 + \theta) \ln(\mu + r\mu + \theta) - \\ & \sum_{i = 1}^{n}{\ln x_{1i}!} - \sum_{i = 1}^{n}{\ln x_{2i}!} \end{aligned}

Value

Scalar numeric negative log-likelihood.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

See Also

sim_nb(), stats::nlminb(), stats::nlm(), stats::optim()

Examples

#----------------------------------------------------------------------------
# nll_bnb*() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
)

nll_bnb_alt(
  param = c(mean1 = 10, mean2 = 12, dispersion = 2),
  value1 = d[[1L]],
  value2 = d[[2L]]
)

nll_bnb_null(
  param = c(mean = 10, dispersion = 2),
  value1 = d[[1L]],
  value2 = d[[2L]],
  ratio_null = 1
)

Negative log-likelihood for NB

Description

The negative log-likelihood for two independent samples of negative binomial distributions.

Usage

nll_nb_null(param, value1, value2, equal_dispersion, ratio_null)

nll_nb_alt(param, value1, value2, equal_dispersion)

Arguments

param

(numeric)
The vector of initial values for NB parameters. Must be in the following order for each scenario:

  • Null and unequal dispersion: c(mean1, dispersion1, dispersion2)

  • Alternative and unequal dispersion: c(mean1, mean2, dispersion1, dispersion2)

  • Null and equal dispersion: c(mean1, dispersion)

  • Alternative and equal dispersion: c(mean1, mean2, dispersion)

for groups 1 and 2.

value1

(numeric)
The vector of NB values from group 1. Must not contain NAs.

value2

(numeric)
The vector of NB values from group 2. Must not contain NAs.

equal_dispersion

(Scalar logical)
If TRUE, the log-likelihood is calculated assuming both groups have the same population dispersion parameter. If FALSE (default), the log-likelihood is calculated assuming different dispersions.

ratio_null

(Scalar numeric: ⁠(0, Inf)⁠)
The ratio of means assumed under the null hypothesis (group 2 / group 1). Typically ratio_null = 1 (no difference).

Details

These functions are primarily designed for speed in simulation. Arguments are not checked.

Suppose X1NB(μ,θ1)X_1 \sim \text{NB}(\mu, \theta_1) and X2NB(rμ,θ2)X_2 \sim \text{NB}(r\mu, \theta_2) where X1X_1 and X2X_2 are independent, X1X_1 is the count outcome for items in group 1, X2X_2 is the count outcome for items in group 2, μ\mu is the arithmetic mean count in group 1, rr is the ratio of arithmetic means for group 2 with respect to group 1, θ1\theta_1 is the dispersion parameter of group 1, and θ2\theta_2 is the dispersion parameter of group 2.

Unequal dispersion parameters

When the dispersion parameters are not equal, the likelihood is

L(r,μ,θ1,θ2X1,X2)=(θ1θ1Γ(θ1))n1μx1i(μ+θ1)x1i+n1θ1×(θ2θ2Γ(θ2))n2(rμ)x2j(rμ+θ2)x2j+n2θ2×i=1n1Γ(x1i+θ1)x1i!j=1n2Γ(x2j+θ2)x2j!\begin{aligned} L(r, \mu, \theta_1, \theta_2 \mid X_1, X_2) = & \left( \frac{\theta_1^{\theta_1}}{\Gamma(\theta_1)} \right)^{n_1} \frac{\mu^{\sum{x_{1i}}}}{(\mu + \theta_1)^{\sum{x_{1i} + n_1 \theta_1}}} \times \\ & \left( \frac{\theta_2^{\theta_2}}{\Gamma(\theta_2)} \right)^{n_2} \frac{(r \mu)^{\sum{x_{2j}}}}{(r \mu + \theta_2)^{\sum{x_{2j} + n_2 \theta_2}}} \times \\ & \prod_{i = 1}^{n_1}{\frac{\Gamma(x_{1i} + \theta_1)}{x_{1i}!}} \prod_{j = 1}^{n_2}{\frac{\Gamma(x_{2j} + \theta_2)}{x_{2j}!}} \end{aligned}

and the parameter space is Θ={(r,μ,θ1,θ2):r,μ,θ1,θ2>0}\Theta = \left\{ (r, \mu, \theta_1, \theta_2) : r, \mu, \theta_1, \theta_2 > 0 \right\}. The log-likelihood is

l(r,μ,θ1,θ2)= n1[θ1lnθ1lnΓ(θ1)]+n2[θ2lnθ2lnΓ(θ2)]+(n1xˉ1+n2xˉ2)ln(μ)n1(xˉ1+θ1)ln(μ+θ1)+n2xˉ2ln(r)n2(xˉ2+θ2)ln(rμ+θ2)+i=1n1(lnΓ(y1i+θ1)ln(y1i!))+j=1n2(lnΓ(y2j+θ2)ln(y2j!))\begin{aligned} l(r, \mu, \theta_1, \theta_2) = \ &n_1 \left[ \theta_1 \ln \theta_1 - \ln \Gamma(\theta_1) \right] + \\ &n_2 \left[ \theta_2 \ln \theta_2 - \ln \Gamma(\theta_2) \right] + \\ &(n_1 \bar{x}_1 + n_2 \bar{x}_2) \ln(\mu) - n_1 (\bar{x}_1 + \theta_1) \ln(\mu + \theta_1) + \\ &n_2 \bar{x}_2 \ln(r) - n_2 (\bar{x}_2 + \theta_2) \ln(r \mu + \theta_2) + \\ &\sum_{i = 1}^{n_1}{\left( \ln \Gamma(y_{1i} + \theta_1) - \ln(y_{1i}!) \right)} + \\ &\sum_{j = 1}^{n_2}{\left( \ln \Gamma(y_{2j} + \theta_2) - \ln(y_{2j}!) \right)} \end{aligned}

Equal dispersion parameters

When the dispersion parameters are equal, the likelihood is

L(r,μ,θX1,X2)=(θθΓ(θ))n1+n2×μx1i(μ+θ)x1i+n1θ(rμ)x2j(rμ+θ)x2j+n2θ×i=1n1Γ(x1i+θ)x1i!j=1n2Γ(x2j+θ)x2j!\begin{aligned} L(r, \mu, \theta \mid X_1, X_2) = & \left( \frac{\theta^{\theta}}{\Gamma(\theta)} \right)^{n_1 + n_2} \times \\ & \frac{\mu^{\sum{x_{1i}}}}{(\mu + \theta)^{\sum{x_{1i} + n_1 \theta}}} \frac{(r \mu)^{\sum{x_{2j}}}}{(r \mu + \theta)^{\sum{x_{2j} + n_2 \theta}}} \times \\ & \prod_{i = 1}^{n_1}{\frac{\Gamma(x_{1i} + \theta)}{x_{1i}!}} \prod_{j = 1}^{n_2}{\frac{\Gamma(x_{2j} + \theta)}{x_{2j}!}} \end{aligned}

and the parameter space is Θ={(r,μ,θ):r,μ,θ>0}\Theta = \left\{ (r, \mu, \theta) : r, \mu, \theta > 0 \right\}. The log-likelihood is

l(r,μ,θ)= (n1+n2)[θlnθlnΓ(θ)]+(n1xˉ1+n2xˉ2)ln(μ)n1(xˉ1+θ)ln(μ+θ)+n2xˉ2ln(r)n2(xˉ2+θ)ln(rμ+θ)+i=1n1(lnΓ(y1i+θ)ln(y1i!))+j=1n2(lnΓ(y2j+θ)ln(y2j!))\begin{aligned} l(r, \mu, \theta) = \ &(n_1 + n_2) \left[ \theta \ln \theta - \ln \Gamma(\theta) \right] + \\ &(n_1 \bar{x}_1 + n_2 \bar{x}_2) \ln(\mu) - n_1 (\bar{x}_1 + \theta) \ln(\mu + \theta) + \\ &n_2 \bar{x}_2 \ln(r) - n_2 (\bar{x}_2 + \theta) \ln(r \mu + \theta) + \\ &\sum_{i = 1}^{n_1}{\left( \ln \Gamma(y_{1i} + \theta) - \ln(y_{1i}!) \right)} + \\ &\sum_{j = 1}^{n_2}{\left( \ln \Gamma(y_{2j} + \theta) - \ln(y_{2j}!) \right)} \end{aligned}

Value

Scalar numeric negative log-likelihood.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

See Also

sim_nb(), stats::nlminb(), stats::nlm(), stats::optim()

Examples

#----------------------------------------------------------------------------
# nll_nb_*() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
d <- sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
)

nll_nb_alt(
  param = c(mean1 = 10, mean2 = 15, dispersion = 2, dispersion2 = 8),
  value1 = d[[1L]],
  value2 = d[[2L]],
  equal_dispersion = FALSE
)

nll_nb_null(
  param = c(mean = 10, dispersion = 2, dispersion2 = 8),
  value1 = d[[1L]],
  value2 = d[[2L]],
  equal_dispersion = FALSE,
  ratio_null = 1
)

Plot power objects

Description

An automatic plot method for objects returned by power().

Usage

## S3 method for class 'depower'
plot(
  x,
  x_axis = NULL,
  y_axis = NULL,
  color = NULL,
  facet_row = NULL,
  facet_col = NULL,
  hline = NULL,
  caption = TRUE,
  caption_width = 70L,
  ...
)

Arguments

x

(depower)
The data frame returned by power().

x_axis

(string: NULL; names(x))
The name of the column to be used for the x-axis. Automatically chosen if NULL.

y_axis

(string: NULL; names(x))
The name of the column to be used for the y-axis. Automatically chosen if NULL. Generally, "power" (default) should be used for the y-axis.

color

(string: NULL; names(x))
The name of the column to be used for the ggplot2::aes() color aesthetic. Automatically chosen if NULL. Use NA to turn off.

facet_row

(string: NULL; names(x))
The name of the column to be used for the ggplot2::facet_grid() row. Automatically chosen if NULL. Use NA to turn off.

facet_col

(string: NULL; names(x))
The name of the column to be used for the ggplot2::facet_grid() column. Automatically chosen if NULL. Use NA to turn off.

hline

(numeric: NULL; ⁠(0, 1)⁠)
The y-intercept at which to draw a horizontal line.

caption

(Scalar logical: TRUE)
If TRUE (default), a caption is added to the plot. The caption includes information on parameter values that were conditioned on to generate the plot. If FALSE, the caption is not included.

caption_width

(Scalar integer: 70L)
The target column number for wrapping the caption text.

...

Unused additional arguments.

Details

If you are limited by the output from plot.depower(), keep in mind that the object returned by power() is a standard data frame. This allows you to easily plot all results with standard plotting functions. In addition, because plot.depower() uses ggplot2, you can modify the plot as you normally would. For example:

set.seed(1234)
sim_log_lognormal(
  n1 = c(10, 15),
  n2 = c(10, 15),
  ratio = c(1.3, 1.5),
  cv1 = c(0.3),
  cv2 = c(0.3, 0.5),
  nsims = 1000
) |>
  power(alpha = 0.05) |>
  plot(hline = 0.8, caption_width = 60) +
  ggplot2::theme_bw() +
  ggplot2::theme(plot.caption = ggplot2::element_text(hjust = 0)) +
  ggplot2::labs(title = "Power for the ratio of geometric means")
Example of extending plot() with ggplot2 functions

Value

A ggplot2::ggplot() object.

See Also

power()

Examples

#----------------------------------------------------------------------------
# plot() examples
#----------------------------------------------------------------------------
library(depower)

# Power for independent two-sample t-test
set.seed(1234)
sim_log_lognormal(
  n1 = c(10, 15),
  n2 = c(10, 15),
  ratio = c(1.3, 1.5),
  cv1 = c(0.3),
  cv2 = c(0.3, 0.5),
  nsims = 500
) |>
  power(alpha = 0.05) |>
  plot()

# Power for dependent two-sample t-test
set.seed(1234)
sim_log_lognormal(
  n1 = c(10, 15),
  n2 = c(10, 15),
  ratio = c(1.3, 1.5),
  cv1 = c(0.3, 0.5),
  cv2 = c(0.3, 0.5),
  cor = c(0.3),
  nsims = 500
) |>
  power(alpha = 0.01) |>
  plot()

# Power for two-sample independent AND two-sample dependent t-test
set.seed(1234)
sim_log_lognormal(
  n1 = c(10, 15),
  n2 = c(10, 15),
  ratio = c(1.3, 1.5),
  cv1 = c(0.3),
  cv2 = c(0.3),
  cor = c(0, 0.3, 0.6),
  nsims = 500
) |>
  power(alpha = c(0.05, 0.01)) |>
  plot(facet_row = "cor", color = "test")

# Power for one-sample t-test
set.seed(1234)
sim_log_lognormal(
  n1 = c(10, 15),
  ratio = c(1.2, 1.4),
  cv1 = c(0.3, 0.5),
  nsims = 500
) |>
  power(alpha = c(0.05, 0.01)) |>
  plot()


# Power for independent two-sample NB test
set.seed(1234)
sim_nb(
  n1 = c(10, 15),
  mean1 = 10,
  ratio = c(1.8, 2),
  dispersion1 = 10,
  dispersion2 = 3,
  nsims = 100
) |>
  power(alpha = 0.01) |>
  plot()

# Power for BNB test
set.seed(1234)
sim_bnb(
  n = c(10, 12),
  mean1 = 10,
  ratio = c(1.3, 1.5),
  dispersion = 5,
  nsims = 100
) |>
  power(alpha = 0.01) |>
  plot()

Simulated power

Description

A method to calculate power for objects returned by sim_log_lognormal(), sim_nb(), and sim_bnb().

Usage

power(data, ..., alpha = 0.05, list_column = FALSE, ncores = 1L)

Arguments

data

(depower)
The simulated data returned by sim_log_lognormal(), sim_nb(), or sim_bnb().

...

(functions)
The function(s) used to perform the test. If empty, a default test function will be selected based on class(data). Names are used for labeling and should be unique. If names are empty, the call coerced to character will be used for name-value pairs. See 'Details'.

alpha

(numeric: 0.05; ⁠(0,1)⁠)
The expected probability of rejecting a single null hypothesis when it is actually true. See 'Details'.

list_column

(Scalar logical: FALSE)
If TRUE, the data and result list-columns are included in the returned data frame. If FALSE (default), the data and result list-columns are not included in the returned data frame.

ncores

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores(). May be helpful when the number of parameter combinations is large and nsims is large.

Details

Power is calculated as the proportion of hypothesis tests which result in a p-value less than or equal to alpha. e.g.

sum(p <= alpha, na.rm = TRUE) / nsims

Power is defined as the expected probability of rejecting the null hypothesis for a chosen value of the unknown effect. In a multiple comparisons scenario, power is defined as the marginal power, which is the expected power of the test for each individual null hypothesis assumed to be false.

Other forms of power under the multiple comparisons scenario include disjunctive or conjunctive power. Disjunctive power is defined as the expected probability of correctly rejecting one or more null hypotheses. Conjunctive power is defined as the expected probability of correctly rejecting all null hypotheses. In the simplest case, and where all hypotheses are independent, if the marginal power is defined as π\pi and mm is the number of null hypotheses assumed to be false, then disjunctive power may be calculated as 1(1π)m1 - (1 - \pi)^m and conjunctive power may be calculated as πm\pi^m. Disjunctive power tends to decrease with increasingly correlated hypotheses and conjunctive power tends to increase with increasingly correlated hypotheses.

Argument ...

... are the name-value pairs for the functions used to perform the tests. If not named, the functions coerced to character will be used for the name-value pairs. Typical in non-standard evaluation, ... accepts bare functions and converts them to a list of expressions. Each element in this list will be validated as a call and then evaluated on the simulated data. A base::call() is simply an unevaluated function. Below are some examples of specifying ... in power().

# Examples of specifying ... in power()
data <- sim_nb(
  n1 = 10,
  mean1 = 10,
  ratio = c(1.6, 2),
  dispersion1 = 2,
  dispersion2 = 2,
  nsims = 200
)

# ... is empty, so a an appropriate default function will be provided
power(data)

# This is equivalent to leaving ... empty
power(data, "NB Wald test" = wald_test_nb())

# If not named, "wald_test_nb()" will be used to label the function
power(data, wald_test_nb())

# You can specify any parameters in the call. The data argument
# will automatically be inserted or overwritten.
data |>
  power("NB Wald test" = wald_test_nb(equal_dispersion=TRUE, link="log"))

# Multiple functions may be used.
data |>
  power(
    wald_test_nb(link='log'),
    wald_test_nb(link='sqrt'),
    wald_test_nb(link='squared'),
    wald_test_nb(link='identity')
  )

# Just like functions in a pipe, the parentheses are required.
# This will error because wald_test_nb is missing parentheses.
try(power(data, wald_test_nb))

In most cases*, any user created test function may be utilized in ... if the following conditions are satisfied:

  1. The function contains argument data which is defined as a list with the first and second elements for simulated data.

  2. The return object is a list with element p for the p-value of the hypothesis test.

Validate with test cases beforehand.

*Simulated data of class log_lognormal_mixed_two_sample has both independent and dependent data. To ensure the appropriate test function is used, power.log_lognormal_mixed_two_sample() allows only t_test_welch() and t_test_paired() in .... Each will be evaluated on the simulated data according to column data$cor. If one or both of these functions are not included in ..., the corresponding default function will be used automatically. If any other test function is included, an error will be returned.

Argument alpha

α\alpha is known as the type I assertion probability and is defined as the expected probability of rejecting a null hypothesis when it was actually true. α\alpha is compared with the p-value and used as the decision boundary for rejecting or not rejecting the null hypothesis.

The family-wise error rate is the expected probability of making one or more type I assertions among a family of hypotheses. Using Bonferroni's method, α\alpha is chosen for the family of hypotheses then divided by the number of tests performed (mm). Each individual hypothesis is tested at αm\frac{\alpha}{m}. For example, if you plan to conduct 30 hypothesis tests and want to control the family-wise error rate to no greater than α=0.05\alpha = 0.05, you would set alpha = 0.05/30.

The per-family error rate is the expected number of type I assertions among a family of hypotheses. If you calculate power for the scenario where you perform 1,000 hypotheses and want to control the per-family error rate to no greater than 10 type I assertions, you would choose alpha = 10/1000. This implicitly assumes all 1,000 hypotheses are truly null. Alternatively, if you assume 800 of these hypotheses are truly null and 200 are not, alpha = 10/1000 would control the per-family error rate to no greater than 8 type I assertions. If it is acceptable to keep the per-family error rate as 10, setting alpha = 10/800 would provide greater marginal power than the previous scenario.

These two methods assume that the distribution of p-values for the truly null hypotheses are uniform(0,1), but remain valid under various other testing scenarios (such as dependent tests). Other multiple comparison methods, such as FDR control, are common in practice but don't directly fit into this power simulation framework.

Column nsims

The final number of valid simulations per unique set of simulation parameters may be less than the original number requested. This may occur when the test results in a missing p-value. For wald_test_bnb(), pathological MLE estimates, generally from small sample sizes and very small dispersions, may result in a negative estimated standard deviation of the ratio. Thus the test statistic and p-value would not be calculated. Note that simulated data from sim_nb() and sim_bnb() may also reduce nsims during the data simulation phase.

The nsims column in the return data frame is the effective number of simulations for power results.

Value

A data frame with the following columns appended to the data object:

Column Name Description
alpha Type I assertion probability.
test Name-value pair of the function and statistical test: ⁠c(as.character(...) = names(...)⁠.
data List-column of simulated data.
result List-column of test results.
power Power of the test for corresponding parameters.

For power(list_column = FALSE), columns data, and result are excluded from the data frame.

References

Yu L, Fernandez S, Brock G (2017). “Power analysis for RNA-Seq differential expression studies.” BMC Bioinformatics, 18(1), 234. ISSN 1471-2105, doi:10.1186/s12859-017-1648-2.

Yu L, Fernandez S, Brock G (2020). “Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.” BMC Bioinformatics, 21(1), 198. ISSN 1471-2105, doi:10.1186/s12859-020-3541-7.

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.

Vickerstaff V, Omar RZ, Ambler G (2019). “Methods to adjust for multiple comparisons in the analysis and sample size calculation of randomised controlled trials with multiple primary outcomes.” BMC Medical Research Methodology, 19(1), 129. ISSN 1471-2288, doi:10.1186/s12874-019-0754-4.

See Also

plot.depower()

Examples

#----------------------------------------------------------------------------
# power() examples
#----------------------------------------------------------------------------
library(depower)

# Power for independent two-sample t-Test
set.seed(1234)
data <- sim_log_lognormal(
  n1 = 20,
  n2 = 20,
  ratio = c(1.2, 1.4),
  cv1 = 0.4,
  cv2 = 0.4,
  cor = 0,
  nsims = 1000
)

# Welch's t-test is used by default
power(data)

# But you can specify anything else that is needed
power(
  data = data,
  "Welch's t-Test" = t_test_welch(alternative = "greater"),
  alpha = 0.01
)

# Power for dependent two-sample t-Test
set.seed(1234)
sim_log_lognormal(
  n1 = 20,
  n2 = 20,
  ratio = c(1.2, 1.4),
  cv1 = 0.4,
  cv2 = 0.4,
  cor = 0.5,
  nsims = 1000
) |>
  power()

# Power for mixed-type two-sample t-Test
set.seed(1234)
sim_log_lognormal(
  n1 = 20,
  n2 = 20,
  ratio = c(1.2, 1.4),
  cv1 = 0.4,
  cv2 = 0.4,
  cor = c(0, 0.5),
  nsims = 1000
) |>
  power()

# Power for one-sample t-Test
set.seed(1234)
sim_log_lognormal(
  n1 = 20,
  ratio = c(1.2, 1.4),
  cv1 = 0.4,
  nsims = 1000
) |>
  power()

# Power for independent two-sample NB test
set.seed(1234)
sim_nb(
  n1 = 10,
  mean1 = 10,
  ratio = c(1.6, 2),
  dispersion1 = 2,
  dispersion2 = 2,
  nsims = 200
) |>
  power()

# Power for BNB test
set.seed(1234)
sim_bnb(
  n = 10,
  mean1 = 10,
  ratio = c(1.4, 1.6),
  dispersion = 10,
  nsims = 200
) |>
  power()

Simulate data from a BNB distribution

Description

Simulate data from the bivariate negative binomial (BNB) distribution. The BNB distribution is used to simulate count data where the event counts are jointly dependent (correlated). For independent data, see sim_nb().

Usage

sim_bnb(
  n,
  mean1,
  mean2,
  ratio,
  dispersion,
  nsims = 1L,
  return_type = "list",
  max_zeros = 0.99,
  ncores = 1L
)

Arguments

n

(integer: ⁠[2, Inf)⁠)
The number(s) of paired observations.

mean1

(numeric: ⁠(0, Inf)⁠)
The mean(s) of sample 1 (μ1)(\mu_1).

mean2, ratio

(numeric: ⁠(0, Inf)⁠)
Only specify one of these arguments.

  • mean2: The mean(s) of sample 2 (μ2)(\mu_2).

  • ratio: The ratio(s) of means for sample 2 with respect to sample 1 (r=μ2μ1)\left( r = \frac{\mu_2}{\mu_1} \right).

mean2 = ratio * mean1

dispersion

(numeric: ⁠(0, Inf)⁠)
The gamma distribution shape parameter(s) (θ)(\theta) which control the dispersion and the correlation between sample 1 and 2. See 'Details' and 'Examples'.

nsims

(Scalar integer: 1L; ⁠[1, Inf)⁠)
The expected number of simulated data sets. If nsims > 1, the data is returned in a list-column of a depower simulation data frame. nsims may be reduced depending on max_zeros.

return_type

(string: "list"; c("list", "data.frame"))
The data structure of the simulated data. If "list" (default), a list object is returned. If "data.frame" a data frame in tall format is returned. The list object provides computational efficiency and the data frame object is convenient for formulas. See 'Value'.

max_zeros

(Scalar numeric: 0.99; ⁠[0, 1]⁠)
The maximum proportion of zeros each group in a simulated dataset is allowed to have. If the proportion of zeros is greater than this value, the corresponding data is excluded from the set of simulations. This is most likely to occur when the sample size is small and the dispersion parameter is small.

ncores

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores(). May be helpful when the number of parameter combinations is large and nsims is large.

Details

The negative binomial distribution may be defined using a gamma-Poisson mixture distribution. In this case, the Poisson parameter λ\lambda is a random variable with gamma distribution. Equivalence between different parameterizations are demonstrated below:

# Define constants and their relationships
n <- 10000
dispersion <- 8
mu <- 4
p <- dispersion / (dispersion + mu)
q <- mu / (mu + dispersion)
variance <- mu + (mu^2 / dispersion)
rate <- p / (1 - p)
scale <- (1 - p) / p

# alternative formula for mu
mu_alt <- (dispersion * (1 - p)) / p
stopifnot(isTRUE(all.equal(mu, mu_alt)))

set.seed(20240321)

# Using built-in rnbinom with dispersion and mean
w <- rnbinom(n = n, size = dispersion, mu = mu)

# Using gamma-Poisson mixture with gamma rate parameter
x <- rpois(
  n = n,
  lambda = rgamma(n = n, shape = dispersion, rate = rate)
)

# Using gamma-Poisson mixture with gamma scale parameter
y <- rpois(
  n = n,
  lambda = rgamma(n = n, shape = dispersion, scale = scale)
)

# Using gamma-Poisson mixture with multiplicative mean and
# gamma scale parameter
z <- rpois(
  n = n,
  lambda = mu * rgamma(n = n, shape = dispersion, scale = 1/dispersion)
)

# Compare CDFs
par(mar=c(4,4,1,1))
plot(
  x = sort(w),
  y = (1:n)/n,
  xlim = range(c(w,x,y,z)),
  ylim = c(0,1),
  col = 'green',
  lwd = 4,
  type = 'l',
  main = 'CDF'
)
lines(x = sort(x), y = (1:n)/n, col = 'red', lwd = 2)
lines(x = sort(y), y = (1:n)/n, col = 'yellow', lwd = 1.5)
lines(x = sort(z), y = (1:n)/n, col = 'black')
Gamma-Poisson Mixture CDF

The BNB distribution is implemented by compounding two conditionally independent Poisson random variables X1G=gPoisson(μg)X_1 \mid G = g \sim \text{Poisson}(\mu g) and X2G=gPoisson(rμg)X_2 \mid G = g \sim \text{Poisson}(r \mu g) with a gamma random variable GGamma(θ,θ1)G \sim \text{Gamma}(\theta, \theta^{-1}). The probability mass function for the joint distribution of X1,X2X_1, X_2 is

P(X1=x1,X2=x2)=Γ(x1+x2+θ)(μ+rμ+θ)x1+x2+θμx1x1!(rμ)x2x2!θθΓ(θ)P(X_1 = x_1, X_2 = x_2) = \frac{\Gamma(x_1 + x_2 + \theta)}{(\mu + r \mu + \theta)^{x_1 + x_2 + \theta}} \frac{\mu^{x_1}}{x_1!} \frac{(r \mu)^{x_2}}{x_2!} \frac{\theta^{\theta}}{\Gamma(\theta)}

where x1,x2N0x_1,x_2 \in \mathbb{N}^{\geq 0} are specific values of the count outcomes, θR>0\theta \in \mathbb{R}^{> 0} is the dispersion parameter which controls the dispersion and level of correlation between the two samples (otherwise known as the shape parameter of the gamma distribution), μR>0\mu \in \mathbb{R}^{> 0} is the mean parameter, and r=μ2μ1R>0r = \frac{\mu_2}{\mu_1} \in \mathbb{R}^{> 0} is the ratio parameter representing the multiplicative change in the mean of the second sample relative to the first sample. GG denotes the random subject effect and the gamma distribution scale parameter is assumed to be the inverse of the dispersion parameter (θ1\theta^{-1}) for identifiability.

Correlation decreases from 1 to 0 as the dispersion parameter increases from 0 to infinity. For a given dispersion, increasing means also increases the correlation. See 'Examples' for a demonstration.

See 'Details' in sim_nb() for additional information on the negative binomial distribution.

Value

If nsims = 1 and the number of unique parameter combinations is one, the following objects are returned:

  • If return_type = "list", a list:

    Slot Name Description
    1 Simulated counts from sample 1.
    2 Simulated counts from sample 2.
  • If return_type = "data.frame", a data frame:

    Column Name Description
    1 item Subject/item indicator.
    2 condition Sample/condition indicator.
    3 value Simulated counts.

If nsims > 1 or the number of unique parameter combinations is greater than one, each object described above is returned in a list-column named data in a depower simulation data frame:

Column Name Description
1 n1 Sample size of sample 1.
2 n2 Sample size of sample 2.
3 mean1 Mean for sample 1.
4 mean2 Mean for sample 2.
5 ratio Ratio of means (sample 2 / sample 1).
6 dispersion Gamma distribution shape parameter (dispersion).
7 nsims Number of valid simulation iterations.
8 distribution Distribution sampled from.
9 data List-column of simulated data.

References

Yu L, Fernandez S, Brock G (2020). “Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.” BMC Bioinformatics, 21(1), 198. ISSN 1471-2105, doi:10.1186/s12859-020-3541-7.

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

See Also

sim_nb(), stats::rpois(), stats::rgamma(), stats::rnbinom()

Examples

#----------------------------------------------------------------------------
# sim_bnb() examples
#----------------------------------------------------------------------------
library(depower)

# Paired two-sample data returned in a data frame
sim_bnb(
  n = 10,
  mean1 = 10,
  ratio = 1.6,
  dispersion = 3,
  nsims = 1,
  return_type = "data.frame"
)

# Paired two-sample data returned in a list
sim_bnb(
  n = 10,
  mean1 = 10,
  ratio = 1.6,
  dispersion = 3,
  nsims = 1,
  return_type = "list"
)

# Two simulations of paired two-sample data
# returned as a list of data frames
sim_bnb(
  n = 10,
  mean1 = 10,
  ratio = 1.6,
  dispersion = 3,
  nsims = 2,
  return_type = "data.frame"
)

# Two simulations of Paired two-sample data
# returned as a list of lists
sim_bnb(
  n = 10,
  mean1 = 10,
  ratio = 1.6,
  dispersion = 3,
  nsims = 2,
  return_type = "list"
)

#----------------------------------------------------------------------------
# Visualization of the BNB distribution as dispersion varies.
#----------------------------------------------------------------------------
set.seed(1234)
data <- lapply(
  X = c(1, 10, 100, 1000),
  FUN = function(x) {
    d <- sim_bnb(
      n = 1000,
      mean1 = 10,
      ratio = 1.5,
      dispersion = x,
      nsims = 1,
      return_type = "data.frame"
    )
    cor <- cor(
      x = d[d$condition == "1", ]$value,
      y = d[d$condition == "2", ]$value
    )
    cbind(dispersion = x, correlation = cor, d)
  }
)

data <- do.call(what = "rbind", args = data)

ggplot2::ggplot(
  data = data,
  mapping = ggplot2::aes(x = value, fill = condition)
) +
  ggplot2::facet_wrap(
    facets = ggplot2::vars(.data$dispersion),
    ncol = 2,
    labeller = ggplot2::labeller(.rows = ggplot2::label_both)
  ) +
  ggplot2::geom_density(alpha = 0.3) +
  ggplot2::coord_cartesian(xlim = c(0, 60)) +
  ggplot2::geom_text(
    mapping = ggplot2::aes(
      x = 30,
      y = 0.12,
      label = paste0("Correlation: ", round(correlation, 2))
    ),
    check_overlap = TRUE
  ) +
  ggplot2::labs(
    x = "Value",
    y = "Density",
    fill = "Condition",
    caption = "Mean1=10, Mean2=15, ratio=1.5"
  )

Simulate data from a normal distribution

Description

Simulate data from the log transformed lognormal distribution (i.e. a normal distribution). This function handles all three cases:

  1. One-sample data

  2. Dependent two-sample data

  3. Independent two-sample data

Usage

sim_log_lognormal(
  n1,
  n2 = NULL,
  ratio,
  cv1,
  cv2 = NULL,
  cor = 0,
  nsims = 1L,
  return_type = "list",
  ncores = 1L,
  messages = TRUE
)

Arguments

n1

(integer: ⁠[2, Inf)⁠)
The sample size(s) of sample 1.

n2

(integer: NULL; ⁠[2, Inf)⁠)
The sample size(s) of sample 2. Set as NULL if you want to simulate for the one-sample case.

ratio

(numeric: ⁠(0, Inf)⁠)
The assumed population fold change(s) of sample 2 with respect to sample 1.

  • For one-sample data, ratio is defined as the geometric mean (GM) of the original lognormal population distribution.

  • For dependent two-sample data, ratio is defined by GM(sample 2 / sample 1) of the original lognormal population distributions.

    • e.g. ratio = 2 assumes that the geometric mean of all paired ratios (sample 2 / sample 1) is 2.

  • For independent two-sample data, the ratio is defined by GM(group 2) / GM(group 1) of the original lognormal population distributions.

    • e.g. ratio = 2 assumes that the geometric mean of sample 2 is 2 times larger than the geometric mean of sample 1.

See 'Details' for additional information.

cv1

(numeric: ⁠(0, Inf)⁠)
The coefficient of variation(s) of sample 1 in the original lognormal data.

cv2

(numeric: NULL; ⁠(0, Inf)⁠)
The coefficient of variation(s) of sample 2 in the original lognormal data. Set as NULL if you want to simulate for the one-sample case.

cor

(numeric: 0; ⁠[-1, 1]⁠)
The correlation(s) between sample 1 and sample 2 in the original lognormal data. Not used for the one-sample case.

nsims

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The number of simulated data sets. If nsims > 1, the data is returned in a list-column of a depower simulation data frame.

return_type

(string: "list"; c("list", "data.frame"))
The data structure of the simulated data. If "list" (default), a list object is returned. If "data.frame" a data frame in tall format is returned. The list object provides computational efficiency and the data frame object is convenient for formulas. See 'Value'.

ncores

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores(). May be helpful when the number of parameter combinations is large and nsims is large.

messages

(Scalar logical: TRUE)
Whether or not to display messages for pathological simulation cases.

Details

Based on assumed characteristics of the original lognormal distribution, data is simulated from the corresponding log-transformed (normal) distribution. This simulated data is suitable for assessing power of a hypothesis for the geometric mean or ratio of geometric means from the original lognormal data.

This method can also be useful for other population distributions which are positive and where it makes sense to describe the ratio of geometric means. However, the lognormal distribution is theoretically correct in the sense that you can log transform to a normal distribution, compute the summary statistic, then apply the inverse transformation to summarize on the original lognormal scale.

Let GM()GM(\cdot) be the geometric mean and AM()AM(\cdot) be the arithmetic mean. For independent lognormal samples X1X_1 and X2X_2

Fold Change=GM(X2)GM(X1)\text{Fold Change} = \frac{GM(X_2)}{GM(X_1)}

For dependent lognormal samples X1X_1 and X2X_2

Fold Change=GM(X2X1)\text{Fold Change} = GM\left( \frac{X_2}{X_1} \right)

Unlike ratios and the arithmetic mean, for equal sample sizes of X1X_1 and X2X_2 it follows that GM(X2)GM(X1)=GM(X2X1)=eAM(lnX2)AM(lnX1)=eAM(lnX2lnX1)\frac{GM(X_2)}{GM(X_1)} = GM \left( \frac{X_2}{X_1} \right) = e^{AM(\ln X_2) - AM(\ln X_1)} = e^{AM(\ln X_2 - \ln X_1)}.

The coefficient of variation (CV) for XX is defined as

CV=SD(X)AM(X)CV = \frac{SD(X)}{AM(X)}

The relationship between sample statistics for the original lognormal data (XX) and the natural logged data (lnX\ln{X}) are

AM(X)=eAM(lnX)+Var(lnX)2GM(X)=eAM(lnX)Var(X)=AM(X)2(eVar(lnX)1)CV(X)=AM(X)2(eVar(lnX)1)AM(X)=eVar(lnX)1\begin{aligned} AM(X) &= e^{AM(\ln{X}) + \frac{Var(\ln{X})}{2}} \\ GM(X) &= e^{AM(\ln{X})} \\ Var(X) &= AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right) \\ CV(X) &= \frac{\sqrt{AM(X)^2 \left( e^{Var(\ln{X})} - 1 \right)}}{AM(X)} \\ &= \sqrt{e^{Var(\ln{X})} - 1} \end{aligned}

and

AM(lnX)=ln(AM(X)CV(X)2+1)Var(lnX)=ln(CV(X)2+1)Cor(lnX1,lnX2)=ln(Cor(X1,X2)CV(X1)CV(X2)+1)SD(lnX1)SD(lnX2)\begin{aligned} AM(\ln{X}) &= \ln \left( \frac{AM(X)}{\sqrt{CV(X)^2 + 1}} \right) \\ Var(\ln{X}) &= \ln(CV(X)^2 + 1) \\ Cor(\ln{X_1}, \ln{X_2}) &= \frac{\ln \left( Cor(X_1, X_2)CV(X_1)CV(X_2) + 1 \right)}{SD(\ln{X_1})SD(\ln{X_2})} \end{aligned}

Based on the properties of correlation and variance,

Var(X2X1)=Var(X1)+Var(X2)2Cov(X1,X2)=Var(X1)+Var(X2)2Cor(X1,X2)SD(X1)SD(X2)SD(X2X1)=Var(X2X1)\begin{aligned} Var(X_2 - X_1) &= Var(X_1) + Var(X_2) - 2Cov(X_1, X_2) \\ &= Var(X_1) + Var(X_2) - 2Cor(X_1, X_2)SD(X_1)SD(X_2) \\ SD(X_2 - X_1) &= \sqrt{Var(X_2 - X_1)} \end{aligned}

The standard deviation of the differences gets smaller the more positive the correlation and conversely gets larger the more negative the correlation. For the special case where the two samples are uncorrelated and each has the same variance, it follows that

Var(X2X1)=σ2+σ2SD(X2X1)=2σ\begin{aligned} Var(X_2 - X_1) &= \sigma^2 + \sigma^2 \\ SD(X_2 - X_1) &= \sqrt{2}\sigma \end{aligned}

Value

If nsims = 1 and the number of unique parameter combinations is one, the following objects are returned:

  • If one-sample data with return_type = "list", a list:

    Slot Name Description
    1 One sample of simulated normal values.
  • If one-sample data with return_type = "data.frame", a data frame:

    Column Name Description
    1 item Pair/subject/item indicator.
    2 value Simulated normal values.
  • If two-sample data with return_type = "list", a list:

    Slot Name Description
    1 Simulated normal values from sample 1.
    2 Simulated normal values from sample 2.
  • If two-sample data with return_type = "data.frame", a data frame:

    Column Name Description
    1 item Pair/subject/item indicator.
    2 condition Time/group/condition indicator.
    3 value Simulated normal values.

If nsims > 1 or the number of unique parameter combinations is greater than one, each object described above is returned in data frame, located in a list-column named data.

  • If one-sample data, a data frame:

    Column Name Description
    1 n1 The sample size.
    2 ratio Geometric mean [GM(sample 1)].
    3 cv1 Coefficient of variation for sample 1.
    4 nsims Number of data simulations.
    5 distribution Distribution sampled from.
    6 data List-column of simulated data.
  • If two-sample data, a data frame:

    Column Name Description
    1 n1 Sample size of sample 1.
    2 n2 Sample size of sample 2.
    3 ratio Ratio of geometric means [GM(sample 2) / GM(sample 1)] or geometric mean ratio [GM(sample 2 / sample 1)].
    4 cv1 Coefficient of variation for sample 1.
    5 cv2 Coefficient of variation for sample 2.
    6 cor Correlation between samples.
    7 nsims Number of data simulations.
    8 distribution Distribution sampled from.
    9 data List-column of simulated data.

References

Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.

Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.

Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.

See Also

stats::rnorm(), mvnfast::rmvn()

Examples

#----------------------------------------------------------------------------
# sim_log_lognormal() examples
#----------------------------------------------------------------------------
library(depower)

# Independent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "data.frame"
)

# Independent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 1,
  return_type = "list"
)

# Dependent two-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "data.frame"
)

# Dependent two-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  n2 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 1,
  return_type = "list"
)

# One-sample data returned in a data frame
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "data.frame"
)

# One-sample data returned in a list
sim_log_lognormal(
  n1 = 10,
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 1,
  return_type = "list"
)

# Independent two-sample data: two simulations for four parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0,
  nsims = 2,
  return_type = "list"
)

# Dependent two-sample data: two simulations for two parameter combinations.
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  n2 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  cv2 = 0.35,
  cor = 0.4,
  nsims = 2,
  return_type = "list"
)

# One-sample data: two simulations for two parameter combinations
# Returned as a list-column of lists within a data frame
sim_log_lognormal(
  n1 = c(10, 20),
  ratio = 1.3,
  cv1 = 0.35,
  nsims = 2,
  return_type = "list"
)

Simulate data from a NB distribution

Description

Simulate data from two independent negative binomial (NB) distributions. For paired data, see sim_bnb().

Usage

sim_nb(
  n1,
  n2 = n1,
  mean1,
  mean2,
  ratio,
  dispersion1,
  dispersion2 = dispersion1,
  nsims = 1L,
  return_type = "list",
  max_zeros = 0.99,
  ncores = 1L
)

Arguments

n1

(integer: ⁠[2, Inf)⁠)
The sample size(s) of group 1.

n2

(integer: n1; ⁠[2, Inf)⁠)
The sample size(s) of group 2.

mean1

(numeric: ⁠(0, Inf)⁠)
The mean(s) of group 1 (μ1)(\mu_1).

mean2, ratio

(numeric: ⁠(0, Inf)⁠)
Only specify one of these arguments.

  • mean2: The mean(s) of group 2 (μ2)(\mu_2).

  • ratio: The ratio(s) of means for group 2 with respect to group 1 (r=μ2μ1)\left( r = \frac{\mu_2}{\mu_1} \right).

mean2 = ratio * mean1

dispersion1

(numeric: ⁠(0, Inf)⁠)
The dispersion parameter(s) of group 1 (θ1)(\theta_1). See 'Details' and 'Examples'.

dispersion2

(numeric: dispersion1; ⁠(0, Inf)⁠)
The dispersion parameter(s) of group 2 (θ2)(\theta_2). See 'Details' and 'Examples'.

nsims

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The expected number of simulated data sets. If nsims > 1, the data is returned in a list-column of a depower simulation data frame. nsims may be reduced depending on max_zeros.

return_type

(string: "list"; c("list", "data.frame"))
The data structure of the simulated data. If "list" (default), a list object is returned. If "data.frame" a data frame in tall format is returned. The list object provides computational efficiency and the data frame object is convenient for formulas. See 'Value'.

max_zeros

(Scalar numeric: 0.99; ⁠[0, 1]⁠)
The maximum proportion of zeros each group in a simulated dataset is allowed to have. If the proportion of zeros is greater than this value, the corresponding data is excluded from the set of simulations. This is most likely to occur when the sample size is small and the dispersion parameter is small.

ncores

(Scalar integer: 1L; ⁠[1,Inf)⁠)
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores(). May be helpful when the number of parameter combinations is large and nsims is large.

Details

The negative binomial distribution has many parameterizations. In the regression modeling context, it is common to specify the distribution in terms of its mean and dispersion. We use the following probability mass function:

P(X=x)=(x+θ1x)(θθ+μ)θ(μμ+θ)x=Γ(x+θ)x!Γ(θ)(θθ+μ)θ(μμ+θ)x=Γ(x+θ)(θ+μ)θ+xθθΓ(θ)μxx!\begin{aligned} P(X = x) &= \dbinom{x + \theta - 1}{x} \left( \frac{\theta}{\theta + \mu} \right)^{\theta} \left( \frac{\mu}{\mu + \theta} \right)^x \\ &= \frac{\Gamma(x + \theta)}{x! \Gamma(\theta)} \left( \frac{\theta}{\theta + \mu} \right)^{\theta} \left( \frac{\mu}{\mu + \theta} \right)^{x} \\ &= \frac{\Gamma(x + \theta)}{(\theta + \mu)^{\theta + x}} \frac{\theta^{\theta}}{\Gamma(\theta)} \frac{\mu^{x}}{x!} \end{aligned}

where xN0x \in \mathbb{N}^{\geq 0}, θR>0\theta \in \mathbb{R}^{> 0} is the dispersion parameter, and μR>0\mu \in \mathbb{R}^{> 0} is the mean. This is analogous to the typical formulation where XX is counting xx failures given θ\theta successes and p=θθ+μp = \frac{\theta}{\theta + \mu} is the probability of success on each trial. It follows that E(X)=μE(X) = \mu and Var(X)=μ+μ2θVar(X) = \mu + \frac{\mu^2}{\theta}. The θ\theta parameter describes the 'dispersion' among observations. Smaller values of θ\theta lead to overdispersion and larger values of θ\theta decrease the overdispersion, eventually converging to the Poisson distribution.

Described above is the 'indirect quadratic parameterization' of the negative binomial distribution, which is commonly found in the R ecosystem. However, it is somewhat counterintuitive because the smaller θ\theta gets, the larger the overdispersion. The 'direct quadratic parameterization' of the negative binomial distribution may be found in some R packages and other languages such as SAS and Stata. The direct parameterization is defined by substituting α=1θ\alpha = \frac{1}{\theta} (α>0\alpha > 0) which results in Var(X)=μ+αμ2Var(X) = \mu + \alpha\mu^2. In this case, the larger α\alpha gets the larger the overdispersion, and the Poisson distribution is a special case of the negative binomial distribution where α=0\alpha = 0.

A general class of negative binomial models may be defined with mean μ\mu and variance μ+αμp\mu + \alpha\mu^{p}. The 'linear parameterization' is then found by setting p=1p=1, resulting in Var(X)=μ+αμVar(X) = \mu + \alpha\mu. It's common to label the linear parameterization as 'NB1' and the direct quadratic parameterization as 'NB2'.

See 'Details' in sim_bnb() for additional information on the gamma-Poisson mixture formulation of the negative binomial distribution.

Value

If nsims = 1 and the number of unique parameter combinations is one, the following objects are returned:

  • If return_type = "list", a list:

    Slot Name Description
    1 Simulated counts from group 1.
    2 Simulated counts from group 2.
  • If return_type = "data.frame", a data frame:

    Column Name Description
    1 item Subject/item indicator.
    2 condition Group/condition indicator.
    3 value Simulated counts.

If nsims > 1 or the number of unique parameter combinations is greater than one, each object described above is returned in a list-column named data in a depower simulation data frame:

Column Name Description
1 n1 Sample size of group 1.
2 n2 Sample size of group 2.
3 mean1 Mean for group 1.
4 mean2 Mean for group 2.
5 ratio Ratio of means (group 2 / group 1).
6 dispersion1 Dispersion parameter for group 1.
7 dispersion2 Dispersion parameter for group 2.
8 nsims Number of valid simulation iterations.
9 distribution Distribution sampled from.
10 data List-column of simulated data.

References

Yu L, Fernandez S, Brock G (2017). “Power analysis for RNA-Seq differential expression studies.” BMC Bioinformatics, 18(1), 234. ISSN 1471-2105, doi:10.1186/s12859-017-1648-2.

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Hilbe JM (2011). Negative Binomial Regression, 2 edition. Cambridge University Press. ISBN 9780521198158 9780511973420, doi:10.1017/CBO9780511973420.

Hilbe JM (2014). Modeling count data. Cambridge University Press, New York, NY. ISBN 9781107028333 9781107611252, doi:10.1017/CBO9781139236065.

Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, Econometric Society Monographs, 2 edition. Cambridge University Press. doi:10.1017/CBO9781139013567.

See Also

sim_bnb(), stats::rnbinom()

Examples

#----------------------------------------------------------------------------
# sim_nb() examples
#----------------------------------------------------------------------------
library(depower)

# Independent two-sample NB data returned in a data frame
sim_nb(
  n1 = 10,
  mean1 = 5,
  ratio = 1.6,
  dispersion1 = 0.5,
  dispersion2 = 0.5,
  nsims = 1,
  return_type = "data.frame"
)

# Independent two-sample NB data returned in a list
sim_nb(
  n1 = 10,
  mean1 = 5,
  ratio = 1.6,
  dispersion1 = 0.5,
  dispersion2 = 0.5,
  nsims = 1,
  return_type = "list"
)

# Two simulations of independent two-sample data
# returned as a list of data frames
sim_nb(
  n1 = 10,
  mean1 = 5,
  ratio = 1.6,
  dispersion1 = 0.5,
  dispersion2 = 0.5,
  nsims = 2,
  return_type = "data.frame"
)

# Two simulations of independent two-sample data
# returned as a list of lists
sim_nb(
  n1 = 10,
  mean1 = 5,
  ratio = 1.6,
  dispersion1 = 0.5,
  dispersion2 = 0.5,
  nsims = 2,
  return_type = "list"
)

#----------------------------------------------------------------------------
# Visualization of the NB distribution as dispersion varies between groups.
#----------------------------------------------------------------------------
disp <- expand.grid(c(1, 10, 100), c(1, 10, 100))
set.seed(1234)
data <- mapply(
  FUN = function(disp1, disp2) {
    d <- sim_nb(
      n1 = 1000,
      mean1 = 10,
      ratio = 1.5,
      dispersion1 = disp1,
      dispersion2 = disp2,
      nsims = 1,
      return_type = "data.frame"
    )
    cbind(dispersion1 = disp1, dispersion2 = disp2, d)
  },
  disp1 = disp[[1]],
  disp2 = disp[[2]],
  SIMPLIFY = FALSE
)

data <- do.call(what = "rbind", args = data)

ggplot2::ggplot(
  data = data,
  mapping = ggplot2::aes(x = value, fill = condition)
) +
  ggplot2::facet_grid(
    rows = ggplot2::vars(.data$dispersion2),
    cols = ggplot2::vars(.data$dispersion1),
    labeller = ggplot2::labeller(
      .rows = ggplot2::label_both,
      .cols = ggplot2::label_both
    )
  ) +
  ggplot2::geom_density(alpha = 0.3) +
  ggplot2::coord_cartesian(xlim = c(0, 50)) +
  ggplot2::labs(
    x = "Value",
    y = "Density",
    fill = "Condition",
    caption = "Mean1=10, Mean2=15, ratio=1.5"
  )

Paired and one-sample t-Tests

Description

Performs paired and one-sample t-Tests.

Usage

t_test_paired(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)

Arguments

data

(list)
A list whose first element is the vector of normal values from sample 1 and the second element is the vector of normal values from sample 2. Both vectors must be the same sample size and sorted by the subject/item index. If length(data) == 1L, the one-sample test is used. NAs are silently excluded. The default output from sim_log_lognormal().

alternative

(string: "two.sided")
The alternative hypothesis. Must be one of "two.sided", "greater", or "less". See 'Details' for additional information.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level.

mean_null

(Scalar numeric: 0; ⁠(-Inf, Inf)⁠)
The mean or mean difference assumed under the null hypothesis. See 'Details' for additional information.

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

The one-sample test is used for both the true one-sample scenario and for the paired differences from a dependent two-sample scenario. Below we use paired difference language as that is the most common case. The hypotheses for the paired t-test are

Hnull:μdiff=μnullHalt:{μdiffμnulltwo-sidedμdiff>μnullgreater thanμdiff<μnullless than\begin{aligned} H_{null} &: \mu_{diff} = \mu_{null} \\ H_{alt} &: \begin{cases} \mu_{diff} \neq \mu_{null} & \text{two-sided}\\ \mu_{diff} > \mu_{null} & \text{greater than}\\ \mu_{diff} < \mu_{null} & \text{less than} \end{cases} \end{aligned}

where μdiff=AM(X2X1)\mu_{diff} = AM(X_2 - X_1) is the arithmetic mean of the paired differences (sample 2 - sample 1) and μnull\mu_{null} is a constant for the assumed population mean difference (usually μnull=0\mu_{null} = 0).

The test statistic is

T=xˉdiffμnulls2nT = \frac{\bar{x}_{diff} - \mu_{null}}{\sqrt{\frac{s^2}{n}}}

where xˉdiff\bar{x}_{diff} is the sample mean of the differences, μnull\mu_{null} is the population mean difference assumed under the null hypothesis, nn is the sample size of the differences, and s2s^2 is the sample variance.

The critical value of the test statistic has degrees of freedom

df=n1df = n-1

and the p-value is calculated as

p={2min{P(Ttn1Hnull),P(Ttn1Hnull)}two-sidedP(Ttn1Hnull)greater thanP(Ttn1Hnull)less than\begin{aligned} p &= \begin{cases} 2 \text{min} \{P(T \geq t_{n-1} \mid H_{null}), P(T \leq t_{n-1} \mid H_{null})\} & \text{two-sided}\\ P(T \geq t_{n-1} \mid H_{null}) & \text{greater than}\\ P(T \leq t_{n-1} \mid H_{null}) & \text{less than} \end{cases} \end{aligned}

Let GM()GM(\cdot) be the geometric mean and AM()AM(\cdot) be the arithmetic mean. For dependent lognormal samples X1X_1 and X2X_2 it follows that lnX1\ln X_1 and lnX2\ln X_2 are dependent normally distributed variables. Setting μdiff=AM(lnX2lnX1)\mu_{diff} = AM(\ln X_2 - \ln X_1) we have

eμdiff=GM(X2X1)e^{\mu_{diff}} = GM\left( \frac{X_2}{X_1} \right)

This forms the basis for making inference about the geometric mean ratio of the original lognormal data using the mean difference of the log transformed normal data.

Value

A list with the following elements:

Slot Subslot Name Description
1 t Value of the t-statistic.
2 df Degrees of freedom for the t-statistic.
3 p p-value.
4 mean_diff Estimated mean or mean of the differences (sample 2 – sample 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 n Number of paired observations.
6 method Method used for the results.
7 alternative The alternative hypothesis.
8 ci_level The confidence level.
9 mean_null Assumed population mean of the differences under the null hypothesis.

References

Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.

Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.

Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.

See Also

stats::t.test()

Examples

#----------------------------------------------------------------------------
# t_test_paired() examples
#----------------------------------------------------------------------------
library(depower)

# One-sample t-test
set.seed(1234)
t_test1 <- sim_log_lognormal(
  n1 = 40,
  ratio = 1.5,
  cv1 = 0.4
) |>
  t_test_paired(ci_level = 0.95)

t_test1

# Paired t-test using two dependent samples
set.seed(1234)
t_test2 <- sim_log_lognormal(
  n1 = 40,
  n2 = 40,
  ratio = 1.5,
  cv1 = 0.4,
  cv2 = 0.2,
  cor = 0.3
) |>
  t_test_paired(ci_level = 0.95)

t_test2

Welch's t-Test

Description

Performs Welch's independent two-sample t-test.

Usage

t_test_welch(data, alternative = "two.sided", ci_level = NULL, mean_null = 0)

Arguments

data

(list)
A list whose first element is the vector of normal values from group 1 and the second element is the vector of normal values from group 2. NAs are silently excluded. The default output from sim_log_lognormal().

alternative

(string: "two.sided")
The alternative hypothesis. Must be one of "two.sided", "greater", or "less". See 'Details' for additional information.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level.

mean_null

(Scalar numeric: 0; ⁠(-Inf, Inf)⁠)
The difference of means assumed under the null hypothesis. See 'Details' for additional information.

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

The hypotheses for Welch's independent two-sample t-test are

Hnull:μ2μ1=μnullHalt:{μ2μ1μnulltwo-sidedμ2μ1>μnullgreater thanμ2μ1<μnullless than\begin{aligned} H_{null} &: \mu_2 - \mu_1 = \mu_{null} \\ H_{alt} &: \begin{cases} \mu_2 - \mu_1 \neq \mu_{null} & \text{two-sided}\\ \mu_2 - \mu_1 > \mu_{null} & \text{greater than}\\ \mu_2 - \mu_1 < \mu_{null} & \text{less than} \end{cases} \end{aligned}

where μ1\mu_1 is the population mean of group 1, μ2\mu_2 is the population mean of group 2, and μnull\mu_{null} is a constant for the assumed difference of population means (usually μnull=0\mu_{null} = 0).

The test statistic is

T=(xˉ2xˉ1)μnulls12n1+s22n2T = \frac{(\bar{x}_2 - \bar{x}_1) - \mu_{null}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

where xˉ1\bar{x}_1 and xˉ2\bar{x}_2 are the sample means, μnull\mu_{null} is the difference of population means assumed under the null hypothesis, n1n_1 and n2n_2 are the sample sizes, and s12s_1^2 and s22s_2^2 are the sample variances.

The critical value of the test statistic uses the Welch–Satterthwaite degrees of freedom

v=(s12n1+s22n2)2(N11)1(s12n1)2+(N21)1(s22n2)2v = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2} {(N_1 - 1)^{-1}\left( \frac{s_1^2}{n_1} \right)^2 + (N_2 - 1)^{-1}\left( \frac{s_2^2}{n_2} \right)^2}

and the p-value is calculated as

p={2min{P(TtvHnull),P(TtvHnull)}two-sidedP(TtvHnull)greater thanP(TtvHnull)less than\begin{aligned} p &= \begin{cases} 2 \text{min} \{P(T \geq t_{v} \mid H_{null}), P(T \leq t_{v} \mid H_{null})\} & \text{two-sided}\\ P(T \geq t_{v} \mid H_{null}) & \text{greater than}\\ P(T \leq t_{v} \mid H_{null}) & \text{less than} \end{cases} \end{aligned}

Let GM()GM(\cdot) be the geometric mean and AM()AM(\cdot) be the arithmetic mean. For independent lognormal variables X1X_1 and X2X_2 it follows that lnX1\ln X_1 and lnX2\ln X_2 are independent normally distributed variables. Defining μX2μX1=AM(lnX2)AM(lnX1)\mu_{X_2} - \mu_{X_1} = AM(\ln X_2) - AM(\ln X_1) we have

eμX2μX1=GM(X2)GM(X1)e^{\mu_{X_2} - \mu_{X_1}} = \frac{GM(X_2)}{GM(X_1)}

This forms the basis for making inference about the ratio of geometric means of the original lognormal data using the difference of means of the log transformed normal data.

Value

A list with the following elements:

Slot Subslot Name Description
1 t Value of the t-statistic.
2 df Degrees of freedom for the t-statistic.
3 p p-value.
4 diff_mean Estimated difference of means (group 2 – group 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean_1 Estimated mean of group 1.
6 mean_2 Estimated mean of group 2.
7 n1 Sample size of group 1.
8 n2 Sample size of group 2.
9 method Method used for the results.
10 alternative The alternative hypothesis.
11 ci_level The confidence level.
12 mean_null Assumed population difference of the means under the null hypothesis.

References

Julious SA (2004). “Sample sizes for clinical trials with Normal data.” Statistics in Medicine, 23(12), 1921–1986. doi:10.1002/sim.1783.

Hauschke D, Steinijans VW, Diletti E, Burke M (1992). “Sample size determination for bioequivalence assessment using a multiplicative model.” Journal of Pharmacokinetics and Biopharmaceutics, 20(5), 557–561. ISSN 0090-466X, doi:10.1007/BF01061471.

Johnson NL, Kotz S, Balakrishnan N (1994). Continuous univariate distributions, Wiley series in probability and mathematical statistics, 2nd ed edition. Wiley, New York. ISBN 9780471584957 9780471584940.

See Also

stats::t.test()

Examples

#----------------------------------------------------------------------------
# t_test_welch() examples
#----------------------------------------------------------------------------
library(depower)

# Welch's t-test
set.seed(1234)
sim_log_lognormal(
  n1 = 40,
  n2 = 40,
  ratio = 1.5,
  cv1 = 0.4,
  cv2 = 0.4
) |>
  t_test_welch(ci_level = 0.95)

Wald test for BNB ratio of means

Description

Wald test for the ratio of means from bivariate negative binomial outcomes.

Usage

wald_test_bnb(data, ci_level = NULL, link = "log", ratio_null = 1, ...)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from sample 1 and the second element is the vector of negative binomial values from sample 2. Each vector must be sorted by the subject/item index and must be the same sample size. NAs are silently excluded. The default output from sim_bnb().

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level.

link

(Scalar string: "log")
The one-to-one link function for transformation of the ratio in the test hypotheses. Must be one of "log" (default), "sqrt", "squared", or "⁠identity"⁠. See 'Details' for additional information.

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The (pre-transformation) ratio of means assumed under the null hypothesis (sample 2 / sample 1). Typically ratio_null = 1 (no difference). See 'Details' for additional information.

...

Optional arguments passed to the MLE function mle_bnb().

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1G=gPoisson(μg)X_1 \mid G = g \sim \text{Poisson}(\mu g) and X2G=gPoisson(rμg)X_2 \mid G = g \sim \text{Poisson}(r \mu g) where GGamma(θ,θ1)G \sim \text{Gamma}(\theta, \theta^{-1}) is the random item (subject) effect. Then X1,X2BNB(μ,r,θ)X_1, X_2 \sim \text{BNB}(\mu, r, \theta) is the joint distribution where X1X_1 and X2X_2 are dependent (though conditionally independent), X1X_1 is the count outcome for sample 1 of the items (subjects), X2X_2 is the count outcome for sample 2 of the items (subjects), μ\mu is the conditional mean of sample 1, rr is the ratio of the conditional means of sample 2 with respect to sample 1, and θ\theta is the gamma distribution shape parameter which controls the dispersion and the correlation between sample 1 and 2.

The hypotheses for the Wald test of rr are

Hnull:f(r)=f(rnull)Halt:f(r)f(rnull)\begin{aligned} H_{null} &: f(r) = f(r_{null}) \\ H_{alt} &: f(r) \neq f(r_{null}) \end{aligned}

where f()f(\cdot) is a one-to-one link function with nonzero derivative, r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for sample 2 with respect to sample 1, and rnullr_{null} is a constant for the assumed null population ratio of means (typically rnull=1r_{null} = 1).

Rettiganti and Nagaraja (2012) found that f(r)=r2f(r) = r^2, f(r)=rf(r) = r, and f(r)=r0.5f(r) = r^{0.5} had greatest power when r<1r < 1. However, when r>1r > 1, f(r)=lnrf(r) = \ln r, the likelihood ratio test, and f(r)=r0.5f(r) = r^{0.5} had greatest power. f(r)=r2f(r) = r^2 was biased when r>1r > 1. Both f(r)=lnrf(r) = \ln r and f(r)=r0.5f(r) = r^{0.5} produced acceptable results for any rr value. These results depend on the use of asymptotic vs. exact critical values.

The Wald test statistic is

W(f(r^))=(f(xˉ2xˉ1)f(rnull)f(r^)σ^r^)2W(f(\hat{r})) = \left( \frac{f \left( \frac{\bar{x}_2}{\bar{x}_1} \right) - f(r_{null})}{f^{\prime}(\hat{r}) \hat{\sigma}_{\hat{r}}} \right)^2

where

σ^r^2=r^(1+r^)(μ^+r^μ^+θ^)n[μ^(1+r^)(μ^+θ^)θ^r^]\hat{\sigma}^{2}_{\hat{r}} = \frac{\hat{r} (1 + \hat{r}) (\hat{\mu} + \hat{r}\hat{\mu} + \hat{\theta})}{n \left[ \hat{\mu} (1 + \hat{r}) (\hat{\mu} + \hat{\theta}) - \hat{\theta}\hat{r} \right]}

Under HnullH_{null}, the Wald test statistic is asymptotically distributed as χ12\chi^2_1. The approximate level α\alpha test rejects HnullH_{null} if W(f(r^))χ12(1α)W(f(\hat{r})) \geq \chi^2_1(1 - \alpha). Note that the asymptotic critical value is known to underestimate the exact critical value. Hence, the nominal significance level may not be achieved for small sample sizes (possibly n10n \leq 10 or n50n \leq 50). The level of significance inflation also depends on f()f(\cdot) and is most severe for f(r)=r2f(r) = r^2, where only the exact critical value is recommended.

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (group 2 / group 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean1 Estimated mean of sample 1.
6 mean2 Estimated mean of sample 2.
7 dispersion Estimated dispersion.
8 n1 The sample size of sample 1.
9 n2 The sample size of sample 2.
10 method Method used for the results.
11 ci_level The confidence level.
12 link Link function used to transform the ratio of means in the test hypotheses.
13 ratio_null Assumed ratio of means under the null hypothesis.
14 mle_code Integer indicating why the optimization process terminated.
15 mle_message Information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Examples

#----------------------------------------------------------------------------
# wald_test_bnb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
sim_bnb(
  n = 40,
  mean1 = 10,
  ratio = 1.2,
  dispersion = 2
) |>
  wald_test_bnb()

Wald test for NB ratio of means

Description

Wald test for the ratio of means from two independent negative binomial outcomes.

Usage

wald_test_nb(
  data,
  equal_dispersion = FALSE,
  ci_level = NULL,
  link = "log",
  ratio_null = 1,
  ...
)

Arguments

data

(list)
A list whose first element is the vector of negative binomial values from group 1 and the second element is the vector of negative binomial values from group 2. NAs are silently excluded. The default output from sim_nb().

equal_dispersion

(Scalar logical: FALSE)
If TRUE, the Wald test is calculated assuming both groups have the same population dispersion parameter. If FALSE (default), the Wald test is calculated assuming different dispersions.

ci_level

(Scalar numeric: NULL; ⁠(0, 1)⁠)
If NULL, confidence intervals are set as NA. If in ⁠(0, 1)⁠, confidence intervals are calculated at the specified level.

link

(Scalar string: "log")
The one-to-one link function for transformation of the ratio in the test hypotheses. Must be one of "log" (default), "sqrt", "squared", or "⁠identity"⁠.

ratio_null

(Scalar numeric: 1; ⁠(0, Inf)⁠)
The (pre-transformation) ratio of means assumed under the null hypothesis (group 2 / group 1). Typically ratio_null = 1 (no difference). See 'Details' for additional information.

...

Optional arguments passed to the MLE function mle_nb().

Details

This function is primarily designed for speed in simulation. Missing values are silently excluded.

Suppose X1NB(μ,θ1)X_1 \sim NB(\mu, \theta_1) and X2NB(rμ,θ2)X_2 \sim NB(r\mu, \theta_2) where X1X_1 and X2X_2 are independent, X1X_1 is the count outcome for items in group 1, X2X_2 is the count outcome for items in group 2, μ\mu is the arithmetic mean count in group 1, rr is the ratio of arithmetic means for group 2 with respect to group 1, θ1\theta_1 is the dispersion parameter of group 1, and θ2\theta_2 is the dispersion parameter of group 2.

The hypotheses for the Wald test of rr are

Hnull:f(r)=f(rnull)Halt:f(r)f(rnull)\begin{aligned} H_{null} &: f(r) = f(r_{null}) \\ H_{alt} &: f(r) \neq f(r_{null}) \end{aligned}

where f()f(\cdot) is a one-to-one link function with nonzero derivative, r=Xˉ2Xˉ1r = \frac{\bar{X}_2}{\bar{X}_1} is the population ratio of arithmetic means for group 2 with respect to group 1, and rnullr_{null} is a constant for the assumed null population ratio of means (typically rnull=1r_{null} = 1).

Rettiganti and Nagaraja (2012) found that f(r)=r2f(r) = r^2 and f(r)=rf(r) = r had greatest power when r<1r < 1. However, when r>1r > 1, f(r)=lnrf(r) = \ln r, the likelihood ratio test, and the Rao score test have greatest power. Note that f(r)=lnrf(r) = \ln r, LRT, and RST were unbiased tests while the f(r)=rf(r) = r and f(r)=r2f(r) = r^2 tests were biased when r>1r > 1. The f(r)=lnrf(r) = \ln r, LRT, and RST produced acceptable results for any rr value. These results depend on the use of asymptotic vs. exact critical values.

The Wald test statistic is

W(f(r^))=(f(xˉ2xˉ1)f(rnull)f(r^)σ^r^)2W(f(\hat{r})) = \left( \frac{f \left( \frac{\bar{x}_2}{\bar{x}_1} \right) - f(r_{null})}{f^{\prime}(\hat{r}) \hat{\sigma}_{\hat{r}}} \right)^2

where

σ^r^2=r^[n1θ^1(r^μ^+θ^2)+n2θ^2r^(μ^+θ^1)]n1n2θ^1θ^2μ^\hat{\sigma}^{2}_{\hat{r}} = \frac{\hat{r} \left[ n_1 \hat{\theta}_1 (\hat{r} \hat{\mu} + \hat{\theta}_2) + n_2 \hat{\theta}_2 \hat{r} (\hat{\mu} + \hat{\theta}_1) \right]}{n_1 n_2 \hat{\theta}_1 \hat{\theta}_2 \hat{\mu}}

Under HnullH_{null}, the Wald test statistic is asymptotically distributed as χ12\chi^2_1. The approximate level α\alpha test rejects HnullH_{null} if W(f(r^))χ12(1α)W(f(\hat{r})) \geq \chi^2_1(1 - \alpha). Note that the asymptotic critical value is known to underestimate the exact critical value. Hence, the nominal significance level may not be achieved for small sample sizes (possibly n10n \leq 10 or n50n \leq 50). The level of significance inflation also depends on f()f(\cdot) and is most severe for f(r)=r2f(r) = r^2, where only the exact critical value is recommended.

Value

A list with the following elements:

Slot Subslot Name Description
1 chisq Asymptotic χ2\chi^2 test statistic for the ratio of means.
2 df Degrees of freedom.
3 p p-value.
4 ratio Estimated ratio of means (group 2 / group 1).
4 1 estimate Point estimate.
4 2 lower Confidence interval lower bound.
4 3 upper Confidence interval upper bound.
5 mean1 Estimated mean of group 1.
6 mean2 Estimated mean of group 2.
7 dispersion1 Estimated dispersion of group 1.
8 dispersion2 Estimated dispersion of group 2.
9 n1 Sample size of group 1.
10 n2 Sample size of group 2.
11 method Method used for the results.
12 ci_level The confidence level.
13 equal_dispersion Whether or not equal dispersions were assumed.
14 link Link function used to transform the ratio of means in the test hypotheses.
15 ratio_null Assumed ratio of means under the null hypothesis.
16 mle_code Integer indicating why the optimization process terminated.
17 mle_message Information from the optimizer.

References

Rettiganti M, Nagaraja HN (2012). “Power Analyses for Negative Binomial Models with Application to Multiple Sclerosis Clinical Trials.” Journal of Biopharmaceutical Statistics, 22(2), 237–259. ISSN 1054-3406, 1520-5711, doi:10.1080/10543406.2010.528105.

Aban IB, Cutter GR, Mavinga N (2009). “Inferences and power analysis concerning two negative binomial distributions with an application to MRI lesion counts data.” Computational Statistics & Data Analysis, 53(3), 820–833. ISSN 01679473, doi:10.1016/j.csda.2008.07.034.

Examples

#----------------------------------------------------------------------------
# wald_test_nb() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
) |>
  wald_test_nb()