Package 'rankdifferencetest'

Title: Kornbrot's Rank Difference Test
Description: Implements Kornbrot's rank difference test as described in <doi:10.1111/j.2044-8317.1990.tb00939.x>. This method is a modified Wilcoxon signed-rank test that produces consistent and meaningful results for ordinal or monotonically transformed data.
Authors: Brett Klamer [aut, cre]
Maintainer: Brett Klamer <[email protected]>
License: MIT + file LICENSE
Version: 2021.11.25.9000
Built: 2025-01-24 03:39:04 UTC
Source: https://bitbucket.org/bklamer/rankdifferencetest

Help Index


Coerce to a data frame

Description

Coerces a rankdifferencetest object to a data frame.

Usage

## S3 method for class 'rankdifferencetest'
as.data.frame(x, ...)

## S3 method for class 'rankdifferencetest'
tidy(x, ...)

Arguments

x

A rankdifferencetest object.

...

Unused arguments.

Value

data.frame

Examples

#----------------------------------------------------------------------------
# as.data.frame() and tidy() examples
#----------------------------------------------------------------------------
library(rankdifferencetest)

# Use example data from Kornbrot (1990)
data <- kornbrot_table1

rdt(
  data = data,
  formula = placebo ~ drug,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
) |> as.data.frame()

rdt(
  data = data,
  formula = placebo ~ drug,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
) |> tidy()

Alertness example data

Description

An example dataset as seen in table 1 from Kornbrot (1990). The time per problem was recorded for each subject under placebo and drug conditions for the purpose of measuring 'alertness'.

Usage

kornbrot_table1

Format

A data frame with 13 rows and 3 variables:

subject

Subject identifier

placebo

The time required to complete a task under the placebo condition

drug

The time required to complete a task under the drug condition

Details

The table 1 values appear to be rounded, thus results do not match exactly with further calculations in Kornbrot (1990).

Source

Kornbrot DE (1990). “The rank difference test: A new and meaningful alternative to the Wilcoxon signed ranks test for ordinal data.” British Journal of Mathematical and Statistical Psychology, 43(2), 241–264. ISSN 00071102, doi:10.1111/j.2044-8317.1990.tb00939.x.


Rank difference test

Description

Performs Kornbrot's rank difference test, which is a modified Wilcoxon signed-rank test that produces consistent and meaningful results for ordinal or monotonically transformed data.

Usage

rdt(
  data,
  formula,
  ci = FALSE,
  ci.level = 0.95,
  alternative = "two.sided",
  mu = 0,
  distribution = NULL,
  correct = TRUE,
  zero.method = "wilcoxon",
  tol.root = 1e-04
)

Arguments

data

A data frame.

formula

A formula of form:

y ~ x | block

Use when data is in tall format. y is the numeric outcome, x is the binary explanatory variable, and block is the subject/item-level variable. If x is a factor, the first level will be the reference value. For example, levels(data$x) <- c("pre", "post") will result in the difference post - pre.

y ~ x

Use when data is in wide format. Differences are calculated as data$y - data$x.

ci

A scalar logical. Whether or not to calculate the pseudomedian and its confidence interval.

ci.level

A scalar numeric from (0, 1). The confidence level.

alternative

A string for the alternative hypothesis: "two.sided" (default), "greater", or "less".

mu

A scalar numeric from (-Inf, Inf). Under the null hypothesis, x or x - y is assumed to be symmetric around mu.

distribution

A string for the method used to calculate the p-value. If "exact", the exact Wilcoxon signed-rank distribution is used. If "asymptotic", the asymptotic normal approximation is used. The default (NULL) will automatically choose an appropriate method (distribution = "exact" when n < 50 or distribution = "asymptotic" otherwise).

correct

A scalar logical. Whether or not to apply a continuity correction for the normal approximation of the p-value.

zero.method

A string for the method used to handle values equal to zero: "wilcoxon" (default) or "pratt".

tol.root

A numeric scalar from (0, Inf). Passed to stats::uniroot(*, tol = tol.root) calls when ci = TRUE and distribution = "asymptotic".
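The two forms of the formula argument describe the same paired differences. The base R sketch below uses a small hypothetical data set (the names wide, tall, pre, and post are illustrative only) and is not the package's internal code; it only shows how the reference level in tall format corresponds to the subtraction in wide format.

```r
# Hypothetical wide-format data: one row per subject.
wide <- data.frame(
  subject = 1:4,
  pre     = c(10, 12, 9, 14),
  post    = c(13, 11, 15, 18)
)

# Wide form (y ~ x, here post ~ pre): differences are y - x.
d_wide <- wide$post - wide$pre

# The same data in tall format: one row per subject and condition.
tall <- data.frame(
  subject = rep(wide$subject, times = 2),
  x       = factor(rep(c("pre", "post"), each = nrow(wide)), levels = c("pre", "post")),
  y       = c(wide$pre, wide$post)
)

# Tall form (y ~ x | block): the first factor level ("pre") is the reference,
# so the difference within each block is post - pre.
tall <- tall[order(tall$subject), ]
reference <- levels(tall$x)[1]
d_tall <- tall$y[tall$x != reference] - tall$y[tall$x == reference]

print(d_wide)
print(d_tall)
```

Reversing the factor levels in the tall data would flip the sign of every difference, which is why the level order matters for one-sided alternatives.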

Details

For paired data, the Wilcoxon signed-rank test results in subtraction of the paired values. However, this subtraction is not meaningful for ordinal scale variables. In addition, any monotone transformation of the data will result in different signed ranks, thus different p-values. However, ranking the original data allows for meaningful addition and subtraction of ranks and preserves ranks over monotonic transformation. Kornbrot developed the rank difference test for these reasons.
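A small base R sketch can make this concrete. The data below are hypothetical (they are not the Kornbrot values), and signed_ranks() is a helper written only for this illustration. A strictly increasing transformation such as log() changes the signed ranks of the paired differences, while the ranks of the pooled observations stay the same.

```r
# Hypothetical paired measurements; all 16 values are distinct.
placebo <- c(4.6, 4.3, 6.7, 5.8, 5.0, 4.2, 6.0, 3.5)
drug    <- c(2.9, 2.8, 12.0, 3.8, 5.9, 4.9, 2.6, 3.7)

# Signed ranks of the paired differences, as used by the signed-rank test.
signed_ranks <- function(x, y) {
  d <- x - y
  sign(d) * rank(abs(d))
}

# The signed ranks (and hence W+) depend on the measurement scale ...
sr_raw <- signed_ranks(placebo, drug)
sr_log <- signed_ranks(log(placebo), log(drug))
w_plus_raw <- sum(sr_raw[sr_raw > 0])
w_plus_log <- sum(sr_log[sr_log > 0])

# ... whereas the ranks of the pooled observations do not.
ranks_raw <- rank(c(placebo, drug))
ranks_log <- rank(c(log(placebo), log(drug)))

print(c(raw = w_plus_raw, log = w_plus_log))
print(identical(ranks_raw, ranks_log))
```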

Kornbrot recommends that the rank difference test be used in preference to the Wilcoxon signed-rank test in all paired comparison designs where the data are not both of interval scale and of known distribution. The rank difference test preserves good power compared to Wilcoxon's signed-rank test, is more powerful than the sign test, and has the benefit of being a true distribution-free test.

The procedure for Kornbrot's rank difference test is as follows:

  1. Combine all $2n$ paired observations.

  2. Order the values from smallest to largest.

  3. Assign ranks $1, 2, \dots, 2n$, with average rank for ties.

  4. Perform the Wilcoxon signed-rank test using the paired ranks.
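The four steps can be sketched in base R as follows. This is a minimal illustration on hypothetical data using stats::wilcox.test() for the final step; it is not the implementation used inside rdt(), and the variable names are assumptions made for the example.

```r
# Hypothetical paired data (n = 8 pairs).
x <- c(4.6, 4.3, 6.7, 5.8, 5.0, 4.2, 6.0, 3.5)
y <- c(2.9, 2.8, 12.0, 3.8, 5.9, 4.9, 2.6, 3.7)
n <- length(x)

# Steps 1-3: pool the 2n observations and rank them, averaging ties.
pooled_ranks <- rank(c(x, y), ties.method = "average")
rank_x <- pooled_ranks[seq_len(n)]
rank_y <- pooled_ranks[n + seq_len(n)]

# Step 4: Wilcoxon signed-rank test on the paired ranks. The rank differences
# usually contain ties, so the normal approximation is requested explicitly.
rank_diff_test <- stats::wilcox.test(
  rank_x, rank_y,
  paired = TRUE, exact = FALSE, correct = FALSE
)

print(rank_x - rank_y)
print(rank_diff_test$statistic)
print(rank_diff_test$p.value)
```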

The test statistic for the rank difference test ($D$) is not exactly equal to the test statistic of the naive rank-transformed Wilcoxon signed-rank test ($W^+$), the latter being implemented in rdt(). Using $W^+$ should result in a conservative estimate for $D$, and the two approach each other in distribution as the sample size increases. Kornbrot (1990) discusses methods for calculating $D$ when $n < 7$ and $8 < n \leq 20$.

See srt() for additional details about implementation of Wilcoxon's signed-rank test.

Value

A list with the following elements:

Slot  Subslot  Name                 Description
1              statistic            Test statistic. $W^+$ for the exact Wilcoxon signed-rank distribution or $Z$ for the asymptotic normal approximation.
2              p                    p-value.
3              alternative          The alternative hypothesis.
4              method               Method used for test results.
5              formula              Model formula.
6              pseudomedian         Measure of centrality for y - x. Not calculated when argument ci = FALSE.
6     1        estimate             Estimated pseudomedian.
6     2        lower                Lower bound of confidence interval for the pseudomedian.
6     3        upper                Upper bound of confidence interval for the pseudomedian.
6     4        ci.level.requested   The chosen ci.level.
6     5        ci.level.achieved    For pathological cases, the achievable confidence level.
6     6        estimate.method      Method used for calculating the pseudomedian.
6     7        ci.method            Method used for calculating the confidence interval.
7              n                    Number of observations.
7     1        original             The number of observations contained in data.
7     2        nonmissing           The number of non-missing observations available for analysis.
7     3        zero.adjusted        The number of non-missing and non-zero values available for analysis. i.e. n$nonmissing - n$zeros.
7     4        zeros                The number of values that were zero.
7     5        ties                 The number of values that were tied.

References

Kornbrot DE (1990). “The rank difference test: A new and meaningful alternative to the Wilcoxon signed ranks test for ordinal data.” British Journal of Mathematical and Statistical Psychology, 43(2), 241–264. ISSN 00071102, doi:10.1111/j.2044-8317.1990.tb00939.x.

See Also

srt()

Examples

#----------------------------------------------------------------------------
# rdt() example
#----------------------------------------------------------------------------
library(rankdifferencetest)

# Use example data from Kornbrot (1990)
data <- kornbrot_table1

# Create long-format data for demonstration purposes
data_long <- reshape(
  data = kornbrot_table1,
  direction = "long",
  varying = c("placebo", "drug"),
  v.names = c("time"),
  idvar = "subject",
  times = c("placebo", "drug"),
  timevar = "treatment",
  new.row.names = seq_len(prod(length(c("placebo", "drug")), nrow(kornbrot_table1)))
)

# Subject and treatment should be factors. The ordering of the treatment factor
# will determine the difference (placebo - drug).
data_long$subject <- factor(data_long$subject)
data_long$treatment <- factor(data_long$treatment, levels = c("drug", "placebo"))

# Recreate analysis and results from section 7.1 in Kornbrot (1990)
## The p-value shown in Kornbrot (1990) was continuity corrected. The calls
## below set correct = FALSE, so the p-value here will be slightly lower.
## It does match the uncorrected p-value shown in the footnote on page 246.
rdt(
  data = data,
  formula = placebo ~ drug,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
)
rdt(
  data = data_long,
  formula = time ~ treatment | subject,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
)

# The same outcome is seen after transforming time to rate.
## The rate transformation inverts the rank ordering.
data$placebo_rate <- 60 / data$placebo
data$drug_rate <- 60 / data$drug
data_long$rate <- 60 / data_long$time

rdt(
  data = data,
  formula = placebo_rate ~ drug_rate,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
)
rdt(
  data = data_long,
  formula = rate ~ treatment | subject,
  alternative = "two.sided",
  distribution = "asymptotic",
  zero.method = "wilcoxon",
  correct = FALSE,
  ci = TRUE
)

# In contrast to the rank difference test, the Wilcoxon signed-rank test
# produces differing results. See table 1 and table 2 in Kornbrot (1990).
wilcox.test(
  x = data$placebo,
  y = data$drug,
  alternative = "two.sided",
  paired = TRUE,
  exact = FALSE,
  correct = FALSE
)$p.value/2
wilcox.test(
  x = data$placebo_rate,
  y = data$drug_rate,
  alternative = "two.sided",
  paired = TRUE,
  exact = FALSE,
  correct = FALSE
)$p.value/2

Signed-rank test

Description

Performs Wilcoxon's signed-rank test.

Usage

srt(
  data,
  formula,
  ci = FALSE,
  ci.level = 0.95,
  alternative = "two.sided",
  mu = 0,
  distribution = NULL,
  correct = TRUE,
  zero.method = "wilcoxon",
  tol.root = 1e-04,
  digits.rank = Inf
)

Arguments

data

A data frame.

formula

A formula of form:

y ~ x | block

Use when data is in tall format. y is the numeric outcome, x is the binary explanatory variable, and block is the subject/item-level variable. If x is a factor, the first level will be the reference value. For example, levels(data$x) <- c("pre", "post") will result in the difference post - pre.

y ~ x

Use when data is in wide format. Differences are calculated as data$y - data$x.

~ x

Use when data$x represents pre-calculated differences or for the one sample case.

ci

A scalar logical. Whether or not to calculate the pseudomedian and its confidence interval.

ci.level

A scalar numeric from (0, 1). The confidence level.

alternative

A string for the alternative hypothesis: "two.sided" (default), "greater", or "less".

mu

A scalar numeric from (-Inf, Inf). Under the null hypothesis, x or x - y is assumed to be symmetric around mu.

distribution

A string for the method used to calculate the p-value. If "exact", the exact Wilcoxon signed-rank distribution is used. If "asymptotic", the asymptotic normal approximation is used. The default (NULL) will automatically choose an appropriate method (distribution = "exact" when n < 50 or distribution = "asymptotic" otherwise).

correct

A scalar logical. Whether or not to apply a continuity correction for the normal approximation of the p-value.

zero.method

A string for the method used to handle values equal to zero: "wilcoxon" (default) or "pratt".

tol.root

A numeric scalar from (0, Inf). Passed to stats::uniroot(*, tol = tol.root) calls when ci = TRUE and distribution = "asymptotic".

digits.rank

A numeric scalar from (0, Inf]. If finite, base::rank(base::signif(abs(diffs), digits.rank)) will be used to compute ranks for the test statistic instead of (the default) rank(abs(diffs)). For example, digits.rank = 7 can improve stability in the determination of ties because they no longer depend on extremely small numeric differences.
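The effect described for digits.rank can be seen with a few hypothetical differences in base R. In the sketch below, 0.3 - 0.1 is not exactly equal to 0.2 in floating point, so the default ranking misses a tie that rounding to 7 significant digits recovers. The vector diffs is invented for illustration.

```r
# Hypothetical differences; the first element differs from 0.2 only by
# floating-point error.
diffs <- c(0.3 - 0.1, 0.2, 0.5, -0.2)

# Default behaviour (digits.rank = Inf): rank the absolute differences as-is.
ranks_raw <- rank(abs(diffs))

# Equivalent of digits.rank = 7: round to 7 significant digits before ranking.
ranks_rounded <- rank(signif(abs(diffs), 7))

print(ranks_raw)
print(ranks_rounded)
```

With rounding, the three values near 0.2 share one mean rank, which also changes the tie adjustment applied to the variance in the asymptotic test.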

Details

The procedure for Wilcoxon's signed-rank test is as follows:

  1. For one-sample data x or paired samples x and y, where mu is the measure of center under the null hypothesis, define the 'values' used for analysis as (x - mu) or (x - y - mu).

  2. Define 'zero' values as (x - mu == 0) or (x - y - mu == 0).

    • zero.method = "wilcoxon": Remove values equal to zero.

    • zero.method = "pratt": Keep values equal to zero.

  3. Order the absolute values from smallest to largest.

  4. Assign ranks $1, \dots, n$ to the ordered absolute values, using mean rank for ties.

  5. zero.method = "pratt": remove values equal to zero and their corresponding ranks.

  6. Calculate $W^+$ as the sum of the ranks for positive values. The sum of $W^+$ and $W^-$ is $n(n+1)/2$, so either can be calculated from the other. If the null hypothesis is true, $W^+$ and $W^-$ are expected to be similar in value.

  7. Calculate the test statistic

    • distribution = "exact": Use $W^+$ as the test statistic. $W^+$ takes values between 0 and $n(n+1)/2$. Under the null hypothesis, its expected mean and variance are $E_0(W^+) = n(n+1)/4$ and $Var_0(W^+) = (n(n+1)(2n+1))/24$.

    • distribution = "asymptotic": Let $Z = \frac{W^+ - E_0(W^+)}{Var_0(W^+)^{1/2}}$ be the standardized version of $W^+$, then $Z \sim N(0, 1)$ asymptotically. If there are ties, use the adjusted variance $Var_0(W^+) = \frac{n(n+1)(2n+1)}{24} - \frac{\sum(t^3 - t)}{48}$, where $t$ is the number of ties for each unique ranked absolute 'value'.

      • zero.method = "pratt": The expected mean and variance are modified to be $E_0(W^+) = \frac{n(n+1)}{4} - \frac{n_{zeros}(n_{zeros}+1)}{4}$ and $Var_0(W^+) = \frac{n(n+1)(2n+1) - n_{zeros}(n_{zeros}+1)(2n_{zeros}+1)}{24}$.

      • correct = TRUE: For two-sided, greater than, and less than alternatives, define the continuity correction as $\text{sign}(W^+ - E_0(W^+)) \cdot 0.5$, $0.5$, and $-0.5$, respectively. $Z$ is redefined as $Z = \frac{W^+ - E_0(W^+) - \text{correction}}{Var_0(W^+)^{1/2}}$.

  8. Calculate the p-value

    • distribution = "exact": Use the Wilcoxon signed-rank distribution to calculate the probability of being as or more extreme than $W^+$.

    • distribution = "asymptotic": Use the standard normal distribution to calculate the probability of being as or more extreme than $Z$. See stats::pnorm().
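The steps above can be traced by hand in base R. The sketch below assumes hypothetical paired data without zeros or ties and zero.method = "wilcoxon"; it is an illustration built on stats functions, not the code inside srt(). The two-sided continuity correction follows the convention of stats::wilcox.test(), where the sign is taken from $W^+ - E_0(W^+)$.

```r
# Hypothetical paired data; mu is the null center.
x <- c(4.6, 4.3, 6.7, 5.8, 5.0, 4.2, 6.0, 3.5)
y <- c(2.9, 2.8, 12.0, 3.8, 5.9, 4.9, 2.6, 3.7)
mu <- 0

# Steps 1-2: values for analysis; zero.method = "wilcoxon" drops zeros.
values <- x - y - mu
values <- values[values != 0]
n <- length(values)

# Steps 3-6: rank the absolute values (mean rank for ties), sum positive ranks.
ranks <- rank(abs(values))
w_plus <- sum(ranks[values > 0])

# Step 7: null mean, tie-adjusted null variance, and standardized statistics.
e0 <- n * (n + 1) / 4
tie_sizes <- as.numeric(table(ranks))
v0 <- n * (n + 1) * (2 * n + 1) / 24 - sum(tie_sizes^3 - tie_sizes) / 48
z <- (w_plus - e0) / sqrt(v0)
z_corrected <- (w_plus - e0 - sign(w_plus - e0) * 0.5) / sqrt(v0)

# Step 8: two-sided p-values.
p_asymptotic <- 2 * stats::pnorm(-abs(z))
p_corrected <- 2 * stats::pnorm(-abs(z_corrected))
p_exact <- min(1, 2 * min(
  stats::psignrank(w_plus, n),
  stats::psignrank(w_plus - 1, n, lower.tail = FALSE)
))

print(c(w_plus = w_plus, z = z))
print(c(asymptotic = p_asymptotic, corrected = p_corrected, exact = p_exact))
```

The three p-values can be checked against stats::wilcox.test(x, y, paired = TRUE) with the matching exact and correct arguments.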

zero.method = "pratt" uses the method by Pratt (1959), which first rank-transforms the absolute values, including zeros, and then removes the ranks corresponding to the zeros. zero.method = "wilcoxon" uses the method by Wilcoxon (1950), which first removes the zeros and then rank-transforms the remaining absolute values. Conover (1973) found that when comparing a discrete uniform distribution to a distribution where probabilities linearly increase from left to right, Pratt's method outperforms Wilcoxon's. When testing a binomial distribution centered at zero to see whether the parameter of each Bernoulli trial is $\frac{1}{2}$, Wilcoxon's method outperforms Pratt's.
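The difference between the two zero-handling methods can be sketched in base R with a few hypothetical values. The example assumes that $n$ in the Pratt-adjusted mean and variance counts all non-missing values, including zeros; that reading is an assumption of this sketch rather than something taken from the package source.

```r
# Hypothetical values for analysis, including two zeros.
values <- c(0, 0, 1.5, -2.0, 3.2, -0.7, 4.1)

# Wilcoxon: remove zeros first, then rank the remaining absolute values.
nonzero <- values[values != 0]
w_plus_wilcoxon <- sum(rank(abs(nonzero))[nonzero > 0])

# Pratt: rank all absolute values (zeros included), then drop the zeros'
# ranks. Zeros are never positive, so summing over values > 0 does this.
ranks_all <- rank(abs(values))
w_plus_pratt <- sum(ranks_all[values > 0])

# Null mean and variance under each method (no ties among non-zero values).
n_all <- length(values)
n_zeros <- sum(values == 0)
n_nonzero <- n_all - n_zeros
e0_wilcoxon <- n_nonzero * (n_nonzero + 1) / 4
e0_pratt <- n_all * (n_all + 1) / 4 - n_zeros * (n_zeros + 1) / 4
v0_pratt <- (n_all * (n_all + 1) * (2 * n_all + 1) -
  n_zeros * (n_zeros + 1) * (2 * n_zeros + 1)) / 24

print(c(wilcoxon = w_plus_wilcoxon, pratt = w_plus_pratt))
print(c(e0_wilcoxon = e0_wilcoxon, e0_pratt = e0_pratt, v0_pratt = v0_pratt))
```

Under Pratt's method the non-zero values keep the higher ranks they earned while the zeros were present, so both the statistic and its null moments shift upward.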

When ci = TRUE, a pseudomedian and its confidence interval are returned. For exact tests, the Hodges-Lehmann estimate is used and the confidence bounds are estimated by calculating an appropriate quantile of the pairwise averages (Walsh averages). For asymptotic tests, the pseudomedian is estimated using stats::uniroot() to search for a root of the asymptotic normal approximation of the Wilcoxon signed-rank distribution, with a similar strategy for the confidence bounds.
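For the exact case, the Walsh-average construction can be sketched in base R. The differences below are hypothetical, and the quantile indexing mirrors the approach taken in stats::wilcox.test(); it is assumed here for illustration and may differ in detail from what srt() does internally.

```r
# Hypothetical paired differences (no zeros, no ties).
d <- c(1.7, 1.5, -5.3, 2.0, -0.9, -0.7, 3.4, -0.2)
n <- length(d)
ci_level <- 0.95

# Walsh averages: (d_i + d_j) / 2 for all i <= j, giving n(n+1)/2 values.
pair_means <- outer(d, d, "+") / 2
walsh <- sort(pair_means[lower.tri(pair_means, diag = TRUE)])

# Hodges-Lehmann estimate of the pseudomedian: median of the Walsh averages.
pseudomedian <- median(walsh)

# Two-sided confidence bounds from quantiles of the signed-rank distribution.
alpha <- 1 - ci_level
lower_index <- stats::qsignrank(alpha / 2, n)
if (lower_index == 0) lower_index <- 1
upper_index <- n * (n + 1) / 2 - lower_index + 1
ci <- c(lower = walsh[lower_index], upper = walsh[upper_index])

print(pseudomedian)
print(ci)
```

Because the signed-rank distribution is discrete, the achieved confidence level generally differs slightly from the requested one.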

The signed rank test traditionally assumes the values are independent and identically distributed, with a continuous and symmetric distribution. The hypotheses are stated as:

  • Null: (x) or (x - y) is centered at mu.

  • Two-sided alternative: (x) or (x - y) is not centered at mu.

  • Greater than alternative: (x) or (x - y) is centered at a value greater than mu.

  • Less than alternative: (x) or (x - y) is centered at a value less than mu.

However, not all of these assumptions are required (Pratt and Gibbons 1981). The 'identically distributed' assumption is not required; the level of the test remains as expected for the hypotheses as stated above. The symmetry assumption is not required when the one-sided alternative hypotheses are stated as:

  • Null: (x) or (x - y) is symmetric and centered at mu.

  • Greater than alternative: (x) or (x - y) is stochastically larger than mu.

  • Less than alternative: (x) or (x - y) is stochastically smaller than mu.

stats::wilcox.test() is the canonical function for the Wilcoxon signed-rank test. Improvements and updated methods were introduced in exactRankTests::wilcox.exact() and later coin::wilcoxsign_test(). srt() attempts to refactor these functions so that the best features of each are available in a fast and easy-to-use format.

Value

A list with the following elements:

Slot  Subslot  Name                 Description
1              statistic            Test statistic. $W^+$ for the exact Wilcoxon signed-rank distribution or $Z$ for the asymptotic normal approximation.
2              p                    p-value.
3              alternative          The alternative hypothesis.
4              method               Method used for test results.
5              formula              Model formula.
6              pseudomedian         Measure of centrality for x or y - x. Not calculated when argument ci = FALSE.
6     1        estimate             Estimated pseudomedian.
6     2        lower                Lower bound of confidence interval for the pseudomedian.
6     3        upper                Upper bound of confidence interval for the pseudomedian.
6     4        ci.level.requested   The chosen ci.level.
6     5        ci.level.achieved    For pathological cases, the achievable confidence level.
6     6        estimate.method      Method used for calculating the pseudomedian.
6     7        ci.method            Method used for calculating the confidence interval.
7              n                    Number of observations.
7     1        original             The number of observations contained in data.
7     2        nonmissing           The number of non-missing observations available for analysis.
7     3        zero.adjusted        The number of non-missing and non-zero values available for analysis. i.e. n$nonmissing - n$zeros.
7     4        zeros                The number of values that were zero.
7     5        ties                 The number of values that were tied.

References

Wilcoxon F (1950). “Some Rapid Approximate Statistical Procedures.” Annals of the New York Academy of Sciences, 52(6), 808–814. ISSN 00778923, 17496632, doi:10.1111/j.1749-6632.1950.tb53974.x.

Pratt JW, Gibbons JD (1981). Concepts of Nonparametric Theory, Springer Series in Statistics. Springer New York, New York, NY. ISBN 9781461259336 9781461259312, doi:10.1007/978-1-4612-5931-2.

Pratt JW (1959). “Remarks on Zeros and Ties in the Wilcoxon Signed Rank Procedures.” Journal of the American Statistical Association, 54(287), 655–667. ISSN 0162-1459, 1537-274X, doi:10.1080/01621459.1959.10501526.

Cureton EE (1967). “The Normal Approximation to the Signed-Rank Sampling Distribution When Zero Differences are Present.” Journal of the American Statistical Association, 62(319), 1068–1069. ISSN 0162-1459, 1537-274X, doi:10.1080/01621459.1967.10500917.

Conover WJ (1973). “On Methods of Handling Ties in the Wilcoxon Signed-Rank Test.” Journal of the American Statistical Association, 68(344), 985–988. ISSN 0162-1459, 1537-274X, doi:10.1080/01621459.1973.10481460.

Hollander M, Wolfe DA, Chicken E (2014). Nonparametric statistical methods, Third edition. John Wiley & Sons, Inc, Hoboken, New Jersey. ISBN 9780470387375.

Bauer DF (1972). “Constructing Confidence Sets Using Rank Statistics.” Journal of the American Statistical Association, 67(339), 687–690. ISSN 0162-1459, 1537-274X, doi:10.1080/01621459.1972.10481279.

Streitberg B, Röhmel J (1984). “Exact nonparametrics in APL.” SIGAPL APL Quote Quad, 14(4), 313–325. ISSN 0163-6006, doi:10.1145/384283.801115.

Hothorn T (2001). “On exact rank tests in R.” R News, 1(1), 11-12. ISSN 1609-3631, https://journal.r-project.org/articles/RN-2001-002/.

Hothorn T, Hornik K (2002). “Exact Nonparametric Inference in R.” In Härdle W, Rönz B (eds.), Compstat, 355–360. Physica-Verlag HD, Heidelberg. ISBN 9783790815177 9783642574894, doi:10.1007/978-3-642-57489-4_52.

Hothorn T, Lausen B (2003). “On the exact distribution of maximally selected rank statistics.” Computational Statistics & Data Analysis, 43(2), 121–137. ISSN 01679473, doi:10.1016/S0167-9473(02)00225-6.

Hothorn T, Hornik K, van de Wiel MA, Zeileis A (2008). “Implementing a Class of Permutation Tests: The coin Package.” Journal of Statistical Software, 28(8). ISSN 1548-7660, doi:10.18637/jss.v028.i08.

See Also

stats::wilcox.test(), coin::wilcoxsign_test(), rdt()

Examples

#----------------------------------------------------------------------------
# srt() example
#----------------------------------------------------------------------------
library(rankdifferencetest)

# Use example data from Kornbrot (1990)
data <- kornbrot_table1

# The rate transformation inverts the rank ordering.
data$placebo_rate <- 60 / data$placebo
data$drug_rate <- 60 / data$drug

# In contrast to the rank difference test, the Wilcoxon signed-rank test
# produces differing results. See table 1 and table 2 in Kornbrot (1990).
srt(
  formula = placebo ~ drug,
  data = data,
  alternative = "two.sided",
  distribution = "asymptotic",
  correct = FALSE,
  zero.method = "wilcoxon",
  ci = TRUE
)
srt(
  formula = placebo_rate ~ drug_rate,
  data = data,
  alternative = "two.sided",
  distribution = "asymptotic",
  correct = FALSE,
  zero.method = "wilcoxon",
  ci = TRUE
)