Yule's Q and Y, Tschuprow's T

`CramerV.Rd`

Calculate Cramer's V, Pearson's contingency coefficient and phi,
Yule's Q and Y and Tschuprow's T of `x`, if `x` is a table. If both `x` and `y` are given, the corresponding table is built first.

```
Phi(x, y = NULL, ...)
ContCoef(x, y = NULL, correct = FALSE, ...)
CramerV(x, y = NULL, conf.level = NA,
        method = c("ncchisq", "ncchisqadj", "fisher", "fisheradj"),
        correct = FALSE, ...)
YuleQ(x, y = NULL, ...)
YuleY(x, y = NULL, ...)
TschuprowT(x, y = NULL, correct = FALSE, ...)
```

- x
can be a numeric vector, a matrix or a table.

- y
`NULL` (default) or a vector with compatible dimensions to `x`. If `y` is provided, `table(x, y, ...)` is calculated.

- conf.level
confidence level of the interval. This is only implemented for Cramer's V. If set to `NA` (which is the default), no confidence interval will be calculated. See examples for calculating bootstrap intervals.

- method
string defining the method used to calculate confidence intervals for Cramer's V. One of `"ncchisq"` (using the noncentral chi-square distribution), `"ncchisqadj"`, `"fisher"` (using the Fisher z transformation), `"fisheradj"` (using the Fisher z transformation and bias correction). Default is `"ncchisq"`.

- correct
logical. For `ContCoef()` this indicates whether Sakoda's adjusted Pearson's C should be returned. For `CramerV()` and `TschuprowT()` it defines whether a bias correction should be applied. Default is `FALSE`.

- ...
further arguments are passed to the function `table`, allowing e.g. `useNA` to be set.

For `x`, either a matrix or two vectors `x` and `y` are expected. In the latter case `table(x, y, ...)` is calculated.
The function handles `NA`s the same way the `table` function does, so tables are by default calculated with `NA`s omitted.
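The `NA` behaviour described above can be illustrated with base R's `table` alone. This is a minimal sketch with made-up vectors (`x`, `y`, `tab_default` and `tab_with_na` are illustrative names, not part of the package); it only shows which observations enter the table that the association measures would then be computed from.

```r
# Made-up example data: two categorical vectors, each with one missing value.
x <- c("a", "b", NA,  "a", "b", "a")
y <- c("u", "v", "u", NA,  "v", "u")

# Default behaviour of table(): pairs containing an NA are dropped.
tab_default <- table(x, y)

# Passing useNA (as the ... argument allows) keeps NA as its own category.
tab_with_na <- table(x, y, useNA = "ifany")

sum(tab_default)   # number of complete pairs
sum(tab_with_na)   # all pairs, including those with NA
dim(tab_with_na)   # one extra row and column for NA
```

Because `useNA` changes both the sample size and the table dimensions, measures computed with and without it are generally not comparable.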

A provided matrix is interpreted as a contingency table, which seems the natural interpretation for frequency data
(this is, for example, also what `chisq.test` expects).

Use the function `PairApply` (pairwise apply) if the measure should be calculated pairwise for all columns.
This allows matrices of association measures to be calculated the same way `cor` does. `NA`s are by default omitted pairwise,
which corresponds to the `pairwise.complete.obs` option of `cor`.
Use `complete.cases` if only the complete cases of a `data.frame` are to be used (see examples).

The maximum value for Phi is \(\sqrt{\min(r, c) - 1}\). The contingency coefficient ranges from 0 to \(\sqrt{\frac{\min(r, c) - 1}{\min(r, c)}}\). For the corrected contingency coefficient and for Cramer's V the range is 0 to 1.
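These ranges can be traced back to the usual textbook definitions of the measures in terms of the Pearson chi-square statistic. The block below is a sketch under that assumption and is not taken from the package source; \(\chi^2\) denotes the uncorrected chi-square statistic, \(n\) the total count, and \(r\), \(c\) the number of rows and columns.

\[
\phi = \sqrt{\frac{\chi^2}{n}}, \qquad
C = \sqrt{\frac{\chi^2}{\chi^2 + n}}, \qquad
V = \sqrt{\frac{\chi^2}{n\,(\min(r, c) - 1)}}, \qquad
T = \sqrt{\frac{\chi^2}{n\,\sqrt{(r - 1)(c - 1)}}}
\]

Since \(\chi^2 \le n\,(\min(r, c) - 1)\), substituting the upper bound gives

\[
\phi_{\max} = \sqrt{\min(r, c) - 1}, \qquad
C_{\max} = \sqrt{\frac{\min(r, c) - 1}{\min(r, c)}}, \qquad
V_{\max} = 1 .
\]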

A Cramer's V in the range [0, 0.3] is considered weak, (0.3, 0.7] medium and > 0.7 strong.
The minimum value for all is 0 under statistical independence.
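The extremes can be checked numerically in base R. The following is a minimal sketch that assumes the usual textbook definitions based on the uncorrected Pearson chi-square statistic; the helper `assoc_by_hand` is hypothetical, is not part of the package, and applies no bias or Sakoda correction.

```r
# Hypothetical helper: chi-square-based association measures computed by hand.
# Assumes the textbook definitions; not taken from the package source.
assoc_by_hand <- function(tab) {
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(dim(tab))
  c(phi = sqrt(chi2 / n),
    C   = sqrt(chi2 / (chi2 + n)),
    V   = sqrt(chi2 / (n * (k - 1))),
    T   = sqrt(chi2 / (n * sqrt((nrow(tab) - 1) * (ncol(tab) - 1)))))
}

# Perfect association in a 3 x 3 table: phi and C reach their maxima
# sqrt(2) and sqrt(2/3), while V (and T, for a square table) reach 1.
perfect <- diag(c(30, 30, 30))
res_perfect <- assoc_by_hand(perfect)

# Independence: every cell equals its expected count, so all measures are 0.
indep <- rbind(c(20, 60, 100),
               c(40, 120, 200))
res_indep <- assoc_by_hand(indep)

res_perfect
res_indep
```

The 3 x 3 diagonal table is chosen so that `min(r, c) - 1` equals 2, which makes the difference between the maxima of phi, the contingency coefficient and Cramer's V visible.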

A single numeric value if no confidence intervals are requested, and otherwise a numeric vector with 3 elements for the estimate and the lower and upper confidence limits.

Yule, G. Udny (1912) On the methods of measuring association between two attributes. *Journal of the Royal Statistical Society, LXXV*, 579-652

Tschuprow, A. A. (1939) *Principles of the Mathematical Theory of Correlation*, translated by M. Kantorowitsch. W. Hodge & Co.

Cramer, H. (1946) *Mathematical Methods of Statistics*. Princeton University Press

Agresti, Alan (1996) *Introduction to categorical data analysis*. NY: John Wiley and Sons

Sakoda, J.M. (1977) Measures of Association for Multivariate Contingency Tables,
*Proceedings of the Social Statistics Section of the American Statistical Association* (Part III), 777-780.

Smithson, M.J. (2003) *Confidence Intervals, Quantitative Applications in the Social Sciences Series*, No. 140. Thousand Oaks, CA: Sage. pp. 39-41

Bergsma, W. (2013) A bias-correction for Cramer's V and Tschuprow's T.
*Journal of the Korean Statistical Society*, 42(3). DOI: 10.1016/j.jkss.2012.10.002

```
library(DescTools)  # provides the d.pizza data set and the functions used below
tab <- table(d.pizza$driver, d.pizza$wine_delivered)
Phi(tab)
#> [1] 0.1328222
ContCoef(tab)
#> [1] 0.1316659
CramerV(tab)
#> [1] 0.1328222
TschuprowT(tab)
#> [1] 0.08486583
# just x and y
CramerV(d.pizza$driver, d.pizza$wine_delivered)
#> [1] 0.1328222
# data.frame
PairApply(d.pizza[,c("driver","operator","area")], CramerV, symmetric = TRUE)
#>             driver   operator       area
#> driver   1.0000000 0.23585686 0.65018461
#> operator 0.2358569 1.00000000 0.08670047
#> area     0.6501846 0.08670047 1.00000000
# useNA is passed to table
PairApply(d.pizza[,c("driver","operator","area")], CramerV,
          useNA="ifany", symmetric = TRUE)
#>             driver   operator       area
#> driver   1.0000000 0.20253639 0.53066544
#> operator 0.2025364 1.00000000 0.07847762
#> area     0.5306654 0.07847762 1.00000000
# use only the complete cases of the data.frame
d.frm <- d.pizza[,c("driver","operator","area")]
PairApply(d.frm[complete.cases(d.frm),], CramerV, symmetric = TRUE)
#>             driver  operator      area
#> driver   1.0000000 0.2345141 0.6504665
#> operator 0.2345141 1.0000000 0.0869935
#> area     0.6504665 0.0869935 1.0000000
m <- as.table(matrix(c(2,4,1,7), nrow=2))
YuleQ(m)
#> [1] 0.5555556
YuleY(m)
#> [1] 0.303337
# Bootstrap confidence intervals for Cramer's V
# http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf, p. 1821
tab <- as.table(rbind(
  c(26,26,23,18, 9),
  c( 6, 7, 9,14,23)))
d.frm <- Untable(tab)
n <- 1000
# each column of idx holds the row indices of one bootstrap resample
idx <- matrix(sample(nrow(d.frm), size=nrow(d.frm) * n, replace=TRUE),
              ncol=n, byrow=FALSE)
v <- apply(idx, 2, function(x) CramerV(d.frm[x,1], d.frm[x,2]))
quantile(v, probs=c(0.025,0.975))
#>      2.5%     97.5%
#> 0.2814951 0.5600137
# compare this to the analytical ones
CramerV(tab, conf.level=0.95)
#>  Cramer V    lwr.ci    upr.ci
#> 0.4064888 0.2211672 0.5410622
```
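The Yule values in the examples can be reproduced by hand in base R. This is a sketch assuming the standard definitions for a 2 x 2 table, Q = (ad - bc) / (ad + bc) and Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)), where a, b are the cells of the first row and c, d those of the second. The helpers `yule_q_by_hand` and `yule_y_by_hand` are hypothetical names, not package functions.

```r
# Hypothetical helpers for a 2 x 2 table with cells
#   a = tab[1, 1], b = tab[1, 2], c = tab[2, 1], d = tab[2, 2].
yule_q_by_hand <- function(tab) {
  ad <- tab[1, 1] * tab[2, 2]
  bc <- tab[1, 2] * tab[2, 1]
  (ad - bc) / (ad + bc)
}

yule_y_by_hand <- function(tab) {
  ad <- tab[1, 1] * tab[2, 2]
  bc <- tab[1, 2] * tab[2, 1]
  (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc))
}

# Same 2 x 2 table as in the examples (filled column by column).
m <- matrix(c(2, 4, 1, 7), nrow = 2)

q <- yule_q_by_hand(m)  # (14 - 4) / (14 + 4) = 10 / 18
y <- yule_y_by_hand(m)  # (sqrt(14) - 2) / (sqrt(14) + 2)
q
y
```

Unlike the chi-square-based measures, both statistics are signed: swapping the two rows of `m` flips the sign of Q and Y.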