binGroup2 {binGroup2} | R Documentation

Methods for the group testing identification and estimation problems.

Methods for identification of positive items in group testing designs: Operating characteristics (e.g., expected number of tests) are calculated for commonly used hierarchical and array-based algorithms. Optimal testing configurations for an algorithm can be found as well. Please see Hitt et al. (2019) for specific details.

Methods for estimation and inference for proportions in group testing designs: For estimating one proportion or the difference of proportions, confidence interval methods are included that account for different pool sizes. Functions for hypothesis testing of proportions, calculation of power, and calculation of the expected width of confidence intervals are also included. Furthermore, regression methods and simulation of group testing data are implemented for simple pooling, halving, and array testing designs.

The `binGroup2` package is based upon the `binGroup` package, which was originally designed for the group testing estimation problem. Over time, additional functions for estimation and for the group testing identification problem were included. Due to the diverse styles resulting from these additions, we have created `binGroup2` as a way to unify functions in a coherent structure and incorporate additional functions for identification. The name “binGroup” originates from the assumption in basic estimation for group testing that the number of positive groups has a binomial distribution. While more advanced estimation methods no longer make this assumption, we continue with the `binGroup` name for consistency.
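To make the binomial assumption concrete, the following is a minimal base R sketch, not code from the package. It assumes equal group sizes, a perfect assay, and independent individuals with a common prevalence. The helper names `group_pos_prob` and `mle_equal_groups` are hypothetical and exist only for this illustration.

```r
# Sketch only: base R, no binGroup2 functions are used.
# Assumptions: n groups of equal size m, a perfect assay, independent individuals.

# Probability that a group of size m tests positive when the prevalence is p.
group_pos_prob <- function(p, m) 1 - (1 - p)^m

# Closed-form maximum likelihood estimate of p from x positive groups out of n.
mle_equal_groups <- function(x, n, m) 1 - (1 - x / n)^(1 / m)

set.seed(1)
p_true <- 0.02
m <- 7
n <- 24

# Under these assumptions the number of positive groups is binomial.
x_sim <- rbinom(1, size = n, prob = group_pos_prob(p_true, m))
p_hat_sim <- mle_equal_groups(x_sim, n, m)

# The same estimator applied to 3 positive groups out of 24 groups of size 7.
p_hat <- mle_equal_groups(x = 3, n = 24, m = 7)
print(p_hat)
```

The estimator simply inverts the group-level probability: if a fraction x/n of groups is positive, the implied individual prevalence is the value of p that makes `group_pos_prob(p, m)` equal to that fraction.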

Bilder (2019a,b) provide introductions to group testing. These papers and additional details about group testing are available at http://chrisbilder.com/grouptesting/.

This research was supported by the National Institutes of Health under grant R01 AI121351.

The binGroup2 package focuses on the group testing identification problem using hierarchical and array-based group testing algorithms.

The `OTC1` function implements a number of group testing algorithms, described in Hitt et al. (2019), which calculate the operating characteristics and find the optimal testing configuration over a range of possible initial group sizes and/or testing configurations (sets of subsequent group sizes). The `OTC2` function does the same with a multiplex assay that tests for two diseases.
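As a rough illustration of what searching over initial group sizes involves, here is a base R sketch for the simplest case only: non-informative two-stage (Dorfman) testing for one disease, using the expected number of tests per individual as the objective. It is not the `OTC1` implementation, and the function name `dorfman_tests_per_individual` is hypothetical.

```r
# Sketch only: base R, not the OTC1 implementation.
# Assumptions: non-informative two-stage (Dorfman) testing, one disease,
# independent individuals with common prevalence p.

# Expected number of tests per individual for a given initial group size.
# One group test is always run; every member is retested individually
# only if the group test is positive.
dorfman_tests_per_individual <- function(p, group_size, se, sp) {
  prob_all_negative <- (1 - p)^group_size
  prob_group_tests_positive <- se * (1 - prob_all_negative) +
    (1 - sp) * prob_all_negative
  1 / group_size + prob_group_tests_positive
}

group_sizes <- 3:30
expected_tests <- dorfman_tests_per_individual(p = 0.01, group_size = group_sizes,
                                               se = 0.95, sp = 0.95)
best_size <- group_sizes[which.min(expected_tests)]
print(c(best_size = best_size, tests_per_individual = min(expected_tests)))
```

Larger groups save tests when all members are negative but trigger more retesting when the group is positive, which is why the objective has an interior minimum.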

The `operatingCharacteristics1` (`opChar1`) and `operatingCharacteristics2` (`opChar2`) functions calculate operating characteristics for a specified testing configuration with assays that test for one and two diseases, respectively.

These functions allow the sensitivity and specificity to differ across stages of testing. This means that the accuracy of the diagnostic test can differ for stages in a hierarchical testing algorithm or between row/column testing and individual testing in an array testing algorithm.
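The effect of stage-specific accuracy can be sketched in base R for two-stage (Dorfman) testing. This is an illustration under stated assumptions, not the `opChar1` implementation: individuals are independent with a common prevalence, test outcomes are independent given the true statuses, and the helper `dorfman_accuracy` is a hypothetical name.

```r
# Sketch only: base R, not the opChar1 implementation.
# Assumptions: two-stage (Dorfman) testing, independent individuals with common
# prevalence p, and test outcomes that are independent given true statuses.
# se and sp are length-2 vectors: c(stage 1 group test, stage 2 individual test).

dorfman_accuracy <- function(p, group_size, se, sp) {
  # Probability that the group tests positive given that one specific
  # member is truly negative.
  prob_others_negative <- (1 - p)^(group_size - 1)
  prob_group_pos_given_neg <- se[1] * (1 - prob_others_negative) +
    (1 - sp[1]) * prob_others_negative
  c(
    # A positive individual is identified only if both tests are positive.
    pooling_sensitivity = se[1] * se[2],
    # A negative individual is misclassified only if the group test and
    # then its own individual test are both positive.
    pooling_specificity = 1 - prob_group_pos_given_neg * (1 - sp[2])
  )
}

acc <- dorfman_accuracy(p = 0.05, group_size = 10,
                        se = c(0.95, 0.99), sp = c(0.95, 0.98))
print(acc)
```

Because a positive classification requires two positive tests in sequence, the overall sensitivity is lower than either stage's sensitivity, while the overall specificity is higher than the individual-stage specificity.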

The binGroup2 package also provides functions for estimation and inference for proportions in group testing designs.

The `propCI` function calculates the point estimate and confidence intervals for a single proportion from group testing data. The `propDiffCI` function does the same for the difference of proportions. A number of confidence interval methods are available for groups of equal or different sizes.
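When group sizes differ, there is no closed-form point estimate, so the likelihood has to be maximized numerically. The base R sketch below shows one way to do that with `optimize`. It assumes a perfect assay and independent individuals, it is not the `propCI` implementation, and the names `group_loglik` and `mle_unequal_groups` are hypothetical.

```r
# Sketch only: base R, not the propCI implementation.
# Assumptions: perfect assay, independent individuals, common prevalence p.
# x[i] positive groups are observed out of n[i] groups, each of size m[i].

# Log-likelihood up to an additive constant that does not depend on p.
group_loglik <- function(p, x, m, n) {
  prob_group_negative <- (1 - p)^m
  sum(x * log(1 - prob_group_negative) + (n - x) * m * log(1 - p))
}

# Maximum likelihood estimate by one-dimensional numerical optimization.
mle_unequal_groups <- function(x, m, n) {
  optimize(group_loglik, interval = c(1e-8, 1 - 1e-8),
           x = x, m = m, n = n, maximum = TRUE, tol = 1e-10)$maximum
}

p_hat_mixed <- mle_unequal_groups(x = c(0, 0, 1, 2),
                                  m = c(1, 5, 10, 50),
                                  n = c(5, 5, 5, 5))
print(p_hat_mixed)
```

With a single group size the numerical maximizer agrees with the closed-form estimate 1 - (1 - x/n)^(1/m), which is a convenient check on the likelihood.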

The `gtWidth` function calculates the expected width of confidence intervals in group testing. The `gtTest` function calculates p-values for hypothesis tests of single proportions. The `gtPower` function calculates power to reject a hypothesis.
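To show how a p-value and power relate in this setting, here is a base R sketch that uses an exact one-sided binomial test on the number of positive groups. This particular test is an assumption chosen for illustration and is not claimed to be the method used by `gtTest` or `gtPower`. The function names are hypothetical.

```r
# Sketch only: base R, not the gtTest or gtPower implementation.
# Assumptions: n groups of equal size m, a perfect assay, and an exact one-sided
# binomial test of H0: p = p0 against H1: p > p0 on the number of positive groups.

group_pos_prob <- function(p, m) 1 - (1 - p)^m

# Exact p-value: probability of x or more positive groups under H0.
exact_p_value <- function(x, n, m, p0) {
  pbinom(x - 1, size = n, prob = group_pos_prob(p0, m), lower.tail = FALSE)
}

# Exact power at true prevalence p1: total probability, under p1, of all
# outcomes whose p-value is at or below the significance level.
exact_power <- function(n, m, p0, p1, level = 0.05) {
  x_values <- 0:n
  reject <- exact_p_value(x_values, n, m, p0) <= level
  sum(dbinom(x_values[reject], size = n, prob = group_pos_prob(p1, m)))
}

p_val <- exact_p_value(x = 3, n = 24, m = 7, p0 = 0.005)
pow <- exact_power(n = 24, m = 7, p0 = 0.005, p1 = 0.03)
print(c(p_value = p_val, power = pow))
```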

The `designPower` function iterates either the number of groups or group size in a one-parameter group testing design until a pre-specified power level is achieved. The `designEst` function finds the optimal group size corresponding to the minimal mean-squared error of the point estimator.
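The idea of choosing a group size by minimal mean-squared error can be sketched in base R by enumerating every possible number of positive groups. This is an illustration, not the `designEst` implementation. It assumes a perfect assay, a fixed number of groups, the closed-form maximum likelihood estimator, and a guessed prevalence. The helper `mse_of_mle` is a hypothetical name.

```r
# Sketch only: base R, not the designEst implementation.
# Assumptions: n groups of equal size m, a perfect assay, and the closed-form
# maximum likelihood estimator of p evaluated at a guessed prevalence p.

mse_of_mle <- function(p, n, m) {
  x_values <- 0:n
  # Estimate obtained for each possible number of positive groups.
  estimates <- 1 - (1 - x_values / n)^(1 / m)
  # Probability of each outcome under the binomial model for positive groups.
  weights <- dbinom(x_values, size = n, prob = 1 - (1 - p)^m)
  sum(weights * (estimates - p)^2)
}

group_sizes <- 1:100
mse_values <- vapply(group_sizes,
                     function(m) mse_of_mle(p = 0.01, n = 20, m = m),
                     numeric(1))
best_mse_size <- group_sizes[which.min(mse_values)]
print(c(best_size = best_mse_size, mse = min(mse_values)))
```

The mean-squared error combines variance and bias, so very large groups are penalized: when every group tests positive the estimate jumps to 1, far from a small true prevalence.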

The `gtReg` function implements regression methods and the `gtSim` function simulates group testing data for simple pooling, halving, and array testing designs.
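For intuition about what simulated simple pooling data look like, the base R sketch below generates a covariate, individual statuses from a logistic model, and error-prone group results. It is not the `gtSim` implementation, and the function name, column names, and covariate distribution are all assumptions made for this illustration.

```r
# Sketch only: base R, not the gtSim implementation.
# Assumptions: logistic model for individual status, consecutive individuals
# pooled into equal-sized groups, and a group assay with fixed sens and spec.

simulate_simple_pooling <- function(n_individuals, group_size, beta, sens, spec) {
  x <- rgamma(n_individuals, shape = 17, scale = 1.4)
  true_status <- rbinom(n_individuals, size = 1,
                        prob = plogis(beta[1] + beta[2] * x))
  n_groups <- ceiling(n_individuals / group_size)
  group_id <- rep(seq_len(n_groups), each = group_size)[seq_len(n_individuals)]
  # A group is truly positive if at least one member is positive.
  group_truly_positive <- as.vector(tapply(true_status, group_id, max))
  # A truly positive group tests positive with probability sens,
  # a truly negative group with probability 1 - spec.
  prob_test_positive <- ifelse(group_truly_positive == 1, sens, 1 - spec)
  group_result <- rbinom(n_groups, size = 1, prob = prob_test_positive)
  data.frame(x = x, group_id = group_id, group_result = group_result[group_id])
}

set.seed(46)
sim_data <- simulate_simple_pooling(n_individuals = 1000, group_size = 5,
                                    beta = c(-6, 0.1), sens = 0.95, spec = 0.95)
head(sim_data)
```

Only the group-level result is recorded for each individual, which is what makes regression on such data a latent-response problem.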

**Maintainer**: Brianna Hitt brianna.hitt@huskers.unl.edu (ORCID)

Authors:

Christopher Bilder (ORCID)

Frank Schaarschmidt (ORCID)

Brad Biggerstaff (ORCID)

Christopher McMahan (ORCID)

Joshua Tebbs (ORCID)

Other contributors:

Boan Zhang [contributor]

Michael Black [contributor]

Peijie Hou [contributor]

Peng Chen [contributor]

Altman, D., Bland, J. (1994).
“Diagnostic tests 1: Sensitivity and specificity.”
*BMJ*, **308**, 1552.

Altman, D., Bland, J. (1994).
“Diagnostic tests 2: Predictive values.”
*BMJ*, **309**, 102.

Biggerstaff, B. (2008).
“Confidence intervals for the difference of proportions estimated from pooled samples.”
*Journal of Agricultural, Biological, and Environmental Statistics*, **13**, 478–496.
doi: 10.1198/108571108X379055, https://doi.org/10.1198/108571108X379055.

Bilder, C., Tebbs, J., Chen, P. (2010).
“Informative retesting.”
*Journal of the American Statistical Association*, **105**, 942–955.
doi: 10.1198/jasa.2010.ap09231, https://doi.org/10.1198/jasa.2010.ap09231.

Bilder, C., Tebbs, J., McMahan, C. (2019).
“Informative group testing for multiplex assays.”
*Biometrics*, **75**, 278–288.
doi: 10.1111/biom.12988, https://doi.org/10.1111/biom.12988.

Bilder, C. (2019a).
“Group Testing for Estimation.”
*Wiley StatsRef: Statistics Reference Online*.
doi: 10.1002/9781118445112.stat08231, https://doi.org/10.1002/9781118445112.stat08231.

Bilder, C. (2019b).
“Group Testing for Identification.”
*Wiley StatsRef: Statistics Reference Online*.
doi: 10.1002/9781118445112.stat08227, https://doi.org/10.1002/9781118445112.stat08227.

Black, M., Bilder, C., Tebbs, J. (2012).
“Group testing in heterogeneous populations by using halving algorithms.”
*Journal of the Royal Statistical Society. Series C: Applied Statistics*, **61**, 277–290.
doi: 10.1111/j.1467-9876.2011.01008.x, https://doi.org/10.1111/j.1467-9876.2011.01008.x.

Black, M., Bilder, C., Tebbs, J. (2015).
“Optimal retesting configurations for hierarchical group testing.”
*Journal of the Royal Statistical Society. Series C: Applied Statistics*, **64**, 693–710.
doi: 10.1111/rssc.12097, https://doi.org/10.1111/rssc.12097.

Graff, L., Roeloffs, R. (1972).
“Group testing in the presence of test error; an extension of the Dorfman procedure.”
*Technometrics*, **14**, 113–122.
doi: 10.1080/00401706.1972.10488888, https://doi.org/10.1080/00401706.1972.10488888.

Hepworth, G. (1996).
“Exact confidence intervals for proportions estimated by group testing.”
*Biometrics*, **52**, 1134–1146.

Hepworth, G., Biggerstaff, B. (2017).
“Bias correction in estimating proportions by pooled testing.”
*Journal of Agricultural, Biological, and Environmental Statistics*, **22**, 602–614.
doi: 10.1007/s13253-017-0297-2, https://doi.org/10.1007/s13253-017-0297-2.

Hitt, B., Bilder, C., Tebbs, J., McMahan, C. (2019).
“The objective function controversy for group testing: Much ado about nothing?”
*Statistics in Medicine*, **38**, 4912–4923.
doi: 10.1002/sim.8341, https://doi.org/10.1002/sim.8341.

Hou, P., Tebbs, J., Wang, D., McMahan, C., Bilder, C. (2020).
“Array testing with multiplex assays.”
To appear in *Biostatistics*.

Malinovsky, Y., Albert, P., Roy, A. (2016).
“Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification.”
*Biometrics*, **72**, 299–302.
doi: 10.1111/biom.12385, https://doi.org/10.1111/biom.12385.

McMahan, C., Tebbs, J., Bilder, C. (2012a).
“Informative Dorfman Screening.”
*Biometrics*, **68**, 287–296.
doi: 10.1111/j.1541-0420.2011.01644.x, https://doi.org/10.1111/j.1541-0420.2011.01644.x.

McMahan, C., Tebbs, J., Bilder, C. (2012b).
“Two-Dimensional Informative Array Testing.”
*Biometrics*, **68**, 793–804.
doi: 10.1111/j.1541-0420.2011.01726.x, https://doi.org/10.1111/j.1541-0420.2011.01726.x.

Schaarschmidt, F. (2007).
“Experimental design for one-sided confidence intervals or hypothesis tests in binomial group testing.”
*Communications in Biometry and Crop Science*, **2**, 32–40.
ISSN 1896-0782.

Swallow, W. (1985).
“Group testing for estimating infection rates and probabilities of disease transmission.”
*Phytopathology*, **75**, 882–889.
doi: 10.1094/Phyto-75-882, https://doi.org/10.1094/Phyto-75-882.

Tebbs, J., Bilder, C. (2004).
“Confidence interval procedures for the probability of disease transmission in multiple-vector-transfer designs.”
*Journal of Agricultural, Biological, and Environmental Statistics*, **9**, 75–90.
doi: 10.1198/1085711043127, https://doi.org/10.1198/1085711043127.

Vansteelandt, S., Goetghebeur, E., Verstraeten, T. (2000).
“Regression models for disease prevalence with diagnostic tests on pools of serum samples.”
*Biometrics*, **56**, 1126–1133.
doi: 10.1111/j.0006-341x.2000.01126.x, https://doi.org/10.1111/j.0006-341x.2000.01126.x.

Xie, M. (2001).
“Regression analysis of group testing samples.”
*Statistics in Medicine*, **20**, 1957–1969.
doi: 10.1002/sim.817, https://doi.org/10.1002/sim.817.

# Estimated running time for all examples was calculated
#   using a computer with 16 GB of RAM and one core of
#   an Intel i7-6500U processor. Please take this into
#   account when interpreting the run times given.

# 1) Identification using hierarchical and array-based group testing
#   algorithms with an assay that tests for one disease.

# 1.1) Find the optimal testing configuration over a range of initial
#   group sizes, using informative three-stage hierarchical testing, where
#   p denotes the overall prevalence of disease;
#   Se denotes the sensitivity of the diagnostic test;
#   Sp denotes the specificity of the diagnostic test;
#   group.sz denotes the range of initial pool sizes for consideration; and
#   obj.fn specifies the objective functions for which to find results.
# This example takes approximately 25 seconds to run.
set.seed(1002)
results1 <- OTC1(algorithm = "ID3", p = 0.01, Se = 0.95, Sp = 0.95,
                 group.sz = 3:30, obj.fn = "ET", alpha = 2)
summary(results1)

# 1.2) Find the optimal testing configuration using non-informative
#   array testing without master pooling.
# The sensitivity and specificity differ for row/column testing and
#   individual testing.
# This example takes approximately 15 seconds to run.
results2 <- OTC1(algorithm = "A2", p = 0.05,
                 Se = c(0.95, 0.99), Sp = c(0.95, 0.98),
                 group.sz = 3:20, obj.fn = "ET")
summary(results2)

# 1.3) Calculate the operating characteristics using informative
#   two-stage hierarchical (Dorfman) testing, implemented via the
#   pool-specific optimal Dorfman (PSOD) method described in
#   McMahan et al. (2012a).
# Hierarchical testing configurations are specified by a matrix
#   in the hier.config argument. The rows of the matrix correspond
#   to the stages of the hierarchical testing algorithm, the columns
#   correspond to the individuals to be tested, and the cell values
#   correspond to the group number of each individual at each stage.
config.mat <- matrix(data = c(rep(1, 5), rep(2, 4), 3, 1:10),
                     nrow = 2, ncol = 10, byrow = TRUE)
set.seed(8791)
results3 <- opChar1(algorithm = "ID2", p = 0.02, Se = 0.95, Sp = 0.99,
                    hier.config = config.mat, alpha = 0.5)
summary(results3)

# 1.4) Calculate the operating characteristics using non-informative
#   four-stage hierarchical testing.
config.mat <- matrix(data = c(rep(1, 15), rep(c(1, 2, 3), each = 5),
                              rep(1, 3), rep(2, 2), rep(3, 3),
                              rep(4, 2), rep(5, 4), 6, 1:15),
                     nrow = 4, ncol = 15, byrow = TRUE)
results4 <- opChar1(algorithm = "D4", p = 0.008, Se = 0.96, Sp = 0.98,
                    hier.config = config.mat,
                    a = c(1, 4, 6, 9, 11, 15))
summary(results4)

# 2) Identification using hierarchical and array-based group testing
#   algorithms with a multiplex assay that tests for two diseases.

# 2.1) Find the optimal testing configuration using non-informative
#   two-stage hierarchical testing, given
#   p.vec, a vector of overall joint probabilities of disease;
#   Se, a vector of sensitivity values for each disease; and
#   Sp, a vector of specificity values for each disease.
# Se and Sp can also be specified as a matrix, where one value
#   is specified for each disease at each stage of testing.
results5 <- OTC2(algorithm = "D2",
                 p.vec = c(0.90, 0.04, 0.04, 0.02),
                 Se = c(0.99, 0.99), Sp = c(0.99, 0.99),
                 group.sz = 3:50)
summary(results5)

# 2.2) Calculate the operating characteristics for informative
#   five-stage hierarchical testing, given
#   alpha.vec, a vector of shape parameters for the Dirichlet distribution;
#   Se, a matrix of sensitivity values; and
#   Sp, a matrix of specificity values.
Se <- matrix(data = rep(0.95, 10), nrow = 2, ncol = 5, byrow = TRUE)
Sp <- matrix(data = rep(0.99, 10), nrow = 2, ncol = 5, byrow = TRUE)
config.mat <- matrix(data = c(rep(1, 24), rep(1, 18), rep(2, 6),
                              rep(1, 9), rep(2, 9), rep(3, 4), 4, 5,
                              rep(1, 6), rep(2, 3), rep(3, 5),
                              rep(4, 4), rep(5, 3), 6, rep(NA, 2),
                              1:21, rep(NA, 3)),
                     nrow = 5, ncol = 24, byrow = TRUE)
results6 <- opChar2(algorithm = "ID5",
                    alpha = c(18.25, 0.75, 0.75, 0.25),
                    Se = Se, Sp = Sp, hier.config = config.mat)
summary(results6)

# 3) Estimation of the overall disease prevalence and calculation
#   of confidence intervals.

# 3.1) Suppose 3 groups out of 24 test positively.
#   Each group has a size of 7.
propCI(x = 3, m = 7, n = 24, ci.method = "CP")
propCI(x = 3, m = 7, n = 24, ci.method = "Blaker")
propCI(x = 3, m = 7, n = 24, ci.method = "score")
propCI(x = 3, m = 7, n = 24, ci.method = "soc")

# 3.2) Consider the following situation:
#   0 out of 5 groups test positively with groups
#   of size 1 (individual testing),
#   0 out of 5 groups test positively with groups of size 5,
#   1 out of 5 groups test positively with groups of size 10,
#   2 out of 5 groups test positively with groups of size 50
propCI(x = c(0, 0, 1, 2), m = c(1, 5, 10, 50), n = c(5, 5, 5, 5),
       pt.method = "Gart", ci.method = "skew-score")

# 4) Estimate a group testing regression model.

# 4.1) Fit a group testing regression model with
#   simple pooling using the "hivsurv" dataset.
data(hivsurv)
fit1 <- gtReg(type = "sp", formula = groupres ~ AGE + EDUC.,
              data = hivsurv, groupn = gnum,
              sens = 0.9, spec = 0.9, method = "Xie")
summary(fit1)

# 4.2) Simulate data for the halving protocol, and
#   fit a group testing regression model.
set.seed(46)
gt.data <- gtSim(type = "halving", par = c(-6, 0.1),
                 gshape = 17, gscale = 1.4,
                 size1 = 1000, size2 = 5,
                 sens = 0.95, spec = 0.95)
fit2 <- gtReg(type = "halving", formula = gres ~ x,
              data = gt.data, groupn = groupn,
              subg = subgroup, retest = retest,
              sens = 0.95, spec = 0.95,
              start = c(-6, 0.1), trace = TRUE)
summary(fit2)

# This example takes approximately 20 seconds to run.
# 4.3) Simulate data in 5x6 array testing form, and
#   fit a group testing regression model.
set.seed(9128)
array.sim <- gtSim(type = "array", par = c(-7, 0.1),
                   size1 = c(5, 6), size2 = c(4, 5),
                   sens = 0.95, spec = 0.95)
set1 <- array.sim$dframe
fit3 <- gtReg(type = "array",
              formula = cbind(col.resp, row.resp) ~ x,
              data = set1, coln = coln, rown = rown,
              arrayn = arrayn, sens = 0.95, spec = 0.95,
              tol = 0.005, n.gibbs = 2000, trace = TRUE)
summary(fit3)

[Package *binGroup2* version 1.1.0 Index]