Title: | ROC Curve Inference with and without Covariates |
---|---|
Description: | Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. Also, it provides functions to obtain ROC-based optimal cutpoints utilizing several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) <http://www.jstor.org/stable/2534001?seq=1>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details. |
Authors: | Maria Xose Rodriguez-Alvarez [aut, cre]
|
Maintainer: | Maria Xose Rodriguez-Alvarez <[email protected]> |
License: | GPL |
Version: | 1.0-9 |
Built: | 2025-03-09 04:56:36 UTC |
Source: | https://github.com/cran/ROCnReg |
Estimates the pooled (unadjusted) Receiver Operating Characteristic (ROC) curve, the covariate-adjusted ROC (AROC) curve, and the covariate-specific/conditional ROC (cROC) curve by different methods, both Bayesian and frequentist. Also, it provides functions to obtain ROC-based optimal cutpoints utilizing several criteria. Based on Erkanli, A. et al. (2006) <doi:10.1002/sim.2496>; Faraggi, D. (2003) <doi:10.1111/1467-9884.00350>; Gu, J. et al. (2008) <doi:10.1002/sim.3366>; Inacio de Carvalho, V. et al. (2013) <doi:10.1214/13-BA825>; Inacio de Carvalho, V., and Rodriguez-Alvarez, M.X. (2022) <doi:10.1214/21-STS839>; Janes, H., and Pepe, M.S. (2009) <doi:10.1093/biomet/asp002>; Pepe, M.S. (1998) <http://www.jstor.org/stable/2534001?seq=1>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1016/j.csda.2010.07.018>; Rodriguez-Alvarez, M.X. et al. (2011a) <doi:10.1007/s11222-010-9184-1>. Please see Rodriguez-Alvarez, M.X. and Inacio, V. (2021) <doi:10.32614/RJ-2021-066> for more details.
Package: | ROCnReg |
Type: | Package |
Version: | 1.0-9 |
Date: | 2024-05-30 |
License: | GPL |
Maria Xose Rodriguez-Alvarez and Vanda Inacio Maintainer: Maria Xose Rodriguez-Alvarez <[email protected]>
Erkanli, A., Sung M., Jane Costello, E., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine, 25, 3905–3928.
Faraggi, D. (2003). Adjusting receiver operating characteristic curves and related indices for covariates. The Statistician 52, 179–192.
Gu, J., Ghosal, S., and Roy, A. (2008). Bayesian bootstrap estimation of ROC curve. Statistics in Medicine, 27, 5407–5420.
Inacio de Carvalho, V., Jara, A., Hanson, T. E., and de Carvalho, M. (2013). Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623–646.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96, 371–382.
Pepe, M.S. (1998). Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics 54, 124–135.
Rodriguez-Alvarez, M. X. and Inacio, V., and (2021). ROCnReg: An R Package for Receiver Operating Characteristic Curve Inference with and without Covariate Information. The R Journal
Rodriguez-Alvarez, M.X., Tahoces, P. G., Cadarso-Suarez, C., and Lado, M.J. (2011). Comparative study of ROC regression techniques–Applications for the computer-aided diagnostic system in breast cancer detection. Computational Statistics and Data Analysis, 55, 888–902.
Rodriguez-Alvarez, M.X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
This function estimates the covariate-adjusted ROC curve (AROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho and Rodriguez-Alvarez (2018).
AROC.bnp(formula.h, group, tag.h, data, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol.aroc(), prior.h = priorcontrol.bnp(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
AROC.bnp(formula.h, group, tag.h, data, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol.aroc(), prior.h = priorcontrol.bnp(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
formula.h |
A |
group |
A character string with the name of the variable that distinguishes healthy/nondiseased from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE both the test outcomes and the continuous covariates assumed to have a linear effect are standardised (i.e., the resulting variables have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion (DIC)(Celeux et al., 2006, Spiegelhalter et al., 2002) is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-adjusted ROC curve (AROC) defined as
where denotes the distribution function of
conditional on the vector of covariates
.
The method implemented in this function combines a single-weights linear dependent Dirichlet process mixture model (De Iorio et al., 2009) to estimate and the Bayesian bootstrap (Rubin, 1981) to estimate the outside probability. More precisely, and letting
be a random sample from the nondiseased population, our postulated model for the conditional distribution function takes the following form
where denotes the cumulative distribution function of the normal distribution, evaluated at
, with mean
and variance
. The regression function
can incorportate both linear and nonlinear (through B-splines) effects of continuous covariates, categorical covariates (factors) as well as interactions. Interactions between categorical and (nonlinear) continuous covariates are also allowed (factor-by curve interactions). For the sake of simplicity we write
(
), where
is the
th column of the design matrix (possibly containing a basis representation of some/all continuous covariates). Here
is a pre-specified upper bound on the number of mixture components. The
's result from a truncated version of the stick-breaking construction (
;
,
;
Beta
;
,
),
, and
. It is further assumed that
and
. Here
denotes a Gamma distribution with shape parameter
and rate parameter
,
denotes a Wishart distribution with
degrees of freedom and expectation
, and
denotes the dimension of the vector
. It is worth mentioning that when
, the model for the conditional distribution of the test outcomes (in the healthy population) reduces to a normal regression model (where continuous covariates effects are modelled either parametrically or nonparametrically). For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
Regarding the area under the curve, we note that
where . In our implementation, the expectation is computed using the Bayesian bootstrap (using the same weights as those used to estimate the AROC, see Inacio de Carvalho and Rodriguez-Alvarez (2018) for details). As far as the partial area under the curve is concerned, when
focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where . Again, the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAAUC,
so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, , so that it ranges from
(useless test) to 1 (perfect test).
Finally, it is important referring that with respect to the computation of the DIC, when , it is computed as in Spiegelhalter et al. (2002), and when
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve (AROC) has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values. |
ROC |
Estimated covariate-adjusted ROC curve (AROC) (posterior mean) and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean) and |
newdata |
If |
reg.fun.h |
If |
dens |
If |
lpml |
If computed, a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Results of the fitting process. A list with the following components: (1) |
data_model |
List with the data used in the fit: observed diagnostic test outcome and design matrices, separately for the healthy and diseased groups. |
The input argument formula.h
is similar to that used for the glm
function, except that flexible specifications can be added by means of the function f()
. For instance, specification would assume a linear effect of
x1
(if x1
continuous) and the effect of x2
would be modeled using B-splines basis functions. The argument K = 3
indicates that 3
internal knots will be used, with the quantiles of x2
used for their location. Categorical variables (factors) can be also incorporated, as well as interaction terms. For example, to include the factor-by-curve interaction between age
and gender
we need to specify, e.g., . Note that, in this case, the number of knots can be different for each level of the factor. The order of the vector
K
of knots should match the levels of the factor.
Celeux, G., Forbes, F., Robert C. P., and Titerrington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–674.
De Iorio, M., Johnson, W. O., Muller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762–775.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection. Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9, 130–134.
Speigelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model comparison and fit. Journal of the Royal Statistical Society, Ser. B, 64, 583–639.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = NA), prior.h = priorcontrol.bnp(m0 = rep(0, 4), S0 = 10*diag(4), nu = 6, Psi = diag(4), a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(AROC_bnp) plot(AROC_bnp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = NA), prior.h = priorcontrol.bnp(m0 = rep(0, 4), S0 = 10*diag(4), nu = 6, Psi = diag(4), a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(AROC_bnp) plot(AROC_bnp)
This function estimates the covariate-adjusted ROC curve (AROC) using the nonparametric kernel-based method proposed by Rodriguez-Alvarez et al. (2011). The method, as it stands now, can only deal with one continuous covariate.
AROC.kernel(marker, covariate, group, tag.h, bw = c("LS", "AIC"), regtype = c("LC", "LL"), pauc = pauccontrol(), data, p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
AROC.kernel(marker, covariate, group, tag.h, bw = c("LS", "AIC"), regtype = c("LC", "LL"), pauc = pauccontrol(), data, p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
covariate |
A character string with the name of the continuous covariate. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
bw |
A character string specifying which method to use to select the bandwidths. AIC specifies expected Kullback-Leibler cross-validation, and LS specifies least-squares cross-validation. Defaults to LS. For details see |
regtype |
A character string specifying which type of kernel estimator to use for the regression function (see Details). LC specifies a local-constant estimator (Nadaraya-Watson) and LL specifies a local-linear estimator. Defaults to LC. For details see |
pauc |
A list of control values to replace the default values returned by the function |
data |
A data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. This set is also used to compute the area under the covariate-adjusted ROC curve (AAUC) using Simpson's rule. Thus, the length of the set should be an odd number and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-adjusted ROC curve (AROC) defined as
where . In particular, the method implemented in this function estimates the outer probability empirically (see Janes and Pepe, 2009) and
is estimated assuming a nonparametric location-scale regression model for
, i.e.,
where is the regression funcion,
is the variance function, and
has zero mean, variance one, and distribution function
. As a consequence,
By default, both the regression and variance functions are estimated using the Nadaraya-Watson estimator (LC), and the bandwidths are selected using least-squares cross-validation (LS). Implementation relies on the R
-package np
. No assumption is made about , which is empirically estimated on the basis of the standardised residuals.
The area under the AROC curve is
and there exists a closed-form estimator. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where again there exists a closed-form estimator. The returned value is the normalised pAAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, , so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
covariate |
The value of the argument |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-adjusted ROC curve (AROC), and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) and |
fit |
List with the following components: (1) |
Hayfield, T., and Racine, J. S. (2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software, 27(5). URL http://www.jstatsoft.org/v27/i05/.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96, 371–382.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m2 <- AROC.kernel(marker = "l_marker1", covariate = "age", group = "status", tag.h = 0, data = newpsa, bw = "LS", regtype = "LC", pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), B = 500) summary(m2) plot(m2)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m2 <- AROC.kernel(marker = "l_marker1", covariate = "age", group = "status", tag.h = 0, data = newpsa, bw = "LS", regtype = "LC", pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), B = 500) summary(m2) plot(m2)
This function estimates the covariate-adjusted ROC curve (AROC) using the semiparametric approach proposed by Janes and Pepe (2009).
AROC.sp(formula.h, group, tag.h, data, est.cdf.h = c("normal", "empirical"), pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
AROC.sp(formula.h, group, tag.h, data, est.cdf.h = c("normal", "empirical"), pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
formula.h |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
est.cdf.h |
A character string. It indicates how the conditional distribution function of the diagnostic test in the healthy population is estimated. Options are |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve. This set is also used to compute the area under the covariate-adjusted ROC curve (AAUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-adjusted ROC curve (AROC) defined as
The method implemented in this function estimates the outer probability empirically (see Janes and Pepe, 2009) and is estimated assuming a semiparametric location regression model for
, i.e.,
where has zero mean, variance one, and distribution function
. As a consequence, we have
In line with the assumptions made about the distribution of , estimators will be referred to as: (a) "normal", where a standard Gaussian error is assumed, i.e.,
; and, (b) "empirical", where no assumption is made about the distribution (in this case,
is empirically estimated on the basis of standardised residuals).
The area under the AROC curve is
and there exists a closed-form estimator. With regard to the partial area under the AROC curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where again there exists a closed-form estimator. The returned value is the normalised pAAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
Here, the computation of the integral is done numerically. The returned value is the normalised pAAUC, , so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
formula |
The value of the argument |
est.cdf.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-adjusted ROC (AROC) curve has been estimated |
ci.level |
The value of the argument |
ROC |
Estimated covariate-adjusted ROC curve (AROC), and |
AUC |
Estimated area under the covariate-adjusted ROC curve (AAUC), and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (pAAUC) and |
fit |
Object of class |
coeff |
Estimated regression coefficients (and |
Janes, H., and Pepe, M.S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika, 96(2), 371 - 382.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m3 <- AROC.sp(formula.h = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, est.cdf.h = "normal", pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), p = seq(0,1,l=101), B = 500) summary(m3) plot(m3)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m3 <- AROC.sp(formula.h = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, est.cdf.h = "normal", pauc = pauccontrol(compute = TRUE, focus = "FPF", value = 0.5), p = seq(0,1,l=101), B = 500) summary(m3) plot(m3)
This function implements methods for estimating AROC-based threshold values.
compute.threshold.AROC(object, criterion = c("FPF", "YI"), FPF, newdata, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
compute.threshold.AROC(object, criterion = c("FPF", "YI"), FPF, newdata, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
object |
An object of class |
criterion |
A character string indicating whether the covariate-adjusted threshold values should be computed based on the Youden index (“YI”) or for a fixed set of false positive fractions (“FPF”). |
FPF |
For criterion = FPF, a numeric vector with the FPF at which to calculate the AROC-based threshold values. Atomic values are also valid. |
newdata |
Optional data frame containing the values of the covariates at which the AROC-based threshold values will be computed. If not supplied, the function |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates AROC-based threshold values based on two different criteria, namely, the Youden index (YI) and the one that gives rise to a pre-specified FPF. Before proceeding, we would like to mention that when the accuracy of a test is not affected by covariates, this does not necessarily imply that the covariate-specific ROC curve (which in this case is the same for all covariate values) coincides with the pooled ROC curve. It does coincide, however, with the AROC curve. Consequently, in all cases where covariates affect the test, even though they might not affect its discriminatory capacity, inferences based on the pooled ROC curve might be misleading. In such cases the AROC curve should be used instead. This also applies to the selection of (optimal) threshold values, which, as will be seen, might be covariate-specific (i.e., possibly different for different covariate values).
For the AROC curve, the Youden Index is defined as
The value (FPF) that achieves the maximum is then used to calculate the optimal (covariate-specific) YI threshold as follows
where
In a similar way, when using the criterion for a fixed FPF, the covariate-specific threshold values are obtained as follows
In both cases, we use the notation to emphasise that this value depends on covariate
.
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
Data frame containing the values of the covariates at which the AROC-based thresholds were computed. |
thresholds |
If method = "YI", the estimated AROC-based threshold corresponding to the Youden index, and if method = "FPF", AROC-based threshold corresponding to the specified FPF. For the Bayesian approach ( |
YI |
If criterion = "YI", the AROC-based Youden index. For the Bayesian approach ( |
FPF |
If criterion = "YI", the FPF where the Youden index is attained, and if criterion = "FPF", the supplied FPF argument. For the Bayesian approach ( |
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.
AROC.bnp
, AROC.kernel
or AROC.sp
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), prior = priorcontrol.bnp(m0 = rep(0, 4), S0 = 10*diag(4), nu = 6, Psi = diag(4), a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) ### Threshold values based on the YI th_AROC_bnp_yi <- compute.threshold.AROC(AROC_bnp, criterion = "YI") # Plot results plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values based on the Youden Index") lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"qh"], lty = 2) lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"ql"], lty = 2) ### Threshold values for a fixed FPF th_AROC_bnp_fpf <- compute.threshold.AROC(AROC_bnp, criterion = "FPF", FPF = 0.1) # Plot results plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values for a FPF = 0.1") lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2) lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), prior = priorcontrol.bnp(m0 = rep(0, 4), S0 = 10*diag(4), nu = 6, Psi = diag(4), a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) ### Threshold values based on the YI th_AROC_bnp_yi <- compute.threshold.AROC(AROC_bnp, criterion = "YI") # Plot results plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values based on the Youden Index") lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"qh"], lty = 2) lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_yi$thresholds[,"ql"], lty = 2) ### Threshold values for a fixed FPF th_AROC_bnp_fpf <- compute.threshold.AROC(AROC_bnp, criterion = "FPF", FPF = 0.1) # Plot results plot(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values for a FPF = 0.1") lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2) lines(th_AROC_bnp_yi$newdata$age, th_AROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
This function implements methods for estimating covariate-specific ROC-based threshold values.
compute.threshold.cROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF, newdata, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
compute.threshold.cROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF, newdata, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
object |
An object of class |
criterion |
A character string indicating whether the covariate-specific threshold values should be computed based on the Youden index (“YI”) or for fixed false positive fractions (“FPF”) or true positive fractions (“TPF”). |
FPF |
For |
TPF |
For |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific threshold values will be computed. If not supplied, the function |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates covariate-specific ROC-based threshold values based on three different criteria, namely, the Youden index (YI), one that gives rise to a pre-specified FPF, and one that gives rise to a pre-specified TPF.
In the conditional case, the Youden index is defined as
where
The value that achieves the maximum is called the optimal covariate-specific YI threshold. Regarding the criterion for a fixed FPF, the covariate-specific threshold values are obtained as follows
and for a fixed TPF we have
In all cases, we use the notation to emphasise that this value depends on covariate
.
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
Data frame containing the values of the covariates at which the covariate-specific thresholds were computed. |
thresholds |
If method = "YI", the estimated covariate-specific (optimal) threshold corresponding to the covariate-specific Youden index (the one that maximises TPF/sensitivity + TNF/specificity). If method = "FPF", the covariate-specific threshold corresponding to the specified FPF, and if method = "TPF", the covariate-specific threshold corresponding to the specified TPF. For the Bayesian approach ( |
YI |
If method = "YI", the estimated covariate-specific Youden index. For the Bayesian approach ( |
FPF |
If method = "YI" or method = "TPF", the FPF corresponding to the estimated (optimal) covariate-specific thresholds (for the Bayesian approach ( |
TPF |
If method = "YI" or method = "FPF", the covariate-specific TPF/sensitivity corresponding to the estimated covariate-specific (optimal) threshold. For the Bayesian approach ( |
Inacio de Carvalho, V., de Carvalho, M. and Branscum, A. J. (2017). Nonparametric Bayesian Covariate-Adjusted Estimation of the Youden Index. Biometrics, 73, 1279-1288.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.
cROC.bnp
, cROC.kernel
or cROC.sp
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) ### Threshold values based on the YI th_cROC_bnp_yi <- compute.threshold.cROC(cROC_bnp, criterion = "YI") # Plot results # Threshold values plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values based on the Youden Index") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"ql"], lty = 2) # Youden Index plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,1), main = "Threshold values based on the Youden Index") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"ql"], lty = 2) ### Threshold values for a fixed FPF th_cROC_bnp_fpf <- compute.threshold.cROC(cROC_bnp, criterion = "FPF", FPF = 0.1) # Plot results # Threshold values plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values for a FPF = 0.1") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) ### Threshold values based on the YI th_cROC_bnp_yi <- compute.threshold.cROC(cROC_bnp, criterion = "YI") # Plot results # Threshold values plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values based on the Youden Index") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$thresholds[,"ql"], lty = 2) # Youden Index plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,1), main = "Threshold values based on the Youden Index") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_yi$YI[,"ql"], lty = 2) ### Threshold values for a fixed FPF th_cROC_bnp_fpf <- compute.threshold.cROC(cROC_bnp, criterion = "FPF", FPF = 0.1) # Plot results # Threshold values plot(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"est"], type = "l", xlab = "Age", ylab = "log(PSA)", ylim = c(0,3), main = "Threshold values for a FPF = 0.1") lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"qh"], lty = 2) lines(th_cROC_bnp_yi$newdata$age, th_cROC_bnp_fpf$thresholds[["0.1"]][,"ql"], lty = 2)
This function implements methods for estimating pooled ROC-based threshold values.
compute.threshold.pooledROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
compute.threshold.pooledROC(object, criterion = c("FPF", "TPF", "YI"), FPF, TPF, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
object |
An object of class |
criterion |
A character string indicating if the threshold value should be computed based on the Youden index (“YI”), or for fixed false positive fractions (“FPF”) or true positive fractions (“TPF”). |
FPF |
For |
TPF |
For |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates pooled ROC-based threshold values based on three different criteria, namely, the Youden index (YI), one that gives rise to a pre-specified FPF, and one that gives rise to a pre-specified TPF.
The Youden Index is defined as
where
The value that achieves the maximum is called the optimal YI threshold. Regarding the criterion for a fixed FPF, the threshold value is obtained as follows
and for a fixed TPF we have
As a result, the function provides a list with the following components:
call |
The matched call. |
threshold |
If method = "YI", the estimated (optimal) threshold corresponding to the Youden index (the one that maximises TPF/sensitivity + TNF/specificity). If method = "FPF", the estimated threshold corresponding to the specified FPF, and if method = "TPF", the estimated threshold corresponding to the specified TPF. For the Bayesian approaches ( |
YI |
If method = "YI", the estimated Youden index. For the Bayesian approaches ( |
FPF |
If method = "YI" or method = "TPF", the FPF corresponding to the estimated (optimal) threshold (For the Bayesian approaches ( |
TPF |
If method = "YI" or method = "FPF", the TPF/sensitivity corresponding to the estimated (optimal) threshold. For the Bayesian approaches ( |
Rutter, C.M. and Miglioretti, D. L. (2003). Estimating the Accuracy of Psychological Scales Using Longitudinal Data. Biostatistics, 4, 97–107.
Youden, W. J. (1ci.level
*1000). Index for rating diagnostic tests. Cancer, 3, 32–35.
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE, compute.DIC = TRUE, prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1)) ## Threshold values based on the YI th_m0_dpm_yi <- compute.threshold.pooledROC(m0_dpm, criterion = "YI") th_m0_dpm_yi$threshold th_m0_dpm_yi$YI ### Threshold values for a fixed FPF th_m0_dpm_fpf <- compute.threshold.pooledROC(m0_dpm, criterion = "FPF", FPF = 0.1) th_m0_dpm_fpf$threshold
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE, compute.DIC = TRUE, prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1)) ## Threshold values based on the YI th_m0_dpm_yi <- compute.threshold.pooledROC(m0_dpm, criterion = "YI") th_m0_dpm_yi$threshold th_m0_dpm_yi$YI ### Threshold values for a fixed FPF th_m0_dpm_fpf <- compute.threshold.pooledROC(m0_dpm, criterion = "FPF", FPF = 0.1) th_m0_dpm_fpf$threshold
This function estimates the covariate-specific ROC curve (cROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho et al. (2013).
cROC.bnp(formula.h, formula.d, group, tag.h, data, newdata, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol(), prior.h = priorcontrol.bnp(), prior.d = priorcontrol.bnp(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
cROC.bnp(formula.h, formula.d, group, tag.h, data, newdata, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol(), prior.h = priorcontrol.bnp(), prior.d = priorcontrol.bnp(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
formula.h |
A |
formula.d |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
A data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if computed) will be computed. If not supplied, the function |
standardise |
A logical value. If TRUE both the test outcomes and the continuous covariates assumed to have a linear effect are standardised (i.e., the resulting variables have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion (DIC)(Celeux et al., 2006, Spiegelhalter et al., 2002) is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
Hyparameter specification for the healthy population. A list of control values to replace the default values returned by the function |
prior.d |
Hyparameter specification for the diseased population. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-specific ROC curve (cROC) defined as
where
Note that, for the sake of clarity, we assume that the covariates of interest are the same in both the healthy and diseased populations. The method implemented in this function estimates and
by means of a single-weights linear dependent Dirichlet process mixture of normals model (De Iorio et al., 2009). More precisely, and letting
and
be two independent random samples from the nondiseased and diseased populations, respectively, our postulated model for the conditional distribution in each group function takes the following form
where denotes the cumulative distribution function of the normal distribution, evaluated at
, with mean
and variance
. The regression function
can incorportate both linear and nonlinear (through B-splines) effects of continuous covariates, categorical covariates (factors) as well as interactions. Interactions between categorical and (nonlinear) continuous covariates are also allowed (factor-by curve interactions). For the sake of simplicity we write
, where
is the
th column of the design matrix (possibly containing a basis representation of some/all continuous covariates),
. Here
is a pre-specified upper bound on the number of mixture components. The
's result from a truncated version of the stick-breaking construction (
;
,
;
Beta
;
,
),
, and
. It is further assumed that
and
. Here
denotes a Wishart distribution with
degrees of freedom and expectation
, Here
denotes a Gamma distribution with shape parameter
and rate parameter
, and
denotes the dimension of the vector
. It is worth mentioning that when
, the model for the conditional distribution of the test outcomes reduces to a normal regression model (where continuous covariates effects are modelled either parametrically or nonparametrically). For a detailed description, we refer to Inacio de Carvalho et al. (2013).
The covariate-specific area under the curve is
When the upper bound on the number of mixture components is 1, i.e., (
), there is a closed-form expression for the covariate-specific AUC (binormal model), which is used in the package. In contrast, when
or
, the integral is computed numerically using Simpson's rule. With regard to the partial area under the curve, when
focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
As for the AUC, when (
), there is a closed-form expression for the
(Hillis and Metz, 2012), and when
or
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC,
so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again, when
(
), there is a closed-form expression for the
(Hillis and Metz, 2012), and when
or
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
It is worth referring that with respect to the computation of the DIC, when , it is computed as in Spiegelhalter et al. (2002), and when
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (as well as the AUC, pAUC, dens and reg.fun, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curve has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values in the healthy and diseased populations. |
ROC |
A list returning the |
AUC |
Estimated area under the covariate-specific ROC curve (posterior mean) and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve (posterior mean) and |
dens |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: |
reg.fun |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a data frame containing the predicted regression function (posterior mean) and |
lpml |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Named list of length two, with components |
data_model |
A list with the data used in the fit: observed diagnostic test outcome and design matrices, separately for the healthy and diseased groups. |
The input arguments formula.h
and formula.d
are similar to that used for the glm
function, except that flexible specifications can be added by means of the function f()
. For instance, specification would assume a linear effect of
x1
(if x1
continuous) and the effect of x2
would be modeled using B-splines basis functions. The argument K = 3
indicates that 3
internal knots will be used, with the quantiles of x2
used for their location. Categorical variables (factors) can be also incorporated, as well as interaction terms. For example, to include the factor-by-curve interaction between age
and gender
we need to specify, e.g., . Note that, in this case, the number of knots can be different for each level of the factor. The order of the vector
K
of knots should match the levels of the factor.
Celeux, G., Forbes, F., Robert C. P., and Titerrington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–674.
De Iorio, M., Johnson, W. O., Muller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762–775.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Inacio de Carvalho, V., Jara, A., Hanson, T. E., and de Carvalho, M. (2013). Bayesian nonparametric ROC regression modeling. Bayesian Analysis, 8, 623–646.
Speigelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model comparison and fit. Journal of the Royal Statistical Society, Ser. B, 64, 583–639.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(cROC_bnp) plot(cROC_bnp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(cROC_bnp) plot(cROC_bnp)
This function estimates the covariate-specific ROC curve (cROC) using the nonparametric kernel-based method proposed by Rodriguez-Alvarez et al. (2011). The method, as it stands now, can only deal with one continuous covariate.
cROC.kernel(marker, covariate, group, tag.h, bw = c("LS", "AIC"), regtype = c("LC", "LL"), data, newdata, pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
cROC.kernel(marker, covariate, group, tag.h, bw = c("LS", "AIC"), regtype = c("LC", "LL"), data, newdata, pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
covariate |
A character string with the name of the continuous covariate. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
bw |
A character string specifying which method to use to select the bandwidths. AIC specifies expected Kullback-Leibler cross-validation, and LS specifies least-squares cross-validation. Defaults to LS. For details see |
regtype |
A character string specifying which type of kernel estimator to use for the regression function (see Details). LC specifies a local-constant estimator (Nadaraya-Watson) and LL specifies a local-linear estimator. Defaults to LC. For details see |
data |
Data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if computed) will be computed. If not supplied, the function |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-specific ROC curve (cROC) defined as
where
Note that, for the sake of clarity, we assume that the covariate of interest is the same in both healthy and diseased populations. In particular, the method implemented in this function estimates and
assuming a nonparametric location-scale regression model for
in each population separately, i.e.,
where ,
(regression function),
,
(variance functions), and
and
have zero mean, variance one, and distribution functions
and
, respectively. In this case, the covariate-specific ROC curve can be expressed as
where ,
, and
and
are the distribution functions of
and
, respectively.
By default, for both the healthy and diseased population, both the regression and variance functions are estimated using the Nadaraya-Watson estimator (LC), and the bandwidth are selected using least-squares cross-validation (LS). Implementation relies on the
R
-package np
. No assumptions are made about and
, which are empirically estimated on the basis of standardised residuals.
The covariate-specific area under the curve is
and is computed numerically (using Simpson's rule). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where again the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again, the computation of the integral is done via Simpson's rule. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
covariate |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-specific ROC curve (AROC), and |
AUC |
Estimated area under the covariate-specific ROC curve, and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve and |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component of the list contains the following information: (1) |
Hayfield, T., and Racine, J. S.(2008). Nonparametric Econometrics: The np Package. Journal of Statistical Software 27(5). URL http://www.jstatsoft.org/v27/i05/.
Rodriguez-Alvarez, M. X., Roca-Pardinas, J., and Cadarso-Suarez, C. (2011). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21, 483–499.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.kernel
or cROC.sp
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_kernel <- cROC.kernel(marker = "l_marker1", covariate = "age", group = "status", tag.h = 0, data = newpsa, bw = "LS", regtype = "LC", p = seq(0, 1, len = 101), pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), B = 500) plot(cROC_kernel) summary(cROC_kernel )
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_kernel <- cROC.kernel(marker = "l_marker1", covariate = "age", group = "status", tag.h = 0, data = newpsa, bw = "LS", regtype = "LC", p = seq(0, 1, len = 101), pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), B = 500) plot(cROC_kernel) summary(cROC_kernel )
This function estimates the covariate-specific ROC curve (cROC) using the parametric approach proposed by Faraggi (2003) and the semiparametric approach proposed by Pepe (1998).
cROC.sp(formula.h, formula.d, group, tag.h, data, newdata, est.cdf = c("normal", "empirical"), pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
cROC.sp(formula.h, formula.d, group, tag.h, data, newdata, est.cdf = c("normal", "empirical"), pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
formula.h |
A |
formula.d |
A |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying the healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
newdata |
Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) will be computed. If not supplied, the function |
est.cdf |
A character string. It indicates how the conditional distribution functions of the diagnostic test in healthy and diseased populations are estimated. Options are "normal" and "empirical" (see Details). The default is "normal". |
pauc |
A list of control values to replace the default values returned by the function |
p |
Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. By default 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the covariate-specific ROC curve (cROC) defined as
where
Note that, for the sake of clarity, we assume that the covariates of interest are the same in both healthy and diseased populations. In particular, the method implemented in this function estimates and
assuming a (semiparametric) location regression model for
in each population separately, i.e.,
such that the covariate-specific ROC curve can be expressed as
where ,
, and
and
are the distribution functions of
and
, respectively. In line with the assumptions made about the distributions of
and
, estimators will be referred to as: (a) "normal", where Gaussian errors are assumed, i.e.,
(Faraggi, 2003); and, (b) "empirical", where no assumptios are made about the distribution (in this case,
and
are empirically estimated on the basis of standardised residuals (Pepe, 1998)).
The covariate-specific area under the curve is
When Gaussian errors are assumed, there is a closed-form expression for the covariate-specific AUC, which is used in the package. In contrast, when no assumptios are made about the distributionis of the errors, the integral is computed numerically using Simpson's rule. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
newdata |
A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed. |
data |
The original supplied data argument. |
missing.ind |
A logical value indicating whether for each pair of observations (test outcomes and covariates) missing values occur. |
marker |
The name of the diagnostic test variable in the dataframe. |
group |
The value of the argument |
tag.h |
The value of the argument |
formula |
Named list of length two with the value of the arguments |
est.cdf |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the covariate-specific ROC curves have been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated covariate-specific ROC curve, and |
AUC |
Estimated area under the covariate-specific ROC curve, and |
pAUC |
If computed, estimated partial area under the covariate-adjusted ROC curve and |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component contains an object of class |
coeff |
Estimated regression coefficients (and |
Faraggi, D. (2003). Adjusting receiver operating characteristic curves and related indices for covariates. The Statistician 52, 179–192.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Pepe, M.S. (1998). Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics 54, 124–135.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 50) df.pred <- data.frame(age = agep) cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age, formula.d = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, newdata = df.pred, est.cdf = "normal", pauc = list(compute = TRUE, value = 0.5, focus = "FPF"), p = seq(0, 1, l = 101), B = 500) summary(cROC_sp_normal) plot(cROC_sp_normal)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 50) df.pred <- data.frame(age = agep) cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age, formula.d = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, newdata = df.pred, est.cdf = "normal", pauc = list(compute = TRUE, value = 0.5, focus = "FPF"), p = seq(0, 1, l = 101), B = 500) summary(cROC_sp_normal) plot(cROC_sp_normal)
Selects an adequate set of points from a data set for obtaining predictions
cROCData(data, names.cov, group)
cROCData(data, names.cov, group)
data |
Data set from which the new set of covariate values is obtained. |
names.cov |
Character vector with the names of the covariates to be included in the new data set. |
group |
A character string with the name of the variable in the original data set that distinguishes healthy from diseased individuals. |
A data frame containing selected values of all needed covariates. For those that are continuous, 30 different values are selected.
AROC.bnp
, cROC.bnp
, cROC.sp
, cROC.kernel
, compute.threshold.cROC
or compute.threshold.AROC
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) newdf <- cROCData(newpsa, "age", "status") summary(newdf)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) newdf <- cROCData(newpsa, "age", "status") summary(newdf)
This function is used to set various parameters controlling the estimation of the (conditional) density (densities) of test outcomes in both the healthy and diseased groups.
densitycontrol(compute = FALSE, grid.h = NA, grid.d = NA)
densitycontrol(compute = FALSE, grid.h = NA, grid.d = NA)
compute |
Logical value. If TRUE the (conditional) density (densities) of test outcomes in each group, healthy and diseased, are estimated. |
grid.h |
Grid of test outcomes in the healthy group where the (conditional) density (densities) estimates are to be evaluated. Value |
grid.d |
Grid of test outcomes in the diseased group where the (conditional) density (densities) estimates are to be evaluated. Value |
The value returned by this function is used as a control argument of the cROC.bnp
and pooledROC.dpm
functions.
A list with components for each of the possible arguments.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
This function is used to set various parameters controlling the estimation of the conditional densities of test outcomes in the healthy group.
densitycontrol.aroc(compute = FALSE, grid.h = NA, newdata = NA)
densitycontrol.aroc(compute = FALSE, grid.h = NA, newdata = NA)
compute |
Logical value. If TRUE the conditional densities of test outcomes in the healthy group are estimated. |
grid.h |
Grid of test outcomes in the healthy group where the conditional density estimates are to be evaluated. Value |
newdata |
Data frame containing the values of the covariates at which the conditional density estimates are computed. |
The value returned by this function is used as a control argument of the AROC.bnp
function.
A list with components for each of the possible arguments.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 5) df.pred <- data.frame(age = agep) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = df.pred), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1) )
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 5) df.pred <- data.frame(age = agep) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol.aroc(compute = TRUE, grid.h = NA, newdata = df.pred), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1) )
The endosyn
data set was simulated based on the data analysed in Rodriguez-Alvarez et al. (2011a,b) and Inacio de Carvalho and Rodriguez-Alvarez (2018); and presented in Botana et al. (2007) and Tome et al. (2008). The aim of these studies was to use the body mass index (BMI) to detect patients having a higher risk of cardiovascular problems, ascertaining the possible effect of age and gender on the accuracy of this measure.
data("endosyn")
data("endosyn")
A data frame with 2840 observations on the following 4 variables.
gender
patient's gender. Factor with Men
and Women
levels.
age
patient's age.
cvd_idf
true disease status (presence/absence of two of more cardiovascular risk factors according to the International Diabetes Federation). Numerical vector (0 = absence, 1 = presence).
bmi
patient's body mass index.
Botana, M.A., Mato, J.A., Cadarso-Suarez, C., Tome, M.A., Perez-Fernandez, R., Fernandez-Mario, A., Rego-Iraeta, A., Solache, I. (2007). Overweight, obesity and central obesity prevalences in the region of Galicia in Northwest Spain. Obesity and Metabolism, 3, 106–115.
Tome, M.A., Botana, M.A., Cadarso-Suarez, C., Rego-Iraeta, A., Fernandez-Mario, A., Mato, J.A, Solache, I., Perez-Fernandez, R. (2008). Prevalence of metabolic syndrome in Galicia (NW Spain) on four alternative definitions and association with insulin resistance. Journal of Endocrinological Investigation, 32, 505–511.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
Rodriguez-Alvarez, M.X., Roca-Pardinas, J. and Cadarso-Suarez, C. (2011a). ROC curve and covariates: extending induced methodology to the non-parametric framework. Statistics and Computing, 21(4), 483–49.
Rodriguez- Alvarez, M.X., Roca-Pardinas, J. and Cadarso-Suarez, C. (2011b). A new flexible direct ROC regression model - Application to the detection of cardiovascular risk factors by anthropometric measures. Computational Statistics and Data Analysis, 55(12), 3257–3270.
data(endosyn) summary(endosyn)
data(endosyn) summary(endosyn)
This function is used to set various parameters controlling the Markov chain Monte Carlo (MCMC) parameters.
mcmccontrol(nsave = 8000, nburn = 2000, nskip = 1)
mcmccontrol(nsave = 8000, nburn = 2000, nskip = 1)
nsave |
An integer giving the total number of scans to be saved (does not include the burn-in and thinning iterations). |
nburn |
An integer giving the number of burn-in scans. |
nskip |
An integer giving the thinning interval, |
The value returned by this function is used as a control argument of the AROC.bnp
, cROC.bnp
, and pooledROC.dpm
functions.
A list with components for each of the possible arguments.
pooledROC.dpm
, AROC.bnp
and cROC.bnp
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1))
Used to set various parameters controlling the estimation of the partial area under the covariate-adjusted/covariate-specific/pooled ROC curve .
pauccontrol(compute = FALSE, focus = c("FPF", "TPF"), value = 1)
pauccontrol(compute = FALSE, focus = c("FPF", "TPF"), value = 1)
compute |
Logical value. If TRUE the partial area under the covariate-adjusted/covariate-specific/pooled ROC curve is estimated. |
focus |
Whether computation should be done over a restricted range of false positive fractions (FPF) or a restricted range of true positive fractions (TPF). |
value |
Numeric value. Pre-specified upper bound for the FPF (if focus = "FPF") or lower bound for the TPF (if focus = "TPF"). |
The value returned by this function is used as a control argument of the AROC.bnp
, AROC.sp
, AROC.kernel
, cROC.bnp
, cROC.sp
, cROC.kernel
functions.
A list with components for each of the possible arguments.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 5) df.pred <- data.frame(age = agep) cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age, formula.d = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, newdata = df.pred, est.cdf = "normal", pauc = list(compute = TRUE, value = 0.5, focus = "FPF"), p = seq(0, 1, l = 101), B = 10)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) # Covariate for prediction agep <- seq(min(newpsa$age), max(newpsa$age), length = 5) df.pred <- data.frame(age = agep) cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age, formula.d = l_marker1 ~ age, group = "status", tag.h = 0, data = newpsa, newdata = df.pred, est.cdf = "normal", pauc = list(compute = TRUE, value = 0.5, focus = "FPF"), p = seq(0, 1, l = 101), B = 10)
Takes a fitted AROC
object produced by AROC.bnp()
, AROC.sp()
, or AROC.kernel()
and plots the covariate-adjusted ROC curve (AROC) and associated area under the AROC (AAUC).
## S3 method for class 'AROC' plot(x, main = NULL, ...)
## S3 method for class 'AROC' plot(x, main = NULL, ...)
x |
An object of class |
main |
Character string with the overall title for the plot. If NULL, the default, the method used to estimate the AROC curve is depicted. |
... |
Further arguments passed to or from other methods. |
AROC.bnp
, AROC.sp
or AROC.kernel
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE) plot(AROC_bnp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE) plot(AROC_bnp)
Takes a fitted cROC
object produced by cROC.bnp()
, cROC.sp()
, or cROC.kernel()
and plots the covariate-specific ROC curve (cROC) and associated area under the cROC curve. The suitable type of graphic is chosen according to the number and nature of the covariates.
## S3 method for class 'cROC' plot(x, ask = TRUE, ...)
## S3 method for class 'cROC' plot(x, ask = TRUE, ...)
x |
An object of class |
ask |
A logical value. If TRUE, the default, the user is asked for confirmation, before a new figure is drawn. |
... |
Further arguments passed to or from other methods. |
cROC.bnp
, cROC.sp
or cROC.kernel
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) plot(cROC_bnp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) plot(cROC_bnp)
Takes a fitted pooledROC
object produced by pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, or pooledROC.dpm
and plots the pooled ROC curve and associated area under the ROC curve (AUC).
## S3 method for class 'pooledROC' plot(x, main = NULL, ...)
## S3 method for class 'pooledROC' plot(x, main = NULL, ...)
x |
An object of class |
main |
Character string with the overall title for the plot. If NULL, the default, the method used to estimate the pooled ROC curve is depicted. |
... |
Further arguments passed to or from other methods. |
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)
This function estimates the pooled ROC curve using the Bayesian bootstrap estimator proposed by Gu et al. (2008).
pooledROC.BB(marker, group, tag.h, data, p = seq(0, 1, l = 101), B = 5000, ci.level = 0.95, pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
pooledROC.BB(marker, group, tag.h, data, p = seq(0, 1, l = 101), B = 5000, ci.level = 0.95, pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. |
B |
An integer value specifying the number of Bayesian bootstrap resamples. By default 5000. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the pooled ROC curve (ROC) defined as
where
The method implemented in this function makes use of the equivalence (see Gu et al., 2008)
and estimates both and the outer probability using the Bayesian bootstrap resampling distribution.
Regarding the area under the curve, we note that
where . In our implementation, the expectation is computed using the Bayesian bootstrap (using the same weights as those used to estimate the pooled ROC). As far as the partial area under the curve is concerned, when
focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where . Again, the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAUC,
so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
. Thus
where , and the expectation is computed using the Bayesian bootstrap. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve (posterior mean) and |
weights |
list with the Dirichlet weights (involved in the estimation) in the healthy (h) and diseased (d) groups. These are matrices of dimension n0 x B and n1 x B, where n0 is the number of healthy individuals and n1 is the number of diseased individuals. |
Gu, J., Ghosal, S., and Roy, A. (2008). Bayesian bootstrap estimation of ROC curve. Statistics in Medicine, 27, 5407–5420.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_BB <- pooledROC.BB(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 5000) summary(m0_BB) plot(m0_BB)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_BB <- pooledROC.BB(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 5000) summary(m0_BB) plot(m0_BB)
This function estimates the pooled ROC curve using a Dirichlet process mixture of normals model as proposed by Erkanli et al. (2006).
pooledROC.dpm(marker, group, tag.h, data, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol(), prior.h = priorcontrol.dpm(), prior.d = priorcontrol.dpm(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
pooledROC.dpm(marker, group, tag.h, data, standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95, compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE, pauc = pauccontrol(), density = densitycontrol(), prior.h = priorcontrol.dpm(), prior.d = priorcontrol.dpm(), mcmc = mcmccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE the test outcomes are standardised (so that the resulting test outcomes have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. This set is also used to compute the area under the ROC curve (AUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
Hyparameter specification for the healthy population. A list of control values to replace the default values returned by the function |
prior.d |
Hyparameter specification for the diseased population. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the pooled ROC curve (ROC) defined as
where
The method implemented in this function estimates and
by means of a Dirichlet process mixture of normals model. More precisely, and letting
and
be two independent random samples from the nondiseased and diseased populations, respectively, the model postulated for the distribution function is as follows
where is pre-specified is a pre-specified upper bound on the number of mixture components (
). The
's result from a truncated version of the stick-breaking construction (
;
,
;
Beta
;
,
),
, and
.
The area under the curve is
When the upper bound on the number of mixture components is 1, i.e., (
), there is a closed-form expression for the AUC (binormal model), which is used in the package. In contrast, when
or
, the AUC is computed using results presented in Erkanli et al. (2006). With regard to the partial area under the curve, when
focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
As for the AUC, when (
), there is a closed-form expression for the
(Hillis and Metz, 2012), and when
or
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC,
so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again, when
(
), there is a closed-form expression for the
(Hillis and Metz, 2012), and when
or
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
It is worth referring that with respect to the computation of the DIC, when , it is computed as in Spiegelhalter et al. (2002), and when
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values in the healthy and diseased populations. |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve (posterior mean) and |
dens |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: |
lpml |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with the following information: (1) |
Erkanli, A., Sung M., Jane Costello, E., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine, 25, 3905–3928.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE, compute.DIC = TRUE, prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1)) summary(m0_dpm) plot(m0_dpm)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE, compute.DIC = TRUE, prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1, L =10), mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1)) summary(m0_dpm) plot(m0_dpm)
This function estimates the pooled ROC curve using the empirical estimator proposed by Hsieh and Turnbull (1996).
pooledROC.emp(marker, group, tag.h, data, p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, method = c("ncoutcome", "coutcome"), pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
pooledROC.emp(marker, group, tag.h, data, p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95, method = c("ncoutcome", "coutcome"), pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
method |
A character string specifying if bootstrap resampling (for the confidence intervals) should be done with or without regard to the disease status (“coutcome” or “noutcome”). In both cases, a naive bootstrap is used. By default, the resampling is done conditionally on the disease status. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the pooled ROC curve (ROC) defined as
where
The method implemented in this function estimates and
by means of the empirical dsitributions. More precisely, and letting
and
be two independent random samples from the nondiseased and diseased populations, respectively, the distribution functions in each group take the form
The area under the curve is
and is estimated empirically by means of the Mann-Whitney U-statistic. With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where again is estimated empirically. The returned value is the normalised pAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again,
is estimated empirically. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve along with its |
Hsieh, F., and Turnbull, B.W. (1996). Nonparametric and semiparametric estimation of the receiver operating characteristic curve, The Annals of Statistics, 24, 25–40.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 10, method = "coutcome", pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF")) summary(m0_emp) plot(m0_emp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 10, method = "coutcome", pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF")) summary(m0_emp) plot(m0_emp)
This function estimates the pooled ROC curve using the kernel-based density estimator proposed by Zhou et al. (1997).
pooledROC.kernel(marker, group, tag.h, data, p = seq(0, 1, l = 101), bw = c("SRT", "UCV"), B = 1000, ci.level = 0.95, method = c("ncoutcome", "coutcome"), pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
pooledROC.kernel(marker, group, tag.h, data, p = seq(0, 1, l = 101), bw = c("SRT", "UCV"), B = 1000, ci.level = 0.95, method = c("ncoutcome", "coutcome"), pauc = pauccontrol(), parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. This set is also used to compute the area under the ROC curve (AUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
bw |
A character string specifying the density bandwidth selection method. “SRT”: Silverman's rule-of-thumb; “UCV”: unbiased cross-validation. The default is “SRT”. |
B |
An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. The default is 1000. |
ci.level |
An integer value (between 0 and 1) specifying the confidence level. The default is 0.95. |
method |
A character string specifying if bootstrap resampling (for the confidence intervals) should be done with or without regard to the disease status (“coutcome” or “noutcome”). In both cases, a naive bootstrap is used. By default, the resampling is done conditionally on the disease status. |
pauc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Estimates the pooled ROC curve (ROC) defined as
where
The method implemented in this function estimates and
by means of kernel methods. More precisely, and letting
and
be two independent random samples from the nondiseased and diseased populations, respectively, the distribution functions in each group take the form
where stands for the standard normal distribution function evaluated at
. For the bandwidth
,
which controls the amount of smoothing, two options are available. When
bw = "SRT"
,
where and
are the standard deviation and interquantile range, respectively, of
. In turn, when
bw = "UCV"
, the bandwidth is selected via unbiased cross-validation, for further details we refer to bw.ucv
from the base package stats
.
The area under the curve is
and is computed numerically (using Simpson's rule). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound for the FPF, what it is computed is
where again the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, so that it ranges from
(useless test) to 1 (perfect marker). Conversely, when
focus = "TPF"
, and assuming a lower bound for the TPF of , the partial area corresponding to TPFs lying in the interval
is computed as
where is a
rotation of the ROC curve, and it can be expressed as
Again, the computation of the integral is done via Simpson's rule. The returned value is the normalised pAUC,
, so that it ranges from
(useless test) to 1 (perfect test).
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
bws |
Named list of length two with components 'h' (healthy) and 'd' (diseased). Each component is a numeric value with the selected bandwidth. |
bw |
The value of the argument |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve along with its |
Zou, K.H., Hall, W.J., Shapiro, D.E. (1997) Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine, 16, 2143–2156.
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_kernel <- pooledROC.kernel(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), bw = "SRT", B = 500, method = "coutcome", pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF")) summary(m0_kernel) plot(m0_kernel)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_kernel <- pooledROC.kernel(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), bw = "SRT", B = 500, method = "coutcome", pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF")) summary(m0_kernel) plot(m0_kernel)
Implements posterior predictive checks for objects of AROC
or cROC
as produced by AROC.bnp
or cROC.bnp
.
predictive.checks(object, statistics = c("min", "max", "kurtosis", "skewness"), ndensity = 512, devnew = TRUE)
predictive.checks(object, statistics = c("min", "max", "kurtosis", "skewness"), ndensity = 512, devnew = TRUE)
object |
An object of class |
statistics |
Character vector. Statistics to be used for the posterior predictive checking. By default, "min", "max", "kurtosis" and "skewness" |
ndensity |
An integer giving the number of equally spaced points at which the density of the test outcomes and of the simulated datasets from the posterior predictive distribution (see more on Details) is to be estimated (for more details see the help of the function |
devnew |
A logical value. If TRUE, each plot is depicted in a new graphic device. |
Compares a selected test statistic computed based on the observed diagnostic test outcome (either in the nondiseased group, AROC
object, or in both the nondiseased and diseased groups, cROC
object) against the same test statistics computed based on simulated data from the posterior predictive distribution of the diagnostic test outcome obtained using a single-weights linear dependent Dirichlet process mixture of normals model.
The following graphics are depicted: (1) histograms of the desired statistics computed from simulated datasets (nsave of them) from the posterior predictive distribution of the diagnostic test outcome. In these plots, the estimated statistics from the observed diagnostic test outcome are also depicted. (2) Kernel density estimates computed computed from simulated datasets (nsave of them) from the posterior predictive distribution of the diagnostic test outcome. In these plots, the kernel density estimate of the observed diagnostic test outcome is also depicted. In the case of an object of class AROC
, the abovementioned graphics are depicted only for the diagnostic test outcome in the nondiseased group. However, for an object of class cROC
, the graphics are depicted, separately, for both the nondiseased and diseased groups. For a detailed discussion about predictive checks, see Gabry et al. (2019).
As a result, the function provides a list with the following components:
yrep |
List of matrices associated with the diseased (d) and nondiseased (h) groups. Each column of the matrix (there are nsave of them) corresponds to a dataset generated from the posterior predictive distribution of the diagnostic test outcomes. For |
y |
List of numeric vectors associated with the diseased (d) and nondiseased (h) groups. The vector contains the observed diagnostic test outcomes. For |
Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society, Series A, 182, 1–14.
Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2022). The Covariate-Adjusted ROC Curve: The Concept and Its Importance, Review of Inferential Methods, and a New Bayesian Estimator. Statistical Science, 37, 541 -561.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) predictive.checks(AROC_bnp, statistics = "skewness")
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) predictive.checks(AROC_bnp, statistics = "skewness")
AROC
objects
Default print method for objects fitted with AROC.bnp()
, AROC.sp()
, and AROC.kernel()
functions.
## S3 method for class 'AROC' print(x, ...)
## S3 method for class 'AROC' print(x, ...)
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
A short summary is printed including the area under the covariate-adjusted ROC curve (AAUC), and if required, the partial area under the covariate-adjusted ROC curve (pAAUC).
AROC.bnp
, AROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) AROC_bnp
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) AROC_bnp <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) AROC_bnp
cROC
objects
Default print method for objects fitted with cROC.bnp()
, cROC.sp()
and cROC.kernel()
functions.
## S3 method for class 'cROC' print(x, ...)
## S3 method for class 'cROC' print(x, ...)
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
A short summary is printed.
cROC.bnp
, cROC.sp
or cROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) cROC_bnp
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) cROC_bnp
pooledROC
objects
Default print method for objects fitted with pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
functions.
## S3 method for class 'pooledROC' print(x, ...)
## S3 method for class 'pooledROC' print(x, ...)
x |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
A short summary is printed including the area under the pooled ROC curve (AUC).
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)
AROC.bnp
and cROC.bnp
This function is used to set various parameters controlling the prior information to be used in the AROC.bnp
and cROC.bnp
functions.
priorcontrol.bnp(m0 = NA, S0 = NA, nu = NA, Psi = NA, a = 2, b = NA, alpha = 1, L = 10)
priorcontrol.bnp(m0 = NA, S0 = NA, nu = NA, Psi = NA, a = 2, b = NA, alpha = 1, L = 10)
m0 |
A numeric vector. Hyperparameter; mean vector of the (multivariate) normal prior distribution for the mean of the normal component of the centring distribution. |
S0 |
A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal prior distribution for the mean of the normal component of the centring distribution. |
nu |
A numeric value. Hyperparameter; degrees of freedom of the Wishart prior distribution for the precision matrix of the the normal component of the centring distribution. |
Psi |
A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the the normal component of the centring distribution. |
a |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. The default is 2. |
b |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. |
alpha |
A numeric value. Precision parameter of the Dirichlet Process. The default is 1. |
L |
A numeric value. Upper bound on the number of mixture components. Setting L = 1 corresponds to a normal model. The default is 10. |
A list with components for each of the possible arguments.
pooledROC.dpm
This function is used to set various parameters controlling the prior information to be used in the pooledROC.dpm
function.
priorcontrol.dpm(m0 = NA, S0 = NA, a = 2, b = NA, alpha = 1, L = 10)
priorcontrol.dpm(m0 = NA, S0 = NA, a = 2, b = NA, alpha = 1, L = 10)
m0 |
A numeric value. Hyperparameter; mean of the normal prior distribution for the means of each component. |
S0 |
A numeric value. Hyperparameter; variance of the normal prior distribution for the means of each component. |
a |
A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precisions (inverse variances) of each component. The default is 2. |
b |
A numeric value. Hyperparameter; rate parameter of the gamma prior distribution for the precisions (inverse variances) of each component. |
alpha |
A numeric value. Precision parameter of the Dirichlet Process. The default is 1. |
L |
A numeric value. Upper bound on the number of mixture components. Setting L=1 corresponds to a normal model. The default is 10. |
A list with components for each of the possible arguments.
The dataset contains 71 prostate cases and 71 controls who participated in a lung cancer prevention trial (CARET, Beta-carotene and retinol trial). For details, see Etzioni et al. (1999) and Pepe (2003).
data("psa")
data("psa")
A data frame with 683 observations on the following 6 variables.
id
Patient identifier.
marker1
Total prostate specific antigen (PSA).
marker2
Free prostate specific antigen (PSA)
status
Presence/absence of prostate cancer. The non-cancer patients are controls matched to cases on age and number of serum samples available for analysis (see Details).
age
Patient age at blood draw (serum sample).
t
Time (years) relative to prostate cancer diagnosis.
The CARET enrolled 12000 men, aged between 50 and 65 years, at high risk of lung cancer. For each subject on the study, serum samples were drawn at baseline and at two-year intervals after that. The data presented here represent a subsample of the original sample, and it was reported by Etzioni et al. (1999). It contains 71 cases of prostate cancer that occurred during the study. All these cases had, at least, three and up to eight serum samples. As far as controls are concerned, they were selected from the participants of the CARET study verifying that they had not been diagnosed with prostate cancer by the time of the original study, and the selection was done by matching to cases on date of birth and number of serum samples available for analysis.
The dataset can be downloaded from https://research.fredhutch.org/diagnostic-biomarkers-center/en/datasets.html.
Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series. Oxford University Press, New York.
Etzioni, R., Pepe, M. S., Longton, G., Hu. C., and Goodman, G. (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making, 19(3), 242-251.
data(psa) summary(psa)
data(psa) summary(psa)
AROC
objects
Default summary method for objects fitted with AROC.bnp()
, AROC.sp()
, or AROC.kernel()
functions.
## S3 method for class 'AROC' summary(object, ...)
## S3 method for class 'AROC' summary(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
The information printed depends on the method. In all cases, the call to the function, the method, the area under the covariate-adjusted ROC curve (AAUC), the partial area under the covariate-adjusted ROC curve (if required) (AAUC), and the sample sizes are printed. For the semiparametric approach (AROC.sp()
), the estimated coefficients (and 95% confidence intervals, if required) of the model for the healthy population are printed. In addition, the function provides the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For the nonparametric Bayesian approach (AROC.bnp()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC). For the kernel-based approach (AROC.kernel()
), information regarding the selected bandwidth and the type of kernel estimator (for both regression and variance functions) is printed.
AROC.bnp
, AROC.sp
or AROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0 <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) summary(m0)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0 <- AROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0,1,l=101), compute.lpml = TRUE, compute.WAIC = TRUE) summary(m0)
cROC
objects
Default summary method for objects fitted with cROC.bnp()
, cROC.sp()
, or cROC.kernel()
functions.
## S3 method for class 'cROC' summary(object, ...)
## S3 method for class 'cROC' summary(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
The information printed depends on the method. In all cases, the call to the function, the method, and the sample sizes are printed. For the semiparametric approach (cROC.sp()
), the estimated coefficients (and 95% confidence intervals, if required) of the model for the healthy population, the diseased population and the conditional ROC curve, are printed. In addition, the function provides the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For the nonparametric Bayesian approach (cROC.bnp()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC) (for both healthy and diseased populations). For the kernel-based approach (cROC.kernel()
), information regarding the selected bandwidths and the type of kernel estimator(for both healthy and diseased populations and for both regression and variance functions) is printed.
Rodriguez-Alvarez, M.X., Tahoces, P.G., Cadarso-Suarez, C. and Lado, M.J. (2011). Comparative study of ROC regression techniques. Applications for the computer-aided diagnostic system in breast cancer detection. Computational Statistics and Data Analysis, 55, 888–902.
cROC.bnp
, cROC.sp
or cROC.kernel
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(cROC_bnp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) cROC_bnp <- cROC.bnp(formula.h = l_marker1 ~ f(age, K = 0), formula.d = l_marker1 ~ f(age, K = 0), group = "status", tag.h = 0, data = newpsa, standardise = TRUE, p = seq(0, 1, len = 101), compute.lpml = TRUE, compute.WAIC = TRUE, compute.DIC = TRUE, pauc = pauccontrol(compute = TRUE, value = 0.5, focus = "FPF"), density = densitycontrol(compute = TRUE, grid.h = NA, grid.d = NA), mcmc = mcmccontrol(nsave = 500, nburn = 100, nskip = 1)) summary(cROC_bnp)
pooledROC
objects
Default summary method for objects fitted with pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, or pooledROC.dpm
functions.
## S3 method for class 'pooledROC' summary(object, ...)
## S3 method for class 'pooledROC' summary(object, ...)
object |
An object of class |
... |
Further arguments passed to or from other methods. Not yet implemented. |
A short summary is printed including the call to the function, the method, samples sizes, the area under the pooled ROC curve (AUC), and if required, the partial area under the pooled ROC curve. For the nonparametric Bayesian approach (pooledROC.dpm()
), and if required, the function provides the log pseudo marginal likelihood (LPML), the widely applicable information criterion (WAIC) and/or the deviance information criterion (DIC). For the kernel-based approach (pooledROC.dpm()
), information regarding the selected bandwidths and the density bandwidth selection method is presented.
pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
or pooledROC.dpm
.
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)
library(ROCnReg) data(psa) # Select the last measurement newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),] # Log-transform the biomarker newpsa$l_marker1 <- log(newpsa$marker1) m0_emp <- pooledROC.emp(marker = "l_marker1", group = "status", tag.h = 0, data = newpsa, p = seq(0,1,l=101), B = 500) summary(m0_emp) plot(m0_emp)