Leave-One-Out fit of elastic nets for Functional Connectivity data with nested crossvalidation

This function is a wrapper around glmnet::glmnet() as called from FCnet::cv_FCnet(). For extended documentation, readers are encouraged to consult the original documentation of glmnet::glmnet() and its vignette. glmnet::glmnet() fits a generalized linear model through penalized maximum likelihood, computing the regularization path for the lasso or elastic net penalty.

FCnetLOO() requires at least two objects: y is a vector or a data.frame with exactly one column, corresponding to the (behavioral) score to predict; x is a data.frame or a list of lists with an entry named "Weights", which contains the independent variables. x can be, and is meant to be, an object created by reduce_featuresFC(), but this is not strictly necessary.

The best model and hyperparameters are selected in the inner loops through cv_FCnet(); details of the crossvalidation procedure can be passed as ... arguments to FCnetLOO() if necessary. A call to FCnetLOO() returns a list including goodness of fit measures for the outer loop, a data.frame including coefficients for all nested models, vectors of crossvalidated parameters, and predicted scores. Note that the dependent variable is scaled by default through scale_y.

The parallelLOO option is recommended for speed. To use parallel computing, future.apply must be installed, the machine should have multiple cores available, and the parallel backend must be set up explicitly by the user (e.g. via plan(multisession)).
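The nested scheme described above (an outer leave-one-out loop for prediction, an inner crossvalidation for hyperparameter selection) can be sketched in base R. This is only an illustration under stated assumptions: closed-form ridge regression stands in for the elastic net, the data are simulated, and the helpers `ridge_fit()`, `ridge_predict()` and `inner_cv()` are hypothetical names that do not belong to FCnet or glmnet.

```r
# Illustrative sketch only: ridge regression replaces the elastic net,
# and none of the helper names below are part of FCnet.
set.seed(1)
n <- 30; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x %*% c(1, -1, 0.5, 0, 0) + rnorm(n, sd = 0.5))

# Closed-form ridge fit with an unpenalized intercept (hypothetical helper)
ridge_fit <- function(x, y, lambda) {
  mx <- colMeans(x); my <- mean(y)
  xc <- sweep(x, 2, mx)
  beta <- solve(crossprod(xc) + lambda * diag(ncol(x)), crossprod(xc, y - my))
  list(beta = beta, intercept = my - sum(mx * beta))
}
ridge_predict <- function(fit, newx) as.vector(newx %*% fit$beta + fit$intercept)

lambdas <- 10^seq(-2, 2, length.out = 10)

# Inner loop: choose lambda by K-fold CV on the training rows only,
# minimizing the mean absolute error
inner_cv <- function(x, y, lambdas, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  mae <- sapply(lambdas, function(l) {
    err <- sapply(seq_len(k), function(f) {
      fit <- ridge_fit(x[folds != f, , drop = FALSE], y[folds != f], l)
      mean(abs(y[folds == f] - ridge_predict(fit, x[folds == f, , drop = FALSE])))
    })
    mean(err)
  })
  lambdas[which.min(mae)]
}

# Outer loop: hold out one observation, tune on the rest, predict the held-out one
loo <- lapply(seq_len(n), function(i) {
  best <- inner_cv(x[-i, , drop = FALSE], y[-i], lambdas)
  fit <- ridge_fit(x[-i, , drop = FALSE], y[-i], best)
  data.frame(lambda = best, prediction = ridge_predict(fit, x[i, , drop = FALSE]))
})
loo <- do.call(rbind, loo)

# Goodness of fit computed on the outer-loop predictions
outer_mae <- mean(abs(y - loo$prediction))
outer_r <- cor(y, loo$prediction)
```

Because the held-out observation never enters the inner loop, the outer-loop statistics are not contaminated by the hyperparameter search; each outer iteration may also select a different lambda, which is why one value per iteration is stored.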

FCnetLOO(
  y,
  x,
  alpha = seq(0, 1, by = 0.1),
  lambda = rev(10^seq(-5, 5, length.out = 200)),
  cv_Ncomp = NULL,
  cv_Ncomp_method = c("order", "R"),
  parallelLOO = FALSE,
  scale_y = TRUE,
  scale_x = TRUE,
  family = optionsFCnet("family"),
  type.measure = optionsFCnet("cv.type.measure"),
  intercept = optionsFCnet("intercept"),
  standardize = optionsFCnet("standardize"),
  thresh = optionsFCnet("thresh"),
  ...
)

Arguments

y	The dependent variable, typically behavioral scores to predict. This can be a vector or a single data.frame column.
x	The independent variables, typically neural measures that have been already summarised through data reduction techniques (e.g. ICA, PCA): an object created by `reduce_featuresFC()` will do. If such an object is passed to this function, the "Weights" slot is taken as x. A list can be passed to this function: in this case the function needs an entry named "Weights". Otherwise, a data.frame can be passed to x.
alpha	Value(s) that bias the elastic net toward ridge regression (alpha = 0) or LASSO regression (alpha = 1). If a vector of alpha values is supplied, the value is optimized through nested crossvalidation. It defaults to a vector ranging from 0 to 1 in steps of 0.1. The crossvalidated alpha is returned.
lambda	Regularization parameter for the regression, see `glmnet::glmnet()`. lambda must be a vector of length > 1; the value of lambda is then optimized through internal nested crossvalidation. It defaults to a vector of 200 logarithmically spaced values ranging from 10^-5 to 10^5. The crossvalidated optimal lambda is returned.
cv_Ncomp	Whether to crossvalidate the number of components. It defaults to NULL (no crossvalidation), but a vector can be supplied specifying the number (range) of components to test in the inner loops.
cv_Ncomp_method	How components are selected when their number is crossvalidated: "order" keeps the first N components in their given order (e.g. ranked by the explained variance of neuroimaging data); "R", which is somewhat experimental, keeps the N best components ranked according to their relationship (Pearson's r) with y.
parallelLOO	If TRUE - recommended, but not the default - uses `future.apply::future_lapply()` for the outer loops: `future.apply` must be installed, the machine should have multiple cores available for use, and threads should be defined explicitly beforehand by the user (e.g. by calling `plan(multisession)`).
scale_y	Whether y should be scaled prior to the fit. The default, TRUE, centers and scales y with `scale()`.
scale_x	Whether x should be scaled prior to the fit. The default, TRUE, subtracts the mean matrix value and divides each entry by the matrix variance. Beware that this is applied in addition to the standardization controlled by `optionsFCnet("standardize")`.
family	Defaults to "gaussian". Experimental support for "binomial" is on the way.
intercept	Whether to fit (TRUE) or not (FALSE) an intercept to the model.
standardize	Whether x must be standardized.
thresh	Convergence threshold for glmnet: the fit stops iterating once changes fall below this value.
...	Other parameters passed to `glmnetUtils::cva.glmnet()` or `glmnet::glmnet()`.
type.measure	The measure to minimize in the crossvalidation inner loops; its default is taken from `optionsFCnet("cv.type.measure")`. Unlike `glmnetUtils::cva.glmnet()`, the default is the mean absolute error.
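The default hyperparameter grids described for alpha and lambda can be reproduced in base R to check their size and ordering. The expressions are copied from the usage above; the variable names `step` and `n_pairs` are only illustrative.

```r
# Default grids, copied from the function signature
alpha  <- seq(0, 1, by = 0.1)
lambda <- rev(10^seq(-5, 5, length.out = 200))

length(alpha)    # 11 candidate mixing values, from ridge (0) to LASSO (1)
length(lambda)   # 200 candidate penalties
range(lambda)    # smallest and largest penalty in the grid

# rev() puts the largest penalty first, so the sequence is decreasing
all(diff(lambda) < 0)

# Consecutive values are equally spaced on the log10 scale
step <- diff(log10(lambda))
all.equal(step, rep(step[1], length(step)))

# Number of (alpha, lambda) pairs in the full grid
n_pairs <- length(alpha) * length(lambda)
```

Narrowing either grid, for instance by passing a shorter lambda vector, reduces the number of candidate pairs to evaluate in every leave-one-out iteration.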

Value

A list containing goodness of fit statistics for the outer loop and the LOO predictions, together with the crossvalidated best alpha and lambda values from the inner loops and the coefficients of all inner models combined in a data.frame.
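A minimal usage sketch follows, assuming the FCnet package is installed and that a plain data.frame is accepted for x as described in the arguments. The data are simulated, the reduced alpha and lambda grids are arbitrary choices to keep the run short, and because the names of the returned list elements are not given here, the result is only inspected with `str()`.

```r
# Simulated data; sizes and effect sizes are arbitrary
set.seed(42)
n <- 20
x <- as.data.frame(matrix(rnorm(n * 4), n, 4))
y <- x[[1]] - 0.5 * x[[2]] + rnorm(n, sd = 0.5)

if (requireNamespace("FCnet", quietly = TRUE)) {
  # Optional: set up a parallel backend before using parallelLOO = TRUE
  # future::plan(future::multisession)
  fit <- FCnet::FCnetLOO(
    y = y,
    x = x,
    alpha = c(0, 0.5, 1),                          # smaller grid than the default
    lambda = rev(10^seq(-3, 1, length.out = 20)),  # decreasing, length > 1
    parallelLOO = FALSE
  )
  # Element names are not documented above, so look at the top-level structure
  str(fit, max.level = 1)
}
```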