3 pbj software overview

3.1 Installation

The latest stable version of the pbj package can be installed from github.

# install the latest version of the package from github.
devtools::install_github('statimagcoll/pbj')

3.2 Analysis setup

This first step is to load R packages for performing interactive image analysis.

  • The pbj package includes our functions needed to perform group level neuroimage analysis.
  • The papayaWidget package is used to embed interactive visualizations into this html document.

The pbj package includes dependencies for the following Neuroconductor packages

  • The RNifti package for Nifti image I/O.
  • oro.nifti package for Nifti image I/O.
  • The fslr package includes wrappers for command line tools included in FSL, which will be used to perform some basic preprocessing.

In addition, we setup some variables used for all analyses: ncores specifies the number of cores to use when running parallel processes and setting the seed for the RNG ensures the results are reproducible.

### LIBRARIES ###
library(pbj) # pbj package
library(RNifti) # Nifti I/O. Loaded with pbj
library(fslr) # for FSL utilities
library(papayaWidget) # image viewer
library(splines) # ns
library(table1) # convenient table creation
library(parallel) # for parallelization
library(magrittr) # %>%

# number of cores for parallel processing
ncores = 16
# seed for randomization
set.seed(555)

It is easiest to compile all the files needed for the analysis into a few R objects. To perform the analysis, we need the following data

  • The data file that includes
    1. All covariates required for the analysis.
    2. Paths to the .nii.gz images we will analyze, assumed to be spatially registered to template space.
  • The mask to identify voxels to include in the analysis.
  • A template image to use as the background for visualizing the results.

We recommend storing the image paths as a column in a data frame with the covariates, so that correspondence between the outcome and covariates is retained throughout the analysis.

3.3 Software structure and general workflow

The key functions of the package are for model estimation, statistical inference, summarization, visualization, and I/O.

Model estimation:

  • lmPBJ – the main function to estimate test statistics and coefficients for comparing two models.

Statistical inference:

  • pbjInference – the main function to perform inference on topological features of the test statistic image.
  • cluster – computes cluster statistics, such as cluster extent and cluster mass from a niftiImage object.
  • maxima – computes global or local maxima statistics from a niftiImage object.
  • mmeStat – computes maxima, mass, or extent statistics from a niftiImage object.

Summarization:

  • table.statMap – create maxima/mass/cluster summary table.
  • summary.statMap – summarize the statMap object in terminal.

Visualization:

  • image.statMap – lightbox visualization (and others) of statMap object.
  • image.niftiImage – lightbox visualization (and others) of niftiImage class, defined in RNifti package.

I/O:

  • stat.statMap – extract statistical image as -log10(p), effect size (robust effect size index; RESI), or chi-squared statistic from statMap object as a niftiImage object.
  • coef.statMap – extract 4d coefficient image from statMap object as a niftiImage object. Coefficients correspond to the parameters tested by comparing the full and reduced models.
  • write.statMap – write out files in statMap object to a specified directory.

3.3.1 Model estimation using the lmPBJ function

The lmPBJ function fits linear models to imaging data and computes test statistics for parameters of interest. The pbj package uses R’s formula syntax to specify models of imaging data using the full/reduced model specification to test variables in the data set. The full model, lmfull, specifies a formula that includes all model terms. There is no left side to the equation (the dependent/outcome variable) because it is determined by the imaging variables. Comparing lmfull to lmred will test the terms of the full model that are related to the interaction. The mask image identifies which voxels in the image to include in the analysis. The template image is used as the background image for visualization later.

There are many other options to adjust how to fit the model and compute test statistics; statistical details are given above in PBJ analysis methods. The Unlike the parametric Gaussian random field methods implemented in many software packages, the methods used in pbj yield cluster extent p-values that are valid regardless of data smoothness, or cluster-forming threshold (CFT) and perform well in small samples. pbj also supports the permutation procedure similar to what is implemented in FSL’s randomise (Winkler et al. 2014).

  • id – for longitudinal or repeated measurements an integer or factor variable indicating which rows come from the same study participants.
  • robust – a logical specifying whether robust standard errors should be used, which makes the test statistics valid even under some types of model misspecification. The robust test statistic ensures that the standard errors are consistently estimated, even if homoskedasticity is violated. It reduces replicability slightly at the cost of reducing bias of the test statistic, \(p\)-values, and effect sizes estimates (Section REF).
  • transform – Use and inverse CDF transformation to make the test statistics appear closer to a normal distribution in small sample sizes. This is a sensible transformation if the null hypothesis is true, but might not be a good idea if the null is false.
  • HC3 – whether to use a small sample adjustment to the robust standard error estimates that improves coverage performance (it makes the test statistics appear closer to a normal distribution).

The result of lmPBJ is a statMap object, which is a list containing all the elements needed for visualization and downstream analysis. At this stage, uncorrected results can be summarized and visualized using the functions described above. The statistical values are stored as chi-squared statistics because this is the most general random variable use for statistical inference.

3.3.2 Statistical inference using the pbjInference function

There are many different options for performing inference of topological features for neuroimaging data and the pbjInference function can accommodate a wide range of possibilities. It is best to include any inference methods you want to consider all together, so that you don’t have to rerun the bootstrap or permutation procedure each time. Here are some key arguments to the pbjInference:

  • statMap – the object returned by lmPBJ.
  • statistic – a function used to compute common topological features of the image.
  • null – whether to perform resampling under the null or alternative hypothesis.
  • nboot – the number of bootstraps to perform.
  • method – controls what resampling procedure to use.
  • cft_s, cft_p, cft_chisq – cluster forming threshold, used to compute a cft variable that is an argument to the statistic function.
  • ... – arguments passed to the statistic function.
  • rdata_rds – a file to write the output to. If specified, it will run the bootstrap in the background.
  • mc.cores – the number of cores to use for parallelization.

After the statMap argument, the statistic function is the most important argument, because it determines what type of topological features to perform inference on. The statistic argument is a function itself, that takes a niftiImage object as the first argument and computes some topological feature of the image and returns them as a named list. It, optionally, includes other arguments to modify its behavior. These arguments can be specified when running pbjInference by passing them as named arguments. The mmeStat function is the default value for the statistic argument of pbjInference and includes options to run inference on the most common types of topological features, including local maxima, cluster mass inference, and cluster extent inference. There are also the functions maxima and cluster that can be used to compute local maxima or cluster-based statistics separately. The statistic function can take mask and cft arguments. The mask is automatically assigned from the statMap object and the cft argument is determined automatically by the cft_s and cft_p arguments in pbjInference. Because cluster based inference is so common, there are convenience arguments that take a cluster forming threshold as an effect size (cft_s; on the RESI scale (S. Vandekar, Tao, and Blume 2020; Kang et al. 2023)), a \(p\)-value (cft_p), or a chi-squared value (cft_chisq). These arguments are converted to a cft variable internally that is on the chi-squared statistical scale. These arguments are optional if the topological feature is not a function of an image threshold. For topological features that use the cft variable this is stored as an attribute of that element of the statistic output, which is used for visualization later.

The result is the input statMap object, with a pbj object added as an element in the list. Summarization and visualization functions check for the pbj field and provide inference results based on what fields are available.

References

Kang, Kaidi, Megan T. Jones, Kristan Armstrong, Suzanne Avery, Maureen McHugo, Stephan Heckers, and Simon Vandekar. 2023. “Accurate Confidence and Bayesian Interval Estimation for Non-Centrality Parameters and Effect Size Indices.” Psychometrika, February. https://doi.org/10.1007/s11336-022-09899-x.
Vandekar, Simon, Ran Tao, and Jeffrey Blume. 2020. “A Robust Effect Size Index.” Psychometrika 85 (1): 232–46. https://doi.org/10.1007/s11336-020-09698-2.
Winkler, Anderson M., Gerard R. Ridgway, Matthew A. Webster, Stephen M. Smith, and Thomas E. Nichols. 2014. “Permutation Inference for the General Linear Model.” Neuroimage 92: 381–97.