3 pbj
software overview
3.1 Installation
The latest stable version of the pbj
package can be installed from github.
# install the latest version of the package from github.
::install_github('statimagcoll/pbj') devtools
3.2 Analysis setup
This first step is to load R
packages for performing interactive image analysis.
- The
pbj
package includes our functions needed to perform group level neuroimage analysis. - The
papayaWidget
package is used to embed interactive visualizations into this html document.
The pbj
package includes dependencies for the following Neuroconductor packages
- The
RNifti
package for Nifti image I/O. oro.nifti
package for Nifti image I/O.- The
fslr
package includes wrappers for command line tools included in FSL, which will be used to perform some basic preprocessing.
In addition, we setup some variables used for all analyses: ncores
specifies the number of cores to use when running parallel processes and setting the seed for the RNG ensures the results are reproducible.
### LIBRARIES ###
library(pbj) # pbj package
library(RNifti) # Nifti I/O. Loaded with pbj
library(fslr) # for FSL utilities
library(papayaWidget) # image viewer
library(splines) # ns
library(table1) # convenient table creation
library(parallel) # for parallelization
library(magrittr) # %>%
# number of cores for parallel processing
= 16
ncores # seed for randomization
set.seed(555)
It is easiest to compile all the files needed for the analysis into a few R
objects.
To perform the analysis, we need the following data
- The data file that includes
- All covariates required for the analysis.
- Paths to the
.nii.gz
images we will analyze, assumed to be spatially registered to template space.
- The mask to identify voxels to include in the analysis.
- A template image to use as the background for visualizing the results.
We recommend storing the image paths as a column in a data frame with the covariates, so that correspondence between the outcome and covariates is retained throughout the analysis.
3.3 Software structure and general workflow
The key functions of the package are for model estimation, statistical inference, summarization, visualization, and I/O.
Model estimation:
lmPBJ
– the main function to estimate test statistics and coefficients for comparing two models.
Statistical inference:
pbjInference
– the main function to perform inference on topological features of the test statistic image.cluster
– computes cluster statistics, such as cluster extent and cluster mass from aniftiImage
object.maxima
– computes global or local maxima statistics from aniftiImage
object.mmeStat
– computes maxima, mass, or extent statistics from aniftiImage
object.
Summarization:
table.statMap
– create maxima/mass/cluster summary table.summary.statMap
– summarize thestatMap
object in terminal.
Visualization:
image.statMap
– lightbox visualization (and others) ofstatMap
object.image.niftiImage
– lightbox visualization (and others) ofniftiImage
class, defined inRNifti
package.
I/O:
stat.statMap
– extract statistical image as-log10(p)
, effect size (robust effect size index; RESI), or chi-squared statistic fromstatMap
object as aniftiImage
object.coef.statMap
– extract 4d coefficient image fromstatMap
object as aniftiImage
object. Coefficients correspond to the parameters tested by comparing the full and reduced models.write.statMap
– write out files instatMap
object to a specified directory.
3.3.1 Model estimation using the lmPBJ
function
The lmPBJ
function fits linear models to imaging data and computes test statistics for parameters of interest.
The pbj
package uses R
’s formula
syntax to specify models of imaging data using the full/reduced model specification to test variables in the data set.
The full model, lmfull
, specifies a formula that includes all model terms.
There is no left side to the equation (the dependent/outcome variable) because it is determined by the imaging variables.
Comparing lmfull
to lmred
will test the terms of the full model that are related to the interaction.
The mask
image identifies which voxels in the image to include in the analysis.
The template
image is used as the background image for visualization later.
There are many other options to adjust how to fit the model and compute test statistics; statistical details are given above in PBJ analysis methods.
The
Unlike the parametric Gaussian random field methods implemented in many software packages, the methods used in pbj
yield cluster extent p-values that are valid regardless of data smoothness, or cluster-forming threshold (CFT) and perform well in small samples.
pbj
also supports the permutation procedure similar to what is implemented in FSL’s randomise
(Winkler et al. 2014).
id
– for longitudinal or repeated measurements an integer or factor variable indicating which rows come from the same study participants.robust
– a logical specifying whether robust standard errors should be used, which makes the test statistics valid even under some types of model misspecification. The robust test statistic ensures that the standard errors are consistently estimated, even if homoskedasticity is violated. It reduces replicability slightly at the cost of reducing bias of the test statistic, \(p\)-values, and effect sizes estimates (Section REF).transform
– Use and inverse CDF transformation to make the test statistics appear closer to a normal distribution in small sample sizes. This is a sensible transformation if the null hypothesis is true, but might not be a good idea if the null is false.HC3
– whether to use a small sample adjustment to the robust standard error estimates that improves coverage performance (it makes the test statistics appear closer to a normal distribution).
The result of lmPBJ
is a statMap
object, which is a list
containing all the elements needed for visualization and downstream analysis. At this stage, uncorrected results can be summarized and visualized using the functions described above.
The statistical values are stored as chi-squared statistics because this is the most general random variable use for statistical inference.
3.3.2 Statistical inference using the pbjInference
function
There are many different options for performing inference of topological features for neuroimaging data and the pbjInference
function can accommodate a wide range of possibilities.
It is best to include any inference methods you want to consider all together, so that you don’t have to rerun the bootstrap or permutation procedure each time.
Here are some key arguments to the pbjInference
:
statMap
– the object returned by lmPBJ.statistic
– a function used to compute common topological features of the image.null
– whether to perform resampling under the null or alternative hypothesis.nboot
– the number of bootstraps to perform.method
– controls what resampling procedure to use.cft_s
,cft_p
,cft_chisq
– cluster forming threshold, used to compute acft
variable that is an argument to thestatistic
function....
– arguments passed to thestatistic
function.rdata_rds
– a file to write the output to. If specified, it will run the bootstrap in the background.mc.cores
– the number of cores to use for parallelization.
After the statMap
argument, the statistic
function is the most important argument, because it determines what type of topological features to perform inference on.
The statistic
argument is a function itself, that takes a niftiImage
object as the first argument and computes some topological feature of the image and returns them as a named list.
It, optionally, includes other arguments to modify its behavior. These arguments can be specified when running pbjInference
by passing them as named arguments.
The mmeStat
function is the default value for the statistic
argument of pbjInference
and includes options to run inference on the most common types of topological features, including local maxima, cluster mass inference, and cluster extent inference. There are also the functions maxima
and cluster
that can be used to compute local maxima or cluster-based statistics separately.
The statistic
function can take mask
and cft
arguments.
The mask
is automatically assigned from the statMap
object and the cft
argument is determined automatically by the cft_s
and cft_p
arguments in pbjInference
.
Because cluster based inference is so common, there are convenience arguments that take a cluster forming threshold as an effect size (cft_s
; on the RESI scale (S. Vandekar, Tao, and Blume 2020; Kang et al. 2023)), a \(p\)-value (cft_p
), or a chi-squared value (cft_chisq
).
These arguments are converted to a cft
variable internally that is on the chi-squared statistical scale.
These arguments are optional if the topological feature is not a function of an image threshold.
For topological features that use the cft
variable this is stored as an attribute of that element of the statistic
output, which is used for visualization later.
The result is the input statMap
object, with a pbj
object added as an element in the list.
Summarization and visualization functions check for the pbj
field and provide inference results based on what fields are available.