2 Methods
2.1 Data and data availability
2.1.1 Preprocessed Autism Brain Imaging Data Exchange
We downloaded fully processed amplitude of low frequency fluctuations (ALFF) images (Zang et al. 2007) from resting state functional magnetic resonance imaging scans for the ABIDE data set from the Preprocessed Connectomes Project. ABIDE is a collaboration of 16 international sites that have aggregated and openly share neuroimaging data from 1,112 scanning sessions, including 539 individuals with autism spectrum disorder (Di Martino et al. 2014). A total of nine subjects were excluded due to poor image quality, yielding a study mask with 30,272 voxels. This minimal data quality screening was performed in order to maximize the data available for the simulation analyses presented in prior work (S. N. Vandekar and Stephens 2021). These data are available for download through the open connectome website, and code to perform the download is given below (Section XX).
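As a sketch of what that download code does, the following R snippet retrieves one subject's ALFF map from the Preprocessed Connectomes Project S3 bucket; the pipeline, filtering strategy, and subject identifier shown are illustrative assumptions, not the exact settings used here.

```r
# Sketch: download one ALFF image from the Preprocessed Connectomes
# Project S3 bucket (pipeline, strategy, and file id are assumed values).
base <- "https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs"
pipeline <- "cpac"; strategy <- "filt_noglobal"; derivative <- "alff"
file_id <- "Pitt_0050003"  # hypothetical subject identifier
url <- sprintf("%s/%s/%s/%s/%s_%s.nii.gz",
               base, pipeline, strategy, derivative, file_id, derivative)
download.file(url, destfile = basename(url), mode = "wb")
```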
2.1.2 Twenty-one pain studies
We use the pbj package to perform a meta-analysis of statistical maps from 21 pain studies (Maumet et al. 2016; Gorgolewski et al. 2015). These data are available in the pain21 R package, which can be downloaded from Neuroconductor (Muschelli et al. 2018) using the command source("https://neuroconductor.org/neurocLite.R"); neuro_install('pain21').
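Once installed, the data can be loaded as sketched below; the pain21() accessor and the structure of its return value are assumptions to be checked against the package documentation.

```r
# Load the 21 pain study maps after installing pain21 from Neuroconductor.
library(pain21)
pain <- pain21()          # assumed accessor returning paths to the study maps
str(pain, max.level = 1)  # inspect the structure of the returned object
```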
2.1.3 Nathan-Kline Institute Rockland Sample
Details of data processing and quality assurance are provided in the Supplement. Briefly, prior to processing, 695 people with demographic data and T1 images were visually inspected for quality. A total of 14 scans were excluded because they failed processing or had significant motion or artifacts. Voxel-based morphometry (VBM) was conducted in the Computational Anatomy Toolbox 12 (CAT12: Version 12.5) in Statistical Parametric Mapping 12 (Ashburner and Friston 2005; Ashburner 2009). After estimation, the VBM images were registered to the MNI template and downsampled to 2mm isotropic resolution (Ashburner 2007). To further speed computation for the simulations, we restricted the study region to 11 axial slices in the center of the image, which included 51,925 voxels.
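The slice restriction amounts to zeroing a mask outside 11 central axial slices; a minimal R sketch, assuming the RNifti package and a hypothetical mask file name, is:

```r
# Sketch: restrict an MNI-space mask to 11 central axial slices
# (file names are hypothetical; assumes the RNifti package).
library(RNifti)
mask <- readNifti("gm_mask_2mm.nii.gz")
nz <- dim(mask)[3]
keep <- floor(nz / 2) + (-5:5)           # 11 slices around the center
mask[, , setdiff(seq_len(nz), keep)] <- 0
writeNifti(mask, "study_mask_11slices.nii.gz")
sum(mask > 0)                            # number of voxels in the study region
```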
2.3 Evaluation under the global null hypothesis
We use two approaches for assessing the accuracy of the resampling methods when the global null hypothesis is true. First, we assess the type 1 error rate at thresholds \(\alpha \in \{0.01, 0.05, 0.1, 0.25\}\). This metric indicates how appropriate each resampling method is for performing hypothesis testing. For the marginal distributions, we apply the Benjamini-Hochberg (BH) procedure to the uncorrected p-values and compare the estimated and expected type 1 error rates at each threshold. This approach assesses weak control of the procedure, since the null is true everywhere (Lehmann and Romano 2005). For the global distributions, we report the type 1 error rate of the adjusted p-values, found by comparing the minimum adjusted p-value across all observed maxima or clusters in each simulation to the \(\alpha\) level.
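To make the marginal evaluation concrete, the sketch below applies the BH adjustment to stand-in null p-values from one simulation and records whether any voxel is falsely rejected at each threshold; averaging this indicator across simulations estimates the type 1 error rate.

```r
# Sketch: weak-control check for one null simulation.
alphas <- c(0.01, 0.05, 0.1, 0.25)
p <- runif(30272)                 # stand-in uncorrected null p-values
p_bh <- p.adjust(p, method = "BH")
# TRUE if the procedure makes at least one false rejection at each alpha;
# the mean of this indicator across simulations estimates the error rate.
sapply(alphas, function(a) any(p_bh <= a))
```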
2.3.1 Null synthetic simulations
We simulated data under known heteroskedasticity to evaluate how well the different resampling inference procedures reproduce the distributions of TFs under the null. All resampling methods had near nominal type 1 error rates under homoskedasticity for parametric and robust test statistics, with the permutation and Rademacher bootstrap methods being slightly more accurate than the normal bootstrap. Under heteroskedasticity, the permutation procedure had inflated error rates for parametric statistics, but not for robust test statistics. The two bootstrap methods had near nominal type 1 error rates. Using a robust covariance estimator resolves the problem with the permutation procedure because the procedure scales out the differences in variance across subjects in each permutation by computing the square root covariance matrix as a function of the permuted data \(\eqref{eq:permutationSampleExchangeability}\).
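The mechanism can be illustrated with scalar data at a single "voxel": under heteroskedastic errors, permuting residuals treats unequal variances as exchangeable, while Rademacher sign flipping preserves each subject's variance. The following simplified R sketch (not the pbj implementation) contrasts the two.

```r
# Sketch: one heteroskedastic null simulation at a single "voxel".
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rnorm(n, sd = exp(x / 2))   # null mean model; variance depends on x
t_obs <- summary(lm(y ~ x))$coefficients["x", "t value"]
r0 <- resid(lm(y ~ 1))           # residuals under the null model

# Permutation: shuffling residuals implicitly assumes exchangeability.
t_perm <- replicate(1000, {
  yb <- mean(y) + sample(r0)
  summary(lm(yb ~ x))$coefficients["x", "t value"]
})

# Rademacher wild bootstrap: random sign flips preserve each subject's
# variance, so heteroskedasticity is respected.
t_wild <- replicate(1000, {
  yb <- mean(y) + r0 * sample(c(-1, 1), n, replace = TRUE)
  summary(lm(yb ~ x))$coefficients["x", "t value"]
})

c(perm = mean(abs(t_perm) >= abs(t_obs)),
  wild = mean(abs(t_wild) >= abs(t_obs)))
```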
2.3.2 Null bootstrap simulations
We use the VBM images from the NKI-RS dataset to generate realistic data under the null: first, we regress out the effects of covariates in the full sample of 682 subjects with complete scans using the model
\[\begin{equation}
\label{eq:simModel}
Y_{i}(v) = \beta_0(v) + \beta_1(v)\text{sex}_i + \beta_2(v)\text{race}_{i1} +\beta_3(v)\text{race}_{i2} + f(v, \text{age}_i) + E_i(v),
\end{equation}\]
where \(\text{sex}_{i}\) and \(\text{race}_{ij}\) are indicator variables identifying sex and race (Black, Other, White), and \(f\) is a nonlinear function fit with unpenalized natural cubic splines with 10 degrees of freedom using the ns function from the splines package in R.
In each simulation we draw a bootstrap sample \(Y^b_i(v)\) from the image residuals, \(E_i(v)\), and covariates of model \(\eqref{eq:simModel}\).
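A compact sketch of the residualization and resampling steps is below; Y is a subjects-by-voxels matrix and the covariate names are hypothetical stand-ins for the NKI-RS variables.

```r
# Sketch: residualize images to full-sample covariates, then draw one
# bootstrap sample (Y is subjects x voxels; names are hypothetical).
library(splines)
fit <- lm(Y ~ sex + race + ns(age, df = 10), data = covariates)
E <- resid(fit)                           # image residuals E_i(v)

n <- 100                                  # bootstrap sample size
idx <- sample(nrow(E), n, replace = TRUE)
Yb <- E[idx, ]                            # resampled null images Y^b_i(v)
Xb <- covariates[idx, ]                   # paired covariates
```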
Here, we consider one scenario for testing the association between a covariate and the imaging outcome; two other types of covariates are described and presented in the Supplementary Material.
In each bootstrap, we fit the model
\[
Y^b_i(v) = \beta_0(v) + \beta_1(v)\text{sex}^b_i + \beta_2(v) \text{age}^b_i + f(v, \text{fake}_i) + E^b_i(v),
\]
where \(f\) is estimated with unpenalized natural cubic splines on 4 degrees of freedom. We test this model against a model with a linear effect of the artificially generated fake covariate, yielding a test statistic with 3 degrees of freedom.
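At a single voxel, this comparison is a reduced-versus-full model test, with 3 degrees of freedom from the difference between the 4-df spline and the 1-df linear term; a sketch with hypothetical variable names:

```r
# Sketch: 3-df test of the spline effect of the fake covariate against a
# linear effect at one voxel (yv is one column of Yb; names hypothetical).
library(splines)
full    <- lm(yv ~ sex + age + ns(fake, df = 4), data = Xb)
reduced <- lm(yv ~ sex + age + fake,             data = Xb)
anova(reduced, full)    # F test with 3 numerator degrees of freedom
```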
Because we residualized the images with respect to the covariates in the full sample, there is no association between the mean of the images in the sample and the covariate. Because the fake covariate is unassociated with the outcome, homoskedasticity is satisfied (the variance of \(E^b_i(v)\) is unassociated with the covariate), but \(E^b_i(v)\) retains the realistic covariance structure of the original data. Our goal is to understand how the different resampling methods perform in this realistic null setting.
We use 500 bootstrap-based simulations using the 682 VBM outcome images from the NKI-RS to evaluate the finite sample performance of the three resampling procedures described above in two different scenarios. Because of the computational demand of performing 500 bootstraps for each sample size, resampling method, and test statistic, we use this relatively small number of simulations and present results with 90% probability intervals. We consider three sample sizes \(n\in \{ 25, 100, 200\}\); all other simulation settings are as described in Section \(\ref{sec:syntheticSimulation}\).
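One way to construct such intervals, assuming they are taken from the binomial Monte Carlo distribution of the error rate estimate at the nominal level, is:

```r
# Sketch: 90% probability interval for an estimated type 1 error rate
# based on 500 simulations at nominal alpha = 0.05 (assumed construction).
nsim  <- 500
alpha <- 0.05
qbinom(c(0.05, 0.95), size = nsim, prob = alpha) / nsim
```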