2 Methods
2.1 Data and data availability
2.1.1 Preprocessed Autism Brain Imaging Data Exchange
We downloaded fully processed amplitude of low frequency fluctuations (ALFF) images (Zang et al. 2007) from resting state functional magnetic resonance imaging scans for the ABIDE data set from the Preprocessed Connectomes Project. ABIDE is a collaboration of 16 international sites that have aggregated and openly share neuroimaging data from 1,112 scanning sessions, including 539 individuals with autism spectrum disorder (Di Martino et al. 2014). A total of nine subjects were excluded due to poor image quality, yielding a study mask with 30,272 voxels. This minimal data quality screening was performed in order to maximize the data available for the simulation analyses presented in prior work (S. N. Vandekar and Stephens 2021). These data are available for download through the open connectome website, and code to perform the download is given below (Section XX).
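As a sketch of what that download code does, the following R snippet retrieves one subject's ALFF map from the Preprocessed Connectomes Project S3 bucket; the pipeline, filtering strategy, and subject identifier shown are illustrative assumptions, not the exact settings used here.

```r
# Sketch: download one ALFF image from the Preprocessed Connectomes
# Project S3 bucket (pipeline, strategy, and file id are assumed values).
base <- "https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs"
pipeline <- "cpac"; strategy <- "filt_noglobal"; derivative <- "alff"
file_id <- "Pitt_0050003"  # hypothetical subject identifier
url <- sprintf("%s/%s/%s/%s/%s_%s.nii.gz",
               base, pipeline, strategy, derivative, file_id, derivative)
download.file(url, destfile = basename(url), mode = "wb")
```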
2.1.2 Twenty-one pain studies
We use the pbj package to perform a meta-analysis of statistical maps from 21 pain studies (Maumet et al. 2016; Gorgolewski et al. 2015). These data are available in the pain21 R package, which can be downloaded from Neuroconductor (Muschelli et al. 2018) using the command source("https://neuroconductor.org/neurocLite.R"); neuro_install('pain21').
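Once installed, the data can be loaded as sketched below; the pain21() accessor and the structure of its return value are assumptions to be checked against the package documentation.

```r
# Load the 21 pain study maps after installing pain21 from Neuroconductor.
library(pain21)
pain <- pain21()          # assumed accessor returning paths to the study maps
str(pain, max.level = 1)  # inspect the structure of the returned object
```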
2.1.3 Nathan-Kline Institute Rockland Sample
Details of data processing and quality assurance are provided in the Supplement. Briefly, prior to processing, 695 people with demographic data and T1 images were visually inspected for quality. A total of 14 scans were excluded because they failed processing or had significant motion or artifacts. Voxel-based morphometry (VBM) was conducted in the Computational Anatomy Toolbox 12 (CAT12: Version 12.5) in Statistical Parametric Mapping 12 (Ashburner and Friston 2005; Ashburner 2009). After estimation, the VBM images were registered to the MNI template and downsampled to 2mm isotropic resolution (Ashburner 2007). To further speed computation for the simulations, we restricted the study region to 11 axial slices in the center of the image, which included 51,925 voxels.
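The slice restriction amounts to zeroing a mask outside 11 central axial slices; a minimal R sketch, assuming the RNifti package and a hypothetical mask file name, is:

```r
# Sketch: restrict an MNI-space mask to 11 central axial slices
# (file names are hypothetical; assumes the RNifti package).
library(RNifti)
mask <- readNifti("gm_mask_2mm.nii.gz")
nz <- dim(mask)[3]
keep <- floor(nz / 2) + (-5:5)           # 11 slices around the center
mask[, , setdiff(seq_len(nz), keep)] <- 0
writeNifti(mask, "study_mask_11slices.nii.gz")
sum(mask > 0)                            # number of voxels in the study region
```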
2.3 Evaluation under the global null hypothesis
We use two approaches for assessing the accuracy of the resampling methods when the global null hypothesis is true. First, we assess the type 1 error rate at thresholds \(\alpha \in \{0.01, 0.05, 0.1, 0.25\}\). This metric indicates how appropriate each resampling method is for performing hypothesis testing. For the marginal distributions, we apply the Benjamini-Hochberg (BH) procedure to the uncorrected p-values and compare the estimated and expected type 1 error rates at each threshold. This approach assesses weak control of the procedure, since the null is true everywhere (Lehmann and Romano 2005). For the global distributions, we report the type 1 error rate of the adjusted p-values, found by comparing the minimum adjusted p-value across all observed maxima or clusters in each simulation to the \(\alpha\) level.
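To make the marginal evaluation concrete, the sketch below applies the BH adjustment to stand-in null p-values from one simulation and records whether any voxel is falsely rejected at each threshold; averaging this indicator across simulations estimates the type 1 error rate.

```r
# Sketch: weak-control check for one null simulation.
alphas <- c(0.01, 0.05, 0.1, 0.25)
p <- runif(30272)                 # stand-in uncorrected null p-values
p_bh <- p.adjust(p, method = "BH")
# TRUE if the procedure makes at least one false rejection at each alpha;
# the mean of this indicator across simulations estimates the error rate.
sapply(alphas, function(a) any(p_bh <= a))
```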
2.3.1 Null synthetic simulations
We simulated data under known heteroskedasticity to evaluate how well the different resampling inference procedures reproduce the distributions of TFs under the null. All resampling methods had near nominal type 1 error rates under homoskedasticity for parametric and robust test statistics, with the permutation and Rademacher bootstrap methods being slightly more accurate than the normal bootstrap. Under heteroskedasticity, the permutation procedure had inflated error rates for parametric statistics, but not for robust test statistics. The two bootstrap methods had near nominal type 1 error rates. Using a robust covariance estimator resolves the problem with the permutation procedure because the procedure scales out the differences in variance across subjects in each permutation by computing the square root covariance matrix as a function of the permuted data \(\eqref{eq:permutationSampleExchangeability}\).
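The mechanism can be illustrated with scalar data at a single "voxel": under heteroskedastic errors, permuting residuals treats unequal variances as exchangeable, while Rademacher sign flipping preserves each subject's variance. The following simplified R sketch (not the pbj implementation) contrasts the two.

```r
# Sketch: one heteroskedastic null simulation at a single "voxel".
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rnorm(n, sd = exp(x / 2))   # null mean model; variance depends on x
t_obs <- summary(lm(y ~ x))$coefficients["x", "t value"]
r0 <- resid(lm(y ~ 1))           # residuals under the null model

# Permutation: shuffling residuals implicitly assumes exchangeability.
t_perm <- replicate(1000, {
  yb <- mean(y) + sample(r0)
  summary(lm(yb ~ x))$coefficients["x", "t value"]
})

# Rademacher wild bootstrap: random sign flips preserve each subject's
# variance, so heteroskedasticity is respected.
t_wild <- replicate(1000, {
  yb <- mean(y) + r0 * sample(c(-1, 1), n, replace = TRUE)
  summary(lm(yb ~ x))$coefficients["x", "t value"]
})

c(perm = mean(abs(t_perm) >= abs(t_obs)),
  wild = mean(abs(t_wild) >= abs(t_obs)))
```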
2.3.2 Null bootstrap simulations
We use the VBM images from the NKI-RS dataset to generate realistic data under the null: first, we regress out the effects of covariates in the full sample of 682 subjects with complete scans using the model
\[\begin{equation}
\label{eq:simModel}
Y_{i}(v) = \beta_0(v) + \beta_1(v)\text{sex}_i + \beta_2(v)\text{race}_{i1} +\beta_3(v)\text{race}_{i2} + f(v, \text{age}_i) + E_i(v),
\end{equation}\]
where \(\text{sex}_{i}\) and \(\text{race}_{ij}\) are indicator variables identifying sex and race (Black, Other, White), and \(f\) is a nonlinear function fit with unpenalized natural cubic splines with 10 degrees of freedom using the ns function from the splines package in R.
In each simulation we draw a bootstrap sample \(Y^b_i(v)\) from the image residuals, \(E_i(v)\), and covariates of model \(\eqref{eq:simModel}\).
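A compact sketch of the residualization and resampling steps is below; Y is a subjects-by-voxels matrix and the covariate names are hypothetical stand-ins for the NKI-RS variables.

```r
# Sketch: residualize images to full-sample covariates, then draw one
# bootstrap sample (Y is subjects x voxels; names are hypothetical).
library(splines)
fit <- lm(Y ~ sex + race + ns(age, df = 10), data = covariates)
E <- resid(fit)                           # image residuals E_i(v)

n <- 100                                  # bootstrap sample size
idx <- sample(nrow(E), n, replace = TRUE)
Yb <- E[idx, ]                            # resampled null images Y^b_i(v)
Xb <- covariates[idx, ]                   # paired covariates
```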
Here, we consider one scenario for testing the association between a covariate and the imaging outcome; two other types of covariates are described and presented in the Supplementary Material.
In each bootstrap, we fit the model
\[
Y^b_i(v) = \beta_0(v) + \beta_1(v)\text{sex}^b_i + \beta_2(v) \text{age}^b_i + f(v, \text{fake}_i) + E^b_i(v),
\]
where \(f\) is estimated with unpenalized natural cubic splines on 4 degrees of freedom. We test this model against a model with a linear effect of the artificially generated fake covariate, yielding a test statistic with 3 degrees of freedom.
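At a single voxel, this comparison is a reduced-versus-full model test, with 3 degrees of freedom from the difference between the 4-df spline and the 1-df linear term; a sketch with hypothetical variable names:

```r
# Sketch: 3-df test of the spline effect of the fake covariate against a
# linear effect at one voxel (yv is one column of Yb; names hypothetical).
library(splines)
full    <- lm(yv ~ sex + age + ns(fake, df = 4), data = Xb)
reduced <- lm(yv ~ sex + age + fake,             data = Xb)
anova(reduced, full)    # F test with 3 numerator degrees of freedom
```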
Because we residualized the images with respect to the covariates in the full sample, there is no association between the mean of the images in the sample and the covariate. Because the fake covariate is unassociated with the outcome, homoskedasticity is satisfied (the variance of \(E^b_i(v)\) is unassociated with the covariate), but \(E^b_i(v)\) retains the realistic covariance structure of the original data. Our goal is to understand how the different resampling methods perform in this realistic null setting.
We use 500 bootstrap-based simulations using the 682 VBM outcome images from the NKI-RS to evaluate the finite sample performance of the three resampling procedures described above in two different scenarios. Because of the computational demand of performing 500 bootstraps for each sample size, resampling method, and test statistic, we use this relatively small number of simulations and present results with 90% probability intervals. We consider three sample sizes \(n\in \{ 25, 100, 200\}\); all other simulation settings are as described in Section \(\ref{sec:syntheticSimulation}\).
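One way to construct such intervals, assuming they are taken from the binomial Monte Carlo distribution of the error rate estimate at the nominal level, is:

```r
# Sketch: 90% probability interval for an estimated type 1 error rate
# based on 500 simulations at nominal alpha = 0.05 (assumed construction).
nsim  <- 500
alpha <- 0.05
qbinom(c(0.05, 0.95), size = nsim, prob = alpha) / nsim
```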