The MetaXcan software hosts a suite of tools: PrediXcan, SPrediXcan, MultiXcan and SMultiXcan. This post describes the output file format of each tool.

PrediXcan

Individual-level data method to compute gene-trait associations. Detailed info

The output is a tab-delimited file with one row per individual and one column per gene; each cell holds that individual's predicted expression for that gene.

The first two columns contain the FID and IID for every observation.
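
As a rough illustration of this layout, the sketch below parses a small predicted-expression table with Python's standard csv module. The gene IDs, sample IDs, and values are invented for the example, and the helper name read_predicted_expression is hypothetical. Only the tab-delimited layout with FID and IID in the first two columns comes from the description above.

```python
import csv
import io

# Hypothetical excerpt of a predicted-expression file (all values are made up).
SAMPLE = (
    "FID\tIID\tENSG00000000001\tENSG00000000002\n"
    "fam1\tind1\t0.12\t-0.40\n"
    "fam2\tind2\t-0.05\t0.33\n"
)

def read_predicted_expression(handle):
    """Return {(FID, IID): {gene: predicted expression}} from a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    genes = reader.fieldnames[2:]  # every column after FID and IID is a gene
    table = {}
    for row in reader:
        table[(row["FID"], row["IID"])] = {g: float(row[g]) for g in genes}
    return table

expression = read_predicted_expression(io.StringIO(SAMPLE))
```

To read a real file, pass an open file handle in place of the io.StringIO object.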

Association

Computes the association between predicted expression and an outcome (PrediXcanAssociation.py).

Each output file has the following columns:

  • gene: Ensembl ID or intron id
  • effect: estimated effect size
  • se: estimated effect size standard error
  • zscore: predicted association z-score
  • pvalue: association p-value
  • n_samples: number of samples used
  • status: If there was any error in the computation, it is stated here
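
A common next step with a table like this is to keep only genes that pass a multiple-testing threshold. The sketch below is one way to do that with the standard library. It assumes the association output is tab-delimited and that failed rows carry a non-numeric p-value; neither is confirmed above. The sample rows, the "some_error" status text, and the helper name significant_genes are invented for illustration.

```python
import csv
import io

# Hypothetical association results (made-up numbers; tab-delimited is an assumption).
SAMPLE = (
    "gene\teffect\tse\tzscore\tpvalue\tn_samples\tstatus\n"
    "geneA\t0.50\t0.10\t5.0\t5.7e-07\t1000\t\n"
    "geneB\t0.02\t0.10\t0.2\t0.84\t1000\t\n"
    "geneC\tNA\tNA\tNA\tNA\t1000\tsome_error\n"
)

def significant_genes(handle, alpha=0.05):
    """Return genes passing a Bonferroni threshold of alpha / number of tested genes."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            rows.append((row["gene"], float(row["pvalue"])))
        except ValueError:
            continue  # skip rows whose p-value is missing or not numeric
    threshold = alpha / len(rows) if rows else 0.0
    return [gene for gene, p in rows if p < threshold]

hits = significant_genes(io.StringIO(SAMPLE))
```

Bonferroni correction is used here only because it is simple; any other multiple-testing procedure could be substituted.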

SPrediXcan

Runs association between the gene models and summary statistics.

Each output file is a CSV, with each row containing a gene association at a given trait-tissue combination:

  • gene: Ensembl ID or intron id
  • gene_name: HUGO name or intron id
  • zscore: predicted association z-score
  • effect_size: estimated effect size
  • pvalue: association p-value
  • var_g: estimated variance of predicted expression or splicing, calculated as W' * G * W (where W is the vector of SNP weights in a gene’s model, W' is its transpose, and G is the covariance matrix)
  • pred_perf_r2: cross-validated R2 of the prediction model's performance
  • pred_perf_pval: p-value of the prediction model's cross-validated performance
  • pred_perf_qval: deprecated, empty field left for compatibility
  • n_snps_used: number of SNPs in the intersection of GWAS and model
  • n_snps_in_cov: number of SNPs in the LD compilation
  • n_snps_in_model: number of SNPs in the model
  • best_gwas_p: smallest p-value across GWAS SNPs used in this model
  • largest_weight: largest prediction model weight
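
The var_g entry above is a quadratic form, W' * G * W. The sketch below works it through for a two-SNP model in plain Python. The weights and covariance values are made up, and the helper name quadratic_form is hypothetical; this illustrates the formula only, not how SPrediXcan computes it internally.

```python
def quadratic_form(weights, covariance):
    """Compute W' * G * W for a weight vector W and a square covariance matrix G."""
    n = len(weights)
    if len(covariance) != n or any(len(row) != n for row in covariance):
        raise ValueError("covariance must be an n x n matrix matching the weights")
    return sum(
        weights[i] * covariance[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )

# Made-up two-SNP model: weights and SNP covariance are illustrative only.
W = [0.5, -0.25]
G = [[1.0, 0.5],
     [0.5, 2.0]]
var_g = quadratic_form(W, G)  # 0.25 - 0.125 + 0.125 = 0.25
```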

MultiXcan

Multi-Tissue PrediXcan, takes multiple gene expression files as input.

This script computes a gene-level association between predicted gene expression and a human trait, using multiple studies for each gene jointly. It supports adjusting for covariates. It takes as input predicted expression files as generated by Predict.py.

The results look like:

  • gene: a gene’s id, as listed in the Tissue Transcriptome model. Ensembl ID for most gene model releases. Can also be an intron’s id for splicing model releases.
  • pvalue: significance p-value of MultiXcan association
  • n_models: number of models (tissues) available for this gene
  • n_samples: number of individuals available to this gene-phenotype combination (i.e. inner join of phenotype and predictions)
  • p_i_best: best p-value of single-tissue PrediXcan association.
  • m_i_best: name of best single-tissue PrediXcan association.
  • p_i_worst: worst p-value of single-tissue PrediXcan association.
  • m_i_worst: name of worst single-tissue PrediXcan association.
  • status: If there was any error in the computation, it is stated here
  • n_used: number of independent components of variation kept among the tissues' predictions. (Synthetic independent tissues)
  • max_eigen: In the PCA decomposition of predicted expression, the maximum eigenvalue.
  • min_eigen: In the PCA decomposition of predicted expression, the minimum eigenvalue.
  • min_eigen_kept: In the PCA decomposition of predicted expression, the minimum eigenvalue kept (i.e. surviving SVD)
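
To make the relationship between n_used, max_eigen, min_eigen, and min_eigen_kept concrete, the sketch below summarizes a list of PCA eigenvalues. The cutoff rule (keep a component when the largest eigenvalue divided by it stays below a condition number) and the default of 30 are assumptions made for this example, not a statement of what MultiXcan does. The eigenvalues are invented, and the helper name summarize_eigenvalues is hypothetical.

```python
def summarize_eigenvalues(eigenvalues, condition_number=30.0):
    """Summarize PCA eigenvalues under an assumed condition-number cutoff.

    Assumes at least one eigenvalue is positive.
    """
    ordered = sorted(eigenvalues, reverse=True)
    max_eigen = ordered[0]
    # Keep components whose eigenvalue is not too small relative to the largest.
    kept = [e for e in ordered if e > 0 and max_eigen / e < condition_number]
    return {
        "n_used": len(kept),
        "max_eigen": max_eigen,
        "min_eigen": ordered[-1],
        "min_eigen_kept": kept[-1],
    }

# Made-up eigenvalues for a gene with predictions in four tissues.
summary = summarize_eigenvalues([3.2, 0.7, 0.09, 0.01])
```

With these invented values, two components survive the cutoff, so min_eigen_kept (0.7) differs from min_eigen (0.01).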

If you specify --loadings_output, you’ll get a file specifying the loadings of the PC decomposition of predicted expression for each gene:

  • gene: Ensembl ID (or intron id) being analyzed
  • pc: identifier of principal component
  • tissue: tissue being analyzed
  • weight: coefficient of loading from tissues to PC

If you specify --coefficient_output, you get a file with effect sizes for the tissues involved in each gene:

  • param: effect size of the PCA-regularized regression (i.e. effect sizes of the principal components, converted to tissue-space)
  • variable: tissue being analyzed
  • gene: Ensembl ID (or intron id)

SMultiXcan

Summary-stats based Multi-Tissue PrediXcan.

The results contain the following columns:

  • gene: a gene’s id: as listed in the Tissue Transcriptome model.
  • gene_name: gene name as listed by the Transcriptome Model, typically HUGO for a gene. It can also be an intron’s id.
  • pvalue: significance p-value of S-MultiXcan association
  • n: number of “tissues” available for this gene
  • n_indep: number of independent components of variation kept among the tissues' predictions. (Synthetic independent tissues)
  • p_i_best: best p-value of single-tissue S-PrediXcan association.
  • t_i_best: name of best single-tissue S-PrediXcan association.
  • p_i_worst: worst p-value of single-tissue S-PrediXcan association.
  • t_i_worst: name of worst single-tissue S-PrediXcan association.
  • eigen_max: In the SVD decomposition of predicted expression correlation matrix: eigenvalue (variance explained) of the top independent component
  • eigen_min: In the SVD decomposition of predicted expression correlation matrix: eigenvalue (variance explained) of the last independent component
  • eigen_min_kept: In the SVD decomposition of predicted expression correlation matrix: eigenvalue (variance explained) of the smallest independent component that was kept.
  • z_min: minimum z-score among single-tissue S-PrediXcan associations.
  • z_max: maximum z-score among single-tissue S-PrediXcan associations.
  • z_mean: mean z-score among single-tissue S-PrediXcan associations.
  • z_sd: standard deviation of the mean z-score among single-tissue S-PrediXcan associations.
  • tmi: trace of T * T', where T is the correlation of predicted expression levels for different tissues multiplied by its SVD pseudo-inverse. It is an estimate of the number of independent components of variation in predicted expression across tissues (typically close to n_indep)
  • status: If there was any error in the computation, it is stated here
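
One way to see why tmi tends to sit near n_indep is the short derivation below. It assumes that the pseudo-inverse is built from the same k eigenvalues that are kept for n_indep; that assumption is not stated above. The symbols C (the tissue correlation matrix), V and Λ (its eigenvectors and eigenvalues), and V_k and Λ_k (the kept subset) are introduced only for this sketch.

```latex
\begin{aligned}
C &= V \Lambda V^{\top}, \qquad C^{+} = V_k \Lambda_k^{-1} V_k^{\top} \\
T &= C\,C^{+} = V \Lambda V^{\top} V_k \Lambda_k^{-1} V_k^{\top} = V_k V_k^{\top} \\
\operatorname{tr}(T\,T^{\top}) &= \operatorname{tr}(V_k V_k^{\top} V_k V_k^{\top}) = \operatorname{tr}(V_k^{\top} V_k) = k
\end{aligned}
```

Under that assumption T is a projection onto the kept components, so its trace counts them exactly; in practice, numerical error and implementation details can make the reported value differ slightly from n_indep.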

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Festus (2022). MetaXcan output file formats. PredictDB. /post/2022/03/08/metaxcan-output-file-formats/

BibTeX citation

@misc{
  title = "MetaXcan output file formats",
  author = "Festus",
  year = "2022",
  journal = "PredictDB",
  note = "/post/2022/03/08/metaxcan-output-file-formats/"
}