The MetaXcan software hosts a suite of tools: PrediXcan, SPrediXcan, MultiXcan and SMultiXcan. This post describes the output file format of each tool.

## PrediXcan

PrediXcan is an individual-level data method for computing gene-trait associations.

The output is a tab-delimited file with one row per individual and one column per gene; each cell holds the predicted expression of that gene for that individual.

The first two columns contain the FID and IID for every observation.
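
As a minimal sketch of reading this layout, the snippet below parses a small in-memory sample with Python's standard `csv` module. The family IDs, individual IDs, gene IDs and values are made up for illustration; only the tab delimiter and the leading `FID` and `IID` columns come from the description above.

```python
import csv
import io

# Made-up sample in the layout described above: FID, IID, then one column per gene.
SAMPLE = (
    "FID\tIID\tENSG00000000001\tENSG00000000002\n"
    "fam1\tind1\t0.12\t-0.40\n"
    "fam2\tind2\t-0.05\t0.31\n"
)


def read_predicted_expression(handle):
    """Return (gene_ids, rows); rows maps (FID, IID) to a list of floats."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_ids = header[2:]  # everything after FID and IID is a gene column
    rows = {}
    for fields in reader:
        rows[(fields[0], fields[1])] = [float(x) for x in fields[2:]]
    return gene_ids, rows


gene_ids, rows = read_predicted_expression(io.StringIO(SAMPLE))
print(gene_ids)
print(rows[("fam1", "ind1")])
```

For a real file, replace `io.StringIO(SAMPLE)` with an open file handle.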

## Association

Gives the association between predicted expression and an outcome (`PrediXcanAssociation.py`).

Each output has the following columns:

`gene`
: Ensembl ID or intron ID

`effect`
: estimated effect size

`se`
: standard error of the estimated effect size

`zscore`
: predicted association z-score

`pvalue`
: association p-value

`n_samples`
: number of samples used

`status`
: if there was any error in the computation, it is stated here
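
The sketch below shows one way to load such a table and rank the genes by p-value. It rests on assumptions that the description above does not state: that the file is tab-delimited, and that a row without an error carries an empty or `NA` status. The sample rows and the error label are made up.

```python
import csv
import io

# Made-up rows using the column names listed above.
# Assumption: tab-delimited, with "NA" marking missing values.
SAMPLE = (
    "gene\teffect\tse\tzscore\tpvalue\tn_samples\tstatus\n"
    "ENSG00000000002\t0.02\t0.10\t0.2\t0.84\t1000\tNA\n"
    "ENSG00000000001\t0.50\t0.10\t5.0\t5.7e-07\t1000\tNA\n"
    "ENSG00000000003\tNA\tNA\tNA\tNA\t1000\thypothetical_error\n"
)

# Assumption: how a "no error" status is written in the file.
OK_STATUS = {"", "NA"}


def load_associations(handle, delimiter="\t"):
    """Keep rows without an error status and with a numeric p-value, sorted by p-value."""
    kept = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["status"] not in OK_STATUS:
            continue  # the computation reported an error for this gene
        try:
            row["pvalue"] = float(row["pvalue"])
        except ValueError:
            continue  # p-value missing or not numeric
        kept.append(row)
    return sorted(kept, key=lambda r: r["pvalue"])


hits = load_associations(io.StringIO(SAMPLE))
for row in hits:
    print(row["gene"], row["pvalue"])
```

If your files use a different delimiter or status convention, adjust `delimiter` and `OK_STATUS` accordingly.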

## SPrediXcan

Runs association between the gene models and summary statistics.

Each output file is a CSV, with each row containing a gene association at a given trait-tissue combination:

`gene`
: Ensembl ID or intron ID

`gene_name`
: HUGO name or intron ID

`zscore`
: predicted association z-score

`effect_size`
: estimated effect size

`pvalue`
: association p-value

`var_g`
: estimated variance of predicted expression or splicing, calculated as W' * G * W (where W is the vector of SNP weights in a gene’s model, W' is its transpose, and G is the covariance matrix)

`pred_perf_r2`
: cross-validated prediction performance of the model (R²)

`pred_perf_pval`
: p-value of the model's cross-validated prediction performance

`pred_perf_qval`
: deprecated, empty field left for compatibility

`n_snps_used`
: number of SNPs in the intersection of GWAS and model

`n_snps_in_cov`
: number of SNPs in the LD compilation

`n_snps_in_model`
: number of SNPs in the model

`best_gwas_p`
: smallest p-value across GWAS SNPs used in this model

`largest_weight`
: largest prediction model weight
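
To make the `var_g` formula concrete, here is a small worked sketch of W' * G * W in plain Python. The two-SNP weights and covariance values are made up, and the helper name `var_g` is only illustrative; it is not a function from the MetaXcan code base.

```python
def var_g(weights, covariance):
    """Compute W' * G * W for a weight vector W and a square covariance matrix G."""
    n = len(weights)
    if len(covariance) != n or any(len(row) != n for row in covariance):
        raise ValueError("covariance must be an n x n matrix matching the weights")
    # First G * W (a vector), then the dot product with W.
    gw = [sum(covariance[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return sum(weights[i] * gw[i] for i in range(n))


# Made-up example: two SNPs with unit variance and covariance 0.5.
W = [0.5, -0.25]
G = [
    [1.0, 0.5],
    [0.5, 1.0],
]
print(var_g(W, G))  # prints 0.1875
```

Here G * W is (0.375, 0.0), so W' * G * W = 0.5 * 0.375 + (-0.25) * 0.0 = 0.1875.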

## MultiXcan

Multi-tissue PrediXcan, which takes multiple gene expression files as input.

This script computes a gene-level association from predicted gene expression to a human trait, using multiple studies for each gene jointly. It supports adjusting for covariates. As input, it takes predicted expression files as generated by `Predict.py`.

The results look like:

`gene`
: a gene’s ID, as listed in the tissue transcriptome model. This is an Ensembl ID for most gene model releases; it can also be an intron’s ID for splicing model releases.

`pvalue`
: significance p-value of the MultiXcan association

`n_models`
: number of models (tissues) available for this gene

`n_samples`
: number of individuals available to this gene-phenotype combination (i.e. inner join of phenotype and predictions)

`p_i_best`
: best p-value among the single-tissue PrediXcan associations

`m_i_best`
: name of the best single-tissue PrediXcan association

`p_i_worst`
: worst p-value among the single-tissue PrediXcan associations

`m_i_worst`
: name of the worst single-tissue PrediXcan association

`status`
: if there was any error in the computation, it is stated here

`n_used`
: number of independent components of variation kept among the tissues' predictions (synthetic independent tissues)

`max_eigen`
: in the PCA decomposition of predicted expression, the maximum eigenvalue

`min_eigen`
: in the PCA decomposition of predicted expression, the minimum eigenvalue

`min_eigen_kept`
: in the PCA decomposition of predicted expression, the minimum eigenvalue kept (i.e. surviving SVD)
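
One simple use of these columns is to compare the joint `pvalue` with `p_i_best`, the strongest single-tissue result. The sketch below does that on a made-up sample that shows only a subset of the columns; the tab delimiter, the gene IDs and the tissue names are assumptions for illustration.

```python
import csv
import io

# Made-up sample with a subset of the columns listed above (tab delimiter assumed).
SAMPLE = (
    "gene\tpvalue\tn_models\tp_i_best\tm_i_best\n"
    "ENSG00000000001\t1e-08\t10\t1e-05\tTissue_A\n"
    "ENSG00000000002\t0.01\t5\t0.001\tTissue_B\n"
)


def joint_beats_best_tissue(handle, delimiter="\t"):
    """Return genes whose joint p-value is smaller than the best single-tissue p-value."""
    genes = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        if float(row["pvalue"]) < float(row["p_i_best"]):
            genes.append(row["gene"])
    return genes


stronger = joint_beats_best_tissue(io.StringIO(SAMPLE))
print(stronger)  # prints ['ENSG00000000001']
```

This is only a descriptive comparison between two columns; it is not a statistical test.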

If you specify `--loadings_output`, you’ll get a file specifying the loadings of the PC decomposition of predicted expression for each gene:

`gene`
: Ensembl ID (or intron ID) being analyzed

`pc`
: identifier of the principal component

`tissue`
: tissue being analyzed

`weight`
: coefficient of the loading from tissues to PC
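
Because the loadings file is in long format (one row per gene, PC and tissue), it can help to regroup it into a nested mapping. This is a minimal sketch under assumptions: the tab delimiter, the `pc0`/`pc1` identifiers, the tissue names and the weights are all made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up long-format sample with the four columns listed above (tab delimiter assumed).
SAMPLE = (
    "gene\tpc\ttissue\tweight\n"
    "ENSG00000000001\tpc0\tTissue_A\t0.6\n"
    "ENSG00000000001\tpc0\tTissue_B\t0.8\n"
    "ENSG00000000001\tpc1\tTissue_A\t-0.8\n"
    "ENSG00000000001\tpc1\tTissue_B\t0.6\n"
)


def loadings_by_gene(handle, delimiter="\t"):
    """Return a nested mapping: gene -> pc -> tissue -> weight."""
    table = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(handle, delimiter=delimiter):
        table[row["gene"]][row["pc"]][row["tissue"]] = float(row["weight"])
    return table


table = loadings_by_gene(io.StringIO(SAMPLE))
print(dict(table["ENSG00000000001"]["pc0"]))
```

Each inner dictionary then gives the tissue weights that define one principal component for one gene.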

If you specify `--coefficient_output`, you get a file with effect sizes for the tissues involved in each gene:

`param`
: effect size of the PCA-regularized regression (i.e. effect sizes of the PC components, converted to tissue space)

`variable`
: tissue being analyzed

`gene`
: Ensembl ID (or intron ID)

## SMultiXcan

Summary-stats based Multi-Tissue PrediXcan.

The results contain the following columns:

`gene`
: a gene’s ID, as listed in the tissue transcriptome model

`gene_name`
: gene name as listed by the transcriptome model, typically HUGO for a gene. It can also be an intron’s ID.

`pvalue`
: significance p-value of the S-MultiXcan association

`n`
: number of “tissues” available for this gene

`n_indep`
: number of independent components of variation kept among the tissues' predictions (synthetic independent tissues)

`p_i_best`
: best p-value among the single-tissue S-PrediXcan associations

`t_i_best`
: name of the best single-tissue S-PrediXcan association

`p_i_worst`
: worst p-value among the single-tissue S-PrediXcan associations

`t_i_worst`
: name of the worst single-tissue S-PrediXcan association

`eigen_max`
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the top independent component

`eigen_min`
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the last independent component

`eigen_min_kept`
: in the SVD decomposition of the predicted expression correlation matrix, the eigenvalue (variance explained) of the smallest independent component that was kept

`z_min`
: minimum z-score among single-tissue S-PrediXcan associations

`z_max`
: maximum z-score among single-tissue S-PrediXcan associations

`z_mean`
: mean z-score among single-tissue S-PrediXcan associations

`z_sd`
: standard deviation of the mean z-score among single-tissue S-PrediXcan associations

`tmi`
: trace of T * T', where T is the correlation of predicted expression levels for different tissues multiplied by its SVD pseudo-inverse. It is an estimate of the number of independent components of variation in predicted expression across tissues (typically close to `n_indep`).

`status`
: if there was any error in the computation, it is stated here