A1: GTEx v8 MASHR-based models are parsimonious and exhibit the greatest power, which makes them the best option. However, older GWAS need preprocessing before they can be used with these models, as detailed here.
A2: GTEx v8 UTMOST models are a robust but less effective option. They are based on HapMap SNPs and therefore overlap well with the SNP sets of most GWAS.
A3: If your GWAS population is non-European, you might be interested in population-specific models like the MESA models.
A: 1KG and HapMap are the labels for the SNP sets that were used to train the prediction models: 1KG refers to the 1000 Genomes SNP set and HapMap refers to the HapMap SNP set. Consider which SNPs are present in your genotype data when deciding which models to use. The 1000 Genomes models use many more SNPs, so we believe they have greater prediction accuracy, and preliminary investigation on out-of-sample data suggests this is the case. However, if you are working with older data, or if you would have to impute a significant amount of your genotype data to cover the model SNPs more completely, we recommend using the HapMap models. For most use cases, the HapMap models are a good starting point. Bear in mind that v6 models are deprecated.
Click on the following links to download the complete SNP annotation files we used to train the different models. These are tab-delimited text files containing chromosome, position, and allele information in addition to the rsid numbers.
1000 Genomes (150MB)
Alternatively, you can query a model's SQLite database to get a list of all the SNPs it uses for prediction. See here for a Python function that creates a file containing all SNPs in a database.
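As a rough illustration of that kind of query, here is a minimal sketch. It is not the linked function. It assumes the model database has a `weights` table with an `rsid` column; check the schema of your own file before relying on it. The function names `list_model_snps` and `model_coverage` are hypothetical, and the second one simply reports what fraction of a model's SNPs appear in your genotype data, which may help when choosing between the 1KG and HapMap models.

```python
import sqlite3


def list_model_snps(db_path, out_path=None):
    """Return the sorted, de-duplicated rsids used by a prediction model.

    Assumption: the database has a `weights` table with an `rsid` column.
    If out_path is given, the rsids are also written one per line.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT DISTINCT rsid FROM weights "
            "WHERE rsid IS NOT NULL ORDER BY rsid"
        ).fetchall()
    finally:
        conn.close()

    snps = [row[0] for row in rows]
    if out_path is not None:
        with open(out_path, "w") as handle:
            handle.write("\n".join(snps) + "\n")
    return snps


def model_coverage(model_snps, genotype_snps):
    """Fraction of the model's SNPs that are present in the genotype data."""
    model = set(model_snps)
    if not model:
        return 0.0
    return len(model & set(genotype_snps)) / len(model)
```

A low coverage value for the 1KG models would suggest either imputing the missing SNPs or falling back to the HapMap models, as described above.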
The older Elastic Net pipeline for v6 models is available here. It is deprecated and exists solely for reference purposes.