VEP and LOFTEE Plugin
Hail is an open-source, general-purpose, Python-based data analysis library with additional data types and methods for working with genomic data. Hail is built to scale horizontally as workloads grow, and it has strong support for multi-dimensional, structured data, such as the genotype data in a genome-wide association study (GWAS). Maintained by the Broad Institute, Hail has been widely adopted in academia and industry.
Hail can annotate variants with its vep() method, which runs Ensembl's Variant Effect Predictor (VEP) and can load the VEP plugin LOFTEE (Loss-Of-Function Transcript Effect Estimator). These packages (VEP and LOFTEE) are required for certain deployments of Hail on Amazon EMR, and are hosted on Amazon Web Services in Amazon S3.
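hl.vep() is pointed at a JSON configuration file that describes how to invoke VEP on each worker. The sketch below assembles such a config in Python. It rests on several assumptions drawn from our reading of the Hail documentation, so check them against the docs for your Hail version: the key names (command, env, vep_json_schema) and the __OUTPUT_FORMAT_FLAG__ placeholder, which Hail substitutes with the output-format flag it needs. The executable path "/vep", the PERL5LIB value, and the helper name make_hail_vep_config are hypothetical.

```python
import json

# Placeholder that, per our reading of the Hail docs, hl.vep() replaces with
# the output-format flag VEP should use (JSON vs. VCF/CSQ output).
OUTPUT_FORMAT_PLACEHOLDER = "__OUTPUT_FORMAT_FLAG__"


def make_hail_vep_config(command, env=None, vep_json_schema=None):
    """Assemble a dict shaped like the JSON config file hl.vep() reads.

    Assumed keys (verify against the docs for your Hail version):
      command         : VEP argv, containing the output-format placeholder
      env             : extra environment variables for the VEP process
      vep_json_schema : Hail type string describing VEP's JSON output
    """
    if OUTPUT_FORMAT_PLACEHOLDER not in command:
        raise ValueError(f"command must contain {OUTPUT_FORMAT_PLACEHOLDER}")
    config = {"command": list(command), "env": dict(env or {})}
    if vep_json_schema is not None:
        config["vep_json_schema"] = vep_json_schema
    return config


# Illustrative VEP command; "/vep" and the PERL5LIB path are hypothetical.
command = [
    "/vep",
    "--format", "vcf",          # Hail streams variants to VEP as VCF
    OUTPUT_FORMAT_PLACEHOLDER,
    "--everything",
    "--allele_number",
    "--no_stats",
    "--cache", "--offline",     # use a local cache, no database access
    "--assembly", "GRCh38",
    "-o", "STDOUT",             # Hail reads VEP's results from stdout
]
config = make_hail_vep_config(command, env={"PERL5LIB": "/vep_data/loftee"})
config_text = json.dumps(config, indent=2)  # contents for e.g. vep-config.json
```

After writing config_text to a file, its path would be passed as the config argument, for example hl.vep(mt, config="vep-config.json"). The vep_json_schema string is left out of the example because its full type depends on the VEP version and flags; take it from the Hail documentation or an existing cluster config if your Hail version requires it.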
Variant Effect Predictor (VEP) Cache
The Variant Effect Predictor (VEP) from Ensembl “determines the effects of your variants (SNPs, insertions, deletions, CNVs, or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.” Using a local cache is the most efficient way to run VEP.
The vep folder in this dataset contains caches, for several recent versions of VEP, for the following species and assemblies:
- Zebrafish (Danio rerio) GRCz11
- Human (Homo sapiens) GRCh38
- Human (Homo sapiens) GRCh37
- Rat (Rattus norvegicus) Rnor_6.0
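To use one of these caches, VEP is run in offline mode and told which species, assembly, and cache version to read. The sketch below maps the species/assembly pairs listed above to standard VEP options (--cache, --offline, --dir_cache, --species, --assembly, --cache_version) and computes the directory VEP conventionally expects, <dir_cache>/<species>/<version>_<assembly>. The local directory "/vep_data" and cache version 105 are hypothetical; check the vep folder for the versions actually present.

```python
import posixpath

# Species/assembly pairs listed for the vep folder, written with the
# lowercase species names VEP uses for --species.
AVAILABLE_CACHES = {
    ("danio_rerio", "GRCz11"),
    ("homo_sapiens", "GRCh38"),
    ("homo_sapiens", "GRCh37"),
    ("rattus_norvegicus", "Rnor_6.0"),
}


def vep_cache_args(species, assembly, cache_dir, cache_version):
    """Return VEP options for running offline against a local cache."""
    if (species, assembly) not in AVAILABLE_CACHES:
        raise ValueError(f"no cache listed for {species} / {assembly}")
    return [
        "--cache", "--offline",
        "--dir_cache", cache_dir,
        "--species", species,
        "--assembly", assembly,
        "--cache_version", str(cache_version),
    ]


def expected_cache_path(species, assembly, cache_dir, cache_version):
    """Directory VEP normally reads: <dir_cache>/<species>/<version>_<assembly>."""
    return posixpath.join(cache_dir, species, f"{cache_version}_{assembly}")


# Hypothetical local directory and cache version, for illustration only.
args = vep_cache_args("homo_sapiens", "GRCh38", "/vep_data", 105)
path = expected_cache_path("homo_sapiens", "GRCh38", "/vep_data", 105)
```

Rejecting unlisted species/assembly pairs up front gives a clear error before VEP starts, rather than a failure deep inside a distributed annotation job.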
Loss-Of-Function Transcript Effect Estimator (LOFTEE)
The loftee-data folder in this dataset contains optional data from the LOFTEE project for use by the Hail on Amazon EMR project. Further usage instructions can be found in the project repository.
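LOFTEE is loaded into VEP as a plugin named LoF, configured with a single --plugin argument of the form LoF,key:value,key:value. The sketch below builds that argument from a dictionary of options, optionally adding VEP's --dir_plugins option so VEP can locate the LoF.pm plugin file. The option names loftee_path and human_ancestor_fa come from LOFTEE's documentation, but the available options differ between LOFTEE branches (for example GRCh37 vs. GRCh38), and the local paths and file name used here are hypothetical.

```python
def loftee_plugin_args(options, plugin_dir=None):
    """Build VEP arguments that load the LOFTEE plugin ("LoF").

    VEP expects --plugin NAME,key:value,key:value,...; commas separate
    options, so keys and values must not contain them.
    """
    parts = ["LoF"]
    for key, value in options.items():
        value = str(value)
        if "," in key or ":" in key or "," in value:
            raise ValueError(f"invalid LOFTEE option {key!r}={value!r}")
        parts.append(f"{key}:{value}")
    args = []
    if plugin_dir is not None:
        # --dir_plugins tells VEP which directory holds the LoF.pm plugin file.
        args += ["--dir_plugins", plugin_dir]
    args += ["--plugin", ",".join(parts)]
    return args


# Hypothetical paths: LOFTEE checked out under /vep_data/loftee and the
# loftee-data folder copied locally to /vep_data/loftee_data.
args = loftee_plugin_args(
    {
        "loftee_path": "/vep_data/loftee",
        "human_ancestor_fa": "/vep_data/loftee_data/human_ancestor.fa.gz",
    },
    plugin_dir="/vep_data/loftee",
)
```

The returned list can be appended to a VEP command such as the one passed to hl.vep() through its config file; because the data files are optional, LOFTEE checks that need a missing file may be skipped or fail, depending on the LOFTEE version.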