SNP Annotation Tool - User Guide

User Guide

Data Source
Input Format

Genome Assembly
Single Query

Genomic Position
Chromosomal Region
dbSNP rs#

Batch Query

Tab-delimited Text
Variant Call Format (VCF) File

Output Format

Genomic Mapping
Gene/Protein Consequences
Effect on Protein Function
Population Data
Regulatory Elements
Conservation
Phenotype & Disease Association
Structural Variations
Non-coding Variation Scoring
Pathway Analysis
Biological/Clinical Interpretation

Filtering Results

Data Source

Currently SNPnexus supports the two most recent human genome assemblies: GRCh37/hg19 and GRCh38/hg38. The underlying database gathers data from different sources; the main sources are UCSC and Ensembl. You can download a table with links to the original sources here.

The table below describes all the data sources for this SNPnexus release:

Category		GRCh37/hg19		GRCh38/hg38
Category		Source	Update time	Source	Update time
Known SNP information		Ensembl Variation 95;dbSNP 151	Jan 2019	Ensembl Variation 95;dbSNP 151	Jan 2019
Gene Annotation	RefSeq	UCSC hg19	Nov 2018	UCSC hg38	Mar 2019
	Ensembl	Ensembl 95	Jan 2019	Ensembl 95	Jan 2019
	Acembly	UCSC hg19	May 2011
	Vega	Vega 43;UCSC hg19	Oct 2010
	UCSC	UCSC hg19	Jun 2013	UCSC hg38	Nov 2018
	CCDS	UCSC hg19	Nov 2018	UCSC hg38	Mar 2019
	H-inv	UCSC hg19	Apr 2010
Protein Effect	SIFT	SIFT (Ensembl Variation 95)	Jan 2019	SIFT (Ensembl Variation 95)	Jan 2019
Protein Effect	PolyPhen	PolyPhen-2 (Ensembl Variation 95)	Jan 2019	PolyPhen-2 (Ensembl Variation 95)	Jan 2019
Population Data	HapMap	HapMap (Ensembl Variation 95)	Dec 2018	HapMap (Ensembl Variation 95)	Nov 2018
	1000 Genomes	1000 Genomes (Ensembl Variation 95)	Dec 2018	1000 Genomes (Ensembl Variation 95)	Nov 2018
	gnomAD Exome Data	Ensembl Variation Genotype (gnomad v2.1)	Mar 2019	Ensembl Variation Genotype (gnomad v2.1)	Mar 2019
	gnomAD Genome Data	Ensembl Variation Genotype (gnomad v2.1)	Mar 2019	Ensembl Variation Genotype (gnomad v2.1)	Mar 2019
Regulatory Elements	TFBS	UCSC hg19	May 2011
	miRBASE	v.20	Mar 2014	v.22.1	Mar 2018
	Vista	UCSC hg19	Dec 2010
	CpG Islands	UCSC hg19	Apr 2009	UCSC hg38	Mar 2019
	TargetScan	UCSC hg19	Dec 2010
	TarBase miRNA	Ensembl Variation 95	Dec 2018	Ensembl Variation 95	Dec 2018
	Other RNAs	UCSC hg19	Oct 2010	UCSC hg38	Nov 2018
	ENCODE regions	Ensembl Regulatory Building 95	Dec 2018	Ensembl Regulatory Building 95	Dec 2018
	RoadMap Epigenomics	Ensembl Regulatory Building 95	Dec 2018	Ensembl Regulatory Building 95	Dec 2018
	Ensembl Regulatory Build	Ensembl Regulatory Building 95	Dec 2018	Ensembl Regulatory Building 95	Dec 2018
Phenotype/Disease Association	GAD	UCSC hg19	Feb 2014
	COSMIC	Version 90	Sep 2019	Version 90	Sep 2019
	GWAS	UCSC hg19	Oct 2020	UCSC hg38	Oct 2020
	ClinVar	NCBI hg19	Mar 2020	NCBI hg38	Mar 2020
Conserved Elements	PhastConsElements	UCSC hg19	Apr 2014	UCSC hg38	Sep 2015
Conserved Elements	GERP++	GERP	May 2011	Ensembl Compara 95	Nov 2018
Structural Variations		UCSC hg19	Sep 2016	UCSC hg38	Sep 2016
Non-coding Variation Scoring	CADD		v1.4		v1.5
	fitCons		v1.01
	EIGEN		v1.1
	FATHMM		v2.3
	GWAVA		v1.0
	DeepSEA		v0.94
	FunSeq2		v2.1.6		v2.1.6
	ReMM		v0.3.1
Reactome Pathways			Aug 2019		Aug 2019
Cancer Genome Interpreter			v1.0.3		v1.0.3 (Using liftOver)

Input Format

SNPnexus currently accepts query input data in three different forms (genomic position, chromosomal region or dbSNP id) and for two different human genome assemblies. Users can annotate a single SNP, insertion/deletion (InDel) or block substitution by selecting one of the input formats and entering the required data in the graphical interface. Users can also run batch queries by uploading an appropriately formatted input file or pasting the queries into the interface. The formats are explained in more detail below.

Human Genome Assembly

Users should first select the genome assembly (hg19 or hg38) in the user interface. The query input form and the set of annotations available for each assembly will be displayed after the user selects the relevant assembly.

Single Query

Genomic Position

Users can annotate a newly discovered variant by entering the following data in the interface: type (Chromosome/Contig/Clone), name, relative position, reference nucleotide(s) (Allele1), observed nucleotide(s) (Allele2), and positive (1) or negative (-1) strand. A one-based coordinate system is used to describe the genomic position. Multi-allelic variations are supported: users can provide "/"-separated alleles in the Allele2 field.

Insertions and Deletions (InDels) and Block Substitutions. The tool supports insertions and deletions by using "-" as the placeholder. Users need to insert Allele1=- to indicate Allele2 insertion in the corresponding genomic position. Similarly, Allele2=- can be used to denote deletion of Allele1 from the given genomic position. Similar to single nucleotide substitution, the tool also supports block substitution when the user provides Allele1 and Allele2 data of same or different length.

Here are a few examples on the hg38 assembly:

Type	Id	Position	Allele1	Allele2	Strand
Chromosome	1	942451	T	C	1
Chromosome	3	9810376	-	GAT	1
Chromosome	7	25226951	TA	GTT	1
Contig	GL000006.2	21916451	A	G/T	1
Clone	AL606500.8	119473	GCT	-	1
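The allele conventions above can be illustrated with a short Python sketch. It is not part of SNPnexus; the function name `classify_variant` and the returned labels are hypothetical, chosen only to mirror the terms used in this guide ("-" as the insertion/deletion placeholder, "/" separating multiple observed alleles).

```python
def classify_variant(allele1: str, allele2: str) -> str:
    """Label a query by its Allele1/Allele2 fields (labels are illustrative)."""
    if allele1 == "-" and allele2 == "-":
        raise ValueError("Allele1 and Allele2 cannot both be '-'")
    if allele1 == "-":
        return "insertion"          # Allele2 is inserted at the position
    if allele2 == "-":
        return "deletion"           # Allele1 is deleted from the position
    observed = allele2.split("/")   # multi-allelic input, e.g. "G/T"
    if len(allele1) == 1 and all(len(a) == 1 for a in observed):
        return "single nucleotide substitution"
    return "block substitution"     # alleles of same or different length


# The (Allele1, Allele2) pairs from the example table above
examples = [("T", "C"), ("-", "GAT"), ("TA", "GTT"), ("A", "G/T"), ("GCT", "-")]
labels = [classify_variant(a1, a2) for a1, a2 in examples]
```

For the five example rows this yields a substitution, an insertion, a block substitution, a multi-allelic substitution and a deletion, in that order.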

Note that the tool supports multiple nucleotides in place of Allele1 and Allele2. However, for practical reasons, users are discouraged from providing very large blocks that may span more than one adjacent functional region (e.g., adjacent intronic and exonic regions), in which case the predicted function of the variant will be based on the first functional region.

IUPAC code submission. Finally, users can specify reference and observed nucleotides using IUPAC nucleotide nomenclature to denote ambiguous nucleotides at a given position, following the translation table shown below:

IUPAC Code	Meaning
G	G
A	A
T	T
C	C
R	G or A
Y	T or C
M	A or C
K	G or T
S	G or C
W	A or T
H	A or C or T
B	G or T or C
V	G or C or A
D	G or A or T
N	G or A or T or C

Here are a few examples:

Type	Id	Position	Allele1	Allele2	Strand	#Comment
Chromosome	1	100002626	A	S	1	# Substitution of A with G or C
Chromosome	3	9798773	-	R	1	# G or A insertion
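As a minimal sketch of the translation table above, the following Python snippet expands an IUPAC code into the nucleotides it stands for. The names `IUPAC` and `expand_iupac` are illustrative and are not part of SNPnexus.

```python
from typing import List

# IUPAC code -> nucleotides, in the order listed in the table above
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def expand_iupac(code: str) -> List[str]:
    """Return the concrete nucleotides denoted by a single IUPAC code."""
    return list(IUPAC[code.upper()])


# Allele2 = "S" in the first example row stands for G or C
s_alleles = expand_iupac("S")
```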

Chromosomal Region

Users can query for known SNPs in a given chromosomal region (up to 1Mb) by providing the following data: chromosome, start position, end position. The tool will identify and annotate all the known SNPs defined in the selected region. Here are a few examples on the hg38 assembly:

Chromosome	Start	End
3	9798000	9799000
1	100000000	100050000
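A region query can be checked locally before submission. The sketch below is an assumption-laden illustration: the guide only says "up to 1Mb", so treating the limit as `end - start <= 1,000,000` is a guess at the exact boundary, and `region_is_valid` is a hypothetical helper name.

```python
MAX_REGION_BP = 1_000_000  # "up to 1Mb"; the exact boundary rule is assumed


def region_is_valid(chrom: str, start: int, end: int) -> bool:
    """Check a chromosomal region query against the documented size limit."""
    return bool(chrom) and 1 <= start <= end and (end - start) <= MAX_REGION_BP


# The two example regions from the table above
ok_first = region_is_valid("3", 9798000, 9799000)
ok_second = region_is_valid("1", 100000000, 100050000)
```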

dbSNP rs#

Users can also query for known SNPs by providing the corresponding dbSNP rs identifiers. Here are a few examples of dbSNP rs#:

dbSNP rs#
rs293794
rs1052133
rs3136820
rs2272615
rs2953993
rs1799782
rs25487
rs2248690
rs4918
rs1071592

Note that, depending on the genome assembly, the functional annotation for a given SNP can be quite different. Users should therefore choose the genome assembly with care.

Batch Query

SNPnexus allows users to submit a batch query when dealing with large numbers of variations. Users can either paste the variant list directly into the designated text area or upload a file containing the queries. Currently, the maximum number of variants in a single batch query is limited to 10,000. SNPnexus supports uploading files in the formats specified below (tab-delimited or VCF). A size limit of 20 Mb is imposed; larger files can be uploaded compressed in Zip or Gzip format.

Tab-delimited Text

We only allow batch query using genomic position and/or dbsnp rs# formats. No chromosomal region query data is allowed. Each variant must be on a new line with tab-delimited data in one of the following formats:

< Type    Name    Position    Allele1    Allele2    Strand >         # Genomic position data for novel SNPs
< "dbsnp"    rs# >                                                   # dbSNP rs number for known SNPs

An example of a batch query is shown below; it can be pasted directly into the text area provided in the interface:

Chromosome  1   100002626   A   T   1
Contig  NT_023736   2025395 A   T   1
Clone   AC105270    154799  A   T   1
dbsnp   rs293794
dbsnp   rs1052133

Alternatively, users can upload batch query files (.txt) like this example. Note that, known SNPs must be preceded by keyword "dbsnp" to be recognized as dbSNP rs#.
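A batch query can be sanity-checked locally before upload. The following Python sketch is not the SNPnexus parser: `parse_batch` and its record layout are hypothetical, and it splits on any whitespace for leniency even though the guide specifies tab-delimited data. It enforces the two line formats and the 10,000-variant limit described above.

```python
MAX_VARIANTS = 10_000          # batch limit stated in this guide
TYPES = {"Chromosome", "Contig", "Clone"}


def parse_batch(text: str) -> list:
    """Parse batch query text into a list of dict records (illustrative only)."""
    records = []
    for n, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # lenient: tabs or spaces
        if not fields:
            continue           # skip blank lines
        if fields[0].lower() == "dbsnp" and len(fields) == 2:
            records.append({"kind": "dbsnp", "rs": fields[1]})
        elif fields[0] in TYPES and len(fields) == 6:
            type_, name, pos, allele1, allele2, strand = fields
            if not pos.isdigit() or strand not in {"1", "-1"}:
                raise ValueError(f"line {n}: bad position or strand: {line!r}")
            records.append({
                "kind": "position", "type": type_, "name": name,
                "position": int(pos), "allele1": allele1,
                "allele2": allele2, "strand": int(strand),
            })
        else:
            raise ValueError(f"line {n}: unrecognised query: {line!r}")
    if len(records) > MAX_VARIANTS:
        raise ValueError(f"{len(records)} variants exceed the {MAX_VARIANTS} limit")
    return records


# The example batch query from this section
example = "\n".join([
    "Chromosome\t1\t100002626\tA\tT\t1",
    "Contig\tNT_023736\t2025395\tA\tT\t1",
    "Clone\tAC105270\t154799\tA\tT\t1",
    "dbsnp\trs293794",
    "dbsnp\trs1052133",
])
records = parse_batch(example)
```

A chromosomal region line (three numeric fields) matches neither format and is rejected, consistent with the rule that region queries are not allowed in batch mode.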

Variant Call Format (VCF) File

Variant Call Format (VCF) is a flexible and extensible standard format for variation data. SNPnexus allows users to upload VCF files (.vcf) containing SNPs, InDels and block substitutions directly onto the server. An example input VCF file is shown below:

##fileformat=VCFv4.1
##fileDate=20121001
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr3	9798773	rs1052133	C	G	.	.	.
chr1	114377568	.	A	G,T	.	.	.
chr3	9791667	.	AGA	-	.	.	.
chr16	50763779	.	-	C	.	.	.
chr20	1230237	.	T	.	.	.	.
chr20	1234567	.	GTC	G	.	.	.
chr20	1234568	.	T	TA	.	.	.

This example shows in order a simple SNP, a variant at which two alternate alleles are called, a deletion of 3 bases (AGA), an insertion of one base (C), a monomorphic reference with no alternate alleles which will eventually be ignored by SNPnexus, a deletion of 2 bases (TC), and an insertion of one base (A).

A VCF file should contain 8 fixed, mandatory columns, as shown by the third header line in the example. SNPnexus only uses genomic positions (CHROM, POS fields) and allele information (REF, ALT fields) from the input; the other information contained in the input file will be ignored and has no effect on the SNPnexus annotated outcome. Like the standard SNPnexus input format, NULL values for insertions and deletions can be represented by '-'. Missing values in the VCF file are represented by '.'. SNPnexus will ignore an input line if missing values occur in any of the CHROM, POS, REF and ALT fields. Please consult here for details about the format.
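The field-selection rule described above (use only CHROM, POS, REF and ALT; skip lines where any of them is '.') can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: `read_vcf_variants` is a hypothetical name, and skipping lines with fewer than 8 columns is an assumption.

```python
def read_vcf_variants(lines):
    """Extract (CHROM, POS, REF, [ALT alleles]) from VCF lines (sketch)."""
    variants = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue                      # assumed: malformed lines are skipped
        chrom, pos, _id, ref, alt = fields[:5]
        if "." in (chrom, pos, ref, alt):
            continue                      # missing value in a required field
        variants.append((chrom, int(pos), ref, alt.split(",")))
    return variants


# A few rows from the example VCF above, built with explicit tabs
rows = [
    ["chr3", "9798773", "rs1052133", "C", "G", ".", ".", "."],
    ["chr1", "114377568", ".", "A", "G,T", ".", ".", "."],
    ["chr20", "1230237", ".", "T", ".", ".", ".", "."],  # monomorphic, ignored
    ["chr20", "1234567", ".", "GTC", "G", ".", ".", "."],
]
vcf_lines = ["##fileformat=VCFv4.1"] + ["\t".join(r) for r in rows]
variants = read_vcf_variants(vcf_lines)
```

Note that the ID column may be '.' without causing the line to be dropped, since only the four required fields are checked.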

Output Format

Genomic Mapping and other information

The table containing genomic annotations has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
ID: Genomic Position ID <chromosome/contig/clone id,":",position,":","allele",":",strand>
dbSNP: link to dbSNP, if known
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele (IUPAC): Observed allele
Minor Allele: Minor allele observed in global population, if known
Minor Allele Frequency: Minor allele frequency observed in global population, if known
Contig: Variant mapped contig location
contigPosition: Variant start position on contig
Band: SNP cytogenetic location

The table containing information on overlapped or nearest genes has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
Overlapped Gene: Name of the gene (HGNC system) to which the variant is overlapped
Type: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Annotation: Summary of whether the variant overlapped with the coding, intronic or untranslated regions of the various transcript isoforms of the gene, as annotated from Ensembl gene system.
Nearest Upstream Gene: If variant is not overlapped with any gene, then the gene whose end position is nearest to the variant on the left (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Upstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Upstream Gene: distance from the end position of the nearest upstream gene.
Nearest Downstream Gene: If variant is not overlapped with any gene, then the gene whose start position is nearest to the variant on the right (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Downstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Downstream Gene: distance from the start position of the nearest downstream gene.
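The upstream/downstream definitions above can be sketched in Python. This is only an illustration of the stated rule (upstream = gene whose end is nearest on the left, downstream = gene whose start is nearest on the right); the function name, the gene tuples and their coordinates are hypothetical, and real annotation would also need per-chromosome handling.

```python
def nearest_genes(position, genes):
    """genes: list of (name, start, end) on one chromosome, positive-strand view."""
    overlapped = [name for name, start, end in genes if start <= position <= end]
    if overlapped:
        return {"overlapped": overlapped, "upstream": None, "downstream": None}
    left = [g for g in genes if g[2] < position]    # genes ending before the variant
    right = [g for g in genes if g[1] > position]   # genes starting after the variant
    up = max(left, key=lambda g: g[2], default=None)
    down = min(right, key=lambda g: g[1], default=None)
    return {
        "overlapped": [],
        # (gene name, distance from the gene's end position)
        "upstream": (up[0], position - up[2]) if up else None,
        # (gene name, distance from the gene's start position)
        "downstream": (down[0], down[1] - position) if down else None,
    }


# Hypothetical gene intervals, for illustration only
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 800), ("GENE_C", 1000, 1200)]
intergenic = nearest_genes(300, genes)
inside = nearest_genes(600, genes)
```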

Gene/Protein Consequences

The result table containing gene/protein consequences for a particular gene annotation system may have the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Variant: Examined alleles <reference allele,"|", observed allele(s) >. For Insertion, reference allele is "-". For other cases, reference allele is the allele found in reference genome sequence. Observed allele(s) can be multi-allelic separated by "|" depending on the input Allele2. If input Allele1 does not match with reference allele, then Allele1 becomes the first observed allele.
Strand: On which strand the variant is observed (1 or -1)
Symbol: Gene symbol
Gene: Gene name in the corresponding annotation system
Transcript: Transcript name in the corresponding annotation system
Entrez Gene: Entrez gene id
Predicted Function: Predicted function of the SNP/InDel/block substitution based on its location on the transcript. The result is based on the first nucleotide position of the variation. Possible categories: coding, intronic, intronic (splice_site), 5utr, 3utr, 5upstream, 3downstream, non-coding, non-coding intronic, non-coding intronic (splice_site). More detailed information on the predicted function is available in the "Detail" column (previously "Note").
CDNA Position: SNP position on the cDNA, if the predicted function is coding, 3'UTR or 5'UTR
CDS Position: SNP position on the CDS, if the predicted function is coding
AA Position: Position of the first amino acid (possibly) affected in the resultant peptide chain, if the predicted function is coding
AA Change: Peptide <reference amino acid(s),">", observed amino acid(s)_1 [,"|", observed amino acid(s)_2, ... ] >
Detail (previously Note column): Detailed functional type for the variation. If the variation occurs over a single coding exon of a transcript, the type of the consequences on the corresponding protein is given. Possible values: syn (synonymous), nonsyn (non-synonymous) [stop-gain or stop-loss], frameshift [stop-gain or stop-loss], pepshift (peptide shift, block substitution). Preceded by "*", if the reference protein is found incomplete (missing stop-codon). However, if the variation spans more than one functional region on the transcript, the corresponding regions are given separated by "-".
Splice Distance: Distance to splice junction, if the predicted function is intronic
Proteins: Reference and observed peptide sequences separated by "|", if the predicted function is coding. Available only in the downloadable text files.

Effect on Protein function

The SIFT result table containing the predicted effect on protein has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
SNP: SNP name
Variant: <reference allele,"/",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Gene: Gene name
AA Position: Position of the amino acid affected in the resultant peptide chain
Wild AA: Reference amino acid
Mutant AA: Observed amino acid
Score: SIFT prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: SIFT predicted effect on protein based on the score and SIFT median. Possible values: Deleterious (score <= 0.05 and median <= 3.25); Deleterious - Low Confidence (score <= 0.05 and median > 3.25); Tolerated (score > 0.05 and median <= 3.25); Tolerated - Low Confidence (score > 0.05 and median > 3.25).

The PolyPhen result table containing the predicted effect on protein has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
SNP: SNP name
Variant: <reference allele,"/",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Gene: Protein name in the Ensembl gene annotation system
AA Position: Position of the amino acid affected in the resultant peptide chain
Wild AA: Reference amino acid
Mutant AA: Observed amino acid
Score: PolyPhen prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: PolyPhen predicted effect on protein based on the score. Possible values: Probably Damaging (score > 0.908), Possibly Damaging (0.446 < score <= 0.908), Benign (score <= 0.446).
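The PolyPhen score thresholds listed above map to predictions as in the following Python sketch. The function name `polyphen_prediction` is illustrative; only the cut-offs (0.446 and 0.908) come from this guide.

```python
def polyphen_prediction(score: float) -> str:
    """Map a PolyPhen score in [0, 1] to the prediction label from this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen scores are expected in the range 0 to 1")
    if score > 0.908:
        return "Probably Damaging"
    if score > 0.446:               # 0.446 < score <= 0.908
        return "Possibly Damaging"
    return "Benign"                 # score <= 0.446


boundary_labels = [polyphen_prediction(s) for s in (0.446, 0.5, 0.908, 0.95)]
```

Scores exactly at a cut-off fall into the lower category, because both boundaries are inclusive on the lower side.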

Population Data

The result table containing the specific HapMap population data has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
ASW Frequency: Percentage of observed samples with the allele in population with African Ancestry in SouthWestern US
CEU Frequency: Percentage of observed samples with the allele in the population of Utah residents with Northern and Western European ancestry
CHB Frequency: Percentage of observed samples with the allele in the Han Chinese in Beijing, China from HapMap phase 3 population
CHD Frequency: Percentage of observed samples with the allele in population with Chinese Ancestry in Metropolitan Denver, US
GIH Frequency: Percentage of observed samples with the allele in the Gujarati Indians in Houston, Texas population
HCB Frequency: Percentage of observed samples with the allele in population from Unrelated Han Chinese in Beijing, China from the International HapMap project
JPT Frequency: Percentage of observed samples with the allele in the Japanese in Tokyo, Japan population
LWK Frequency: Percentage of observed samples with the allele in the Luhya in Webuye, Kenya population
MEX Frequency: Percentage of observed samples with the allele in population with Mexican Ancestry in Los Angeles, US
MKK Frequency: Percentage of observed samples with the allele in the Maasai in Kinyawa, Kenya population
TSI Frequency: Percentage of observed samples with the allele in the Toscani in Italia population
YRI Frequency: Percentage of observed samples with the allele in the Yoruba in Ibadan, Nigeria population

The result table containing the specific 1000 Genomes Super Population data has the following columns:

The result table containing the specific exome gnomAD Population data has the following columns:

The result table containing the specific genome gnomAD Population data has the following columns:

Regulatory Elements

The Transcription Factor Binding Sites (TFBS) result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
TFBS Name: Transcription factor name
Chromosome: Chromosome name
Region Start: Start position of the TFBS site in the chromosome
Region End: End position of the TFBS site in the chromosome
TFBS Accession: TFBS accession number. Note that, browsing the link provided in the html and excel file requires free registration with TRANSFAC website.
Species: Transcription factor species
SwissProt Accession: SwissProt accession number

The miRBASE result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
TFBS ID: TFBS id
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Name: microRNA name
Accession: miRBASE accession number
Strand: + or -
Type / Description: miRNA type. Possible values: mature miRNA, miRNA_primary_transcript

The Vista Enhancer prediction result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Vista Item: Name of the Vista element
Score: Prediction score. Possible values: 900 (Positive-enhancer), 200 (Negative-enhancer)

The CpG Island prediction result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
CpG Island: Name of the CpG Island
Length: Island Length
Cpg%: Percentage of island that is CpG
C/G%: Percentage of island that is C or G
Ratio: Ratio of observed to expected CpG in island

The TargetScan miRNA regulatory sites result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Item_Name: Name of the predicted target site
Score: Prediction scores by TargetScanS. Possible values: 0 to 1000
Strand: + or -

The TargetBase (TarBase) miRNA target sites result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Strand: + or -
miRNA: miRNA targeting the site
Accession: miRBASE accession number
Gene: Gene name

The miRNAs/snoRNAs/scaRNAs result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Name: Name of the miRNA/snoRNA/scaRNA
Score: Prediction scores. Possible values: 0 to 1000
Strand: + or -
Type: Type of RNA

The ENCODE and Roadmap Epigenomics result tables have the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Feature Type Class: Regulatory feature class
Feature Type: Regulatory feature name
Epigenome: Epigenome or cell name

The Ensembl Regulatory Build result table has the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the region in the chromosome
Region End: End position of the region in the chromosome
Feature Type Class: Regulatory feature class
Epigenome: Epigenome or cell name
Activity: State of activity (hg38)

Conservation

The Vertebrate Alignment and Conservation (Phast) result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the conserved element in the chromosome
Region End: End position of the conserved element in the chromosome
Id: Name of the aligned element
Score: Estimated probability score for conservation as determined from PHAST package. Possible values: 0 to 1000

The Genomic Evolutionary Rate Profiling (GERP++) result table contains the following information:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the conserved element in the chromosome
Region End: End position of the conserved element in the chromosome
Element RS Score: Rejected Substitutions score for the conserved element as determined from GERP++ package.
Base RS Score: Rejected Substitutions score calculated per base as determined from GERP++ package.

Phenotype & Disease Association

The Genetic Association Database (GAD) result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
GAD ID: GAD id
Association: Confirmed association
Phenotype: Phenotype description
Disease_Class: Type of disease
Gene: Gene name
Reference: Reference of publication of the study
PubMed: Pubmed id of publication of the study
Associated SNPs: SNPs associated with the disease as reported in the study
p-Value: Statistical significance of the association study
Population: Sample population
Entrez gene: Entrez gene id

The COSMIC result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Mutation Id: Cosmic mutation id
Sample: Cosmic sample id
Site: Primary affected site
Histology: Primary histology
Histology Subtype: Subtype of primary histology
Symbol: Gene symbol
Pubmed: Pubmed id of publication of the study

The GWAS catalogue result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Catalogue ID: ID of SNP associated with trait
Region: Chromosome band/region of SNP
Genes: Reported Gene(s)
Allele_frequency: Risk Allele Frequency
Trait: Disease or trait assessed in study
Population: Initial sample population for the study
Platform: Platform and [SNPs passing Quality Control]
Pubmed: Pubmed id of publication of the study

The ClinVar result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
Position: Variant start position on chromosome
Variation: Reference to Observed Allele
Type: Type of Variant
Clinical Significance: Whether identified as Pathogenic or Benign or uncertain
Phenotypes: List of phenotypes associated with the variant

Structural Variations

The structural variations result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Chrom Start: Start position of the structural variation in the chromosome
Chrom End: End position of the structural variation in the chromosome
Type: Type of structural variation
Reference: Literature reference for the study that included this variant
PubMed: Pubmed id of publication of the study
Method: Brief description of method/platform
Sample: Description of sample population for the study
Gain: Copy number gains
Loss: Copy number losses

Non-coding Variation Scoring

The CADD result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position on chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Raw Score: "Raw" unaltered CADD-score for the variation. It has relative meaning, with higher values indicating that a variant is more likely to be simulated (or "not observed") and therefore more likely to have deleterious effects.
PHRED: PHRED-like (-10*log₁₀(rank/total)) scaled CADD-score ranking a variant relative to all possible substitutions of the human genome. A score≥10 indicates that it is predicted to be in the 10% most deleterious substitutions that you can do to the human genome, a score≥20 indicates the 1% most deleterious and so on.
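The PHRED-like scaling above can be checked with a few lines of Python. This is a worked illustration of the formula -10*log10(rank/total) only, not CADD's own code; the function names are hypothetical, and rank 1 is assumed to denote the most deleterious substitution.

```python
import math


def phred_scaled(rank: int, total: int) -> float:
    """PHRED-like score for a variant ranked `rank` out of `total` substitutions."""
    return -10 * math.log10(rank / total)


def top_fraction(phred: float) -> float:
    """Inverse mapping: fraction of substitutions ranked at least this high."""
    return 10 ** (-phred / 10)


score_top_10_percent = phred_scaled(10, 100)   # rank/total = 0.1
fraction_at_20 = top_fraction(20)              # 1% most deleterious
```

A variant in the top 10% (rank/total = 0.1) scores about 10, and a score of 20 corresponds to the top 1%, matching the interpretation given above.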

The FitCons result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Region Start: Start position of the non-coding region
Region End: End position of the non-coding region
Fitness Score: In the range [0-1]. Relative indicator of the potential for interesting genomic function, with higher scores indicating more potential. The range .05 to .35 may be most appealing as nearly all non-coding classes have scores in this range, while nearly all coding classes have scores>.40
P-val: P-val indicating the statistical significance of the Fitness Score.

The EIGEN result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Score: Aggregate functional score for variants of interest (Eigen Score). With genome-wide median score of ~0, higher score indicates more likelihood of the variant to be functional.
PC Score: An alternative score which is more sensitive than Eigen score, particularly useful for the noncoding variants. With genome-wide median score of ~0, higher score indicates more likelihood of the variant to be functional.

The FATHMM result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Non-coding Score: Given as p-values in the range [0, 1]. Scores above 0.5 are predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign. Scores close to the extremes (0 or 1) are the highest-confidence predictions that yield the highest accuracy.
Non-coding Groups: Annotation features used for the prediction score. Maximum 4 features are used labelled between A and D. See publication for more details.
Coding Score: Same as non-coding score.
Coding Groups: Annotation features used for the prediction score. Maximum 10 features are used labelled between A and J. See publication for more details.

The GWAVA result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Known SNP: Known SNP description as reported in the tool's genome-wide score
Region Score, TSS Score, Unmatched Score: prediction scores from 3 different versions of the classifier, which are all in the range [0-1] with higher scores indicating variants predicted as more likely to be functional. See publication for more details.

The DeepSEA result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
eQTL Probability: The probability of the variant being an eQTL variant, given by the functional variant prioritization classifier.
GWAS Probability: The probability of the variant being a trait-associated (GWAS) variant, given by the functional variant prioritization classifier.
HGMD Probability: The probability of the variant being an inherited disease-associated (HGMD) variant, given by the functional variant prioritization classifier.
Functional Significance Score: A measure in the range [0-1] depicting the significance of magnitude of predicted chromatin effect and evolutionary conservation. Lower score indicates higher likelihood of functional significance of the variant.

The funSeq2 result table contains the following columns:

The ReMM result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Position: Variant start position in the chromosome
ReMM Score: Potential of the chromosome position in the non-coding region to cause a Mendelian disease if mutated. Given as p-values in the range [0, 1], with higher scores indicating variants predicted as more likely to be deleterious.

Pathway Analysis

The Reactome Pathways result table contains the following columns:

Pathway ID: Link to Reactome Pathway
Description: Pathway description
Parent(s): Immediate parents of the Pathway
p-Value: Statistical significance of the Pathway, calculated using Fisher's Exact Test over all the genes involved in the original query set
Genes Involved: Genes from the original queryset involved in the Pathway
Variation IDs: Variations in the original query affecting the genes involved in the Pathway. Available only in the downloadable text file.
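
To illustrate what a Fisher's Exact Test p-value means for a pathway, the sketch below computes a one-sided over-representation test from four counts. This is not the tool's implementation: the guide does not state which background gene set is used or whether the test is one-sided, so both are assumptions here, and the function name, parameter names and example counts are hypothetical.

```python
from math import comb


def pathway_enrichment_p(universe: int, pathway_size: int,
                         query_size: int, overlap: int) -> float:
    """One-sided Fisher's Exact Test for over-representation.

    universe     -- number of genes in the background set (assumption)
    pathway_size -- number of background genes annotated to the pathway
    query_size   -- number of genes in the query set
    overlap      -- number of query genes annotated to the pathway

    Returns the probability of seeing at least `overlap` pathway genes
    when drawing `query_size` genes at random from the universe.
    """
    if pathway_size < 0 or query_size < 0:
        raise ValueError("sizes must be non-negative")
    if pathway_size > universe or query_size > universe:
        raise ValueError("pathway and query must fit in the universe")
    max_overlap = min(pathway_size, query_size)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap is out of range")

    total = comb(universe, query_size)
    # Sum the hypergeometric probabilities of all outcomes at least as extreme.
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, query_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
print(pathway_enrichment_p(universe=20000, pathway_size=100,
                           query_size=50, overlap=5))
```

A small p-value indicates that the query genes fall in the pathway more often than expected by chance. When many pathways are tested, a multiple-testing correction is usually applied as well.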

Biological/Clinical Interpretation

The Cancer Genome Interpreter result table contains the following columns:

Variation ID: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Chromosome name
Gene: Gene name
Transcript: Transcript name in the Ensembl annotation system
Protein Change: Protein alteration in the format "Wild AA" "AA Position" "Mutant AA"
Consequence: Consequence type
Domain: Domain where the mutation is located (Pfam)
Oncogenic classification: Oncogenic potential of the mutation
Location: Location of the mutation in relation to the last exon of the gene
Tumour Driver: The gene has been identified as a driver of cancer
Role: Mechanism of Action of the driver gene
In cluster: Mutation falls in a cluster of somatic mutations of the gene (OncodriveCLUST)
Delicate Domain: The domain where the mutation is located is depleted for variants in the general population
Deleterious (CADD Score): Deleteriousness score using CADD
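
The Protein Change format described above ("Wild AA" "AA Position" "Mutant AA") can be split into its three parts with a short parser. The sketch below assumes single-letter amino-acid codes with "*" for a stop codon. Other notations that may appear in the column are not handled and return None. The function name and example strings are hypothetical.

```python
import re
from typing import Optional, Tuple

# Wild-type residue, position, mutant residue (single-letter codes assumed).
_PROTEIN_CHANGE = re.compile(r"^(?P<wild>[A-Z*])(?P<pos>\d+)(?P<mutant>[A-Z*])$")


def parse_protein_change(value: str) -> Optional[Tuple[str, int, str]]:
    """Return (wild_aa, position, mutant_aa), or None if the format differs."""
    match = _PROTEIN_CHANGE.match(value.strip())
    if match is None:
        return None
    return match.group("wild"), int(match.group("pos")), match.group("mutant")


# Illustrative inputs only.
print(parse_protein_change("V600E"))
print(parse_protein_change("R213*"))
print(parse_protein_change("unparsed"))
```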

The Cancer Genome Interpreter - Biomarkers result table contains the following columns:

Gene: Gene name
Observed alteration: Alterations observed in genes described to affect the response to a drug
Biomarker: Genetic alteration described as a biomarker of response to a drug
Drug: Drugs influenced by the biomarker
Effect: How the presence of the biomarker affects the response to the drug
Tumour type: The tumour type in which the biomarker has been described
Evidence level: The level of evidence that supports the described biomarker
Biomarker match: Match between the observed alteration and the biomarker
Source: Source supporting the biomarker

The Cancer Genome Interpreter - Bioactivities result table contains the following columns:

Gene: Gene name
Observed alteration: Alterations observed in genes described to affect the response to a drug
Compound: Compounds interacting with the gene bearing driver alterations
Binding potential: Potency of binding between the compound and the altered gene label
MOA: Compound mechanism of action
Match: True if the mechanism of action of the drug is coherent with the role of the gene in cancer

Filtering Results

Once the results are complete, SNPnexus can apply the following filters to them. Which filters are available depends on the set of annotations originally selected by the user:

Filtering by Type of Variant: The user can select to show only variants that map to a known dbSNP, show only novel variants, or both.
Filtering by Global MAF Threshold: Only show variants with a global minor allele frequency (MAF) lower than the threshold set by the user.
Filtering by Gene(s): Only show variants that overlap the specified genes.
Filtering by Genomic Consequence: Only show variants with specific genomic consequence. Options available are Coding Non-Synonymous, Coding Synonymous, UTR and Intronic.
Filtering by Predicted Effect: Filter variants by their predicted protein consequence according to the SIFT and PolyPhen predictions. Options available are Benign and Damaging. This option is only available if SIFT or PolyPhen was an annotation selected by the user for the input query.
Filtering by Conserved Region: Filter variants that lie within or outside a conserved region. This option is only available if Phast was an annotation selected by the user for the input query.
Filtering by Phenotype Association: Filter variants that have a known or unknown phenotypic association based on COSMIC and ClinVar data. This option is only available if COSMIC or ClinVar was selected by the user for the input query.
Filtering by Pathway: Only show variants related to specific pathways. This option is only available if Reactome Pathway was selected by the user for the input query.
Filtering by Predicted Cancer Driver: Show variants based on their oncogenic classification: known or predicted cancer driver, or polymorphism found at a minor allele frequency higher than 1% across the population.
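
The same kind of filtering can be reproduced offline on a downloaded result file. The sketch below combines three of the filters above (type of variant, global MAF threshold, and genes). It is only an illustration: the dictionary keys are hypothetical rather than the actual column names, a variant is treated as known when its Variation ID is an rs#, and how the web interface treats variants with no frequency data is not stated in this guide, so that behaviour is exposed as a parameter.

```python
from typing import Dict, Iterable, List, Optional


def filter_variants(rows: Iterable[Dict],
                    maf_threshold: Optional[float] = None,
                    genes: Optional[Iterable[str]] = None,
                    variant_type: str = "both",
                    keep_missing_maf: bool = True) -> List[Dict]:
    """Filter annotated variants.

    variant_type     -- "known", "novel" or "both"
    maf_threshold    -- keep variants with global MAF strictly below this
    genes            -- keep variants overlapping any of these genes
    keep_missing_maf -- assumption: keep variants without frequency data
    """
    if variant_type not in ("known", "novel", "both"):
        raise ValueError("variant_type must be 'known', 'novel' or 'both'")
    wanted = {g.upper() for g in genes} if genes is not None else None

    kept = []
    for row in rows:
        is_known = row["variation_id"].lower().startswith("rs")
        if variant_type == "known" and not is_known:
            continue
        if variant_type == "novel" and is_known:
            continue
        if maf_threshold is not None:
            maf = row.get("global_maf")
            if maf is None:
                if not keep_missing_maf:
                    continue
            elif maf >= maf_threshold:
                continue
        if wanted is not None and row.get("gene", "").upper() not in wanted:
            continue
        kept.append(row)
    return kept


# Hypothetical rows, for illustration only.
rows = [
    {"variation_id": "rs1", "gene": "BRAF", "global_maf": 0.001},
    {"variation_id": "chr1:100:A/G:1", "gene": "TP53", "global_maf": None},
    {"variation_id": "rs2", "gene": "TP53", "global_maf": 0.2},
]
rare = filter_variants(rows, maf_threshold=0.01)
print([r["variation_id"] for r in rare])
```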