User Guide
Currently SNPnexus supports the two most recent human genome assemblies:
1. GRCh37/hg19 (default)
2. NCBI36/hg18.
The underlying SNPnexus database is kept synchronised with the UCSC human genome annotation database. However, data for some annotation categories comes from different sources.
| Category | | hg18 | hg19 |
| Known SNP information | | Ensembl Variation 54; dbSNP 129 | Ensembl Variation 63; dbSNP 132 |
| Gene Definition | RefSeq | UCSC hg18 | UCSC hg19 |
| | Ensembl | UCSC hg18; Ensembl 54 | UCSC hg19 |
| | Acembly | UCSC hg18 | UCSC hg19 |
| | Vega | UCSC hg18; Vega 34 | UCSC hg19 |
| | UCSC | UCSC hg18 | UCSC hg19 |
| | CCDS | UCSC hg18 | UCSC hg19 |
| | H-inv | UCSC hg19 | |
| Hapmap Population | | UCSC hg18 | UCSC hg19 |
| Protein Effect | SIFT | SIFT Human DB (release 63) | |
| Regulatory Elements | miRBASE | release 18 (liftover) | release 18 |
| | Other | UCSC hg18 | UCSC hg19 |
| Phenotype/Disease Association | GAD | UCSC hg18 | GAD update Oct 2011 |
| | COSMIC | version 56 | version 56 |
| | GWAS | UCSC hg18 | UCSC hg19 |
| Conservation & Structural Variations | | UCSC hg18 | UCSC hg19 |
SNPnexus currently accepts query input data in three different forms (genomic position, chromosomal region or dbSNP id) and for two different human genome assemblies. Users can annotate a single SNP, insertion/deletion (InDel) or block substitution by selecting one of the input formats and entering the required data in the graphical interface. Users can also run batch queries by uploading an appropriately formatted input file or pasting the queries into the interface. The formats are explained in more detail below.
Users can annotate a newly discovered variant by entering the following data in the interface: type (Chromosome/Contig/Clone), name, relative position, reference nucleotide(s) (Allele1), observed nucleotide(s) (Allele2), and strand, either positive (1) or negative (-1). A one-based coordinate system is used to describe genomic position. Here are a few examples on the hg18 assembly:
| Type | Id | Position | Allele1 | Allele2 | Strand |
| Chromosome | 1 | 100002626 | A | T | 1 |
| Contig | NT_023736 | 2025395 | C | T | 1 |
| Clone | AC105270 | 154799 | A | T | 1 |
The tool has been modified to support insertions and deletions by using - as the placeholder. Users need to set Allele1=- to indicate that Allele2 is inserted at the corresponding genomic position.
Similarly, Allele2=- can be used to denote deletion of Allele1 from the given genomic position. As with single nucleotide substitution, the tool also supports block substitution when the user provides Allele1 and Allele2 data of the same or different lengths.
Here are a few examples of insertions and deletions on the hg19 assembly:
| Type | Id | Position | Allele1 | Allele2 | Strand | #Comment |
| Chromosome | 3 | 9798773 | C | - | 1 | # 1-nucleotide deletion |
| Chromosome | 3 | 9798773 | CCC | - | 1 | # 3-nucleotide deletion |
| Chromosome | 3 | 9798773 | - | G | 1 | # 1-nucleotide insertion |
| Chromosome | 3 | 9798773 | - | GTC | 1 | # 3-nucleotide insertion |
| Chromosome | 3 | 9798773 | CCCG | GT | 1 | # block substitution |
Note that the tool supports multiple nucleotides in place of Allele1 and Allele2. However, for practical reasons, users are discouraged from providing very large blocks that may span more than one adjacent functional region (e.g., adjacent intronic and exonic regions), in which case the predicted function of the variant reported by the tool will be based on the first functional region.
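The placeholder rules above can be summarised in a short Python sketch. The function name `classify_variant` and the category labels are illustrative assumptions and not part of SNPnexus; rejecting a query where both alleles are `-` is likewise an assumption made only for this example.

```python
def classify_variant(allele1: str, allele2: str) -> str:
    """Classify a query from its Allele1 (reference) and Allele2 (observed) fields.

    Hypothetical helper for illustration; SNPnexus itself may apply
    additional checks that are not described in this guide.
    """
    if allele1 == "-" and allele2 == "-":
        # Assumption: a query with two placeholders carries no information.
        raise ValueError("Allele1 and Allele2 cannot both be '-'")
    if allele1 == "-":
        return "insertion"   # Allele2 is inserted at the position
    if allele2 == "-":
        return "deletion"    # Allele1 is deleted from the position
    if len(allele1) == 1 and len(allele2) == 1:
        return "single nucleotide substitution"
    return "block substitution"  # same or different lengths


# The hg19 examples from the table above.
examples = [("C", "-"), ("CCC", "-"), ("-", "G"), ("-", "GTC"), ("CCCG", "GT")]
for allele1, allele2 in examples:
    print(allele1, allele2, classify_variant(allele1, allele2))
```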
Finally, users can specify reference and observed nucleotides using the IUPAC nucleotide nomenclature to denote ambiguous nucleotides at a given position, following the translation table shown below:
| IUPAC Code | Meaning |
| G | G |
| A | A |
| T | T |
| C | C |
| R | G or A |
| Y | T or C |
| M | A or C |
| K | G or T |
| S | G or C |
| W | A or T |
| H | A or C or T |
| B | G or T or C |
| V | G or C or A |
| D | G or A or T |
| N | G or A or T or C |
Here are a few examples:
| Type | Id | Position | Allele1 | Allele2 | Strand | #Comment |
| Chromosome | 1 | 100002626 | A | S | 1 | # substitution of A with G or C |
| Chromosome | 3 | 9798773 | - | R | 1 | # G or A insertion |
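As a rough illustration of how the translation table can be applied, the Python sketch below expands an allele written with IUPAC codes into the concrete sequences it may stand for. The mapping is copied from the table above; the function name `expand_allele` is hypothetical, and expanding a multi-nucleotide allele into every combination is an assumption about how such input might be interpreted, not a description of the SNPnexus implementation.

```python
from itertools import product
from typing import List

# IUPAC code -> possible nucleotides, as listed in the translation table.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def expand_allele(allele: str) -> List[str]:
    """Return every concrete sequence an IUPAC-coded allele can denote.

    The '-' placeholder (insertion/deletion) is returned unchanged.
    Unknown codes raise KeyError.
    """
    if allele == "-":
        return ["-"]
    choices = [IUPAC[base] for base in allele.upper()]
    return ["".join(combination) for combination in product(*choices)]


print(expand_allele("S"))  # the observed allele of the first example
print(expand_allele("R"))  # the inserted allele of the second example
```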
Users can query for known SNPs in a given chromosomal region by providing the following data: chromosome, start position, end position. The tool will identify and annotate all the known SNPs located in the selected region. Here are a few examples on the hg18 assembly:
| Chromosome | Start | End |
| 3 | 9798000 | 9799000 |
| 1 | 100000000 | 100050000 |
Region queries for known SNPs are currently limited to a maximum region size of 1 Mb.
Users can also query for known SNPs by providing the corresponding dbSNP rs identifiers. Here are a few examples of dbSNP rs#:
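A region can be checked against this limit before submission. The Python sketch below is only an illustration: the function name `validate_region` is hypothetical, and it assumes that 1 Mb means 1,000,000 bases and that the start and end positions are both inclusive, neither of which is stated in this guide.

```python
MAX_REGION_SIZE = 1_000_000  # assumption: 1 Mb taken as 1,000,000 bases


def validate_region(chromosome: str, start: int, end: int) -> int:
    """Return the region size in bases, or raise ValueError if it is invalid.

    Assumes one-based, inclusive start and end coordinates.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    if end < start:
        raise ValueError("end must not be smaller than start")
    size = end - start + 1
    if size > MAX_REGION_SIZE:
        raise ValueError(
            f"region {chromosome}:{start}-{end} spans {size} bases, "
            f"above the {MAX_REGION_SIZE} base limit"
        )
    return size


# The two example regions from the table above.
for chromosome, start, end in [("3", 9798000, 9799000), ("1", 100000000, 100050000)]:
    print(chromosome, start, end, validate_region(chromosome, start, end))
```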
| dbSNP rs# |
| rs293794 |
| rs1052133 |
| rs3136820 |
| rs2272615 |
| rs2953993 |
| rs1799782 |
| rs25487 |
| rs2248690 |
| rs4918 |
| rs1071592 |
Note that, depending on the genome assembly, the functional annotation for a given SNP can be quite different. Users are therefore advised to choose the genome assembly carefully.
SNPnexus allows users to submit a batch query when dealing with large numbers of variants. Users can either paste the variant list directly into the designated text area or upload a file containing the queries. The maximum number of variants in a single batch query is currently limited to 100,000. Batch queries are only accepted in the genomic position and/or dbSNP rs# formats; chromosomal region queries are not allowed. Each variant must be on a new line with tab-delimited data in one of the following formats:
| < Type | Name | Position | Allele1 | Allele2 | Strand > | # Genomic position data for novel SNPs |
| < "dbsnp" | rs# > | # dbSNP rs number for known SNPs |
An example batch query is shown below; it can be pasted directly into the text area provided in the interface:
| Chromosome | 1 | 100002626 | A | T | 1 |
| Contig | NT_023736 | 2025395 | A | T | 1 |
| Clone | AC105270 | 154799 | A | T | 1 |
| dbsnp | rs293794 |
| dbsnp | rs1052133 |
Alternatively, users can upload batch query files like this example. Note that known SNPs must be preceded by the keyword "dbsnp" to be recognized as dbSNP rs#.
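A batch query in this format can also be generated programmatically. The following Python sketch builds the tab-delimited lines for the example batch above; the helper names (`position_line`, `dbsnp_line`, `build_batch`) are hypothetical, and the resulting text would still need to be pasted into the interface or saved to a file for upload.

```python
from typing import Iterable

MAX_BATCH_SIZE = 100_000  # batch limit stated in this guide


def position_line(kind: str, name: str, position: int,
                  allele1: str, allele2: str, strand: int) -> str:
    """Format one genomic position query as a tab-delimited line."""
    return "\t".join([kind, name, str(position), allele1, allele2, str(strand)])


def dbsnp_line(rs_id: str) -> str:
    """Format one known-SNP query; the 'dbsnp' keyword must come first."""
    return "\t".join(["dbsnp", rs_id])


def build_batch(lines: Iterable[str]) -> str:
    """Join query lines, one variant per line, enforcing the batch limit."""
    lines = list(lines)
    if len(lines) > MAX_BATCH_SIZE:
        raise ValueError(f"batch has {len(lines)} variants, limit is {MAX_BATCH_SIZE}")
    return "\n".join(lines) + "\n"


batch = build_batch([
    position_line("Chromosome", "1", 100002626, "A", "T", 1),
    position_line("Contig", "NT_023736", 2025395, "A", "T", 1),
    position_line("Clone", "AC105270", 154799, "A", "T", 1),
    dbsnp_line("rs293794"),
    dbsnp_line("rs1052133"),
])
print(batch, end="")
```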
Genomic Mapping and other information
The result table containing genomic annotations has the following columns:
SNP_name: <dbsnp rs#> or <chromosome/contig/clone id,"_",position>
Contig: SNP mapped contig location
contigStart: SNP start mapping position on contig
contigEnd: SNP end mapping position on contig
Chromosome: SNP mapped chromosome location
chromStart: SNP start mapping position on chromosome
chromEnd: SNP end mapping position on chromosome
Band: SNP cytogenetic location
dbSNP: link to dbSNP, if known
HapMap: link to Hapmap population data, if known
The result table containing gene/protein consequences for a particular gene annotation system may have the following columns:
SNP_name: <dbsnp rs#> or <chromosome/contig/clone id,"_",position>
Allele: <reference allele,"|",observed allele(s)>
Symbol: Gene symbol
Gene: Gene name in the corresponding annotation system
Transcript: Transcript name in the corresponding annotation system
Entrez gene: Entrez gene id
Predicted function: Transcript location. Possible categories: coding, intronic, intronic (splice_site), 5utr, 3utr, 5upstream, 3downstream, non-coding, non-coding intronic, non-coding intronic (splice_site)
cdna_pos: SNP position on cdna, if the predicted function is coding, 3'UTR or 5'UTR
cds_pos: SNP position on cds, if the predicted function is coding
aa_pos: Position of the first amino acid (possibly) affected in the resultant peptide chain, if the predicted function is coding
aa_change: Peptide <reference amino acid(s),">", observed amino acid(s)_1 [,"|", observed amino acid(s)_2, ... ] >
Note: Functional type if the predicted function is coding. Possible values: syn (synonymous), nonsyn (non-synonymous) [stop-gain or stop-loss], frameshift [stop-gain or stop-loss], pepshift (peptide shift, block substitution).
Preceded by "*", if the protein is incomplete (missing stop-codon)
splice_dist: Distance to splice junction, if the predicted function is intronic
proteins: reference and observed peptide sequences separated by "|", if the predicted function is coding. Available only in the downloadable text and excel files.
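The angle-bracket notation used for aa_change above describes a reference residue string, a ">" separator, and one or more observed residue strings separated by "|". A minimal Python sketch of splitting such a value is given below; the function name `parse_aa_change` and the sample value `R>H|C` are hypothetical and chosen only to illustrate the notation.

```python
from typing import List, Tuple


def parse_aa_change(value: str) -> Tuple[str, List[str]]:
    """Split an aa_change value into (reference, [observed_1, observed_2, ...]).

    Assumes the value follows the documented pattern exactly:
    reference amino acid(s), '>', then observed amino acid(s) joined by '|'.
    """
    reference, separator, observed = value.partition(">")
    if not separator or not reference or not observed:
        raise ValueError(f"not a valid aa_change value: {value!r}")
    return reference, observed.split("|")


# Hypothetical value: one reference residue with two possible observed residues.
reference, observed = parse_aa_change("R>H|C")
print(reference, observed)
```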
The result table containing the predicted effect on protein has the following columns:
SNP_name: SNP name
Allele: <reference allele,"|",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Protein: Protein name in the Ensembl gene annotation system
aa_pos: Position of the amino acid affected in the resultant peptide chain
wild_aa: Reference amino acid
mutant_aa: Observed amino acid
Score: SIFT prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: SIFT predicted effect on protein based on the score. Possible values: DAMAGING (score <= 0.05), TOLERATED (score > 0.05)
Confidence: Degree of reliability about the prediction. Possible values: HIGH, LOW
The result table containing the specific Hapmap population data has the following columns:
Name: SNP name
Genotype(1/2/3): Observed Genotype
Count: Number of observed samples with the genotype
Frequency: Percentage of observed samples with the genotype
Allele(1/2): Observed allele
Count: Number of observed samples with the allele
Frequency: Percentage of observed samples with the allele
The Transcription Factor Binding Sites (TFBS) result table has the following columns:
SNP_name: SNP name
TFBS_id: TFBS id
Chromosome: Chromosome name
chromStart: Start position of the TFBS site in the chromosome
chromEnd: End position of the TFBS site in the chromosome
TFBS_Accession: TFBS accession number
TFBS_Species: Transcription factor species
TFBS_name: Transcription factor name
SwissProt_Accession: SwissProt accession number
The First exon and promoter prediction result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the prediction in the chromosome
chromEnd: End position of the prediction in the chromosome
FirstEF_Name: Name of the item containing the type of prediction (exon, promoter, CpG window)
Probability: Prediction score. Possible values: 0 to 1000
Strand: + or -
The miRBASE result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the microRNA in the chromosome
chromEnd: End position of the microRNA in the chromosome
Name: microRNA name
Accession: miRBASE accession number
Strand: + or -
The Vista Enhancer prediction result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the Vista element in the chromosome
chromEnd: End position of the Vista element in the chromosome
Vista_Item: Name of the Vista element
Score: Prediction score. Possible values: 900 (Positive-enhancer), 200 (Negative-enhancer)
The CpG Island prediction result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the CpG island in the chromosome
chromEnd: End position of the CpG island in the chromosome
CpG_Island: Name of the CpG Island
Length: Island Length
Cpg%: Percentage of island that is CpG
C/G%: Percentage of island that is C or G
Ratio: Ratio of observed to expected CpG in island
The TargetScan miRNA regulatory sites result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the site in the chromosome
chromEnd: End position of the site in the chromosome
Item_Name: Name of the predicted target site
Score: Prediction scores by TargetScanS. Possible values: 0 to 1000
Strand: + or -
The miRNAs/snoRNAs/scaRNAs result table has the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position in the chromosome
chromEnd: End position in the chromosome
Name: Name of the miRNA/snoRNA/scaRNA
Score: Prediction scores. Possible values: 0 to 1000
Strand: + or -
Type: Type of RNA
The Vertebrate Alignment and Conservation result table contains the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the aligned element in the chromosome
chromEnd: End position of the aligned element in the chromosome
Id: Name of the aligned element
Score: Estimated probability scores for conservation. Possible values: 0 to 1000
Phenotype & Disease Association
The Genetic Association Database (GAD) result table contains the following columns:
SNP_name: SNP name
GAD Id: GAD id
Association: Confirmed association
Phenotype: Phenotype description
Disease_Class: Type of disease
Gene: Gene name
Reference: Reference of publication of the study
Pubmed: Pubmed id of publication of the study
SNP reported: Whether the known SNP is directly reported in the study. Possible values: Y(yes), N(no)
Associated SNPs: SNPs associated with the disease as reported in the study
Population: Sample population
Entrez gene: Entrez gene id
The COSMIC result table contains the following columns:
SNP_name: SNP name
Mutation Id: Cosmic mutation id
Sample: Cosmic sample id
Site: Primary affected site
Histology: Primary histology
Histology Subtype: Subtype of primary histology
Symbol: Gene symbol
Pubmed: Pubmed id of publication of the study
The GWAS catalogue result table contains the following columns:
SNP_name: SNP name
Catalogue Id: ID of SNP associated with trait
Region: Chromosome band/region of SNP
Genes: Reported Gene(s)
Allele_frequency: Risk Allele Frequency
Trait: Disease or trait assessed in study
Population: Initial sample population for the study
Platform: Platform and [SNPs passing Quality Control]
Pubmed: Pubmed id of publication of the study
Each of the structural variation result tables contains the following columns:
SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the structural variation in the chromosome
chromEnd: End position of the structural variation in the chromosome
Reference: Literature reference for the study that included this variant
Pubmed: Pubmed id of publication of the study
Method: Brief description of method/platform
Sample: Description of sample population for the study

