SNPnexus was designed to simplify and assist in the selection of functionally relevant Single Nucleotide Polymorphisms (SNPs) for large-scale genotyping studies of multifactorial disorders. The tool was upgraded in 2011 to support multiple nucleotide substitutions and insertions/deletions (indels), covering a wider range of variation data.
SNPnexus accepts single queries by dbSNP identifier or chromosomal region for annotating known variants. Users can also submit novel in-house SNPs/indels using genomic coordinates on clones, contigs and chromosomes. For practical purposes, batch queries of SNP data by dbSNP identifier or genomic coordinates are also supported. SNPnexus is updated regularly to stay synchronized with the UCSC human genome annotation database, and it provides the scientific community with a friendly web interface to compute the following data:
1. Genomic Mapping and additional annotations
SNPnexus provides genomic coordinates for the queried SNPs/indels in terms of their physical (on chromosome and contig) and cytogenetic positions. When novel in-house SNPs are submitted, the tool checks whether they overlap with existing publicly available SNPs and subsequently provides links, if any, to dbSNP (Sherry et al., 2001) and HapMap populations (The International HapMap Consortium, 2007).
A wide range of possible functional consequences is computed against the major gene annotation systems: NCBI RefSeq (Pruitt et al., 2007), UCSC Known Genes (Hsu et al., 2006), Ensembl (Hubbard et al., 2007), Vega (Wilming et al., 2008), AceView (Thierry-Mieg and Thierry-Mieg, 2006), CCDS (Pruitt et al., 2009) and H-Invitational (Yamasaki et al., 2010). The predicted functional effect falls into one of the following categories:
| Transcript Type | Predicted Function | Description |
|---|---|---|
| Coding | coding | In coding region |
| | intronic | In intron |
| | intronic (splice_site) | Within 2 bp of an intron/exon junction |
| | 5'UTR | In 5' untranslated region |
| | 3'UTR | In 3' untranslated region |
| | 5-upstream | Within 2 kb upstream of the 5' end of a transcript |
| | 3-downstream | Within 2 kb downstream of the 3' end of a transcript |
| Non-Coding | non-coding | In exon |
| | non-coding intronic | In intron |
| | non-coding intronic (splice_site) | Within 2 bp of an intron/exon junction |
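The categories in the table can be illustrated with a minimal Python sketch. It is not SNPnexus code: the `Transcript` class, the `classify` function, the 1-based inclusive coordinates and the restriction to plus-strand transcripts are all assumptions made for illustration. Only the 2 kb and 2 bp windows and the category labels come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FLANK = 2000       # "within 2 kb" of the transcript ends (from the table)
SPLICE_WINDOW = 2  # "within 2 bp" of an intron/exon junction (from the table)


@dataclass
class Transcript:
    """Hypothetical plus-strand transcript, 1-based inclusive coordinates."""
    exons: List[Tuple[int, int]]      # sorted, non-overlapping (start, end)
    cds_start: Optional[int] = None   # None for non-coding transcripts
    cds_end: Optional[int] = None


def classify(pos: int, tx: Transcript) -> Optional[str]:
    """Return the table category for a position, or None if out of range."""
    tx_start, tx_end = tx.exons[0][0], tx.exons[-1][1]
    coding = tx.cds_start is not None

    # The table lists the flanking categories under coding transcripts only,
    # so this sketch (an assumption) applies them to coding transcripts only.
    if pos < tx_start:
        return "5-upstream" if coding and tx_start - pos <= FLANK else None
    if pos > tx_end:
        return "3-downstream" if coding and pos - tx_end <= FLANK else None

    for start, end in tx.exons:
        if start <= pos <= end:
            if not coding:
                return "non-coding"
            if pos < tx.cds_start:
                return "5'UTR"
            if pos > tx.cds_end:
                return "3'UTR"
            return "coding"

    # Inside the transcript but in no exon: the position is intronic.
    # The first intronic base next to an exon has distance 1.
    dist = min(min(abs(pos - s), abs(pos - e)) for s, e in tx.exons)
    label = "intronic" if coding else "non-coding intronic"
    return f"{label} (splice_site)" if dist <= SPLICE_WINDOW else label
```

For example, with `tx = Transcript(exons=[(1000, 1100), (1200, 1300)], cds_start=1050, cds_end=1250)`, position 1101 is the first base of the intron and is classified as `intronic (splice_site)`, while position 1150 is plain `intronic`. A minus-strand transcript would need the 5' and 3' labels swapped.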
For intronic SNPs, the distance to the splice site is reported. For coding variants, the position of the first affected nucleotide within the cDNA and the CDS is reported, together with the resulting first amino acid position in the peptide chain.
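The relationship between these coordinates is simple arithmetic, sketched below in Python. The function names and the 1-based numbering are assumptions made for illustration, and the 5' UTR length is taken as a given input rather than derived from any annotation.

```python
from typing import Tuple


def cdna_to_cds(cdna_pos: int, utr5_len: int) -> int:
    """Convert a 1-based cDNA position to a 1-based CDS position.

    Assumes the CDS starts immediately after a 5' UTR of length utr5_len.
    """
    cds_pos = cdna_pos - utr5_len
    if cds_pos < 1:
        raise ValueError("position lies in the 5' UTR, not in the CDS")
    return cds_pos


def cds_to_peptide(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based CDS position to its amino acid.

    Returns (1-based amino acid position, 0-based offset within the codon).
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3
```

With a 150-nucleotide 5' UTR, cDNA position 160 maps to CDS position 10, which is the first base of the fourth codon: `cds_to_peptide(cdna_to_cds(160, 150))` gives `(4, 0)`.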
Since coding variants are of special interest, we provide further information about the mutation type, such as whether a single substitution is synonymous or non-synonymous. We also report whether a non-synonymous substitution results in an immediate stop-codon gain or loss. When an insertion, deletion or block substitution occurs within a coding region, we report it as a frameshift if the total number of nucleotides to be replaced is not a multiple of 3, in which case we also report any early-stop or stop-loss scenario. If the total number of nucleotides to be replaced is a multiple of 3, we report it as a peptide shift. In all these cases, we show the change of amino acids in the reported region. The reference and altered protein sequences can be found in the resulting Excel file. Transcripts with an incomplete ORF (a missing or premature stop codon) and incomplete proteins are marked with a "*" symbol in the "note" column, which represents the effect of the mutation. Unrecognisable alleles containing characters other than IUPAC base characters and "-" are marked as "Unknown" in the "note" column. In these cases the predicted function is based only on the position of the SNP within the gene.
Users can also download all the results in Excel format, which includes an additional column containing the protein sequences before and after each substitution, separated by '|'.
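The rules in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SNPnexus implementation: the function names and returned labels are hypothetical, the standard genetic code is assumed, and "number of nucleotides to be replaced" is interpreted here as the net length difference between the reference and altered alleles, with "-" denoting an empty allele.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_effect(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon substitution (labels are hypothetical)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (stop gain)"
    if ref_aa == "*":
        return "non-synonymous (stop loss)"
    return "non-synonymous"


def length_change_effect(ref_allele: str, alt_allele: str) -> str:
    """Classify an indel or block substitution within a coding region.

    Assumption: the frame is preserved when the net length change between
    the two alleles is a multiple of 3.
    """
    ref_len = len(ref_allele.replace("-", ""))
    alt_len = len(alt_allele.replace("-", ""))
    return "peptide shift" if (alt_len - ref_len) % 3 == 0 else "frameshift"
```

For instance, `substitution_effect("TGG", "TGA")` changes tryptophan to a stop codon and is labelled a stop gain, while `length_change_effect("-", "A")`, a one-base insertion, is a frameshift. Detecting an early stop or stop loss after a frameshift would additionally require translating the altered transcript, which this sketch does not attempt.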
3. Effect on Protein Function
For non-synonymous single amino acid substitutions, we provide the predicted effect on protein function (Tolerated or Damaging) based on the SIFT prediction (Kumar et al., 2009).
Predictions are shown only for complete Ensembl proteins. No predictions are shown for non-synonymous substitutions that result in a stop gain or stop loss, as these fundamentally change the protein sequence.
4. HapMap Population Data
For known SNPs, the tool provides related genotypes and allele frequency estimates retrieved from the population data of the International HapMap Project for the following four populations on the hg18 assembly:
The HapMap data provided by SNPnexus is based on the combined Phase II and III data from the International HapMap Project release 27.
On the hg19 assembly, seven more populations are supported:
5. Regulatory Elements
Queried SNPs can be checked for overlap with the following regulatory elements:
6. Conservation
SNPnexus shows the estimated probability that a variant belongs to a conserved region, based on multiple alignments of 44 or 46 vertebrate species and computed with the phastCons method from the PHAST package.
7. Phenotype & Disease Association
SNPnexus retrieves the connection between queried SNPs/indels and the following phenotype & disease association databases:
SNPnexus checks for any overlap with putative copy number polymorphisms (CNPs), insertions/deletions (InDels), inversions and inversion breakpoints determined by various methods, as annotated by the Database of Genomic Variants (DGV) via UCSC.
Sherry,S.T. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851–861.
Pruitt,K.D. et al. (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 35, D61–D65.
Hsu,F. et al. (2006) The UCSC Known Genes. Bioinformatics, 22, 1036–1046.
Hubbard,T.J. et al. (2007) Ensembl 2007. Nucleic Acids Res., 35, D610–D617.
Wilming,L.G. et al. (2008) The Vertebrate Genome Annotation (Vega) database. Nucleic Acids Res., 36, D753–D760.
Thierry-Mieg,D. and Thierry-Mieg,J. (2006) AceView: a comprehensive cDNA supported gene and transcripts annotation. Genome Biol., 7 (Suppl. 1), S12.
Pruitt,K.D. et al. (2009) The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res., 19, 1316–1323.
Yamasaki,C. et al. (2010) H-InvDB in 2009: extended database and data mining resources for human genes and transcripts. Nucleic Acids Res., 38, D626–D632.
Kumar,P. et al. (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc., 4, 1073–1081.
Davuluri,R.V. et al. (2001) Computational identification of promoters and first exons in the human genome. Nat. Genet., 29, 412–417.
Kozomara,A. and Griffiths-Jones,S. (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res., 39, D152–D157.
Pennacchio,L.A. et al. (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444, 499–502.
Lewis,B.P. et al. (2003) Prediction of mammalian microRNA targets. Cell, 115, 787–798.
Bird,A.P. (1986) CpG-rich islands and the function of DNA methylation. Nature, 321, 209–213.
Lestrade,L. and Weber,M.J. (2006) snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res., 34, D158–D162.
Becker,K.G. et al. (2004) The genetic association database. Nat. Genet., 36, 431–432.
Forbes,S.A. et al. (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res., 39 (Suppl. 1), D945–D950.
Hindorff,L.A. et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci., 106, 9362–9367.

