SNPnexus

Barts Cancer Institute

User Guide

  • Data Source
  • Input Format
    • Genome Assembly
    • Single Query
      • Genomic Position
      • Chromosomal Region
      • dbSNP rs#
    • Batch Query
  • Output Format
    • Genomic Mapping
    • Gene/Protein Consequences
    • Effect on Protein Function
    • HapMap Population
    • Regulatory Elements
    • Conservation
    • Phenotype & Disease Association
    • Structural Variations

Data Source

Currently SNPnexus supports the two most recent human genome assemblies:
    1. GRCh37/hg19 (default)
    2. NCBI36/hg18.

The underlying SNPnexus database is kept synchronised with the UCSC human genome annotation database. However, data for some annotation categories comes from different sources.

Category                              Source   hg18                             hg19
Known SNP information                          Ensembl Variation 54; dbSNP 129  Ensembl Variation 63; dbSNP 132
Gene Definition                       RefSeq   UCSC hg18                        UCSC hg19
                                      Ensembl  UCSC hg18; Ensembl 54            UCSC hg19
                                      Acembly  UCSC hg18                        UCSC hg19
                                      Vega     UCSC hg18; Vega 34               UCSC hg19
                                      UCSC     UCSC hg18                        UCSC hg19
                                      CCDS     UCSC hg18                        UCSC hg19
                                      H-inv                                     UCSC hg19
HapMap Population                              UCSC hg18                        UCSC hg19
Protein Effect                        SIFT                                      SIFT Human DB (release 63)
Regulatory Elements                   miRBASE  release 18 (liftover)            release 18
                                      Other    UCSC hg18                        UCSC hg19
Phenotype/Disease Association         GAD      UCSC hg18                        GAD update Oct 2011
                                      COSMIC   version 56                       version 56
                                      GWAS     UCSC hg18                        UCSC hg19
Conservation & Structural Variations           UCSC hg18                        UCSC hg19


Input Format

SNPnexus currently accepts query input in three forms (genomic position, chromosomal region or dbSNP id) and for two human genome assemblies. Users can annotate a single SNP, insertion/deletion (InDel) or block substitution by selecting one of the input formats and entering the required data in the graphical interface. Batch queries can also be run by uploading an appropriately formatted input file or by pasting the queries into the interface. The formats are explained in more detail below.

Human Genome Assembly

Users can select the genome assembly (hg18 or hg19) against which the queried variants are annotated. When querying novel SNPs by genomic position, users should take care to select the genome assembly that matches the intended genomic position of the SNP. Unexpected results may appear if the wrong genome assembly is selected.

Single Query

Genomic Position

Users can annotate a newly discovered variant by entering the following data in the interface: type (Chromosome/Contig/Clone), name, relative position, reference nucleotide(s) (Allele1), observed nucleotide(s) (Allele2), and strand, either positive (1) or negative (-1). Genomic positions use a one-based coordinate system. Here are a few examples on the hg18 assembly:

Type Id Position Allele1 Allele2 Strand
Chromosome 1 100002626 A T 1
Contig NT_023736 2025395 C T 1
Clone AC105270 154799 A T 1

The tool supports insertions and deletions by using - as a placeholder. Set Allele1=- to indicate that Allele2 is inserted at the given genomic position. Similarly, set Allele2=- to denote deletion of Allele1 from the given genomic position. As with single-nucleotide substitutions, the tool also supports block substitutions, where Allele1 and Allele2 may be of the same or different lengths. Here are a few examples of insertions, deletions and a block substitution on the hg19 assembly:

Type Id Position Allele1 Allele2 Strand #Comment
Chromosome 3 9798773 C - 1 # 1-nucleotide deletion
Chromosome 3 9798773 CCC - 1 # 3-nucleotide deletion
Chromosome 3 9798773 - G 1 # 1-nucleotide insertion
Chromosome 3 9798773 - GTC 1 # 3-nucleotide insertion
Chromosome 3 9798773 CCCG GT 1 # block substitution
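
The allele conventions above can be sketched in a few lines of Python. This is only an illustration of how the - placeholder distinguishes the variant types; the function name classify_variant and the rule that treats any other multi-nucleotide pair as a block substitution are assumptions made for this example, not SNPnexus code.

```python
def classify_variant(allele1: str, allele2: str) -> str:
    """Classify a query from its Allele1/Allele2 fields (hypothetical helper)."""
    if allele1 == "-" and allele2 == "-":
        raise ValueError("Allele1 and Allele2 cannot both be '-'")
    if allele1 == "-":
        return "insertion"       # Allele2 is inserted at the position
    if allele2 == "-":
        return "deletion"        # Allele1 is deleted from the position
    if len(allele1) == 1 and len(allele2) == 1:
        return "substitution"    # single-nucleotide substitution
    return "block substitution"  # alleles of the same or different length


# The hg19 examples from the table above.
examples = [("C", "-"), ("CCC", "-"), ("-", "G"), ("-", "GTC"), ("CCCG", "GT")]
for allele1, allele2 in examples:
    print(allele1, allele2, classify_variant(allele1, allele2))
```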

Note that the tool supports multiple nucleotides for Allele1 and Allele2. However, for practical reasons, users are discouraged from providing very large blocks that may span more than one adjacent functional region (e.g. adjacent intronic and exonic regions); in that case the predicted function reported by the tool is based on the first functional region.

Finally, users can specify reference and observed nucleotides using IUPAC nucleotide nomenclature to denote ambiguous nucleotides at a position, following the translation table shown below:

IUPAC Code Meaning
G G
A A
T T
C C
R G or A
Y T or C
M A or C
K G or T
S G or C
W A or T
H A or C or T
B G or T or C
V G or C or A
D G or A or T
N G or A or T or C

Here are a few examples:

Type Id Position Allele1 Allele2 Strand #Comment
Chromosome 1 100002626 A S 1 # substitution of A with G or C
Chromosome 3 9798773 - R 1 # G or A insertion
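
As an illustration of the translation table, the sketch below expands an allele written with IUPAC codes into the concrete nucleotide strings it can stand for. The mapping is copied from the table above; the function name expand_allele and the position-by-position expansion of multi-nucleotide alleles are assumptions for this example and may not reflect how SNPnexus handles such input internally.

```python
from itertools import product
from typing import List

# IUPAC code -> possible nucleotides, as listed in the table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def expand_allele(allele: str) -> List[str]:
    """Return every unambiguous sequence an IUPAC-coded allele can denote."""
    if allele == "-":
        return ["-"]  # insertion/deletion placeholder is passed through
    choices = [IUPAC[base] for base in allele.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand_allele("S"))   # ['G', 'C']
print(expand_allele("R"))   # ['G', 'A']
print(expand_allele("AY"))  # ['AT', 'AC']
```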


Chromosomal Region

Users can query for known SNPs in a given chromosomal region by providing the following data: chromosome, start position, end position. The tool will identify and annotate all the known SNPs defined in the selected region. Here are a few examples on the hg18 assembly:

Chromosome Start End
3 9798000 9799000
1 100000000 100050000

Currently, queries for known SNPs are limited to genomic regions of at most 1 Mb.
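
A region can be checked against this limit before it is submitted. The sketch below is a minimal example; the function name check_region is hypothetical, and it assumes that start and end are one-based and both inclusive and that 1 Mb means 1,000,000 bases, none of which is spelled out in this guide.

```python
MAX_REGION_BP = 1_000_000  # 1 Mb limit; exact byte-for-base definition is assumed


def check_region(chromosome: str, start: int, end: int) -> int:
    """Validate a chromosomal region query and return its size in bases."""
    if start < 1:
        raise ValueError("start must be >= 1 (one-based coordinates)")
    if end < start:
        raise ValueError("end must not be smaller than start")
    size = end - start + 1  # assumes both ends are inclusive
    if size > MAX_REGION_BP:
        raise ValueError(f"region on {chromosome} spans {size} bases; limit is {MAX_REGION_BP}")
    return size


print(check_region("3", 9798000, 9799000))      # 1001
print(check_region("1", 100000000, 100050000))  # 50001
```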


dbSNP rs#

Users can also query for known SNPs by providing the corresponding dbSNP rs identifiers. Here are a few examples of dbSNP rs#:

dbSNP rs#
rs293794
rs1052133
rs3136820
rs2272615
rs2953993
rs1799782
rs25487
rs2248690
rs4918
rs1071592

Note that the functional annotation for a given SNP can differ considerably between genome assemblies. Users should therefore choose the genome assembly with care.


Batch Query

SNPnexus allows users to submit a batch query when dealing with large numbers of variants. Users can either paste the variant list directly into the text area provided or upload a file containing the queries. Currently the maximum number of variants in a single batch query is 100,000. Batch queries accept only the genomic position and/or dbSNP rs# formats; chromosomal region queries are not allowed. Each variant must be on a new line with tab-delimited data in one of the following formats:

< Type Name Position Allele1 Allele2 Strand > # Genomic position data for novel SNPs
< "dbsnp" rs# > # dbSNP rs number for known SNPs

An example of a batch query is shown below; it can be pasted directly into the text area provided in the interface:

Chromosome 1 100002626 A T 1
Contig NT_023736 2025395 A T 1
Clone AC105270 154799 A T 1
dbsnp rs293794
dbsnp rs1052133

Alternatively, users can upload a batch query file in the same format. Note that known SNPs must be preceded by the keyword "dbsnp" to be recognized as dbSNP rs#.
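
Before uploading, a batch file can be checked against the rules above. The following sketch is one way to do that and is not part of SNPnexus: the function name parse_batch, the record layout, and the case-sensitive matching of the first field are all assumptions made for illustration.

```python
MAX_VARIANTS = 100_000  # batch limit stated in this guide
POSITION_TYPES = {"Chromosome", "Contig", "Clone"}


def parse_batch(text):
    """Parse a tab-delimited batch query into a list of dicts (hypothetical helper)."""
    records = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0] == "dbsnp":
            if len(fields) != 2 or not fields[1].startswith("rs"):
                raise ValueError(f"line {line_no}: expected 'dbsnp<TAB>rs#'")
            records.append({"kind": "dbsnp", "rs": fields[1]})
        elif fields[0] in POSITION_TYPES:
            if len(fields) != 6:
                raise ValueError(f"line {line_no}: expected 6 tab-delimited fields")
            type_, name, position, allele1, allele2, strand = fields
            if strand not in ("1", "-1"):
                raise ValueError(f"line {line_no}: strand must be 1 or -1")
            records.append({
                "kind": "position", "type": type_, "name": name,
                "position": int(position), "allele1": allele1,
                "allele2": allele2, "strand": int(strand),
            })
        else:
            raise ValueError(f"line {line_no}: unrecognised first field {fields[0]!r}")
    if len(records) > MAX_VARIANTS:
        raise ValueError(f"batch has {len(records)} variants; limit is {MAX_VARIANTS}")
    return records


batch = "\n".join([
    "\t".join(["Chromosome", "1", "100002626", "A", "T", "1"]),
    "\t".join(["Clone", "AC105270", "154799", "A", "T", "1"]),
    "\t".join(["dbsnp", "rs293794"]),
])
for record in parse_batch(batch):
    print(record)
```

A chromosomal region line such as "3<TAB>9798000<TAB>9799000" is rejected by this sketch, which matches the rule that region queries are not allowed in batch mode.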



Output Format

Genomic Mapping and other information

The result table containing genomic annotations has the following columns:

SNP_name: <dbsnp rs#> or <chromosome/contig/clone id,"_",position>
Contig: SNP mapped contig location
contigStart: SNP start mapping position on contig
contigEnd: SNP end mapping position on contig
Chromosome: SNP mapped chromosome location
chromStart: SNP start mapping position on chromosome
chromEnd: SNP end mapping position on chromosome
Band: SNP cytogenetic location
dbSNP: link to dbSNP, if known
HapMap: link to HapMap population data, if known

Gene/Protein Consequences

The result table containing gene/protein consequences for a particular gene annotation system may have the following columns:

SNP_name: <dbsnp rs#> or <chromosome/contig/clone id,"_",position>
Allele: <reference allele,"|",observed allele(s)>
Symbol: Gene symbol
Gene: Gene name in the corresponding annotation system
Transcript: Transcript name in the corresponding annotation system
Entrez gene: Entrez gene id
Predicted function: Transcript location. Possible categories: coding, intronic, intronic (splice_site), 5utr, 3utr, 5upstream, 3downstream, non-coding, non-coding intronic, non-coding intronic (splice_site)
cdna_pos: SNP position on cdna, if the predicted function is coding, 3'UTR or 5'UTR
cds_pos: SNP position on cds, if the predicted function is coding
aa_pos: Position of the first amino acid (possibly) affected in the resultant peptide chain, if the predicted function is coding
aa_change: Peptide <reference amino acid(s),">", observed amino acid(s)_1 [,"|", observed amino acid(s)_2, ... ] >
Note: Functional type if the predicted function is coding. Possible values: syn (synonymous), nonsyn (non-synonymous) [stop-gain or stop-loss], frameshift [stop-gain or stop-loss], pepshift (peptide shift, block substitution). Preceded by "*", if the protein is incomplete (missing stop-codon)
splice_dist: Distance to splice junction, if the predicted function is intronic
proteins: reference and observed peptide sequences separated by "|", if the predicted function is coding. Available only in the downloadable text and excel files.
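
The Allele and aa_change layouts described above can be split mechanically. The sketch below is a minimal illustration; the function names and the sample values "A|T" and "R>Q|W" are made up for this example and are not taken from actual SNPnexus output.

```python
def parse_allele(value):
    """Split '<reference>|<observed>[|<observed>...]' into (reference, [observed, ...])."""
    reference, *observed = value.split("|")
    if not observed:
        raise ValueError("expected at least one observed allele after '|'")
    return reference, observed


def parse_aa_change(value):
    """Split '<reference aa>><observed aa>[|<observed aa>...]' into (reference, [observed, ...])."""
    reference, separator, observed = value.partition(">")
    if not separator:
        raise ValueError("expected '>' between reference and observed amino acids")
    return reference, observed.split("|")


print(parse_allele("A|T"))       # ('A', ['T'])
print(parse_aa_change("R>Q|W"))  # ('R', ['Q', 'W'])
```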

Effect on Protein Function

The result table containing the predicted effect on protein has the following columns:

SNP_name: SNP name
Allele: <reference allele,"|",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Protein: Protein name in the Ensembl gene annotation system
aa_pos: Position of the amino acid affected in the resultant peptide chain
wild_aa: Reference amino acid
mutant_aa: Observed amino acid
Score: SIFT prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: SIFT predicted effect on protein based on the score. Possible values: DAMAGING (score <= 0.05), TOLERATED (score > 0.05)
Confidence: Degree of reliability about the prediction. Possible values: HIGH, LOW

HapMap Population

The result table containing the specific HapMap population data has the following columns:

Name: SNP name
Genotype(1/2/3): Observed Genotype
Count: Number of observed samples with the genotype
Frequency: Percentage of observed samples with the genotype
Allele(1/2): Observed allele
Count: Number of observed samples with the allele
Frequency: Percentage of observed samples with the allele

Regulatory Elements

The Transcription Factor Binding Sites (TFBS) result table has the following columns:

SNP_name: SNP name
TFBS_id: TFBS id
Chromosome: Chromosome name
chromStart: Start position of the TFBS site in the chromosome
chromEnd: End position of the TFBS site in the chromosome
TFBS_Accession: TFBS accession number
TFBS_Species: Transcription factor species
TFBS_name: Transcription factor name
SwissProt_Accession: SwissProt accession number

The First exon and promoter prediction result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the prediction in the chromosome
chromEnd: End position of the prediction in the chromosome
FirstEF_Name: Name of the item containing the type of prediction (exon, promoter, CpG window)
Probability: Prediction score. Possible values: 0 to 1000
Strand: + or -

The miRBASE result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the microRNA in the chromosome
chromEnd: End position of the microRNA in the chromosome
Name: microRNA name
Accession: miRBASE accession number
Strand: + or -

The Vista Enhancer prediction result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the Vista element in the chromosome
chromEnd: End position of the Vista element in the chromosome
Vista_Item: Name of the Vista element
Score: Prediction score. Possible values: 900 (Positive-enhancer), 200 (Negative-enhancer)

The CpG Island prediction result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the CpG island in the chromosome
chromEnd: End position of the CpG island in the chromosome
CpG_Island: Name of the CpG Island
Length: Island Length
Cpg%: Percentage of island that is CpG
C/G%: Percentage of island that is C or G
Ratio: Ratio of observed to expected CpG in island

The TargetScan miRNA regulatory sites result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the site in the chromosome
chromEnd: End position of the site in the chromosome
Item_Name: Name of the predicted target site
Score: Prediction scores by TargetScanS. Possible values: 0 to 1000
Strand: + or -

The miRNAs/snoRNAs/scaRNAs result table has the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position in the chromosome
chromEnd: End position in the chromosome
Name: Name of the miRNA/snoRNA/scaRNA
Score: Prediction scores. Possible values: 0 to 1000
Strand: + or -
Type: Type of RNA

Conservation

The Vertebrate Alignment and Conservation result table contains the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the aligned element in the chromosome
chromEnd: End position of the aligned element in the chromosome
Id: Name of the aligned element
Score: Estimated probability scores for conservation. Possible values: 0 to 1000

Phenotype & Disease Association

The Genetic Association Database (GAD) result table contains the following columns:

SNP_name: SNP name
GAD Id: GAD id
Association: Confirmed association
Phenotype: Phenotype description
Disease_Class: Type of disease
Gene: Gene name
Reference: Reference of publication of the study
Pubmed: Pubmed id of publication of the study
SNP reported: Whether the known SNP is directly reported in the study. Possible values: Y(yes), N(no)
Associated SNPs: SNPs associated with the disease as reported in the study
Population: Sample population
Entrez gene: Entrez gene id

The COSMIC result table contains the following columns:

SNP_name: SNP name
Mutation Id: Cosmic mutation id
Sample: Cosmic sample id
Site: Primary affected site
Histology: Primary histology
Histology Subtype: Subtype of primary histology
Symbol: Gene symbol
Pubmed: Pubmed id of publication of the study

The GWAS catalogue result table contains the following columns:

SNP_name: SNP name
Catalogue Id: ID of SNP associated with trait
Region: Chromosome band/region of SNP
Genes: Reported Gene(s)
Allele_frequency: Risk Allele Frequency
Trait: Disease or trait assessed in study
Population: Initial sample population for the study
Platform: Platform and [SNPs passing Quality Control]
Pubmed: Pubmed id of publication of the study

Structural Variations

Each structural variation result table contains the following columns:

SNP_name: SNP name
Chromosome: Chromosome name
chromStart: Start position of the structural variation in the chromosome
chromEnd: End position of the structural variation in the chromosome
Reference: Literature reference for the study that included this variant
Pubmed: Pubmed id of publication of the study
Method: Brief description of method/platform
Sample: Description of sample population for the study

Copyright © 2008
Barts Cancer Institute