Enter a list of protein names, UniProt accessions, UniProt entry names, or protein names with amino acid positions in the "Inputs" field. The inputs
can be comma-separated or entered one per line. For amino acid positions with a mutation, use the case-sensitive format <Protein_Name>:<Original_Residue><Position><Mutated_Residue>
(for example, MTOR:A8S).
Select an organism from the select menu. The protein identifiers must match the selected organism.
Click the "Submit" button
Once submitted, an Excel-like table and an image of the first protein in the table will be generated on a new
Click on an isoform identifier to view an isoform and its features.
The disrupted features will be highlighted in red and marked as "disrupted" in the "Status" column.
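The input format described in the first step can be parsed along the following lines. This is a minimal Python sketch, not the tool's actual code: the names Query, MUTATION_RE, and parse_inputs are hypothetical, and it assumes residues are written as single uppercase letters.

```python
import re
from typing import List, NamedTuple, Optional


class Query(NamedTuple):
    protein: str              # protein name, accession, or entry name
    original: Optional[str]   # original residue, e.g. "A" (None if no mutation)
    position: Optional[int]   # 1-based amino acid position
    mutated: Optional[str]    # mutated residue, e.g. "S"


# <Protein_Name>:<Original_Residue><Position><Mutated_Residue>, e.g. MTOR:A8S
# Assumption: single-letter, uppercase residue codes (the format is case-sensitive).
MUTATION_RE = re.compile(r"^([^:\s]+):([A-Z])(\d+)([A-Z])$")


def parse_inputs(text: str) -> List[Query]:
    """Split comma-separated or multi-line input into Query records."""
    queries = []
    for token in re.split(r"[,\n]+", text):
        token = token.strip()
        if not token:
            continue
        match = MUTATION_RE.match(token)
        if match:
            name, original, position, mutated = match.groups()
            queries.append(Query(name, original, int(position), mutated))
        elif ":" in token:
            # A colon signals a mutation, but the rest did not match the format.
            raise ValueError(f"Malformed mutation: {token!r}")
        else:
            queries.append(Query(token, None, None, None))
    return queries


queries = parse_inputs("MTOR:A8S, TP53\nP04637")
```

Because no case folding is applied, an entry such as mtor:a8s is rejected rather than silently corrected, which mirrors the case-sensitive format.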
Enter a protein name in the "Protein Name" field. Select an organism from the select menu. If the "Show Isoforms" radio button is set to "Yes", a list of isoform IDs will be displayed in a drop-down menu based on the selected protein and organism. By default, "canonical" is selected. Users can then enter the list of protein features they want to display in the viewer, in <coordinate>:<feature_name> format.
Click on the "Submit" button
Once submitted, a protein feature viewer will be generated at the bottom of the page.
Users can view their custom features under the "Custom Features" category.
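As an illustration of the <coordinate>:<feature_name> format, the sketch below parses a list of custom features in Python. Several details are assumptions that this page does not specify: that a coordinate is either a single position (42) or a range (10-25), that entries are separated by commas or line breaks, and the function name parse_custom_features itself.

```python
from typing import List, Tuple


def parse_custom_features(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, name) tuples with 1-based, inclusive coordinates."""
    features = []
    # Assumption: entries are separated by commas or line breaks.
    for token in text.replace("\n", ",").split(","):
        token = token.strip()
        if not token:
            continue
        coordinate, sep, name = token.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"Expected <coordinate>:<feature_name>, got {token!r}")
        # Assumption: a coordinate is "N" (single residue) or "N-M" (range).
        start_text, dash, end_text = coordinate.partition("-")
        start = int(start_text)
        end = int(end_text) if dash else start
        if start < 1 or end < start:
            raise ValueError(f"Invalid coordinate in {token!r}")
        features.append((start, end, name))
    return features


custom = parse_custom_features("10-25:my_domain, 42:my_site")
```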
Supported sequence annotations
This tool extracts sequence features from Human UniProt based on gene(s) or gene
position(s). For the "Search and Visualize Features" option, the
input takes gene symbol(s) or gene position(s) as a list or in CSV format. Users can
select the feature type(s) they want to extract. If no
feature type is selected, the tool returns all the
sequence features for that gene or affected by that position. When entering gene position(s),
the input takes the form <Gene_symbol>:<Original_Residue><Position><Mutated_Residue>
(for example, MTOR:A8S). This option returns all or selected
features that are affected by the mutation(s).
chain - Extent of a polypeptide chain in the mature protein
peptide - Extent of an active peptide in the mature protein
signal - Sequence targeting proteins to the secretory pathway or periplasmic space
transit - Extent of a transit peptide for organelle targeting
init_met - Cleavage of the initiator methionine
propep - Part of a protein that is cleaved during maturation or activation
region - Region of interest in the sequence
domain - Position and type of each modular protein domain
repeat - Positions of repeated sequence motifs or repeated domains
zn_fing - Position(s) and type(s) of zinc fingers within the protein
motif - Short (up to 20 amino acids) sequence motif of biological interest
compbias - Region of compositional bias in the protein
topo_dom - Location of non-membrane regions of membrane-spanning proteins
np_bind - Nucleotide phosphate binding region
transmem - Extent of a membrane-spanning region
dna_bind - Position and type of a DNA-binding domain
ca_bind - Position(s) of calcium binding region(s) within the protein
coiled - Positions of regions of coiled coil within the protein
intramem - Extent of a region located in a membrane without crossing it
Amino acid modifications
mod_res - Modified residues excluding lipids, glycans and protein cross-links
carbohyd - Covalently attached glycan group(s)
non_std - Occurrence of non-standard amino acids (selenocysteine and pyrrolysine) in the protein sequence.
disulfide - Cysteine residues participating in disulfide bonds.
crosslnk - Residues participating in covalent linkage(s) between proteins.
variant - Description of a natural variant of the protein
Natural variants - substitution
substitution - When original sequence length is > 1 and variant sequence length is > 1
Natural variants - insertion
insertion - When original sequence length is == 1 or 0 and variant sequence length is > 1
Natural variants - deletion
deletion - When original sequence length is > 1 or 0 and variant sequence length is == 0
conflict - Description of sequence discrepancies of unknown origin
mutag - Site which has been experimentally altered by mutagenesis
unsure - Regions of uncertainty in the sequence
non-cons - Indicates that two residues in a sequence are not consecutive
non-ter - The sequence is incomplete. Indicate that a residue is not the terminal residue of the complete protein
helix - Helical regions within the experimentally determined protein structure
turn - Turns within the experimentally determined protein structure
strand - Beta strand regions within the experimentally determined protein structure
site - Any interesting single amino acid site on the sequence that is not defined by another feature key
act_site - Amino acid(s) directly involved in the activity of an enzyme
binding - Binding site for any chemical group (co-enzyme, prosthetic group, etc.)
metal - Binding site for a metal ion
user_variant - User inputted variants
User Inputted Feature
user_feature - User inputted feature
UniProt version and statistics
Data source and process
We download UniProt XML files from ftp://ftp.uniprot.org/pub/databases/uniprot/
and load the data into internal MongoDB collections. The
entire process is fully automated. We have scheduled cron jobs that
update the database whenever an update is available on the UniProt site.
Isoform feature coordinate calculation
First, features that overlap with deletion events are removed from the graphical viewer. The coordinates of the remaining features are then recalculated according to the splice events.
For example, in the canonical form of CASC4, there are a "topological domain - Cytoplasmic" located at 1-14 and
a "transmembrane region - Helical; Signal-anchor for type II membrane protein" located at 15-35.
Both of these features are disrupted in isoform Q6P4E1-5 because of a 22-amino-acid deletion.
An intact coiled-coil region is shifted from 35-198 to 13-176 (35 - 22 = 13, 198 - 22 = 176), and a substitution (20 aa in the original sequence and 23 aa in the substitute sequence) is shifted from 414-433 to 392-414 (414 - 22 = 392, 433 - 22 + (23 - 20) = 414).
When visualizing isoform features, deletions and insertions are accounted for in the isoform length, and substitutions are shown as blue rectangles.
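The arithmetic in the CASC4 example can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, pure insertions are not modeled, any feature overlapping a splice event is treated as disrupted, and the 22-residue deletion is placed at residues 1-22 because this page does not state its exact location.

```python
from typing import List, Optional, Tuple

# A splice event replaces canonical residues start..end (1-based, inclusive)
# with new_length residues; new_length == 0 means a deletion.
Event = Tuple[int, int, int]


def to_isoform(start: int, end: int, events: List[Event]) -> Optional[Tuple[int, int]]:
    """Map a canonical feature to isoform coordinates, or None if disrupted."""
    shift = 0
    for ev_start, ev_end, new_length in events:
        if ev_end < start:
            # Event lies entirely before the feature: shift by the length change.
            shift += new_length - (ev_end - ev_start + 1)
        elif ev_start <= end:
            # Event overlaps the feature (assumption: treated as disrupted).
            return None
    return start + shift, end + shift


def substitution_span(event: Event, events: List[Event]) -> Tuple[int, int]:
    """Isoform coordinates of the substitute sequence introduced by `event`."""
    ev_start, _ev_end, new_length = event
    shift = sum(n - (e - s + 1) for s, e, n in events if e < ev_start)
    new_start = ev_start + shift
    return new_start, new_start + new_length - 1


# Assumed events: deletion of residues 1-22 (location is hypothetical) and
# a substitution replacing 414-433 (20 aa) with 23 aa.
events = [(1, 22, 0), (414, 433, 23)]

coiled_coil = to_isoform(35, 198, events)          # (13, 176)
topo_domain = to_isoform(1, 14, events)            # None, disrupted
substitution = substitution_span(events[1], events)  # (392, 414)
```

Each event before a feature contributes its length change (new length minus replaced length) to the shift, which is why the coiled-coil region moves by -22 and the end of the substitution moves by -22 + (23 - 20).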
From the HomoloGene database, we obtained 13 organisms that share common HomoloGene identifiers. We then use bioDBnet (biological DataBase network) to convert identifiers from one organism into the homolog identifiers of another organism.
Users can remove or sort the organisms based on their requirements.