PSORT II is a free web tool for predicting the subcellular localization of proteins in yeast and animal cells. The program takes an amino acid sequence and the origin of that sequence as inputs. It then analyzes the entered sequence by comparing it to various sequence features of known protein sorting signals. Finally, it integrates the results and reports the cell localization sites with the highest probabilities. In addition to PSORT II, which is a revised version of the older PSORT, several other related programs are available for sequences originating in plants, fungi, or bacteria.
The original PSORT program was coded by Kenta Nakai, now working at the University of Tokyo. In collaboration with Paul Horton, he developed PSORT II by replacing the reasoning algorithm of PSORT with a simpler one. PSORT II was officially released in December 1998.
The PSORT Server
The PSORT server hosts several related programs:
- PSORT II, applicable to animal and yeast sequences
- WoLF PSORT, an update of PSORT II; applicable to fungi, animal, and plant sequences
- PSORT, the predecessor of PSORT II; applicable to bacterial and plant sequences
- iPSORT, useful for the detection of N-terminal sorting signals
- PSORT-B, applicable to Gram-negative bacteria sequences
The origin of the sequence determines which program to use for analysis.
Using PSORT II
PSORT II predicts a protein's localization based on its amino acid sequence (AAS). To submit a query, the user provides the origin of the protein (e.g. an animal cell) and its AAS. The results are then listed as follows:
- Input Sequence
- Repetition of the entered sequence. All characters other than the standard one-letter codes for the 20 amino acids are removed by the program, and lowercase letters are converted to uppercase. A warning is issued when the sequence does not begin with a methionine residue.
- Results of the Subprograms
- Various features of the entered sequence are compared to those of known protein sorting signals stored in the database. The stored sequences were obtained from SWISS-PROT. The feature outcomes are listed individually.
- Results of the Prediction
- Results of the subprograms are integrated into probabilities of localization. The cell localization sites with the highest percentages are shown.
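The input handling described in the list above can be illustrated with a short Python sketch. This is not PSORT II's actual code: the function name `normalize_sequence` and the wording of the warning are assumptions made for illustration. Only the three documented behaviours are modelled (dropping non-standard characters, uppercasing, and warning on a missing initial methionine).

```python
# Hypothetical sketch of the documented input clean-up; not PSORT II source code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 amino acids


def normalize_sequence(raw):
    """Return (cleaned_sequence, warnings) for a raw user-supplied sequence."""
    # Uppercase first, then keep only the 20 standard one-letter codes.
    cleaned = "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)
    warnings = []
    # Flag sequences that do not start with methionine (M).
    if not cleaned.startswith("M"):
        warnings.append("sequence does not begin with a methionine residue")
    return cleaned, warnings


cleaned, warnings = normalize_sequence("mk t1x-LL")
print(cleaned, warnings)
```

In this sketch whitespace, digits, punctuation, and ambiguity codes such as X are all discarded, because none of them belong to the 20 standard codes.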
PSORT II calculates the final prediction of protein localization by integrating the results from various subprograms. Each subprogram compares one feature of the provided amino acid sequence with stored sequences obtained from SWISS-PROT and then computes a score. Analyzed features include, among others:
- signal sequence
- Features characteristic of sorting signals, such as the net charge of the N-terminal region and the length of the hydrophobic region, are used to predict the presence of a signal sequence.
- transmembrane segments
- Sequences are analyzed for the presence of possible α-helices, indicating whether a protein could be soluble or located in a membrane.
- membrane topology
- Since each localization site appears to have a preferred membrane topology, analysis of protein orientation in a membrane (i.e. whether the N-terminus is cytoplasmic or exo-cytoplasmic) helps to predict a protein's localization.
- mitochondrial proteins
- Sequences are specifically searched for mitochondrial sorting signals at the N-terminus.
- nuclear proteins
- Sequences are searched for nuclear localization signals (NLS), characterized by clusters of basic amino acid residues.
- ER proteins
- PSORT recognizes the KDEL (in animals) / HDEL (in yeast) consensus motif at the C-terminus, which is characteristic of ER luminal proteins.
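Two of the features listed above are simple enough to illustrate in a few lines of Python. The sketch below is not taken from PSORT II. The function names, the 10-residue window, and the charge convention (K and R count as +1, D and E as -1, all other residues as 0) are assumptions chosen for illustration; the real subprograms may use different definitions.

```python
# Hypothetical feature checks; names and parameters are illustrative assumptions.
ER_RETENTION_MOTIFS = {"animal": "KDEL", "yeast": "HDEL"}


def has_er_retention_motif(seq, organism):
    """True if the sequence ends with the ER motif for the given organism."""
    return seq.upper().endswith(ER_RETENTION_MOTIFS[organism])


def n_terminal_net_charge(seq, window=10):
    """Net charge of the first `window` residues (assumed: K/R = +1, D/E = -1)."""
    region = seq.upper()[:window]
    positive = sum(region.count(aa) for aa in "KR")
    negative = sum(region.count(aa) for aa in "DE")
    return positive - negative


print(has_er_retention_motif("MAAAKDEL", "animal"))
print(n_terminal_net_charge("MKRLLDAAAAKK"))
```

Each function returns a single value per sequence, which is the kind of per-feature output that a later integration step can combine.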
PSORT II uses the k-nearest neighbor (k-NN) algorithm to assess the probability of protein localization at each candidate site. For each query protein, the output values of the subprograms mentioned above are normalized, and the distances to all data points derived from the stored protein sequences are calculated. The prediction is then made from the k nearest data points, where k is a predefined integer parameter. If, say, 50% of these k data points are mitochondrial proteins, the query is predicted to be localized in the mitochondrion with a probability of 50%.
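The k-NN step can be sketched as follows. This is a minimal illustration, not PSORT II's implementation: the function name `knn_localization`, the use of Euclidean distance, and the toy two-dimensional feature vectors are assumptions, and the feature values are taken to be normalized already.

```python
import math
from collections import Counter


def knn_localization(query, references, k):
    """Estimate localization probabilities from the k nearest reference proteins.

    query      -- tuple of normalized feature values for the query protein
    references -- list of (feature_vector, site) pairs for proteins of known site
    k          -- number of neighbors to consult
    """
    # Rank reference proteins by Euclidean distance (an assumed metric).
    nearest = sorted(references, key=lambda item: math.dist(query, item[0]))[:k]
    # The fraction of neighbors at each site is reported as its probability.
    counts = Counter(site for _, site in nearest)
    return {site: n / len(nearest) for site, n in counts.items()}


# Toy reference set with made-up feature values.
references = [
    ((0.9, 0.1), "mitochondrial"),
    ((0.8, 0.2), "mitochondrial"),
    ((0.1, 0.9), "nuclear"),
    ((0.2, 0.8), "nuclear"),
    ((0.5, 0.5), "cytoplasmic"),
]
probabilities = knn_localization((0.6, 0.4), references, k=4)
print(probabilities)
```

With k = 4, two of the four nearest neighbors of the toy query are mitochondrial, so the sketch reports a probability of 0.5 for that site, mirroring the 50% case described above.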
The reference amino acid sequences were last updated in 2003.
- Nakai, Horton. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends in Biochemical Sciences. 24(1):34-35. (1999)
- Nakai. Protein sorting signals and prediction of subcellular localization. Advances in Protein Chemistry. 54:277-344. (2000)
- Gardy, Spencer, Wang, Ester, Tusnady, Simon, Hua, deFays, Lambert, Nakai, Brinkman. PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Research. 31(13):3613-3617. (2003)