Title: Introduction to Clusters of Orthologous Groups of proteins (COG)
Clusters of Orthologous Groups of proteins (COGs)
The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes.
The protein database of Clusters of Orthologous Groups (COGs) is an attempt to phylogenetically classify the complete complement of proteins (both predicted and characterized) encoded by complete genomes. Each COG is a group of three or more proteins that are inferred to be orthologs, i.e., they are direct evolutionary counterparts. The current release of the COGs database consists of 4,873 COGs, which include 136,711 proteins (~71% of all encoded proteins) from 50 bacterial genomes, 13 archaeal genomes, and 3 genomes of unicellular eukaryotes, the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, and the microsporidian Encephalitozoon cuniculi. The COG database is updated periodically as new genomes become available. The COGs for complete eukaryotic genomes are in preparation. The COGs can be applied to the task of functional annotation of newly sequenced genomes by using the COGnitor program, which is available on the COGs homepage.
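The COGnitor program itself is not described here. Its rough idea is best-hit voting: a new protein is placed in a COG when its best hits in several genomes fall into that same COG. The sketch below illustrates that idea under stated assumptions, and it is not COGnitor's actual algorithm. The function `assign_to_cog`, its input formats, and the `min_genomes=2` threshold are all hypothetical. The example COG identifiers are made up, and the sequence search that produces the best hits is assumed to have been run already.

```python
from collections import Counter


def assign_to_cog(query_best_hits, cog_of, min_genomes=2):
    """Hypothetical best-hit voting, loosely modeled on the idea behind COGnitor.

    query_best_hits: genome -> the query's best-hit protein in that genome
                     (assumed to come from a precomputed similarity search).
    cog_of: protein -> COG id; proteins that belong to no COG are absent.
    min_genomes: assumed threshold, i.e. how many genomes must agree.
    """
    # Each genome contributes exactly one best hit, so each vote is one genome.
    votes = Counter(cog_of[hit] for hit in query_best_hits.values() if hit in cog_of)
    if not votes:
        return None
    (best, count), *rest = votes.most_common()
    # Reject weak support and ties between COGs.
    if count < min_genomes or (rest and rest[0][1] == count):
        return None
    return best


# Illustrative data (made-up identifiers).
cog_of = {"b1": "COG0001", "c1": "COG0001", "d7": "COG0002"}
query_best_hits = {"B": "b1", "C": "c1", "D": "d7"}
print(assign_to_cog(query_best_hits, cog_of))  # COG0001
```

Requiring agreement from several genomes, rather than trusting a single best hit, makes the assignment less sensitive to one spurious match.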
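"COG" is the abbreviation for Clusters of Orthologous Groups of proteins. The proteins that make up each COG are assumed to derive from a single ancestral protein, and are therefore either orthologs or paralogs. Orthologs are proteins in different species that evolved by vertical descent (speciation) and typically retain the same function as the ancestral protein. Paralogs are proteins within a given species that arose by gene duplication; they may evolve new functions related to the original one. See the references for more information.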
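COGs are identified by comparing the proteins encoded in all completely sequenced genomes with one another. For each protein from a given genome, this comparison yields the most similar protein (the best hit) in each of the other genomes; this is why complete genomes are needed to define COGs (Note 1). Each of these genes is considered in turn. If a reciprocal best-hit relationship is found among these proteins (or a subset of them), those reciprocal best hits form a COG (Note 2). As a result, each member of a COG is more similar to the other members of that COG than to any other protein in the compared genomes, even when the absolute level of similarity is low. Because the best-hit criterion does not rely on an arbitrarily chosen statistical cutoff, it accommodates both slowly and rapidly evolving proteins. There is, however, one additional constraint: a COG must contain proteins from at least three phylogenetically distant genomes.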
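The reciprocal best-hit procedure described above can be sketched roughly as follows. This is a simplification: it links proteins that are each other's best hits, takes connected components, and keeps only components that span at least three genomes. The published procedure includes further steps and manual curation that are not modeled here. Note that connected components can chain together proteins that are not all pairwise reciprocal best hits. The input format is a hypothetical one: `best_hit` maps `(protein, target_genome)` to the best-matching protein in that genome, and `genome_of` maps each protein to its genome. Both are assumed to come from a precomputed all-against-all similarity search.

```python
from collections import defaultdict


def build_cogs(best_hit, genome_of, min_genomes=3):
    """Simplified COG construction from reciprocal best hits (illustrative only).

    best_hit: (query_protein, target_genome) -> best-hit protein in target_genome
    genome_of: protein -> genome it is encoded in
    min_genomes: minimum number of distinct genomes a group must span
    """
    # 1. Keep only pairs that are each other's best hit (reciprocal best hits).
    pairs = set()
    for (query, _target_genome), hit in best_hit.items():
        if best_hit.get((hit, genome_of[query])) == query:
            pairs.add(frozenset((query, hit)))

    # 2. Union-find: merge proteins connected by reciprocal best-hit edges.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    # 3. Group proteins by component and apply the three-genome constraint.
    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    cogs = [members for members in groups.values()
            if len({genome_of[p] for p in members}) >= min_genomes]
    return sorted(sorted(members) for members in cogs)


# Toy example: genomes A, B, C, D (made-up protein names).
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "d1": "D"}
best_hit = {
    ("a1", "B"): "b1", ("a1", "C"): "c1", ("a1", "D"): "d1",
    ("b1", "A"): "a1", ("b1", "C"): "c1",
    ("c1", "A"): "a1", ("c1", "B"): "b1",
    ("d1", "A"): "a2",  # not reciprocal with a1, so d1 is left out
    ("a2", "B"): "b2", ("b2", "A"): "a2",
}
print(build_cogs(best_hit, genome_of))  # [['a1', 'b1', 'c1']]
```

In the toy data, the pair `a2`/`b2` is a reciprocal best hit but spans only two genomes, so the three-genome constraint discards it.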
• Tatusov et al. (1997). A genomic perspective on protein families. Science 278: 631-637.
• Koonin et al. (1998). Beyond complete genomes: from sequence to structure and function. Curr. Opin. Struct. Biol. 8: 355-363.
• Galperin et al. (1999). Comparing microbial genomes: How the gene set determines the lifestyle. In Organization of the Prokaryotic Genome, R.L. Charlebois, Ed. (American Society for Microbiology, Washington, DC) pp. 91-108.
• Tatusov et al. (2000). The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28: 33-36.