Seven clusters in genomic triplet distributions

A. N. Gorban; A. Yu. Zinovyev; T. G. Popova

In several recent papers, new gene-detection algorithms were proposed for detecting protein-coding regions without requiring a training dataset of already known genes. Unsupervised gene detection is possible because oligomer frequency distributions have a cluster structure. In this paper we use a pure data-exploration strategy to study the cluster structure of several genomes in the space of their triplet frequencies. We analyzed several complete genomic sequences by visualizing tables of triplet frequencies computed in a sliding window. The distribution of the 64-dimensional vectors of triplet frequencies shows a clearly detectable cluster structure. The structure consists of seven clusters. Six of them correspond to protein-coding information in the three possible phases on each of the two complementary strands, and the seventh corresponds to non-coding regions. This correspondence holds with high accuracy (above 90% at the nucleotide level). Visualizing and understanding this structure makes it possible to analyze the performance of different gene-prediction tools effectively. Because the method does not require extraction of ORFs (open reading frames), it can be applied even to unassembled genomes. The information content of the triplet distributions and the validity of mean-field models are analysed.
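
The abstract describes a two-step pipeline. First, triplet frequencies are computed in a sliding window. Then clusters are sought among the resulting 64-dimensional vectors. The sketch below illustrates that pipeline but is not the authors' code. It assumes that non-overlapping triplets are counted from the first position of each window. The window length of 400 and the step of 12 are illustrative choices, not values stated in the abstract. The helper names (`triplet_frequencies`, `sliding_window_vectors`, `kmeans`) are hypothetical. Plain k-means with k = 7 stands in for the paper's visual data exploration. It shows how the seven clusters could be separated automatically.

```python
import math
import random
from itertools import product

# The 64 triplets in a fixed lexicographic order: AAA, AAC, ..., TTT.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Return the 64-dim vector of non-overlapping triplet frequencies.

    Assumption: triplets are read from the first position of the window in
    steps of 3; a triplet containing a symbol other than A, C, G, T is skipped.
    """
    window = window.upper()
    counts = [0] * 64
    for i in range(0, len(window) - 2, 3):
        idx = INDEX.get(window[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 64


def sliding_window_vectors(seq, window=400, step=12):
    """Yield (start, frequency_vector) for each full window along seq.

    window=400 and step=12 are illustrative defaults, not values from the paper.
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, triplet_frequencies(seq[start:start + window])


def kmeans(points, k=7, iters=30, seed=0):
    """Plain Lloyd's k-means (Euclidean distance); returns one label per point.

    Hypothetical stand-in for the clustering step; k=7 mirrors the seven
    clusters reported in the abstract.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Synthetic random sequence, used only to exercise the code; a real genome
    # sequence is needed to observe the cluster structure the abstract describes.
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    vectors = [v for _, v in sliding_window_vectors(genome)]
    labels = kmeans(vectors, k=7)
    print(len(vectors), "windows,", len(set(labels)), "clusters used")
```

The step of 12 is a multiple of 3, so every window starts in the same reading phase relative to the sequence start. Windows inside one gene would therefore be expected to share a phase cluster. Windows in genes with a different frame offset, or on the opposite strand, would be expected to fall into the other coding clusters.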

updated: Tue Nov 23 2004 13:09:00 GMT+0000 (UTC)

published: Thu May 29 2003 11:36:34 GMT+0000 (UTC)

arXiv