Protein Fragments


Title: Protein Fragments

Main Research Question: How can we efficiently search for similar protein fragments in large databases?

Methodology:

1. The authors proposed two indexing schemes for similarity-based search in datasets of short (5-15 amino acid) protein fragments.
2. The first scheme, FMTree, decomposes the dataset into 'fibers' (sets of fragments with constant self-similarity) and indexes each fiber using the M-Tree, a well-known generic indexing scheme for metric spaces.
3. The second scheme, FSindex, takes advantage of the intrinsic geometry of the dataset by reducing the amino acid alphabet and indexing the fragments based on their reduced representation.
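The reduced-alphabet idea behind the second scheme can be illustrated with a minimal sketch. This is not the authors' FSindex implementation: the grouping of amino acids in `REDUCED_CLASSES`, the function names, and the choice to look only in the query's own bucket are all assumptions made for illustration. A real scheme would also need to visit neighbouring buckets to guarantee that no close fragment is missed.

```python
from collections import defaultdict

# Hypothetical partition of the 20 amino acids into reduced classes.
# This grouping is an illustrative assumption, not the one used in the paper.
REDUCED_CLASSES = {
    "a": "AGST",  # small residues
    "b": "ILMV",  # aliphatic residues
    "c": "FWY",   # aromatic residues
    "d": "DENQ",  # acidic residues and their amides
    "e": "HKR",   # basic residues
    "f": "C",
    "g": "P",
}

# Invert the partition: residue letter -> reduced class letter.
RESIDUE_TO_CLASS = {
    residue: cls
    for cls, members in REDUCED_CLASSES.items()
    for residue in members
}


def reduce_fragment(fragment):
    """Rewrite a fragment over the reduced alphabet."""
    return "".join(RESIDUE_TO_CLASS[residue] for residue in fragment)


def fragments_of(sequence, length):
    """Return all overlapping fragments of the given length."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def build_index(sequences, length):
    """Bucket every fragment under its reduced representation."""
    index = defaultdict(list)
    for sequence in sequences:
        for fragment in fragments_of(sequence, length):
            index[reduce_fragment(fragment)].append(fragment)
    return index


def candidates(index, query):
    """Return the fragments that share the query's reduced representation."""
    return index.get(reduce_fragment(query), [])


index = build_index(["ACDEFGHIKL"], length=5)
print(candidates(index, "SCENY"))  # fragments in the same bucket as the query
```

Because every fragment in a bucket differs from the query only by substitutions within a class, a search can restrict full similarity computations to a small number of buckets instead of the whole dataset.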

Results:

1. The authors experimentally compared the performance of both schemes on datasets derived from Swiss-Prot.
2. The first scheme (FMTree) did not outperform a simple sequential scan.
3. In contrast, the second scheme (FSindex) performed exceptionally well, outputting the 100 nearest neighbors to any possible fragment of length 10 after scanning, on average, less than one percent of the entire dataset.
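For context, the sequential-scan baseline mentioned in these results can be sketched as follows. The similarity measure here (a count of identical positions) is a toy assumption standing in for the score-matrix similarity a real search would use, and the function names are hypothetical. The point is only that a sequential scan touches every fragment, so its scanned fraction is always 1.0, which is the figure an index has to beat.

```python
import heapq


def similarity(x, y):
    """Toy similarity (assumption): number of positions with identical residues."""
    return sum(a == b for a, b in zip(x, y))


def sequential_knn(dataset, query, k):
    """Return the k most similar fragments by scoring every fragment in the dataset."""
    scored = ((similarity(query, fragment), fragment) for fragment in dataset)
    return heapq.nlargest(k, scored)


def scanned_fraction(num_scanned, dataset_size):
    """Fraction of the dataset for which a similarity was actually computed."""
    return num_scanned / dataset_size


dataset = ["ACDEF", "ACDEG", "WWWWW", "ACDXX"]
neighbours = sequential_knn(dataset, "ACDEF", k=2)
print(neighbours)
print(scanned_fraction(len(dataset), len(dataset)))  # a sequential scan reads everything
```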

Implications:

1. The results suggest that the FSindex scheme is the more efficient of the two for similarity search in protein fragments.
2. The study has implications for the field of bioinformatics, as it may lead to improvements in protein fragment search algorithms and contribute to the discovery of biologically important short peptide motifs.
3. Additionally, the research provides insights into the geometry of quasi-metrics and may contribute to the understanding of similarity search in general.

Link to Article: https://arxiv.org/abs/0309005v1
Authors:
arXiv ID: 0309005v1