Using Programmatic Motifs and Genetic Programming to Classify Protein Sequences as to Extracellular and Membrane Cellular Location

Source:

LNCS, Springer-Verlag, Volume 1447, Mission Valley Marriott, San Diego, California, USA (1998)

ISBN:

3-540-64891-7

URL:

http://www.genetic-programming.com/jkpdf/ep1998.pdf

Keywords:

genetic algorithms; genetic programming

Abstract:

As newly sequenced proteins are deposited into the world's ever-growing archive of protein sequences, they are typically immediately tested by various algorithms for clues as to their biological structure and function. One question about a new protein involves its cellular location, that is, where the protein resides in a living organism (extracellular, membrane, etc.). A human-created five-way algorithm for cellular location using statistical techniques with 76% accuracy was recently reported. This paper describes a two-way algorithm that was evolved using genetic programming with 83% accuracy for determining whether a protein is extracellular and with 89% accuracy for membrane proteins. Unlike the statistical calculation, the genetically evolved algorithm employs a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, memory, data structures, set-creating operations, macro definitions, recursion, etc. The genetically evolved classification algorithm can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif.