Classifying Proteins as Extracellular using Programmatic Motifs and Genetic Programming

Source:

Proceedings of the 1998 IEEE World Congress on Computational Intelligence, IEEE Press, Anchorage, Alaska, USA, pp. 212-217 (1998)

URL:

http://www.genetic-programming.com/jkpdf/icec1998.pdf

Keywords:

genetic algorithms; genetic programming

Abstract:

As newly sequenced proteins are deposited into the world's ever-growing archive of protein sequences, they are typically tested immediately by various computerized algorithms for clues to their biological structure and function. One question about a new protein is its cellular location, that is, where the protein resides in a living organism (extracellular, intracellular, etc.). A 1997 paper reported a human-created, five-way algorithm for cellular location, built using statistical techniques, with 76% accuracy. This paper describes a two-way classification algorithm, evolved using genetic programming, that determines whether a protein is extracellular with 83% accuracy. Unlike the statistical calculation, the genetically evolved algorithm employs a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, memory, data structures, set-creating operations, macro definitions, recursion, etc. The genetically evolved classification algorithm can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif. The genetically evolved program is an instance of an evolutionary computation technique producing a solution to a problem that is competitive with one produced using human intelligence.