Machine learning approaches for deciphering complex pathomechanisms in cancer
Informatik bewegt: Informatik 2002 - 32. Jahrestagung der Gesellschaft für Informatik e.V. (GI), Ergänzungsband
Regular Research Papers
Gesellschaft für Informatik e.V.
Recent years have seen a dramatic increase in the amount of genetic information stored in electronic format. It has been estimated that the amount of information in genomics and proteomics doubles every 20 months, and that the size and number of databases are increasing even faster. It is widely accepted that sophisticated exploration of such data is crucial in a variety of fields, such as disease genetics and pharmacogenomics. While both corporate and institutional efforts have concentrated on the integration of heterogeneous data in genomics and proteomics, systematic data exploration is still in its infancy. Although data mining has celebrated many successes in business applications such as retail and marketing (see e.g. ), its application to scientific and engineering data is not straightforward. Data sets in the life sciences are often significantly larger in volume and structurally more complex than traditional business data, and they often change rapidly over time. In contrast to business environments, the body of existing background knowledge in the life sciences is extensive. We report on our recent efforts [2,3] to adapt data mining technology, in particular from the field of machine learning, for effective knowledge discovery in tumour genetics. To exemplify the power of this approach, we describe how complex concepts such as survival or therapy response [4,5] can be learnt from heterogeneous clinical or molecular data.