Distributed and parallel data mining on the grid
ARCS 2004 – Organic and pervasive computing
Regular Research Papers
Gesellschaft für Informatik e.V.
This paper presents the initial design and implementation of a Grid-based distributed and parallel data mining system. The system, called the Business Intelligence Grid (BIGrid), is built on heterogeneous Grid server configurations and a service-oriented Grid architecture. It follows a layered design whose infrastructure is divided into three tiers: a Grid tier, a service tier, and a client/portal tier. Design and implementation issues are discussed, including brokering, task scheduling, adaptive mining script preparation, and parallelization. The design and implementation of BIGrid help identify the specific requirements of applying Grid-based data mining in the business realm, and thus pave the way for the future design and implementation of a truly generic Grid-based data mining system.