From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System
ISSN der Zeitschrift
Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS)
Regular Research Papers
Gesellschaft für Informatik e. V.
We propose a new system that is able to handle the entire Personal Datas-pace of a user. A Personal Dataspace includes all data pertaining to a user on all his disks and on remote servers such as network drives, email and web servers. This data is represented by a heterogeneous mix of files, emails, bookmarks, music, pictures, calendar data, personal information streams and so on. State-of-the-art tools such as desktop search engines and desktop operating systems (including the upcoming Vista) are not enough as they neither solve the problem of physical personal information independence (where is my data) nor format and data model independence (how is it stored and which application do I have to use in order to access that data). Our work builds on the visions presented in [DSKB05], which calls for a single system to manage the personal information jungle, and [FHM05], which advocates dataspaces as a new abstraction for information management. In contrast to [FHM05] this paper presents a concrete implementation of a Personal DataSpace Management System (PDSMS) termed iMeMex: integrated memex. We discuss the core architecture of iMeMex and services offered by our system. As we will show, a PDSMS can be seen as a system that occupies the middleground between a search engine, a database management system, and a traditional information integration system. A PDSMS has to bridge these separate worlds and requires: (1) no full control on data, i.e., data may be accessed bypassing the interfaces of a PDSMS, (2) simple keyword search on all data available in a dataspace without performing any semantic data integration, (3) rich querying able to mix structural, attribute, and content predicates, (4) pay-as-you-go integration capabilities, (5) the ability to define arbitrary logical views on all data, (6) durability and consistency guarantees to avoid loss of data assigned to a dataspace, and (7) update capabilities. iMeMex is the first implementation of a PDSMS we are aware of. This paper presents the architecture of iMeMex and reports on the current state of the iMeMex research project at ETH Zurich.