KADOP

http://www-rocq.inria.fr/~npreda/I/

KadoP is a peer-to-peer system that manages semi-structured data. It can be used to publish, index and query XML data such as XML documents and Web service calls. It is based on two key peer-to-peer technologies: DHT and AXML (Active XML). The DHT (distributed hash table) is an overlay network application that stores and retrieves key-value pairs. Given a key, it routes the message to the peer responsible for storing that key.
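The key-based routing described above can be illustrated with a small in-memory sketch. It is not KadoP's DHT layer: the class `ToyDHT`, its methods and the peer names are hypothetical, and the placement rule (a sorted hash ring) is just one common way to decide which peer is responsible for a key.

```python
import hashlib
from bisect import bisect_left


class ToyDHT:
    """In-memory stand-in for a DHT: each key is owned by exactly one peer.

    Hypothetical illustration only; a real DHT routes messages over a network.
    """

    def __init__(self, peer_names):
        # Place each peer on a hash ring, sorted by its identifier.
        self.ring = sorted((self._hash(name), name) for name in peer_names)
        self.ids = [peer_id for peer_id, _ in self.ring]
        self.storage = {name: {} for name in peer_names}

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)

    def responsible_peer(self, key):
        # First peer whose identifier is >= hash(key), wrapping around the ring.
        index = bisect_left(self.ids, self._hash(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        # Store the value on the peer in charge of the key.
        peer = self.responsible_peer(key)
        self.storage[peer].setdefault(key, []).append(value)
        return peer

    def get(self, key):
        # Look the key up on the same peer that put() selected.
        return self.storage[self.responsible_peer(key)].get(key, [])


dht = ToyDHT(["peer-a", "peer-b", "peer-c"])
dht.put("package", "doc1.xml")
dht.put("package", "doc2.xml")
print(dht.get("package"))
```

Both `put` and `get` go through `responsible_peer`, so a lookup always reaches the peer that stored the key, whichever peer that happens to be.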

KadoP manages XML information such as XML documents, Web services and semantic information in the form of a hierarchy of concepts organized using the isA and partOf relationships. The published XML data may be annotated with semantic information. Fragments of XML documents (e.g. XML sub-trees, elements, words) can be defined as being relatedTo a concept. The query engine can use the semantic information to evaluate a query when the query asks for it.

This is useful when different users annotate and use the same piece of data differently, depending on the role they have in the system.
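As a rough sketch of how such annotations might be exploited, the snippet below models a concept hierarchy and checks whether a fragment's relatedTo annotation satisfies a queried concept. The concept names, the `HIERARCHY` structure and the matching rule (a fragment matches when the queried concept equals its annotation or is reachable from it through isA or partOf edges) are assumptions made for illustration; the text above does not specify how the query engine uses the hierarchy.

```python
# Hypothetical concept hierarchy: concept -> list of (relationship, parent).
HIERARCHY = {
    "library": [("isA", "package")],
    "kernel-module": [("partOf", "kernel")],
    "kernel": [("isA", "package")],
}


def ancestors(concept, hierarchy):
    """Return every concept reachable from `concept` via isA or partOf edges."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for _relationship, parent in hierarchy.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches_concept(annotation, queried, hierarchy):
    """Assumed rule: a fragment relatedTo `annotation` satisfies
    `relatedTo queried` if the two are equal or `queried` is an ancestor."""
    return annotation == queried or queried in ancestors(annotation, hierarchy)


print(matches_concept("kernel-module", "package", HIERARCHY))
print(matches_concept("package", "kernel", HIERARCHY))
```

Under this assumed rule, a fragment annotated with a specific concept is also found by queries on more general concepts, but not the other way round.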

The KadoP query language is based on tree pattern queries, and it allows the retrieval of pieces of published XML documents, based on conditions referring to the structure and the content of the searched information. The nodes of a KadoP tree pattern query represent data items, and the edges represent containment relationships among the nodes.

Each node may be annotated with:

  • (i) name conditions;
  • (ii) semantic conditions of the form relatedTo c, where c denotes a concept;
  • (iii) textual conditions of the form "contains W", where W is a word.

We distinguish a single return node of the query.

KadoP returns information at different granularities: instead of entire XML documents, one can retrieve precise pieces of information such as XML elements (XML sub-trees). This has a positive impact on the system's performance, as published XML documents tend to be large while the requested piece of information is only a sub-tree.
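To make the tree pattern model more concrete, here is a minimal local evaluator over a single XML document. It is a sketch under stated assumptions, not KadoP's query engine or syntax: `PatternNode`, `match` and `evaluate` are hypothetical names, only name and textual conditions are handled, "contains W" is assumed to test the whitespace-separated words of an element's full text, and every edge is treated as a containment (descendant) relationship.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatternNode:
    name: str                        # name condition on the element tag
    contains: Optional[str] = None   # textual condition "contains W"
    children: List["PatternNode"] = field(default_factory=list)
    is_return: bool = False          # marks the single return node


def match(elem, node):
    """Return (ok, results): whether the pattern rooted at `node` embeds at
    `elem`, and the elements bound to the return node in those embeddings."""
    if elem.tag != node.name:
        return False, []
    if node.contains is not None:
        words = "".join(elem.itertext()).split()
        if node.contains not in words:
            return False, []
    results = [elem] if node.is_return else []
    for child in node.children:
        child_ok = False
        # Containment edge: the child may match any proper descendant.
        for descendant in elem.iter():
            if descendant is elem:
                continue
            ok, found = match(descendant, child)
            if ok:
                child_ok = True
                results.extend(found)
        if not child_ok:
            return False, []
    return True, results


def evaluate(root, pattern):
    """Collect the sub-trees bound to the return node, without duplicates."""
    results, seen = [], set()
    for elem in root.iter():
        ok, found = match(elem, pattern)
        if ok:
            for hit in found:
                if id(hit) not in seen:
                    seen.add(id(hit))
                    results.append(hit)
    return results


DOC = """
<packages>
  <package><name>kadop</name><description>peer to peer XML index</description></package>
  <package><name>foo</name><description>text editor</description></package>
</packages>
"""

# Names of packages whose description contains the word "XML".
query = PatternNode("package", children=[
    PatternNode("name", is_return=True),
    PatternNode("description", contains="XML"),
])
root = ET.fromstring(DOC.strip())
print([elem.text for elem in evaluate(root, query)])
```

The evaluator returns the `name` elements themselves rather than the enclosing documents, which mirrors the fine-granularity answers described above.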

As the EDOS project needs to publish several gigabytes of data, it also needs to make almost one hundred megabytes of metadata available to users, along with a way to query it. Another goal is to transform the metadata provided by Mandriva from an internal, proprietary format into one better suited to distributed querying.

Unfortunately, existing DHTs cannot provide a simple way to publish and query the available information on packages (metadata). They are systems designed with different goals, so they offer only a basic search tool: given a key, they route the message to a peer. The storage offered by DHT applications is also unsuitable for indexing data (text or XML): DHT applications are usually designed to store a large file for each key and have primitive update management. KadoP proposes a new approach to sharing content in peer-to-peer networks. As a system built on top of DHTs, it benefits greatly from their simplicity while being able to use efficient indexing techniques to publish, store, and especially query the available information.

The choice of KadoP was also reinforced by the fact that it would be more efficient to publish part of the metadata in the form of intensional documents, as the available metadata on all packages is in the order of tens of megabytes of text. As a result, requiring our system to support fine-granularity queries that take structure and semantics into account may no longer be sufficient: we also need a system prepared to exploit the intensional or extensional character of the metadata.

See also Active Xml
Version 1.16 last modified by StephaneLauriere on 30/03/2007 at 10:49

Creator: StephaneLauriere on 2005/05/12 23:13
Copyright EDOS Consortium