WP4 Deliverable 4.1: Distribution of code and binaries over the Internet
- Annexes
- Annex 4.1: State of the Art
- Annex 4.2: Distribution system architecture
- Annex 4.3: Distribution API
- Annex 4.4: Metadata management
- Annex 4.5: PMI - see Joint Task on API
- Annex 4.6: Metrics - see WP5 D5.1
Summary
1.1 Introduction
The first version of Deliverable 4.1 was delivered on its due date, at T0+6, when the work of the EDOS project had hardly started. The reviewers asked for a general presentation of the work of WP4 at T0+12, when the group was already operational. This second version answers that request. During the period, the partners of EDOS realized that two aspects first considered to be within the realm of WP4 go beyond this work package:
- The members of WP4 realized that the metrics of WP4 cannot be considered in isolation from other evaluation metrics. Some preliminary metrics were presented in the first version of Deliverable 4.1. The work has been pursued in cooperation with WP5. The results of this joint work are now presented in Deliverable D5.1.
- The members of WP4 articulated the notion of a general information system for the entire process of producing and distributing open source software. This resulted in a joint development within EDOS that led to the Project Management Interface (PMI), which is now presented independently of WP4. We present here the distribution API, which was designed with integration into the PMI in mind.
1.2 Main contributions
1.2.1 State of the art
The study of the state of the art is ongoing. Some of the classes of systems we have considered in particular:
- Linux distributions such as Red Hat and Mandriva Linux
- Other classical software distributions such as Apache
- Content delivery networks such as Akamai and Coral
- File sharing systems such as BitTorrent or Kazaa
- Distributed Hashtable (DHT) systems and unstructured P2P systems
- Distributed file systems such as OceanStore
- Multicast and publish/subscribe systems
- Database transactions, focusing on integration to P2P architectures
- Security in package delivery of open source *nix distributions
1.2.2 Preliminary system architecture
We have worked on the architecture of the global system. It is based on the idea of exchanging XML data in a P2P environment. We have specified an XML description of the content metadata. For P2P management of metadata, we are using KadoP, a system currently being developed at INRIA. A number of extensions of KadoP, such as the management of content persistence and replication, have to be considered for EDOS. For EDOS, the most critical component of KadoP is the DHT system that connects the network of peers. Originally, KadoP used Pastry, one of the most popular DHT systems. We encountered serious scaling and robustness problems with Pastry. We replaced the file-system storage of the index in Pastry with BerkeleyDB, but this fix was not sufficient. We are now testing alternatives such as JXTA.
Development of an XML P2P system, KadoP
The KadoP system facilitates the publication and discovery of content in a P2P environment. It is built around a DHT (using a standard DHT API, so it is not tied to a particular DHT). It supports publishing and searching for XML data, Web services and knowledge (based on a simple ontology). The choice of KadoP was motivated by (i) its support of XML, which offers a standard way of describing the metadata of the application; and (ii) its management of intensional information, since it is built on top of Active XML (AXML). AXML is a declarative language for distributed information management and an infrastructure to support the language in a peer-to-peer environment. AXML documents include calls to Web services. Our thesis is that the exchange of intensional (implicit) information is useful both for flexibility (more functionalities) and for better performance. For instance, in the Mandriva application, some information about packages may be given extensionally and other information intensionally. In this manner, we do not have to copy all the information that is available, and we can tune the data that is exchanged to the particular needs of an actor.
Development of an Information Dissemination Platform (IDiP)
The goal is the dissemination of content to users with heterogeneous requests. There are many existing systems for data dissemination. They mostly focus on efficient dissemination of individual files. The particularity here is that we have to distribute a large quantity of objects, many of which are small. We want to exploit dependencies between information objects and similarities between user requests. Two sub-systems have been developed:
- User clustering
- Heterogeneous Multicast
1.2.3 Distribution API
During the first year of the project, the analysis of the applications convinced us that a more global approach to information than that presented in the proposal was necessary. We thus worked in parallel on: (i) an API purely dedicated to the definition and distribution of software and (ii) a Project Management Interface (PMI) for the general management of information in a F/OSS project. Key aspects in the general access to information:
- Content and community
- Data and metadata (the specification of the metadata API is a joint task with WP2 team)
- Different forms of content: code, documentation, tests, etc.
- No user requires simultaneous access to all information
- Management of dependencies
- End-users search for content by properties, licenses, etc.
- Pull (specific request) and push (subscription)
- Small content (one patch) and large (new release)
- Content traceability (Who, when, where, how and why)
- Efficiency, security and consistency (transactions)
- Subscriptions (based on a notion of channel à la Red Hat). Clients may subscribe to channels and are notified when the subscribed events occur in the channel (e.g. new packages in a collection of interest).
- Transactions: the download of a file (or a set of files) can be seen as a (nested, long) transaction where different pieces are downloaded within subtransactions. We want to guarantee relaxed ACID properties. The API then has methods such as StartTransaction(), AbortTransaction() and RollbackTransaction().
- Security is even more of an issue because of the P2P environment. In the Mandriva application, the problem is simpler because there is a single author. We also intend to consider scenarios with more than one publisher.
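The nested download transactions mentioned in the list above can be illustrated with a minimal Python sketch. It is not the EDOS API: only the method names StartTransaction, AbortTransaction and RollbackTransaction come from the text, and everything else (the class `DownloadTransaction`, `begin_sub`, `stage`, `commit_transaction`, the in-memory `local_store`) is hypothetical. The sketch also assumes one possible reading of the two failure methods: a rollback discards the staged work of a (sub)transaction but leaves it open for a retry, whereas an abort ends it together with its open subtransactions.

```python
class DownloadTransaction:
    """Hypothetical nested transaction over a set of downloaded files."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.state = "new"      # new -> active -> committed | aborted
        self.staged = {}        # files fetched but not yet visible outside
        self.children = []

    def _require_active(self):
        if self.state != "active":
            raise RuntimeError(f"transaction {self.name!r} is {self.state}")

    def start_transaction(self):
        self.state = "active"
        return self

    def begin_sub(self, name):
        """Open a subtransaction for one piece of the download."""
        self._require_active()
        child = DownloadTransaction(name, parent=self).start_transaction()
        self.children.append(child)
        return child

    def stage(self, filename, data):
        self._require_active()
        self.staged[filename] = data

    def rollback_transaction(self):
        """Assumed semantics: drop staged work, stay active for a retry."""
        self._require_active()
        self.staged.clear()

    def abort_transaction(self):
        """Assumed semantics: end this transaction and its open children."""
        for child in self.children:
            if child.state == "active":
                child.abort_transaction()
        self.staged.clear()
        self.state = "aborted"

    def commit_transaction(self, store=None):
        """A child hands its files to its parent; the root writes to store."""
        self._require_active()
        if any(c.state == "active" for c in self.children):
            raise RuntimeError("subtransactions are still open")
        target = self.parent.staged if self.parent is not None else store
        if target is None:
            raise ValueError("a root transaction needs a store to commit to")
        target.update(self.staged)
        self.staged.clear()
        self.state = "committed"


# Usage: a release downloaded in two pieces, the second one retried.
local_store = {}
release = DownloadTransaction("release").start_transaction()

part1 = release.begin_sub("part-1")
part1.stage("pkg-a.rpm", b"aaa")
part1.commit_transaction()

part2 = release.begin_sub("part-2")
part2.stage("pkg-b.rpm", b"corrupt")
part2.rollback_transaction()          # discard the bad piece only
part2.stage("pkg-b.rpm", b"bbb")
part2.commit_transaction()

release.commit_transaction(local_store)
```

In this sketch, rolling back the second piece does not undo the first piece, which was already handed to the parent. This is one way to read "relaxed" atomicity for long transactions. Nothing becomes visible in `local_store` until the root transaction commits.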
1.2.4 Dissemination
Publications on AXML
- Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue! Serge Abiteboul, Z. Abrams, S. Haar, T. Milo, ACM Conference on Principles of Database Systems, 2005
- Regular Rewriting of Active XML and Unambiguity, Serge Abiteboul, Tova Milo, Omar Benjelloun, ACM Conference on Principles of Database Systems, 2005
Publication on distributed query optimization
- A Framework for Efficient Distributed XML Data Management, Ioana Manolescu, Serge Abiteboul, Emanuel Taropa, International Conference on Extending Database Technology, 2006
Demonstration of KadoP presented
- Constructing and Querying Peer-to-Peer Warehouses of XML Resources, Serge Abiteboul, Ioana Manolescu, Nicoleta Preda, International Conference on Data Engineering, 2005 (refereed demo)
- Presentations to local audiences: INRIA-Orsay, Nantes, Tel Aviv.
1.3 Deviation from original plans
Deviations that we view as positive are our broader views of the API and of the metrics, which led to joint work with other work packages. Another deviation was the very limited content of the first deliverable for WP4.
After getting familiar with the EDOS environment and its requirements, and after studying the P2P approach intensively, we came to the conclusion that the classical approach of a fully-fledged distributed database system does not fit the needs. The various actors require a certain degree of autonomy, and the network consists of many transient nodes. A distributed database, however, brings with it a high degree of control and requires tight collaboration between actors, which is not desired in the present environment. Instead of providing a separate distributed database architecture for WP4, we decided to integrate well-proven (distributed) database services and techniques into the P2P architecture and to adopt an integrated architecture. This architecture nevertheless inherits many elements of distributed databases, such as distributed persistent storage over different nodes (or peers), distributed querying, and management of structured data (e.g. metadata).
Additionally, we came to the conclusion that the proposed P2P architecture would benefit greatly from transaction support for file/package upload and download, so that we can guarantee a certain degree of control and consistency for these operations, i.e. relaxed ACID properties. The plan was modified accordingly by replacing the distributed database with an "add-on for supporting transactions" in the P2P system.
1.4 Updated list of deliverables and milestones
Deliverables
- D4.1.v2 Analysis of the problem (state of the art, specification of the evaluation metrics) and preliminary specification of the system (T0+12).
- D4.2.1 Prototype V1 : A P2P architecture for data distribution (T0+24).
- D4.2.2 Report including (i) the dissemination/broadcasting of the information, (ii) access to metadata and data, (iii) specification of the transactional library, (iv) and of the security library.
- D4.3.1 Prototype V2 : A P2P architecture for data distribution ; the effort bears on integration (transaction and security), robustness, performance and testing (T0+30).
- D4.3.2 Prototype libraries for transaction processing and security (T0+30).
- D4.4.1 Report : A general presentation of the work (T0+30)
- D4.4.2 Report : An analysis of the evaluation of the P2P architecture for data distribution (from data gathered by WP5). A comparison to the actual architecture. Performance analysis in different scenarios, e.g. with or without transaction processing (T0+30)
Milestones
- WP4-m12 Problem analysis and preliminary specifications
- WP4-m24 First prototype of the dissemination platform
- WP4-m30 Second prototype of the dissemination platform, with transaction and security features
- WP4-m30 Analysis of the platform in real context
Version 1.92 last modified by StephaneLauriere on 12/09/2006 at 11:12