General Architecture for WP4

This document presents the general architecture for EDOS WP4, concentrating on the distribution problem: how to disseminate many large files over an untrusted infrastructure to a large number of users, while keeping those users informed of new versions and still able to retrieve and keep old versions if they so wish.

This document is structured as follows: section 1 presents the requirements for the system; section 2 lists our objectives and considerations for the architecture; section 3 outlines the higher-level design; section 4 dives into some of the implementation details and applied technologies; section 5 lists some of the open issues which will be dealt with in future research.

Section 1 - Requirements

Keeping users updated

A very important aspect of the open source community is being on "the bleeding edge": a large number of users want to get the latest and greatest software onto their desktops. Users who opt instead for a stable system and install new software only rarely (e.g. system administrators) should still get frequent security updates and/or bug-fixes to improve their systems. Thus, the proposed system should make updates easily available to users. Users will subscribe to abstract "channels" and will be notified of updates according to their interests. After receiving an update notification, the user may choose whether or not to download the update.

Efficient dissemination in "flash crowd" situations

A "flash crowd" situation might occur when a popular new download is made available to the public and many users attempt to download it at once. "Flash crowd" situations tend to stretch the servers' capabilities to their limits, and in order to handle them many providers organize their servers in sophisticated clusters. These solutions are expensive and do not utilise the resources of the downloading peers. Our architecture will use the community's resources to handle "flash crowd" situations efficiently.
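A rough way to see why the peers' resources matter in a flash crowd is the textbook lower bound on the time needed to deliver one file to N downloaders. The sketch below compares a server-only setup with a peer-assisted one. The function names and all figures are illustrative assumptions, not measurements from EDOS, and the bounds ignore protocol overhead and churn.

```python
def server_only_time(file_bits, n_peers, server_up, min_peer_down):
    """Lower bound (seconds) when only the server uploads."""
    return max(
        n_peers * file_bits / server_up,  # server sends one full copy per peer
        file_bits / min_peer_down,        # slowest peer must receive the file
    )


def peer_assisted_time(file_bits, n_peers, server_up, min_peer_down, peer_up):
    """Lower bound (seconds) when every peer also uploads at peer_up bits/s."""
    total_up = server_up + n_peers * peer_up
    return max(
        file_bits / server_up,            # server must send each bit at least once
        file_bits / min_peer_down,        # slowest peer must receive the file
        n_peers * file_bits / total_up,   # limit of the aggregate upload capacity
    )
```

For example, with an assumed 1 GB file, 1000 peers, a 1 Gbit/s server, 10 Mbit/s peer downlinks and 1 Mbit/s peer uplinks, the bound drops from 8000 seconds to 4000 seconds once peers contribute their uplinks, and the server-only bound keeps growing linearly with the number of peers.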

Efficient dissemination during off-peak times

While this requirement is related to the previous one, it is not trivial to achieve both. For example, BitTorrent works very well when many peers download a file simultaneously, but download rates slow considerably when only a few peers are involved. We want our proposed system to remain efficient after the "flash crowd" has dispersed, e.g. a few days (or any arbitrary period) after the file's initial release.

Persistent storage of old versions and distributions

Many existing systems are geared towards efficient distribution of the newest versions of files. However, when configuring a distribution, one might make use of older versions of packages and modules. Such older versions should not be difficult to obtain in the system. Even if the file no longer exists on any other peer's machine, the system should have a means of locating it in archival storage. Archived files shall be organized both by version and by dependency, so that it is possible to download a set of old versions which are guaranteed (as far as possible) to interoperate.

Use cases

A list of use cases and processes which the system has to support can be read here.

Section 2 - Objectives

P2P/Decentralization

Our main objective is to design an architecture which, as much as possible, does not depend on specific servers to provide needed services for the community. Our design must prevent any individual node from becoming a single point of failure. A peer in the community will get its services from "the network" and not from a server. This objective presents some challenges, since decentralized services are inherently more complex than centralized ones. Nonetheless, decentralization is a major theme in our architecture and is also in keeping with the "spirit" of the open source community.

Scalability

This objective is related to the decentralization objective, but also stands in its own right. The techniques and algorithms we employ need to scale up to millions of users. As a guideline, the proposed system should be able to serve all open source users worldwide. When new users join, the system should add their resources to its pool of available resources so that they help share the increasing load.

Resilience

This objective is also related to the previous two. The system should be able to work around node failures, whether single or multiple (e.g. due to network partitions). It should also gracefully handle a high churn rate of nodes leaving and entering the system.

Heterogeneity

The architecture will assume heterogeneity of the peers with regard to their resources and their willingness to contribute. Peers in the community will have different bandwidth capabilities (incoming and outgoing), storage capacities, system configurations and usage patterns. An approach which recognizes these differences might, for example, use existing (and future) mirror servers so as to utilise their resources.

Security

The proposed system should be resilient both to accidental misuse and to deliberate attacks by malicious users (who may or may not be part of the community).

Interoperability

Naturally, the solution architecture for WP4 will not stand on its own, separate from the other EDOS work package solutions. Hence, the proposed system will avoid assumptions about other stages of the process that would make it difficult to integrate with other solutions. Moreover, the proposed architecture might indeed serve as a substrate for easier implementation of additional solutions to the EDOS challenges.

Section 3 - Architecture Outline

A distribution, or any version update, is a list of packages (and possibly configuration changes) that have to be installed on the user's computer. Thus, the first part of staying up-to-date is getting the list of packages to be installed (via a notification). The user may already have some of these packages up to date, or may not be interested in them at all (e.g. an update for KDE when only GNOME is installed). So, the second part calls for downloading to the user's machine only the packages that are truly needed.
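The second part, deciding which packages are truly needed, can be sketched as a simple filter over the notified package list. This is a minimal illustration under stated assumptions: the function name, the dictionary layout and the use of plain version-string equality are all hypothetical simplifications, and a real update client would need proper version ordering and dependency handling.

```python
def packages_to_download(update_list, installed, interests):
    """Return the packages from an update that this user actually needs.

    update_list: dict mapping package name -> version offered by the update
    installed:   dict mapping package name -> version currently installed
    interests:   set of package names the user wants to keep up to date
    """
    needed = {}
    for name, version in update_list.items():
        if name not in interests:
            continue  # e.g. a KDE update on a GNOME-only machine
        if installed.get(name) == version:
            continue  # this exact version is already installed
        needed[name] = version
    return needed
```

Only the resulting dictionary would then be handed to the download phase.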

In order to provide the user with the services described in the previous sections, we suggest three phases:

  • Notification of changes and/or new versions
    • Purpose - "Tell me what's new or changed"
    • Basic concept
      • The user states his general interests (e.g. his distribution, whether he's a conservative user or a bleeding-edge diehard, etc.), possibly by registration to one or more topics (channels)
      • The user receives the list of changes or a final snapshot via a (high level) push action
    • Implemented by a global event notification system (e.g. a topic-based publish/subscribe system)
  • Download (Seeding)
    • Purpose - "Get what I want and don't have" (possibly "forced" to participate in more than what is wanted)
    • Goal - create a large number of sources ("seeds"), among interested peers, for a new package or group of packages, in a fast and efficient way
    • Basic concept - peers are organized into one or more efficient trees (or meshes) and participate in the dissemination of packages in them
    • "Hope" - an application-level multicast system is superior to the ad-hoc dissemination of files using traditional P2P file sharing applications
    • Implemented using two subsystems
      • Peer clustering - most of the packages that will be disseminated in a cluster will be of shared interest between peers
      • Multicast dissemination - efficient distribution of packages in a specific cluster
  • Completion
    • Purpose - "Download any specific package, at any time"
    • Goals
      • Enable users that participated in the download phase, but failed to get some blocks/packages, to complete their downloads
      • Allow a user that "missed" the download phase to download packages from fellow peers (the assumption is enough sources were created using the download/seeding phase)
      • Allow users to get previous versions of packages & distributions
    • Implemented using two subsystems (ideally these would be one system)
      • File sharing - using the fact that enough sources are available after phase 2
      • Persistent storage - mostly to support historic versions and a "safe" storage for all packages

The sub-systems that comprise the three phases are as follows:

1. Event notification

Packages and distributions are constantly updated. Users should be able to stay informed of new packages and versions, according to their interests and at specified intervals. A model which appears related to our requirements is the RSS (Really Simple Syndication) model, in which users get the latest news updates on their desktop from multiple publishers. A major disadvantage of today's RSS approach is centralization, which becomes problematic because users constantly poll the server. One might alleviate the problem with a decentralized approach such as the one found in the Feed Tree article.
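The channel model can be illustrated with a tiny topic-based publish/subscribe broker. This is only a sketch of the interface, assuming hypothetical names (`ChannelBroker`, `subscribe`, `publish`): it runs in a single process and is therefore centralized, whereas the system described here would need to spread this role over the peers.

```python
from collections import defaultdict


class ChannelBroker:
    """In-memory topic-based publish/subscribe, for illustration only."""

    def __init__(self):
        # channel name -> list of callbacks registered by subscribers
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register callback(channel, update) for one channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, update):
        """Push an update to every subscriber; return how many were notified."""
        delivered = 0
        # .get avoids creating an entry for a channel nobody subscribed to
        for callback in self._subscribers.get(channel, ()):
            callback(channel, update)
            delivered += 1
        return delivered
```

The push model is the point of the sketch: subscribers are called when something changes, so no peer has to poll a server for news.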

2. Peer clustering

When building a P2P network in which users contribute resources to the system, it is preferable that the users share similar interests, which improves the chance that they will cooperate. Grouping the users into clusters, based on their similar interests, will allow us to build more efficient systems for multicast and file sharing. There's a trade-off to resolve when grouping the users into clusters: on the one hand, if clusters are too fine-grained (e.g. one cluster per package) then cluster management will be a nightmare for the users; on the other hand, if clusters are too coarse-grained (e.g. one cluster covering all peers) then we lose the common-interest incentive and users might have to cooperate on too many processes which don't interest them.
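The granularity trade-off can be made concrete with a small greedy clustering sketch. It is an assumption-laden illustration, not the clustering method chosen for WP4: peers are described by sets of package names, similarity is measured with the Jaccard index, and a single hypothetical `threshold` parameter controls how fine-grained the clusters become.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_peers(interests, threshold):
    """Greedily group peers whose package interests overlap.

    interests: dict mapping peer id -> set of package names
    threshold: minimum similarity needed to join an existing cluster
    Returns a list of clusters, each a list of peer ids.
    """
    clusters = []  # each entry: {"packages": set, "peers": list}
    for peer in sorted(interests):
        wanted = interests[peer]
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(wanted, cluster["packages"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"packages": set(wanted), "peers": [peer]})
        else:
            best["peers"].append(peer)
            best["packages"] |= wanted
    return [cluster["peers"] for cluster in clusters]
```

A threshold of 1.0 gives the fine-grained extreme (peers join only clusters that match their interests exactly), while a threshold of 0.0 puts every peer into one coarse cluster.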

3. Multicast dissemination

The multicast system's purpose is to provide a large number of peers with a file (or a group of files) quickly and efficiently, while keeping the load on the originating server low. A multicast starts at a specified pre-agreed time, at which peers who are interested in getting the update are organized in an overlay network (usually either a mesh or a tree). Peers utilise their outgoing bandwidth to help other peers complete the file's download quickly. The main challenges in building the multicast system are: building the overlay in a decentralized way; having peers forward only content that they're interested in themselves; resilience to high churn rates and peer failures; and having peers get the file fast. Some of those challenges conflict with each other and create trade-offs which will have to be resolved.
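A tree overlay with bounded fan-out can be sketched as follows. The builder below is a centralized toy under stated assumptions (the names `build_tree` and `depth`, and a uniform `fanout` for every node, are hypothetical); it only shows the shape of the result and does not address the decentralized construction or churn handling listed above as challenges.

```python
from collections import deque


def build_tree(source, peers, fanout):
    """Attach peers breadth-first so no node forwards to more than fanout children.

    Returns a dict mapping each peer to its parent in the tree.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    parent = {}
    child_count = {source: 0}
    open_nodes = deque([source])  # nodes that can still accept children
    for peer in peers:
        node = open_nodes[0]
        parent[peer] = node
        child_count[node] += 1
        if child_count[node] == fanout:
            open_nodes.popleft()  # this node is now full
        child_count[peer] = 0
        open_nodes.append(peer)
    return parent


def depth(parent, node):
    """Number of hops from node up to the source."""
    hops = 0
    while node in parent:
        node = parent[node]
        hops += 1
    return hops
```

The source uploads to at most `fanout` peers regardless of how many peers join, which is the sense in which the load on the originating server stays low; the cost is that the tree depth, and so the delivery delay, grows with the number of peers.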

4. File sharing

After completing the multicast stage, many peers should have the file. This will enable peers who were not connected to the network at the time of the multicast (or chose not to participate for some reason) to download the file from their peers without contacting the originating server. A latecoming peer should be able to get (in a decentralized way) a set of other peers who have the file (or parts thereof) on their machines and are willing to share it, and then contact those peers to initiate the transfer. When a peer requires several related files (e.g. when an update consists of several inter-dependent packages), one might use the clusters for finding, with high probability, peers with several of those files.
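Choosing sources for several related files can be sketched as a greedy cover: repeatedly pick the peer that holds the most files still missing. This is one possible heuristic under assumptions, not the lookup mechanism of the file sharing subsystem; the function name and the `holdings` map (which a decentralized lookup would have to supply) are hypothetical.

```python
def choose_sources(wanted, holdings):
    """Pick peers so that as few peers as possible serve the wanted files.

    wanted:   set of file names the latecoming peer needs
    holdings: dict mapping peer id -> set of files that peer is willing to share
    Returns (plan, missing): plan maps file -> chosen peer, missing is the
    set of files that no known peer holds.
    """
    remaining = set(wanted)
    plan = {}
    while remaining and holdings:
        # Peer covering the most still-missing files; ties go to the lowest id.
        peer = max(sorted(holdings), key=lambda p: len(holdings[p] & remaining))
        covered = holdings[peer] & remaining
        if not covered:
            break  # nobody holds any of the remaining files
        for name in covered:
            plan[name] = peer
        remaining -= covered
    return plan, remaining
```

Files left in `missing` are the ones that would have to come from the originating server or from persistent storage.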

5. Persistent storage

One of the requirements noted earlier is that users should be able to seek and find older versions of packages and distributions, even when no other peer uses them. For this, we will employ a persistent storage system. Such a system will keep old files, along with records of versions, distributions and dependencies, and make them available from anywhere in the network. A user interested in an old version will consult the persistent storage system in order to find related files (according to dependency) and then download the files. Many persistent storage systems use the participating peers' spare storage capacities. However, in our case, because peer connections are highly dynamic and requests for very old files are expected to be rare, it might suffice to store old file parts on dedicated mirror servers.
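Finding the related files for an old version amounts to following the recorded dependencies. The sketch below assumes a hypothetical in-memory `index` that maps each archived (name, version) pair to the exact dependency versions recorded with it; how the real storage system keeps and queries these records is not specified here.

```python
def resolve_archive_set(index, name, version):
    """Collect an archived package together with everything it depends on.

    index: dict mapping (name, version) -> list of (name, version) dependencies
    Returns the set of (name, version) pairs to download.
    Raises KeyError if a required package has no record in the archive.
    """
    needed = set()
    stack = [(name, version)]
    while stack:
        item = stack.pop()
        if item in needed:
            continue  # already handled, also guards against dependency cycles
        dependencies = index[item]  # KeyError: not archived
        needed.add(item)
        stack.extend(dependencies)
    return needed
```

Because the dependencies are stored with exact versions, the returned set is the group of old versions that were recorded as working together.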

Inter-Subsystem Data Flow

The next diagram depicts the data flow between the different subsystems, and the boundaries of the different phases. The mainstream data flow is as follows:

  • The user receives notification of changes in packages/distributions that are of interest to him
  • The local update client determines the list of packages that need to be downloaded
  • After a sample of the users' lists has been analyzed, clusters of packages are formed to exploit the similarity of interests between users
  • The clustering results are published; each user then joins the clusters that best fit his package-list vector and participates in the multicast dissemination of packages in these clusters
  • Every time a download of a package is complete, the information of "what user has which package" is used to feed the file sharing and the persistent storage subsystems

[Figure: WP4_Gen_Arch_Imp.png]

(an older version of this section can be found here).

Section 4 - Implementation Details

The following links provide more details about the different sub-systems:

  1. Event Notification
  2. Peer Clustering
  3. Multicast Dissemination
  4. File Sharing
  5. Persistent Storage

Section 5 - Open Issues

Architecture Open Issues

An older version of this page can be found here.

Version 1.11 last modified by StephaneLauriere on 08/11/2005 at 22:29

Creator: Yotam on 2005/05/22 19:32
Copyright EDOS Consortium