Persistent Storage and File Hierarchy - Directions

Background

  • Problem Description
    • A full distribution may take up to 100GB of disk space;
    • Limited number of mirrors, not necessarily reliable;
    • Repositories on servers are not scalable and form a single point of failure;
    • Only a few of the latest versions are kept, usually the current version and the previous one.
  • Persistent Storage over P2P
    • The idea: use the peers' spare storage capacities to store packages, source, information, etc.
    • Use an overlay network substrate for file location and routing.
    • A framework should be created for assuring resilience, security and efficiency.
  • Persistent Storage Requirements
    • We need a system to allow peers to know what versions exist, what files and how to get them.
    • We want to allow individual packagers to post files on the network and make them available to other users.
    • The system should support querying the contents of versions and comparing to local installation.
    • The user should not be flooded with information, but should be able to look at it through different views (the "channels" abstraction).
  • Similar Model: Red Hat Network
RHNDiag2.png
    • Channels are used to create staged environments.
    • Available types of channels: base channel, development channel, testing & QA channel, production channel (beta versions).
    • We need such a flexible channel model.
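The requirement above to query the contents of a version and compare it to the local installation can be sketched as follows. This is only an illustration: the manifest format (a mapping from package name to version string) and the function name `compare_to_local` are assumptions, not part of the proposed design.

```python
from typing import Dict, List


def compare_to_local(channel: Dict[str, str], local: Dict[str, str]) -> Dict[str, List]:
    """Compare a channel manifest (package name -> version) with the local installation.

    Hypothetical helper: it only reports differences and does not try to
    decide which of two version strings is newer.
    """
    # Packages offered by the channel but not installed locally.
    missing = sorted(name for name in channel if name not in local)
    # Packages installed locally that the channel does not list.
    extra = sorted(name for name in local if name not in channel)
    # Packages present on both sides with different versions:
    # (name, local version, channel version).
    differing = sorted(
        (name, local[name], channel[name])
        for name in channel
        if name in local and local[name] != channel[name]
    )
    return {"missing": missing, "extra": extra, "differing": differing}


if __name__ == "__main__":
    channel_manifest = {"bash": "3.0", "gcc": "3.4.3", "vim": "6.3"}
    local_manifest = {"bash": "3.0", "gcc": "3.3.5", "emacs": "21.4"}
    print(compare_to_local(channel_manifest, local_manifest))
```

A view per channel then amounts to running the same comparison against a different manifest, so the user only sees the differences relevant to the channels they follow.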

Suggested architecture

  • Filesystem Hierarchy
    • We will assume a filesystem which is available for all users (with proper permissions).
    • Packages may be conveniently grouped into hierarchies, according to modules, architectures, versions, distributions, providers, etc.
    • The system should support arbitrary hierarchies.
    • A package will be stored (logically) only once, but can be pointed to an unlimited number of times.
    • Through the filesystem, the user will be able to get any version required and compare between different versions.
    • The top layer in the hierarchy will be the "channel".
    • A channel will always point to the latest version.
    • Each version will have a back pointer to the previous one.
    • All channels will be accessible from a Master Channel.
Channels2.png
  • Properties
    • Redirections are implemented as pointers in files. The files are kept in the persistent storage like any other package file.
    • All files are read only. An update creates a new file and respective pointers.
    • The channel is a pointer that always points to the last version.
    • For convenience, it's possible to add to each channel a log file of the changes between different versions.
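The hierarchy and properties above can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `Store` class, the use of SHA-256 content hashes as pointers, and the JSON layout of a version record are not specified by the design. The sketch only shows how read-only files, per-version back pointers and a channel pointer to the latest version fit together.

```python
import hashlib
import json
from typing import Dict, Iterator, Optional


class Store:
    """Toy stand-in for the persistent storage (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}    # write-once files, keyed by content hash
        self._channels: Dict[str, str] = {}   # channel name -> key of latest version record

    def put(self, data: bytes) -> str:
        """Store a read-only file; identical content is stored only once."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def publish(self, channel: str, packages: Dict[str, bytes]) -> str:
        """Create a new version record and move the channel pointer to it."""
        record = {
            # Pointers to package files, which are stored like any other file.
            "packages": {name: self.put(data) for name, data in packages.items()},
            # Back pointer to the previous version (None for the first one).
            "previous": self._channels.get(channel),
        }
        key = self.put(json.dumps(record, sort_keys=True).encode())
        self._channels[channel] = key  # the channel pointer is the only thing that changes
        return key

    def latest(self, channel: str) -> str:
        return self._channels[channel]

    def master(self) -> Dict[str, str]:
        """Master channel view: every channel with its latest version key."""
        return dict(self._channels)

    def history(self, channel: str) -> Iterator[dict]:
        """Walk from the latest version back through the back pointers."""
        key: Optional[str] = self._channels.get(channel)
        while key is not None:
            record = json.loads(self.get(key))
            yield record
            key = record["previous"]


if __name__ == "__main__":
    store = Store()
    store.publish("stable", {"foo": b"foo-1.0"})
    store.publish("stable", {"foo": b"foo-1.1", "bar": b"bar-2.0"})
    for version in store.history("stable"):
        print(sorted(version["packages"]), version["previous"])
```

In a real deployment the channel pointer would itself have to be resolved through the storage layer, which is exactly the "latest version" problem listed under the open issues.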

OceanStore as an example persistent storage system

  • OceanStore properties
    • By design, files stored in OceanStore are "read only" and changes result in new versions. New versions of packages will be available but so will be the old ones.
    • The more a file is read in OceanStore, the more available it becomes (replicated), while unread files can lose replicas. This way, the more popular a file is, the easier it will be to locate a copy.
    • OceanStore is decentralized and uses Tapestry as the P2P overlay network. Other possibilities for the overlay network exist, such as Chord.
    • OceanStore utilizes localization through Tapestry, bringing files closer to where many clients requested them.
    • Byzantine agreement methods are employed and guarantee that all non-faulty replicas agree, as long as fewer than one third of the replicas are faulty.
  • "Downgrading" OceanStore
    • OceanStore relies, for many operations on a file, on an "inner ring": a set of servers responsible for the specific file in question.
    • It's not certain that we can use the same model when no "servers" exist, only unreliable peers.
    • We need to achieve the same operations, perhaps with a different architecture:
      • Getting a pointer to the latest version of a file.
      • Persistently storing a file's "primary replica" (the replica that is guaranteed to be persistent).
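The one-third limit on faulty replicas mentioned under the OceanStore properties matters when the inner ring is replaced by unreliable peers, because it fixes how many replicas a group needs. The arithmetic below uses the classical bound for Byzantine agreement (n replicas tolerate f faults only if n >= 3f + 1). The function names are illustrative, and the exact parameters of OceanStore's own protocol are not taken from this document.

```python
def max_faulty(replicas: int) -> int:
    """Largest number of Byzantine replicas a group of this size can tolerate."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    # n >= 3f + 1  is equivalent to  f <= (n - 1) // 3
    return (replicas - 1) // 3


def min_replicas(faulty: int) -> int:
    """Smallest group size that tolerates the given number of Byzantine replicas."""
    if faulty < 0:
        raise ValueError("number of faulty replicas cannot be negative")
    return 3 * faulty + 1


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, "replicas tolerate", max_faulty(n), "faulty")
```

With peers that come and go, a replica that is merely offline may have to be counted as faulty, so a peer-based group would probably need to be noticeably larger than a server-based one.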

Open issues in persistent storage

  • Connecting the persistent storage system to the other sub-systems:
    • The multicast phase sends a file to the peers but doesn't take into account issues of persistent storage.
    • File sharing: the user should be able to get the "pointer" to a package through the persistent storage channel scheme and then download it from peers.
  • Implementing permissions (and general security)
    • For writing (and reading?) into channels
    • For creating and deleting channels
  • Downgrading OceanStore (or a similar system)
    • Getting the latest version
    • "Primary replica" (perhaps using today's mirrors?)
  • Deleting obsolete versions
  • Incentives for peers to share files
  • Should we use coding (e.g. rateless codes) or simple replication?
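The coding versus replication question above can be made concrete with a small availability calculation. The model is an assumption: each peer is online independently with probability p, and the code is a fixed-rate k-of-n erasure code (any k of the n stored fragments rebuild the file). Rateless codes do not fix n in advance, so this is only a first approximation of that option.

```python
from math import comb


def replication_availability(p: float, copies: int) -> float:
    """Probability that at least one of `copies` full replicas is online."""
    return 1.0 - (1.0 - p) ** copies


def erasure_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n fragments are online (k-of-n code)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Same storage overhead (2x) in both cases: 2 full copies vs. a 2-of-4 code.
    for p in (0.5, 0.9):
        print(p, replication_availability(p, 2), erasure_availability(p, 4, 2))
```

Under this model, coding is not always the better choice at equal storage overhead: with poorly available peers (p = 0.5) two full copies beat a 2-of-4 code, while with p = 0.9 the code comes out ahead. The answer therefore depends on measured peer availability, and also on repair and lookup costs that this calculation leaves out.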

State of the Art

Version 1.4 last modified by Yotam on 22/05/2005 at 20:03

Creator: Yotam on 2005/05/22 19:50
Copyright EDOS Consortium