Persistent Storage and File Hierarchy - Directions
Background
- Problem Description
- A full distribution may take up to 100GB of disk space;
- Limited number of mirrors, not necessarily reliable;
- Repositories on servers are not scalable and form a single point of failure;
- Only a few of the latest versions are kept, usually the current version and the previous one.
- Persistent Storage over P2P
- The idea: use the peers' spare storage capacities to store packages, source, information, etc.
- Use an overlay network substrate for file location and routing.
- A framework should be created for assuring resilience, security and efficiency.
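The file-location idea above can be illustrated with a minimal consistent-hashing sketch in the style of Chord. The peer names and the `ident`, `successor` and `locate` helpers are assumptions made for illustration only. A real overlay routes a lookup in a logarithmic number of hops between peers rather than consulting a global sorted list as this sketch does.

```python
import hashlib
from bisect import bisect_left

def ident(name: str) -> int:
    """Hash a peer name or file name onto a 160-bit identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def successor(ring, key: int) -> int:
    """Return the first identifier in the sorted ring at or after key, wrapping around."""
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]

# Hypothetical peers; in a real system these would join and leave dynamically.
peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
ring = sorted(ident(p) for p in peers)
owner_of = {ident(p): p for p in peers}

def locate(filename: str) -> str:
    """Name of the peer responsible for storing filename."""
    return owner_of[successor(ring, ident(filename))]
```

Every peer that hashes the same file name reaches the same responsible peer, so no central index is needed to find where a package lives.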
- Persistent Storage Requirements
- We need a system that lets peers know which versions exist, which files each version contains, and how to get them.
- We want to allow individual packagers to post files on the network and make them available to other users.
- The system should support querying the contents of versions and comparing to local installation.
- The user should not be flooded with information but be able to look at the information through different views (the "channels" abstraction).
- Similar Model: Red Hat Network
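The requirement to compare a version's contents with the local installation could look roughly like the sketch below. It assumes, purely for illustration, that both a version and the local system can be summarised as a mapping from package name to version string; the `compare` function and the sample package data are hypothetical.

```python
def compare(version_manifest: dict, installed: dict) -> dict:
    """Classify packages by comparing a version's manifest with what is installed.

    Both arguments map package name -> version string.
    """
    in_version = set(version_manifest)
    in_local = set(installed)
    return {
        # Present in the version but not installed locally.
        "missing": sorted(in_version - in_local),
        # Installed locally but not part of the version.
        "extra": sorted(in_local - in_version),
        # Present on both sides with different version strings.
        "changed": sorted(
            name for name in in_version & in_local
            if version_manifest[name] != installed[name]
        ),
    }

manifest = {"bash": "3.0", "glibc": "2.3.5", "vim": "6.3"}
local = {"bash": "3.0", "glibc": "2.3.4", "emacs": "21.4"}
report = compare(manifest, local)
```

The sketch only reports that a version string differs. Deciding which side is newer depends on the distribution's version-ordering rules, which are left out here.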
- Channels are used to create staged environments.
- Available types of channels: base channel, development channel, testing & QA channel, production channel (beta versions).
- We need such a flexible channel model.
Suggested architecture
- Filesystem Hierarchy
- We will assume a filesystem which is available for all users (with proper permissions).
- Packages may be conveniently grouped into hierarchies, according to modules, architectures, versions, distributions, providers, etc.
- The system should support arbitrary hierarchies.
- A package will be stored (logically) only once, but can be pointed to an unlimited number of times.
- Through the filesystem, the user will be able to get any version required and compare between different versions.
- The top layer in the hierarchy will be the "channel".
- A channel will always point to the latest version.
- Each version will have a back pointer to the previous one.
- All channels will be accessible from a Master Channel.
- Properties
- Redirections are implemented as pointers in files. The files are kept in the persistent storage like any other package file.
- All files are read only. An update creates a new file and respective pointers.
- The channel is a pointer that always points to the last version.
- For convenience, it's possible to add to each channel a log file of the changes between different versions.
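The properties above (read-only files, pointers stored as ordinary files, a channel pointing to the latest version, back pointers to earlier versions) can be sketched with a toy in-memory store. Everything here is an assumption for illustration: the `Store` class, the use of SHA-256 content hashes as file keys, and the JSON layout of a version file. The Master Channel is not modelled, and the channel pointer is kept in a plain dictionary, which sidesteps the question of how a mutable pointer is maintained among unreliable peers.

```python
import hashlib
import json
from typing import Dict, Iterator

class Store:
    """Toy in-memory stand-in for the persistent storage layer."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}    # read-only files, keyed by content hash
        self._channels: Dict[str, str] = {}   # channel name -> key of latest version file

    def put(self, data: bytes) -> str:
        """Store data once; identical content always yields the same key."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def publish(self, channel: str, packages: Dict[str, bytes]) -> str:
        """Create a new version file and move the channel pointer to it."""
        version = {
            "packages": {name: self.put(data) for name, data in packages.items()},
            "previous": self._channels.get(channel),  # back pointer, None for the first version
        }
        key = self.put(json.dumps(version, sort_keys=True).encode("utf-8"))
        self._channels[channel] = key
        return key

    def history(self, channel: str) -> Iterator[dict]:
        """Yield versions from latest to oldest by following back pointers."""
        key = self._channels.get(channel)
        while key is not None:
            version = json.loads(self.get(key))
            yield version
            key = version["previous"]
```

Because files are keyed by their content, a package that appears unchanged in several versions or channels is stored once and merely pointed to again, and an update never overwrites an existing file.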
OceanStore as an example persistent storage system
- OceanStore properties
- By design, files stored in OceanStore are "read only" and changes result in new versions. New versions of packages will be available, but so will the old ones.
- The more a file is read in OceanStore, the more available it becomes (replicated), while unread files can lose replicas. This way, the more popular a file is, the easier it is to locate a copy.
- OceanStore is decentralized and uses Tapestry as the P2P overlay network. Other possibilities for the overlay network exist, such as Chord.
- OceanStore utilizes localization through Tapestry, bringing files closer to where many clients requested them.
- Byzantine agreement methods are employed and guarantee that all non-faulty replicas agree (as long as fewer than one third of the replicas are faulty).
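The one-third limit in the last item corresponds to the classic Byzantine agreement bound n >= 3f + 1, where n is the number of replicas and f the number of faulty ones. The small sketch below just works through that arithmetic; the function names are illustrative and nothing here reflects OceanStore's actual configuration.

```python
def max_faulty(replicas: int) -> int:
    """Largest number f of faulty replicas tolerated, from n >= 3f + 1."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return (replicas - 1) // 3

def min_replicas(faulty: int) -> int:
    """Smallest number n of replicas needed to tolerate `faulty` faulty ones."""
    if faulty < 0:
        raise ValueError("faulty count cannot be negative")
    return 3 * faulty + 1
```

For example, 4 replicas tolerate 1 faulty replica and 7 tolerate 2, while 3 replicas tolerate none.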
- "Downgrading" OceanStore
- OceanStore relies, for many operations on a file, on an "inner ring": a set of servers responsible for the specific file in question.
- It's not certain that we can use the same model when no "servers" exist, only unreliable peers.
- We need to achieve the same operations, perhaps with a different architecture:
- Getting a pointer to the latest version of a file.
- Persistently storing a file's "Primary replica" (the replica which is assured to be persistent).
Open issues in persistent storage
- Connecting the persistent storage system to the other sub-systems:
- The multicast phase sends a file to the peers but doesn't take into account issues of persistent storage.
- File sharing: the user should be able to get the "pointer" to a package through the persistent storage channel scheme and then download it from peers.
- Implementing permissions (and general security)
- For writing (and reading?) into channels
- For creating and deleting channels
- Downgrading OceanStore (or a similar system)
- Getting the latest version
- "Primary replica" (perhaps using today's mirrors?)
- Deleting obsolete versions
- Incentives for peers to share files
- Should we use coding (e.g. rateless codes) or simple replication?
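The coding-versus-replication question can be explored with a back-of-the-envelope availability calculation. The sketch assumes that each peer is online independently with probability `p`, and it uses a fixed-rate (n, k) erasure code, in which any k of n fragments rebuild the file, as a simplified stand-in for rateless codes (which can generate an unbounded number of fragments and typically need slightly more than k of them). The function names are illustrative.

```python
from math import comb

def replication_availability(p: float, copies: int) -> float:
    """Probability that at least one of `copies` full replicas is online."""
    return 1.0 - (1.0 - p) ** copies

def coded_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n fragments are online (any k suffice)."""
    return sum(
        comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )

# Same 4x storage overhead in both cases: 4 full copies versus 32 fragments with k = 8.
replicated = replication_availability(0.5, 4)
coded = coded_availability(0.5, 32, 8)
```

With peers online half of the time, four full copies give an availability of 0.9375, while the coded variant with the same storage overhead gives roughly 0.999. The price is that a reader must contact at least k peers and decode the fragments.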
State of the Art
- OceanStore over Tapestry
- PAST over Pastry
- CFS over Chord
Version 1.4 last modified by Yotam on 22/05/2005 at 20:03