Distribution API

This chapter presents the API proposed for the EDOS distribution system. The Distribution API focuses on the efficient distribution of content units over the network in a P2P system architecture. The API description uses an object-oriented model that differs slightly from the general Project Management Interface (PMI) model, which is based on content units and attributes, but is compatible with it (as shown in Section 4.1.3).

4.1 The Data Model

We considered the process of distributing source code and binaries for the Mandriva Cooker distribution. The data unit we refer to here is the Package, in the sense of an .RPM file.

The Cooker distribution consists of a set of files, namely the .RPM packages. Packages are organized in Collections. In our model, collections correspond to the directory structure of the code distribution: each collection corresponds to a directory. The Cooker example below shows such a directory structure. Collections may contain sub-collections (sub-directories) and/or packages (files).

However, some files that are part of the distribution are not .RPM packages. These are the files used in the installation process, and we call them Utility files. Unlike packages, Utility files do not have their own version number; they inherit the version number of the collection they belong to.

Note: There is an important difference between the versioning of packages, on the one hand, and of collections and utilities, on the other. Any change in a package's contents leads to a new package version, while changes to collections or utilities do not necessarily change the collection version (and hence the utility version). At some point in time, when the collection is considered "stable", it may be published as a new version.

In brief, we distinguish three distinct types of content units:

  • Packages, representing the RPM files
  • Utilities, representing files used in the installation process
  • Collections, representing grouping structures that gather several objects (content units): packages, utilities or collections. Collections allow content units to be organized in hierarchical (tree) structures.
The main difference between Packages and Utilities in distribution comes from the versioning policy. While any change in a Package's contents leads to a new package version, changes to Utilities (or Collections) do not necessarily change the version. A Collection, once it is considered "stable", may be published as a new version. Utilities do not have their own, independent version number; they inherit the version number of the collection they belong to.

4.1.1 Cooker example

The following example illustrates the data model elements in the context of the Mandriva Cooker distribution.

The figure represents the hierarchical organization of the Cooker distribution, where leaves represent packages or utilities and internal nodes represent collections. Square boxes are used for collection objects and round boxes represent packages or utilities.

As we can see, collections are nested up to four levels deep. At the top level, the "cooker" collection is a set of other collection objects, representing the Mandriva Linux distribution for different architectures: intel586, sparc, amd, etc. Each architecture has essentially the same content structure, detailed here for the "i586" architecture.

Besides the package collections, there are two collections of utility objects: "install" and "media_info". The package objects are grouped in three different collections: the base content of the distribution in "main", the complementary packages provided by the contributors in "contrib", and Java related packages in "jpackage".

4.1.2 Data model definition

We first define a general data model that is used afterwards to define the distribution API. We consider the data shared in the distribution network as Objects classified into three categories, Packages, Utilities and Collections, defined as follows:

  • Object : Package | Utility | Collection

  • ObjectID
    • Name
    • Version_Number

  • Package
    • ID: ObjectID
    • Metadata
    • Location
    • Value: stream

  • Utility
    • ID: ObjectID
    • Metadata
    • Location
    • Value: stream

  • Collection
    • ID: ObjectID
    • Metadata
    • Value: ObjectID[]
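The definitions above can be written down as plain data types. The following is a minimal sketch that assumes Python dataclasses. The class and field names mirror the model, but they are illustrative and not part of the specified API. Only the minimal Metadata properties are modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class ObjectID:
    name: str            # path in the distribution tree, e.g. "/cooker/i586/media/main/perl"
    version_number: str  # e.g. "5.8.7-1mdk"


@dataclass
class Metadata:
    # Minimal set only; Dependencies, License, ... are omitted in this sketch.
    type: str
    date: str
    signature: str
    size: int


@dataclass
class Package:
    id: ObjectID
    metadata: Metadata
    location: str  # e.g. a URL: peer address plus local path
    value: bytes   # content of the .RPM file


@dataclass
class Utility:
    id: ObjectID
    metadata: Metadata
    location: str
    value: bytes


@dataclass
class Collection:
    id: ObjectID
    metadata: Metadata
    # Composition list of member IDs; a collection has no Location.
    value: List[ObjectID] = field(default_factory=list)


Object = Union[Package, Utility, Collection]
```

ObjectID is frozen, which makes it hashable. It can therefore serve directly as a key in an index mapping IDs to locations or metadata.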
Objects represent any content unit distributed in the system (Package, Utility or Collection). An object is identified by an ObjectID, consisting of a name and a version number. The Name must include the path of the object in the distribution tree, because the same package name and version may occur in different collections (e.g. package perl version 5.8.7 may exist for different system architectures: i586, sparc, ...).

The ObjectID is a logical identifier for the Object; several replicas of the same object may exist in the P2P system. The Location attribute allows the object instance (replica) to be physically located in the network (it may be seen as a URL).

Value refers to the content of the Object: the file for Packages and Utilities, and the composition list (a set of ObjectIDs) for Collections. In fact, this definition of Collection composition allows any graph structure. We restrict Collections to hierarchical structures (trees or DAGs), and for the moment we focus on tree structures, in which an Object is not shared between different collections.

Collections group together several objects, that may be Packages, Utilities or other Collections. As mentioned above, collections correspond to the directory structure of the code distribution. The example of Cooker distribution above shows such a directory structure. A collection corresponds to a directory: the files (packages, utilities) or sub-directories contained in the directory represent the collection's elements (Packages, Utilities, sub-Collections).

In practice, Collection version numbers come from the Cooker distribution version number and are inherited from collections by their sub-collections. Also, Mandriva collections are homogeneous, i.e. they are either collections of packages, collections of utilities, or collections of collections. The data model presented here is somewhat more general.

To illustrate these properties, we take some objects from the Mandriva Cooker distribution:

  • File "perl-5.8.7-1mdk.rpm" in directory /cooker/i586/media/main corresponds to a Package object with Name="/cooker/i586/media/main/perl" and Version_Number="5.8.7-1mdk" (where "5.8.7" represents the version kept by the authors of the software and "1mdk" represents the versioning system added by Mandriva)
  • "/cooker/i586/media/main" corresponds to a Collection object representing the set of "main" packages for the i586 architecture in the Cooker distribution. If the Cooker distribution version is beta1, then Version_Number="beta1".
  • "/cooker/i586/install/README.txt" corresponds to a Utility object identified by its filename and by the version number of the collection it belongs to (i.e. "/cooker/i586/install").
In fact, an object name (e.g. "/cooker/i586/media/main/perl") could be split into a collection name (the path "/cooker/i586/media/main") and a local name ("perl"). For the moment, the model description uses a single Name attribute.
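The mapping from an RPM file to an ObjectID in the first example, and the optional split of a Name, can be sketched as follows. This sketch assumes that file names follow the name-version-release.rpm form of the perl example. Architecture suffixes such as ".i586.rpm" are not handled. object_id_from_rpm and split_name are hypothetical helper names.

```python
import posixpath
from typing import Tuple


def object_id_from_rpm(directory: str, filename: str) -> Tuple[str, str]:
    """Map an RPM file to (Name, Version_Number).

    Assumes <name>-<version>-<release>.rpm, as in the Cooker example;
    architecture suffixes are not handled.
    """
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file: {filename}")
    base = filename[: -len(".rpm")]
    # Split from the right so that hyphenated names (e.g. perl-DBI) survive.
    name, version, release = base.rsplit("-", 2)
    return posixpath.join(directory, name), f"{version}-{release}"


def split_name(name: str) -> Tuple[str, str]:
    """Split an object Name into (collection path, local name)."""
    return posixpath.split(name)
```

For example, object_id_from_rpm("/cooker/i586/media/main", "perl-5.8.7-1mdk.rpm") gives ("/cooker/i586/media/main/perl", "5.8.7-1mdk"), which matches the first example above.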

Furthermore, we use a Metadata structure to keep additional object information needed in the API definition. The list of Metadata content is not exhaustive for the moment and will be completed based on particular functionality needs. However, a minimal set will specify the object's Type (Package, Utility or Collection), the Date when the object was created, its Signature and its Size, the latter being especially useful for large files (see the Physical level in Section 4.2.2).

Metadata represents all the Object properties necessary to characterize it in the process of distribution and retrieval. Metadata properties correspond to content unit attributes in the Project Management Interface (PMI), but contain only those attributes that do not change for a given ObjectID. For instance, Location or Collection composition are not included in Metadata.

The list of Metadata properties is:

  • Metadata
    • Type: application, documentation, source, binary, utility
    • Date: object's creation time
    • Signature: SHA-1 checksum
    • Size: Number_of_bytes
    • Dependencies
    • License
    • ...
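The Signature and Size properties of a Package or Utility can be derived from its Value alone. The following is a minimal sketch that assumes the Value is available as a binary stream. The caller supplies Type and Date. compute_metadata is a hypothetical helper name, and the dictionary keys simply reuse the property names above.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def compute_metadata(stream: BinaryIO, obj_type: str, date: str,
                     chunk_size: int = 1 << 20) -> Dict[str, object]:
    """Compute the minimal Metadata set for a Package or Utility Value."""
    sha1 = hashlib.sha1()
    size = 0
    # Read in chunks so that large packages (>100 MB) are never fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        size += len(chunk)
    return {"Type": obj_type, "Date": date,
            "Signature": sha1.hexdigest(), "Size": size}
```

A call such as compute_metadata(open(path, "rb"), "binary", date) returns the four properties. Because these do not change for a given ObjectID, they can be computed once at publication time.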
Also, each instance of a package or utility has a Location, which is important for indexing and retrieving objects in the distribution system. The location contains the peer address and the local address on the peer. Unlike packages and utilities, collections have no location. When downloading a collection, we avoid getting all its components (packages and utilities) from the same location (peer); the goal is to optimize the use of resources by downloading components in parallel from different peers. Locating collections is therefore unnecessary: we only need to locate packages and utilities.

We consider the Value of a Package or a Utility file to be the content of the file: a stream of bytes. In contrast, a Collection is an abstract object whose Value is the list of its objects' IDs.

4.1.3 Correspondence with the Project Management Interface (PMI)

In this document we focus on the Code Distribution process for an F/OSS project, and we consider the Mandriva Cooker distribution as an instance of such a project.

Using Cooker distribution as a practical base model, we employed a "bottom-up" approach to abstract a general data model to be used in any other Code Distribution process.

To be more general, we must plug the distribution process into the overall F/OSS project development framework. We have therefore considered Geneva's proposal "Towards an EDOS API: Modeling the F/OSS Process" and defined the following mappings between the elements of the two data models.

General API  | Distribution API   | Description
ContentUnit  | Package or Utility | The Data Unit
Bundle       | Collection         | A group of Data Units
Project      | Distribution       | The whole set of elements used in the API

Here follows a quick example of how the distribution API can be expressed using the Project Management Interface (PMI).

Consider the Mandriva Cooker distribution example above. In the context of the PMI, the above illustration becomes the following one.

In this figure, square boxes represent units as defined by the PMI. Dotted arrows list members of a collection unit. Each unit defines a set of types of units it contains. These types usually have to be defined in the context of the project.

In the case of a Linux distribution, we define (for instance) the following available types: source, binary (as opposed to source), documentation, application (as opposed to documentation) and utility. Each unit may be of any of these types. The only restriction is that a collection can only contain members whose type is one of the types declared for that collection. The type attribute can then be used to define how the unit has to be handled.
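The membership restriction can be enforced when members are added to a collection unit. The following minimal sketch assumes that each collection unit carries the set of types it accepts. Unit, member_types and add_member are hypothetical names, not PMI identifiers.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Unit:
    name: str
    type: str  # e.g. "source", "binary", "documentation", "application", "utility"
    member_types: FrozenSet[str] = frozenset()  # types a collection unit accepts
    members: List["Unit"] = field(default_factory=list)

    def add_member(self, unit: "Unit") -> None:
        # A collection may only contain members of one of its declared types.
        if unit.type not in self.member_types:
            raise TypeError(f"{self.name} does not accept units of type {unit.type!r}")
        self.members.append(unit)
```

Rejecting a member at insertion time ensures that, once a unit is found in a collection, its type attribute can safely select how it is handled.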

All the units possess all the other attributes defined by the PMI: dependencies, localization, functionalities, license, etc. However, these are not shown in the figure.

The localization attribute acts as a UID for the unit itself as well as a link to a set of physical locations where the content can be found. This link can be updated over time in order to provide up-to-date locations, and it can be of any type (such as a URL to a torrent, an FTP site, ...).

In the concrete case of a Linux distribution, unlike utilities and packages that have content, a collection is a set of meta-data with no content. This meta-data is represented by the set of attributes associated to the unit.

In the Distribution API model presented above, attributes are separated in several categories:

  • Identifiers, corresponding to ObjectID, include logical localization (name) and versioning attributes
  • Constant properties, corresponding to Metadata, include all the attributes that do not change for a given ObjectID
  • Variable properties, corresponding to Location and Value, that may be different in time or on different peers for the same ObjectID

4.2 Architectural components, roles and API levels

4.2.1 Actors and roles 

The P2P distribution system is composed of 3 types of actors:

  • the Publisher represents the reference server of the distribution editor. It is the seeder of the data in the system. The insertion of data in the system is done either in Push or in Pull mode.
  • the Replicators (Mirrors), used as replication peers, get data from the Publisher or from other Replicators (Mirrors) and offer this data for distribution.
  • the Simple Peers, seen as the users' machines participating in data sharing.
[Figure: Distribution API architecture]

For each of these three categories of components, we associate specific roles in the distribution:

  • Publication, corresponding to the Publisher functions.
  • Replication, for providing replicas of published objects to distribution. Replicators, but also Simple Peers (depending on the implementation) may play this role.
  • Client, corresponding to functions such as searching, subscribing, downloading objects. Simple Peers, Replicators, but also the Publisher may play this role.

4.2.2 Logical and Physical Levels

The API makes the distinction between the Logical level and the Physical level. At the Logical level, we describe the main API methods, corresponding to the software distribution functionalities: publishing, replicating, downloading, subscribing, etc. Normally, a distribution application only needs this API level to use the system.

At the Physical level, we consider the lower-level methods necessary to implement the Logical level functions. Physical level methods provide finer-grained access to EDOS distribution functionalities, so that strategies different from those provided by the Logical level can be implemented.

The set of methods provided by the Physical level is much more implementation-dependent. These methods include feeding/querying the distributed index with various data elements, communicating between peers, downloading data, etc. They are also concerned with optimization issues, such as splitting large packages into sets of pieces. In practice, the Cooker distribution contains some large packages, e.g. of more than 100 MB. The Size property is therefore used to determine the number of pieces resulting from splitting a Package.
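Deriving the pieces from the Size property is a ceiling division. The following minimal sketch assumes a fixed piece size. The API does not fix one, so the 4 MiB value and the helper names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical piece size; the API does not prescribe one.
PIECE_SIZE = 4 * 1024 * 1024  # 4 MiB


def piece_count(size: int, piece_size: int = PIECE_SIZE) -> int:
    """Number of pieces for an object of `size` bytes (at least one)."""
    if size < 0 or piece_size <= 0:
        raise ValueError("size must be >= 0 and piece_size > 0")
    return max(1, -(-size // piece_size))  # ceiling division


def piece_ranges(size: int, piece_size: int = PIECE_SIZE) -> List[Tuple[int, int]]:
    """Byte ranges [start, end) of each piece; the last piece may be shorter."""
    return [(i * piece_size, min((i + 1) * piece_size, size))
            for i in range(piece_count(size, piece_size))]
```

With these assumptions, a 100 MiB package yields piece_count(100 * 1024 * 1024) == 25 pieces, each of which can be fetched from a different peer.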

The distribution API below is defined at the Logical level, organized by roles and is composed of the basic methods necessary to realize the system's functionalities. Examples of some Physical level sub-functions are also provided.

4.3 Basic functionalities for each role

Note: The API methods are described using an explicit list of necessary parameters, e.g. void publishPackage(Name, Version_Number, Metadata, Location). The alternative would be to use the whole distribution object as a parameter, i.e. void publishPackage(Package) in the previous example. Both are possible, because the package in the second case contains all the elements in the list of the first case.

4.3.1 Publisher

The role of the Publisher in this architecture is the Publication of the data in the distribution system. Publication means that the identity and location of objects (packages, utilities, collections) are made public in the distribution system and indexed using an indexing technique.

The methods associated with this role are the following:

  • void publishPackage(PackageID, Metadata, Location)
  • void publishUtility(UtilityID, Metadata, Location)
  • void publishCollection(CollectionID, Metadata, Value)
  • void unpublishPackage(PackageID)
  • void unpublishUtility(UtilityID)
  • void unpublishCollection(CollectionID)
Let us now take a closer look at each of these functions:

a) void publishPackage(PackageID, Metadata, Location)

Parameters:

  • PackageID = Name + Version_Number
  • Metadata = the set of the package's properties (presented in Section 4.1.2)
  • Location = complete URI of the published package on the Publisher's machine (IP address, path in the distribution hierarchy).
Action: Publishes a package by storing information about the package in the index. The system stores in the index, for the given package ID, information about the package location on the Publisher's machine and metadata properties. The package is published as an element of the implicit collection (whose name is extracted from the package name and whose version is the last published version). If a package with the same name, but an older version already exists in that collection, it is replaced by the new package.
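The behaviour of publishPackage with respect to the implicit collection can be sketched against an in-memory dictionary. The sketch assumes that this dictionary stands in for the distributed index and that the collection's last published version has already been recorded. Index and publish_package are hypothetical names.

```python
import posixpath
from typing import Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


class Index:
    """In-memory stand-in for the distributed index (hypothetical)."""

    def __init__(self) -> None:
        self.locations: Dict[ObjectID, List[str]] = {}
        self.metadata: Dict[ObjectID, dict] = {}
        self.last_collection_version: Dict[str, str] = {}  # collection name -> version
        self.composition: Dict[ObjectID, List[ObjectID]] = {}

    def publish_package(self, pid: ObjectID, metadata: dict, location: str) -> None:
        name, _ = pid
        # Implicit collection: name taken from the package path, version =
        # last published version of that collection.
        coll_name = posixpath.dirname(name)
        if coll_name not in self.last_collection_version:
            raise LookupError(f"no published collection {coll_name}")
        coll_id = (coll_name, self.last_collection_version[coll_name])
        self.locations.setdefault(pid, []).append(location)
        self.metadata[pid] = metadata
        members = self.composition.setdefault(coll_id, [])
        # An older version of the same package is replaced by the new one.
        members[:] = [m for m in members if m[0] != name]
        members.append(pid)
```

The sketch assumes that published versions only increase. It therefore replaces any member with the same Name rather than comparing version strings.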

b) void publishUtility(UtilityID, Metadata, Location)

This method is similar to publishPackage, but here the implicit collection has the same version number as the published utility. Also, if the utility already exists in that collection, the published utility comes with new content that replaces the existing one. The existing replicas of the utility must then be removed from the index, because they no longer correspond to the right content.
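The difference from publishPackage, namely invalidating replicas when a utility is republished, can be sketched the same way. The sketch assumes an in-memory dictionary in place of the distributed index. UtilityIndex and publish_utility are hypothetical names.

```python
import posixpath
from typing import Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


class UtilityIndex:
    """In-memory stand-in for the distributed index (hypothetical)."""

    def __init__(self) -> None:
        self.locations: Dict[ObjectID, List[str]] = {}
        self.metadata: Dict[ObjectID, dict] = {}
        self.composition: Dict[ObjectID, List[ObjectID]] = {}

    def publish_utility(self, uid: ObjectID, metadata: dict, location: str) -> None:
        name, version = uid
        # Implicit collection: same version number as the utility itself.
        members = self.composition.setdefault((posixpath.dirname(name), version), [])
        if uid in members:
            # New content under the same ID: every known replica is now stale.
            self.locations[uid] = []
        else:
            members.append(uid)
        self.locations.setdefault(uid, []).append(location)
        self.metadata[uid] = metadata
```

After a republication, the Publisher's new location is the only location left in the index. Mirrors then have to replicate the new content again.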

c) void publishCollection(CollectionID, Metadata, Value)

Parameters:

  • CollectionID = Name + Version_Number
  • Metadata = the set of the collection's properties (presented in Section 4.1.2)
  • Value = collection composition (list of object IDs)
Action: This method publishes a collection and recursively all the collection's elements. Publishing a collection means publishing its value (the composition list), its metadata and then recursively each element of the collection. Similarly to packages and utilities, the published collection is added to the implicit collection (to be defined precisely).

For collections, the version number does not change as often as for packages. Unlike a package, the same collection may have different compositions over time. Only the Publisher is guaranteed to have the right (latest) composition.

A collection is published when a new version is available. Between two versions, the collection's content may be modified through the publishing of packages, utilities or sub-collections.

d) void unpublishPackage(PackageID)

Parameters:

  • PackageID = PackageName + PackageVersion_Number
Action: This method un-publishes the given package, by removing all the information about it from the index. It removes the package from its implicit collection parent.

e) void unpublishUtility(UtilityID)

Parameters:

  • UtilityID = UtilityName + UtilityVersion_Number
Action: This method un-publishes the given utility, by removing all the information about it from the index. It removes the utility from its implicit collection parent.

f) void unpublishCollection(CollectionID)

Parameters:

  • CollectionID = CollectionName + CollectionVersion_Number
Action: This method un-publishes the given collection and recursively all its elements, by removing all the information about them from the index. It removes the collection from its implicit collection parent.
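The recursion in publishCollection (c) and unpublishCollection (f) can be sketched over a tree of objects. The sketch assumes the tree restriction of Section 4.1.2, so the recursion terminates. It also assumes a plain dictionary in place of the index and a hypothetical store describing the Publisher's local tree. Removal from the implicit parent collection is left out.

```python
from typing import Dict, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


def publish_collection(cid: ObjectID, store: Dict[ObjectID, dict],
                       index: Dict[ObjectID, dict]) -> None:
    """Index the collection's metadata and composition, then each element.

    `store` is a hypothetical view of the Publisher's tree: each entry has a
    "kind", "metadata", and either "value" (collections) or "location".
    """
    obj = store[cid]
    index[cid] = {"metadata": obj["metadata"], "value": list(obj["value"])}
    for member in obj["value"]:
        child = store[member]
        if child["kind"] == "collection":
            publish_collection(member, store, index)
        else:
            index[member] = {"metadata": child["metadata"],
                             "locations": [child["location"]]}


def unpublish_collection(cid: ObjectID, index: Dict[ObjectID, dict]) -> None:
    """Remove a collection and, recursively, all its elements from the index."""
    entry = index.pop(cid)
    for member in entry["value"]:
        if "value" in index.get(member, {}):  # member is a sub-collection
            unpublish_collection(member, index)
        else:
            index.pop(member, None)
```

Under the tree restriction, each element is reached exactly once. With DAGs, a shared element would need reference counting before it could be unpublished.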

Physical level functions for Publisher

a) Methods for packages

  • void publishPackageLocation(PackageID, Location)
  • void publishPackageMetadata(PackageID, Metadata)
These functions index the package's location and metadata, respectively.

  • void publishPackagePush(PackageID, Peer[])
If push is allowed, this function allows pushing Values (files) to other peers (given in the set of Peers). We consider that the Publisher's package location and metadata are published by publishPackage before calling publishPackagePush (i.e. they are already indexed when publishPackagePush is called).

b) Methods for utilities

  • void publishUtilityLocation(UtilityID, Location)
  • void publishUtilityMetadata(UtilityID, Metadata)
  • void publishUtilityPush(UtilityID, Peer[])
These functions are similar to the corresponding package functions above.

c) Methods for collections

  • void publishCollectionInfo(CollectionID, Metadata, Value)
This function publishes in the index the collection's metadata and composition list.

  • void publishCollectionPush(CollectionID, Peer[])
This function pushes the collection to the given set of peers.

  • void insertIntoCollection(CollectionID, ObjectID[])
This function changes the composition of a collection by adding new objects to it. It is used when new packages, utilities or collections are published, in order to update the collection to which these objects are added.

  • void deleteInCollection(CollectionID, ObjectID[])
This function changes the composition of a collection by removing elements. It is used when objects are unpublished, in order to update the collection from which these objects are removed.

  • void replacePackage(PackageID, NewPackageVersion_Number)
This function replaces the given package with a new version of the same package in the collection. Replacement is done in the implicit collection of the package. This only changes the collection composition.

  • void replaceUtility(UtilityID)
This function replaces the content of the given utility with a new content. The implicit collection composition does not change, but the replicas of the utility are invalidated (removed from the index).
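The four composition-editing functions above can be sketched as operations on an in-memory map from collection IDs to member lists. The sketch assumes that the Publisher's own copy is tracked separately from the replica list. CollectionIndex and its method names are hypothetical.

```python
from typing import Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


class CollectionIndex:
    """Hypothetical in-memory index of compositions and replica locations."""

    def __init__(self) -> None:
        self.composition: Dict[ObjectID, List[ObjectID]] = {}
        self.replicas: Dict[ObjectID, List[str]] = {}  # excludes the Publisher's copy

    def insert_into_collection(self, cid: ObjectID, oids: List[ObjectID]) -> None:
        members = self.composition.setdefault(cid, [])
        for oid in oids:
            if oid not in members:  # keep the composition free of duplicates
                members.append(oid)

    def delete_in_collection(self, cid: ObjectID, oids: List[ObjectID]) -> None:
        removed = set(oids)
        self.composition[cid] = [m for m in self.composition.get(cid, [])
                                 if m not in removed]

    def replace_package(self, pid: ObjectID, new_version: str) -> None:
        # Under the tree restriction the package has exactly one (implicit)
        # collection; only that composition changes.
        for members in self.composition.values():
            if pid in members:
                members[members.index(pid)] = (pid[0], new_version)
                return
        raise KeyError(pid)

    def replace_utility(self, uid: ObjectID) -> None:
        # Composition is unchanged, but existing replicas are invalidated.
        self.replicas[uid] = []
```

replace_package swaps the ID in place, so the package keeps its position in the composition list. replace_utility touches only the replica list.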

4.3.2 Replicator

The Replication role is played by Replicators (Mirrors), which replicate data published by the Publisher. The methods for this role announce to the distribution system that objects are replicated at a new location. Each publishing method has a corresponding un-publishing one.

The methods associated to this role are:

  • void (un)publishReplicatedPackage(PackageID, Metadata, Location)
  • void (un)publishReplicatedUtility(UtilityID, Metadata, Location)
  • void (un)publishReplicatedCollection(CollectionID, Location)
a) void (un)publishReplicatedPackage(PackageID, Metadata, Location)

Parameters:

  • PackageID = Name + Version_Number
  • Metadata = metadata of the package
  • Location = location on the mirror server
Action: The publishReplicatedPackage method registers the fact that the package is replicated in the network at the given location. It also creates and registers in the index a new replica of the package's metadata (to avoid a bottleneck when querying metadata). For unpublishReplicatedPackage, the reverse actions are performed (the package and metadata replicas are un-registered).

b) void (un)publishReplicatedUtility(UtilityID, Metadata, Location)

Parameters:

  • UtilityID = Name + Version_Number
  • Metadata = metadata of the utility
  • Location = location on the mirror server
Action: These methods are similar to (un)publishReplicatedPackage, but work for utilities.

c) void (un)publishReplicatedCollection(CollectionID, Location)

Parameters:

  • CollectionID = Name + Version_Number
  • Location = location on the mirror server
Action: These methods simply apply the (un)publishReplicatedPackage and (un)publishReplicatedUtility methods to all the packages and utilities in the given collection. Information about the collection itself is not registered.
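The replicator methods can be sketched over an in-memory index. Utilities would be handled exactly like packages. The sketch assumes that an element's location on the mirror is the mirror base location followed by the element's Name. That naming rule, ReplicaIndex and the method names are all hypothetical. Nothing is registered for the collection itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


class ReplicaIndex:
    """Hypothetical in-memory index of replica locations and metadata replicas."""

    def __init__(self) -> None:
        self.locations: Dict[ObjectID, List[str]] = defaultdict(list)
        self.metadata_replicas: Dict[ObjectID, List[Tuple[str, dict]]] = defaultdict(list)

    def publish_replicated(self, oid: ObjectID, metadata: dict, location: str) -> None:
        if location not in self.locations[oid]:
            self.locations[oid].append(location)
            # An extra metadata copy spreads the metadata query load.
            self.metadata_replicas[oid].append((location, metadata))

    def unpublish_replicated(self, oid: ObjectID, location: str) -> None:
        self.locations[oid] = [loc for loc in self.locations[oid] if loc != location]
        self.metadata_replicas[oid] = [(loc, md) for loc, md in self.metadata_replicas[oid]
                                       if loc != location]

    def publish_replicated_collection(self, cid: ObjectID,
                                      composition: Dict[ObjectID, List[ObjectID]],
                                      metadata: Dict[ObjectID, dict], base: str) -> None:
        # Apply publish_replicated to every package/utility at any depth.
        for member in composition[cid]:
            if member in composition:  # sub-collection
                self.publish_replicated_collection(member, composition, metadata, base)
            else:
                self.publish_replicated(member, metadata[member], base + member[0])
```

Un-publishing a replicated collection would walk the same tree and call unpublish_replicated for each element.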

Physical level functions for Replicator

  • void publishReplicatedPackagePush(PackageID, Peer[])
  • void publishReplicatedUtilityPush(UtilityID, Peer[])
  • void publishReplicatedCollectionPush(CollectionID, Peer[])
If push is allowed, these functions will push Values (files) to other peers, similarly to publishPackagePush. For collections, all the packages and utilities in the local collection are pushed.

  • void (un)publishReplicatedPackageLocation(PackageID, Location)
  • void (un)publishReplicatedUtilityLocation(UtilityID, Location)
  • void (un)publishReplicatedCollectionLocation(CollectionID, Location)
(Un)publishes the replica location for the package/utility. For collections, it (un)publishes the replica location for all the packages and utilities in the collection.

4.3.3 Client

The Client role is played by Mirrors and Peers, when getting data from the distribution system.

The methods associated to this role are:

  • Package getPackage(PackageID)
  • Utility getUtility(UtilityID)
  • Collection getCollection(CollectionID)
The Client methods ask for an object (given its ID) and get a copy of that object.

a) Package getPackage(PackageID)

Parameters:

  • PackageID = PackageName + PackageVersion_Number
Action: This method gets a copy of the requested package (its value). It chooses the best location of the package on the network for downloading. Large packages may be cut in several slices, downloaded in parallel from different locations in a BitTorrent-like style.
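The sliced, parallel download described for getPackage can be sketched with a thread pool. The following is a minimal sketch under several assumptions: fetch_piece is a hypothetical transport call, pieces are spread round-robin over the known locations instead of being ranked, and the result is checked against the SHA-1 Signature from the Metadata. get_package is a hypothetical name.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# fetch_piece(location, start, end) -> bytes is a hypothetical transport call.
FetchPiece = Callable[[str, int, int], bytes]


def get_package(size: int, signature: str, locations: List[str],
                fetch_piece: FetchPiece, piece_size: int) -> bytes:
    """Download a package in pieces, in parallel, from several locations."""
    if not locations:
        raise LookupError("no location holds this package")
    ranges = [(start, min(start + piece_size, size))
              for start in range(0, size, piece_size)]
    # Round-robin spreading of pieces over replicas; a real client would
    # rank locations first (cf. getBestPackageLocations).
    jobs = [(locations[i % len(locations)], start, end)
            for i, (start, end) in enumerate(ranges)]
    with ThreadPoolExecutor(max_workers=len(locations)) as pool:
        pieces = list(pool.map(lambda job: fetch_piece(*job), jobs))
    data = b"".join(pieces)  # map() returns results in piece order
    # Check the result against the SHA-1 Signature from the Metadata.
    if hashlib.sha1(data).hexdigest() != signature:
        raise ValueError("downloaded data does not match the Signature")
    return data
```

Because the Signature covers the whole Value, a corrupt piece from any peer is detected once the package is reassembled.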

b) Utility getUtility(UtilityID)

Parameters:

  • UtilityID = UtilityName + UtilityVersion_Number
Action: Similar to getPackage, but for utilities.

c) Collection getCollection(CollectionID)

Parameters:

  • CollectionID = CollectionName + CollectionVersion_Number
Action: Retrieves the collection's composition list from the index and then recursively downloads its components. It identifies the missing packages (and utilities), downloads them in parallel from several sources and builds the requested collection locally.
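The steps of getCollection can be sketched by combining the physical functions listed below: collect the elements at any depth, compute what is missing locally, and fetch the rest in parallel. The sketch assumes that the composition lists have already been read from the index. It also assumes a hypothetical download callable that wraps getPackage or getUtility. The helper names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


def collection_files(cid: ObjectID,
                     composition: Dict[ObjectID, List[ObjectID]]) -> List[ObjectID]:
    """Package/utility IDs contained at any depth (cf. getCollectionPackages)."""
    files: List[ObjectID] = []
    for member in composition[cid]:
        if member in composition:  # the member is itself a collection
            files.extend(collection_files(member, composition))
        else:
            files.append(member)
    return files


def get_collection(cid: ObjectID, composition: Dict[ObjectID, List[ObjectID]],
                   local: Set[ObjectID],
                   download: Callable[[ObjectID], bytes]) -> Dict[ObjectID, bytes]:
    """Fetch, in parallel, the elements of `cid` not yet present locally."""
    # cf. computeMissingPackages / computeMissingUtilities
    missing = [oid for oid in collection_files(cid, composition) if oid not in local]
    with ThreadPoolExecutor(max_workers=4) as pool:
        fetched = dict(zip(missing, pool.map(download, missing)))
    local.update(fetched)  # the collection is now complete locally
    return fetched
```

Only missing elements are transferred. A peer that already holds most of a collection therefore fetches just the difference.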

Physical level functions for Client

  • Location[] locatePackage(PackageID)
  • Location[] locateUtility(UtilityID)
These functions return the set of locations in the P2P distribution system where the package/utility is available.

  • Location[] getBestPackageLocations(PackageID)
This function decides whether the given package has to be split and chooses the best downloading location for each piece.

  • Location getBestUtilityLocation(UtilityID)
This function chooses the best downloading location for the utility.

  • PackagePiece getPackagePiece(PackageID, PieceNo, Location)
This function downloads the piece from the Location.

  • ObjectID[] getCollectionValue(CollectionID)
This function returns the composition list for that collection.

  • PackageID[] getCollectionPackages(CollectionID)
  • UtilityID[] getCollectionUtilities(CollectionID)
These functions return the list of package/utility IDs contained (at any depth) in the collection.

  • PackageID[] computeMissingPackages(PackageID[])
  • UtilityID[] computeMissingUtilities(UtilityID[])
These functions return the list of packages/utilities missing on the Client peer, with respect to the list of needed packages/utilities given as parameter.

  • LocationMap getBestLocations(PackageID[], UtilityID[])
This function decides for each package/utility in the parameter lists what is the best downloading location. If a package has to be cut in pieces, a best location is decided for each piece.
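One possible policy for getBestLocations is a greedy one. Each piece goes to the candidate location with the fewest assignments so far, so the load spreads over peers. This is an assumed heuristic, not the method defined by the API. The sketch also assumes that utilities are never split. get_best_locations and the map layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)


def get_best_locations(package_sizes: Dict[ObjectID, int], utilities: List[ObjectID],
                       candidates: Dict[ObjectID, List[str]],
                       piece_size: int) -> Dict[Tuple[ObjectID, int], str]:
    """Map (object, piece number) to a download location.

    Assumed heuristic: least-loaded candidate first, ties broken by name.
    Packages get ceil(size / piece_size) pieces; utilities are not split.
    """
    load: Counter = Counter()
    plan: Dict[Tuple[ObjectID, int], str] = {}
    jobs = [(pid, n) for pid, size in sorted(package_sizes.items())
            for n in range(max(1, -(-size // piece_size)))]
    jobs += [(uid, 0) for uid in utilities]
    for oid, piece in jobs:
        loc = min(candidates[oid], key=lambda c: (load[c], c))
        plan[(oid, piece)] = loc
        load[loc] += 1
    return plan
```

A real implementation would probably weigh bandwidth or latency as well as assignment counts. The returned map has the same shape either way.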

4.4 Advanced functionalities

4.4.1 Subscription

Subscription can be used in software distribution to provide event notification and possible automatic download of objects. In dealing with subscription, we will consider the concept of so-called channels, used in Red Hat Network (RHN).

Each channel is an abstraction that corresponds to a set of packages. For each channel, permissions can be assigned to distinct users. In RHN, for example, the base channel corresponds to the core system, and the developer channel receives selected packages from the base channel, according to developers' interests. There are also testing & QA and production channels.

Similarly, we could have different developer channels according to developers' preferences. Some developers may want to get information about the complete set of changed packages for a certain environment, while others may want only a specific subset. According to the Tel Aviv proposal, after subscribing to one or more channels users would get notification of updates according to their interests. The users could then get the desired packages in two ways: either by manually selecting them or automatically.

Subscription methods associated to the Publication role

a) Channel createChannel(ChannelName, ChannelDescription, ObjectID[], AccessRights)

Parameters:

  • ChannelName = name of the channel to be created
  • ChannelDescription = description of the channel
  • ObjectID[] = set of object IDs assigned to the channel at creation time
  • AccessRights = access rights description
Action: Creates a new channel and initializes its content with the set of objects. Publishes to the index the channel name and description.

b) void (un)publishPackageToChannel(PackageID, ChannelName, Date)

c) void (un)publishUtilityToChannel(UtilityID, ChannelName, Date)

d) void (un)publishCollectionToChannel(CollectionID, ChannelName, Date)

Parameters:

  • PackageID / UtilityID / CollectionID
  • ChannelName = name of the channel where the object is published
  • Date = date of publication
Action: Publishes the distribution object to the given channel at the given date. Notifications are sent to the channel's subscribers, and multicast push may be activated. Publishing to the channel and publishing the object in the system are two separate actions.

Physical level functions

  • SubscriptionID addSubscription(ChannelName, SubscriberInfo, Subscription)
This function is called by the publisher when it receives a subscription request from some client. It registers the subscription at the publisher level and returns the subscription ID.

  • void removeSubscription(ChannelName, SubscriberInfo, SubscriptionID)
This function is called by the publisher when it receives a subscription cancellation request from some client. It removes the given subscription to the channel at the publisher level.

  • void multicast(ObjectID[], Client[])
This function realizes a multicast distribution of objects (packages, utilities, collections) to the subscribers.

Subscription methods associated to the Client role

a) ChannelName[] getChannelList()

Action: returns the list of channel names available in the system

b) ChannelDescription getChannelDescription(ChannelName)

Parameters:

  • ChannelName = name of the channel
Action: Returns the description of the given channel

c) SubscriptionID subscribeToChannel(ChannelName, SubscriberInfo, Subscription)

Parameters:

  • ChannelName = name of the channel
  • SubscriberInfo = information about the subscriber
  • Subscription = subscription information, including a query filter, notification type, when to be notified, etc.
Action: Subscribe to a channel

d) void unsubscribeToChannel(ChannelName, SubscriberInfo, SubscriptionID)

Parameters:

  • ChannelName = name of the channel
  • SubscriberInfo = information about the subscriber
  • SubscriptionID = ID of subscription to be removed
Action: Cancel some given subscription to a channel
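The channel methods of both roles can be sketched as a small in-memory service. In this sketch a subscription's query filter is a plain predicate, notifications are appended to a list, and access rights are not modelled. ChannelRegistry and its method names are hypothetical. As the text requires, publishing to a channel does not touch the object index.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)
Filter = Callable[[ObjectID], bool]  # hypothetical stand-in for the query filter


@dataclass
class Channel:
    name: str
    description: str
    objects: List[ObjectID] = field(default_factory=list)
    subscriptions: Dict[int, Tuple[str, Filter]] = field(default_factory=dict)


class ChannelRegistry:
    """Hypothetical in-memory channel service; access rights are not modelled."""

    def __init__(self) -> None:
        self.channels: Dict[str, Channel] = {}
        self.notifications: List[Tuple[str, ObjectID, str]] = []  # (subscriber, object, date)
        self._next_id = itertools.count(1)

    def create_channel(self, name: str, description: str, oids: List[ObjectID]) -> Channel:
        channel = Channel(name, description, list(oids))
        self.channels[name] = channel
        return channel

    def get_channel_list(self) -> List[str]:
        return sorted(self.channels)

    def subscribe_to_channel(self, name: str, subscriber: str, query_filter: Filter) -> int:
        sid = next(self._next_id)
        self.channels[name].subscriptions[sid] = (subscriber, query_filter)
        return sid

    def unsubscribe_to_channel(self, name: str, subscriber: str, sid: int) -> None:
        subs = self.channels[name].subscriptions
        if sid in subs and subs[sid][0] == subscriber:
            del subs[sid]

    def publish_to_channel(self, oid: ObjectID, name: str, date: str) -> None:
        channel = self.channels[name]
        channel.objects.append(oid)
        # Notify every subscriber whose filter matches the new object.
        for subscriber, query_filter in channel.subscriptions.values():
            if query_filter(oid):
                self.notifications.append((subscriber, oid, date))
```

A subscriber that only wants a specific subset of packages, as in the developer-channel example, expresses that subset as its filter.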

4.4.2 Querying the system

The API should enable searching for data objects not only according to their IDs, but also by other criteria such as: functionality, license, status, size, etc. The Metadata will be queried in order to locate the wanted objects. A simple query language will be used to query metadata values.

a) PackageID[] queryPackages(Query)

b) UtilityID[] queryUtilities(Query)

c) CollectionID[] queryCollections(Query)

Parameters:

  • Query = query expressed in the EDOS distribution query language
Action: get the list of object (package, utility, collection) IDs matching the query

d) Version_Number getLastVersionNb(Name, Type)

Parameters:

  • Name = object (package, utility, collection) name
  • Type = object type
Action: get the last version number for the object of the given type (package, utility, collection)
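The query methods can be sketched as filters over the indexed Metadata. This sketch makes three assumptions. A query is a plain predicate standing in for the EDOS query language. The object kind is stored under "Type" as "package", "utility" or "collection". The "last" version is the one with the latest creation Date, which avoids comparing RPM version strings. query_objects and get_last_version_nb are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

ObjectID = Tuple[str, str]  # (Name, Version_Number)
Query = Callable[[dict], bool]  # hypothetical stand-in for the EDOS query language


def query_objects(metadata: Dict[ObjectID, dict], obj_type: str, query: Query) -> List[ObjectID]:
    """IDs of objects of the given type whose Metadata satisfy the query."""
    return sorted(oid for oid, md in metadata.items()
                  if md["Type"] == obj_type and query(md))


def get_last_version_nb(metadata: Dict[ObjectID, dict], name: str,
                        obj_type: str) -> Optional[str]:
    """Version_Number of the most recently created object with this Name.

    Orders by the Date property (ISO-8601 strings sort chronologically)
    instead of comparing RPM version strings.
    """
    dated = [(md["Date"], oid[1]) for oid, md in metadata.items()
             if oid[0] == name and md["Type"] == obj_type]
    return max(dated)[1] if dated else None
```

queryPackages(Query) then corresponds to query_objects(metadata, "package", query), and likewise for utilities and collections.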

4.4.3 Transactions  

We propose to extend the advanced functionalities of the distribution API by introducing the concept of transactions in the distribution process.

We add three new methods for starting and ending (or aborting) the download, as well as a rollback function for when the download fails or is aborted:

  • Starting a transaction
    • in the case of a file download, the file transfer transaction is started, and this function is invoked, by the peer who wants to download the file
    • the Peer[] parameter represents the set of peers which hold the pieces of the file
    • the ObjectID parameter identifies the file to be downloaded (we use objects, in general, ...)
    • the ObjectDescription parameter can be used, for example, as a "composition guide" that helps to compose a file out of the different pieces once they are all downloaded
    • the transaction time-out specifies the time interval after which the download is aborted. This ensures that the transaction does not take forever and aborts if the file is no longer available in the network for a certain period of time
    • we also considered providing a locking mechanism, to ensure that the file is not changed or deleted during the download (write-lock). This will be described later, according to a more detailed specification of the physical level in the API

  • endTransaction(TransactionID, lock, ObjectID, ErrorMsg) and abortTransaction(TransactionID, lock, ObjectID, ErrorMsg)
    • we define the cases in which a transaction should be aborted
    • we distinguish between a transaction aborted by the system (for example, due to an internal error) and a transaction aborted by the user who is downloading the file (because, for example, he no longer needs the package or has already downloaded it from another peer)
    • the lock parameter specifies the locking protocol used
    • an ErrorMsg is sent as an entry in the transaction log file, specifying why the transaction was aborted

  • rollbackTransaction(TransactionID, Location, undo)
    • if a transaction is aborted, the effects of the transaction are rolled back to recover the state of the peer before starting the transaction
    • therefore, an undo protocol (and a log file) is needed to log every operation that changes the state of the peer
    • the Location of the downloaded data should be specified in order to undo the change operations
These functions provide relaxed ACID guarantees: either the download completes, or its effects on the peer are rolled back.
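The transaction methods can be sketched as a small manager with an undo log. The lock parameter is omitted because the locking protocol is left for later. The Location and undo parameters of rollbackTransaction are folded into undo callables that already know where the data was written. TransactionManager, record and the injectable clock are hypothetical names chosen for this sketch.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class Transaction:
    transaction_id: int
    object_id: str
    peers: list
    deadline: float
    undo_log: list = field(default_factory=list)  # inverse operations, newest last
    state: str = "active"  # "active", "committed" or "aborted"


class TransactionManager:
    """Sketch of start/end/abort/rollbackTransaction; the lock parameter is omitted."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._next_id = itertools.count(1)
        self.transactions = {}
        self.log = []  # stands in for the transaction log file

    def start_transaction(self, peers, object_id, timeout_s):
        tx = Transaction(next(self._next_id), object_id, list(peers), self._clock() + timeout_s)
        self.transactions[tx.transaction_id] = tx
        self.log.append((tx.transaction_id, "start", object_id))
        return tx.transaction_id

    def record(self, transaction_id, undo):
        """Log the inverse of an operation *before* the operation changes the peer."""
        tx = self.transactions[transaction_id]
        if tx.state != "active":
            raise RuntimeError(f"transaction {transaction_id} is {tx.state}")
        if self._clock() > tx.deadline:
            self.abort_transaction(transaction_id, "time-out expired")
            raise TimeoutError(f"transaction {transaction_id} timed out")
        tx.undo_log.append(undo)

    def end_transaction(self, transaction_id):
        tx = self.transactions[transaction_id]
        tx.state = "committed"
        tx.undo_log.clear()  # committed changes are no longer undoable
        self.log.append((transaction_id, "end", tx.object_id))

    def abort_transaction(self, transaction_id, error_msg):
        tx = self.transactions[transaction_id]
        tx.state = "aborted"
        self.log.append((transaction_id, "abort", error_msg))
        self.rollback_transaction(transaction_id)

    def rollback_transaction(self, transaction_id):
        tx = self.transactions[transaction_id]
        while tx.undo_log:  # undo in reverse order of execution
            tx.undo_log.pop()()


now = [0.0]  # fake clock so the example is deterministic
tm = TransactionManager(clock=lambda: now[0])
store = {}  # stands in for the peer's local storage (the Location)

tid = tm.start_transaction(["peer-a", "peer-b"], "pkg-1", timeout_s=30)
tm.record(tid, lambda: store.pop("pkg-1#0", None))
store["pkg-1#0"] = b"first piece"

now[0] = 31.0  # the second piece arrives too late
try:
    tm.record(tid, lambda: store.pop("pkg-1#1", None))
    store["pkg-1#1"] = b"second piece"
except TimeoutError:
    pass
```

Each undo step is recorded before the change it reverses, in the manner of a write-ahead log. As a result, a time-out detected in record never leaves behind a change that the rollback does not know about. The undo callables tolerate missing data, so undoing an operation that never completed is harmless.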

In the distribution API, we place these methods at the physical level, since they deal with optimization issues, such as splitting large packages into pieces or downloading a collection concurrently from different peers. They provide finer-grained access to the packages' Values (files) and implement the download functionality of the logical level.
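The following shows one possible shape for the ObjectDescription "composition guide". The publisher splits a package into fixed-size pieces and lists a SHA-1 digest per piece, in the style of BitTorrent piece hashes. The client checks each piece, whichever peer it came from, before reassembling the file. The description format and the function names are assumptions for illustration.

```python
import hashlib


def split_into_pieces(data, piece_size):
    """Split a package into pieces and build a hypothetical composition guide."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    description = {
        "size": len(data),
        "piece_size": piece_size,
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
    }
    return pieces, description


def compose(description, received):
    """Reassemble a file from pieces received in any order (index -> bytes)."""
    hashes = description["piece_hashes"]
    missing = [i for i in range(len(hashes)) if i not in received]
    if missing:
        raise ValueError(f"missing pieces: {missing}")
    for i, expected in enumerate(hashes):
        if hashlib.sha1(received[i]).hexdigest() != expected:
            raise ValueError(f"piece {i} is corrupted")
    data = b"".join(received[i] for i in range(len(hashes)))
    if len(data) != description["size"]:
        raise ValueError("reassembled size does not match the description")
    return data


package = b"x" * 10 + b"y" * 5
pieces, guide = split_into_pieces(package, piece_size=4)
# Pieces 0 and 2 from one peer, 1 and 3 from another, arriving out of order.
received = {2: pieces[2], 0: pieces[0], 3: pieces[3], 1: pieces[1]}
rebuilt = compose(guide, received)
```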

For the publisher role, the transaction methods are used by the publishPackagePush method at the physical level to upload packages (in push mode). The same holds for the equivalent methods for utilities and collections.

The replicator role also uses these methods, in publishReplicatedPackagePush, to replicate the Values (files) to other peers. The same holds for the equivalent methods for utilities and collections.

The client role is the role most concerned with the transaction methods in the distribution architecture. The getPackage method of the logical level (likewise getUtility and getCollection) initiates a file download as a transaction: a logical unit of operations with a defined start and end point. Each operation then internally invokes the transaction functions.

The rollbackTransaction method is invoked only by the startTransaction or abortTransaction methods. This happens either when an operation fails (for example, if a file piece cannot be downloaded within a defined period of time) or when the user manually aborts the download (by calling abortTransaction).
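The client-side flow described above can be sketched as follows. getPackage runs as one transaction, and either a failed piece (a system-side abort) or a user abort triggers the rollback. Peers are simulated by hypothetical fetch callables. get_package and DownloadAborted are illustrative names, not the EDOS signatures.

```python
class DownloadAborted(Exception):
    """Raised when the user aborts the download (user-initiated abortTransaction)."""


def get_package(object_id, piece_fetchers, store, user_aborted=lambda: False):
    """Hypothetical client-side getPackage run as one transaction.

    piece_fetchers: one callable per piece, each returning bytes or raising OSError.
    store: local storage; pieces are written under (object_id, index).
    Returns (True, None) on success, or (False, error_msg) after a rollback.
    """
    undo_log = []
    try:
        for index, fetch in enumerate(piece_fetchers):
            if user_aborted():
                raise DownloadAborted("aborted by user")
            key = (object_id, index)
            # Log the undo step before changing local state.
            undo_log.append(lambda key=key: store.pop(key, None))
            store[key] = fetch()  # OSError here means a system-side failure
    except (OSError, DownloadAborted) as exc:
        # abortTransaction followed by rollbackTransaction: newest change first.
        for undo in reversed(undo_log):
            undo()
        return False, str(exc)
    return True, None


def unreachable_peer():
    raise OSError("piece 1 not available within the time-out")


store = {}
ok, err = get_package("pkg-1", [lambda: b"aa", unreachable_peer, lambda: b"cc"], store)
```

Both kinds of abort go through the same rollback path. Only the error message written to the log differs.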

4.4.5 Security  

The basic aspects of security in P2P distribution systems were introduced in the state of the art section. Based on the specifications of the distribution system architecture, a PKI (Public Key Infrastructure) based on PGP (Pretty Good Privacy) is the security solution suited to the EDOS approach.

We identified two security requirements in the EDOS distribution system:

  • mutual authentication between the Publisher and the Replicators (indexing network security)
  • Client authentication
The security of the indexing network (composed of the Publisher and Replicator peers) certifies all download sources in the P2P system and prevents fake mirror sites. Authentication is based on PGP key pairs, generated by each "trusted" peer (Publisher or Replicator) and distributed in the network.

The distribution API provides these security functions through the following methods:

  • generateKey(PeerID)
  • sendKey(Key, Peer[])
Moreover, the Publisher will maintain a directory service to store the public key of each "trusted" peer in the network:
  • storePublicKey(PeerID, Key)
  • deleteKey(Key)
  • authenticatePeer(PeerID, Key)
Optionally, each Client peer can generate its own key pair and publish it in the publisher's directory service.
  • publishKey(PeerID, Key)
If the public key of a peer is present in the directory service, it can be authenticated by the other peers.
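The following sketches the Publisher's directory service (storePublicKey, deleteKey, authenticatePeer, publishKey). It assumes that keys are handled as opaque bytes and indexed by a SHA-256 fingerprint. It only checks that the key presented for a peer matches the registered one. A real PGP-based deployment would also require the peer to prove possession of its private key, for example by signing a challenge, and that step is not shown. generateKey and sendKey are not modelled, and the class and method names are hypothetical.

```python
import hashlib


def fingerprint(public_key: bytes) -> str:
    # Stand-in fingerprint for this sketch; not the PGP fingerprint format.
    return hashlib.sha256(public_key).hexdigest()


class KeyDirectory:
    """Hypothetical directory service kept by the Publisher."""

    def __init__(self):
        self._keys = {}  # PeerID -> fingerprint of its public key

    def store_public_key(self, peer_id, key):
        self._keys[peer_id] = fingerprint(key)

    def publish_key(self, peer_id, key):
        # Assumption: Client peers publish through the same store as trusted peers.
        self.store_public_key(peer_id, key)

    def delete_key(self, key):
        fp = fingerprint(key)
        for peer_id in [p for p, f in self._keys.items() if f == fp]:
            del self._keys[peer_id]

    def authenticate_peer(self, peer_id, key):
        # Only checks registration; it does NOT prove possession of the private key.
        return self._keys.get(peer_id) == fingerprint(key)


directory = KeyDirectory()
replicator_key = b"replicator-1 public key (placeholder)"
directory.store_public_key("replicator-1", replicator_key)
known = directory.authenticate_peer("replicator-1", replicator_key)
forged = directory.authenticate_peer("replicator-1", b"some other key")
directory.delete_key(replicator_key)
after_delete = directory.authenticate_peer("replicator-1", replicator_key)
```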

Version 1.79 last modified by RaduPop on 03/01/2006 at 17:39

Creator: RaduPop on 2005/06/08 22:23
Copyright EDOS Consortium