User's Guide for the EDOS distribution prototype

1. Overview

The prototype of the EDOS Distribution System consists of two distinct web applications that communicate through web services. The first one instantiates a Publisher peer, playing the Publisher role in the network. The second one can serve either as a Replicator peer or as a simple Client, depending on its configuration parameters.

Both applications are written in Java and need a running Tomcat web server for deployment. At the lower level of the architecture, the applications use the Azureus module for file transfer. This module ships with the web applications and is pre-configured for the prototype.

Running the applications requires Java 2 Platform Standard Edition 5.0 and the Apache Tomcat web server, version 5.5.

2. Install the applications

We created a tarball archive for both Publisher and Client applications, containing:

  • a pre-configured Tomcat web server for each web application along with all the needed libraries
  • the Publisher and the Client web applications along with their user interfaces (jsp)
  • install scripts that extract the applications from the archive and change the configuration parameters for the installation machine
After copying the archive into the directory where you want to install the application, check the configuration parameters in the "config.params" file:
  • TOMCAT_PORT (default=9090) - change it if you want to use a different port for the web server or if you run several Tomcat instances on the same machine
  • PUBLISHER - the address of the Publisher peer in the network
Launching the "install_client" or "install_publisher" script then performs the installation automatically.

Alternatively, you can install the applications manually, using the latest version of the prototype from the EDOS SVN repository server, at the following location:

https://protactinium.pps.jussieu.fr:12345/svn/edos/software/distribution

The web applications are built automatically using the Ant script "edosDistributionBuilder.xml".

3. Publisher webapp configuration

The configuration parameters for the Publisher application are grouped together in the "edos.properties" file, located in the web application's path:

  • /data_store/programs/server/tomcat/webapps/edos
Here is the complete list of parameters for the Publisher configuration:
  • edos.peer.log4j.configfile=log4j.properties
  • edos.index.bootstrapnode=classos.futurs.inria.fr
  • edos.index.bootstrapport=12345
  • edos.index.repository=kadop.txt
  • edos.axml.axis.peerurl=http://classos.futurs.inria.fr:8080/edos/servlet/AxisServlet
  • edos.axml.axis.servicename=ReceiveMessageService
  • edos.axml.axis.methodname=receiveMessage
  • edos.release.repository.path=/local/data_store/programs/server/distributions/
  • edos.idip.BittorentTrackerAddress=http://classos.futurs.inria.fr:30485/announce
  • edos.home=/local/edos-publisher
  • edos.peer.peertype=PUBLISHER
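
As an illustration, parameters in this format can be read with the standard java.util.Properties class. The following is a minimal sketch and not the prototype's actual loading code: the class name EdosConfigSketch and the choice of which keys are treated as mandatory are assumptions made for the example. It also uses a more recent Java syntax than the Java 5 platform required by the prototype.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical loader for "edos.properties" (illustration only). */
final class EdosConfigSketch {

    // Keys copied from the list above; treating them as mandatory is an assumption.
    static final String[] REQUIRED_KEYS = {
        "edos.index.bootstrapnode",
        "edos.index.bootstrapport",
        "edos.release.repository.path",
        "edos.peer.peertype"
    };

    /** Parses the properties and fails fast when a required key is missing. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            if (props.getProperty(key) == null) {
                throw new IllegalArgumentException("Missing parameter: " + key);
            }
        }
        return props;
    }

    /** Returns the bootstrap port as an integer. */
    static int bootstrapPort(Properties props) {
        return Integer.parseInt(props.getProperty("edos.index.bootstrapport").trim());
    }

    public static void main(String[] args) {
        String sample =
              "edos.index.bootstrapnode=classos.futurs.inria.fr\n"
            + "edos.index.bootstrapport=12345\n"
            + "edos.release.repository.path=/local/data_store/programs/server/distributions/\n"
            + "edos.peer.peertype=PUBLISHER\n";
        Properties props = load(new StringReader(sample));
        System.out.println(props.getProperty("edos.peer.peertype")
            + " peer, bootstrap port " + bootstrapPort(props));
    }
}
```

Failing at load time when a key is missing keeps configuration mistakes visible at startup rather than at the first network call.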

4. Replicator/Client webapp configuration

The configuration parameters for the Replicator/Client application are grouped together in the "edos.properties" file, located in the web application's path:

  • /data_store/programs/client/tomcat/webapps/edos-client
Here is the complete list of parameters for the Replicator/Client configuration:
  • edos.peer.log4j.configfile=log4j.properties
  • edos.index.bootstrapnode=classos.futurs.inria.fr
  • edos.index.bootstrapport=12345
  • edos.index.repository=kadop.txt
  • edos.axml.axis.peerurl=http://atos.futurs.inria.fr:9090/edos-client/servlet/AxisServlet
  • edos.axml.axis.servicename=ReceiveMessageService
  • edos.axml.axis.methodname=receiveMessage
  • edos.release.repository.path=/local/data_store/programs/client/distributions/
  • idip.path=/local
  • edos.peer.peertype=CLIENT

5. Publisher Peer Application

The Graphical User Interface (GUI) for the publisher peer is a stand-alone web application. It is a standard webapp implemented with Java Server Pages (JSP) and deployed in a Tomcat web server.

5.1. Run the Publisher

After installing the publisher application (see the "Install the applications" chapter of this guide), the Tomcat web server is already configured and ready to load the web application.

  • Note: We configured the Tomcat web server to use by default the following ports for the publisher application:
    • Port 8080 - for HTTP connections
    • Port 8005 - for shutdown command
We ran some tests with two distinct web servers on a single machine, running on different ports. However, the best scenario is to use a different machine for each web application or, possibly, a different virtual machine. If you need a particular configuration, the port numbers can be changed manually in the following file:
  • /tomcat/conf/server.xml
To run the publisher peer application, you first have to start the Tomcat web server:
  • /tomcat/bin/startup.sh (or /tomcat/bin/startup.bat for Windows)
The publisher application is already deployed, so the web server loads the "InitServlet", which instantiates the publisher peer. At this point, the servlet reads the configuration file "edos.properties" and loads all the parameters (see also the "Publisher webapp configuration" chapter). It creates a "PublisherPeer" object, along with the local repository reference and the Axis connection parameters.

You can access the running publisher application via HTTP at:

  • http://localhost:8080/edos (or replace "localhost" with your machine's name; see also the "Publisher webapp configuration" chapter)
You will see the "Home" page of the EDOS Publisher Peer.

The GUI of the publisher is organized in tabs. There are three distinct pages that group different actions:

  • Distributions
  • Channels
  • Clusters
In the header of each page you will see this compact menu, which lets you navigate easily between the different sections.

The "Home" page is the "index" of the publisher application and the first page displayed when accessing the application. It also confirms that the web server is running and that the publisher peer has been initialized correctly. It displays a summary of the peer properties, such as:

  • PeerID - the unique ID assigned to this peer in the network
  • Address - the IP endpoint of the peer
  • ...
We present the remaining pages of the GUI in the following subsections, along with each functionality implemented by the publisher peer application.

5.2. Publish a new distribution

As we previously presented in the system's architecture, each component of the system offers a specific set of functionalities according to its type. We grouped the functionalities into "roles" and each component (peer) can play distinct roles in the system.

The main role played by the publisher peer is the Publisher role. It is the only peer in the network that has the right to publish a new distribution or to delete a distribution from the distributed index. In other words, the publisher is the only peer that can insert new distributions into the system. The content of the distributions is made available for download and the metadata of each published distribution is stored in the index.

  • Note that when a distribution is deleted, its content does not completely disappear from the system. Only the associated metadata is deleted from the index. Each client peer storing parts of the distribution keeps its content, but this content is no longer indexed in the system. Therefore, queries on the deleted distribution are no longer possible.
The upper part of the "Distributions" page displays a table with the names of the distributions, their paths in the local file system and the two possible actions: "Publish" and "Delete".

At initialization time, the Content Manager on the publisher peer searches for all the distributions available in the local repository. At the physical level of the peer's architecture, the "publish" and "delete" actions are implemented in the Index Manager. In short, publishing a distribution involves the following steps:

  • Step 1°: Call "publishRelease(releaseName)" method of the Publisher role;
  • Step 2°: Get the distribution's content from the Content Manager;
  • Step 3°: For each DataUnit in the distribution, the MetadataBuilder module will extract the metadata and store it in an XML document;
  • Step 4°: The Index Manager will publish the metadata into the distributed index (at the KadoP level).
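
The four steps above can be outlined in code as follows. This is only a sketch under stated assumptions: the three nested interfaces, their method names other than "publishRelease", the use of plain strings for DataUnits and the XML shape are all hypothetical stand-ins for the real Content Manager, MetadataBuilder and Index Manager, whose APIs are not described in this guide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative outline of the publication steps; all types are hypothetical. */
final class PublishSketch {

    interface ContentManager { List<String> getDataUnits(String releaseName); }
    interface MetadataBuilder { String toXml(String dataUnit); }
    interface IndexManager { void publish(String xmlMetadata); }

    private final ContentManager content;
    private final MetadataBuilder builder;
    private final IndexManager index;

    PublishSketch(ContentManager content, MetadataBuilder builder, IndexManager index) {
        this.content = content;
        this.builder = builder;
        this.index = index;
    }

    /** Step 1: entry point of the Publisher role. Returns the number of DataUnits indexed. */
    int publishRelease(String releaseName) {
        int published = 0;
        for (String unit : content.getDataUnits(releaseName)) { // Step 2: get the content
            String xml = builder.toXml(unit);                    // Step 3: extract metadata as XML
            index.publish(xml);                                  // Step 4: publish into the index
            published++;
        }
        return published;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        PublishSketch publisher = new PublishSketch(
            release -> Arrays.asList("pkg-a", "pkg-b"),      // in-memory stand-in for the repository
            unit -> "<dataunit name=\"" + unit + "\"/>",     // made-up XML shape
            indexed::add);                                   // stand-in for the distributed index
        System.out.println(publisher.publishRelease("demo-release") + " DataUnits published");
    }
}
```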
Once a new distribution has been published in the system, clients can subscribe to it and connect to the dissemination channel. Each client sends a subscription message for this distribution and waits for a notification from the publisher.

This notification is what we call a "Publish event". The second part of the page contains input fields that define a notification event:

  • Start time
  • End time
  • Distribution
  • Channel
Note that a notification message can be sent only for distributions that have already been published. The combo-box offers only the distributions that were previously published and are available in the index.

The channel used for the broadcast notification events is a general channel called "EdosDistributionBroadcastChannel".

For each "publish event", the publisher defines a time-window during which it waits for the client requests. The dialog between the publisher and the clients can be described briefly in the following steps:

  • Step 1°: First of all, the publisher has to make the new distribution available in the index: Publish distribution.
  • Step 2°: The clients see the new distribution and can subscribe to download it: Subscribe to channel.
  • Step 3°: The publisher makes the list of all the clients subscribed to this distribution.
  • Step 4°: When the publisher decides that it is ready to start the dissemination process, it sends a notification message to all the clients on the list: Publish event.
  • Step 5°: The clients receive the notification and send back their WishLists, in compliance with the time-window defined by the publisher. Now the clients are ready and waiting to download the content.
  • Step 6°: The publisher receives the WishLists from the clients during the time-window interval. It can also close the time-window earlier than the scheduled "end time" (this is especially useful for testing).
  • Step 7°: The publisher prepares the content for dissemination: Cluster data.
  • Step 8°: The publisher creates the torrents for each cluster of data and publishes them in Azureus: Publish clusters.
You will find more details on the last three steps in the "Clustering data" subsection of this chapter. The client's view of this dialog is described in the "Client Peer Application" chapter.
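
To make the time-window of Step 6 concrete, the sketch below shows one possible way for a publisher to accept WishLists only while the window is open, with an early close. It is an assumption-laden illustration: the class WishListWindowSketch, its methods and the representation of a WishList as a set of strings are invented for the example, and the current time is passed in explicitly so that the logic is easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical publisher-side time-window for collecting WishLists. */
final class WishListWindowSketch {

    private final long startMillis;
    private final long endMillis;
    private boolean closedEarly = false;
    private final Map<String, Set<String>> wishLists = new LinkedHashMap<>();

    WishListWindowSketch(long startMillis, long endMillis) {
        if (endMillis < startMillis) {
            throw new IllegalArgumentException("end time is before start time");
        }
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** The window is open between start and end (inclusive) unless closed early. */
    boolean isOpen(long nowMillis) {
        return !closedEarly && nowMillis >= startMillis && nowMillis <= endMillis;
    }

    /** Stores the WishList and returns true only if the window is open. */
    boolean receive(String clientId, Set<String> wishList, long nowMillis) {
        if (!isOpen(nowMillis)) {
            return false;
        }
        wishLists.put(clientId, wishList);
        return true;
    }

    /** Counterpart of closing the window before the scheduled end time. */
    void closeEarly() {
        closedEarly = true;
    }

    int receivedCount() {
        return wishLists.size();
    }
}
```

Once the window is closed, the collected WishLists would be handed to the clustering step.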

5.3. Channel manager

The "Channels" page of the GUI represents the front-end of the Channel Manager on the publisher peer.

In the upper part of the page, a table displays the list of channels connected to the publisher. For each channel available on the publisher peer, it shows:

  • the name of the channel
  • a short description of the channel
  • and the "Delete" action associated with the "removeChannel(channelName)" method from the Channel Manager
By default, a general channel called "EdosDistributionBroadcastChannel" is published at initialization time. This is the channel used by the clients to send their subscriptions.

You can add new channels using the form in the lower part of the page. A channel is uniquely identified by its name and it has an optional text description. Pressing the "Add" button will call the "createPublisherChannel(channelName)" method from the Channel Manager.
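
A channel registry with this behaviour could look roughly like the sketch below. It is not the prototype's Channel Manager: the class name is invented, and the two-argument "createPublisherChannel" is an assumption introduced so that the optional description can be shown (the guide only names the one-argument form).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory channel registry keyed by channel name. */
final class ChannelManagerSketch {

    static final String BROADCAST = "EdosDistributionBroadcastChannel";

    // Channel name -> description; insertion order is kept for display.
    private final Map<String, String> channels = new LinkedHashMap<>();

    ChannelManagerSketch() {
        // The general broadcast channel exists from initialization time.
        createPublisherChannel(BROADCAST, "General broadcast channel");
    }

    /** Adds a channel; returns false if the name is blank or already used. */
    boolean createPublisherChannel(String name, String description) {
        if (name == null || name.trim().isEmpty() || channels.containsKey(name)) {
            return false;
        }
        channels.put(name, description == null ? "" : description);
        return true;
    }

    /** Removes a channel; returns false if no such channel exists. */
    boolean removeChannel(String name) {
        return channels.remove(name) != null;
    }

    Set<String> channelNames() {
        return channels.keySet();
    }
}
```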

5.4. Clustering data

The "Clusters" page of the GUI is used as an input form for the clustering algorithm. Data clustering is implemented at the physical level of the publisher peer architecture, in the IDiP module.

The algorithm computes clusters of files in order to achieve an efficient dissemination mechanism. The clustering is based on the subscriptions registered by the publisher and on the WishLists sent by each subscribed client.

A WishList is a set of DataUnits that a client wants to download. Basically, the WishList is computed as the difference between what was newly published in the system and what the client already has in its local repository. See also the "Client Peer Application" chapter of this guide.
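
The difference described above is a plain set difference. The following minimal sketch assumes that each DataUnit can be identified by a string such as a name and version; that identifier format and the class name WishListSketch are assumptions for the example only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical WishList computation: published content minus local content. */
final class WishListSketch {

    /** Returns the DataUnits that are published but not present locally. */
    static Set<String> compute(Set<String> published, Set<String> local) {
        Set<String> wishList = new TreeSet<>(published); // copy, so inputs stay untouched
        wishList.removeAll(local);
        return wishList;
    }

    public static void main(String[] args) {
        Set<String> published = new TreeSet<>(Arrays.asList("a-1.0", "b-2.0", "c-1.1"));
        Set<String> local = new TreeSet<>(Arrays.asList("a-1.0", "c-1.0"));
        System.out.println(compute(published, local));
    }
}
```

When the local repository is empty, the result is the whole published set, which corresponds to the download of a new release.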

In the first phase after sending a "publish event" to its subscribers, the publisher waits to receive the WishLists. For each "publish event" associated with a distribution, the publisher defines a starting time and an ending time for the reception of the WishLists. This is what we call a "time-window". In the upper part of the "Clusters" page there is a table showing the state of each distribution:

  • Distribution name
  • Distribution state (the state of the scheduled time-window: opened/closed)
  • and a "Close window" action that can be used to shorten the waiting delay
After the time-window has been closed, the publisher can start the clustering algorithm, based on the WishLists and on the following input parameters:
  • Number of clusters
  • Distance metric
    • euclidean
    • squared euclidean
    • manhattan
    • pearson correlation
    • square pearson correlation
    • chebychev
  • Iterations number
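
For reference, the sketch below gives the textbook definitions of most of the distance metrics listed above, applied to two numeric vectors. How the prototype encodes files or WishLists as vectors is not described in this guide, so the double arrays are placeholders. Defining the Pearson metric as "1 - r" is a common convention and an assumption here; the squared Pearson variant is omitted because its exact definition in the prototype is not given.

```java
/** Textbook distance metrics on equal-length vectors (illustration only). */
final class DistanceSketch {

    static double squaredEuclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static double euclidean(double[] a, double[] b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }

    static double manhattan(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    static double chebychev(double[] a, double[] b) {
        checkSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** Assumed definition: 1 - r. Yields NaN when either vector is constant. */
    static double pearson(double[] a, double[] b) {
        checkSameLength(a, b);
        double meanA = mean(a);
        double meanB = mean(b);
        double sab = 0.0, saa = 0.0, sbb = 0.0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        return 1.0 - sab / Math.sqrt(saa * sbb);
    }

    private static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("vectors must be non-empty and of equal length");
        }
    }
}
```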
Finally, once the clusters have been computed, the publisher can create the torrents corresponding to each cluster and pass them to Azureus. The torrents' descriptions are published in the index by pressing the "Publish cluster" button at the bottom of the page. At the same time, the publisher sends another notification message to all the subscribed clients, informing them that the torrents are available and that they can start the download.

6. Client Peer Application

The Graphical User Interface (GUI) for the client peer is a stand-alone web application. It is a standard webapp implemented with Java Server Pages (JSP) and deployed in a Tomcat web server.

6.1. Run the Client

After installing the client application (see the "Install the applications" chapter of this guide), the Tomcat web server is already configured and ready to load the web application.

  • Note: We configured the Tomcat web server to use by default the following ports for the client application:
    • Port 9090 - for HTTP connections
    • Port 9005 - for shutdown command
We changed the default ports (8080 and 8005) in order to distinguish between the server and the client applications when they run on the same machine. We ran some tests with two distinct web servers on a single machine, running on different ports. However, the best scenario is to use a different machine for each web application or, possibly, a different virtual machine. If you need a particular configuration, the port numbers can be changed manually in the following file:
  • /tomcat/conf/server.xml
To run the client peer application, you first have to start the Tomcat web server:
  • /tomcat/bin/startup.sh (or /tomcat/bin/startup.bat for Windows)
The client application is already deployed, so the web server loads the "InitServlet", which instantiates the client peer. At this point, the servlet reads the configuration file "edos.properties" and loads all the parameters (see also the "Replicator/Client webapp configuration" chapter). It creates a "ClientPeer" object, along with the local repository reference and the Axis connection parameters.

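You can access the running client application via HTTP, on the HTTP port given above (9090 by default) and under the "edos-client" web application path (compare the "edos.axml.axis.peerurl" parameter in the "Replicator/Client webapp configuration" chapter).

You will see the "Home" page of the EDOS Client Peer.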

The GUI of the client is organized in tabs. There are five distinct pages that group different actions:

  • Home
  • Repository
  • Channels
  • Distributions
  • Query
In the header of each page you will see this compact menu, which lets you navigate easily between the different sections.

The "Home" page is the "index" of the client application and the first page displayed when accessing the application. It also confirms that the web server is running and that the client peer has been initialized correctly. It displays a summary of the peer properties, such as:

  • PeerID - the unique ID assigned to this peer in the network
  • Address - the IP endpoint of the peer
  • ...
We present the remaining pages of the GUI in the following subsections, along with each functionality implemented by the client peer application.

6.2. Client's roles

As we previously presented in the system's architecture, each component of the system offers a specific set of functionalities according to its type. We grouped the functionalities into "roles" and each component (peer) can play distinct roles in the system.

The first version of the prototype implements two roles for a client peer:

  • Client role - for the basic content management and download operations
  • Query role - for the advanced query operations
The basic operations assigned to the client role are those related to local content management ("Repository"), subscription and channel management ("Channels") and, finally, the download of new content ("Distributions"). At the physical level of our implementation, each operation available in the client's GUI corresponds to a particular method in the client peer's managers.

6.3. Local repository manager

The client peer stores its data content in a local repository. This content consists of the DataUnits (Packages, Utilities and whole Distributions) that were previously downloaded from the system. Distributions can also be added manually to the local repository, but this is not the regular method; we use this feature only for testing purposes.

Basically, the local repository is a file system on the client machine. Its location is given by the "edos.release.repository.path" parameter in the properties file (see also the "Replicator/Client webapp configuration" chapter). This is the root directory of the repository. Each Distribution is stored in a separate directory named after the distribution. The Packages and the Utilities are stored hierarchically in the file structure, according to our data model.

At initialization time, the Content Manager on the client peer loads the repository structure into memory, using the root directory as its entry point. It searches for all the available Distributions and for each package or utility file in the repository. Each DataUnit has an associated object stored in memory.

After a file download, the Dissemination Manager on the client peer creates a symbolic link in the local repository with the name of the file. This link points to the file content, which is stored in the Azureus download directory: "/local/data_store/programs/edos-client/distributions"
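
Creating such a link can be sketched with the standard java.nio.file API, as shown below. Note the assumptions: java.nio.file was introduced in Java 7, later than the Java 5 platform this guide requires, so the prototype presumably relies on another mechanism that is not documented here. The class and method names are invented for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that links a downloaded file into the local repository. */
final class SymlinkSketch {

    /**
     * Creates repositoryDir/<file name> as a symbolic link to downloadedFile.
     * Fails if an entry with that name already exists in the repository.
     */
    static Path linkIntoRepository(Path repositoryDir, Path downloadedFile) {
        Path link = repositoryDir.resolve(downloadedFile.getFileName());
        try {
            return Files.createSymbolicLink(link, downloadedFile.toAbsolutePath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the repository and the download directory.
        Path repository = Files.createTempDirectory("repository");
        Path downloads = Files.createTempDirectory("downloads");
        Path file = Files.createFile(downloads.resolve("example-package.rpm"));
        Path link = linkIntoRepository(repository, file);
        System.out.println(link + " -> " + Files.readSymbolicLink(link));
    }
}
```

Linking rather than copying keeps a single copy of the data on disk while still making the file visible in the repository tree.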

Note that a regular Client peer stores only the data content in its repository. This is the difference from a Replicator peer, which stores the metadata content in addition to the DataUnits. The Replicator peers are part of the index network, where the metadata is distributed across different replicators (see also the "Replicator Peer" chapter).

On the "Repository" page of the GUI you will see the content of the local repository. In the upper part of the page you will find a summary of the local repository:

  • Current Path - the current location in the file system
  • Local Distributions - the list of the distributions available locally
The path of the repository can be changed in the text box below; pressing the "Set New Path" button records the new location. After the repository location has been changed, the "Reload Client Repository" button becomes active. This button calls the "reloadRepository()" method of the Content Manager, which searches for content in the new location and loads the new DataUnits into memory. The bottom part of the page displays a tree view of the repository content. It is the result of a simple function call that shows all the DataUnits registered by the client peer. This is especially useful when testing the completeness of a download.

6.4. Channel subscription

The "Channels" page of the GUI represents the front-end of the Channel Manager on the client peer. In the table displayed you will find the summary of the peer's connections to the network. For each channel available on the client peer you have all the information concerning:

  • the name of the channel
  • the end-point address of the channel
  • the status of each channel - connected or not connected
In the second part of the table there is the list of the "General Open Channels" such as the "EdosDistributionBroadcastChannel". This is the channel that carries the Publisher's main broadcast events. The client can register to this general channel by pressing the "Subscribe" button. The address of the Publisher is taken from the "edos.index.bootstrapnode" parameter in the properties file (see also the "Replicator/Client webapp configuration" chapter). By subscribing to this general channel, the client peer adds itself to the Publisher's list of "known" clients. Each time the Publisher launches a "publish" event, all the clients registered in this list are notified through the broadcast channel.
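
The subscribe-then-notify behaviour described above follows the classic observer pattern. The sketch below shows it in a single process, which is a deliberate simplification: in the prototype the peers exchange messages through web services, and the names BroadcastSketch, Listener, subscribe and publishEvent are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-process model of the Publisher's list of "known" clients. */
final class BroadcastSketch {

    /** Callback standing in for the notification message sent to a client. */
    interface Listener {
        void onPublishEvent(String distribution, long startMillis, long endMillis);
    }

    // Client address -> callback; a second subscription from the same address replaces the first.
    private final Map<String, Listener> knownClients = new LinkedHashMap<>();

    void subscribe(String clientAddress, Listener listener) {
        knownClients.put(clientAddress, listener);
    }

    /** Notifies every registered client and returns how many were notified. */
    int publishEvent(String distribution, long startMillis, long endMillis) {
        for (Listener listener : knownClients.values()) {
            listener.onPublishEvent(distribution, startMillis, endMillis);
        }
        return knownClients.size();
    }
}
```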

6.5. Download distributions

The "Distributions" page represents the download section of the client peer. This base functionality is implemented by the Dissemination Manager.

The table on this page displays the distributions currently available in the peer-to-peer system. This information is obtained by calling the "getDistributionList()" method from the Index Manager. Each distribution found in the system's index is then looked up locally by the Content Manager. The table summarizes which distributions are available in the local repository and which are not yet local but can be downloaded from the system.

When pressing the "Download Distribution" button, the "getNewRelease()" method is called. As we defined in the system's architecture, at the logical level this method belongs to the Client role. At the physical level, it is implemented by the Dissemination Manager. The radio-buttons on the left side allow you to select the distribution to download.

The download of a distribution involves several steps and a set of actions to perform in the client application. The download algorithm is detailed below:

  • Step 1° - Wait for the announce message
    • the Client is already subscribed to the main broadcast channel of the Publisher;
    • the Client starts listening on this channel and it waits for an announce message from the Publisher;
    • when a publish event occurs on the Publisher (e.g. when a new release is available for download), each Client subscribed on the Publisher's list is notified with an announce message;
    • this announce message also contains the information concerning the time-window proposed by the Publisher for the broadcast;
  • Step 2° - Wait for the right time-window
    • the Client must check that its current time is inside the time-window for the broadcast of this distribution;
    • it has to verify that the time-window is not already closed (i.e. that its current time is not past the closing time of the window); this case should not happen unless some communication problems occurred;
    • it also has to verify that the starting time of the window has already passed; otherwise, it has to wait;
  • Step 3° - Compute the WishList
    • the download of a new distribution basically consists of a parallel download of independent files;
    • when the selected distribution is not present in the local repository (i.e. it has never been downloaded before by this Client), the set of files to download is the whole collection of files contained in the distribution (this is the case of a new release);
    • when the selected distribution is already present in the local repository (this is the case of an update), the Client computes the delta between its local content and the new content announced by the Publisher;
    • this delta is computed by comparing the local content held by the Content Manager with the published content known to the Index Manager;
    • in both cases (new release or update), we call the set of files to download the WishList;
  • Step 4° - Send the WishList to the Publisher
    • the WishList is packed in a message and sent to the Publisher
  • Step 5° - Get the torrent files and download the files
    • during the announced time-window, the Publisher will receive the WishLists from all its subscribed Clients;
    • after closing the window, it computes the file clusters and creates the torrent files;
    • the Publisher releases the torrents in Azureus and sends another message to all the Clients to inform them about the availability of the torrents;
    • the Client receives the torrents message and starts its Azureus downloader;
  • Step 6° - Announce completion
    • Azureus finishes the download of all files in the WishList;
    • the downloaded files are registered by the Content Manager;
    • the Dissemination Manager creates the symbolic links from the local repository to the new files downloaded by Azureus
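
The time-window check of Step 2 can be summarized as a three-way decision. The sketch below is a hedged illustration: the class ClientWindowSketch, the Decision names and the choice to treat both window boundaries as inclusive are assumptions, not the prototype's code.

```java
/** Hypothetical client-side check of the announced time-window. */
final class ClientWindowSketch {

    enum Decision { WAIT, SEND_WISHLIST, TOO_LATE }

    /**
     * Compares the client's current time with the announced window.
     * Both boundaries are treated as inclusive (an assumption).
     */
    static Decision check(long nowMillis, long startMillis, long endMillis) {
        if (nowMillis > endMillis) {
            return Decision.TOO_LATE;      // window already closed
        }
        if (nowMillis < startMillis) {
            return Decision.WAIT;          // window not yet open
        }
        return Decision.SEND_WISHLIST;     // inside the window
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // A window that opened one second ago and closes in one minute.
        System.out.println(check(now, now - 1000L, now + 60000L));
    }
}
```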

6.6. Querying the system

The "Query" page of the GUI is the interface of the Query role played by the client peer. The queries are passed to the Index Manager and then sent down to the physical level, to the KadoP Manager.

We defined two categories of queries:

  • Simple queries - based on the pre-defined tags building the metadata of DataUnits
    • NAME
    • VERSION
    • RELEASE
    • SUMMARY
    • DESCRIPTION
    • SIZE
    • LICENSE
    • ...
  • Compound queries - logical combinations of simple queries (joins)
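
The relationship between simple and compound queries can be illustrated with composable predicates over tag/value metadata. This is only an in-memory sketch: in the prototype, queries are evaluated by the Index Manager and KadoP over XML metadata, and joins are not modelled here; only AND/OR combinations are shown. The class QuerySketch and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory model of simple and compound metadata queries. */
final class QuerySketch {

    /** Simple query: the given tag must have exactly the given value. */
    static Predicate<Map<String, String>> tagEquals(String tag, String value) {
        return metadata -> value.equals(metadata.get(tag));
    }

    /** Returns the DataUnit metadata records that satisfy the query. */
    static List<Map<String, String>> run(List<Map<String, String>> dataUnits,
                                         Predicate<Map<String, String>> query) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> unit : dataUnits) {
            if (query.test(unit)) {
                hits.add(unit);
            }
        }
        return hits;
    }

    /** Builds a metadata record with three of the pre-defined tags. */
    static Map<String, String> unit(String name, String version, String license) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("NAME", name);
        metadata.put("VERSION", version);
        metadata.put("LICENSE", license);
        return metadata;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(unit("bash", "3.1", "GPL"));
        index.add(unit("zlib", "1.2", "zlib"));
        // Compound query: NAME = bash AND LICENSE = GPL
        Predicate<Map<String, String>> query =
            tagEquals("NAME", "bash").and(tagEquals("LICENSE", "GPL"));
        System.out.println(run(index, query).size() + " matching DataUnit(s)");
    }
}
```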
Version 1.12 last modified by StephaneLauriere on 18/10/2006 at 14:00

Creator: RaduPop on 2006/09/12 18:24
Copyright EDOS Consortium