Unit tests for all components

1 Roadmap

Objective | Period | Time | Comment
Grid5000 first round of tests | 23 April - 6 May | 2 weeks | Publishing functionality
BDA paper | 7 May - 13 May | 1 week | Deadline: 14 May (Abstract: 7 May)
Wp4 final deliverables & Grid5000 second round | 14 May - 10 June | 4 weeks | first version
OSS 2007 | 11 June - 14 June | 1 week | slides (add measures)
Wp4 final deliverables | 18 June - 29 June | 2 weeks | Deadline: 29 June
Full paper | 1 May - 29 July | 13 weeks | Draft: 29 July
Final review | - | - | 6 September
Full paper | - | - | Final version: 30 September

2 Tasks

No | Task | Comment | Status
1 | Open Grid5000 account | EDOS experiment | done
2 | Java for 64 bits | Grid5000 nodes have 64-bit processors | done
3 | Metadata extraction | Grid5000 nodes use the Debian distribution | done
4 | Deployment script | on SVN | done
5 | Launching script | on SVN | in progress
6 | Java stubs for method calls | use web service calls | in progress
7 | Log4j config | log the measurements | done
8 | Log retrieval script | collecting the results | done (sample)
9 | Testing with Anteater | web service framework | done (sample)
10 | Pastry params | _socket_buffer_size=32768 (2^15); new value: 262144 (2^18) | testing

3 Description of the scalability tests

The following tests address two main issues for EDOS P2P distribution:

  • check scalability in terms of data/metadata size and of network size (number of peers)
  • for each functionality, compare the performance of the basic implementation with that of the improved one

3.1 Publishing

  • We measure: the time for publishing metadata
  • Main parameters:
    • size of the indexing network (INS, in number of total peers)
    • size of the distribution network (DNS, in number of total peers in the distribution process)
    • size of published metadata (MDS, in number of files)
  • Other parameters: level of replication in the DHT
  • We could compare: current publishing (key-by-key) and improved publishing (release-by-release)
Scenario 1 (restrained set: 10 Mirrors):

Parameters | Values | Notes
INS | 11 | 10 Mirrors and the Publisher
DNS | 11 | 10 Mirrors and the Publisher
MDS | 6000 | one release

Notes:

  • Client peers are not needed, since only publishing functionality is considered
  • purpose of the restrained set: watching the stability of the index network and the distribution of keys
  • the application is stored in the /home directory (mounted by NFS on all the nodes) and then copied to each node's local /tmp directory
  • results are gathered from the log files, which each node writes independently to its local /tmp directory
Steps:
  • 1° Deploy a Publisher peer
  • 2° Deploy 10 Mirror peers
  • 3° Run Publisher peer
  • 4° Run Mirror peers
    • shell scripts
    • check that all peers joined the indexing network
  • 5° Launch the publishing process on the Publisher
    • start counter
  • 6° Wait for publishing process and keys distribution
    • stop the counter and check the logs
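
The "start counter" and "stop the counter" steps above can be sketched as a small timing helper. This is only an illustration, not the project's test code: `StepTimer` and `timeMillis` are hypothetical names, and the step being timed (launching the publication and waiting for the keys to be distributed) is assumed to be wrapped in a blocking `Runnable`.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: times one blocking step of a test scenario. */
final class StepTimer {
    private StepTimer() {}

    /** Runs the step once and returns its wall-clock duration in milliseconds. */
    static long timeMillis(Runnable step) {
        long start = System.nanoTime();            // "start counter"
        step.run();                                // assumed to block until the step is finished
        long elapsed = System.nanoTime() - start;  // "stop the counter"
        return TimeUnit.NANOSECONDS.toMillis(elapsed);
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is monotonic, so a clock adjustment on the node during the run does not distort the measurement. The returned value would then be written to the log like any other measurement.
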
Scenario 2 (large set: 100 Mirrors):

Parameters | Values | Notes
INS | 101 | 100 Mirrors and the Publisher
DNS | 101 | 100 Mirrors and the Publisher
MDS | 3 x 6000 | three different releases

Notes:

  • real usage scenario (actual number of mirrors in the current architecture: 50)
  • focus on the load balancing of the mirrors: for the improved publishing (release-by-release) we watch the distribution of keys
  • compute the average time for the publication:
    • different releases
    • key-by-key and release-by-release publication
  • using more nodes than in the previous scenario might require deploying the experiment on more than one site (e.g. Orsay) and therefore synchronizing the repositories
Steps:
  • same as in the previous scenario
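
The averaging mentioned in the notes above (over different releases, for each publishing mode) can be sketched as follows. This is a minimal illustration, assuming the measured times have already been collected from the logs into lists of milliseconds; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: averages publication times per publishing mode. */
final class PublicationAverages {
    private PublicationAverages() {}

    /** Arithmetic mean of the measured times; rejects an empty list. */
    static double meanMillis(List<Long> timesMillis) {
        if (timesMillis.isEmpty()) {
            throw new IllegalArgumentException("no measurements");
        }
        long sum = 0;
        for (long t : timesMillis) {
            sum += t;
        }
        return (double) sum / timesMillis.size();
    }

    /** Mean per mode (e.g. "key-by-key"), keeping the modes in insertion order. */
    static Map<String, Double> meanByMode(Map<String, List<Long>> timesByMode) {
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : timesByMode.entrySet()) {
            means.put(e.getKey(), meanMillis(e.getValue()));
        }
        return means;
    }
}
```

Each list would hold one measurement per published release, so the two resulting means can be compared directly between key-by-key and release-by-release publication.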

3.2 Query

  • We measure: the time for executing a query
  • Main parameters:
    • size of the indexing network (INS, in number of total peers)
    • size of the distribution network (DNS, in number of total peers in the distribution process)
    • size of published metadata (MDS, in number of files)
  • Other parameters: query size and type
    • queries on packages/utilities/collections (the metadata documents addressed are different)
    • single/multiple projections
    • selection on frequent/rare tags or words
  • We could compare: without and with metadata replication
Parameters | Values | Notes
INS | 101 | 100 Mirrors and the Publisher
DNS | 102 | One Client, 100 Mirrors and the Publisher
MDS | 6000 | one release

Steps:

  • 1° Deploy a Publisher peer
  • 2° Deploy 100 Mirror peers and a Client
  • 3° Run Publisher peer
  • 4° Run Mirror peers and the Client
    • check that all peers joined the indexing network
  • 5° Publish the release on the Publisher
  • 6° Build and send the query from the Client peer
    • start counter
  • 7° Wait for the result
    • check the logs

3.3 Download with IDiP (flash crowd)

  • We measure: the time for downloading a release
  • Main parameters:
    • size of the indexing network (INS)
    • number of clients that simultaneously download the release (DNS - distribution network size, in number of total peers in the distribution process)
    • number of clusters
    • size of the release (MDS, in number of files)
  • We could compare: downloading a release while participating in the clustering (the normal IDiP dissemination) and requesting the download after the clusters have been computed
Scenario 1 (one source: only the Publisher in the indexing network):

Parameters | Values | Notes
INS | 1 | the Publisher
DNS | 1001 | 1000 Client peers and the Publisher
MDS | 5000 | one release

We evaluate the scalability of the system for the downloading functionality. We consider "flash-crowd" situations, in which a large number of peers request content concurrently within a short time interval; this case is critical for the system's performance.

Synchronizing a large number of client peers in a real test scenario is difficult, yet the size of the distribution network is the essential parameter in our evaluation. The Grid5000 framework offers several hundred nodes that are exclusively available for our experiment. (The network currently counts 1200 nodes in total, but because of limitations on node reservation we can scale our evaluation only to the order of hundreds of nodes.) Another assumption imposed by the Grid5000 test-bed is the fast 1 Gb/s link between the peers. This is not a typical value in real use cases, but we exploit it so that our evaluation highlights only the dissemination mechanisms.

The data units used in the experiment are the packages from the latest Cooker2007 release. With around 5000 packages in total, the size of the content is around 5 GB. We want our measurements to be linear in the number of data units. Therefore, in our experiments each package has a normalized size of 1 MB. (The package names correspond to those in the release, but the content of every package is normalized to the same fixed size.)

As described in the system's architecture, the downloading functionality is implemented by the IDiP module, using the Azureus implementation of BitTorrent. IDiP is based on efficient clustering algorithms, whose complete evaluation is presented in deliverable D4.2.2, and Azureus uses the BitTorrent algorithm for content dissemination. Both software modules contribute to the overall performance of the system. The goal of the current evaluation is to analyze the downloading functionality at a coarse-grained level, that of the system's functionalities.

We evaluate each module separately by limiting, in turn, one of the performance factors. We measure the download times obtained in two categories of experiments:

  • for a varying network size, using a single cluster for all the disseminated content: every peer always wants the same content;
  • for a fixed network size, each peer having a different download request: the data units are grouped in clusters of various sizes.
The size of the clusters is influenced by the partitioning of the data set. Each Client peer computes the list of packages to download according to its local content (as the delta between the content of the new release and the content of its local repository; see "System functionalities"). We call this list the wishlist; the Publisher peer uses it to compute the clusters. Depending on the size of the wishlists and the size of their intersections, we obtain different configurations of the clusters and, accordingly, of the commonly downloaded content.
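
The delta described above is a set difference. The sketch below is a simplified illustration, assuming that each package is identified by a single string (for example a name with its version) and that a package already present locally under the same identifier does not need to be downloaded; `WishlistDelta` is a hypothetical name, not part of the EDOS code.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: computes a Client peer's wishlist as a set difference. */
final class WishlistDelta {
    private WishlistDelta() {}

    /** Packages of the new release that are missing from the local repository. */
    static Set<String> wishlist(Set<String> newRelease, Set<String> localRepository) {
        Set<String> delta = new TreeSet<>(newRelease); // copy; sorted for stable output
        delta.removeAll(localRepository);              // drop what is already present locally
        return delta;
    }
}
```

The inputs are left unmodified, so the same release set can be reused to compute the wishlist of every Client peer.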

One way to model the composition of the wishlists for a large number of peers is to consider sets of packages of equal size. Each Client peer builds a wishlist of the same size, composed of two parts:

  • a distinct part, completely disjoint from all the other wishlists, and
  • a shared part, containing the same set of packages for all the wishlists.
We denote by SHS (Shared Size) the ratio between the shared part and the distinct part, and we use it as a parameter for the second category of our experiments.

The figure below illustrates the wishlists' composition for an example of 5 Client peers and a shared ratio SHS=30%.

[Figure: Clusters2.gif]
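
The wishlist model can also be sketched in code. The sketch makes two assumptions that are not in the original text: SHS is read as the percentage of each wishlist taken by the shared part (one reading of the definition that keeps SHS between 0% and 100%), and packages are represented by integer indices. The names `WishlistModel`, `build` and `totalPackages` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of equal-size wishlists with a shared and a distinct part. */
final class WishlistModel {
    private WishlistModel() {}

    /**
     * Builds one wishlist per client as a list of package indices.
     * Indices 0..shared-1 form the part common to all clients;
     * each client then receives its own block of private indices.
     */
    static List<List<Integer>> build(int clients, int wishlistSize, int sharedPercent) {
        if (sharedPercent < 0 || sharedPercent > 100) {
            throw new IllegalArgumentException("sharedPercent must be in 0..100");
        }
        int shared = wishlistSize * sharedPercent / 100; // integer division, rounds down
        int distinct = wishlistSize - shared;
        List<List<Integer>> wishlists = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            List<Integer> w = new ArrayList<>(wishlistSize);
            for (int i = 0; i < shared; i++) {
                w.add(i);                                // shared part
            }
            int base = shared + c * distinct;            // start of this client's private block
            for (int i = 0; i < distinct; i++) {
                w.add(base + i);                         // distinct part
            }
            wishlists.add(w);
        }
        return wishlists;
    }

    /** Number of different packages requested by all clients together. */
    static int totalPackages(int clients, int wishlistSize, int sharedPercent) {
        int shared = wishlistSize * sharedPercent / 100;
        return shared + clients * (wishlistSize - shared);
    }
}
```

For 5 clients, an illustrative wishlist size of 10 and a shared percentage of 30, each wishlist holds 3 shared and 7 private packages, and the clients request 3 + 5 × 7 = 38 different packages in total.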

Summarizing, the parameters used in the experiments are:

Parameter | Values | Notes
DNS | 10, 100, 200, 300 | Distribution Network Size
MDS | 1000, 2000,… 5000 | MetaData Size (number of packages of 1MB)
SHS | 0%..100% | Shared Size (for clustering evaluation)

Steps:

  • 1° Deploy a Publisher peer
  • 2° Deploy 1000 Client peers
  • 3° Run Publisher peer
  • 4° Run Client peers
  • 5° Publish release
    • wait for metadata publication
  • 6° Subscribe the Client peers to the new release
  • 7° Launch the clustering process on Publisher
    • start counter
  • 8° Publish the clusters
    • log the number of clusters
    • start counter
  • 9° Wait for the download process
    • stop the counter and check the logs

3.4 Download query results (off peak)

  • We measure: the time for downloading a set of query results
  • Main parameters:
    • Number/size of the files to download
    • Average number of replicas for the files to download
  • We could compare: download through HTTP (current method) and download through BitTorrent

3.5 Subscription

  • We measure: the time for distributing a channel event/message to all the subscribers, the Publisher's load (how?)
  • Main parameters: number of subscribers
  • We could compare: messages distributed by the Publisher (current method) and messages published in the DHT

4 Running the tests

4.1 Deployment scripts on SVN

4.2 Java test classes on SVN

5 Platforms for running the tests

Version 1.41 last modified by RaduPop on 29/05/2007 at 19:05

Attachments 3

  • metrics.log 1.1, posted by RaduPop on 23/04/2007 (4kb)
  • anteater.test 1.1, posted by RaduPop on 23/04/2007 (762 bytes)
  • Clusters2.gif 1.1, posted by RaduPop on 29/05/2007 (10kb)

Creator: StephaneLauriere on 2007/01/23 12:19
Copyright EDOS Consortium