Unit tests for all components
1 Roadmap
| Objective | Period | Time | Comment |
|---|---|---|---|
| Grid5000 first round of tests | 23 April - 6 May | 2 weeks | Publishing functionality |
| BDA paper | 7 May - 13 May | 1 week | Deadline: 14 May (Abstract: 7 May) |
| WP4 final deliverables & Grid5000 second round | 14 May - 10 June | 4 weeks | First version |
| OSS 2007 | 11 June - 14 June | 1 week | Slides (add measurements) |
| WP4 final deliverables | 18 June - 29 June | 2 weeks | Deadline: 29 June |
| Full paper | 1 May - 29 July | 13 weeks | Draft: 29 July |
| Final review | - | - | 6 September |
| Full paper | - | - | Final version: 30 September |
2 Tasks
| No | Task | Comment | Status |
|---|---|---|---|
| 1 | Open Grid5000 account | EDOS experiment | done |
| 2 | 64-bit Java | Grid5000 nodes have 64-bit processors | done |
| 3 | Metadata extraction | Grid5000 nodes use the Debian distribution | done |
| 4 | Deployment script | on SVN | done |
| 5 | Launching script | on SVN | in progress |
| 6 | Java stubs for method calls | use web service calls | in progress |
| 7 | Log4j config | log the measurements | done |
| 8 | Log retrieval script | collecting the results | done (sample) |
| 9 | Testing with Anteater | web service framework | done (sample) |
| 10 | Pastry params | _socket_buffer_size = 32768 (2^15); new value: 262144 (2^18) | testing |
3 Description of the scalability tests
The following tests address two main issues for EDOS P2P distribution:
- check scalability in terms of data/metadata size and of network size (number of peers);
- compare, for each functionality, the performance of the basic implementation with that of the improved one.
3.1 Publishing
- We measure: the time for publishing metadata
- Main parameters:
  - size of the indexing network (INS, total number of peers)
  - size of the distribution network (DNS, total number of peers taking part in the distribution process)
  - size of the published metadata (MDS, number of files)
- Other parameters: level of replication in the DHT
- We could compare: current publishing (key-by-key) and improved publishing (release-by-release)
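To make the comparison concrete, the sketch below counts the messages the Publisher would send under each strategy. It is a hypothetical model, not EDOS code: it assumes each key is stored on the first peer clockwise on a hash ring (a simplification of Pastry's numerically-closest-nodeId rule), that key-by-key publishing sends one message per key, and that release-by-release publishing batches all keys of a release that go to the same peer into one message. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical message-count model of the two publishing strategies (not EDOS code). */
class PublishingModel {

    /** Places peers "peer-0" .. "peer-(n-1)" on a hash ring (stand-in for Pastry nodeIds). */
    static TreeMap<Integer, String> buildRing(int peers) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (int i = 0; i < peers; i++) {
            String peer = "peer-" + i;
            ring.put(position(peer), peer);
        }
        return ring;
    }

    /** Responsible peer: first peer clockwise from the key (simplifies Pastry's closest-nodeId rule). */
    static String responsiblePeer(TreeMap<Integer, String> ring, String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue(); // wrap around the ring
    }

    /** Key-by-key publishing: one message per metadata key. */
    static int keyByKeyMessages(List<String> keys) {
        return keys.size();
    }

    /** Release-by-release publishing (assumed): keys batched into one message per responsible peer. */
    static int releaseByReleaseMessages(TreeMap<Integer, String> ring, List<String> keys) {
        Map<String, List<String>> batches = new HashMap<>();
        for (String key : keys) {
            batches.computeIfAbsent(responsiblePeer(ring, key), peer -> new ArrayList<>()).add(key);
        }
        return batches.size();
    }

    /** Non-negative ring position derived from String.hashCode(). */
    private static int position(String s) {
        int h = s.hashCode();
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }
}
```

With the 6000 keys and 11 peers of Scenario 1, this model gives 6000 messages key by key and at most 11 batched messages, which is why the key distribution across Mirrors is worth observing for the improved variant.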
Scenario 1 (restricted set: 10 Mirrors):
| Parameters | Values | Notes |
|---|---|---|
| INS | 11 | 10 Mirrors and the Publisher |
| DNS | 11 | 10 Mirrors and the Publisher |
| MDS | 6000 | one release |
- Client peers are not needed, since only the publishing functionality is tested
- purpose of a restricted set: observing the stability of the indexing network and the distribution of keys
- the application is stored in the /home directory (NFS-mounted on all the nodes) and then copied to each node's local /tmp directory
- results are gathered from the log files, which are written independently to each node's local /tmp directory
- 1° Deploy a Publisher peer
- 2° Deploy 10 Mirror peers
- 3° Run the Publisher peer
- 4° Run the Mirror peers
  - using shell scripts
  - check that all peers have joined the indexing network
- 5° Launch the publishing process on the Publisher
  - start the counter
- 6° Wait for the publishing process and the key distribution to complete
  - stop the counter and check the logs
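Since the logs are written separately on each node, the counter can be reconstructed after the run from the collected log lines. The sketch below assumes a hypothetical log format `<epochMillis> <peerId> <EVENT>`, with a `PUBLISH_START` event logged by the Publisher and a `KEYS_STORED` event logged by each Mirror; the real Log4j layout and event names may differ. It also assumes the node clocks are synchronized, otherwise timestamps from different nodes are not comparable.

```java
import java.util.List;

/** Hypothetical helper: publishing time from log lines gathered from every node's /tmp directory. */
class PublishTimer {

    /**
     * Each line is assumed to look like "<epochMillis> <peerId> <EVENT>".
     * Returns the time from the Publisher's PUBLISH_START to the last KEYS_STORED event
     * logged by any Mirror, or -1 if either event is missing.
     */
    static long publishingTimeMillis(List<String> logLines) {
        long start = -1;
        long lastStored = -1;
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) {
                continue; // skip lines that do not match the assumed format
            }
            long timestamp;
            try {
                timestamp = Long.parseLong(fields[0]);
            } catch (NumberFormatException e) {
                continue; // not a measurement line
            }
            String event = fields[2];
            if (event.equals("PUBLISH_START")) {
                start = (start < 0) ? timestamp : Math.min(start, timestamp);
            } else if (event.equals("KEYS_STORED")) {
                lastStored = Math.max(lastStored, timestamp);
            }
        }
        return (start < 0 || lastStored < 0) ? -1 : lastStored - start;
    }
}
```

The input lines can be read from the retrieved log files with `java.nio.file.Files.readAllLines`.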
Scenario 2 (large set: 100 Mirrors):
| Parameters | Values | Notes |
|---|---|---|
| INS | 101 | 100 Mirrors and the Publisher |
| DNS | 101 | 100 Mirrors and the Publisher |
| MDS | 3 x 6000 | three different releases |
- realistic usage scenario (number of mirrors in the current architecture: 50)
- focus on the load balancing of the mirrors: for the improved publishing (release-by-release) we monitor the key distribution
- compute the average publication time for:
  - different releases
  - key-by-key and release-by-release publication
- using more nodes than in the previous scenario might require deploying the experiment on more than one site (e.g. Orsay), and therefore synchronizing the repositories
- steps: similar to the previous scenario
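The two quantities watched in this scenario can be summarized as follows. This is a minimal sketch: the average is taken over the measured publication times of one configuration (e.g. one release in one publication mode), and the load-balancing metric is assumed to be the ratio between the most loaded Mirror's key count and the mean key count, one common choice among several. The class name is hypothetical.

```java
import java.util.Collection;
import java.util.Map;

/** Hypothetical summary statistics for Scenario 2 (not EDOS code). */
class PublishingStats {

    /** Mean of the measured publication times, in milliseconds. */
    static double average(Collection<Long> timesMillis) {
        if (timesMillis.isEmpty()) {
            throw new IllegalArgumentException("no measurements");
        }
        long sum = 0;
        for (long t : timesMillis) {
            sum += t;
        }
        return (double) sum / timesMillis.size();
    }

    /**
     * Load imbalance of the key distribution: max keys on one Mirror divided by the mean.
     * 1.0 means perfectly balanced; larger values indicate hot spots.
     */
    static double imbalance(Map<String, Integer> keysPerMirror) {
        if (keysPerMirror.isEmpty()) {
            throw new IllegalArgumentException("no mirrors");
        }
        int max = 0;
        long total = 0;
        for (int count : keysPerMirror.values()) {
            max = Math.max(max, count);
            total += count;
        }
        double mean = (double) total / keysPerMirror.size();
        return mean == 0 ? 1.0 : max / mean; // all-empty counts are treated as balanced
    }
}
```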
3.2 Query
- We measure: the time for executing a query
- Main parameters:
  - size of the indexing network (INS, total number of peers)
  - size of the distribution network (DNS, total number of peers taking part in the distribution process)
  - size of the published metadata (MDS, number of files)
- Other parameters: query size and type
  - queries on packages/utilities/collections (each addresses different metadata documents)
  - single/multiple projections
  - selection on frequent/rare tags or words
- We could compare: query execution without and with metadata replication
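The query types listed above combine into a small test matrix. The sketch below enumerates every combination of target, projection and selection, each run without and with metadata replication; the enum and class names are hypothetical labels for the categories in the list, not EDOS types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration of the query configurations to run (not EDOS code). */
class QueryMatrix {
    enum Target { PACKAGE, UTILITY, COLLECTION }
    enum Projection { SINGLE, MULTIPLE }
    enum Selectivity { FREQUENT, RARE }

    /** One query configuration: what is queried, how many fields are projected, and selectivity. */
    static final class QueryCase {
        final boolean replicated;
        final Target target;
        final Projection projection;
        final Selectivity selectivity;

        QueryCase(boolean replicated, Target target, Projection projection, Selectivity selectivity) {
            this.replicated = replicated;
            this.target = target;
            this.projection = projection;
            this.selectivity = selectivity;
        }

        @Override
        public String toString() {
            return target + "/" + projection + "/" + selectivity
                    + (replicated ? "/replicated" : "/not-replicated");
        }
    }

    /** All combinations: 2 replication settings x 3 targets x 2 projections x 2 selectivities. */
    static List<QueryCase> allCases() {
        List<QueryCase> cases = new ArrayList<>();
        for (boolean replicated : new boolean[] {false, true}) {
            for (Target t : Target.values()) {
                for (Projection p : Projection.values()) {
                    for (Selectivity s : Selectivity.values()) {
                        cases.add(new QueryCase(replicated, t, p, s));
                    }
                }
            }
        }
        return cases;
    }
}
```

This yields 24 configurations, which gives an idea of how many runs a full comparison requires.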
| Parameters | Values | Notes |
|---|---|---|
| INS | 101 | 100 Mirrors and the Publisher |
| DNS | 102 | One Client, 100 Mirrors and the Publisher |
| MDS | 6000 | one release |
- 1° Deploy a Publisher peer
- 2° Deploy 100 Mirror peers and a Client
- 3° Run the Publisher peer
- 4° Run the Mirror peers and the Client
  - check that all peers have joined the indexing network
- 5° Publish the release on the Publisher
- 6° Build the query and send it from the Client peer
  - start the counter
- 7° Wait for the result
  - check the logs
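On the Client side, the counter of steps 6 and 7 can be measured around the call that sends the query. In the sketch below, `QueryService` is a hypothetical stand-in for the Client's web-service stub (task 6), not the EDOS API; `System.nanoTime` is used because it is a monotonic clock, unaffected by wall-clock adjustments during the run.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the Client's web-service stub; not the EDOS API. */
interface QueryService {
    List<String> execute(String query);
}

/** Times one query from the moment it is sent until the result is received. */
class QueryTimer {

    /** The returned metadata documents and the elapsed time of the query. */
    static final class TimedResult {
        final List<String> documents;
        final long elapsedMillis;

        TimedResult(List<String> documents, long elapsedMillis) {
            this.documents = documents;
            this.elapsedMillis = elapsedMillis;
        }
    }

    static TimedResult timeQuery(QueryService service, String query) {
        long start = System.nanoTime();                  // start the counter (step 6)
        List<String> documents = service.execute(query); // blocks until the result arrives (step 7)
        long elapsedNanos = System.nanoTime() - start;
        return new TimedResult(documents, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }
}
```

The measured value can then be written through Log4j so that it is collected with the other logs.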
3.3 Download with IDiP (flash crowd)
- We measure: the time for downloading a release
- Main parameters:
  - size of the indexing network (INS)
  - number of clients that download the release simultaneously (DNS, distribution network size: total number of peers in the distribution process)
  - number of clusters
  - size of the release (MDS, number of files)
- We could compare: downloading the release while participating in the clustering (normal IDiP dissemination) and requesting the download after the clusters have been computed
Scenario 1 (one source: only the Publisher in the indexing network):
| Parameters | Values | Notes |
|---|---|---|
| INS | 1 | the Publisher |
| DNS | 1001 | 1000 Client peers and the Publisher |
| MDS | 5000 | one release |
Two kinds of experiments are considered:
- varying the network size, with a single cluster for all the disseminated content: every peer always wants the same content;
- for a fixed network size, with a different download request for each peer: the data units are grouped into clusters of various sizes. Each wishlist then has two parts:
  - a distinct part, completely disjoint from all the other wishlists, and
  - a shared part, containing the same set of packages for all the wishlists.
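The wishlists can be generated from DNS, MDS and the shared fraction SHS. The sketch below is one possible construction under a stated assumption: SHS is taken as a fraction of the release (MDS packages numbered 0..MDS-1), so the first round(SHS × MDS) packages form the shared part wanted by every peer, and the remaining packages are dealt round-robin into disjoint distinct parts, one per peer. With SHS = 100% this reduces to the single-cluster case (every peer wants the whole release). The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wishlist generator for the clustering experiments (not EDOS code). */
class WishlistGenerator {

    /**
     * dns: number of Client peers; mds: number of packages in the release;
     * shs: shared fraction in [0, 1]. Returns one wishlist (package numbers) per peer.
     */
    static List<List<Integer>> generate(int dns, int mds, double shs) {
        if (dns <= 0 || mds < 0 || shs < 0.0 || shs > 1.0) {
            throw new IllegalArgumentException("invalid parameters");
        }
        int shared = (int) Math.round(shs * mds);
        List<List<Integer>> wishlists = new ArrayList<>();
        for (int peer = 0; peer < dns; peer++) {
            List<Integer> wishlist = new ArrayList<>();
            for (int pkg = 0; pkg < shared; pkg++) {
                wishlist.add(pkg); // shared part: identical for all peers
            }
            wishlists.add(wishlist);
        }
        for (int pkg = shared; pkg < mds; pkg++) {
            // distinct part: each remaining package goes to exactly one peer
            wishlists.get((pkg - shared) % dns).add(pkg);
        }
        return wishlists;
    }
}
```

For example, 10 peers, 1000 packages and SHS = 20% give 200 shared packages plus 80 distinct ones per peer.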
In summary, the parameters used in the experiments are:
| Parameter | Values | Notes |
|---|---|---|
| DNS | 10, 100, 200, 300 | Distribution Network Size |
| MDS | 1000, 2000, …, 5000 | Metadata size (number of 1 MB packages) |
| SHS | 0%..100% | Shared Size (for clustering evaluation) |
- 1° Deploy a Publisher peer
- 2° Deploy 1000 Client peers
- 3° Run the Publisher peer
- 4° Run the Client peers
- 5° Publish the release
  - wait for the metadata publication
- 6° Subscribe the Client peers to the new release
- 7° Launch the clustering process on the Publisher
  - start the counter
- 8° Publish the clusters
  - log the number of clusters
  - start the counter
- 9° Wait for the download process to complete
  - stop the counter and check the logs
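Steps 7 to 9 start two counters, which suggests measuring the clustering phase and the download phase separately. The sketch below assumes that reading: clustering time runs from the launch of clustering to the publication of the clusters, and the flash-crowd download time runs from the publication of the clusters until the last Client finishes; it also reports the mean per-Client download time. Timestamps are assumed to be epoch milliseconds from synchronized clocks, and the class name is hypothetical.

```java
import java.util.Collection;

/** Hypothetical split of the IDiP measurement into its two timed phases (not EDOS code). */
class FlashCrowdTimes {
    final long clusteringMillis;
    final long downloadMillis;
    final double averageDownloadMillis;

    FlashCrowdTimes(long clusteringMillis, long downloadMillis, double averageDownloadMillis) {
        this.clusteringMillis = clusteringMillis;
        this.downloadMillis = downloadMillis;
        this.averageDownloadMillis = averageDownloadMillis;
    }

    /**
     * clusteringStart: clustering launched on the Publisher (step 7);
     * clustersPublished: clusters published (step 8);
     * clientCompletions: time at which each Client finished its download (step 9).
     */
    static FlashCrowdTimes compute(long clusteringStart, long clustersPublished,
                                   Collection<Long> clientCompletions) {
        if (clientCompletions.isEmpty()) {
            throw new IllegalArgumentException("no client finished");
        }
        long lastCompletion = Long.MIN_VALUE;
        long sumDownload = 0;
        for (long t : clientCompletions) {
            lastCompletion = Math.max(lastCompletion, t);
            sumDownload += t - clustersPublished;
        }
        return new FlashCrowdTimes(
                clustersPublished - clusteringStart,   // first counter
                lastCompletion - clustersPublished,    // second counter: whole flash crowd
                (double) sumDownload / clientCompletions.size());
    }
}
```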
3.4 Download query results (off peak)
- We measure: the time for downloading a set of query results
- Main parameters:
  - number/size of the files to download
  - average number of replicas of the files to download
- We could compare: download through HTTP (current method) and download through BitTorrent
3.5 Subscription
- We measure: the time for distributing a channel event/message to all the subscribers, and the Publisher's load (how to measure it is still to be defined)
- Main parameters: number of subscribers
- We could compare: messages distributed by the Publisher (current method) and messages published in the DHT
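One possible way to quantify the Publisher's load is to count the notification messages the Publisher itself sends per channel event. The sketch below is a simple model under stated assumptions: in the current method the Publisher sends one message to each subscriber; in the DHT-based variant it sends a single message into the DHT and other peers forward it along a dissemination tree with a fixed fan-out k, so the number of forwarding levels is ceil(log_k N). The fan-out and class name are hypothetical.

```java
/** Hypothetical load model for channel-event distribution (not EDOS code). */
class SubscriptionLoadModel {

    /** Current method: the Publisher sends the event directly to every subscriber. */
    static int publisherMessagesDirect(int subscribers) {
        return subscribers;
    }

    /** DHT variant (assumed): the Publisher sends one message; other peers forward it. */
    static int publisherMessagesDht(int subscribers) {
        return subscribers > 0 ? 1 : 0;
    }

    /** Smallest d with fanOut^d >= subscribers, i.e. ceil(log_fanOut(subscribers)). */
    static int treeDepth(int subscribers, int fanOut) {
        if (fanOut < 2) {
            throw new IllegalArgumentException("fan-out must be at least 2");
        }
        int depth = 0;
        long capacity = 1; // number of leaves reachable with the current depth
        while (capacity < subscribers) {
            capacity *= fanOut;
            depth++;
        }
        return depth;
    }
}
```

Under this model the Publisher's outgoing messages drop from N to 1, at the cost of a distribution delay that grows with the tree depth rather than with N.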
4 Running the tests
4.1 Deployment scripts on SVN
4.2 Java test classes on SVN
5 Platforms for running the tests
- Grid5000: see also WP4_Large Scale Simulator
- see also Tools For P2p Network Simulation
Version 1.41 last modified by RaduPop on 29/05/2007 at 19:05