
Transfer speed does not improve with multiple nodes. #5083

Closed
Juelun opened this issue Jun 6, 2018 · 9 comments

@Juelun

commented Jun 6, 2018

I built a local network of four nodes connected by a router. Each node runs Ubuntu 16.04.

I used the iftop tool to monitor network traffic on each node.
First, I ran ipfs daemon on node A and added a 644 MB file to IPFS there.
Then I fetched the file on each node in turn (node B through node D) with the ipfs get command.

When I ran ipfs daemon on node B and fetched the file there, it took 26 seconds.
Node B network traffic: TX 11.5 MB, RX 655 MB

Next, I ran ipfs daemon on node C and downloaded the file there. It took 20 seconds.
Node C network traffic: TX 20.5 MB, RX 1.28 GB

Finally, I ran ipfs daemon on node D and downloaded the file there. It took 35 seconds.
Node D network traffic: TX 35.5 MB, RX 1.94 GB

The experiment suggests that if a file exists on multiple nodes and an empty node tries to get it, that node downloads the file from multiple nodes with duplicated data. The download speed does not improve with more nodes and can even be worse than a two-point transfer.

Originally, I assumed an ipfs node would fetch different parts of the same file from different nodes to improve speed. But my experiment shows that an ipfs node tries to get the entire file from every node it is connected to. Is IPFS simply designed that way, or is there a way to make the download speed scale with the number of nodes?

@ivan386

Contributor

commented Jun 6, 2018

Try pinning the file on the other nodes. When you use get, ipfs downloads blocks one by one.

@Juelun

Author

commented Jun 7, 2018

Pinning the file on the other nodes just keeps the content from being deleted by garbage collection. It does not address my concern. I have tried it, and it did not improve the download speed either.
My point is that if a file is stored on three nodes (A, B, C) and I try to download it on node D, the download is no faster than a one-to-one transfer, and can even be slower.
I used the iftop tool to monitor the network traffic on each node. The outgoing traffic from each of A, B, and C was almost equal to the entire file size, and the incoming traffic on node D was almost three times the file size.
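The duplication can be made explicit with a little arithmetic. A minimal Go sketch, using the rough iftop figures reported above (644 MB file; received bytes approximated from the readings, so the factors are only approximate):

```go
package main

import "fmt"

// dupFactor estimates how many full copies of the file crossed the wire,
// given the received bytes and the file size (both in MB).
func dupFactor(rxMB, fileMB float64) float64 {
	return rxMB / fileMB
}

func main() {
	const fileMB = 644.0 // size of the test file
	cases := []struct {
		node string
		rxMB float64 // approximate iftop RX reading
	}{
		{"B (1 seed)", 655},
		{"C (2 seeds)", 1280},
		{"D (3 seeds)", 1940},
	}
	for _, c := range cases {
		fmt.Printf("%s: received ~%.1fx the file size\n", c.node, dupFactor(c.rxMB, fileMB))
	}
}
```

So the received data grows roughly linearly with the number of seeding peers, which is consistent with each seed sending a near-complete copy.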

@ivan386

Contributor

commented Jun 9, 2018

https://github.com/ipfs/go-ipfs/blob/f43811a750cc65626d4c5169da2bc455010bbd8d/exchange/bitswap/message/pb/message.proto

It is a protocol problem. IPFS asks all nodes for the same block and can't cancel a block transfer until it has read it in full. Once other nodes have started sending it, they can't stop; they can only break the connection.

The bitswap protocol needs to be improved to allow transferring parts of blocks. BitTorrent allows parts of at most 16 KiB in a single message.

http://bittorrent.org/beps/bep_0003.html

@Stebalien

Contributor

commented Jun 11, 2018

Duplicate of #3802

Basically, we need to ask different nodes for different blocks.


@ivan386 that's unlikely to be a good solution. Blocks are usually at most 256KiB, so requesting parts of a single block from multiple parties is unlikely to help much. Worse, we can't verify partial blocks, so if some set of peers (possibly controlled by a single attacker) sends us a bunch of disagreeing partial blocks, we'll have a combinatorial problem trying to figure out which combination of these parts, when concatenated, yields the correct hash.

@ivan386

Contributor

commented Jun 14, 2018

@Stebalien
256KiB is 16 × 16KiB. That's up to 16 peers for a speedup, and 16 times less duplicate data from other peers.

In my protocol update (bitswap 1.2.0) I add confirmation of block sending. The offset and length of a block part make it possible to reassemble the block. Tree hashes can be used to verify parts of a block if needed.

Right now, nodes can flood a peer with blocks of up to two megabytes, and the peer can't tell a block is useless until it has downloaded it completely.

@Stebalien

Contributor

commented Jun 14, 2018

Right now, nodes can flood a peer with blocks of up to two megabytes, and the peer can't tell a block is useless until it has downloaded it completely.

That's correct. That's why we set a limit at 2MiB. We would have gone smaller but we didn't want to add a bunch of round trips.

Tree hashes can be used to verify parts of a block if needed.

That's effectively what we already have with our merkle trees (just with 2MiB blocks). If/when we add support for a tree-hash based multihash, we can start verifying blocks while downloading them.

However, I still doubt that downloading a single block from multiple peers will help in most cases.

That's up to 16 peers for a speedup.

For tiny files (<256KiB), fetching them from multiple peers won't yield that much speedup. Most of the overhead will be in round trips. Really, with files this small, you'd want to ask multiple peers for the same chunk to avoid additional round trips.

And 16 times less duplicate data from other peers.

We can achieve the same result by downloading different blocks from different peers and by not asking multiple peers for the same block.


The question at hand is really: do we parallelize downloads of single blocks or across all blocks in a file? I highly doubt that parallelizing a download of a single 256KiB chunk will help much and it will certainly add a significant amount of complexity.
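The per-block alternative can be sketched roughly. This is a hypothetical Go helper for illustration only (`assignBlocks` is not a real bitswap function; real bitswap broadcasts a shared wantlist instead): it spreads a file's block CIDs across connected peers round-robin, so no block is requested from more than one peer:

```go
package main

import "fmt"

// assignBlocks spreads a file's block CIDs across the connected peers
// round-robin, so each block is requested from exactly one peer and no
// duplicate data is transferred. (Illustrative only; not the real API.)
func assignBlocks(cids, peers []string) map[string][]string {
	want := make(map[string][]string, len(peers))
	for i, c := range cids {
		p := peers[i%len(peers)]
		want[p] = append(want[p], c)
	}
	return want
}

func main() {
	// Hypothetical CID placeholders for a 6-block file and 3 seed peers.
	cids := []string{"cid1", "cid2", "cid3", "cid4", "cid5", "cid6"}
	peers := []string{"A", "B", "C"}
	want := assignBlocks(cids, peers)
	for _, p := range peers {
		fmt.Println(p, "->", want[p])
	}
}
```

A real scheduler would also need to rebalance when a peer stalls or turns out not to have a block, which is where the complexity trade-off discussed here comes in.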

@ivan386

Contributor

commented Jun 14, 2018

Small parts allow control over block downloading. A download can be canceled at any time between 16 KiB parts.

We can achieve the same result by downloading different blocks from different peers and by not asking multiple peers for the same block.

We need to know that they have those blocks. Right now bitswap does not allow that; we can wait for a block forever.

That's why I add a haveBlocks message in bitswap 1.2.0. It contains the block size and lets us know which peers have the block we want. We can then choose a peer (or several peers, if the block is big) to download it from. If some old-protocol peer sends the full block faster than we get it in parts from the others, we can simply not ask (sendBlock) for the next parts.

do we parallelize downloads of single blocks or across all blocks in a file?

We can do both. If we need one big block, it can be downloaded from many peers. If we need many small blocks, we ask different peers for different blocks. Or both at once, for many big blocks.

bitswap 1.2.0 can be implemented in steps.

The first step is the "have" and "send" messages. "Have" will contain the block CID and block size; "send" needs only the CID. The block will be sent in full, but only by the peer we asked. The full CID in the "part" message lets us send a cancel message before we have downloaded the full block.

The second step is allowing downloads of parts of a block. "Send" will contain the CID and the range (offset and length) of the part we ask for. The block message will contain a "part" message with the CID and offset of the part. Then we can assemble the block from its parts and check its hash. In the worst case we download the full block from every peer.

Then implement a tree multihash. This allows checking individual parts of a block. The hash set can be fetched from any one peer that has the block.

After that, blocks can be bigger. They will be independent of the transport layer.

@Stebalien

Contributor

commented Jun 14, 2018

We can and plan on doing all of this with 256KiB blocks. What's the concrete improvement here other than "smaller pieces"?

Remember, our merkledag approach is equivalent to a tree hash with larger chunks. That is, both systems give us a tree of hashes with chunks at the bottom. Our system has chunks that are at most 2MiB in size but usually <=256KiB, yours has 16KiB chunks.

@ivan386

Contributor

commented Jun 15, 2018

  1. Less data between messages. Currently, until a peer finishes sending a full big block (or blocks), it can't send any other messages (want, cancel).
  2. A block download can be canceled without breaking the connection, e.g. if we are already downloading the block from another peer.
  3. Different parts can be requested from different peers. If some peer gets stuck, we can ask another peer for that part, without duplicating the full block or relying on timeouts.
  4. We can try UDP as a transport.

Only "pin" can effectively download many blocks at a time; for pin, the block download order doesn't matter.

"Get" and "cat" download blocks one by one. As they walk down the block tree, they know only the next block's hash. At the bottom they download 10 blocks ahead, but the blocks have a priority order and peers send them in that order. Either we get many duplicate blocks from peers, or we get stuck waiting for the first block from some peer. Without the first block the others are useless, and the download stalls.

https://github.com/ipfs/go-ipfs/blob/dbaeb8449976d5387f27138b0dd272ec9eef3593/unixfs/io/pbdagreader.go#L69L81
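The lookahead behaviour described above can be modeled with a short Go sketch. This is a simplification, not the actual pbdagreader code: leaf CIDs become known in order, up to a fixed number of fetches are outstanding at once, and results are kept in their original positions so the caller reads them strictly in order (meaning one slow peer still stalls the read):

```go
package main

import (
	"fmt"
	"sync"
)

const lookahead = 10 // number of blocks fetched ahead of the read position

// fetchSequential fetches the given CIDs with at most `lookahead`
// concurrent requests, preserving the original order of results so a
// sequential reader can consume them front to back.
func fetchSequential(cids []string, fetch func(string) []byte) [][]byte {
	sem := make(chan struct{}, lookahead) // bounds outstanding fetches
	results := make([][]byte, len(cids))
	var wg sync.WaitGroup
	for i, c := range cids {
		sem <- struct{}{} // block when the lookahead window is full
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			results[i] = fetch(c)
			<-sem
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	// Hypothetical CIDs and a stub fetcher standing in for bitswap.
	cids := []string{"cid1", "cid2", "cid3"}
	blocks := fetchSequential(cids, func(c string) []byte {
		return []byte("data for " + c)
	})
	fmt.Println(len(blocks), "blocks, first:", string(blocks[0]))
}
```

The sketch shows why ordering matters: however many fetches run in parallel, the consumer can make no progress until position 0 arrives.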
