Transfer speed does not improve with multiple nodes. #5083
I used the iftop tool to monitor network traffic on each node.
When I ran the ipfs daemon on node B and fetched the file there, it took 26 seconds.
Then I ran the ipfs daemon on node C and downloaded the file on node C; it took 20 seconds.
Finally, I ran the ipfs daemon on node D and downloaded the file on node D; it took 35 seconds.
The experiment suggests that when a file exists on multiple nodes and an empty node tries to fetch it, the empty node downloads the file from multiple nodes with replicated data. The download speed does not improve with more nodes, and can even be worse than a two-point transfer.
My original assumption was that an ipfs node would fetch different parts of the same file from different nodes to improve speed. But my experiment showed that an ipfs node tries to get the entire file from every node it is connected to. Is IPFS just designed that way? Or is there a way to make download speed scale up with the number of nodes?
It's a protocol problem. IPFS asks all nodes for the same block and can't cancel a block transfer until it has read the block in full. Once the other nodes start sending it, they can't stop; they can only break the connection.
The bitswap protocol needs to be improved to allow transferring blocks in parts. BitTorrent allows at most 16KiB per piece in one message.
Duplicate of #3802
Basically, we need to ask different nodes for different blocks.
@ivan386 that's unlikely to be a good solution. Blocks are usually at most 256KiB, so requesting parts of a single block from multiple parties is unlikely to help much. Worse, we can't verify partial blocks, so if some set of peers (possibly controlled by a single attacker) sends us a bunch of disagreeing partial blocks, we'll have a combinatorial problem trying to figure out which combination of these parts, when concatenated, yields the correct hash.
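A toy Python sketch of that combinatorial problem (the peers, part values, and part count are all made up): with k peers disagreeing about n unverifiable parts, up to k^n concatenations may have to be hashed before finding the one that matches the block's hash.

```python
import hashlib
from itertools import product

# The true block content, known to us only via its hash.
true_parts = [b"aa", b"bb", b"cc", b"dd"]
want_hash = hashlib.sha256(b"".join(true_parts)).hexdigest()

# Three peers answer every part request; none of the individual part
# replies can be verified on its own. Honest values are placed last to
# show the worst case.
replies = {
    0: [b"xx", b"ww", b"aa"],
    1: [b"yy", b"vv", b"bb"],
    2: [b"zz", b"uu", b"cc"],
    3: [b"qq", b"tt", b"dd"],
}

tried = 0
found = None
for combo in product(*(replies[i] for i in range(4))):  # 3^4 = 81 combos
    tried += 1
    if hashlib.sha256(b"".join(combo)).hexdigest() == want_hash:
        found = combo
        break

print(tried)  # 81: every combination had to be hashed
```

With real numbers (more peers, more parts per block), the search space grows exponentially, which is why unverifiable partial blocks are a denial-of-service vector.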
Right now, nodes can flood a peer with two-megabyte blocks, and the peer won't be able to tell that a block is useless until it has loaded it completely.
That's correct. That's why we set a limit at 2MiB. We would have gone smaller but we didn't want to add a bunch of round trips.
That's effectively what we already have with our merkle trees (just with 2MiB blocks). If/when we add support for a tree-hash based multihash, we can start verifying blocks while downloading them.
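A minimal Python sketch of that idea (the tree construction and piece size here are illustrative, not any real multihash spec): with a Merkle tree over small pieces, each piece can be checked as it arrives instead of only after the whole block is buffered.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold leaf hashes pairwise up to a single root hash."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last hash on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# A block split into small pieces (toy-sized here).
pieces = [b"piece-%d" % i for i in range(8)]
leaves = [h(p) for p in pieces]
root = merkle_root(leaves)  # what a tree-hash multihash would commit to

# As each piece arrives, verify it immediately against its leaf hash
# instead of waiting for the full block; a bad piece lets us cancel
# that peer early.
for i, incoming in enumerate(pieces):
    assert h(incoming) == leaves[i], "bad piece, cancel this peer early"
print("all pieces verified incrementally")
```

In practice the leaf hashes themselves would have to be authenticated against the root (e.g. via Merkle proofs); this sketch hands them over out of band.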
However, I still doubt that downloading a single block from multiple peers will help in most cases.
For tiny files (<256KiB), fetching them from multiple peers won't yield much speedup. Most of the overhead will be in round trips. Really, with files this small, you'd want to ask multiple peers for the same chunk to avoid additional round trips.
We can achieve the same result by downloading different blocks from different peers and by not asking multiple peers for the same block.
The question at hand is really: do we parallelize downloads of single blocks or across all blocks in a file? I highly doubt that parallelizing a download of a single 256KiB chunk will help much and it will certainly add a significant amount of complexity.
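Some back-of-the-envelope arithmetic (the RTT and bandwidth figures are assumptions, not measurements) shows why latency bounds the gain from splitting one small chunk across peers:

```python
# Assumed numbers: 50 ms RTT, 50 Mbit/s usable bandwidth per peer.
rtt = 0.050                  # seconds
bw = 50e6 / 8                # bytes/second per peer
block = 256 * 1024           # one 256 KiB chunk

# One peer sends the whole chunk: one round trip + transfer time.
single = rtt + block / bw

# Four peers each send a quarter: transfer is 4x faster, but each part
# request still pays (at least) one round trip of latency.
parallel = rtt + (block / 4) / bw

print(f"single peer: {single * 1000:.1f} ms")   # ~92 ms
print(f"four peers:  {parallel * 1000:.1f} ms") # ~60 ms, nowhere near 4x faster
```

The round trip is a fixed floor, so the speedup stays well under 2x even with 4 peers; parallelizing across the many blocks of a large file lets those round trips be pipelined instead.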
Small parts give control over block downloading. A download can be canceled at any time between 16KiB parts.
We also need to know that peers have the blocks we want. Right now bitswap doesn't allow that; we can wait for a block forever.
That's why I added a haveBlocks message in bitswap 1.2.0. It contains the block size and lets us know which peers have the block we want. We can then choose a peer (or several peers, if the block is big) to download it from. If some peer on the old protocol sends the full block faster than we can get it in parts from the others, we can simply stop asking (sendBlock) for further parts.
We can do both. If we need one big block, it can be downloaded from many peers. If we need many small blocks, we ask different peers for different blocks. Or both, for many big blocks.
Bitswap 1.2.0 can be implemented in steps.
The first step is the "have" and "send" messages. "Have" will contain the block CID and block size; "send" needs only the CID. The block will be sent in full, but only by the peer we asked. The full CID in the "part" message allows sending a cancel message before we download the full block.
The second step is allowing parts of a block to be downloaded. "Send" will contain the CID and the range (offset and length) of the part we are asking for. The block message will contain a "part" message carrying the CID and the offset of the part. Then we can assemble the block from parts and check its hash. In the worst case we download the full block from each peer.
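The two steps above might look roughly like this in a Python sketch (message and field names are hypothetical, not the actual bitswap wire format; a hex SHA-256 stands in for the CID):

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Have:      # step 1: "I have this block"
    cid: str
    size: int

@dataclass
class Send:      # step 2: "send me this byte range of the block"
    cid: str
    offset: int
    length: int

@dataclass
class Part:      # reply: one piece of the block
    cid: str
    offset: int
    data: bytes

def assemble(parts, size: int, cid: str) -> bytes:
    """Reassemble a block from parts and check its hash; the 'worst
    case' in the thread is refetching the full block if this fails."""
    buf = bytearray(size)
    for p in parts:
        buf[p.offset:p.offset + len(p.data)] = p.data
    block = bytes(buf)
    if hashlib.sha256(block).hexdigest() != cid:  # toy CID = hex sha256
        raise ValueError("assembled block does not match its CID")
    return block

# Two peers each serve half of one block.
block = b"A" * 100 + b"B" * 100
cid = hashlib.sha256(block).hexdigest()
parts = [Part(cid, 0, block[:100]), Part(cid, 100, block[100:])]
print(assemble(parts, len(block), cid) == block)  # True
```

Until the tree multihash of the next step exists, the hash check at the end is the only verification point, so a single lying peer still forces a refetch of the whole block.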
Then implement the tree multihash. This allows checking parts of a block. The hash set can be fetched from a single peer that has the block.
After that, blocks can be bigger. They will be independent of the transport layer.
We can and plan on doing all of this with 256KiB blocks. What's the concrete improvement here other than "smaller pieces"?
Remember, our merkledag approach is equivalent to a tree hash with larger chunks. That is, both systems give us a tree of hashes with chunks at the bottom. Our system has chunks that are at most 2MiB in size but usually <=256KiB, yours has 16KiB chunks.
Only "pin" can effectively download many blocks at a time; for pin, block download order doesn't matter.
"Get" and "cat" download blocks one by one. As they walk down the block tree, they know only the next block's hash. At the bottom they download 10 blocks ahead, but the blocks have a priority order and peers send them in that order. We either get many duplicate blocks from peers, or we get stuck waiting for the first block from some peer. Without the first block the others are useless and the download stalls.
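A toy model of that duplicate-block cost (peer names, window size, and the round-robin policy are all illustrative), compared against assigning each wanted block to a single peer:

```python
# "get"/"cat" broadcast the same small window of wanted blocks to every
# connected peer, so each block can arrive once per peer.
peers = ["B", "C", "D"]
window = [f"block-{i}" for i in range(10)]  # 10 blocks fetched ahead

# Broadcast: every peer sends every block in the window.
broadcast_received = len(peers) * len(window)            # 30 arrivals
broadcast_duplicates = broadcast_received - len(window)  # 20 wasted

# Alternative: round-robin each wanted block to exactly one peer.
assignment = {b: peers[i % len(peers)] for i, b in enumerate(window)}
partitioned_received = len(assignment)                   # 10 arrivals

print(broadcast_duplicates, partitioned_received)  # 20 10
```

The trade-off is exactly the stall described above: with one peer per block, a slow or dead peer blocks progress on its assigned blocks until the request is reassigned, whereas broadcasting wastes bandwidth but never waits on a single peer.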