Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
Bitswap Sessions #3786
Bitswap is pretty inefficient right now. There is a lot we can do to make it better (i would say it would be hard to make it worse, but i know for a fact that statement is false). I've spent a lot of time thinking of ways forward, and have come to a decent (in my opinion) 'next step'.
This proposal is to add the concept of sessions to bitswap. The idea expands on an existing optimization we do, where we only find providers for the first block in a
When starting a request for a portion of a graph (
Doing things this way will allow us to not spam uninterested bitswap peers with our wantlist updates, and also allow us to reduce duplicate blocks by splitting up request amongst peers we know are likely to respond.
Feedback and requests for elaboration will be much appreciated.
If we already assume that the "active set" peers likely have all children of the wanted root, couldn't we tell them to "consider all children of $hash wanted by me"? This would be a super light wantlist update :)
And in case they don't have all children of the wanted root, we can adjust reactively just like you're saying. If we don't receive any more wanted blocks from the session (all children considered wanted), then we remove it from the active set, and try
This was referenced
Mar 20, 2017
@btc It depends on the scenario, There are really two main scalability issues that i'm looking to address here. The first is duplicate blocks, which are a pretty bad problem when you are fetching content that multiple peers in the network have. The second problem is that of broadcasting every wantlist update to every connected peer, that bit gets pretty expensive when fetching large items, even from a single other peer.
My proposal (and WIP implementation in #3867 ) reduce both issues, at a small cost to the usecase of fetching a single large graph from a disparate set of peers.
I think this is a great idea which should reduce the current overhead issue by leaps-and-bounds.
One improvement I'd make would be to gossip more. It should generally be possible to gossip without breaking anything as adding fields to protocol buffers shouldn't break anything (and gossip should always be safe to ignore).
One way to (sort-of) do this without sending blocks up-front would be to gossip one of:
Bitswap would then make sure to ask peers that have already indicated that they have a block before spamming the rest of the peers in the session. This doesn't help pipeline deep and narrow trees as IPLD selectors would but it should help further reduce the number of unnecessary requests.
Another improvement I'd try would be to add in a bit of cross-session gossip. Basically, whenever one talks to a bitswap peer, disregarding sessions, gossip a compressed wantlist (e.g., truncated hashes of the oldest N blocks on our want-list) and the set of blocks we have that we think they want (based on the last compressed wantlist they sent us). If we tune the parameters correctly, this may help ensure we have some potential peers lined up in case we can't find a block within a session (hopefully without introducing too much overhead). However, this could be rather tricky to get right.
Unrelated to gossip, we may also want to (persistently?) cache some of this session information to speed up future requests. For example, given:
If I receive
@Stebalien Thanks for the feedback!
on gossip, I really like the idea, though implementing this would be difficult. The issue is that checking 'do i have all the objects in this graph?' isnt cheap, and its hard to cache as it ends up being a lot of data. A simpler tactic (related to your suggestions) would be to simply list the indexes of the links of this object that you also have. you could even extend this to do your third suggestion by doing index paths. This would work because the set of all links for a given node is deterministically ordered.
The compressed wantlist idea is interesting, though sounds like it would get complicated fast. I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks), spreading the wantlist more generally ups that complexity quite a bit.
The compressed wantlist idea is about telling peers about what we may want from them, not what we want right now, so that they can gossip what they have do have in case we we want it in the future; we wouldn't expect them to send the actual blocks up-front. Basically, the general idea of all this gossiping is to learn of where to find what we may want in the near future.
That is, we gossip:
This way, when we go to fetch a block, we will likely already have a list of candidate peers.
Note: By gossip, I really do mean gossip. This shouldn't trigger any additional background messages, just some extra bytes tacked onto existing messages.
referenced this issue
Jan 17, 2018
Hi guys, I just want to know if bitswap session cloud solve my problem?
From the git log I'm afraid not. But this issue is opened, so...
So do I understand this correctly that at the moment, when some piece of data is available on a lot of peers, that actually makes things worse because the poor requester will be flooded with answers from all the providers? This would seem to be a serious problem. Any recent work on this?