
Bitswap Sessions #3786

Closed
whyrusleeping opened this issue Mar 15, 2017 · 15 comments

@whyrusleeping (Member) commented Mar 15, 2017

Bitswap is pretty inefficient right now. There is a lot we can do to make it better (I would say it would be hard to make it worse, but I know for a fact that statement is false). I've spent a lot of time thinking of ways forward, and have come to a decent (in my opinion) 'next step'.

This proposal is to add the concept of sessions to bitswap. The idea expands on an existing optimization we do, where we only find providers for the first block in a GetMany request, and make more FindProviders calls later if nobody is sending us blocks.

When starting a request for a portion of a graph (ipfs refs -r, ipfs pin, ipfs cat, etc.) we would create a new session with the hash in question as the session 'root'. Bitswap would then broadcast that ref to every bitswap partner. Any node that responds with the block (even if it's a duplicate) gets added to that session's 'active set'. Further hashes requested that are children of the session root will be broadcast just to those peers (we can even split up requests amongst peers). If a requested hash isn't received in a given time frame, we start a FindProviders call for that hash and add the resulting peers to the session's 'active set' (or perhaps add them to a 'to try' set, and move them to the 'active set' once a block is received from them).

Doing things this way will allow us to not spam uninterested bitswap peers with our wantlist updates, and also allow us to reduce duplicate blocks by splitting up requests amongst peers we know are likely to respond.
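To make the shape of this concrete, here's a rough sketch (in Go, with made-up names; none of this is actual go-ipfs code) of the per-session state described above:

```go
// Sketch only: placeholder types standing in for libp2p's peer.ID and cid.Cid.
package bitswap

import (
	"sync"
	"time"
)

type peerID string
type cid string

// session tracks which peers have proven useful for the graph under one root.
type session struct {
	mu sync.Mutex

	root      cid                 // the hash the request started from
	activeSet map[peerID]struct{} // peers that have sent us blocks in this session
	toTry     map[peerID]struct{} // peers found via FindProviders, not yet proven

	// how long to wait for a wanted block before falling back to FindProviders
	providerTimeout time.Duration
}

// recordResponse moves a peer into the active set whenever it sends us a block
// for this session, even if that block turned out to be a duplicate.
func (s *session) recordResponse(p peerID) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.toTry, p)
	s.activeSet[p] = struct{}{}
}

// peersToAsk returns the peers a request for a child of the root should go to;
// only the active set is asked, rather than every connected bitswap partner.
func (s *session) peersToAsk() []peerID {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]peerID, 0, len(s.activeSet))
	for p := range s.activeSet {
		out = append(out, p)
	}
	return out
}
```

The important part is that recordResponse is driven purely by which peers actually send us blocks, so the active set stays honest about who is worth asking.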

Feedback and requests for elaboration will be much appreciated.

@lgierth (Member) commented Mar 15, 2017

If we already assume that the "active set" peers likely have all children of the wanted root, couldn't we tell them to "consider all children of $hash wanted by me"? This would be a super light wantlist update :)

And in case they don't have all children of the wanted root, we can adjust reactively just like you're saying. If we don't receive any more wanted blocks from a peer in the session (all children considered wanted), then we remove that peer from the active set and try FindProviders.

@whyrusleeping (Member, Author) commented Mar 15, 2017

@lgierth That should be possible later with ipld selectors, but for now it would be a breaking change to the protocol (and would make the duplicate blocks problem likely worse)

@btc commented Apr 25, 2017

Is the duplicate blocks problem the primary issue? Are there other considerations to weigh?

(Writing from an uninformed perspective and curious about the scalability challenges.)

@whyrusleeping (Member, Author) commented Apr 25, 2017

@btc It depends on the scenario. There are really two main scalability issues that I'm looking to address here. The first is duplicate blocks, which are a pretty bad problem when you are fetching content that multiple peers in the network have. The second is broadcasting every wantlist update to every connected peer; that gets pretty expensive when fetching large items, even from a single other peer.

My proposal (and WIP implementation in #3867) reduces both issues, at a small cost to the use case of fetching a single large graph from a disparate set of peers.

@kevina (Contributor) commented May 2, 2017

@whyrusleeping the idea sounds solid. Did you implement the "split up requests amongst peers" part?

@whyrusleeping (Member, Author) commented May 2, 2017

@kevina no, not in my live PR. That bit's coming next.

@Stebalien (Contributor) commented Jul 3, 2017

I think this is a great idea which should reduce the current overhead issue by leaps and bounds.

One improvement I'd make would be to gossip more. It should generally be possible to do this without breaking the protocol, since adding fields to protocol buffers is backwards compatible (and gossip should always be safe to ignore).

> That should be possible later with ipld selectors, but for now it would be a breaking change to the protocol (and would make the duplicate blocks problem likely worse)

One way to (sort-of) do this without sending blocks up-front would be to gossip one of:

  1. An assertion that this peer has none of the children of the sent block.
  2. An assertion that this peer has all of the children of the sent block.
  3. A short list of children blocks that the peer has (for space efficiency, we can even just send a bitmask indicating which children we have).

Bitswap would then make sure to ask peers that have already indicated that they have a block before spamming the rest of the peers in the session. This doesn't help pipeline deep and narrow trees as IPLD selectors would, but it should help further reduce the number of unnecessary requests.
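To sketch how the requester side might use those assertions (made-up types, reusing the peerID placeholder from the earlier sketch; the real change would be new optional fields on the bitswap protobuf message):

```go
// childAssertion is what a peer gossiped about the children of a block it sent us.
type childAssertion struct {
	hasNone bool   // option 1: peer asserts it has none of the children
	hasAll  bool   // option 2: peer asserts it has all of the children
	bitmask []byte // option 3: one bit per child index the peer claims to have
}

// peersForChild picks session peers worth asking for child index i of a block,
// preferring peers that claimed to have it and skipping peers that claimed not to.
// If nobody asserted anything useful, fall back to asking the whole session.
func peersForChild(sessionPeers []peerID, assertions map[peerID]childAssertion, i int) []peerID {
	var preferred []peerID
	for _, p := range sessionPeers {
		a, ok := assertions[p]
		switch {
		case !ok:
			continue // no gossip from this peer; it stays in the fallback set
		case a.hasNone:
			continue // peer told us it has nothing under this block
		case a.hasAll:
			preferred = append(preferred, p)
		case i/8 < len(a.bitmask) && a.bitmask[i/8]&(1<<uint(i%8)) != 0:
			preferred = append(preferred, p)
		}
	}
	if len(preferred) == 0 {
		return sessionPeers // no useful assertions; ask everyone in the session
	}
	return preferred
}
```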

Another improvement I'd try would be to add in a bit of cross-session gossip. Basically, whenever one talks to a bitswap peer, disregarding sessions, gossip a compressed wantlist (e.g., truncated hashes of the oldest N blocks on our want-list) and the set of blocks we have that we think they want (based on the last compressed wantlist they sent us). If we tune the parameters correctly, this may help ensure we have some potential peers lined up in case we can't find a block within a session (hopefully without introducing too much overhead). However, this could be rather tricky to get right.


Unrelated to gossip, we may also want to (persistently?) cache some of this session information to speed up future requests. For example, given:

   A
 /   \
B     C

If I receive A in a session with peers {S}, I can record that {S} may have B and C and start any sessions for fetching B and/or C with {S} ∩ {connected} (the session peers we're still connected to), skipping the "ask everyone" step. This specific optimization would be really useful when lazily streaming files/directories.
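A sketch of that cache (hypothetical names, reusing the placeholder cid/peerID types from the earlier sketch; nothing like this exists in bitswap today) could be as simple as a map from child CID to the peers of the session the parent arrived in:

```go
// peerHints remembers, per child CID, which peers were in the session that
// delivered its parent; those peers are good first guesses for the child.
type peerHints struct {
	mu    sync.Mutex
	hints map[cid][]peerID
}

// recordChildren is called when a block arrives in a session whose active set
// is sessionPeers: its children may well be available from the same peers.
func (h *peerHints) recordChildren(children []cid, sessionPeers []peerID) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, c := range children {
		h.hints[c] = sessionPeers
	}
}

// startingPeers returns {S} ∩ {connected} for a new session rooted at c,
// letting us skip the initial "ask everyone" broadcast when hints exist.
func (h *peerHints) startingPeers(c cid, connected map[peerID]bool) []peerID {
	h.mu.Lock()
	defer h.mu.Unlock()
	var out []peerID
	for _, p := range h.hints[c] {
		if connected[p] {
			out = append(out, p)
		}
	}
	return out
}
```

Whether to persist this across restarts is a separate question; even the in-memory version would cover the lazy-streaming case.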

@whyrusleeping (Member, Author) commented Jul 4, 2017

@Stebalien Thanks for the feedback!

Some thoughts:

On gossip: I really like the idea, though implementing this would be difficult. The issue is that checking 'do I have all the objects in this graph?' isn't cheap, and it's hard to cache as it ends up being a lot of data. A simpler tactic (related to your suggestions) would be to simply list the indexes of the links of this object that you also have. You could even extend this to do your third suggestion by doing index paths. This would work because the set of all links for a given node is deterministically ordered.
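A sketch of that "list the link indexes you have" tactic (illustrative helper only; the real version would be an extra repeated field on the bitswap message):

```go
// haveLinkIndexes returns the indexes of this node's links that we already
// have locally. Because a node's links are deterministically ordered, the
// peer receiving these indexes can map each one back to the same child.
func haveLinkIndexes(links []cid, haveLocally func(cid) bool) []int {
	var idxs []int
	for i, l := range links {
		if haveLocally(l) {
			idxs = append(idxs, i)
		}
	}
	return idxs
}
```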

The compressed wantlist idea is interesting, though it sounds like it would get complicated fast. I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks); spreading the wantlist more generally ups that complexity quite a bit.

@Stebalien (Contributor) commented Jul 4, 2017

> The compressed wantlist idea is interesting, though it sounds like it would get complicated fast.

I agree.

> I've already put a significant amount of effort into making sure we keep track of who we tell about wanting what (to help avoid receiving duplicate blocks); spreading the wantlist more generally ups that complexity quite a bit.

The compressed wantlist idea is about telling peers what we may want from them, not what we want right now, so that they can gossip back what they do have in case we want it in the future; we wouldn't expect them to send the actual blocks up-front. Basically, the general idea of all this gossiping is to learn where to find what we may want in the near future.

That is, we gossip:

  • What we predict we may want from our peers (in the near future): the compressed maybe-wantlist, etc.
  • What we predict each of our peers may want (in the near future): cids of children of blocks on their wantlist/maybe-wantlist, things on their maybe-wantlist that we have, etc.

This way, when we go to fetch a block, we will likely already have a list of candidate peers.

Note: By gossip, I really do mean gossip. This shouldn't trigger any additional background messages, just some extra bytes tacked onto existing messages.
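As a very rough illustration of the compressed maybe-wantlist (sketch only; N and the prefix length are made-up tuning knobs), truncated prefixes of the oldest N wantlist entries could ride along on a message we were going to send anyway:

```go
// compressedWantlist returns short prefixes of the oldest n entries on our
// wantlist (assumed ordered oldest-first). Truncation keeps the gossip cheap;
// a prefix collision only costs a useless hint, never a wrong block.
func compressedWantlist(wantlist []cid, n, prefixLen int) [][]byte {
	if n > len(wantlist) {
		n = len(wantlist)
	}
	out := make([][]byte, 0, n)
	for _, c := range wantlist[:n] {
		b := []byte(c)
		if len(b) > prefixLen {
			b = b[:prefixLen]
		}
		out = append(out, b)
	}
	return out
}
```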

@Stebalien (Contributor) commented Nov 17, 2017

  • Reduce duplicate blocks by ranking peers: #4396
@peteryj commented Sep 12, 2018

Hi guys, I just want to know if bitswap sessions could solve my problem.

From the git log I'm afraid not. But this issue is still open, so...

https://discuss.ipfs.io/t/big-file-transferring-problem/3819

Thanks!

@rklaehn commented Sep 21, 2018

So do I understand this correctly: at the moment, when some piece of data is available from a lot of peers, that actually makes things worse, because the poor requester gets flooded with answers from all the providers? This would seem to be a serious problem. Any recent work on this?

@ivan386 (Contributor) commented Sep 21, 2018

@rklaehn Yes. From all connected providers that have the block. Work is here: ipfs/go-bitswap#8

@momack2 (Contributor) commented Dec 3, 2018

@hannahhoward is this one of the things you were planning to start on next in Bitswap land?

@Stebalien Stebalien removed the ready label Dec 18, 2018

@Stebalien Stebalien added backlog in progress and removed backlog labels Dec 18, 2018

@momack2 momack2 added this to In Progress in ipfs/go-ipfs May 9, 2019

@eingenito (Contributor) commented May 14, 2019

I think this issue has been addressed by bitswap session improvements or superseded by discussion of GraphSync.

@eingenito eingenito closed this May 14, 2019
