Recently, I've mostly been working on this orc-related RTS game, currently named Runes of Yore. I'm using the Slick library to render client side, and I've been recording progress over here. I was asked on a news post to explain a little more about what I meant by Multiplayer Online Persistent RTS and how that affects networking. Given it's not really Slick related I thought I'd bring the rambling over here.

Ok, so what is it I want to write? Well, most of the time I'm not 100% sure on this but here's what I think I mean by the terms above:

Multiplayer - more than one player can connect to the game and play against others. I'm not claiming "Massively Multiplayer" because I'm of the opinion that this isn't possible for a non-funded operation. Obviously it'd be wonderful if the game did support this and hence the design has to try and account for that.

Online - the game is going to be playable across the internet. This brings some interesting networking challenges. If a game is only LAN playable you can take some fantastic liberties with regards to latency. On the wondrous entity that is the internet, players will expect to be able to play their friend on the other side of the world without issue. Factor in round-trip times of around 300-400 ms.

Persistent - the game world exists at all times, whether players are logged in or not. The game events continue to happen when you're not involved, so the game state may be completely different when you connect next time.

RTS - (game). The game to be played will be a real-time strategy, that is, you control a group of units which you can order to take actions in the game. You use strategy to defeat other players or achieve goals. This gives some interesting problems for networking. In many games the player controls one character, which must be synchronized between the clients. However, in an RTS the player can control tens, sometimes hundreds, of units - each of these needs to be synchronized across the network - so even with a small number of players we may be talking about hundreds of "characters" to sync.

So, ok, that's what I'm talking about - but how is this applied in Runes of Yore (ROY)? The game of ROY places you as a mage in the world of Yore. As a mage you're part of one of the four(?) factions of magic. When joining the game you'll see the top level strategic map showing the complete world of Yore and the current state of each province within the world. Provinces will be owned by different factions, one of the objectives being that you help your faction to take over the world.

Objective 1: Take over the world as part of your faction.
Reward: Kudos on the faction chat

Magic in the world is fueled by several things. Primarily the mage's focus is used to gauge how many different spells he or she can have cast and maintain at the same time. This magical focus grows with experience, which is gained during skirmishes.

Objective 2: Gain experience, improve magical focus.
Reward: Ability to control more units, cast more spells

The second thing that fuels magic in the world is the Runes. (story mode on) Long ago the mages of Yore discovered the magical stones that grew in the world. With time and concentration the mages learnt that these ornate stones could be absorbed into the body, possessing the absorber with an intense feeling of power. Before the mages came to understand the powers of casting, the runes and their absorption were seen as a narcotic. It was only as the elders learnt the words of power that they realised the importance of these power-giving artefacts. (story mode off) The spawning of runes is controlled by the "winds of magic" which can be visualised on the strategic map. Like a weather system, the "high pressure" areas will have a higher spawn rate of runes.

Objective 3: Gather runes for power
Reward: Ability to continue fighting, experience and statistical gains (bragging rights)

So, our user is at the world strategic map. They see a province is under attack by another faction and delve into that province. On selecting the province they're taken to a game play screen in which they can place their mage (by casting it into existence) into the province and begin defending.

Defending/Attacking consists of conjuring units (orcs, humans, archers etc) into the world or throwing direct spell attacks (fireball, protection etc). Within the world the mage can find and claim runes. The process of absorbing runes takes time and focus. The mage must be defended from direct attack while this takes place, otherwise the rune cannot be absorbed. Hence the mage must use their units to run interference while they take the rune - getting more magic - allowing them to cast more units and on and on.

The only other facet of the game that's been thought about so far is how your faction takes ownership of the province. The provinces will have a selection of wizard/mage towers. Claiming these will increase your faction's standing in a province and potentially give some bonus elsewhere. As you can tell, the game isn't what you could call planned out. It's just a seed of an idea that I'd like to see written some day. Friends and I (Endolf, Snodge, Tweety, Dan) have been talking this idea over since I worked at Lucent - what's that, 5 or 6 years ago.

The original point of this blog was to talk about the networking issues involved; it seems to have turned into a bit of a game design session, sorry about that. So, on to the networking. Looking at the game described above and the descriptions for the terms, there are several issues to work round:

  • Networking across the internet can have large latency (a problem for all internet games)

  • In an RTS there are a large number of units, 5 players could mean 500 units (unlikely, but possible)

  • The game world needs to be persistent, this means a central constant server - stability.

Fits into Project Darkstar quite nicely, don't you think? Well, at least the third point does; the Darkstar server software should allow me to scale the game later to a fault tolerant architecture. Right now the networking layer is abstracted over a simple TCP implementation - however it's explicitly designed with the purpose of porting to Darkstar later.

Looking at the first two, we have the same issues as most network games, only on a bigger scale. How can the game synchronize a large number of units between clients?

The first approach most people think of is simply sending the state of the units across the network several times a second - thus keeping everything in sync. First, this only works for an extremely limited amount of data. Second, this isn't friendly to bandwidth: if you factor in how many players you want to support against how much data you'll be sending a second, the cost of a server package to support this becomes prohibitive for an indie/hobbyist. You could probably get away with it if you just wanted to throw cash away, but big business normally means a big player base, so it's no good for anyone.
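To put a rough number on that, here's a back-of-the-envelope sketch in Java. The class and method names are made up for illustration, and the per-unit size and update rate used in the note below are assumptions rather than measurements from ROY.

```java
/** Hypothetical helper: estimates the upload cost of brute-force state syncing. */
class SyncBandwidth {
    /**
     * Bytes per second the server must upload if it sends the full state of
     * every unit to every client on every update.
     */
    static long bytesPerSecond(int units, int bytesPerUnit, int updatesPerSecond, int clients) {
        return (long) units * bytesPerUnit * updatesPerSecond * clients;
    }
}
```

Taking the 500 units and 5 players from the list of issues above, and assuming 16 bytes per unit and 10 updates a second, `SyncBandwidth.bytesPerSecond(500, 16, 10, 5)` gives 400,000 bytes a second of server upload, and that's for a single skirmish.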

Another, possibly better, approach is the Quake 3 style networking model. Only send what's changed in the world - and compress these changes by concatenating them. If Bob moved from A to B then B to A, Bob didn't move. Pack these changes into UDP packets and spit them out to all the clients. Make sure the client sends an ack of a frame number so you can stop sending them changes up to that point. This could work for Runes, but it still seems like too much data to me.
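As a rough illustration of the "Bob didn't move" idea, here's a minimal Java sketch that keeps only the net position change per unit since the last ack. It is not Quake 3's actual implementation: the class and method names are hypothetical, it only tracks positions, and it assumes an ack covers everything recorded so far rather than tracking frame numbers.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: collapses many position changes per unit into one net change. */
class DeltaCoalescer {
    /** Position each unit had when the client last acknowledged. */
    private final Map<Integer, int[]> acked = new HashMap<>();
    /** Most recent position of each unit that has moved since then. */
    private final Map<Integer, int[]> latest = new LinkedHashMap<>();

    /** Registers a unit at a position the client is assumed to already know. */
    void baseline(int unitId, int x, int y) {
        acked.put(unitId, new int[] {x, y});
    }

    /** Records a move; only the most recent position per unit is kept. */
    void record(int unitId, int x, int y) {
        latest.put(unitId, new int[] {x, y});
    }

    /** Net changes to send: units whose latest position differs from the acked one. */
    Map<Integer, int[]> pending() {
        Map<Integer, int[]> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, int[]> e : latest.entrySet()) {
            int[] old = acked.get(e.getKey());
            int[] now = e.getValue();
            if (old == null || old[0] != now[0] || old[1] != now[1]) {
                out.put(e.getKey(), now);
            }
        }
        return out;
    }

    /** Simplification: the ack is assumed to cover every change recorded so far. */
    void ack() {
        acked.putAll(latest);
        latest.clear();
    }
}
```

If Bob (unit 1) is recorded moving from 0,0 to 3,0 and back to 0,0 before the next send, `pending()` has nothing to report for him. A real implementation would also need per-client state and the frame numbering described above.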

The approach I'm taking with ROY is derived from a method sometimes called "lock step" - that I learnt about talking to Jeff (Sun) and Elias (OddLabs) - obviously this is my understanding of it, and not their fault if I get it wrong ;). Each client has a game world, which I refer to as a deterministic simulation due to some career background I have ;). The server periodically sends out a message to the clients to tell them to move their simulation forward a step. Each client does so, gathers any requests they have, and sends back the requests and confirmation that the simulation has been moved forward. The server locks until all clients tell it they've moved forward. It then applies the requests it's received to the simulation (sending them to the clients as well). Finally it steps the simulation again. The benefit here is that only the original user input has to be sent, so if I move 100 units to 10,10 then only the move command, the 100 IDs and the 10,10 need to be sent, once. Since the simulations running on all clients and the server are deterministic and they are updated in identical ways - they stay in perfect synchronization.
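Here's a minimal in-memory sketch of the server side of that loop in Java, just to show the "lock until everyone confirms" part. There's no real networking here, the names are hypothetical, and requests are plain strings standing in for whatever the real message format is.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of a lock step server; no real networking. */
class LockStepServer {
    private final Set<Integer> clients = new HashSet<>();
    private final Set<Integer> confirmed = new HashSet<>();
    private final List<String> requests = new ArrayList<>();
    private final List<String> applied = new ArrayList<>();
    private int step = 0;

    void addClient(int clientId) {
        clients.add(clientId);
    }

    /** A client reports it has stepped and hands over any requests it gathered. */
    void confirm(int clientId, List<String> clientRequests) {
        confirmed.add(clientId);
        requests.addAll(clientRequests);
    }

    /**
     * The server stays locked until every client has confirmed. Once they have,
     * it applies the gathered requests and steps the simulation.
     * Returns true if the step happened.
     */
    boolean tryAdvance() {
        if (!confirmed.containsAll(clients)) {
            return false; // still waiting on at least one client
        }
        applied.addAll(requests); // stand-in for applying to the simulation and relaying to clients
        requests.clear();
        confirmed.clear();
        step++;
        return true;
    }

    int step() {
        return step;
    }

    List<String> applied() {
        return applied;
    }
}
```

With two clients registered, `tryAdvance()` keeps returning false after only one of them has confirmed, which is exactly the property that causes trouble later when a client lags.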

Of course there are down sides to this:

  • Complexity, the simulation data model has to be perfectly deterministic. In Java this means strictfp everywhere and some careful check-summing :)

  • Performance, there is nearly always a delay between requesting an action and seeing the response. You can't do much client side before the server has the message, simply because you don't want to affect your game state so it becomes inconsistent. Visual effects help here, and for an RTS it's not a problem as long as the delay remains constant - click to order, 200 ms later, the unit moves. It feels pretty good actually.

  • One client lags them all! This is the only bit I'm not happy with: if one client lags out a bit, then all the other clients have to wait for him/her based on the server lock.

So, one client lags them all. Ick. I've started using a slightly different method to lock step that I normally refer to as "loose step". With this method the server keeps moving the simulation forward. The server sends out a keep alive periodically (about 4 times a second) which is a single byte message. The clients are running their own simulations, which are intentionally hanging back a bit from the current simulation time. The clients are free to move their simulations forward also; however, they must not move their simulations past the time of the last server message they received.

With me so far? We have a server moving its simulation forward, and clients lagging behind the server a bit, also moving their simulations forward but never overtaking the server. There's some jiggery pokery to speed up or slow down clients as they get too far from or too close to server time.
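A minimal Java sketch of the client side of that rule might look like the following. The names are hypothetical, times are counted in simulation ticks, and it assumes each server message can be mapped to a server time. The speed up/slow down policy is only a guess at what the jiggery pokery could look like, not what ROY actually does.

```java
/** Hypothetical client-side clock for "loose step"; times are in simulation ticks. */
class LooseStepClient {
    private long simTime = 0;        // how far this client's simulation has advanced
    private long lastServerTime = 0; // server time implied by the last message received
    private final long targetLag;    // how far behind the server we aim to hang back

    LooseStepClient(long targetLag) {
        this.targetLag = targetLag;
    }

    /** Called when a keep alive (or any other server message) arrives. */
    void onServerTime(long serverTime) {
        lastServerTime = Math.max(lastServerTime, serverTime);
    }

    /**
     * Advances the local simulation by up to the requested number of ticks,
     * never passing the last known server time. Returns the ticks actually run.
     */
    long advance(long ticks) {
        long allowed = Math.max(0, lastServerTime - simTime);
        long run = Math.min(ticks, allowed);
        simTime += run;
        return run;
    }

    /** Assumed policy: run extra ticks when too far behind, none when too close. */
    long ticksThisFrame() {
        long lag = lastServerTime - simTime;
        if (lag > targetLag) return 2; // too far behind: catch up
        if (lag < targetLag) return 0; // too close: hold back
        return 1;
    }

    long simTime() {
        return simTime;
    }
}
```

The important line is the clamp in `advance`: however many ticks the client asks for, it can never get ahead of the last server time it has heard about.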

When a client submits a request to change the world, the server sends it to all clients, with a timestamp of the current server time. Each client receives it and schedules it into their simulation. There is no way the client has moved past the time of the request, because they're not allowed to move past the server time. The simulations on all clients and server proceed and stay synchronized. The really nice thing about this method is that if a client lags the only person that notices is that client; their orders take longer to act out. They can also catch up by running their simulation a little faster. All of this with minimum network traffic. Woot! :)
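To show why the timestamp is enough to keep everyone in sync, here's a small Java sketch of a simulation that queues requests by server time and applies them when it reaches that tick. Everything here is hypothetical: the `seq` number is an assumed server-assigned sequence used to break ties, and the log of strings stands in for real game state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: a simulation that applies requests at their stamped time. */
class ScheduledSim {
    /** A request stamped with the server time at which it must take effect. */
    static final class Command {
        final long time;
        final long seq; // assumed server-assigned order for requests sharing a timestamp
        final String action;

        Command(long time, long seq, String action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }
    }

    private final PriorityQueue<Command> queue = new PriorityQueue<>(
            Comparator.comparingLong((Command c) -> c.time).thenComparingLong(c -> c.seq));
    private final List<String> log = new ArrayList<>();
    private long simTime = 0;

    /** Queues a request; it must not be stamped earlier than the local simulation time. */
    void schedule(long time, long seq, String action) {
        if (time < simTime) {
            throw new IllegalStateException("request is in this simulation's past");
        }
        queue.add(new Command(time, seq, action));
    }

    /** Runs ticks up to (not including) the target, applying requests as they fall due. */
    void advanceTo(long target) {
        while (simTime < target) {
            while (!queue.isEmpty() && queue.peek().time == simTime) {
                log.add(simTime + ":" + queue.poll().action);
            }
            simTime++;
        }
    }

    List<String> log() {
        return log;
    }
}
```

Two instances fed the same requests end up with identical logs even if the requests arrive in a different order or while one instance is further behind, as long as neither has already passed the request's timestamp.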

The final complexity here is the fact the game is persistent: a client can join a simulation half way through but have no state to apply requests to. The client has to download the state of the game from the server - no problem. However, we haven't talked about random numbers. The simulation may want to have random influences, but must stay deterministic. To achieve this the random number generator is seeded with a common value for all clients. Unfortunately the client that joins half way through can get the seed but can't determine where in the random sequence everyone is. This is handled by reinitialising the random number generator with a new common seed every time someone connects to the game world.
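In Java terms the reseeding could be sketched like this, using `java.util.Random`, whose sequence for a given seed is defined by its specification. The wrapper class and its method names are hypothetical, and how the new seed reaches each client is left out.

```java
import java.util.Random;

/** Hypothetical wrapper around the simulation's shared random number generator. */
class SharedRandom {
    private Random random;

    SharedRandom(long seed) {
        random = new Random(seed);
    }

    /** Called on every participant when the server announces a new common seed. */
    void reseed(long seed) {
        random = new Random(seed);
    }

    /** Draws the next value; identical on every participant after the same reseed. */
    int nextInt(int bound) {
        return random.nextInt(bound);
    }
}
```

A client that has been drawing numbers for hours and one that has just connected will produce the same values after both call `reseed` with the same seed, provided the reseed happens at the same simulation time everywhere.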

And that.. is that. What it isn't is a brute force approach to networking - "hey, everyone has broadband now, just sync everything all the time". It would work for physics - though you wouldn't be able to affect the world instantly.

Ok, apologies for the long blog post, it got out of control. It's been pretty useful for me writing it all down, it's been stuck in my head for a while now. To be honest, I'm not sure if that's really helpful to anyone else but me, but heh, for once my blog works for me :)

Well, your post has been helpful to me if no one else! I'm toying with my own lockstep network model and am discovering the joys of strictfp and checksums covering my code. It's always enlightening to compare and contrast with other designs. In my case, though, it's mostly comparison: our architectures are remarkably similar.

All that I really have to add is the observation that making a lockstep model work for a more action-oriented game is more difficult than I would have imagined. The 100ms lag that was perfectly acceptable for an RTS now feels awkward and... well, slow. My highly imperfect "solution" has been simply to adapt the game mechanics. It works, but I can't help but feel a little ashamed, heh.

Nice post there Kev!
It sure made me think over the network procedures for my game. I don't think I'll begin working on it quite yet but it's good to be well prepared for the future =)
I'll probably go with the Quake 3 way of which you speak, and the game is similar to Q3 as well so this should work excellently =)
Nice to hear more in depth about where you want to go with ROY as well! Did you manage to find any good way of producing those sprites btw?

KEV: Seems that there are some max scripts for producing sprites. Still causes me to need to have Max tho, might just use the trial to knock out some conversions and stick with that. Unless of course I can find someone who can do the leg work for me :)

If you reinitialize your random number generator every time a new client connects, won't your numbers be less random all around?

Kev: Only as pseudo-random as any generation. The reinitialisation of the random number generator uses a new (random) seed, so the new sequence is as random as the original.

One possible issue with this is if your "protocol" were to be open to things like bots or if it were hacked.

A malicious user could gain a Matrix-esque (or maybe Jedi?) type power of being able to see into the future and act optimally on it, within whatever your maximum time horizon was.

Greg

Kev: Sure, but it'd be a very small lookup ahead (probably less than a second) in a real life scenario.

Wouldn't a full/fractional second of information make a big difference in many games? I mean, people seem pretty much set on their belief that 20-50ms makes a huge difference, so what about 500?

Cheers,

Greg

Kev: In a game that's an FPS it's absolutely true that 100ms could make all the difference. In something like a Street Fighter 2 fighting game then 20ms could make the difference. However, I'd only use the described approach for something like an RTS or point and click RPG - where even a second or two wouldn't really hurt. What it does allow is a large number of units with low bandwidth and consistent latency.

Not to mention, everyone would be seeing the same lag behind the server, all conditions being well met. One of the things the Diablo post mortem says is that they discovered that users can cope with lag as long as it's consistent, so it's better to buffer them from the lag rather than expose them to the constant changes of the network.

The downside is that with a really laggy connection some people are going to be seeing events from some time ago. However, if you've got a seriously laggy connection that'd cause that, there isn't much option.

That's funny Kev, I was googling "RTS network approach" and this was the first hit... it's funny because: 1. I am using Slick (btw, hope the forum is back soon)
and 2. just yesterday I implemented exactly the same strategy. The server kicks off all clients which then run independently, but they never run ahead of the server "ticks"; actually the minimum is 10 ticks behind.

if a client is too slow with rendering and its update interval overflows, then it would lag behind the server and that would add up during the game. I limit the ticks which a client is allowed to lag behind to 50. if the lag grows over 50, i just do some game.update() in a loop to keep it below 50. the player gets a stuttering game experience, but he/she can't complain if the machine doesn't make the 30FPS...

but still, i want no lag to be visible to any player. when i build a tower, it should be there immediately. if there is a 200ms lag, then this doubles because the command is sent to the server, and then broadcast to the clients to ensure sync, and i would have to wait half a second to see my tower... that's worst case, but too much. this is how i plan to do it:

Player a: 0ms lag
Player b: 200ms lag

- Towers get built immediately in the local game, don't care what tick you are at.
- the command is also sent to the server and broadcast to the clients, containing the game time it WAS executed
- Player A builds a tower, Player B gets the command a bit later, but knows that the command is overdue.
The world is moved back in time, placing the command at the correct time and then advancing the world back to the present time again.
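The rewind-and-replay steps above could be sketched in Java roughly as follows. This is only an illustration under heavy assumptions: the names are hypothetical, the whole game state is a single number so that snapshots are cheap, and a snapshot is kept for every tick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a simulation that rewinds and replays when a command arrives late. */
class RewindSim {
    private final Map<Long, List<Long>> commands = new HashMap<>(); // tick -> command amounts
    private final List<Long> history = new ArrayList<>();           // history.get(t) = state before tick t
    private long state = 0; // stand-in for the whole game state
    private long now = 0;   // next tick to run

    /** Runs one tick: snapshot, apply the commands due this tick, then evolve the state. */
    void step() {
        history.add(state);
        List<Long> due = commands.get(now);
        if (due != null) {
            for (long amount : due) {
                state += amount;
            }
        }
        state = state * 3 + 1; // arbitrary rule that makes the timing of commands matter
        now++;
    }

    /** Applies a command stamped with the tick it was executed at, rewinding if overdue. */
    void apply(long atTick, long amount) {
        commands.computeIfAbsent(atTick, k -> new ArrayList<>()).add(amount);
        if (atTick >= now) {
            return; // not overdue, it will run when we get there
        }
        long present = now;
        state = history.get((int) atTick);                     // move the world back in time
        history.subList((int) atTick, history.size()).clear(); // drop the stale snapshots
        now = atTick;
        while (now < present) {
            step();                                            // replay up to the present
        }
    }

    long state() {
        return state;
    }

    long now() {
        return now;
    }
}
```

A player who receives the command late ends up in the same state as one who had it on time. The cost is the memory for the snapshots and the CPU time for the replay, both of which grow with how overdue a command is allowed to be.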

the only thing i can think of is that in some cases, a high latency player builds a *critical* tower to kill a monster. from the other players' POV, the tower appears and the monster disappears and no bullet was seen. but that's OK, it's only visual, and usually you don't care about the bullets.

Dan

PS: maybe we can discuss this in the forum, would help me.

Kev: Sure though I think you're attempting to solve the problem with game specifics, which is of course ideal. However, this article was really about a generic starting point.