ALEMIL

Welcome to a blog about application and game development

Bitmask - why, how and when

Bitmask - why, how and when 2018-03-19 22:03:00
Est. read time: 3 minutes, 34 seconds

Bitmasks are a very useful way to compress multiple boolean flags in a single variable. It can reduce memory usage and operations on bits are basically as fast as they can get. In practice, any time you want to have multiple flags describing something in your application, a bitmask could be the right tool for the job.

Using multiple booleans

Let’s say we want to create some generic animals in our application with below properties:
bool canRun;
bool canBark;
bool hasTail;
bool canJump;
bool canFly;
bool isCute;
We could add many more, but for the purpose of explanation this will do.
Now let’s describe a dog using these properties:
// Dog
bool canRun = true;
bool canBark = true;
bool hasTail = true;
bool canJump = true;
bool canFly = false;
bool isCute = false;
And, just for contrast, let’s add a parrot:
// Parrot
bool canRun = false;
bool canBark = false;
bool hasTail = true;
bool canJump = false;
bool canFly = true;
bool isCute = true;
First thing worth considering is that it takes 6 variables and 6 bytes (48 bits) of memory space (for simplification I’m assuming that you use a language where a single boolean takes 1 byte in memory, like in C++, but this value can vary) to describe every animal. It is a personal taste if having many boolean flags is a good or bad choice in terms of readability, but undeniably there is a more efficient way of holding this data in our memory.

Using a bitmask

Let’s change the previous example to use a bitmask. First we will need some flags. We create them as constants, where the value of a newly added is a previous value multiplied by 2.
const unsigned int CAN_RUN = 1 << 0; // 1
const unsigned int CAN_BARK = 1 << 1; // 2
const unsigned int HAS_TAIL = 1 << 2; // 4
const unsigned int CAN_JUMP = 1 << 3; // 8
const unsigned int CAN_FLY = 1 << 4; // 16
const unsigned int IS_CUTE = 1 << 5; // 32
The above sign << represent a bit shifting operator. I will explain it in detail later. This is easy to write because you don’t have to calculate the power of 2 for every new flag, just increment the value to the right of operator << when adding a new flag.

Unsigned integer in C++ will take 4 bytes (32 bits) of memory for every flag or property variable of our animals. Let's add a single variable that will hold all of this information about what our animals can do.
// Animal
unsigned int properties = 0;
We have constants defining the value for each property, and we have a base animal that will hold a single unsigned integer. 0 means that no masks have been assigned to the animal which translates to having all booleans set to false by default. Let’s make our new dog and parrot now.
// Dog
unsigned int properties = 15;
// Parrot
unsigned int properties = 52;
To come up with these numbers, we simply add all the necessary masks together using bitwise OR operator, a | pipe character:
// Dog
CAN_RUN (1) | CAN_BARK (2) | HAS_TAIL (4) | CAN_JUMP (8) = 15
// Parrot
HAS_TAIL (4) | CAN_FLY (16) | IS_CUTE (32) = 52
If some time will pass, our parrot could stop being cute as it repeats something annoying over and over again. We could then toggle the IS_CUTE flag from the properties bitmask with the ^ sign. It will either remove or add it depending on the current state of the bitmask.
parrot.properties ^= IS_CUTE;
Or make sure to remove it no matter the current state by unsetting the IS_CUTE flag with bitwise AND & operation on the flag, unset by the bitwise NOT ~ operator. In this example, ~IS_CUTE means every flag except IS_CUTE is set to true and we want to get the intersection of it with parrot properties. The result is all the flags that were already set, except IS_CUTE.
parrot.properties &= ~IS_CUTE;

Bitwise logical operations

We are holding all these properties efficiently, we know how we can add or remove them, but how can we actually check if our dog has a tail? We can do this by using a bitwise AND operator and compare the results:
if ((dog.properties & HAS_TAIL) == HAS_TAIL)
The above translates to: give me every mask that exists both to the left and to the right of the & ampersand character and compare == to see if what I need is the result. The above will return true since our dog has a tail.

This was the simplest use case, just checking if a single object has a property. Let’s say we want to compare our dog to our parrot and see if any one of them have a tail.
if (((dog.properties | parrot.properties) & HAS_TAIL) == HAS_TAIL)
The | pipe character will give us a bitmask consisting of every flag that is in a dog or in a parrot and & ampersand will check if there is a HAS_TAIL flag in the result, filtering all the rest out.

We could also check if exactly one of the animals have a certain property, but not both, by using ^ XOR (exclusive OR) operator.
if (((dog.properties ^ parrot.properties) & HAS_TAIL) == HAS_TAIL)
And the last operator is ~ tilde, bitwise NOT operator. It inverts the bitmask, turning true values to false and vice versa.
if ((~dog.properties & CAN_FLY) == CAN_FLY) // returns true, as our dog cannot fly

Bits in memory

There is a bit more advanced use case for bitmasks, which is bit shifting. But before we can get to it we should know how this data is being held in memory. Every bit can be either 1 or 0. You can think of this as something being on (1) or off (0), true (1) or false (0). A single byte is 8 of such bits. Let’s visualize an unsigned integer for our bitmask, which will be 4 bytes of 8 bits each in a 32bit application:
unsigned int properties = 0; 
// Is represented in memory as:
// 00000000 00000000 00000000 00000000
Now let’s add our CAN_RUN(1) flag to the properties and see what will happen.
properties |= CAN_RUN;
// 00000000 00000000 00000000 00000001
First bit starting from the right side turned to 1, which can be read by out application as an unsigned integer of value 1. Let’s add CAN_BARK (2) flag
properties |= CAN_BARK;
// 00000000 00000000 00000000 00000011
Now our properties hold 2 flags: CAN_RUN (1) and CAN_BARK (2), which also can be read as an unsigned integer of value 3.
In case this is not apparent yet, let’s add our last flag, IS_CUTE (32) to the bitmask.
properties |= IS_CUTE;
// 00000000 00000000 00000000 00100011
Since our IS_CUTE flag is the 6th flag starting from the beginning, to turn it on we just change the value of the 6th bit counting from the right side to a value of 1. These bytes can be read by our application as an unsigned integer of value 35 (32 + 2 + 1, values from 3 of our flags turned on).

We can see that our simple example of 6 flags doesn’t affect a lot of bits, but this memory is still reserved because our application needs these 4 bytes of memory in case the integer will increase its value. We could save some memory by changing our unsigned integer to unsigned char, which takes just 1 byte (8 bits). Keep in mind that this would limit the number of flags you can have to 8. We will get back to this near the end, so let’s not worry too much about it now.

Bit shifting

Now we know how our variables are represented in memory, how we can manipulate bits and how they affect the values of our variables. So let’s get to another interesting thing we can do with bits of our bitmask - bit shifting.

Let’s use our dog example:
unsigned int properties = CAN_RUN | CAN_BARK | HAS_TAIL | CAN_JUMP;
// 00000000 00000000 00000000 00001111
We can move all of our bits by using << left shift operator or >> right shift operator. Let's see what will happen when we try to move bits of our dog with << left shift operator.
properties = properties << 1;
// 00000000 00000000 00000000 00011110
This moved all our flags forward, which essentially made our dog not being able to run anymore but gave him the ability to fly.

If we would have used >> right shift operator instead, we would move the first flag out of bounds of our variable, discarding it.
properties = properties >> 1;
// 00000000 00000000 00000000 00000111
Shifting bits is faster than regular calculations like multiplying or division, but this kind of microoptimization is something that your compiler might be doing on its own.

If you remember how we wrote our constants at the beginning of this article you might notice that by writing a higher number to the right of the operator, we can shift bits by multiple places.
properties = 1 << 3;
// 00000000 00000000 00000000 00001000
That’s essentially it for bit shifting. There is just one more thing you should know about bitmasks.

Limitations of bitmasks

As you might have noticed already, the number of flags you can have in a single bitmask is restricted. The 32-bit integer will allow for 32 different flags in a single bitmask. To surpass this limitation you would have to move bits into another variable and have a solution that would keep track of your flags across multiple variables.

Bitmask is a great option if you will not have more flags than what is allowed in a single variable to gain great efficiency for data manipulation and reduced memory footprint.

Another advantage is that having 32 flags in a single variable can be a great way to reduce bloat of managing 32 different variables. Even if doing this in code can be arguable, it can be a lot more significant in json files, or SQL database.

However, as it is right now, you might rarely need this solution for performance and memory reduction of your applications. An array of booleans will also solve the bloat issue, max amount of flags issue and might not need from you to remember all these bitwise operators.

If you can spare the memory and you don’t know if you will want more flags than 32, such an array may be a safer bet.

Keep in mind that even a boolean is taking a single byte in C++, which means an array of booleans take 8x the amount of memory. 32 different flags represented in an unsigned integer bitmask would take 4 bytes, but an array of booleans 32 bytes. This is relatively not much of a difference until you work with hundreds of millions of records or on a very resource constrained machine.

On the other hand, some languages use more memory for basic variable types, like booleans. In PHP array, booleans takes the same amount of RAM as integers. This makes 32 boolean flags taking 32x the amount of memory + array footprint instead of just the size of a single integer. That can make a significant difference.

In conclusion, bitmask will definitely be a great tool in your arsenal. Just remember that it does not come without any tradeoffs.



Comments +

Rick
"CAN_RUN (1) | CAN_BARK (3) | HAS_TAIL (4) | CAN_JUMP (8) = 15"
should be:
"CAN_RUN (1) | CAN_BARK (2) | HAS_TAIL (4) | CAN_JUMP (8) = 15"
2018-06-11 08:30:34
Reply
Artur
In response to @Rick
Good catch, thanks!
2018-06-11 16:08:47
Reply
alfC
Excellent post. Question is this idiom really necessary?

`if ((dog.properties & HAS_TAIL) == HAS_TAIL) ...`

won't it be equivalent to this in the boolean sense?

`if (dog.properties & HAS_TAIL) ...`

Perhaps you are contemplating that `HAS_TAIL` can also be a negative property, like `HAS_NO_TAIL`?
2018-11-28 18:49:43
Reply
Alex Pelham
alfC: I agree with you, and I find the latter more intuitive and less confusing:
if(property & HAS_TAIL){
}

In fact one would have to be careful to not introduce a bug and confuse the former with this:
If((property & HAS_TAIL) == property) {
}
Which checks that *only*that bit is set on property.
Cheers
Alex
2019-02-01 14:12:12
Reply