Optimizing C++ string parsers using hashing functions and switch/cases

2024-07-15
0 views

Over the past few weeks I have been working on a new React Native framework - a foundation which allows developers to build native C++, Swift or Kotlin modules with minimal overhead and maximum flexibility. 👀

The idea is to have a JSIConverter<T> C++ class, which can convert from- and to JS values:

template <typename T>
struct JSIConverter<T> {
  // Converts `jsi::Value` to `T`
  static T fromJSI(jsi::Runtime& runtime, const jsi::Value& arg);
  // Converts `T` to `jsi::Value`
  static jsi::Value toJSI(jsi::Runtime& runtime, T arg);
};

One specific case I found interesting was enum parsing - how do you convert a JS enum to a C++ enum? 🤔

Converting a JS enum to a C++ enum

Well, for true JS enums it's easy - they are just numbers!

enum Powertrain {
  ELECTRIC,
  GAS
}

In Powertrain, each enum member is essentially just a number. ELECTRIC is 0, and GAS is 1. This means, we can just treat it as an int (cast it), and it's already perfectly typed. 🥳

So in my custom code-generator, I can now generate the following code:

enum class Powertrain {
  ELECTRIC = 0,
  GAS = 1,
};
 
template <>
struct JSIConverter<Powertrain> {
  static Powertrain fromJSI(jsi::Runtime& runtime, const jsi::Value& arg) {
    int enumValue = JSIConverter<int>::fromJSI(runtime, arg);
    return static_cast<Powertrain>(enumValue);
  }
  static jsi::Value toJSI(jsi::Runtime& runtime, Powertrain arg) {
    int enumValue = static_cast<int>(arg);
    return JSIConverter<int>::toJSI(enumValue);
  }
};

This is not only very short code, it is also extremely efficient - there is no ifs, no strings and no parsing happening as it's just casting operations. 💪

But let's say we're modern JS developers, and we want to use discriminating unions - just like all the other cool kids.

Converting a TS discriminating union to a C++ enum

A discriminating union is essentially just a string.

type Powertrain = 'electric' | 'gas'

It's main benefit is that it feels more modern, and is easier to type:

function findBestCar(powertrain: Powertrain): Car { ... }
 
// JS enum
const car = findBestCar(Powertrain.GAS)
// TS union
const car = findBestCar('gas')

So how do we convert this to a C++ enum then? Let's try to just generate if branches for each possible value:

enum class Powertrain {
  electric,
  gas,
};
 
template <>
struct JSIConverter<Powertrain> {
  static Powertrain fromJSI(jsi::Runtime& runtime, const jsi::Value& arg) {
    std::string unionValue = JSIConverter<std::string>::fromJSI(runtime, arg);
    if (unionValue == "electric") {
      return Powertrain::electric;
    } else if (unionValue == "gas") {
      return Powertrain::gas;
    } else {
      throw std::runtime_error("Invalid value! " + unionValue);
    }
  }
  static jsi::Value toJSI(jsi::Runtime& runtime, Powertrain arg) {
    switch (arg) {
      case Powertrain::electric:
        return JSIConverter<std::string>(runtime, "electric");
      case Powertrain::gas:
        return JSIConverter<std::string>(runtime, "gas");
      default:
        throw std::runtime_error("Invalid value! " + std::to_string(arg));
    }
  }
};

Great - this works fine, but have you ever thought about what actually happens if you compare two strings? 🤔

To compare two strings for equality, we need to iterate over each character in both strings and compare the chars one by one - so for electric we have a loop of 8 iterations, and one char comparison per iteration:

e == e
l == l
e == e
c == c
t == t
r == r
i == i
c == c

This is fine for pretty much every case - but we just want to unnecessarily optimize things, so let's dive in a bit deeper.

How do string comparison if-branches scale?

What happens at scale? Let's say you have the following type:

type PixelFormat =
  | 'yuv-biplanar-420-8bit'
  | 'yuv-biplanar-420-10bit'
  | 'yuv-biplanar-420-12bit'

This would essentially generate these if branches:

if (unionValue == "yuv-biplanar-420-8bit") {
  ...
} else if (unionValue == "yuv-biplanar-420-10bit") {
  ...
} else if (unionValue == "yuv-biplanar-420-12bit") {
  ...
}

If you pass in yuv-biplanar-420-12bit, it will then go letter by letter for the first value (yuv-biplanar-420-8bit) until it reaches the first character that doesn't match - in this string this is the 8 which doesn't match with our 1 - so we loop through 18 characters just to find out that this isn't the value we are searching for.

Onto the next string we encounter the same "issue", we loop through 19 characters (yuv-biplanar-420-1) just to find out that it is the 10-bit instead of the 12-bit enum.

And then finally for the third case, we iterate through 22 characters (23 with the \0 terminator) to make sure this is exactly the string we are searching for.

So in total we just looped through 59 characters to find our matching enum.

On my 2023 M2 MacBook Pro, running 10.000.000 string -> enum conversions takes ~350ms with 3 branches. We can make this faster! 😁

Switch vs If

While if statements have to go through each possible if/else branch, a switch statement can jump directly to given cases.

This would be a huge optimization, since we could theoretically just directly jump to the string that we are searching for (the -12bit one) instead of going through each character in the other strings (-8bit or -10bit)! Let's try it:

switch (unionValue) {
  case "yuv-biplanar-420-8bit": ...
  case "yuv-biplanar-420-10bit": ...
  case "yuv-biplanar-420-12bit": ...
}

..well, this doesn't work in C++. 👎

A switch statement can only jump to numbers, and a string is not a number (it is essentially a collection of numbers (chars)), so how do we implement a switch/case for our string? 🤔

Hashing functions to the rescue!

We can get deterministic "unique" numbers for a given string using hashing functions!

There are multiple hashing algorithms, some slower some faster, but all of them have a certain degree of uniqueness - in general, the faster algorithms have a higher probability of encountering hash collisions than the slower, more accurate ones.

In our case, we don't have long strings, so let's use the fast FNV-1a hashing algorithm to convert our string to a number:

uint64_t hashString(const char* str, size_t length) {
  uint64_t hash = 14695981039346656037ull;
  const uint64_t fnv_prime = 1099511628211ull;
 
  for (size_t i = 0; i < length; ++i) {
      hash ^= static_cast<uint64_t>(str[i]);
      hash *= fnv_prime;
  }
 
  return hash;
}

If we now hash our input value "yuv-biplanar-420-12bit", we get the number 10295652043838681355 as a result.

Let's hash all of the possible values of our enum:

  • "yuv-biplanar-420-8bit": 14920774025253445768
  • "yuv-biplanar-420-10bit": 15322878767298378161
  • "yuv-biplanar-420-12bit": 10295652043838681355

Perfect - knowing all possible values, we can now create a switch statement:

uint64_t inputHash = hashString(inputValue.c_str(), inputValue.size());
switch (inputHash) {
  case 14920774025253445768:
    return PixelFormat::YUV_BiPlanar_420_8Bit;
  case 15322878767298378161:
    return PixelFormat::YUV_BiPlanar_420_10Bit;
  case 10295652043838681355
    return PixelFormat::YUV_BiPlanar_420_12Bit;
  default:
    throw std::runtime_error("Invalid input value!");
}

This is almost looking great already - now we just need to make it a bit more usable - we can't just have the developer manually create hashes and put them in the switch cases - we want the developer to still write strings here for readability.

C++ constexpr for compile-time evaluation

As we all know, C++ is a powerful language. But did you know that C++ has a lot of tools for compile-time code evaluation?

We can either use C-based macros (#define ...), templates for static code generation, or my favourite: constexpr functions!

Consider the following code:

constexpr int magic(int input) {
  return input + input;
}
 
int main() {
  int result = magic(10);
  std::cout << result;
}

If we compile this C++ code, the generated assembly code looks like this:

main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 20
        mov     eax, 0
        pop     rbp
        ret

We can see that magic(..) no longer exist - there is no runtime evaluation of input + input - it has been computed at compile time!

We can now make our hashing function constexpr as well:

constexpr uint64_t hashString(const char* str, size_t length) {
  // same code as before
}

..and now use it to generate the individual switch cases at compile time:

uint64_t inputHash = hashString(inputValue.c_str(), inputValue.size());
switch (inputHash) {
  case hashString("yuv-biplanar-420-8bit", 21):
    return PixelFormat::YUV_BiPlanar_420_8Bit;
  case hashString("yuv-biplanar-420-10bit", 22):
    return PixelFormat::YUV_BiPlanar_420_10Bit;
  case hashString("yuv-biplanar-420-12bit", 22):
    return PixelFormat::YUV_BiPlanar_420_12Bit;
  default:
    throw std::runtime_error("Invalid input value!");
}

Amazing - now it is much more user-friendly, while still being exactly as fast as before!

Benchmarks - how fast is it actually?

Let's benchmark our code!

I used the following code:

#include <iostream>
#include <chrono>
#include <string>
 
constexpr int ITERATIONS = 10000000;
 
int main() {
  auto begin = std::chrono::steady_clock::now();
 
  std::string input = "yuv-biplanar-420-8bit";
  for (int i = 0; i < ITERATIONS; i++) {
    if (input == "yuv-biplanar-420-8bit") {
      continue;
    } else if (input == "yuv-biplanar-420-10bit") {
      continue;
    } else if (input == "yuv-biplanar-420-12bit") {
      continue;
    }
  }
 
  auto end = std::chrono::steady_clock::now();
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
  std::cout << ITERATIONS << " iterations took " << elapsed.count() << "ms!";
 
  return 0;
}

We have 3 cases, but the first if statement is already what we want - so we don't need to compare the other 2 cases.

I compiled this using clang++:

clang++ main.cpp -o main -std=c++20 && ./main

Running this on my 2023 M2 MacBook Pro shows ~390ms for 10.000.000 iterations:

10000000 iterations took 390ms!
10000000 iterations took 388ms!
10000000 iterations took 388ms!
10000000 iterations took 387ms!
10000000 iterations took 387ms!

Now let's switch out the if branches for a switch statement with hashing functions:

#include <iostream>
#include <chrono>
#include <string>
 
constexpr int ITERATIONS = 10000000;
 
constexpr uint64_t hashString(const char* str, size_t length) {
  uint64_t hash = 14695981039346656037ull;
  const uint64_t fnv_prime = 1099511628211ull;
 
  for (size_t i = 0; i < length; ++i) {
      hash ^= static_cast<uint64_t>(str[i]);
      hash *= fnv_prime;
  }
 
  return hash;
}
 
int main() {
  auto begin = std::chrono::steady_clock::now();
 
  std::string input = "yuv-biplanar-420-12bit";
  for (int i = 0; i < ITERATIONS; i++) {
    switch (hashString(input.c_str(), input.size())) {
      case hashString("yuv-biplanar-420-8bit", 21):
        continue;
      case hashString("yuv-biplanar-420-10bit", 22):
        continue;
      case hashString("yuv-biplanar-420-12bit", 22):
        continue;
    }
  }
 
  auto end = std::chrono::steady_clock::now();
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
  std::cout << ITERATIONS << " iterations took " << elapsed.count() << "ms!";
 
  return 0;
}

Running this again gives us roughly the same results:

10000000 iterations took 391ms!
10000000 iterations took 392ms!
10000000 iterations took 390ms!
10000000 iterations took 388ms!
10000000 iterations took 396ms!

This is expected, as we only iterate through the string once in both our implementations.

Let's try to scale this a bit - let's change the input string so it doesn't already find the first if branch, and let's also add 7 more cases:

// Ifs:
if (input == "yuv-biplanar-420-8bit") { }
else if (input == "yuv-biplanar-420-10bit") { }
else if (input == "yuv-biplanar-420-12bit") { }
else if (input == "yuv-biplanar-420-16bit") { }
else if (input == "yuv-biplanar-420-18bit") { }
else if (input == "yuv-biplanar-420-24bit") { }
else if (input == "yuv-biplanar-420-32bit") { }
else if (input == "yuv-biplanar-420-64bit") { }
else if (input == "yuv-biplanar-420-128bit") { }
else if (input == "yuv-biplanar-420-256bit") { }
 
// Switch:
switch (hashString(input.c_str(), input.size())) {
  case hashString("yuv-biplanar-420-8bit", 21): ...
  case hashString("yuv-biplanar-420-10bit", 22): ...
  case hashString("yuv-biplanar-420-12bit", 22): ...
  case hashString("yuv-biplanar-420-16bit", 22): ...
  case hashString("yuv-biplanar-420-18bit", 22): ...
  case hashString("yuv-biplanar-420-24bit", 22): ...
  case hashString("yuv-biplanar-420-32bit", 22): ...
  case hashString("yuv-biplanar-420-64bit", 22): ...
  case hashString("yuv-biplanar-420-128bit", 23): ...
  case hashString("yuv-biplanar-420-256bit", 23): ...
}

When the input string is the first (or one of the first) values in the if, the if is fast.

Let's try to run this for different input strings:

If Branches     Hash/Switch
"yuv-biplanar-420-8bit"362ms378ms
"yuv-biplanar-420-10bit"634ms392ms
"yuv-biplanar-420-32bit"2.627ms387ms
"yuv-biplanar-420-256bit"     2.709ms390ms

As we can see, even if the if branches are lucky and find the value on first try, the hashing function is still roughly as fast as the if so the performance difference is negligible.

The longer we need to search for our value however (e.g. the further down it is in our if/else branches), the slower the if approach gets - after 10 items, our hashing/switch approach is almost 7x faster than an if/else string comparison!

Conclusion

So to conclude - generating hashes and switch/case statements can be much faster than going through if branches.

While the performance of the if implementation depends on both the input string's length, and the amount of different branches we have, the switch implementation only depends on the input string's length.

In most modern languages (Java, C#), you can use switch statements with strings, and I assume the language environment automatically generates static hashes for you.

Thanks for reading! ❤️✌️





Bonus: Automatically getting constexpr string length

One other cool thing we can do is to automatically find string-length for static strings using C++ templates.

We know that dynamic strings are heap-allocated char*, but static strings are char[N] where N is the length of the string.

With this knowledge, we can add a separate overload for hashString that doesn't take a length parameter:

template <size_t N>
constexpr uint64_t hashString(const char (&str)[N]) {
    return hashString(str, N - 1); // N includes the null terminator, so subtract 1
}

..which we can then use for our statically allocated strings:

uint64_t inputHash = hashString(inputValue.c_str(), inputValue.size());
switch (inputHash) {
  case hashString("yuv-biplanar-420-8bit"):
    return PixelFormat::YUV_BiPlanar_420_8Bit;
  case hashString("yuv-biplanar-420-10bit"):
    return PixelFormat::YUV_BiPlanar_420_10Bit;
  case hashString("yuv-biplanar-420-12bit"):
    return PixelFormat::YUV_BiPlanar_420_12Bit;
  default:
    throw std::runtime_error("Invalid input value!");
}

No more manually counting the letters! 🥳

Liked this blog post?Share this on Twitter💖  Sponsor me on GitHub