alexeyfv


Fastest way to extract a substring in C#

C# Performance

Today, we’ll dive back into microbenchmarking with a concise article about performance in C#. Our focus will be on strings and the fastest way to extract a substring from the original string.

Benchmark

In this benchmark, we’ll consider the following ways of extracting a substring:

  1. Substring method.
  2. Range operator.
  3. Split method.
  4. ReadOnlySpan<T> struct.
  5. Regex class.
  6. SkipWhile method.
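
To make the list concrete, here is a minimal sketch of all six approaches applied to one string. The sample input, the Regex pattern and the SkipWhile lambda are assumptions made for illustration; the benchmark's actual data and pattern are not shown in this article.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input (not the benchmark's real data set).
string text = "key:value";
char symbol = ':';

int start = text.IndexOf(symbol);

// These four keep the separator, because they start at its index.
string viaSubstring = text.Substring(start);
string viaRange = text[start..];
string viaSpan = new string(text.AsSpan(start));
string viaSkipWhile = new string(text.SkipWhile(c => c != symbol).ToArray());

// Split and this (assumed) Regex pattern drop the separator.
string viaSplit = text.Split(symbol)[1];
string viaRegex = Regex.Match(text, ":(.*)").Groups[1].Value;

Console.WriteLine(viaSubstring); // :value
Console.WriteLine(viaRange);     // :value
Console.WriteLine(viaSpan);      // :value
Console.WriteLine(viaSkipWhile); // :value
Console.WriteLine(viaSplit);     // value
Console.WriteLine(viaRegex);     // value
```

Note that the approaches are not strictly interchangeable: with this input, four of them return the separator as part of the result and two do not.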

For benchmarking, I used the BenchmarkDotNet library. The full code of the benchmark class can be found here.

Results

As usual, I ran the benchmark on both .NET 6 and .NET 7. The results show minimal variation between the two.

Execution time

Figure: The benchmark results, comparing different substring extraction methods in C#

We observe that ReadOnlySpan<T>, Substring and the Range operator show fairly similar performance. Split, Regex and SkipWhile are notably slower: they take roughly 2.5, 8.5 and 23.6 times as long as ReadOnlySpan<T>, respectively.

Method             Mean, ns    Percent
ReadOnlySpan<T>       687.6        100
Substring             698.5        102
Range                 710.5        103
Split                1696.3        247
Regex                5830.4        848
SkipWhile           16211.7       2358

If we look at the decompiled C# code, it becomes apparent that the Range operator’s implementation is very similar to that of Substring.

// Range operator after decompiling
string text = data[num];
int num2 = text.IndexOf(_symbol);
string text2 = text;
int num3 = num2;
list.Add(text2.Substring(num3, text2.Length - num3));
num++;

The only difference is that the Substring implementation has fewer local variables.

// Substring after decompiling
string text = data[num];
int startIndex = text.IndexOf(_symbol);
list.Add(text.Substring(startIndex));
num++;
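
The equivalence seen in the decompiled code can be checked directly. The sketch below uses a hypothetical sample string and compares a few range expressions with the Substring calls they correspond to.

```csharp
using System;

// Hypothetical sample input.
string text = "key:value";
int start = text.IndexOf(':');

// Open-ended range from an index: same as Substring(start, Length - start).
string rangeTail = text[start..];
string substringTail = text.Substring(start, text.Length - start);

// Range up to an index: same as Substring(0, start).
string rangeHead = text[..start];
string substringHead = text.Substring(0, start);

// Index from the end: ^5 means Length - 5.
string rangeLast = text[^5..];
string substringLast = text.Substring(text.Length - 5);

Console.WriteLine(rangeTail == substringTail); // True
Console.WriteLine(rangeHead == substringHead); // True
Console.WriteLine(rangeLast == substringLast); // True
```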

ReadOnlySpan<T> shows slightly better results, although the gap to Substring is under 2%. It looks like taking a span over the string and creating a new string from it is marginally faster than calling string.Substring. I assume the reason is the extra argument and bounds checks inside the Substring implementation.

// ReadOnlySpan<T> after decompiling
string obj = data[num];
int start = obj.IndexOf(_symbol);
ReadOnlySpan<char> value = MemoryExtensions.AsSpan(obj, start);
list.Add(new string(value));
num++;
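
The span itself is only a view over the original string; the allocation happens when a new string is created from it. As a rough sketch (the input and the integer parsing are assumptions for illustration, not part of the benchmark), code that does not need a string at the end can skip that allocation entirely:

```csharp
using System;

// Hypothetical sample input.
string line = "id:42";
int start = line.IndexOf(':');

// A view over the characters of 'line'; nothing is copied here.
ReadOnlySpan<char> tail = line.AsSpan(start);

// Creating a string copies the characters into a new object,
// which is what the benchmarked variant does.
string copy = new string(tail);

// If only a number is needed, it can be parsed straight from the span.
int id = int.Parse(line.AsSpan(start + 1));

Console.WriteLine(copy); // :42
Console.WriteLine(id);   // 42
```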

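Split is slower because of what it does internally: it scans the whole string for separators and allocates an array plus a new string for every segment, even though only one segment is needed. Using it to obtain a single substring is simply the wrong tool for the job.

// Split after decompiling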
string text = data[num];
list.Add(text.Split(':')[1]);
num++;
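
A small sketch of why Split does more work than necessary, using a hypothetical input with several separators. It also shows the overload with a count argument, which stops splitting early, and an IndexOf-based alternative for comparison.

```csharp
using System;

// Hypothetical sample input with more than one separator.
string text = "a:b:c";

// Every segment becomes its own string, plus the array that holds them.
string[] all = text.Split(':');
string second = all[1];

// Limiting the count to 2 leaves the rest of the string unsplit.
string[] two = text.Split(':', 2);

// IndexOf + Substring allocates only the one string that is wanted.
string rest = text.Substring(text.IndexOf(':') + 1);

Console.WriteLine(all.Length); // 3
Console.WriteLine(second);     // b
Console.WriteLine(two[1]);     // b:c
Console.WriteLine(rest);       // b:c
```

The results also differ in meaning: Split(':')[1] returns only the text between the first and second separator, while Substring returns everything after the first one.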

Regex is a good option when you need to extract a substring that matches a more complex pattern than a single char. But in this particular case it’s like breaking a butterfly on a wheel.

// Regex after decompiling
string input = data[num];
list.Add(Regex.Match(input, _pattern).Groups[1].Value);
num++;
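
For contrast, here is the kind of task where Regex earns its cost. The input format and the pattern below are hypothetical and unrelated to the benchmark's own _pattern, which is not shown in this article.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: capture the digits that follow "id=".
Regex idPattern = new Regex(@"id=(\d+)", RegexOptions.Compiled);

// Hypothetical input where a single-char search would not be enough.
string input = "user=alice;id=1234;role=admin";

Match match = idPattern.Match(input);
string id = match.Success ? match.Groups[1].Value : string.Empty;

Console.WriteLine(id); // 1234
```

Creating the Regex once and reusing it avoids re-parsing the pattern on every call.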

SkipWhile is super slow because:

  1. It creates a new delegate Func<char, bool> on every iteration.
  2. Enumerable.SkipWhile calls this delegate for each char in the string.
  3. Enumerable.ToArray converts IEnumerable<char> to char[], which is then copied once more into the new string.

// SkipWhile after decompiling
string source = data[num];
list.Add(new string(
    Enumerable.ToArray(
        Enumerable.SkipWhile(
            source,
            new Func<char, bool>(<SkipWhile>b__5_0)))));
num++;
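
The costs listed above become easier to see next to a hand-written loop that does the same job. This is a sketch with a hypothetical input and lambda; the benchmark's actual predicate is not shown in this article.

```csharp
using System;
using System.Linq;

// Hypothetical sample input.
string source = "key:value";
char symbol = ':';

// LINQ version: a delegate call per char, an enumerator,
// an intermediate char[] and finally the string.
string viaLinq = new string(source.SkipWhile(c => c != symbol).ToArray());

// Manual version: find the position, then allocate only the result.
int i = 0;
while (i < source.Length && source[i] != symbol)
{
    i++;
}
string viaLoop = source.Substring(i);

Console.WriteLine(viaLinq); // :value
Console.WriteLine(viaLoop); // :value
```

Both versions return an empty string when the separator is missing, unlike IndexOf followed by Substring, where the -1 result has to be handled first.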

Memory

As for memory allocations, ReadOnlySpan<T>, Substring and Range show the same results. The other implementations require more memory.

Method             Gen0      Gen1      Allocated    Percent
ReadOnlySpan<T>    0.3901    0.0057     4.79 KB         100
Substring          0.3901    0.0057     4.79 KB         100
Range              0.3901    0.0057     4.79 KB         100
Split              0.7362    0.0114     9.03 KB         188
Regex              1.9150    0.0305    23.5 KB          490
SkipWhile          2.2888    0.0305    28.23 KB         589
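
Outside of BenchmarkDotNet, a rough way to see the same difference is to compare allocated bytes around a loop. This is only an illustrative sketch with a hypothetical input and helper (Measure); it is not how the numbers in the table were produced, and the absolute values depend on the runtime.

```csharp
using System;

// Hypothetical sample input.
string text = "key:value";

// Hypothetical helper: bytes allocated on this thread by 1000 calls.
long Measure(Action action)
{
    action(); // warm-up call so one-time costs are not counted
    long before = GC.GetAllocatedBytesForCurrentThread();
    for (int i = 0; i < 1000; i++)
    {
        action();
    }
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

long substringBytes = Measure(() => text.Substring(text.IndexOf(':')));
long splitBytes = Measure(() => { _ = text.Split(':')[1]; });

Console.WriteLine($"Substring: {substringBytes} bytes");
Console.WriteLine($"Split:     {splitBytes} bytes");
```

Split is expected to report more bytes, since it allocates an array and a string per segment on every call.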

Conclusion

The most efficient methods for extracting a substring in C# are ReadOnlySpan<T>, Substring and Range. I favor the Range operator due to its cleaner appearance compared to the other implementations. However, it is worth noting that in this benchmark it was 1-3% slower than Substring and ReadOnlySpan<T>.