How to split strings efficiently in C#
Thursday, December 26, 2024, 10:00, by InfoWorld
C# provides the string.Split() method to split strings in .NET applications. However, there is a better alternative, the ReadOnlySpan<char>.Split() method, which accomplishes the same thing far more efficiently. In this article, we'll examine the performance drawbacks of the string.Split() method and illustrate how the ReadOnlySpan<char>.Split() method can be used to optimize performance.

To work with the code examples provided in this article, you should have Visual Studio 2022 installed on your system. If you don't already have a copy, you can download Visual Studio 2022 from Microsoft.

Create a console application project in Visual Studio 2022

First off, let's create a .NET 9 console application project in Visual Studio 2022. Assuming you have Visual Studio 2022 installed, follow the steps outlined below.

1. Launch the Visual Studio IDE.
2. Click on "Create new project."
3. In the "Create new project" window, select "Console App (.NET Core)" from the list of templates displayed.
4. Click Next.
5. In the "Configure your new project" window, specify the name and location for the new project.
6. Click Next.
7. In the "Additional information" window shown next, choose ".NET 9.0 (Standard Term Support)" as the framework version you would like to use.
8. Click Create.

We'll use this .NET 9 console application project to work with the ReadOnlySpan<char>.Split() method in the subsequent sections of this article.

Why do we need to split strings?

When working in applications, you will often need to manipulate text data by splitting strings, joining strings, creating substrings, etc.
Typically, you will encounter the need to split strings in the following scenarios:

- Processing large text files
- Parsing arguments passed to an application from the command line
- Tokenizing a string
- Parsing CSV files
- Processing log files

The String.Split() method in C#

The String.Split() method creates an array of substrings from a given input string based on one or more delimiters. These delimiters, which act as separators, may be a single character, an array of characters, or an array of strings. The following code snippet shows how you can split a string using the String.Split() method in C#.

    string countries = "United States, India, England, France, Germany, Italy";
    char[] delimiters = new char[] { ',' };
    string[] countryList = countries.Split(delimiters,
        StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    foreach (string country in countryList)
        Console.WriteLine(country);

When you execute the preceding piece of code, the list of countries will be displayed in the console window as shown in Figure 1.

Figure 1. Displaying each substring in a list of substrings created from a long string. (IDG)

The ReadOnlySpan.Split() method in C#

While the String.Split() method is convenient and easy to use, it has performance drawbacks. It is not a good choice in performance-critical applications because of its allocation overhead. Whenever you use the Split() method to split a string, a new string is allocated for each segment. Additionally, the Split() method stores all of the parsed segments in an array of strings. This uses a significant amount of memory, particularly when you're dealing with large strings.

With .NET 9, you can split strings in a much more efficient way using the ReadOnlySpan<char> struct. The code snippet below shows how we can split the same string from the example above by using the ReadOnlySpan<char> struct.
    ReadOnlySpan<char> readOnlySpan = "United States, India, England, France, Germany, Italy";
    char[] delimiters = new[] { ',' };
    // Split yields Range values that point into the original span.
    foreach (Range segment in readOnlySpan.Split(delimiters))
    {
        Console.WriteLine(readOnlySpan[segment].ToString().Trim());
    }

Note that by using ReadOnlySpan<char> here, memory allocations are reduced because each segment is returned as a Range that references the original span, without creating new strings or arrays. (The ToString() call in this snippet still allocates one string per segment, purely for display.) You can read more about this in Microsoft's documentation.

Benchmarking the String.Split() and ReadOnlySpan.Split() methods

Let us now benchmark the performance of both approaches to splitting strings in C#. To do this, we'll take advantage of an open source library called BenchmarkDotNet. You can learn how to use this library to benchmark applications in .NET Core in a previous article.

To benchmark the performance of the two approaches, first create a new C# class named SplitStringsPerformanceBenchmarkDemo and write the following code in it.

    [MemoryDiagnoser]
    public class SplitStringsPerformanceBenchmarkDemo
    {
        readonly string str = "BenchmarkDotNet is a lightweight, open source, powerful .NET library " +
            "that can transform your methods into benchmarks, track those methods, " +
            "and then provide insights into the performance data captured. " +
            "It is easy to write BenchmarkDotNet benchmarks and the results of the" +
            " benchmarking process are user friendly as well.";

        [Benchmark]
        public void SplitStringsWithoutUsingReadOnlySpan()
        {
            char[] delimiters = new char[] { ',' };
            string[] list = str.Split(delimiters);
            foreach (string country in list)
            {
                _ = country;
            }
        }

        [Benchmark]
        public void SplitStringsUsingReadOnlySpan()
        {
            ReadOnlySpan<char> readOnlySpan = str;
            char[] delimiters = new char[] { ',' };
            foreach (Range segment in readOnlySpan.Split(delimiters))
            {
                _ = readOnlySpan[segment].ToString().Trim();
            }
        }
    }

You should decorate each method to be benchmarked using the [Benchmark] attribute. In this example, we have two methods that will be benchmarked.
One of them uses the ReadOnlySpan<char>.Split() method to split strings, and the other uses the Split() method of the String class. The [MemoryDiagnoser] attribute is applied to the benchmarking class to retrieve memory usage details.

Execute the benchmarks

The following code snippet shows how you can run the benchmarks in the Program.cs file.

    var summary = BenchmarkRunner.Run(typeof(SplitStringsPerformanceBenchmarkDemo));

Remember to include the following namespaces in your program.

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

Finally, to compile the application and execute the benchmarks, run the following command in the console.

    dotnet run --project SplitStringsPerformanceBenchmarkDemo.csproj -c Release

Figure 2 shows the results of the executed benchmarks.

Figure 2. Comparing the ReadOnlySpan.Split() and String.Split() methods using BenchmarkDotNet. (IDG)

As you can see from the benchmarking results in Figure 2, the ReadOnlySpan<char>.Split() method performs significantly better than the String.Split() method. Each benchmark method here splits a single string once. If you split many strings (say, in a loop), you might see even greater performance differences.

The ReadOnlySpan<char>.Split() method is a faster, allocation-free alternative to the String.Split() method in C#. The Span-based methods in C# are much more efficient, requiring hardly any Gen 0 or Gen 1 garbage collections compared to the methods of the String class. They reduce the memory footprint and garbage collection overhead considerably.

While Span<T> and ReadOnlySpan<T> can provide significant performance gains, using them is not recommended when your application has limits on its use of stack memory.
For example, you should refrain from using Span-based methods when performing simple string manipulations in your application or when working with large, long-lived data structures, because the stack usage might outweigh the performance benefits they can provide. Keep in mind that Span<T> and ReadOnlySpan<T> are stack-only ref structs, so they cannot be stored in the fields of a class.
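
The listings in this article call ToString() on every segment, which allocates a string each time. The following is a minimal sketch, assuming .NET 9 or later, of how the segments might instead be trimmed and consumed as spans so that no per-segment strings are created. MeasureSegments is a hypothetical helper name used only for illustration. It is not part of the article's benchmark or of the framework.

```csharp
using System;

// Hypothetical helper: counts the non-empty segments and sums their trimmed
// lengths without allocating a string for any segment.
// Assumes .NET 9+, where Split on a ReadOnlySpan<char> yields Range values.
static (int Count, int TotalLength) MeasureSegments(ReadOnlySpan<char> text, char separator)
{
    int count = 0;
    int totalLength = 0;
    foreach (Range segment in text.Split(separator))
    {
        // Trim() on a span returns a narrower view over the same memory.
        ReadOnlySpan<char> trimmed = text[segment].Trim();
        if (trimmed.IsEmpty)
        {
            continue; // similar in spirit to StringSplitOptions.RemoveEmptyEntries
        }
        count++;
        totalLength += trimmed.Length;
    }
    return (count, totalLength);
}

ReadOnlySpan<char> countries = "United States, India, England, France, Germany, Italy";

// TextWriter.WriteLine has a ReadOnlySpan<char> overload, so each segment can
// be printed without first converting it to a string.
foreach (Range segment in countries.Split(','))
{
    Console.Out.WriteLine(countries[segment].Trim());
}

var stats = MeasureSegments(countries, ',');
Console.WriteLine($"{stats.Count} segments, {stats.TotalLength} characters");
```

Whether this pattern pays off depends on what happens to each segment afterward. If every segment ends up stored as a string anyway, String.Split() with StringSplitOptions.TrimEntries may be just as suitable.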
https://www.infoworld.com/article/3626790/how-to-split-strings-efficiently-in-c-sharp.html