Preserving Delimiters When Splitting a String with Ruby’s String#split
For some time, I wasn’t aware that there is a very simple way to preserve delimiters when splitting a string with String#split. Let’s look at the exercise below as a basic example of when this knowledge might be useful:
The task: a substitute_numbers method should replace each number written out in English within its string argument with the corresponding digit, as a string. For example, ‘zero’ should be replaced with ‘0’. The string argument will contain only single-digit numbers, zero through nine.
A first attempt at implementing a solution may look something like this:
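The original code listing is missing here. Below is a minimal sketch of such a first attempt, reconstructed from the description that follows. The NUMBER_WORDS and NUMBERS constants are hypothetical names of our own, not necessarily the author's.

```ruby
# Hypothetical lookup table: "zero" => "0", "one" => "1", ..., "nine" => "9".
NUMBER_WORDS = %w[zero one two three four five six seven eight nine].freeze
NUMBERS = NUMBER_WORDS.zip(('0'..'9').to_a).to_h.freeze

# First attempt: split on each non-word character, swap number words
# for digits, then glue the words back together with single spaces.
def substitute_numbers(str)
  words = str.split(/\W/)
  words.map { |word| NUMBERS.fetch(word, word) }.join(' ')
end
```

Hash#fetch with a default returns the word itself whenever it isn't a number word, so ordinary words pass through unchanged.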
Our general approach is to split the string on any non-word character (using the RegEx /\W/) and iterate through the resulting array of words with Array#map. Each time we come across a written-out number, we replace it with its corresponding digit. Finally, we join our array of words back into a string, separated by spaces, and return this new string from our method. This approach is decent, but let’s take a look at what it returns in practice:
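The original output isn't reproduced here. As an illustration, here is a compact, self-contained version of that first attempt run against a sample sentence of our own choosing (both the input and the NUMBERS hash are hypothetical):

```ruby
NUMBERS = %w[zero one two three four five six seven eight nine].zip(('0'..'9').to_a).to_h

# First attempt: split on /\W/, map number words to digits, join with spaces.
def substitute_numbers(str)
  str.split(/\W/).map { |word| NUMBERS.fetch(word, word) }.join(' ')
end

p substitute_numbers("I counted: one, two, three.")
# prints "I counted  1  2  3"
```

The colon, commas and final period are gone, and each spot where two delimiters sat side by side (such as ", ") now holds two spaces instead of one.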
As you can see, the return value of our substitute_numbers method is not what we expect. When we split the original string, our delimiters (which include the spaces and all punctuation marks) were discarded rather than preserved.
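Inspecting the intermediate array makes the loss visible. In this small sketch (the sample sentence is our own), String#split consumes every character matched by /\W/, and each pair of adjacent delimiters, like ", ", leaves an empty string behind:

```ruby
sentence = "I counted: one, two, three."

# Every character matched by /\W/ is treated as a separator and dropped;
# trailing empty strings are removed from the result.
pieces = sentence.split(/\W/)
p pieces
# prints ["I", "counted", "", "one", "", "two", "", "three"]
```

Because the punctuation never makes it into the array, no argument to Array#join can put it back.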
Now, let’s look at a very simple way to preserve these delimiters and get our intended return value using nearly the same approach: RegEx capture groups. This simply involves placing parentheses around our pattern, so our RegEx now looks like this: /(\W)/. When the pattern passed to String#split contains a capture group, the text matched by that group is also included in the returned array. As a result, str.split(/(\W)/) includes every delimiter as an element of the returned array. This can be demonstrated with a simple example, as seen below:
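As a sketch of that behavior (the sample string is our own), compare the two patterns side by side:

```ruby
sentence = "one, two, three."

# Without a capture group, the delimiters are dropped.
p sentence.split(/\W/)
# prints ["one", "", "two", "", "three"]

# With a capture group, each matched delimiter is kept as its own element.
parts = sentence.split(/(\W)/)
p parts
# prints ["one", ",", "", " ", "two", ",", "", " ", "three", "."]

# Joining the pieces with no separator rebuilds the original string.
puts parts.join == sentence
# prints true
```

The empty strings come from adjacent delimiters (", ") and are harmless once the array is joined back together.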
Let’s now take a look at the solution code for our exercise using this approach:
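The original solution listing is missing here. A minimal sketch consistent with the text, assuming a hypothetical NUMBERS hash (our own name, not necessarily the author's) that maps each number word to its digit string:

```ruby
# Hypothetical lookup table: "zero" => "0", ..., "nine" => "9".
NUMBER_WORDS = %w[zero one two three four five six seven eight nine].freeze
NUMBERS = NUMBER_WORDS.zip(('0'..'9').to_a).to_h.freeze

def substitute_numbers(str)
  # The capture group keeps every delimiter as its own array element.
  tokens = str.split(/(\W)/)
  # Swap number words for digits; delimiters and other words pass through.
  tokens.map { |token| NUMBERS.fetch(token, token) }.join
end
```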
Note that we no longer need to pass a space to Array#join, because the space characters from our original string, as well as all other delimiters, are already elements of the array we are joining into a string. Upon invoking our method, we now see the expected output:
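For illustration, here is a compact, self-contained version of the capture-group solution called on two sample sentences of our own (the NUMBERS hash is a hypothetical name):

```ruby
NUMBERS = %w[zero one two three four five six seven eight nine].zip(('0'..'9').to_a).to_h

# Split while keeping delimiters, swap number words, rejoin with no separator.
def substitute_numbers(str)
  str.split(/(\W)/).map { |token| NUMBERS.fetch(token, token) }.join
end

p substitute_numbers("I counted: one, two, three.")
# prints "I counted: 1, 2, 3."
p substitute_numbers("Is anyone there? Only one.")
# prints "Is anyone there? Only 1."
```

Because matching happens on whole tokens, the "one" inside "anyone" is left alone. The lookup is also case-sensitive in this sketch, so a capitalized "One" would not be replaced.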