Introduction
If you are a Power BI user, you probably know that one of the greatest features of the tool is the Power Query Editor. The Power Query Editor lets you wrangle, alter, shape, transform, adjust, and modify your data prior to loading it into Power BI.
The UI, however, can only take you this far. One of the most challenging things about Power Query is that it does not let you replace multiple values at once. You can add a zillion steps to replace a single value multiple times, but that’s not really efficient.
A couple of ways of doing so by writing a custom function, can be found here and here. However, these functions are somewhat complicated to write and to maintain. They are also very time consuming when running on a large dataset.
This article will show you a few examples you can use to remove or replace characters as a bulk, using only 1 or 2 steps.
At the end of the article, here is what you will be able to get:
Source Data | Target Result |
ARISTIDE, HILAIRE E. | ARISTIDE,HILAIRE,E |
BEDI,HARVINDER,P | BEDI,HARVINDER,P |
CIDELCA,BERTHONIER | CIDELCA,BERTHONIER |
CONTI, VICTOR | CONTI,VICTOR |
PICHARDO – JOSE-ANDRE | PICHARDO,JOSE,ANDRE |
SINGH, KULJIT/MULTANI, DALJIT | SINGH,KULJIT,MULTANI |
Here is how this article is going to roll:
- Remove unwanted characters as bulk with List.RemoveItems
- Replace specific characters with another value in 1 step using ToList.ReplaceMatchingItems
- Cleanse data for unwanted values as bulk with Splitter.SplitTextByAnyDelimiter
- Manage a list of unwanted values dynamically with the List function
Are you comfortably seated? Let’s get started!
Before starting, we need to check a couple of things:
- Open your Power BI Desktop and go to “Edit Queries” to open the Power Query Editor window. In the View tab, make sure the “Formula Bar” checkbox is, well, checked. This will display the M code that each of your steps in the Query Editor represents. You’re going to need it in the future.
- Also, you may want to copy/paste the data below into a flat-file and load it into Power BI, to be able to follow along:
Name |
ARISTIDE, HILAIRE E. |
BEDI,HARVINDER,P |
CIDELCA,BERTHONIER |
CONTI, VICTOR |
PICHARDO – JOSE-ANDRE |
SINGH, KULJIT/MULTANI, DALJIT |
Alright, now we’re ready 😊
Join us at an upcoming panel discussion, webinar or meetup!
1. Remove unwanted characters as bulk with List.RemoveItems
We are going to use the List.RemoveItems function to remove unwanted characters from our column Name. From the documentation provided by Microsoft, we learn that the function works the following way:
List.RemoveItems ( {source list} , {list of characters to remove} )
In our example, let’s pretend the characters we do not want from our data are the space, the dot, the slash, and the dash. By adding the proper quotation around it, we get: ” “, “.” , “/” , “-” . Now we just need to add the curly brackets, which is the M formula slang for List, and add it to our formula:
List.RemoveItems ( {source list} , {” “, “.” , “/” , “-” } )
Now’s the time to plug in our source data. The issue here is that our column is by definition a column and not a list. So we need to convert it with the Text.ToList function. Now, we can add it to the formula.
List.RemoveItems ( Text.ToList([Name]), {” “, “.” , “/” , “-” } )
Next, we need to convert this list back to an actual column with the Text.Combine function.
Text.Combine (
List.RemoveItems ( Text.ToList([Name]), {” “, “.” , “/” , “-” } )
)
And voilà! You can now copy/paste this formula in the Custom Column window from the Add Column tab. With our example, you should get the following result:
Name | RemoveItems |
ARISTIDE, HILAIRE E. | ARISTIDE,HILAIREE |
BEDI,HARVINDER,P | BEDI,HARVINDER,P |
CIDELCA,BERTHONIER | CIDELCA,BERTHONIER |
CONTI, VICTOR | CONTI,VICTOR |
PICHARDO – JOSE-ANDRE | PICHARDOJOSEANDRE |
SINGH, KULJIT/MULTANI, DALJIT | SINGH,KULJITMULTANI,DALJIT |
2. Replace specific characters with another value in 1 step using ToList.ReplaceMatchingItems
In some cases, you may want to replace the characters with another value, instead of removing them altogether. Well that’s what we are going to do now with the List.ReplaceMatchingItems function. Here is what we get from the Microsoft documentation:
List.ReplaceMatchingItems(
{Source List}, { {OldValue1, NewValue1}, {OldValue2, New Value2} }
)
Here, let’s pretend we want to replace our dot with nothing (aka remove it), the slash with an ampersand (with spaces around it), and the dash with a coma. Let’s update our formula:
List.ReplaceMatchingItems(
{Source List}, { {“.”, “”} , {“/”, ” & “} , {“-” , “,” } }
)
Now, we can add our source column as a list as saw in the previous section:
List.ReplaceMatchingItems(
Text.ToList ([Name]), { {“.”, “”} , {“/”, ” & “} , {“-” , “,” } }
)
Finally, let’s put the pieces back together in one column with the Combine function:
Text.Combine( List.ReplaceMatchingItems(
Text.ToList ([Name]), { {“.”, “”} , {“/”, ” & “} , {“-” , “,” } }
))
You can now paste the formula as a new custom column. Here is the result we are getting:
Name | ToList_ReplaceMatchingItems |
ARISTIDE, HILAIRE E. | ARISTIDE, HILAIRE E |
BEDI,HARVINDER,P | BEDI,HARVINDER,P |
CIDELCA,BERTHONIER | CIDELCA,BERTHONIER |
CONTI, VICTOR | CONTI, VICTOR |
PICHARDO – JOSE-ANDRE | PICHARDO , JOSE,ANDRE |
SINGH, KULJIT/MULTANI, DALJIT | SINGH, KULJIT & MULTANI, DALJIT |
Alternatively, you can play with your data by adjusting the way your source data is broken into a list. With Text.ToList, each list item is a character. This time, let’s use the Text.Split function to convert our source data into a list based on a space separation, instead of each individual characters:
Text.Combine( List.ReplaceMatchingItems(
Text.Split ([Name], ” “), { {“.”, “”} , {“/”, ” & “} , {“-” , “,” } }
))
Here is the result. Notice that the record starting with “PICHARDO” got its first dash replaced because it was in the middle of two spaces. However, the second dash remained as there was no space around it to split it into its own list value.
Name | Split_ReplaceMatchingItems |
ARISTIDE, HILAIRE E. | ARISTIDE,HILAIREE. |
BEDI,HARVINDER,P | BEDI,HARVINDER,P |
CIDELCA,BERTHONIER | CIDELCA,BERTHONIER |
CONTI, VICTOR | CONTI,VICTOR |
PICHARDO – JOSE-ANDRE | PICHARDO,JOSE-ANDRE |
SINGH, KULJIT/MULTANI, DALJIT | SINGH,KULJIT/MULTANI,DALJIT |
3. Cleanse data for unwanted values as bulk with Splitter.SplitTextByAnyDelimiter
Now it may seem like all that is cool, but it is not really robust. So how to cleanse all of your data at once?
To do so, we are going to use a 4 step process:
- Duplicate your existing column (optional)
- Split your column using all the characters you want to remove
- Combine your columns back together with the delimiter of your choice
- Enjoy life with your freshly cleansed data
i) Duplicate your existing column (optional)
Since this technique is SOOO powerful it will OBLITERATE (cough * overwrite * cough) the source column. So if you wish to keep it, duplicate it and work off that duplicated field. You can still insert a step at any time. But it is a good practice to keep your source values to compare it with your modified fields. This will help you validate that the behavior is what you expected.
To duplicate, select your source column and click on “Duplicate Column” from the Add Colum tab.
Tip: Once the column is created you can modify the name directly from the “Duplicate Column” step by modifying it in the Formula Bar. This will save you a “Rename Column” step.
ii) Split your columns using all the characters you want to remove
The first step is to split your source column. To do so, select your column and from the Transform tab, click on Split Column > By Delimiter. From the advanced options, make the number of columns to split into equal to the maximum possible number of values you want to remove within the same entry. In our sample, that number is 5.
Note: You may delete the “Change Type” step that is created automatically.
From the Formula Bar, you should get a formula ending this way:
Splitter.SplitTextByDelimiter(” “
, QuoteStyle.Csv), {“Name.1”, “Name.2”, “Name.3”, “Name.4”, “Name.5”}
)
Let’s use this formula as a baseline and modify it with what we want. In our case, we want to use the Splitter.SplitTextByAnyDelimiter function. So we will just add the “Any”, directly in the Formula Bar:
Splitter.SplitTextByAnyDelimiter(” “
, QuoteStyle.Csv), {“Name.1”, “Name.2”, “Name.3”, “Name.4”, “Name.5”}
)
If you press enter, you will get an error message saying something relative to a list, blah blah blah. That is because we need to add the values we want to remove as a list. This calls for our beloved curly brackets:
Splitter.SplitTextByAnyDelimiter( {“.” , “/” , “-” , “,” , ” “}
, QuoteStyle.Csv), {“Name.1”, “Name.2”, “Name.3”, “Name.4”, “Name.5”}
)
Press enter and you will get your 5 new columns.
iii) Combine your columns back together with the delimiter of your choice
Now, let’s bring back those columns into one field. We will use the Text.Combine function we saw earlier. This time, we will leverage the option to separate the values with a comma. Also, we want to ignore the blank and null fields. To do so, we add the selection condition to the List.Select function. You may paste the formula below into the Custom Column window:
Text.Combine ( List.Select (
{[Name.1],[Name.2],[Name.3],[Name.4], [Name.5]}
, each _<> “” and _ <> null), “,”)
TaaaDaa! You now have a new column with cleanse data. Which brings us to the last step.
iv) Enjoy life with your freshly cleansed data
You can now take advantage of all the free time gained by cleansing all your data as a bulk!
4. Manage a list of unwanted values dynamically with the List function
Now someone asks you to also remove this or that value from the fields. Then change their minds. And change their minds again.
Luckily, you can manage the values to remove dynamically. To do so, add your list of values to remove as a new data table in the Power Query Editor. The source can be a flat file, a SQL data table, or whatever source you choose.
Note: It is good practice to have the field you want to cleanse and the field of values to be cleansed both lowercase, for better matching.
Now that you have your data in the Power Query Editor, select the column you want to use and click on “Convert to List” from the Transform tab. From the left Queries panel, you will see your data table (now list) with a List icon next to it. Double click (or right-click) on it and rename it the way you want. In our example, we want to call it “List”. Here is what you get:
At this point, we just need to update our Split formula from the previous section. Our current formula ends with:
Splitter.SplitTextByAnyDelimiter( {“.” , “/” , “-” , “,” , ” “}
, QuoteStyle.Csv), {“Name.1”, “Name.2”, “Name.3”, “Name.4”, “Name.5”})
We now want to replace the list of values with our new list. To do so, just type the name of your list directly in the formula bar:
Splitter.SplitTextByAnyDelimiter( List
, QuoteStyle.Csv), {“Name.1”, “Name.2”, “Name.3”, “Name.4”, “Name.5”})
To finalize, repeat step #3 for the section above to combine the columns into one, if necessary.
Alright! You are now able to replace multiple values in bulk in Power BI! Of course, you may need some additional steps to adjust to your data and use-case properly, but we hope this guide is enough to get you started.
Feel free to contact us or reach out to us on [email protected] if you have any questions.
Tebow Taquet, Sr Analytics Associate
Originally from France, Tebow (or Thibaut) joined Syntelli in 2015 as an intern, while pursuing his MBA in Finance from Winthrop University. Soon after obtaining his MBA with honors, he came on board full-time data wizard at Syntelli. In addition to being a former racecar driver, Tebow managed the U.S. branch of a private French school group and was a project lead for one of Las Vegas’ top attractions. With a background in business and graphic design, Tebow is a great asset to Syntelli’s project management and data visualization teams. His hard working personality, dedication, and attention to detail guarantee great deliverables.
When Tebow isn’t creating beautiful visualizations, you can find him running half-marathons in under 100 minutes!