The data we import in Power BI would always be formatted consistently-and-predictably in an ideal situation, but the likelihood of inconsistencies and changes in delivered reports bring up a critical question you should ask yourself when setting up your edit query steps in the Power Query Editor.
I’ll be specifically writing about the case where you need to remove rows from imported data, and the question is: should I use Remove Rows or use Column Filters instead?
The answer I propose: Use Column Filters when you can and use Remove Rows if you must.
Let me elaborate on why.
Using Remove Rows and Column Filters
Let’s take a look at a very simple CSV report that is provided in a file-drop from a few members on the Sales team every day. This is the daily sales totals by sales person:
After importing the CSV into Power BI Desktop and loading the Power Query Editor, the report looks as follows:
The Date row and blank row are not needed, and they can be removed simply by the following one of the two methods captured in the video below:
So, what’s the problem? The problem is that these reports are not always coming in consistently and if you use Remove Rows, the command does not discriminate. Imagine the following variations of the original report:
- Some don’t include the Date row.
- Some don’t include a blank row after Date.
- Some don’t include either the Date row or the blank row.
When these inconsistencies are processed by Power Query Editor steps that use Column Filters, the filters set above (i.e. exclude rows with “Date” and/or [blank] in the first column) simply don’t get applied and no rows are removed from the data set. But, let’s look at what happens with Remove Rows when there is no blank row after Date as one example.
Here’s what happens when I refresh the data source to pick-up additional drops of the CSV with Remove Rows in place.
A quick screen shot of the final sample (which I edit in Notepad for simplicity in the next video). Note, no [blank] row after “Date”:
Now, the video which shows:
- Reapplying the steps from scratch – everything works.
- Changing some data, but leaving the first two lines alone – everything still works.
- Making one more minor change and “accidentally” excluding the blank line after date – the First Row Headers are removed!
This was a very basic example, but Remove Rows has this run-always behavior that could cause some serious reporting problems if there aren’t safeguards on your input data integrity. If you can’t enforce that data integrity through governance, and still need a way to take a best-bet at removing unwanted rows, see if using column filters may be a better option for you.