Using Power Query to Clean Data in Excel

Power Query is a handy feature in Excel that helps you to clean your data, and change it into the format that you need it in. Power Query has been available in Excel for a long time, although in Excel 2016 it was called Get and Transform.

This tool is an ETL (Extract, Transform and Load) utility which makes it so much easier to work with other people’s data. Power Query can be used to connect to data from a huge range of sources, and lets you take a data source, put it into an Excel workbook, and ‘clean the data up’ so that you can perform calculations on it, load it into pivot tables, and generally enjoy the data in a more useful format.

Cleaning Up Your Data

Let’s imagine you have some data that was entered into a spreadsheet by hand. Perhaps you run a yoga studio, and you record classes for a given day, the name of the instructor, and who attended each class, along with whether they were a monthly member or paid per session. All the information that you need is entered into a spreadsheet, but because it was entered by hand there are several problems with it:

  • The date of the class appears only once, at the ‘top’ of the section, rather than in a column alongside the rest of the data
  • There are inconsistencies regarding spaces before or after names
  • There are blank rows which could confuse data processing
  • The membership information is spread across multiple rows

It’s entirely possible that someone who ‘wanted to computerize their data’ but who was not well versed in how to set up spreadsheets would create a document like that. It is well laid out and human readable, but it is not set up in a way that is machine-readable. The good news is that you can improve it to make it easier to work with.

Importing the Data For Use in Power Query

The first thing that you need to do is import the data. Open the Data tab and click ‘Get Data’. Select From File, and drill through that menu to ‘From Workbook’, then select the file and workbook that you want to get your data from.

If you need to load data from more than one workbook, then you can do this by ticking the ‘Select multiple items’ box, and then selecting the additional sheets that you need. Next, click the ‘Edit’ button to load the data into the Power Query Editor.

You can now start working on the data that you have imported, removing unnecessary information, changing types and formats, and merging or splitting cells to get the data in the format that you need it in.

Let’s imagine that the names of your customers were written like this “Smith, Pete”, and you wanted that to be split into LASTNAME and FIRSTNAME as separate columns . You could use the “Split Column” tool to split the column. Select “By Delimiter” and choose “comma”. This will split the name column into two separate columns for you.

You can filter other columns in the same way, and merge columns as well. You can also “Trim” extra characters, and change fields to make them look neater, transforming text into all upper or all lower case, or Capitalizing Each Word in a field. You can even remove unnecessary rows, and rename the column headings.

Perhaps the most useful option is the ability to unpivot certain columns. When you unpivot the columns, you will find that the data is split into two columns, one called attribute (which has the former heading) and one called value, which contains what was previously in the column with that heading. You should rename those columns to make it clear what they are.

Once you have finished going through the data in the Power Query Editor, you should have a neat looking tabular set of data. You can close the editor, and finish loading the data into a standard Excel table so that you can work with it properly. In there, you can make your own pivot tables, and process the data as normal.

If you saved the query and connection properly, then you would be able to re-load the original data file in the future, and the tool should repeat the steps that you used to clean the data, giving you fresh information. As long as the data is still in the same consistent format, the cleaning process will be done automatically.

Power Query takes some time to set up the first time that you use it, but if you import information from the same tool over and over again it will save you a lot of time in the long term. That’s what makes it so useful for data analysts.