If you work with data in Excel, you know what a mess it can be. I help my customers clean up data that they've imported from another computer system, or from reports received from another department or group.
Those files can be filled with spelling mistakes, strange abbreviations, extra spaces or missing punctuation. It can take you hours, or days, to match items between lists, and eliminate any duplicates. Your favourite Excel functions, like VLOOKUP or MATCH, can do some of the work, but fail in other areas.
Shaking your fist, or cursing at the data, doesn't help, as I'm sure you know. Fortunately, there is a new Excel add-in, from Microsoft's BI Labs, that looks promising.
Fuzzy Lookup Add-in for Excel 2010
The new add-in is named Fuzzy Lookup Add-in for Excel 2010, and it helps you find best matches, for items that aren't an exact match. The project description says:
It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms, and added or missing data.
That sounds useful, so I downloaded the add-in, and installed it for testing.
Set Up the Fuzzy Lookup
When you open Excel 2010, after installing the Fuzzy Lookup add-in, you'll see a Fuzzy Lookup tab on the Ribbon, and it contains the Fuzzy Lookup command.
Follow these steps to set up the Fuzzy Lookup:
- The add-in works with formatted tables, so create one or more Excel tables in your workbook, if you haven't done so already. The add-in download includes a sample file named Portfolio, which contains Excel tables.
- On the Ribbon, click the Fuzzy Lookup command, to open the Fuzzy Lookup pane.
- Select the left and right tables for the comparison (to identify matches in a single table, set the left and right tables to be the same)
- Columns with the same heading will be automatically joined. You can leave that join, or delete, and create new joins for other fields.
- For the Output columns, select the columns in each table that you want in the output table. Include the FuzzyLookup.Similarity column, to see how close the matches are.
- Leave the number of matches at the default of 1, or increase it to see more matches for each item in the left table.
Run the Fuzzy Lookup
After you set up the Fuzzy Lookup, follow these steps to run it:
Select a cell in an empty part of the workbook. The results will start in the active cell, so make sure there is enough room, below and to the right of the selected cell.
Click the Go button, to output the results.
Configure the Fuzzy Lookup
You can change the configurations in the Fuzzy Lookup add-in, and create your own.
Test It Yourself
To do your own testing, you can download Fuzzy Lookup Add-in for Excel 2010. The download page has a couple of videos that you can watch, for an overview of how the add-in works.