Beginning PivotTables in Excel 2007 will introduce you to the exciting new pivot table features in Excel 2007. Create quick summaries and pivot charts, add impact with traffic light icons, design calculated fields, group dates and numbers.

Categories

Archives

Learn how to create Excel dashboards.

Clean Excel Data With TRIM and SUBSTITUTE

You have two Excel lists, and you’re trying to find the items that are in both lists. You know there are matching items, but your VLOOKUP formulas can’t find them. In this screenshot, the tiptech.html page is in both lists, but the VLOOKUP formula in cell C2 can’t find it.

TRIM01

Spot the Differences

Working with Excel data can be like one of those “Spot the Difference” puzzles. What’s different between list A and list B? If you’re lucky, the differences are obvious, like the forward slash in column B, and no leading slash in column E. Other times, it’s tougher to find the differences.

A common problem is items with leading or trailing spaces. You can’t see them on the screen, but after you’ve encountered them a few times, you learn to check for them. The LEN function is a great help if you suspect there are hidden space characters.

TRIM02

If you use the LEN function to compare the length of the text in cell B2 and E4, you’d see that there are 2 additional characters in cell B2. One character is the forward slash, and the other character is a trailing space.

Remove Leading and Trailing Spaces

If you want to use a VLOOKUP or MATCH to find column B items, in column E, you’ll have to get rid of any extra characters. First, you can deal with the spaces, by using the TRIM function.

To return the text from cell B2, without any leading or trailing characters, you’d use this formula:

=TRIM(B2)

If you use a formula in cell B12 to check the length of the trimmed text, it’s now 13 characters, instead of 14. The trailing space has been removed.

TRIM03

Remove a Specific Character

Next, you can use the SUBSTITUTE function to remove the forward slash from the text in cell B2.

=SUBSTITUTE(B2,”/”,””)

  • The first argument, B2, is the cell that contains the text value.
  • The second argument, “/”, is the old text, that you want to replace.
  • The third argument, “”, is the new text, that replaces the old text. If you want to remove the old text, without inserting new text, use “” as an empty string, as we did here.

TRIM04

Combine the Functions

The TRIM and SUBSTITUTE functions work well separately, and you can combine them, to remove both the spaces and the forward slash. The order doesn’t matter, so you can use either:

=TRIM(SUBSTITUTE(B2,"/",""))

or:

=SUBSTITUTE(TRIM(B2),"/","")

TRIM05

Add to the VLOOKUP

Now that you know the TRIM and SUBSTITUTE functions will clean up the text in column B, you can add those functions to the VLOOKUP formula.

Instead of using B2 in the VLOOKUP:

=VLOOKUP(B2,$E$2:$F$8,2,FALSE)

use the TRIM and SUBSTITUTE functions:

=VLOOKUP(TRIM(SUBSTITUTE(B2,"/","")),$E$2:$F$8,2,FALSE)

A match for the cleaned up text is found in column E, and the Update column is filled in with the correct date.

TRIM06

Troubleshooting a VLOOKUP Formula

If TRIM and SUBSTITUTE don’t solve your VLOOKUP problems, there are a few more suggestions on the Contextures website: Troubleshoot the VLOOKUP formula. You’ll also find examples of using the IF function or IFERROR function to deal with VLOOKUP function errors.

And if that doesn’t work, you could try some Mr. Clean; according to this commercial from the 1950s, it works on everything, including the dog!

_____________

8 comments to Clean Excel Data With TRIM and SUBSTITUTE

  1. Rick Rothstein (MVP - Excel)
    February 17th, 2010 at 1:23 am

    In the "Combine the Functions" section, you wrote...

    =TRIM(SUBSTITUTE(B2,"/",""))

    or:

    =SUBSTITUTE(TRIM(B2),"/","")

    implying that these two formulas return the same result. And, of course, this is true for the data you showed us. However, so readers of your blog don't make the mistaken assumption that these two formulas always return the same results, it should be pointed out that, for other types of data in B2, the order could matter. For instance, if the there were one or more spaces in front of the slash **and** one or more spaces immediately after the slash (possibly followed by other text), the second formula will retain the spaces located after the slash whereas the first formula will not. When the TRIM function is applied matters for data such as I just proposed.

  2. Jason M
    February 17th, 2010 at 7:53 am

    In addition to TRIM, the CLEAN worksheet function can assist in sanitizing your data set. CLEAN is less common and honestly, I think I have only used CLEAN once when I imported some data that was loaded with lots of garbage characters.

  3. Roger Govier
    February 17th, 2010 at 10:27 am

    Hi Deb

    Your readers may also come across that pesky non breaking space character, Char(160) which, whilst not visible as a space is not cleaned with TRIM.

    SUBSTITUTE will remove it fine though
    =SUBSTITUTE(B2,Char(160),"")

  4. David McRitchie
    February 17th, 2010 at 12:03 pm

    For a macro solution:
    http://www.mvps.org/dmcritchie/excel/join.htm#trimall

    works from a selection. By using a macro you fix the problem and don't have to keep reworking bad data with a formula.

  5. Debra Dalgleish
    February 17th, 2010 at 7:19 pm

    Rick, Jason, Roger and David, thanks for adding your Excel data cleaning tips.

  6. Oladapo
    March 3rd, 2010 at 5:43 am

    Hello,

    Can someone help with this:
    Removing space(s)-regular and non regular, in an excel text or number. For example

    1) 123 5697
    2) 23145 5
    3) 1 98742
    4) 569 49 12

    Thank you

  7. Roger Govier
    March 3rd, 2010 at 6:15 am

    Hi Oladapo

    You can just use the Substitute function
    =Substitute(A1," ","")

  8. cs coleman
    July 15th, 2010 at 1:44 am

    Excellent resource and examples, especially intuitive for those seeking to combine functions.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>