Sunday, September 29, 2013

Excel tricks for cleaning data

In this post lets talk about few simple Excel tricks which are handy to clean small datasets.

 How can we split text from one column into multiple columns?
  • Sometime you might have a time stamp in given column but you want date and time in separate column for analysis.
  • Select the column of interest, go to Data tab in excel and click in "Text to columns".
  • You will get a popup window. In first step here we can select if we want to split column based on fixed width or based a delimiter character. Lets select delimiter and click on next.
  • In next step you can select one or many delimiters from comma, space, semicolon, tab. You can also add you own delimiter and see preview below. Lets select space and click on next.
  • In next step we select the destination column starting from which output will be displayed. Be default it will replace existing column, so it you want you can change it to next empty column.

How to count the occurrences of a cell-value in a specific range in Excel?
  • Lets say we want to count values as that of D2 from column D for first 100 rows (excluding header). 
  • Use this formula COUNTIF(D2:D100,D1)
  • If you want to use this for entire column use, COUNTIF($D$1:$D$319,D2) 
  • Always add $ in excel formula if you want a fixed range. Lets say I use formula  COUNTIF(D2:D100,D2) in cell E2 to find number of occurrences of D2. Now I want to scale it for finding count of all unique values till D100 I can simple drag bottom right corner of E2 till E100. However for E3 the range will be D3 to D101. So we are missing value from D2 and unnecessarily counting value from D101. To avoid this we add $ in formula like, COUNTIF($D2:$D100,E2) in E2 and then drag it till E100. 
  • In the screenshot below you can confirm number of occurrences for each values from column D in column E.  As you can see in most of the cases it will be repetitive.

How can I use output values of formula (in another sheet or file)?

  • If you copy values from column E and paste it in another sheet/file you will see null values
  • It can be fixed with a simple step. Select the whole column. You can do this by selecting first cell in column and pressing "Ctr + Shift + Down arrow key". On mac its "Command + Shift + Down key". Copy these selected cells.
  • Go to destination where you wish to paste it. Right click > select "Paste special" > Values.

How can I remove records with specific values (blank, hyphen, "NULL") in a column?

  • There might be better way of doing this but what I find handy is,
  • Select column of interest and click on filter. When the filter is applied on column go to filter dropdown at the top of column and uncheck values you don't want (like blank, hyphen etc)
  • Select the records. You can do this by selecting first cell from where header starts (generally A1) and press "Ctr + Shift + Down arrow key" to select whole column. On mac its "Command + Shift + Down key". Then press "Ctr + Shift + Right" to select all columns.
  • Copy these selected cells and paste at destination.

No comments:

Post a Comment