Lesson 4

Preview

Organize and Clean Data

Excel Standards for Organizing and Cleaning Your Data

Organize Your Data

  • Leave at least one blank column and one blank row between a related data range and other data on the worksheet.
  • Avoid putting blank rows and columns within a range of data. In fact, you should avoid all empty spaces, unless you have a specific reason for using them.
  • Make sure that any hidden rows or columns are displayed before you make changes to a range of data. When rows and columns in a range are not displayed, data can be deleted inadvertently.

Format Your Data

  • When you want to separate labels from data, use cell borders — not blank rows or dashed lines — to insert lines below the labels. 
  • Avoid inserting spaces at the beginning or end of a cell to indent data. These extra spaces can affect sorting, searching, and the format that is applied to a cell. Use the Increase Indent icon if you must apply spacing rather then entering a series of spaces.
  • When you add new rows of data to the end of a data range, Excel extends consistent formatting and formulas. Three of the five preceding cells must use the same format.
  • You can turn a contiguous range of cells on your worksheet into an Excel table. Data that is defined by the table can be manipulated independently of data outside of the table, and you can use specific table features to quickly sort, filter, total, or calculate the data in the table.

Clean Your Data (see following page concerning tools mentioned below)

  • Perform a sort on your data; this shows empty rows that could be missed otherwise.
  • Look for any unnecessary empty columns, rows, cells, and delete them.
  • Check and remove any duplicate data (Remove Duplicates tool).
  • If there are multiple pieces of data in one column (I. E. City, State, Zip), remove each piece and place in a separate column (Text to Columns tool).
  • There are several functions that can be used to specifically clean your data.

Organize and Clean Data Using Data Tools Section on the Ribbon

Text to Columns – appropriate when there are multiple data pieces in one column (field) and one of the data pieces needs to go into a separate column; i.e. city state zip each have to go into separate columns.

Flash Fill – good for splitting up data when too much is in one column. This is similar situation to Text to Columns but resolved differently. In an adjacent column, enter the zip code of the first listing of data in an adjacent column. Select “Flash Fill” feature on Data Tools section. Excel will enter all of the remaining zip codes below the first.

Remove Duplicates – checks and automatically removes duplicates in a table.

Data Validation – shows instructions for the Excel user to input data within a specific criteria range. If data enter is outside of that range, it is not allowed and the user is prompted to reenter within the range.

Consolidate – good for combining multiple sheets of data into one sheet

Relationships – Shows relationships (linked tables through a common field) between tables or create a new relationship by linking 2 or more tables together through a common field).

Manage Data Model – interfaces with the Power Pivot Excel Add-in, that allows you to create more sophisticated pivot tables or add larger datasets that a workbook can not handle directly. 


Practice Exercise #3 Organize and Clean Data in the EMPLOYEES Table

Continue to use data file that was saved in Practice Exercise 2 as “XYZ SWITCHBOARD.xlsx”

STEP 1: ​ First, let’s get rid of unnecessary columns, row, and duplicates. ​

Figure 1

STEP 2A: Let’s get rid of Column D; it is not needed.

STEP 2B: Right-button mouse click Column D and select “Delete” from mouse shortcut menu (FIGURE 2).

Figure 2

STEP 3A: We can check for empty rows by performing a sort on our data (FIGURE 3).
STEP 3B: Select the Data tab; in the Ribbon Sort and Filter section
STEP 3C: Select the large Sort icon, as shown to the right. It designates a Custom Sort.

Figure 3

STEP 4A: We can use just about any of the fields to sort on. We will use Last_Name (FIGURE 4).
STEP 4B: For the “Sort by” option, select the Last_Name choice in the drop-down menu.
STEP 4C: For the “Sort On” option, leave the default choice, “Cell Values”.
STEP 4D: For the “Order” option, leave the default choice, “A to Z”.
STEP 4E: Select OK button

Figure 4

STEP 5: ​The sort action shows all of the rows with missing data, as shown below (FIGURE 5).

Figure 5

STEP 6A: Select rows 2-7 headers to highlight those rows (FIGURE 6).
STEP 6B: Right-button mouse-click highlighted rows and select “Delete” from mouse menu.

Figure 6

STEP 7A: Let’s check and remove any duplicates.
STEP 7B: Select the Data tab. From the Ribbon section, “Data Tools” select the icon, “Remove Duplicates” (FIGURE 7).

Figure 7

STEP 8A: Select fields by checking the box of any fields you feel may help in finding duplicates.

STEP 8B: Select the OK button (FIGURE 8).

Figure 8

STEP 8C: We will view a notification of any duplicates found and removed. Select OK button (FIGURE 9).

Figure 9

STEP 9: For Employee_ID, column we must remove the first 2 digits and hyphen. ​Using the RIGHT() function, we can retain characters from the right side of the data (See FIGURE 10 for STEPS 9-11F). 

STEP 10: Select First_Name column column header to highlight it. Right-button mouse click it and select Insert from mouse menu. ​Inserted column appears in Column B with Column1 header label.

STEP 11A: In cell address B2: begin by entering the equal sign = since this is a formula (FIGURE 10).

STEP 11B: Enter: RIGHT(A2,7)

STEP 11C: Click on cell address A2 with mouse, since that is the first number we want to change. A2 appears in the parenthesis.

STEP 11D: Enter a comma, and then a 7. We want to enter 7 since that is the number of characters we want to retain from the right side of the data.

STEP 11E: Then enter the closed parenthesis )

STEP 11F: Press the Enter key.

Figure 10

STEP 12A: We get the data we desired; we now have the number reproduced, without the first 2 digits and hyphen (FIGURE 11).
 
STEP 12B: Make cell address B2 the active cell

Figure 11

STEP 13A: Move mouse down to bottom right corner of cell address B2 until we see the small thin black cross (FIGURE 12).
STEP 13B: Once you get that cross, keep mouse still and double-click mouse.
STEP 13C: All numbers in the Employee_ID column will be reproduced in Column1 without the first 2 numbers and hyphen. 

Figure 12

STEP 14A: Make cell B2 active. Press CTRL + Shift + down arrow. All data in Column B is highlighted (FIGURE 13).

STEP 14B: Right-button mouse click the highlighted data and select Copy.

STEP 14C: Make cell A2 active. Press CTRL + Shift + down arrow. All data in Column A is highlighted.

STEP 14D: Right-button mouse click the highlighted data and select Paste – Values

Figure 13

STEP 15: Delete Column1 by right-button mouse clicking that column and select “Delete” from mouse shortcut menu (FIGURE 14).

Figure 14

STEP 16: We now have a clean EMPLOYEES data table, shown below (FIGURE 15).

Figure 15

END OF LESSON


Back to: Intro to Excel 2016 Data Analysis > Module 2 – Intro to Excel 2016 Data Analysis

Leave a Reply