2.2 Organize and Clean Data Using Data Tools; Practice Ex. 3. Organize and Clean Data in the EMPLOYEES Excel table – Intro to Excel 2016 Data Analysis

2.2 Organize and Clean Data Using Data Tools Section on the Ribbon (Using the Data tab)

Text to Columns – appropriate when there are multiple data pieces in one column (field) and one of the data pieces needs to go into a separate column; i.e. city state zip each have to go into separate columns.

Flash Fill – good for splitting up data when too much is in one column. This is similar situation to Text to Columns but resolved differently. In an adjacent column, enter the zip code of the first listing of data in an adjacent column. Select “Flash Fill” feature on Data Tools section. Excel will enter all of the remaining zip codes below the first.

Remove Duplicates – checks and automatically removes duplicates in a table.

Data Validation – shows instructions for the Excel user to input data within a specific criteria range. If data enter is outside of that range, it is not allowed and the user is prompted to reenter within the range.

Consolidate – good for combining multiple sheets of data into one sheet

Relationships – Shows relationships (linked tables through a common field) between tables or create a new relationship by linking 2 or more tables together through a common field).

Manage Data Model – interfaces with the Power Pivot Excel Add-in, that allows you to create more sophisticated pivot tables or add larger datasets that a workbook can not handle directly. 

Practice Exercise #3 Organize and Clean Data in the EMPLOYEES Table

STEP 1: ​Let’s begin by cleaning our EMPLOYEES table from page 9, step 7 (FIGURE 1).
First, let’s get rid of unnecessary columns, row, and duplicates. 

Picture

FIGURE 1
 
STEP 2A: Let’s get rid of Column D; it is not needed.

STEP 2B: Right-button mouse click Column D and select “Delete” from mouse shortcut menu (FIGURE 2).


Picture

FIGURE 2
STEP 3A: We can check for empty rows by performing a sort on our data (FIGURE 3).
STEP 3B: Select the Data tab; in the Ribbon Sort and Filter section
STEP 3C: Select the large Sort icon, as shown to the right. It designates a Custom Sort.

Picture

FIGURE 3
STEP 4A: We can use just about any of the fields to sort on. We will use Last_Name (FIGURE 4).
STEP 4B: For the “Sort by” option, select the Last_Name choice in the drop-down menu.
STEP 4C: For the “Sort On” option, leave the default choice, “Cell Values”.
STEP 4D: For the “Order” option, leave the default choice, “A to Z”.
STEP 4E: Select OK button

Picture

FIGURE 4
STEP 5: ​The sort action shows all of the rows with missing data, as shown below (FIGURE 5).

Picture

FIGURE 5
STEP 6A: Select rows 2-7 headers to highlight those rows (FIGURE 6).
STEP 6B: Right-button mouse-click highlighted rows and select “Delete” from mouse menu.

Picture

FIGURE 6
STEP 7A: Let’s check and remove any duplicates.
STEP 7B: Select the Data tab. From the Ribbon section, “Data Tools” select the icon, “Remove Duplicates” (FIGURE 7).

Picture

FIGURE 7
STEP 8A: Select fields by checking the box of any fields you feel may help in finding duplicates.

STEP 8B: Select the OK button (FIGURE 8).
  


Picture

FIGURE 8
STEP 8C: We will view a notification of any duplicates found and removed. Select OK button (FIGURE 9).

Picture

FIGURE 9
STEP 9: For Employee_ID, column we must remove the first 2 digits and hyphen. ​Using the RIGHT() function, we can retain characters from the right side of the data. 
STEP 10: Select First_Name column column header to highlight it. Right-button mouse click it and select Insert from mouse menu. ​Inserted column appears in Column B with Column1 header label.
STEP 11A: In cell address B2: begin by entering the equal sign = since this is a formula (FIGURE 10).
STEP 11B: Enter: RIGHT(A2,7)
STEP 11C: Click on cell address A2 with mouse, since that is the first number we want to change. A2 appears in the parenthesis.
STEP 11D: Enter a comma, and then a 7. We want to enter 7 since that is the number of characters we want to retain from the right side of the data.
STEP 11E: Then enter the closed parenthesis )
STEP 11F: Press the Enter key.

Picture

FIGURE 10
STEP 12A: We get the data we desired; we now have the number reproduced, without the first 2 digits and hyphen (FIGURE 11).
 
STEP 12B: Make cell address B2 the active cell

Picture

FIGURE 11
STEP 13A: Move mouse down to bottom right corner of cell address B2 until we see the small thin black cross (FIGURE 12).
STEP 13B: Once you get that cross, keep mouse still and double-click mouse.
STEP 13C: All numbers in the Employee_ID column will be reproduced in Column1 without the first 2 numbers and hyphen. 

Picture

FIGURE 12
STEP 14A: Make cell B2 active. Press CTRL + Shift + down arrow. All data in Column B is highlighted (FIGURE 13).
STEP 14B: Right-button mouse click the highlighted data and select Copy.
STEP 14C: Make cell A2 active. Press CTRL + Shift + down arrow. All data in Column A is highlighted.
STEP 14D: Right-button mouse click the highlighted data and select Paste – Values

Picture

FIGURE 13
STEP 15: Delete Column1 by right-button mouse clicking that column and select “Delete” from mouse shortcut menu (FIGURE 14).

Picture

FIGURE 14
STEP 16: We now have a clean EMPLOYEES data table, shown below (FIGURE 15).

Picture

FIGURE 15
END OF PRACTICE EXERCISE 3