CLEANING THE DATA

Once we have the data in a data frame, we can clean it up. Cleaning may involve removing duplicates, filling in missing values, or changing the data format. In this case, we will remove the duplicates. We also want to remove the comments associated with the top five hotels so that we only have information for the three hotels in our analysis. To accomplish this, we will use a regex. The regex replaces every " #" with a space and every "&amp;" with "&", and appends the comment before or after.
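The original regex is not shown here, so the following is only a minimal sketch of the cleaning steps described above, not the author's code. It assumes each row is a plain dictionary with hypothetical `hotel` and `comment` fields, and the hotel names, sample comments, and helper names (`clean_comment`, `clean_rows`) are all invented for illustration. The "append the comment before or after" step is not covered, because the text does not say how it works.

```python
import re

# Hypothetical sample rows; the field names "hotel" and "comment" are assumptions.
rows = [
    {"hotel": "Hotel A", "comment": "Great pool #family"},
    {"hotel": "Hotel A", "comment": "Great pool #family"},  # exact duplicate
    {"hotel": "Hotel B", "comment": "Bed &amp; breakfast was fine"},
    {"hotel": "Hotel C", "comment": "Quiet #clean room"},
    {"hotel": "Hotel D", "comment": "Not part of the analysis"},
]

# Hypothetical set of the three hotels kept for the analysis.
HOTELS_IN_ANALYSIS = {"Hotel A", "Hotel B", "Hotel C"}


def clean_comment(text):
    """Replace every ' #' with a space, then decode '&amp;' to '&'."""
    text = re.sub(r" #", " ", text)
    return re.sub(r"&amp;", "&", text)


def clean_rows(rows, keep_hotels):
    """Drop duplicate rows and rows for other hotels, then clean each comment."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["hotel"], row["comment"])
        # Skip rows already seen and rows for hotels outside the analysis.
        if key in seen or row["hotel"] not in keep_hotels:
            continue
        seen.add(key)
        cleaned.append(
            {"hotel": row["hotel"], "comment": clean_comment(row["comment"])}
        )
    return cleaned


cleaned_rows = clean_rows(rows, HOTELS_IN_ANALYSIS)
for row in cleaned_rows:
    print(row["hotel"], "|", row["comment"])
```

If the data lives in a pandas data frame instead, the same two patterns could be passed to `Series.str.replace` with `regex=True`, with `DataFrame.drop_duplicates` handling the duplicate rows.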
