
CLEANING THE DATA
Once we have the data in a data frame, we can clean it up. Cleaning may involve removing duplicates, filling in missing values, or changing the data format. In this case, we will remove the duplicates. We also want to remove the comments associated with the top five hotels so that we only have information for the three hotels in our analysis. To accomplish this, we will use a regex. The regex replaces every " #" with a space and every "&amp;" with "&", and appends the comment before or after.
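The original regex is not reproduced in this text, so the following is only a minimal sketch of the cleaning step described above, not the author's actual code. It assumes that the second replacement refers to the HTML entity "&amp;", that exact duplicate rows are the ones to drop, and that rows are plain dictionaries; the patterns, function names, column names, and sample rows are all hypothetical.

```python
import re

# Hypothetical pattern: a "#" together with any whitespace around it.
HASH_PATTERN = re.compile(r"\s*#\s*")
# Hypothetical pattern: the HTML entity "&amp;" (assumed, not from the source).
AMP_PATTERN = re.compile(r"&amp;", re.IGNORECASE)


def clean_comment(text: str) -> str:
    """Replace '#' with a space and the '&amp;' entity with a plain '&'."""
    text = HASH_PATTERN.sub(" ", text)
    text = AMP_PATTERN.sub("&", text)
    return text.strip()


def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample rows, for illustration only.
rows = [
    {"hotel": "Hotel A", "comment": "Great #view &amp; breakfast"},
    {"hotel": "Hotel A", "comment": "Great #view &amp; breakfast"},
    {"hotel": "Hotel B", "comment": "Clean rooms#quiet"},
]

# Deduplicate first, then clean the comment text of each remaining row.
cleaned = [
    {**row, "comment": clean_comment(row["comment"])}
    for row in drop_duplicates(rows)
]
```

If the data lives in a pandas data frame, the same idea would likely be expressed with `DataFrame.drop_duplicates()` followed by mapping `clean_comment` over the comment column; the plain-Python version is used here only to keep the sketch free of dependencies.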
