
Social Media Data in Urban Research 3
Lu Yilin, Li Shiyu, Gao Jinyuan, Reka Tundokova
Bian Ming, Zhang Mengdi, Zhu Jiaxuan, Li Yifan
For further details please get in touch with:
• Gao Jinyuan (stephgao@connect.hku.hk)
• Reka Tundokova (reka@connect.hku.hk)
• Zhu Jiaxuan (jiaxuan6@connect.hku.hk)
• Siddharth Khakhar (khakhar@hku.hk)
The emergence of new data sources has created new opportunities to explore tools and methods for applying such data across many fields of science and research. Many scholars have begun analysing social media data as a means of understanding urban life in relation to urban morphology and other built-environment phenomena. This report summarises our findings on social media research methodology, its application to a given site, the limitations of this type of research, and recommendations for further research steps.
2.27 Use of social media data in urban research
Data identification & collection
Social media research is an emerging field that is not yet clearly defined, which is why definitions vary, both of social media platforms themselves and of social media research, in terms of their use and applicability. For instance, Mayr and Weller (2017) studied data collection during a German federal election campaign, with the aim of assessing the relevance of social media data as a complement to data from more conventional sources such as surveys (Kaczmirek & Vatrapu, 2013).
Generally, social media research draws on multimedia or textual content, network data, or data traced from activities such as sharing and liking. The most commonly known and used platforms include Facebook, Twitter, YouTube, Wikipedia, Reddit and Instagram, to name a few. To obtain data from these platforms, one option is to compile datasets manually by copy-pasting or taking screenshots. A more structured way is to obtain data via an application programming interface (API), either directly from the platforms or through third-party services that provide access, usually for a fee (Mayr & Weller, 2017).
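To make the API route concrete, the following minimal Python sketch queries Wikipedia's public MediaWiki API, chosen here only because it requires no key; the 'Quarry Bay' search term is just an example. Platform-specific APIs (e.g. Twitter's) follow the same request/response pattern but require registration and authentication.

    import requests

    # Minimal sketch: keyword search against Wikipedia's public MediaWiki API.
    # No API key is needed; other platforms' APIs follow the same pattern
    # but require registered credentials.
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",         # run a query module
            "list": "search",          # full-text search
            "srsearch": "Quarry Bay",  # example keyword
            "format": "json",
        },
    )
    resp.raise_for_status()
    for hit in resp.json()["query"]["search"]:
        print(hit["title"])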
Because of the multitude of sources and platforms social media research may draw on, it is crucial to understand the opportunities and limitations of the extracted and analysed data, and to state these factors clearly throughout the research process. For example, data extracted from a given platform is likely to reflect particular groups of users, thereby narrowing the outputs (Lorentzen & Nolin, 2015). Experimentation with text-based datasets is useful for building samples based on theoretical assumptions, and understanding the character of text-based content before conducting research helps to define the boundaries of the discussion. So while some bias in textual data sampling is inevitable, what matters is understanding the kind of bias as far as possible, so that key decisions can then be made (Lorentzen & Nolin, 2015). The restrictions on obtaining data, and the limitations of its context, usually lead to modifications of the research questions to tailor them to the available data.
One limitation highlighted in social media research is that different platforms have particular user groups, so data samples from individual platforms are not representative of the population as a whole, especially since the demographic details of each user are not known or disclosed alongside the shared content being analysed. Moreover, technical and legal regulations rightly pose challenges to accessing data about users. These restrictions are, however, not always clear or transparent, so we can hardly establish what a representative population of the platform itself would be. Collecting data only from specific geo-locations likewise excludes users who have chosen not to share their location, creating further bias in the sample. This particularly concerns platforms where location is closely tied to the opinions expressed (e.g. Twitter).
Furthermore, the field of social media research does not yet have an established methodological standard.
Ilieva and McPhearson (2018) examine social media data related to environmental sustainability and highlight the values and emotions this kind of research can capture: measuring visitation of green spaces through the geo-locations of textual and picture data; analysing microblog posts for insights into physical activity, food and alcohol consumption; tracing how spatial experience changes over the course of the day to highlight social equity issues; reconstructing travel trajectories to support city road-map tracking; or tracking consumption behaviours to identify tourist attractions and popular urban spaces.
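As a sketch of the first of these uses, the snippet below counts geo-tagged posts that fall inside a park's bounding box as a crude visitation measure; the coordinates and posts are invented examples, not data from our study.

    # Crude visitation measure: count geo-tagged posts inside a park's
    # bounding box. Coordinates and posts below are invented examples.
    park_bbox = {"south": 22.265, "north": 22.272, "west": 114.205, "east": 114.218}

    posts = [
        {"lat": 22.268, "lon": 114.212, "text": "Morning run in the park"},
        {"lat": 22.301, "lon": 114.170, "text": "Lunch downtown"},
    ]

    def in_bbox(post, bbox):
        return (bbox["south"] <= post["lat"] <= bbox["north"]
                and bbox["west"] <= post["lon"] <= bbox["east"])

    visits = [p for p in posts if in_bbox(p, park_bbox)]
    print(len(visits), "geo-tagged posts fall inside the park boundary")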
On the other hand, textual content is extremely challenging to interpret: sentiment is often expressed through sarcasm and colloquial language, which biases the interpretation of sentiment. Computational challenges derive from the different levels of access to social media data granted to different companies or research teams, from the time and location dependence of the data, and from the high volume available from numerous sources. Last but not least, there is always the question of ethics, especially since the data may come from multiple sources, be subject to a multitude of approaches, and lack control over compliance with ethical norms throughout the extraction of human-subject datasets.
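The sarcasm problem can be illustrated with a small Python sketch using the open-source VADER lexicon analyser (pip install vaderSentiment): the second sentence is sarcastic, but because it contains positive words, a lexicon-based scorer will typically rate it positive.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    # Lexicon-based sentiment scoring; 'compound' ranges from -1 (negative)
    # to +1 (positive). The sarcastic sentence is likely to be misscored
    # as positive because it contains the words 'great' and 'love'.
    analyzer = SentimentIntensityAnalyzer()
    for text in [
        "This park is beautiful and well maintained.",
        "Oh great, another 'renovated' park with no shade. I just love it.",
    ]:
        scores = analyzer.polarity_scores(text)
        print("{:+.2f}  {}".format(scores["compound"], text))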
Data Extraction
For social media data, manual extraction works when only a small amount of data is needed; when researchers need larger amounts, automated methods are more suitable. Broadly, there are four ways to extract data: manually; with custom code (e.g. Python); through API wrappers such as Stevesie; or with data-extraction software such as Octopus.
Manual extraction becomes impractical with large volumes of data. Python scripts, API wrappers, and data-extraction software can all handle large volumes, but coding requires some programming background and is not suitable for everyone, so we explored the other two directions. Both API wrappers and data-extraction software can obtain fairly comprehensive data, and API wrappers are easier to use, so we chose Stevesie as our first option, with the free data-extraction software Octopus as the alternative.
Identifying social media data platforms

Data scraping
As a first step in extracting data from Google Maps reviews, we performed manual extraction to (1) get a sense of the existing process and textual content, (2) become more aware of the kind of content we would be working with, and (3) be able to compare the automated process against the manual one using our own judgement and understanding of the textual data.
To extract Google Maps reviews automatically, we used the Stevesie platform. Our first attempt used the 'place_search' and then 'place_details' functions of the Google app on Stevesie with a Google API key; however, this process is limited to extracting only 5 reviews, even on a basic paid plan. The solution that let us retrieve more reviews was to use a Scale SERP API key: run a place search for a query (e.g. 'parks') with an assigned geo-location and take the 'place_id' from the generated results. These details were then used in a Stevesie/Scale SERP search, and with each 'place_id' it was possible to extract a set of 10 reviews per page, so the extraction involved re-submitting the details for each place several times in order to receive a sufficient number of results.
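A minimal Python sketch of this place-search step is shown below. It assumes Scale SERP's 'places' search type and the places_results field we saw in the returned JSON; the parameter names may differ from the current Scale SERP documentation, and the API key is a placeholder.

    import requests

    # Sketch of the place search: query Scale SERP for 'parks' near a
    # location and list each result's identifier (used later to fetch
    # reviews). Field names follow the JSON we inspected manually.
    params = {
        "api_key": "YOUR_SCALESERP_API_KEY",  # placeholder
        "search_type": "places",
        "q": "parks",
        "location": "Quarry Bay, Hong Kong",
    }
    resp = requests.get("https://api.scaleserp.com/search", params=params)
    resp.raise_for_status()
    for place in resp.json().get("places_results", []):
        print(place.get("title"), place.get("data_id"))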

The workflow in Stevesie was as follows:
1. Use Place Search in Stevesie, with the location from the previous step, to get the specific longitude and latitude of a specific location.
2. Use those coordinates to search for detailed data with Nearby Search in Stevesie.
3. Fill in the required options and execute. Then go to the Scale SERP website and create an account; that service is well integrated with Stevesie.
4. This yields data including place types, opening hours, comments, etc. for the chosen locations.
5. After filling in the necessary fields (locations, etc.), press the Send API Request button. In the output JSON file on the right-hand side, copy the data_id for the chosen park (place type).
6. Back in Stevesie, click on the Scale SERP API button and, on the next page, choose Search Results.
7. Execute; when the output comes out, download it as a CSV file. It contains the reviews of the first page, 10 in total.
8. At the end of the rows there is a pagination.next_page_token for the next page; copy that token into Stevesie.
9. Execute again in the same way to get the reviews of the next page, and repeat the process for the remaining reviews.

Overview of software used/explored
Problems encountered and resolved
At first, we successfully extracted the location data for certain places, but failed to extract all of the review data because of the limit of five reviews at most. Later, we found the Scale SERP API key, which helped us get the place_id needed to extract the review data.
Limitations
Within Stevesie, we could only extract data 10 reviews at a time, so extracting 1,000 reviews requires 100 runs, which is a large workload.
Alternative strategies
The website below can be used for one-click extraction of all the reviews, which is very convenient and fast.
We use Stevesie (Stevesie, 2022) to get Google Maps reviews for the Quarry Bay keywords.
The specific steps are as follows:
1) Get an API key: go to the Scale SERP playground website and create an account to get an API key.
2) Get a data_id: select a Google Places search, fill in the query/location and run it; the result shows the number of comments and the number of searches. Take the data_id of each query item from the places_results in the JSON result and copy it.
3) Get a single-page comment for a single location: open the Stevesie interface, select the Scale SERP API under the Search Results item, click Search Results, enter the copied API key and data_id, and fill in the query, search type and limit. Running this returns a single page of reviews.
4) Get the next page token: download the extended CSV of the single-page comments and open the file to find the next page token.
5) Repeatedly obtain the location's comments: keep the original settings unchanged, copy the token into the next page token field, and repeat the operation.
6) Get more location reviews: repeat steps 2-5.
7) Finally, obtain the Google Maps comments in the resulting CSV (a scripted sketch of this loop is shown below).
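Steps 3-5 can also be scripted. The sketch below loops over pages via the next_page_token until no token is returned; it assumes Scale SERP's 'place_reviews' search type, and the place_reviews_results field name is our guess based on the CSV export, so it may need adjusting. The API key and data_id are placeholders.

    import csv
    import requests

    API_KEY = "YOUR_SCALESERP_API_KEY"       # placeholder, from the playground
    DATA_ID = "DATA_ID_FROM_PLACES_RESULTS"  # placeholder, copied in step 2

    def fetch_all_reviews(data_id, api_key, max_pages=100):
        # Follow pagination.next_page_token page by page (10 reviews each),
        # mirroring the manual repetition in steps 4-5.
        reviews, token = [], None
        for _ in range(max_pages):
            params = {"api_key": api_key, "search_type": "place_reviews",
                      "data_id": data_id}
            if token:
                params["next_page_token"] = token
            resp = requests.get("https://api.scaleserp.com/search", params=params)
            resp.raise_for_status()
            body = resp.json()
            # 'place_reviews_results' is our guess at the results field name
            reviews.extend(body.get("place_reviews_results", []))
            token = body.get("pagination", {}).get("next_page_token")
            if not token:
                break
        return reviews

    rows = fetch_all_reviews(DATA_ID, API_KEY)
    with open("quarry_bay_reviews.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rating", "snippet"])
        for r in rows:
            writer.writerow([r.get("rating"), r.get("snippet")])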

We use Vicinitas (Vicinitas, 2022) to get tweets containing the Quarry Bay keywords.
The specific steps are as follows:
1. Search for Tweets with the keyword
2. Download the CSV data and visualizations (a post-processing sketch follows below)
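A hypothetical post-processing sketch for the exported file is shown below; it assumes the export contains a 'Text' column holding the tweet body, so the column name may need adjusting to match the actual Vicinitas file.

    import pandas as pd

    # Load the exported tweets and list the most frequent words as a
    # quick overview; 'Text' is an assumed column name.
    tweets = pd.read_csv("vicinitas_quarry_bay.csv")
    words = tweets["Text"].str.lower().str.split().explode()
    print(words.value_counts().head(20))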
Socialmention web search

We used the Socialmention site (Socialmention, 2022) to obtain sentiment analysis of all articles and reviews on the site matching the Quarry Bay keywords.
The specific steps are as follows:
1. Search keywords
2. Get sentiment/top keywords CSV data and visualizations
Alternative software/platforms
At this stage of the research, only the software and platforms for obtaining Google Maps comments proved viable; other platforms such as Twitter, Facebook and Instagram lack usability due to a series of technical limitations, such as cumbersome operations and privacy protections.
We found that, in its free version, Botsol Crawl (Crawl, 2022) can use browser automation to obtain Google Maps comments quickly and efficiently, though with limits on the number of comments. The specific steps are as follows:

1. Select Google reviews
2. Enter search keywords and click start bot to start running
3. Get CSV