Data Diplomacy - Updating diplomacy to the big data era

Big data is often described in terms of the widely quoted ‘V’s’, originally proposed by Laney in 2001. Laney put forward three elements to describe big data: volume (the size of the datasets), velocity (the speed at which big data is generated), and variety (the many different forms that big data takes).20 Later, this definition was extended with a fourth V: veracity (the complexities involved in analysing big data and the associated questions of accuracy). A second approach defines big data according to the kind of analysis that is performed on it. Rather than looking at the characteristics of the dataset, this approach looks at the new kinds of information and knowledge that can be produced. For example, a Foreign Affairs article describes big data as ‘the idea that we can learn from a large body of information things that we could not comprehend when we used only smaller amounts’. The authors point to the usefulness of big data in identifying patterns and correlations. A drawback of this approach is that it is difficult to determine when data is ‘just’ data and when it counts as ‘big data’. Patterns and correlations can be found in almost any dataset, so big data is not necessarily something new and revolutionary; rather, it is a new way of combining data to gain new insights. The definition adopted by the Oxford English Online Dictionary attempts to merge both approaches, defining big data as ‘extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.’ For this report, we needed a definition that is concrete enough to relate practically to diplomacy, yet broad enough to support comprehensive conclusions and reveal general effects.
We therefore chose to look only at big data that is generated automatically, usually for a purpose different from the analysis for which it is used in diplomacy. We then examine how this data can help diplomacy generate new insights. The data sources that we focus on in particular are:
• Data exhaust: passively collected data generated by people’s use of digital services (e.g. phone records, purchases, web searches).
• Online information: web content (e.g. social media interactions, news articles, website content).
• Physical sensors: satellite or infrared imagery (e.g. changing landscapes, traffic patterns).
• Textual data: the large number of texts, reports, messages, and transcripts in digital format produced by MFAs and other institutions in international affairs.
Although this kind of data might fall outside certain technical definitions of big data, diplomats can benefit in particular from conducting text-mining analyses on such documents.
1.2.3.2 Big data analysis
Any data analysis is best started with a question rather than with the data. The question should clarify the purpose of using the data, the kinds of behaviour to be studied, and the scope of the analysis. For example: How can assistance be provided most effectively after a natural disaster? Which cities are in danger of flooding due to rising sea levels? How positively do citizens feel about foreign countries?
To illustrate how a diplomat might use data analysis, we will describe the hypothetical case of a diplomat in the economic diplomacy unit of Alphaland. Alphaland will organise a trade mission to Betastan, and the diplomat is tasked with finding out how the citizens of Betastan perceive Alphaland, and which segments of the population perceive the country positively or negatively. Betastan has high Internet penetration and Twitter is one of its most popular social media platforms, so our diplomat decides to analyse Twitter conversations containing #Alphaland from users in Betastan.
Once the question has been defined, the next step is to collect the data. Given the abundance of data being generated, a lack of available data is rarely the problem. Nevertheless, it can be difficult to identify which types of datasets can be used, and to gain access to privately held databases. In our example, the diplomat will rely on Twitter’s Application Programming Interfaces (APIs), which allow users to interact with Twitter’s services and data. For example, Twitter’s Search API gives access to past tweets, although it will only return the last 3 200 tweets of each user, or the last 5 000 tweets per keyword. Twitter’s Streaming API monitors tweets as they become available, although Twitter only provides a sample of them.
The Twitter Firehose API is comprehensive and makes all tweets available, but it is very costly. Our diplomat chooses to rely on Twitter’s Search API, as she is only interested in the current situation, and will monitor mentions of #Alphaland over the last seven days. She gets in touch with a data scientist from the MFA’s data diplomacy unit (Chapter 3), who helps her collect the relevant data (tweets containing #Alphaland) using a programming language such as Python or R.
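The collection step can be made more concrete with a minimal Python sketch of the filtering a data scientist might apply once tweets have been retrieved. The sketch deliberately does not call Twitter’s APIs. It assumes the tweets are already available as a list of dictionaries with the hypothetical keys `id`, `text`, `created_at` and `user_location`. It also treats a user’s self-reported location as a rough proxy for being in Betastan, which is an assumption rather than a reliable signal. The function name `collect_hashtag_tweets` is likewise only illustrative.

```python
import re
from datetime import datetime, timedelta, timezone


def collect_hashtag_tweets(tweets, hashtag="#Alphaland", country="Betastan",
                           days=7, now=None):
    """Return tweets that mention `hashtag`, were posted within the last
    `days` days, and come from users whose self-reported location mentions
    `country`.

    Assumption: each tweet is a dict with the hypothetical keys 'id',
    'text', 'created_at' (a timezone-aware datetime) and 'user_location'.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Match the hashtag case-insensitively and as a whole token, so that
    # '#alphaland' matches but '#Alphalander' does not.
    tag = re.escape(hashtag.lstrip("#"))
    pattern = re.compile(r"(?<!\w)#" + tag + r"(?!\w)", re.IGNORECASE)

    kept, kept_ids = [], set()
    for tweet in tweets:
        if tweet["id"] in kept_ids:
            continue  # already kept, e.g. returned by two overlapping searches
        if tweet["created_at"] < cutoff:
            continue  # older than the monitoring window
        location = (tweet.get("user_location") or "").lower()
        if country.lower() not in location:
            continue  # crude proxy: self-reported location only
        if pattern.search(tweet["text"]):
            kept.append(tweet)
            kept_ids.add(tweet["id"])
    return kept


if __name__ == "__main__":
    now = datetime(2017, 5, 10, tzinfo=timezone.utc)
    sample = [
        {"id": 1, "text": "Great trade news from #Alphaland!",
         "created_at": now - timedelta(days=1), "user_location": "Betastan"},
        {"id": 2, "text": "#alphaland exports are too expensive",
         "created_at": now - timedelta(days=3), "user_location": "north betastan"},
        {"id": 3, "text": "Visiting #Alphaland soon",
         "created_at": now - timedelta(days=10), "user_location": "Betastan"},
        {"id": 4, "text": "#Alphalander football tonight",
         "created_at": now - timedelta(days=2), "user_location": "Betastan"},
        {"id": 5, "text": "#Alphaland rocks",
         "created_at": now - timedelta(days=2), "user_location": "Gammaland"},
    ]
    print([t["id"] for t in collect_hashtag_tweets(sample, now=now)])  # [1, 2]
```

Matching the hashtag as a whole token avoids counting related but distinct tags such as #Alphalander. Removing duplicates matters when the results of overlapping searches are merged. A real project would replace the location check with whatever geographic information the chosen data source actually provides.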
