Social Media and Real Time Analytics
Big Brother Watch

1. Big Brother Watch is a civil liberties and privacy campaign group that was founded in 2009. Specifically related to this inquiry, we have produced research into the protection and handling of personal data in the NHS, local authorities and police forces. We have also worked extensively on reforming the Data Protection Act and commissioned polling on global attitudes to online privacy and the use of personal information by private companies.

Summary of Key Points

• The major ethical concerns of using personal information include issues of consent, privacy and discrimination.
• The new EU Data Protection legislation could potentially allow individuals more control over their information as well as imposing new guidelines on overseas transfers.
• The UK’s data protection laws need improvement, specifically Section 55 of the Data Protection Act 1998, where custodial sentences should be introduced for serious offences.
What are the ethical concerns of using personal data and how is this data anonymised for research?

Ethical Concerns

2. The most important issue in this area is that of consent. This is the bedrock for all research projects that involve the collection of personal information. Individuals should be given all of the relevant guidance on what will happen to their information and then be allowed to give or withhold their informed consent. However, many projects rely on the concept of implied consent, and in these instances communication is key. Implied consent is usually found in opt-out systems, where information will be processed unless an individual expressly opts out. Concerns begin when organisations have failed to inform individuals about their ability to refuse or withdraw their consent.

3. This issue is well highlighted by the Government’s care.data scheme, which relies on a system of implied consent. One of the major problems with this is that many people (polling for Big Brother Watch put the figure as high as 69%) had not been informed of their right to opt out. Clearly it is wrong to assume that people have consented to something that they have not properly heard of or did not know they could opt out from. Therefore any project involving the collection and use of personal information should establish the informed consent of its participants as a priority.

4. A further issue is the effect that the use of personal information can have on individual privacy. A variety of disparate pieces of information can be built up into a very accurate and intimate picture of a person. One example of this is the American retailer Target. The chain used “predictive analytics” to monitor changes in its female customers’ shopping habits and work out which ones were pregnant in order to target adverts. As the study Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms points out, one of the major problems was that by using this method Target was in fact manufacturing personally identifiable information. Customers would therefore have no idea that their information was being used in this manner, even if they suspected that some of it might be being collected.

5. Another concern is that the use of personal data has the potential to lead to discrimination. An example of this would be the application of big data analytics in the housing market. Whereas landlords are prohibited from advertising to potential buyers on the basis of classifications such as race, religion or gender, personal data and information could be used to sift out individuals deemed “undesirable”. This would circumvent current legislation, which is only enforceable through monitoring adverts to check for the use of explicitly discriminatory language. Available properties could then be advertised only to specific groups.

6. The use of big data by law enforcement bodies is another area that can leave people’s privacy at risk. This is highlighted by a piece of software called Social Media Monitor. The software allows its users to find information through “keyword, geographic or individual targeted searches”.1 The press release that accompanied the launch of the product also stated that it would allow law enforcement personnel to build up a “social canvas within minutes”. This would include the location of individuals, allowing for the potential of completely innocent individuals to be tracked and monitored in real time. What is more, if this product were to be used in the UK it is unclear how it would fit into the legal framework of the Regulation of Investigatory Powers Act 2000 (RIPA).

7. When personal data is used for research purposes, there is the potential for individuals to stray into unethical behaviour.
One example is given by a paper entitled Big Data for All: Privacy and User Control in the Age of Analytics, in which a psychology professor logged and analysed every phone call and message sent and received by a group of 175 teenagers.2 Although consent had been given, this could be considered unethical as it is doubtful that the subjects of the study could fully assess the implications for their privacy. This is of particular concern if the detail of “metadata”, also known as “communications data”, is not well understood by the participants.

8. It is also important to point out that there are different types of information that can be used for research and, as a result, there are different levels of concern that can be attached to them. Taking just one platform, Twitter, this difference can be easily seen. Whilst the tweets themselves are already in the public domain, the use of the data that surrounds them, such as geo-location information, can raise privacy concerns. This is also a difference that is not always understood by users or researchers.

9. Taking location data as an example, the dangers can be clearly shown. Earlier this year researchers at IBM developed an algorithm that could predict the location of people based on their tweets with almost 70% accuracy.3 More disturbing applications can be found in websites such as Sleeping Time, which has the tagline “Find the sleeping schedule of anyone on Twitter”. By analysing the times tweets were sent as well as the time zone of the tweeter, the website claims to be able to predict when and for how long Twitter users are likely to be sleeping.4

1 LexisNexis Launches New Social Media Investigative Solution for Law Enforcement: http://www.lexisnexis.com/risk/newsevents/press-release.aspx?id=1381851197735305
2 O. Tene and J. Polonetsky, Big Data for All: Privacy and User Control in the Age of Analytics, p. 18

10. It is of course worth noting that the use of data and personal information in activities such as law enforcement and research is not in itself a bad thing. It hinges upon the intentions of the researcher or analyst. If it is used in a proportionate manner and the proper safeguards are in place then it can be beneficial; however, if it is done in a way that would harm personal privacy it can cause concerns.

11. A poll conducted for Big Brother Watch in 2013 showed that 41% of consumers felt that they were being harmed by companies gathering their personal data. The research also showed that 79% of people surveyed were concerned about their privacy online.5 We have also been at the forefront of the debate around the previously mentioned care.data scheme and have worked to highlight the potential privacy implications of the project.

Anonymisation

12. Anonymising data is the process of taking personally identifiable information and making it into non-identifiable data sets. There are a variety of ways in which this process can take place. One of the most popular is “hashing”, which scrambles previously identifiable information into an entirely new and anonymous code.

13. There are some major concerns around the process of anonymisation. Chief amongst them is the argument that it is not necessarily permanent and can be easily reversed or circumvented. One potential way of re-identifying information is to cross-reference it with other available datasets.
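The cross-referencing risk can be illustrated with a minimal sketch. It assumes a hypothetical release in which each email address has been replaced by an unsalted SHA-256 hash, and a hypothetical third-party mailing list; the function names, field names and records are invented for illustration and do not describe any real dataset or organisation's practice. Anyone holding a list of candidate email addresses can hash each one in the same way and match the results against the "anonymous" codes.

```python
import hashlib


def hash_identifier(email: str) -> str:
    """Return the unsalted SHA-256 hex digest of a normalised email address.

    Illustrative scheme only: it stands in for the kind of "hashing"
    sometimes described as anonymisation.
    """
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical released dataset: the email has been replaced by its hash.
released = [
    {"id": hash_identifier("alice@example.com"), "area": "SW1", "condition": "asthma"},
    {"id": hash_identifier("bob@example.com"), "area": "M1", "condition": "diabetes"},
]

# Hypothetical second dataset held by someone else, e.g. a mailing list.
mailing_list = ["carol@example.com", "alice@example.com", "bob@example.com"]


def reidentify(released_rows, known_emails):
    """Cross-reference hashed ids against hashes of known email addresses."""
    # Hash every known email the same way the publisher did.
    lookup = {hash_identifier(email): email for email in known_emails}
    matches = []
    for row in released_rows:
        if row["id"] in lookup:
            # Copy the row, swapping the hash for the recovered email.
            record = {key: value for key, value in row.items() if key != "id"}
            record["email"] = lookup[row["id"]]
            matches.append(record)
    return matches


matches = reidentify(released, mailing_list)
for match in matches:
    print(match)
```

In this sketch both released records are linked back to a named email address, because the hash is deterministic and the set of plausible inputs is small enough to enumerate. Adding a secret salt or key would block this particular attack, but it would not prevent re-identification through other attributes left in the data.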
With an increasing amount of personal information being held in a widening number of places, the potential for this process to be used is becoming greater. In the paper Robust De-anonymization of Large Datasets, two researchers showed that knowing just a few details about an individual allowed them to re-identify that individual from a list of over 500,000.6

14. Our major concern is that in many cases the term ‘anonymous’ is being used to describe ‘de-identified’ data. The two are very different: whereas ‘anonymous’ data implies that nothing can be used to reconnect the information to an identity, ‘de-identified’ data leaves this option open, and too often these terms are used interchangeably. As the above example illustrates, personal information is always at risk from what is known as a ‘motivated re-identifier’, something that Big Brother Watch has previously warned about in evidence to the ICO. A motivated re-identifier is someone who has access to a ‘key’, for example another dataset, which can uncover the identities of the individuals on the de-identified list. This can be of great commercial value to companies, hence the motivation. Therefore the use of the term ‘anonymisation’ can be misleading, as it overstates the level of protection that the process affords.

3 J. Mahmud, J. Nichols, C. Drews, Home Location Identification of Twitter Users: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
4 Sleeping Time website: http://sleepingtime.org/
5 Big Brother Watch, New Research: Global Attitudes to Privacy Online: http://www.bigbrotherwatch.org.uk/home/2013/06/new-research-global-attitudes-to-privacy-online.html
6 A. Narayanan and V. Shmatikov, Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Dataset): http://arxiv.org/pdf/cs/0610105v2.pdf

What impact is the upcoming EU Data Protection Legislation likely to have on access to social media data for research?

15. Linked to this is the inclusion of Article 17, the right to be forgotten and to erasure. This gives participants in research projects the ability to request the deletion of their information if it no longer serves a purpose.7 Article 17, along with the substance of Article 12: Procedures and mechanisms for exercising the rights of the data subject, shifts the burden of evidence from the individual to the researcher. Now, instead of the individual having to justify why the data should be deleted, the researcher must justify why it should be kept.

16. There could also be implications specific to researchers in non-EU states, an area covered by Chapter 5: Transfer of Personal Data to Third Countries or International Organisations. The new legislation sets out new rules for the transfer of data outside of the EEA.8 In some cases this would mean that prior authorisation from the requisite national data protection bodies or the EU Commission would be needed before any information was sent abroad.9

17. Article 4, point 2(a) introduces the concept of pseudonymisation into legislation. The clause defines it as a process that doesn’t “permit direct identification” of a person but “allows the singling out of a natural person”.
The process involves replacing the personally identifiable information in a dataset with a pseudonym, often consisting of a code. The justification for this amendment is that it allows for lighter data protection obligations for data held in this way.10

Is UK legislation surrounding the collection and use of data fit for purpose?

18. The main piece of legislation in this area is the Data Protection Act 1998. This covers all personally identifiable information.

7 Chapter 2, Article 7, Section 3, General Data Protection Regulation: http://ec.europa.eu/justice/dataprotection/document/review2012/com_2012_11_en.pdf
8 Chapter 5, Articles 40-46, Proposal for a Regulation of the European Parliament and of the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation): http://ec.europa.eu/justice/dataprotection/document/review2012/com_2012_11_en.pdf
9 Chapter 5, Article 41, General Data Protection Regulation: http://ec.europa.eu/justice/dataprotection/document/review2012/com_2012_11_en.pdf
10 Amendment 85 of the Albrecht Report: http://www.europarl.europa.eu/meetdocs/2009_2014/documents/libe/pr/922/922387/922387en.pdf
19. The purpose of the Act is to ensure that any information collected is for legitimate purposes and that, when used, it does not adversely affect the individuals in question. It also reinforces the idea that any process should have transparent aims. Whilst these are all laudable principles, there are some issues with the legislation.

20. Prominent amongst these are the penalties and sanctions that can be handed down to those who break the law. Under the current legislation anyone who commits an offence under Section 55, which covers the unlawful obtaining and disclosure of personal information, will not face a custodial sentence. At most they will receive a fine. This is not a criminal offence and as such they will not receive a criminal record. This in itself is not an effective deterrent. We therefore believe, along with the Information Commissioner’s Office,11 the Joint Committee on the Draft Communications Data Bill,12 Lord Leveson,13 and the Justice14 and Home Affairs15 Select Committees, that custodial sentences should be introduced. The process of effecting these changes would be simple and would not require any further legislation. Under Section 77 of the Criminal Justice and Immigration Act 2008 the Home Secretary has the power to amend the Data Protection Act in order to impose a custodial sentence of up to 2 years for offences under Section 55.

21. When the current response to data protection cases is examined, the need for reform is clearly shown. There have been many instances where the Act has been broken and trivial fines have been used as a punishment. One example is that of a Barclays employee who committed 23 offences under the Data Protection Act and was fined £2,990; this works out at just £130 for each time the Act was broken. For the legislation to be effective, fines need to be increased and custodial sentences should be introduced for serious cases.

22. One further weakness is that the Act fails to address the problem of third party re-identification. This process involves two or more separate and anonymous datasets being merged to reveal the identity of individuals whose information is on the lists. The failure to legislate against this issue means that the UK is now one of only two countries in Europe where this practice is legal.

23. As has already been shown, the deficiencies in the Act are widely known and the case for strengthening it has received widespread support. Big Brother Watch has published a number of reports which highlight the issue, including Local Authority Data Loss in 2011 and Private Investigators in 2013.

11 Justice Committee, The functions, powers and resources of the Information Commissioner, Page 13, Paragraph 33: http://www.publications.parliament.uk/pa/cm201213/cmselect/cmjust/962/962.pdf
12 Joint Committee on the Draft Communications Data Bill, Final Report, Section 5, Paragraph 226: http://www.publications.parliament.uk/pa/jt201213/jtselect/jtdraftcomuni/79/7908.htm#a31
13 Rt. Hon. Lord Justice Leveson, An Inquiry into the Culture, Practices and Ethics of the Press, Vol. III, Part H, Chapter 5, Paragraph 2.93
14 BBC News, MPs call for tougher personal data abuse laws: http://www.bbc.co.uk/news/uk-politics15465349
15 Home Affairs Select Committee, Report on Private Investigators, p. 14: http://www.publications.parliament.uk/pa/cm201213/cmselect/cmhaff/100/100.pdf