The Proceedings of the International Conference on Cloud Security and Management - ICCSM 2013 by ACPIL

Proceedings of the International Conference on Cloud Security Management, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA, 17-18 October 2013

Edited by Dr Barbara Endicott-Popovsky, University of Washington, Seattle, USA. A conference managed by ACPI, UK www.academic-conferences.org

The Proceedings of the International Conference on Cloud Security Management ICCSM-2013 Hosted by the Center for Information Assurance and Cybersecurity University of Washington, Seattle, USA 17-18 October 2013 Edited by Dr. Barbara Endicott-Popovsky Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA

Copyright The Authors, 2013. All Rights Reserved. No reproduction, copy or transmission may be made without written permission from the individual authors. Papers have been double-blind peer reviewed before final submission to the conference. Initially, paper abstracts were read and selected by the conference panel for submission as possible papers for the conference. Many thanks to the reviewers who helped ensure the quality of the full papers. These Conference Proceedings have been submitted to Thomson ISI for indexing. Please note that the process of indexing can take up to a year to complete. Further copies of this book and previous year's proceedings can be purchased from http://academic-bookshop.com E-Book ISBN: 978-1-909507-69-2 E-Book ISSN: 2051-7947 Book version ISBN: 978-1-909507-67-8 Book Version ISSN: 2051-7920 CD Version ISBN: 978-1-909507-70-8 CD Version ISSN: 2051-7939 The Electronic version of the Proceedings is available to download at ISSUU.com. You will need to sign up to become an ISSUU user (no cost involved) and follow the link to http://issuu.com Published by Academic Conferences and Publishing International Limited Reading UK 44-118-972-4148 www.academic-publishing.org

Contents

Preface

Committee

Biographies

On Preserving Privacy Whilst Integrating Data in Connected Information Systems
Mortaza Bargh Shoae and Sunil Choenni

An Analysis of the Use of Amazon's Mechanical Turk for Survey Research in the Cloud
Marc Dupuis, Barbara Endicott-Popovsky and Robert Crossler

Extracting and Visualizing Relevant Data From Internet Traffic to Enhance Cyber Security
Teresa Escrig, Jordan Hanna, Shane Kwon, Andrew Sorensen, Don McLane and Sam Chung

Needed: A Strategic Approach to Cloud Records and Information Storage
Patricia Franks

Proxy Impersonation Safe Conditional Proxy Re-Encryption
Dheeraj Gandhi, Pandu Rangan, Sharmila Deva Selvi and Sree Vivek

Clearing the Air in Cloud Computing: Advancing the Development of a Global Legal Framework
Virginia Greiman and Barry Unger

Picture Steganography in the Cloud Era
Dan Ophir

Forensic Readiness for Cloud-Based Distributed Workflows
Carsten Rudolph, Nicolai Kuntze and Barbara Endicott-Popovsky

A Quantitative Threat Modeling Approach to Maximize the Return on Security Investment in Cloud Computing
Andreas Schilling and Brigitte Werners

PhD Research Papers

Security as a Service using Data Steganography in Cloud
Anitha Balaji Ramachandran, Pradeepan Paramjothi and Saswati Mukherjee

An Analytical Framework to Understand the Adoption of Cloud Computing: An Institutional Theory Perspective
Rania El-Gazzar, Fathul Wahid

A Privacy Preserving Profile Searching Protocol in Cloud Networks
Sangeetha Jose, Preetha Mathew and Pandu Rangan

On Providing User-Level Data Privacy in Cloud
Madhuri Revalla, Ajay Gupta and Vijay Bhuse

Cloud Security: A Review of Recent Threats and Solution Models
Betrand Ugorji, Nasser Abouzakhar and John Sapsford

Masters Research Papers

Malware Analysis on the Cloud: Increased Performance, Reliability, and Flexibility
Michael Schweiger, Sam Chung and Barbara Endicott-Popovsky

A Cloud Storage System based on Network Virtual Disk
Liang Zhang, Mingchu Li, Xinxin Fan, Wei Yang, Shuzhen Xu

Non Academic Paper

Disaster Recovery on Cloud - Security Compliance Challenges and Best Practices
Sreenivasan Rajendran, Karthik Sundar, Rajat Maini

Work in Progress Papers

Records in the Cloud (RiC): Profiles of Cloud Computing Users
Georgia Barlaoura, Joy Rowe and Weimei Pan

Records in the Cloud – A Metadata Framework for Cloud Service Providers
Dan Gillean, Valerie Léveillé and Corinne Rogers

Preface

These Proceedings are the work of researchers contributing to the inaugural International Conference on Cloud Security Management (ICCSM 2013), hosted this year by the Center for Information Assurance and Cybersecurity at the University of Washington, Seattle, USA in collaboration with the Cloud Security Alliance Seattle Chapter, on 17-18 October 2013. The conference chair is Dr. Barbara Endicott-Popovsky from the Center for Information Assurance and Cybersecurity and the Programme Co-Chairs are Dr. Yale Li, Cloud Security Alliance, Seattle, USA and Dr David Manz, Pacific Northwest National Laboratory (PNNL), USA.

As organizations rush to adopt Cloud Computing at a rate faster than originally projected, it is safe to predict that, over the coming years, Cloud Computing will have major impacts, not only on the way we conduct science and research, but also on the quality of our daily human lives. Computation research, education, and business communities have been exploring the potential benefits of Cloud Computing and the changes these imply. Experts have predicted that the move to the cloud will significantly alter the content of IT jobs, with cloud clients needing fewer hands-on skills and more skills in administering and managing information. Bill Gates was recently quoted: “How you gather, manage, and use information will determine whether you win or lose.” Cloud Computing impacts will be broad and pervasive, applying to public and private institutions alike.

Regardless of the rate of uptake, Cloud Computing has raised concerns. Although it has huge potential for changing computation methods for the better and provides tremendous research and commercial opportunities, it also creates great challenges to IT infrastructure, IT and computation management and policy, industry regulation, governance, the legal infrastructure and, of course, to information security and privacy. Wikileaks demonstrated the ease with which a massive set of confidential documents, collected and maintained in digital form, can be disseminated. The move to the cloud poses an even greater challenge, aggregating even more massive amounts of information and opening up even greater vulnerabilities, before we have even gained an understanding of the security implications.

ICCSM aims to bring together the academic research community with industry experts working in the field of Cloud Security to hear the latest ideas and discuss research and development in this important area. In addition to the papers in these proceedings being presented, there will be keynote addresses from Prof. Howard A. Schmidt, Ridge Schmidt Cyber; Mike Hamilton, CISO, City of Seattle; Scott Charney, CVP, Microsoft Corp; Jim Reavis, Cofounder and Executive Director, Cloud Security Alliance; and Dr. Luciana Duranti, School of Library, Archival and Information Studies, University of British Columbia (UBC), Canada.

From an initial submission of 50 abstracts, the double-blind peer review process yielded the 17 research papers published in these Conference Proceedings, including contributions from Canada, China, Germany, India, Israel, Norway, the UK and the USA. We wish you a most enjoyable conference.

Dr. Barbara Endicott-Popovsky, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA
Dr. Yale Li, Cloud Security Alliance, Seattle, USA
Dr David Manz, Pacific Northwest National Laboratory (PNNL), USA


Conference Executive Barbara Endicott-Popovsky, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA Dr. Yale Li, Cloud Security Alliance, Seattle, USA Marc Pinotti, Cloud Security Alliance, Seattle, USA

Mini Track Chairs

Barbara Endicott-Popovsky, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA Dr Volodymyr Lysenko, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA Dr Nasser Abouzakhar, School of Computer Science, University of Hertfordshire, UK

Committee Members

The conference programme committee consists of key people in the information assurance, information systems and cloud security communities around the world.

Dr Nasser Abouzakhar, University of Hertfordshire, UK; Dr William Acosta, University of Toledo, USA; Dr Todd Andel, University of South Alabama, USA; Darko Androcec, University of Zagreb, Croatia; Dr Olga Angelopoulou, University of Derby, UK; Mario Antunes, Polytechnic Institute of Leiria & CRACS (University of Porto), Portugal; Dr Alexander Bligh, Ariel University Center, Ariel, Israel; Professor Andrew Blyth, University of Glamorgan, UK; Colonel (ret) Dr Colin Brand, Graduate School of Business Leadership, Pretoria, South Africa; Dr Svet Braynov, University of Illinois at Springfield, USA; Professor Bill Buchanan, Napier University, UK; Professor David Chadwick, University of Kent, UK; Haseeb Chaudhary, Plymouth University and RBS, UK; Dr. Joobin Choobineh, Texas A&M University, USA; Professor Sam Chung, University of Washington, Tacoma, USA; Dr Nathan Clarke, University of Plymouth, UK; Dr. Ronen A. Cohen, Ariel University Center, Israel; Professor Manuel Eduardo Correia, DCC/FCUP Oporto University, Portugal; Dr. Paul Crocker, University of Beira Interior, Portugal; Geoffrey Darnton, Requirements Analytics, Bournemouth, UK; Evan Dembskey, UNISA, South Africa; Frank Doelitzscher, University of Applied Sciences Furtwangen, Germany; Patricio Domingues, Polytechnic Institute of Leiria, Portugal; Prokopios Drogkaris, University of Aegean, Greece; Daniel Eng, C-PISA/HTCIA, China; Dr Cris Ewell, Seattle Children's Hospital, USA; Dr John Fawcett, University of Cambridge, UK; Professor and Lieutenant-Colonel Eric Filiol, Ecole Supérieure en Informatique, Electronique et Automatique, Laval, France; Professor Steve Furnell, Plymouth University, UK; Dineshkumar Gandhi, Adroit Technologies, India; Dr Javier García Villalba, Universidad Complutense de Madrid, Spain; Dr Samiksha Godara, Shamsher Bahadur Saxena College of Law, India; Assistant Professor Virginia Greiman, Boston University, USA; Dr Michael Grimaila, Air Force Institute of Technology, USA; Professor Stefanos Gritzalis, University of the Aegean, Greece; Dr Marja Harmanmaa, University of Helsinki, Finland; Jeremy Hilton, Cranfield University/Defence Academy, UK; Professor Aki Huhtinen, National Defence College, Finland; Dr Berg Hyacinthe, Assas School of Law, Universite Paris II/CERSA-CNRS, France; Professor Pedro Inácio, University of Beira Interior, Portugal; Dr Abhaya Induruwa, Canterbury Christ Church University, UK; Ramkumar Jaganathan, VLB Janakiammal College of Arts and Science (affiliated to Bharathiar University), India; Professor Hamid Jahankhani, University of East London, UK; Dr Helge Janicke, De Montfort University, UK; Dr Kevin Jones, EADS Innovation Works, UK; Dr Nor Badrul Anuar Jumaat, University of Malaya, Malaysia; Assistant Professor Maria Karyda, University of the Aegean, Greece; Dr Vasilis Katos, Democritus University of Thrace, Greece; Jyri Kivimaa, Cooperative Cyber Defence Centre of Excellence, Tallinn, Estonia; Spyros Kokolakis, University of the Aegean, Greece; Dr Marko Kolakovic, Faculty of Economics & Business, Croatia; Ahmet Koltuksuz, Yasar University, Turkey; Theodoros Kostis, Hellenic Army Academy, Greece; Associate Professor Ren Kui, State University of New York at Buffalo, USA; Professor Rauno Kuusisto, Finnish Defence Force, Finland; Harjinder Singh Lallie, University of Warwick (WMG), United Kingdom; Dr Laouamer Lamri, Al Qassim University, Saudi Arabia; Juan Lopez Jr., Air Force Institute of Technology, USA; Dr Volodymyr Lysenko, Center for Information Assurance and Cybersecurity, University of Washington, Seattle, USA; Dr Bill Mahoney, University of Nebraska, Omaha, USA; Dr Hossein Malekinezhad, Islamic Azad University of Naragh, Iran; Professor Mario Marques Freire, University of Beira Interior, Covilhã, Portugal; Ioannis Mavridis, University of Macedonia, Greece; Dr John McCarthy, Cranfield University, UK; Rob McCusker, Teesside University, Middlesbrough, UK; Mohamed Reda Yaich, École nationale supérieure des mines, France; Dr Srinivas Mukkamala, New Mexico Tech, Socorro, USA; Dr. Deanne Otto, Air Force Institute of Technology Center for Cyberspace Research, USA; Tim Parsons, Selex Communications, UK; Dr Andrea Perego, European Commission - Joint Research Centre, Ispra, Italy; Dr Yogachandran Rahulamathavan, City University London, UK; Dr Ken Revett, British University in Egypt, Egypt; Dr. Keyun Ruan, University College Dublin, Ireland; Professor Henrique Santos, University of Minho, Portugal; Ramanamurthy Saripalli, Pragati Engineering College, India; Chaudhary Imran Sarwar, Creative Researcher, Lahore, Pakistan; Sameer Saxena, IAHS Academy, Mahindra Special Services Group, India; Professor Corey Schou, Idaho State University, USA; Professor Dr Richard Sethmann, University of Applied Sciences Bremen, Germany; Dr Armin Shams, University College Cork, Ireland; Dr Yilun Shang, University of Texas at San Antonio, USA; Dr Dan Shoemaker, University of Detroit Mercy, Detroit, USA; Paulo Simoes, University of Coimbra, Portugal; Professor Jill Slay, University of South Australia, Australia; Professor Michael Stiber, University of Washington Bothell, USA; Professor Iain Sutherland, University of Glamorgan, Wales, UK; Anna-Maria Talihärm, Tartu University, Estonia; Professor Sérgio Tenreiro de Magalhães, Universidade Católica Portuguesa, Portugal; Professor Dr Peter Trommler, Georg Simon Ohm University Nuremberg, Germany; Dr Shambhu Upadhyaya, University at Buffalo, USA; Renier van Heerden, CSIR, Pretoria, South Africa; Rudy Vansnick, Internet Society Belgium; Dr Stilianos Vidalis, Newport Business School, Newport, UK; Prof. Kumar Vijaya, High Court of Andhra Pradesh, India; Dr Natarajan Vijayarangan, Tata Consultancy Services Ltd, India; Nikos Vrakas, University of Piraeus, Greece; Professor Matt Warren, Deakin University, Australia; Dr Tim Watson, De Montfort University, UK; Dr Santoso Wibowo, Central Queensland University, Australia; Dr Omar Zakaria, National Defence University of Malaysia; Dr Zehai Zhou, University of Houston-Downtown, USA; Professor Andrea Zisman, City University London, UK

Biographies

Conference Chair

Dr Barbara Endicott-Popovsky holds the post of Director for the Center for Information Assurance and Cybersecurity at the University of Washington, an NSA/DHS Center for Academic Excellence in Information Assurance Education and Research. She is also Academic Director for the Masters in Infrastructure Planning and Management in the Urban Planning/School of Built Environments and holds an appointment as Research Associate Professor with the Information School. Her academic career follows a 20-year career in industry marked by executive and consulting positions in IT architecture and project management. Barbara earned her Ph.D. in Computer Science/Computer Security from the University of Idaho (2007), and holds a Masters of Science in Information Systems Engineering from Seattle Pacific University (1987) and a Masters in Business Administration from the University of Washington (1985).

Programme Co-Chairs

Dr Yale Li is one of the earliest cloud security experts (CCSK) in the world. He currently serves as Board Member & Research Director at CSA (Cloud Security Alliance) - Greater Seattle Area Chapter. At CSA global level, he has contributed to TCI (Trusted Cloud Initiative) Reference Architecture as a member architect and contributed to Security as a Service SIEM (Security Information and Event Management) as a co-lead. Yale is also Principal Security Architect at Microsoft, and is responsible for cloud security, data security, applications development security, and emerging markets security in the company. Additionally, Yale works closely with the Center of Information Assurance and Cyber Security at University of Washington as a graduate education partner. Most recently, he has joined the Board of G-Tech Institute as the founder of G-Tech/CSA Research Lab. During his 20+ year IT and security professional service, Yale has created strategy, designed architecture, developed solutions, and published books and papers across industry and applied research areas.

Dr David Manz has been a cyber security research scientist at PNNL since January 2010. He is currently a Senior Cyber Security Scientist in the National Security Directorate. He holds a B.S. in computer information science from the Robert D. Clark Honors College at the University of Oregon and a Ph.D. in computer science from the University of Idaho. David's work at PNNL includes enterprise resilience and cyber security, secure control system communication, and critical infrastructure security. Prior to his work at PNNL, David spent five years as a researcher on Group Key Management Protocols for the Center for Secure and Dependable Systems at the University of Idaho (U of I). David also has considerable experience teaching undergraduate and graduate computer science courses at U of I, and as an adjunct faculty at WSU. David has co-authored numerous papers and presentations on cyber security, control systems, and cryptographic key management.

Keynote Speakers

Howard Schmidt serves as a partner in the strategic advisory firm, Ridge Schmidt Cyber, an executive services firm that helps leaders in business and government navigate the increasing demands of cybersecurity. He serves in this position with Tom Ridge, the first secretary of the Department of Homeland Security. He also serves as executive director of The Software Assurance Forum for Excellence in Code (SAFECode). Howard A. Schmidt brings together talents in business, defense, intelligence, law enforcement, privacy, academia and international relations, gained from a distinguished career spanning 40 years. He served as Special Assistant to the President and the Cybersecurity Coordinator for the federal government. In this role Mr. Schmidt was responsible for coordinating interagency cybersecurity policy development and implementation and for coordinating engagement with federal, state, local, international, and private sector cybersecurity partners. Previously, Mr. Schmidt was the President and CEO of the Information Security Forum (ISF). Before ISF, he served as Vice President and Chief Information Security Officer and Chief Security Strategist for eBay Inc., and formerly operated as the Chief Security Officer for Microsoft Corp. He also served as Chief Security Strategist for the US-CERT Partners Program for the Department of Homeland Security. Howard also brings to bear over 26 years of military service. Beginning active duty with the Air Force, he later joined the Arizona Air National Guard. With the AF he served in a number of military and civilian roles culminating as Supervisory Special Agent with the Office of Special Investigations (AFOSI). He finished his last 12 years as an Army Reserve Special Agent with Criminal Investigation Division’s (CID) Computer Crime Unit, all while serving over a decade as police officer with the Chandler Police Department. Mr. Schmidt holds a bachelor’s degree in business administration (BSBA) and a master’s degree in organizational management (MAOM) from the University of Phoenix. He also holds an Honorary Doctorate degree in Humane Letters. Howard was an Adjunct Professor at GA Tech, GTISC, Professor of Research at Idaho State University and Adjunct Distinguished Fellow with Carnegie Mellon’s CyLab and a Distinguished Fellow of the Ponemon Privacy Institute.

Dr. Luciana Duranti is Chair of Archival Studies at the School of Library, Archival and Information Studies of the University of British Columbia (UBC), and a Professor of archival theory, diplomatics, and the management of digital records in both its master’s and doctoral archival programs. She is also Faculty Associate Member of the UBC College for Interdisciplinary Studies, Media and Graphics Interdisciplinary Centre, and Affiliate Full Professor at the University of Washington iSchool. Duranti is Director of the Centre for the International Study of Contemporary Records and Archives (CISCRA, www.ciscra.org) and of InterPARES (www.interpares.org), the largest and longest living publicly funded research project on the long-term preservation of authentic electronic records (1998-present), the Digital Records Forensics Project, and the Records in the Clouds Project. She is also co-Director of “The Law of Evidence in the Digital Environment” Project. Luciana is active nationally and internationally in several archival associations and in boards and committees, such as the UNESCO International Advisory Committee of the Memory of the World Program (for which she chaired the program of the 2012 Vancouver Conference); the Canadian and American National Standards Committee for records management, as well as the International Standards Organization committee on the same matters; and the China Research Centre on Electronic Records Board. For her university work Duranti was honoured with the Faculty Association's Academic of the Year Award in 1999. Her research was recognized in 2006 with the Emmett Leahy Award for her contributions to records management and the drafting of the American and European standards for recordkeeping (DoD 5015.2 and MoReq 1 & 2), and for the InterPARES research; with the British Columbia Innovation Council Award, which is annually presented to “an individual who has opened new frontiers to scientific research;” with the Killam Research Prize; and in 2007 with the Jacob Biely Research Prize, the University of British Columbia’s “premier research award.” In 2012 she was awarded the Inaugural ARMA (Association of Records Managers and Administrators) International “Award for Academic Excellence in teaching, research, and contribution to the global citizenry.”

Jim Reavis is Executive Director of the CSA. He was recently named as one of the Top 10 cloud computing leaders by SearchCloudComputing.com. Jim is the President of Reavis Consulting Group, LLC, where he advises security companies, large enterprises and other organizations on the implications of new trends and how to take advantage of them. Jim has previously been an international board member of the ISSA and formerly served as the association’s Executive Director. Jim was a cofounder of the Alliance for Enterprise Security Risk Management, a partnership between the ISSA, ISACA and ASIS, formed to address the enterprise risk issues associated with the convergence of logical and traditional security. Jim currently serves in an advisory capacity for many of the industry’s most successful companies.

Mini Track Chairs Dr Nasser Abouzakhar is a senior lecturer at the University of Hertfordshire, UK. Currently, his research area is mainly focused on cloud security and forensics, critical infrastructure protection and applying machine learning solutions to various Internet and Web security and forensics related problems. He received MSc (Eng) in Data Communications in 2000 followed by PhD in Computer Sci Engg in 2004 from the University of Sheffield, UK. Nasser worked as a lecturer at the University of Hull, UK in 2004-06 and a research associate at the University of Sheffield in 2006-08. He is a technical studio guest to various BBC World Service Programmes such as Arabic 4Tech show, Newshour programme and Breakfast radio programme. Nasser is a BCS (British Computer Society) assessor for the accreditation of Higher Education Institutions (HEIs) in the UK, BCS chartered IT professional (CITP), CEng and CSci. His research papers were published in various international journals and conferences.

Dr Volodymyr Lysenko is a graduate of the Ph.D. program in Information Science at the Information School of the University of Washington, Seattle. He also has a degree in Physics. Volodymyr’s research interests are in the area of political cyberprotests and cyberwars in the international context.


Biographies of Presenting Authors Otis Alexander is currently a student at the University of Washington, Tacoma. He is working towards a Master of Science degree in Computer Science and Systems. His research interests include application level intrusion detection systems for Industrial Control Systems (ICS) and artificial intelligence based solutions for cybersecurity. Mortaza S. Bargh (Ph.D.) was a scientific researcher at Telematica Instituut (Netherlands, 1999-2011). Currently Mortaza is a research professor in Rotterdam University of Applied Sciences. He works part-time as researcher at Research and Documentation Centre (Netherlands). His research topics include preserving privacy in Big Data infrastructures, collaborative security and distributed Intrusion Detection Systems. Georgia Barlaoura is a recent graduate of the Master of Archival Studies program at the University of British Columbia. She participates in the Records in the Cloud and InterPares Trust international research programs. The area of her research is the management of records in relation to risk management. Sam Chung, PhD, is an Endowed Professor of Information Systems and Information Security and an Associate Professor and founder of the Information Technology and Systems Program at the University of Washington, Tacoma. His research interests include Secure Service-Oriented Software Reengineering in a cloud computing environment, software assurance, and Attack-Aware Cyber Physical Systems. Charles Costarella earned his MSCSS from UW Tacoma June 2013. His research interests are honeynets, botnets, and virtualization. He has worked as a software developer at the USAF Flight Test Center at Edwards AFB. He has taught at Antelope Valley College in California, Highline College in Des Moines, WA, and teaches ITS at UW Tacoma. Rania Fahim El-Gazzar is a PhD Research Fellow in the Department of Information Systems at University of Agder. 
She holds a Bachelor degree in Business Administration, major MIS and a Master degree in Information Systems from the Arab Academy for Science and Technology and Maritime Transport. Her current research is in the area of cloud computing. Teresa Escrig is Visiting Associate Professor of the Institute of Technology at the University of Washington, Tacoma and Associate Professor at U. Jaume I (UJI), Spain. Dr Escrig founded the Cognition for Robotics Research group at UJI in 2002. Her research interests include Qualitative Representation and Reasoning and its application to Cybersecurity. Patricia Franks, PhD, CRM, coordinates the Master of Archives and Records Administration degree program at San José State University. She is a member of the SSHRC-funded InterPARES Trust interdisciplinary research team exploring the topic of Trust in the digital environment. Her most recent publication, Records and Information Management, was published by ALA/Neal-Schuman in 2013. Virginia Greiman is Assistant Professor at Boston University in international law, cyberlaw and regulation and megaproject management and is an affiliated faculty member at the Harvard Kennedy School in cyber trafficking. She has held high level appointments with the U.S. Department of Justice and served as International legal counsel with the U.S. Department of State and USAID. Sangeetha Jose is a PhD scholar in the Indian Institute of Technology (IIT) Madras, Chennai, India. She is working under the guidance of Prof. C. Pandu Rangan. Her research interests are in provable security, focusing mainly on the design and analysis of public key encryption and digital signatures, and the security of cloud computing. Shane Kwon is a graduate student in the Master of Science in Computer Science and Systems program at the University of Washington, Tacoma. He has been researching graph databases in the context of cybersecurity in order to enhance analysis performance.
His research interests include graph databases, graph-based network traffic analysis, and in-memory computing. Nicolai Kuntze received his Diploma of computer science from the Technical University of Darmstadt in 2005. Since then he has been employed at the Fraunhofer SIT as a researcher in IT security. The focus of his research activities lies within the areas of mobile and embedded systems, applied trusted computing, and forensics and digital evidence.

Valerie Léveillé is currently pursuing a Dual Masters degree in Archival Studies and Library and Information Studies at the School of Library, Archival and Information Studies (SLAIS) at the University of British Columbia, Canada. She is involved as a Graduate Research Assistant on both the Records in the Cloud project (RiC) and the InterPARES Trust project. Rajat Maini works with Infosys as a Senior Consultant. He is a PMP Certified professional and has over 8 years of IT Industry experience. He holds a Bachelor of Engineering in Electronics and Communication. His strengths include process analysis, driving service optimization and continual service improvement initiatives. Currently he is focused on disaster recovery and process improvement engagements. Dan Ophir has a PhD from the Weizmann Institute of Science, Israel in Computer Science and Mathematics. He is Senior Lecturer and Researcher in the Ariel University and in Afeka Academic College for Engineering, Israel. Dan is advisor and co-developer in Hightech Software and Defense Industries, author of scientific articles, and has participated in tens of algorithmic-mathematical scientific conferences. Anita Balaji Ramachandran has a BE degree in Electronics and Communication from Bharathidasan University and ME degree in computer science from Anna University. Currently, she is doing research in the area of cloud computing under the guidance of Dr Saswati Mukherjee. Her research interests include cryptography and information security in cloud computing and in Big Data scenarios. Madhuri Revalla is currently pursuing a PhD in Computer Science at Western Michigan University, Kalamazoo, MI, USA. Her doctoral dissertation is on security in cloud computing. Before coming to the USA, she worked for 5 years in Hyderabad, India. Currently she is working as a programmer intern at Weidenhammer, Kalamazoo.
Corinne Rogers is a doctoral candidate and sessional instructor at SLAIS, University of British Columbia, Vancouver, Canada, where she teaches digital diplomatics and digital records forensics. Her doctoral research investigates concepts of authenticity of digital records, documents, and data, and the assessment of authenticity of digital documentary evidence in the legal system. Carsten Rudolph received his PhD at Queensland University of Technology, Brisbane in 2001. He is now head of the research department Trust and Compliance at the Fraunhofer Institute for Secure Information Technology. His research concentrates on information security, formal methods, security engineering, cryptographic protocols, and security of digital evidence. Andreas Schilling received his Bachelor’s degree and his Master’s degree in Applied Computer Science from Ruhr University Bochum, Germany, in 2010 and 2012, respectively. He is currently pursuing a PhD focused on the problem of measuring and improving security risks in cloud computing, with special attention to quantitative methods and mathematical optimization. Michael Schweiger graduated from the University of Washington Tacoma with a MS in Computer Science in June 2013 and is pursuing a career in cyber security. His long term interests are centered on penetration testing/hacking and malware analysis. He expects to continue research in these areas in the future. Andrew Sorensen is an undergraduate student in the Computer Science and Systems program at the University of Washington, Tacoma. Andrew has experience with web and Linux security issues, and takes a leadership role in the computer security student organizations at the University of Washington. His research interests consist of security through isolation, and Linux userspace security controls. Rajendran Sreenivasan is an Associate Consultant with Infrastructure Transformation Services at Infosys, India. He is an IBM Certified professional focusing on Disaster Recovery.
He holds a Masters Business Administration in Systems and Bachelor of Engineering in Electrical and Electronics Engineering. His current focus is on Data Centre consolidation and optimization along with Disaster recovery planning. Karthik Sundar works with Infosys as a Senior Consultant and is a core member of the Data Center transformation team. He has expertise in the fields of Virtualization, Data Center Automation, Green IT and private cloud efficiency. His focus includes IT Infrastructure architecture, including application, database, network and storage, disaster recovery and business continuity planning. Jeff Thorne holds the position of Sr Sales Engineer at Zscaler the leader is cloud based security solutions. Prior to joining Zscaler Jeff was a member of the security engineering teams at Trend Micro and VMware. Jeff has also held various engineering ix

roles at exciting startups such as Ooyala, Third Brigade, and Entrust based out of Silicon Valley California. Jeff is currently based out of Redmond, WA and has attended Trent University and the University of Washington. Betrand Ugorji studied MSc. in Secure Computing Systems and is currently a Database Development Engineer and PHD student researching on IDS in cloud based environment at the University of Hertfordshire UK.

On Preserving Privacy Whilst Integrating Data in Connected Information Systems
Mortaza Bargh1 and Sunil Choenni2
1 Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
2 Research & Documentation Centre (WODC), Ministry of Security and Justice, The Hague, The Netherlands
m.shoae.bargh@hr.nl
r.choenni@hr.nl

Abstract: Currently information systems collect, process and disseminate a huge amount of data that usually contains privacy sensitive information. With the advent of cloud computing and big data paradigms we witness that data often propagates from its origin (i.e., data subjects) and goes through a number of data processing units. Along its path the data is quite likely processed and integrated with other information about the subject in every data processing unit. One can foresee that at some point in the chain of data processors a data processor may infer more privacy sensitive information about the data subject than the data subject agrees with. This can lead to privacy breaches if inadequate privacy preserving measures are in place. In this contribution we first elaborate on the necessity of a so-called forward policy propagation mechanism in a chain of data processors. This mechanism allows propagating the privacy preferences of individuals along with their data items. These data items, if appropriate measures are not taken, can potentially be merged with other information about the subject in a way that privacy sensitive information about the user is derived. Knowing the privacy preferences, a data processing unit can ensure the enforcement of user privacy preferences on the processed data by, for example, re-anonymising it according to the privacy preferences. We further argue that, on the grounds of on-going initiatives and trends, one also needs to inform the upstream data processors whose information items are used to infer more privacy sensitive information about the user than is allowed. This will enable the upstream data processors to revisit their data sharing policies in light of the privacy breaches that have occurred. Therefore we further propose a so-called backward event propagation mechanism to report data breach events towards upstream data processors. We also briefly touch upon the legal implications and legislative trends/gaps in this regard and the impacts of such a feedback mechanism on Open Data initiatives.
Keywords: data fusion, data integration, information systems, privacy-by-design, privacy policies

1. Introduction
Personal and private data such as names, addresses, birthdates, geo-locations, photos and opinions are scattered throughout the Internet. In the age of Big Data there is a growing number of information systems (such as those behind Online Social Networks (OSNs), e-government portals and Web-shops) that collect, process and store personal and private information about users and citizens. These information systems are bound by the law or business/moral obligations to protect the privacy rights of individuals. Unfortunately information systems are fairly vulnerable to information leakage, as we have witnessed many privacy breaches in recent years (Bargh et al. 2012).
Further, there is a trend of sharing information among information systems in both public and private domains. For example, consider information sharing cases that occur within merging organisations, for business intelligence, or as part of Open Data initiatives. Sharing information increases the exposure likelihood of privacy sensitive data about a user due to the use of multiple data sources and advanced data mining techniques.
Individuals, businesses and society endure enormous costs due to privacy breaches. Individuals can experience financial losses, emotional embarrassment, employment or business opportunity losses, and increased health and life insurance fees. Organizations and businesses can face direct, indirect and implicit costs. Direct costs include legislative fines, shareholder lawsuits, third party and customer compensations, profit losses, and legal defence costs. Indirect costs include the costs associated with upgrading and maintaining system safeguards. Implicit costs encompass the costs associated with reputation and branding damages, losses of goodwill, and declined turnover and customer loyalty. At large, privacy breaches impact society as a result of their negative impact on the collective trust of people in online services.
Design of privacy preserving systems is a tedious and complex problem (Choenni et al. 2010; Choenni and Leertouwer 2010) because the designer needs to simultaneously consider three contending driving forces: the preferences/wishes of users; the limitations/constraints of technologies, laws and regulations; and the malevolence of adversaries. User privacy preferences, in turn, depend not only on context (e.g., the location, time and situation of users) but also on personality (i.e., the differences among users). Cavoukian (2009) has recently proposed the Privacy by Design (PbD) guiding principles that aim at nurturing a privacy conscious mindset in information system design. Jonas operationalizes Cavoukian’s PbD principles in a Big Data analytic platform by defining a number of privacy preservation features in a sense-making way (Cavoukian and Jonas 2012). Such a sense-making approach/viewpoint implies that with every observation something should be learned and the privacy preservation features are not rigid. The latter implies that Jonas’ features are inspirations to be improved upon in other practices (Cavoukian and Jonas 2012).
Emerging information systems and their integration result in complex and dynamic systems that, in turn, may create privacy breaches as a by-product. For example, integrated information systems may inadvertently enable re-identification of individuals over large data sets. In this contribution we aim at preserving privacy when integrating information systems in Big Data settings. We introduce two new conceptual features called forward policy propagation and backward event propagation. We argue that these features should necessarily be embedded into the information systems that share information with each other. Although our contributions are formulated within the Big Data context, they are platform agnostic and are directly applicable to, for example, cloud computing settings. Our presentation uses data processors as the system building blocks that deliver data intelligence services. These services are steadily migrated to and provided via cloud platforms.
As such, the cloud counterparts of data processors inherit similar privacy issues and benefits to those discussed in this contribution, particularly in the Software as a Service model of cloud computing (Takabi et al. 2010).
Our contributions are mainly based on our own experience gained from integrating information systems. Our proposed features can be considered as extensions of Jonas’ features, which are, in turn, derived from applying the sense-making approach to existing implementations. We present the rationale behind our features and elaborate upon their implications for Open Data initiatives.
In the following we outline the background information and guiding principles of our work (Section 2). Subsequently we present a couple of scenarios with cascaded data integration processes that result in re-identification of anonymised data (Section 3). We then discuss two possible solutions regarded as extensions of Jonas’ suite of privacy-enhancing features (Section 4). At the end we draw our conclusions (Section 5).

2. Background
2.1 A privacy model
Protecting privacy is concerned with personal data. According to the Data Protection Act (IOC 2013a), personal data is data that relates to a living individual who can be identified directly from the personal data or from the combination of the personal data with any other information. Identification of an individual occurs when one can distinguish the person within a group. Simply not knowing someone’s name does not mean that the person is unidentifiable (IOC 2013b). Those personal data items that could be used in a discriminatory way, such as data about someone’s ethnic origin, religious beliefs, political opinions, mental or physical health, sexual life, etc., are considered sensitive personal data that should be treated with more care than other personal data.
In legal terms there are three entities related to personal data, which we use to explain our viewpoint. These entities are the data subject, the data controller, and the data processor, whose relations are shown schematically in Figure 1. The data subject is the one whom the personal data is about. The data controller is the person (recognised in law as an individual, organisation, corporate body, etc.) who determines the purpose and the way of processing the personal data. The data processor is any person who processes the personal data on behalf of the data controller. Processing functionality includes obtaining, recording, holding, and performing operations (such as adaptation, retrieval, disclosure by transmission or dissemination, alignment, combination) on the data. Often it is difficult to determine who the data controller or the data processor is due to the complexity of (business) relationships (IOC 2013c). Therefore, for the purpose of this paper, we show them as one entity as illustrated in Figure 1.


Figure 1: A data privacy model with three entities of data subject, controller and processor.
The definition of personal data according to the Data Protection Act embeds an interesting property that is particularly noticeable on the verge of the information system integration era. Personal data also includes data that can be related to an individual when it is combined with other information. Therefore the Data Protection Act considers the data items that can potentially be related to individuals as personal. The question that we would like to focus on is how we can protect the privacy of such pieces of data, being potentially related to individuals, in integrating information systems where we do not know which other information is going to be fused with this data in the future.
Data anonymisation can be used to remove the relation of data to a specific user. In other words, data anonymisation removes the unique distinguishability of an individual (i.e., the user identification capability) from the data under consideration. Differential privacy (Dwork 2009) provides a metric to measure how well/uniquely a data subject is distinguishable from the data by comparing the statistics of identification capability before and after data anonymisation.
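For reference, the standard formal statement of differential privacy, which this paper does not spell out, can be written as below. The symbols are notation introduced here for illustration only: M is a randomised mechanism, D and D' are two data sets that differ in the record of a single individual, S is any set of possible outputs, and ε is the privacy parameter.

```latex
\Pr\big[\, M(D) \in S \,\big] \;\le\; e^{\varepsilon} \cdot \Pr\big[\, M(D') \in S \,\big]
```

A smaller ε means that the output distribution changes less when one individual's record is added or removed, so that individual is harder to distinguish from the released data.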

2.2 Privacy and context
Privacy is subjective and context-dependent. Nissenbaum (2004) considers “contextual integrity” as the benchmark of privacy. According to this contextual integrity view, privacy is infringed when one or more “information norms” are violated in a given situation. These information norms are of two types: appropriateness, which governs what information about persons is appropriate to reveal in a context, and flow or distribution, which governs how far information about persons is transferred in a context. The context here is a sphere in which the information is shared (e.g., location, politics, convention, cultural expectation) and it captures the whole environment including the audience. In other words, privacy is about knowing how far to share information in an appropriate way. For example, individuals share their personal information in social networks because they want to gain friendship, support, recognition, knowledge, etc. The privacy problem arises, however, when this information is shared with or used by information systems outside that context (Boyd 2010).
In policy/legal domains privacy is about someone being in control of his/her own information. The subjectivity and context-dependency of privacy imply that privacy is more than just being in control of one’s own information (Boyd 2010; Kalidien 2010). As such we believe that privacy laws, like the Data Protection Act, capture in a generic sense the baseline privacy preferences of users. By using the term baseline we acknowledge that some individuals may prefer more restrictive privacy rules to be imposed upon their personal information than what is required in privacy protection laws.

2.3 Privacy preservation
In this age of Big Data, legislators, system designers and data protection activists are deeply concerned with preserving privacy. Cavoukian (2009) has recently proposed 7 guiding principles of the Privacy-by-Design (PbD) concept in order to enable individuals to gain “personal control over one’s information” and to enable organizations to gain “a sustainable competitive edge”. Although comprehensive, Cavoukian’s PbD principles are basically high-level guidelines and one should still figure out how to realize these principles in a system design and engineering practice (Gürses et al. 2011). As a step forward in translating privacy concerns into operating artefacts (i.e., whatever engineers speak about), Jonas has created a Big Data analytic platform. The platform embeds a number of system features to “enhance, rather than erode, the privacy and civil liberties of data subjects” (Cavoukian and Jonas 2012). These hardwired PbD features are:

Full attribution: The recipients of an observation (record), produced by the system engine, are able to trace every contributing data point back to its source.

Data tethering: Adds, changes and deletes of records must appear immediately across the information-sharing ecosystem in real time.

Analytics on anonymised data: Due to the ability to perform advanced analytics, organizations can anonymise more data before sharing it.

Tamper resistant audit logs: Users’ searches should be logged in a tamper-resistant manner to enable data access audits.

False negative favouring methods: Often missing a few things is better than making false claims affecting someone’s civil liberties.

Self-correcting false positives: Whenever a new data point is presented, the validity of the prior assertions that depend on this data point must be re-evaluated/repaired in real time.

Information transfer accounting: Every transfer of data to a tertiary system must be recorded to let the stakeholders know how their data flows.

Jonas’ features provide a number of inspiring guidelines to embed privacy protecting capabilities in design of Big Data processing systems. In this contribution, based on our experience in designing such privacy preserving systems, we outline two additional features that extend Jonas’ features.
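To make one of the listed features more concrete, the following is a minimal sketch of how a tamper-evident audit log could be built with a hash chain. This is a commonly used technique and an assumption on our part; the source does not state how Jonas' platform implements its audit logs. The function names `append_entry` and `verify`, the record fields and the example queries are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash preceding the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, user, query):
    """Append a search record whose hash also covers the previous entry's hash."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"user": user, "query": query, "previous_hash": previous_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True if the chain of hashes is intact from the first entry on."""
    previous_hash = GENESIS_HASH
    for entry in log:
        body = {key: entry[key] for key in ("user", "query", "previous_hash")}
        if entry["previous_hash"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "analyst-1", "records for pseudo-ID A17")
append_entry(audit_log, "analyst-2", "records for pseudo-ID B42")
intact_before = verify(audit_log)

audit_log[0]["query"] = "something else"  # simulate tampering with an old entry
intact_after = verify(audit_log)
print(intact_before, intact_after)  # True False
```

Such a chain only makes modification of earlier entries detectable; it does not by itself detect truncation of the most recent entries, which would require anchoring the latest hash with an independent party.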

3. Data integration
Integration of information systems has become a trend in the current era in various public, private and semi-public sectors/domains. For example, van den Braak et al. (2012) and Choenni and Leertouwer (2010) describe an initiative in the public sector to interconnect the databases of the Dutch Ministry of Security and Justice. Similarly, with the integration of various Google services (like Gmail, Google+ and Google Drive) and the acquisition of Instagram, various databases have been integrated within Google and Facebook, respectively. Open Data initiatives, on the other hand, aim at releasing public sector data to citizens as a measure of government transparency. Such initiatives enable semi-public data integration cases, where the released public data gets integrated with other data sources to deliver added value services. In this section we elaborate upon the issues that may arise when cascading data controllers/processors in information systems integration settings.

3.1 A generic case
Figure 2 below illustrates a generic information system that consists of a data subject and a couple of data controllers and data processors. In this generic model we assume, without loss of generality, that the data processor and controller are collocated. Such a data controller/processor, denoted by Dcp, may feed its data to a subsequent data controller/processor in such a setting. In Figure 2 there are two distinct data controllers/processors that collect the (sensitive) personal data of data subject Ds, who has a unique identifier ID. Subsequently, these data controllers/processors, denoted by Dcp1 and Dcp2, produce and store anonymised Information Items (II), denoted by II1 and II2, with pseudo identifiers ID1 and ID2, respectively, about the user. Without loss of generality, we assume that data controllers/processors Dcp1 and Dcp2 apply data anonymisation according to a privacy policy that captures the user’s privacy preferences.


Figure 2: A generic information integration scenario.
Now let us assume that data controller/processor Dcp3 obtains the anonymised items II1 and II2 from data controllers/processors Dcp1 and Dcp2. Subsequently Dcp3 derives information item II3 as shown in Figure 2. According to a use-case studied in (Choenni, van Dijk and Leeuw 2010), although only anonymised information items about the user are fed to Dcp3, it is still possible for Dcp3 to link such anonymised information items and infer that II1 and II2 refer to the same object (i.e., the same user) with a high probability of about 95%. Here Dcp3 does not necessarily know the unique ID of the user; however, it infers that ID1 ≈ ID2.
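The linking step performed by Dcp3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that both anonymised releases still carry the same quasi-identifying attributes and that linking is done by exact matching; the cited use-case may well rely on probabilistic matching instead. The attribute names, the function `link_records` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical quasi-identifying attributes present in both releases.
QUASI_IDENTIFIERS = ("zip_code", "birthdate", "gender")


def link_records(items_1, items_2, keys=QUASI_IDENTIFIERS):
    """Return (pseudo_id_1, pseudo_id_2) pairs whose quasi-identifiers match one-to-one."""
    index = defaultdict(list)
    for record in items_2:
        index[tuple(record[k] for k in keys)].append(record["pseudo_id"])

    counts_1 = Counter(tuple(record[k] for k in keys) for record in items_1)

    links = []
    for record in items_1:
        signature = tuple(record[k] for k in keys)
        candidates = index.get(signature, [])
        # Only an unambiguous match supports the inference ID1 ≈ ID2.
        if counts_1[signature] == 1 and len(candidates) == 1:
            links.append((record["pseudo_id"], candidates[0]))
    return links


# II1 as released by Dcp1 and II2 as released by Dcp2 (made-up sample data).
ii1 = [
    {"pseudo_id": "A17", "zip_code": "98105", "birthdate": "1980-02-03", "gender": "F", "diagnosis": "asthma"},
    {"pseudo_id": "A18", "zip_code": "98105", "birthdate": "1975-07-21", "gender": "M", "diagnosis": "none"},
]
ii2 = [
    {"pseudo_id": "B42", "zip_code": "98105", "birthdate": "1980-02-03", "gender": "F", "income_band": "high"},
    {"pseudo_id": "B43", "zip_code": "98052", "birthdate": "1975-07-21", "gender": "M", "income_band": "low"},
]

linked = link_records(ii1, ii2)
print(linked)  # [('A17', 'B42')]
```

In this sketch Dcp3 never sees the real identifier of the data subject, yet it learns that the records behind pseudo identifiers A17 and B42 describe the same person and can therefore combine their attributes.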

3.2 Implications
When merging information from two different sources a data controller/processor may associate two information items with the same data object. The union of the two information items may reveal which data subject they belong to. This is called re-identification of the data subject, which does not necessarily mean that we know the ID (e.g., the name) of the user. In legal terms a data subject is identifiable when (s)he can be uniquely distinguished within the group (s)he belongs to (IOC 2013a). For example, if controller/processor Dcp3 (accidentally) obtains the Zip code, birthdate and gender from two controllers/processors Dcp1 and Dcp2, it is possible to infer to whom the information items belong with a probability of 63% in the USA (Golle 2006).
It is inevitable that at some point a data controller/processor down in the chain of data controllers/processors infers more privacy sensitive information about a data subject than is allowed by law (as a baseline) or by user privacy preferences. It can even become possible for a data controller/processor to de-anonymise the user as mentioned above. This is a violation of user privacy and forbidden by law. Knowing the privacy preferences, a data controller/processor can try to re-anonymise the inferred data according to these preferences. Often within public domains the privacy policies are well-defined by legislation and the trusted data controllers/processors apply data (re-)anonymisation methods appropriately, as required by laws and legislation (see for example (Choenni et al. 2010; Choenni and Leertouwer 2010; van den Braak et al. 2012)).
Let us assume a scenario in which the data processor Dcp3 also obtains the un-anonymised information item II4 from, for example, a social network provider denoted by Dcp4, see Figure 3. Knowing information items obtained from different data controllers/processors, it is possible for controller/processor Dcp3 to identify the user with reasonable certainty.
For example, suppose controller/processor Dcp3 knows the Zip code, birthdate and gender of the data subject by merging data items from the two controllers/processors Dcp1 and Dcp2. If Dcp3 can also obtain the conventional ID (i.e., the name) of the data subject by matching the Zip code (e.g., from the area where (s)he lives), birthdate and gender that the user published in a social network Dcp4, then Dcp3 can also infer that ID1 ≈ ID2 ≈ ID with a rather high probability of 63% in the USA.
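The kind of uniqueness figure quoted above can be illustrated with a small computation: the fraction of people in a data set whose combination of Zip code, birthdate and gender occurs exactly once. The sketch below is an illustration on made-up records, not a reproduction of the method of Golle (2006); the function name `unique_fraction` and the attribute names are hypothetical.

```python
from collections import Counter


def unique_fraction(population, keys=("zip_code", "birthdate", "gender")):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    if not population:
        return 0.0
    signatures = [tuple(person[k] for k in keys) for person in population]
    counts = Counter(signatures)
    unique = sum(1 for signature in signatures if counts[signature] == 1)
    return unique / len(population)


# Made-up sample: the first two people share all three attributes.
population = [
    {"zip_code": "98105", "birthdate": "1980-02-03", "gender": "F"},
    {"zip_code": "98105", "birthdate": "1980-02-03", "gender": "F"},
    {"zip_code": "98105", "birthdate": "1975-07-21", "gender": "M"},
    {"zip_code": "98052", "birthdate": "1990-11-30", "gender": "F"},
]

fraction = unique_fraction(population)
print(fraction)  # 0.5
```

Every record counted as unique here can be tied to a name as soon as a second source, such as the social network Dcp4 in Figure 3, publishes the same three attributes together with that name.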


Figure 3: A generic scenario for information integration among private/public organisations.
As an example, consider the website of (Rechtspraak 2013a) that publishes various information items about, among others, court verdicts within the Dutch Judiciary and the Supreme Court. The justification for publishing such information is the interest of having an open and transparent justice system. Publishing this information, however, should not mean that the website may store and provide access to personal data in a structured way (Rechtspraak 2013b). On this website therefore, for privacy reasons, the names, birthplaces and exact birthdates of suspects, witnesses and victims are removed or obfuscated. Other information about the investigation, prosecution and court procedures is openly communicated and published.
Given some keywords provided on the site about a specific homicide/murder case, we were able to discover some of the removed information items about the case by using a simple Google search. For example, it was possible to retrieve the full name of the victim from online news agencies. It was also possible to derive the first names, the initials of the surnames and the birthplaces of the offenders/suspects from the news items published by these agencies.
The result of this exercise does not mean that we have discovered a critical flaw/illegality in the way that the website protects the privacy of the parties involved. As also mentioned in (Rechtspraak 2013b), the website does not intend to provide a bulletproof data anonymisation process whereby the identities never get detected after information publication. We conclude, however, that similar privacy infringements are possible in those systems that open their data to the public. The gain in organizational transparency, as the main objective of publishing data, can in our view diminish if not enough care is given to the amount of privacy infringement in post data-publication phases.

4. Discussions
Privacy preferences, for example in the form of privacy policies, must naturally be known and complied with by all data controllers/processors. The first hop processors (e.g., Dcp1 and Dcp2 in our scenarios) might be readily aware of such user preferences/policies because they are directly connected to the data subjects. There should be, however, a mechanism that enables the other data controllers/processors down in the information processing chain to know individuals’ privacy preferences/policies so that they can abide by these preferences/policies. A question that arises here is:
How can we inform data processors about user privacy preferences/policies, especially those processors that reside on nodes farther than the first node (e.g., Dcp3 in our examples above)?
Note that, as explained before, these data controllers/processors might derive more detailed and privacy sensitive information about a data subject due to using modern data analytics (e.g., Dcp3 knowing both II1 and II2 about the data subject). Knowing the privacy policy, a data processor can try to take appropriate actions, for example, to anonymise the inferred information according to the privacy preferences/policy. The effectiveness of the proposed solution depends on the level of commitment or compliance of downstream data controllers/processors in enforcing the privacy preferences/policies of data subjects. Moreover, laws and regulations need to be in place to require and govern such privacy compliance. Note that the existence and effectiveness of these laws and regulations are out of our scope in this paper.
In addition to anonymisation of the newly inferred data according to the privacy preferences/policy, we believe that the data controllers/processors should also inform the upstream data controllers/processors about their information items being used to infer more privacy sensitive information about the user than is allowed in the privacy preferences/policy. This awareness can enable the upstream data controllers/processors to revisit their anonymisation process in light of the observed possibilities for privacy breaches. This requirement ensures that data owners are in control of their own data. Therefore we also need to address the following question from the viewpoint of designing appropriate information systems:
How can we inform upstream data processors when their information items are used to infer privacy sensitive information?
The effectiveness of the proposed solution depends on the level of commitment or compliance of downstream data controllers/processors in reporting privacy infringement incidents to upstream data controllers/processors. We suspect that, although being out of our scope in this paper, there might exist a legal requirement for reporting such privacy breaches. Evidence for this suspicion is a Dutch law proposal that requires data controllers to report harmful data breaches to the data subjects (News-item 2013).
Regardless of having or not having a legal basis to report such incidents to upstream data controllers/processors, we believe that there is a (moral and logical) necessity to do so if we want to win public trust in organisations that control, share and process data and participate in integration of information systems. Note that in public sectors, like government institutes/organisations, all data controllers/processors are committed to preserving the baseline privacy of users as defined by law (i.e., they can be considered as Trusted Third Parties). Such a commitment, however, might not be in place and might not even work in more open environments with multiple data controller/processor hops. Moreover, in Open Data settings, informing upstream data controllers/processors about the privacy breaches down in the information processing chain may demotivate such organisations from making their data public and from sharing their data publicly. Studying such negative impacts of enabling user awareness is left for future studies.
In order to address the issues raised in the first question we propose to propagate user privacy preferences, for example in the form of privacy policies, among downstream data processing units. This forward propagation of privacy preferences or policies can be considered as an extension to Jonas’ “full attribution feature”. The technical realisation of the forward policy propagation can be similar to that of the data dissemination policies in traditional information systems. In such systems one may attach attributes and the so-called sticky policies (Chadwick and Lievens 2008) to data objects when disseminating them from data producers to consumers (Krishnan et al. 2009).
This mode of information sharing originates from the 1980s (used in originator-control systems) and the 1990s/2000s (used for Digital Rights Management), for which two policy languages, eXtensible rights Markup Language (XrML) and Open Digital Rights Language Initiative (ODRL), have more recently been introduced (Krishnan et al. 2009).
In order to address the issues raised in the second question we propose to propagate user privacy preference/policy violating incidents in the form of a report, alert, event, etc. among the upstream data controllers/processors. This backward propagation of events of privacy breaches can be considered as an extension to the “information transfer accounting feature” of Jonas’ PbD features. For a technical realisation of the proposed backward propagation of privacy breach events we regard the “full attribution” feature of Jonas as a prerequisite, since it makes it possible to trace every contributing data point back to its source. Moreover, for an effective implementation of the proposed feature one should closely follow the lessons learnt from various initiatives (News-item 2013, Dennett 2013) that aim at notifying individuals whose privacy is seriously breached. A challenge in notifying individuals in case of privacy breaches is to determine when a notification is appropriate, especially in low risk breaches (Dennett 2013). Moreover, the proposed mitigation feature influences only the future releases of personal information by upstream data controllers/processors and it cannot remedy the impacts of the already disseminated personal information. It is for future research to define and design the technical solutions that realise the features proposed in this contribution.
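The two proposed features can be sketched together in a few lines. The paper leaves their technical realisation to future research, so the following is only an illustration under strong simplifying assumptions: a policy is modelled as a set of attribute combinations that must not appear together in one information item, re-anonymisation is modelled as crude suppression of those attributes, and every processor holds direct references to its upstream processors. The class names `Policy`, `InformationItem` and `Processor` and all method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical sticky policy: attribute combinations that must not co-occur."""
    forbidden_combinations: frozenset  # frozenset of frozensets of attribute names

    def merge(self, other):
        """A merged item inherits the restrictions of all contributing items."""
        return Policy(self.forbidden_combinations | other.forbidden_combinations)


@dataclass
class InformationItem:
    attributes: dict
    policy: Policy   # travels with the data (forward policy propagation)
    sources: tuple   # contributing processors (full attribution)


@dataclass
class Processor:
    name: str
    upstream: dict = field(default_factory=dict)       # processor name -> Processor
    breach_events: list = field(default_factory=list)  # events reported by downstream

    def publish(self, attributes, policy):
        """Release an item with the data subject's policy attached."""
        return InformationItem(dict(attributes), policy, (self.name,))

    def receive_breach_event(self, reporter, combination):
        """Record a breach report sent by a downstream processor."""
        self.breach_events.append((reporter, tuple(sorted(combination))))

    def integrate(self, items):
        """Merge items, enforce the propagated policy and report violations upstream."""
        merged, policy, sources = {}, Policy(frozenset()), []
        for item in items:
            merged.update(item.attributes)
            policy = policy.merge(item.policy)
            sources.extend(s for s in item.sources if s not in sources)
        for combination in policy.forbidden_combinations:
            if combination.issubset(merged):
                for attribute in combination:  # crude re-anonymisation by suppression
                    merged.pop(attribute, None)
                for source in sources:         # backward event propagation
                    if source in self.upstream:
                        self.upstream[source].receive_breach_event(self.name, combination)
        return InformationItem(merged, policy, tuple(sources) + (self.name,))


policy = Policy(frozenset({frozenset({"zip_code", "birthdate", "gender"})}))
dcp1, dcp2 = Processor("Dcp1"), Processor("Dcp2")
dcp3 = Processor("Dcp3", upstream={"Dcp1": dcp1, "Dcp2": dcp2})

ii1 = dcp1.publish({"zip_code": "98105", "gender": "F", "diagnosis": "asthma"}, policy)
ii2 = dcp2.publish({"birthdate": "1980-02-03", "income_band": "high"}, policy)
ii3 = dcp3.integrate([ii1, ii2])

print(sorted(ii3.attributes))  # ['diagnosis', 'income_band']
print(dcp1.breach_events)      # [('Dcp3', ('birthdate', 'gender', 'zip_code'))]
```

Neither II1 nor II2 violates the policy on its own; the violation only appears once Dcp3 merges them, which is why Dcp3 needs the propagated policy to detect it and why Dcp1 and Dcp2 can only learn about it through the backward event. As noted in the text, the sketch presumes that Dcp3 is willing to comply and to report.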


5. Conclusions
Design of robust and reliable information systems is a complex problem, especially if one also needs to preserve privacy. An information system that is perceived as acceptable at some point in time might turn out to be highly vulnerable when it is deployed and used for some time or in a new context. In addition to such temporal/contextual degradation of system vulnerability status, integration of information systems results in complex and dynamic systems that, in turn, may create privacy breaches as a by-product.
Organisations in charge of data processing are (often) committed to preserving the baseline privacy of users as defined by law. Such baseline privacy, however, might not be what the user (i.e., the data subject) wishes for, given a contextual situation. We argued that privacy preferences, for example in the form of privacy policies, must be propagated to and complied with by all parties in the data processing chain. This requires a mechanism to inform downstream data controllers/processors (which are not directly connected to the data subject) about individuals’ privacy preferences. Knowing such preferences, the downstream data processors can take the required measures to comply with these preferences.
We further argued that, in addition to this forward propagation of user privacy preferences and compliance with these preferences, the data processors should inform the upstream data controllers/processors about the fact that their information items can be used for inferring more privacy sensitive information about the user than is allowed in the privacy preferences/policy. This awareness will enable the upstream data controllers/processors to revisit their anonymisation processes in light of the (potential) privacy breaches observed. We suspect that there should be a legal requirement for reporting such privacy breaches. We also foresee that the suggested feedback mechanism may have an adverse effect on Open Data initiatives.
How to realise the proposed features and to measure their impacts (e.g., on system performance, Open Data initiatives, laws and legislations) is for future research.


An Analysis of the Use of Amazon’s Mechanical Turk for Survey Research in the Cloud

Marc Dupuis1, Barbara Endicott-Popovsky2 and Robert Crossler3
1 Institute of Technology, University of Washington, Tacoma, United States of America
2 The Information School, University of Washington, Seattle, United States of America
3 College of Business, Mississippi State University, Starkville, United States of America
marcjd@uw.edu
endicott@uw.edu
rob.crossler@msstate.edu

Abstract: Survey research has been an important tool for information systems researchers. As technologies have evolved and changed the manner in which surveys are administered, so have the techniques employed by researchers to recruit participants. Crowdsourcing has become a common technique to recruit participants for different kinds of research, including survey research. This paper examines the role of one such crowdsourcing platform, Amazon’s Mechanical Turk (MTurk). MTurk allows everyday people to create an account and become a worker to perform various tasks, called HITs (human intelligence tasks). HITs are posted by requestors, which may be researchers, corporations, or other entities that have generally simple tasks that can be performed through crowdsourcing. We examine MTurk in the context of five different surveys conducted using the MTurk platform for the recruitment phase of the studies. Our discussion includes both practical things to consider when using the platform and an analysis of some of our findings. In particular, we explore the use of qualifiers in conducting studies, things to consider in both longitudinal and cross-cultural studies, the demographics of the MTurk population, and ways to control and measure quality. Although MTurk participants do not perfectly represent the U.S. population from a demographics standpoint, we found that they provide good overall diversity on several key indicators. Furthermore, the diversity that MTurk samples provide will often be as good as, if not better than, that of the typical participants recruited for research (e.g., college sophomores). Similar to most recruitment methods, using MTurk to conduct research does have its drawbacks. Nonetheless, the evidence does not indicate that these drawbacks are either significant enough to preclude the use of such a platform, or in any way more significant than the drawbacks associated with other techniques. In fact, quality is generally high, the cost is low, and the turnaround time is minimal. We do not suggest that MTurk should replace other techniques for participant recruitment, but rather that it deserves to be part of the discussion.

Keywords: crowdsourcing, surveys, Mechanical Turk, cloud

1. Introduction

Survey research has a long and rich history that can be traced back to the censuses of the Old Kingdom (3rd millennium BC) in ancient Egypt (Janssen 1978). Administration of surveys during this time period occurred in person, but new capabilities and technologies that emerged in the twentieth century led to additional administration techniques. For example, the use of mail and telephones to administer surveys became the norm (Armstrong and Overton 1977; Fricker et al. 2005; Kaplowitz et al. 2004; Kempf and Remington 2007). Toward the end of the century, we would again see the use of new administration techniques based on emerging technologies. The Internet allowed for the use of email to collect survey data, followed shortly thereafter by web-based administration of surveys (Krathwohl 2004; Schutt 2012; Sheehan 2001). Along with these shifts in administration techniques, there have also been new methods employed to recruit participants within any one of these techniques. Most recently, this has included crowdsourcing to recruit participants to complete surveys on the Web (Howe 2006; Kittur et al. 2008; Mahmoud et al. 2012; Ross et al. 2010).

This paper examines the use of a particular crowdsourcing platform to perform this type of research, Amazon’s Mechanical Turk (MTurk). In particular, we will begin by discussing some of the background literature. This is followed by an examination of using the MTurk platform in practice, along with some analysis from several studies. Finally, some concluding thoughts will be given on the use of MTurk to conduct research in the cloud.

The paper makes an important contribution by further exploring the role the MTurk platform may play in research. It expands on earlier research in this area by providing an up-to-date analysis of current trends, demographics, and uses of MTurk as a tool for researchers. Survey research has been an incredibly important tool for researchers, including within the information systems domain (Anderson and Agarwal 2010; Atzori et al. 2010; Burke 2002; Chen and Kotz 2000; Crossler 2010; LaRose et al. 2008; Liang and Xue 2010; Zeng et al. 2009). Thus, it is an important tool to examine further in the context of cloud security research.

2. Background literature

Traditionally, individuals and organizations interested in having work performed in exchange for compensation were limited by the relatively small marketplace available to them. Methods that could be employed to expand the available marketplace were often expensive, time-consuming, and not always practical. However, the Internet and its broad spectrum of users have changed this dynamic considerably. It has made the expansion of the marketplace often cost effective, quick, and practical. In this section, we discuss crowdsourcing in general and the use of Amazon’s Mechanical Turk (MTurk) in particular.

2.1 Crowdsourcing

According to Mason and Watts (2010), in crowdsourcing “potentially large jobs are broken into many small tasks that are then outsourced directly to individual workers via public solicitation” (p. 100). Crowdsourcing has become quite popular in recent years due primarily to the possibilities the Internet provides for individuals and organizations. An early example of the power of crowdsourcing is iStockphoto. Stock photos that used to cost hundreds of dollars to license from professionals could often be had for no more than a dollar apiece (Howe 2006). Rather than just professionals contributing images to the site, students, homemakers, and other amateurs would contribute several images to earn some extra money. The condition that allows crowdsourcing to work so effectively is that many of the workers perform tasks during their spare time. In other words, it is generally not their main source of income, but rather supplements other possible income sources. These workers are not limited to a few dollars at a time either. Since 2001, corporate R&D departments have been using Eli Lilly’s InnoCentive to find intellectual talent that can solve complex problems that have been stumping their own people for a while (Howe 2006). These solvers, as they are called, may earn anywhere from $10,000 to $100,000 per solution. While these types of workers may be limited to those with the requisite skills and talent, MTurk is available to the masses.

2.2 Amazon’s Mechanical Turk

MTurk allows everyday people to create an account and perform various tasks as workers. These tasks are called HITs (human intelligence tasks) and are posted by requesters (Mason and Watts 2010). HITs usually pay anywhere from $0.01 to a few dollars each, generally depending on the skill level required and the amount of effort involved. The opportunities for researchers are great, but several questions naturally arise concerning demographics, quality, and cost.

2.2.1 Demographics

First, the composition of any subject pool is always of great interest to the researcher. The MTurk workers do represent a special segment of the population; namely, those who have Internet access and are willing to complete HITs for minimal pay. However, this is true of any participants who agree to participate in social science research (Horton et al. 2011). MTurk participants are also generally younger than the population they are meant to represent, although their age is generally more representative than what may often be found in university subject pools (Paolacci et al. 2010). Additionally, U.S. workers are disproportionately female, while workers from India are disproportionately male (Horton et al. 2011; Ipeirotis et al. 2010; Paolacci et al. 2010). Nonetheless, they are generally comparable to other populations often recruited for research, including Internet message boards (Paolacci et al. 2010). Beyond age and gender, MTurk participants also represent a diverse range of income levels (Mason and Watts, 2010). In a study comparing MTurk participants with other Internet samples, Buhrmester, Kwang, and Gosling (2011) found that “MTurk participants were more demographically diverse than standard Internet samples and significantly more diverse than typical American college samples” (p. 4). Thus, the demographics of MTurk participants are comparable to other types of samples often used, and in some instances they may be superior.

2.2.2 Quality

If demographics are not an issue in using MTurk participants, what about the quality of the results? Interestingly, quality has not been a major limiting factor in using MTurk for research purposes. In some instances, quality may be better than university subject pools and those recruited from Internet message boards. One way to measure quality is to devise one or more test questions, also called “catch trials”. These types of questions have obvious answers that anyone paying adequate attention to the wording should be able to answer correctly with ease. In one study, MTurk participants had a failure rate on the catch trials of 4.17 percent, compared to 6.47 and 5.26 percent for university subject pool participants and Internet message board participants, respectively (Paolacci et al., 2010, p. 416). A majority of incorrect answers can generally be associated with only a small subset of participants rather than widespread gaming of the system (Kittur et al., 2008). Likewise, the survey completion rate for MTurk participants (91.6 percent) was significantly higher than that of those recruited from Internet message boards (69.3 percent) and not that far behind university subject pool participants (98.6 percent). Additionally, quality is not impacted by either the amount paid for a HIT or the length of time required to complete the task, although both of these factors will impact how long it takes to recruit a given number of participants (Buhrmester et al., 2011; Mason and Watts, 2010). Finally, another important test for quality is the psychometric properties of completed surveys. Similar to the other measures for quality, MTurk participants had absolute mean alpha levels in the good to excellent range (Buhrmester et al., 2011, pp. 4–5). This was true for all compensation levels and for all scales. Test-retest reliability was also very high with a mean correlation of 0.88. All of this is comparable to traditional methods and suggests that the psychometric properties of survey research from MTurk participants are acceptable for academic research purposes.

2.2.3 Cost

The final question that we will address from the background literature relates to cost. Is MTurk a cost-effective solution for conducting academic research? Multiple studies have demonstrated that the cost to use MTurk is not only reasonable, but quite low (Buhrmester et al., 2011; Horton et al., 2011; Kittur et al., 2008; Mason and Watts, 2010; Paolacci et al., 2010). In one instance, a HIT that paid only $0.01 to answer two questions received 500 responses in 33 hours (Buhrmester et al., 2011). In another instance, MTurk participants were compared with offline lab participants. Whereas offline lab participants received a $5 show-up fee, their MTurk counterparts received only $0.50 (Horton et al., 2011). Thus, MTurk provides an opportunity for researchers to perform research on the Web at often a fraction of the cost of traditional methods. Quality and demographics are comparable to these other methods, while the speed at which data can be collected is generally superior. In the next section, we will continue this discussion with an examination of results from recent studies we conducted using the MTurk platform.

3. MTurk in practice: considerations and analysis

There are many things to consider when using MTurk. This section is not meant to be exhaustive, but rather will touch on a few important considerations while using the MTurk platform. Additionally, the MTurk platform offers a powerful API that provides additional capabilities. These capabilities often go beyond what can be done through the Web platform. However, for many researchers this may not prove practical if one does not have the requisite technical skills. Therefore, the focus here will be on what can be done using the Web interface on MTurk, possible workarounds, and inherent limitations. This will be based on multiple HITs the authors have posted over the past 12 months.

3.1 Recruitment

Using MTurk to recruit participants is relatively straightforward. After an account is created, you must fund the account with an amount adequate to cover both the compensation amount you plan on providing to participants and Amazon’s fee, which is 10 percent of the total amount paid to the workers (Amazon, 2013). For example, if the goal is to recruit 300 participants at $0.50 each, then the account should be pre-funded with $165 ($150 for worker payments plus a $15 fee). It is a good idea to build some flexibility into how much you fund your account so that you can easily and quickly adjust the amount paid per HIT, if necessary.
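The funding arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an Amazon tool: the function name `required_prefund_cents` is hypothetical, the 10 percent fee is the rate cited in the text for 2013, and rounding the fee up to a whole cent is an assumption made here for safety.

```python
def required_prefund_cents(num_workers: int, pay_cents: int, fee_percent: int = 10) -> int:
    """Return the amount, in cents, needed to cover worker pay plus the platform fee."""
    worker_total = num_workers * pay_cents
    # The fee is a percentage of the total paid to workers.
    # Assumption: round the fee up to a whole cent (ceiling division).
    fee = -((-worker_total * fee_percent) // 100)
    return worker_total + fee


# The example from the text: 300 participants at $0.50 each.
total = required_prefund_cents(300, 50)
print(f"${total / 100:.2f}")  # $165.00
```

Working in integer cents avoids floating-point rounding surprises when the per-HIT amount is adjusted, for instance when the pay is raised partway through a batch.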

Another important consideration is the description provided for the project. The description should be clear, accurate, short, and simple (Amazon Web Services LLC, 2011). Furthermore, if a survey is being conducted then it should be emphasized how short the survey is or how quick it will be to complete it. This is important given that MTurk provides a marketplace in which the primary goal is for each worker to maximize total income as a product of his or her time. The use of ambiguous terms (e.g., few) is less likely to be effective than explicit descriptions (e.g., 5 multiple-choice questions). Once the requestor has created a project, a batch must be created for that project before workers are able to view it. Batches may be extended or ended early.

3.2 Compensation and time

In a qualification survey we conducted, we found that price was a large factor in how quickly assignments were completed. After it was determined that the rate was too low, the amount paid for the HIT was increased. This was done multiple times with each subsequent HIT available only to those that did not complete one of the earlier ones. Below is a table that illustrates the amount paid, total time available, average time per assignment, and number of completed HITs.

Table 1: Qualifying survey HIT results

| HIT # | Compensation | Time Available | Average Time Per Assignment | Completed |
|---|---|---|---|---|
| 1 | $0.05 | ≈ 2 hours | 1 minute, 29 seconds | 10 |
| 2 | $0.11 | ≈ 1 day, 9 hours | 2 minutes, 5 seconds | 368 |
| 3 | $0.12 | ≈ 1 day | 2 minutes, 9 seconds | 106 |
| 4 | $0.13 | ≈ 19 hours | 2 minutes, 25 seconds | 200 |

A couple of quick observations are worth noting. First, the average time per assignment increased as the compensation increased (r = 0.985; p < .05). Second, it appears that some workers may have a specific price point that must be met prior to completing a HIT. The number of completed assignments at each price point was expected to be higher, but the wording of the HIT may have played a role in not attracting a greater number of workers. In another HIT administered August of 2012 to U.S. residents, the worker was required to complete a relatively long survey that was estimated to take between 15 and 20 minutes. Workers were paid $0.75 for completing the HIT. Results from 303 workers were obtained in approximately 12 hours with an average time per assignment of 10 minutes and 39 seconds. This suggested that the amount paid could be lower. In July of 2013, a HIT for a survey that was similar in length and also limited to U.S. residents was created. Workers were paid $0.50 for this HIT and it was completed in approximately five hours. The average time per assignment was nine minutes and 28 seconds. An identical HIT was created for residents of India with an average time per assignment of nine minutes and 27 seconds, virtually identical to the U.S. population.
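The correlation between compensation and average time per assignment reported for Table 1 can be checked directly from the four rows of that table. The sketch below is an illustration only: the helper name `pearson_r` is introduced here, and the times are the Table 1 values converted to seconds. With only four data points there are two degrees of freedom, so the two-tailed p-value uses the closed form of the t distribution for that specific case.

```python
import math

# Table 1 values: compensation in dollars and average time per assignment in seconds
compensation = [0.05, 0.11, 0.12, 0.13]
avg_seconds = [89, 125, 129, 145]  # 1:29, 2:05, 2:09, 2:25


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(compensation, avg_seconds)

# n = 4 gives 2 degrees of freedom. For the t distribution with 2 df, the
# two-tailed p-value has the closed form 1 - |t| / sqrt(t^2 + 2).
# This shortcut is valid only for df = 2.
t = r * math.sqrt(2) / math.sqrt(1 - r ** 2)
p = 1 - abs(t) / math.sqrt(t ** 2 + 2)

print(round(r, 3), round(p, 3))
```

Run on the Table 1 values, r rounds to 0.985 and the p-value falls below .05, in line with the figures quoted in the text. A sample of four HITs is very small, so the result is best read as descriptive.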

3.3 Qualifications

When creating the project, you may also specify certain predetermined qualifiers, which allow the requestor to limit the HITs to only those workers meeting the qualifications, such as location (e.g., United States). The location qualifier allows you to choose the country, but does not provide any additional granularity (e.g., state). Other built-in qualifiers include the number of HITs completed and the acceptance rate. In addition to predetermined qualifiers, requestors (i.e., the researchers) may also create certain qualifications and assign scores between 0 and 100. You can have workers complete a short survey through the MTurk platform or an external survey platform (e.g., Qualtrics, Survey Gizmo, Survey Monkey) and update worker qualifications based on this data. The most efficient method to do this using the Web-based MTurk platform involves creating the qualifications and then downloading the worker CSV (comma separated values) file that contains historical information on all of your workers. The CSV file contains requestor-created qualifications, columns with current values for requestor-created qualifications, as well as columns to update the qualifications. The requestor can then update the CSV file and upload it back to the system, which Amazon will process and update accordingly. Qualifications can also be assigned manually through the Web-based platform, but this becomes quite impractical when qualifications must be assigned to a large number of workers.
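The CSV round trip described above lends itself to a small script. The following is a minimal sketch using only the Python standard library. The column names (`Worker ID`, `CURRENT-Phase1Passed`, `UPDATE-Phase1Passed`), the worker IDs, and the function name `set_qualification` are hypothetical placeholders, so the headers should be checked against the file actually downloaded from the platform before use.

```python
import csv
import io


def set_qualification(csv_text, passed_worker_ids, update_column, score):
    """Write `score` into `update_column` for each worker in `passed_worker_ids`."""
    if not 0 <= score <= 100:
        raise ValueError("qualification scores must be between 0 and 100")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Hypothetical column name; adjust to match the downloaded file.
        if row["Worker ID"] in passed_worker_ids:
            row[update_column] = str(score)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical worker file with two workers and one requestor-created qualification
sample = (
    "Worker ID,CURRENT-Phase1Passed,UPDATE-Phase1Passed\n"
    "A1EXAMPLE,,\n"
    "A2EXAMPLE,,\n"
)
updated = set_qualification(sample, {"A1EXAMPLE"}, "UPDATE-Phase1Passed", 100)
print(updated)
```

In practice the set of worker IDs would come from the survey platform's export (for example, the respondents who passed a screening question), and the returned text would be saved to a file and uploaded back through the Web interface.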

3.4 Quality

Quality is an important concern for any researcher. The primary test for quality that will be analyzed here involves the use of quality control questions, also referred to as catch trials. Five studies were conducted that included a quality control question. Studies four and five were follow-up surveys to studies two and three and only those that passed the quality control question in the earlier survey were eligible to complete the second survey, which was administered approximately five weeks later. The table below illustrates these results.

Table 2: Quality control question failure rate

| Study | Population | Total Number of Submissions | Failure Rate |
|---|---|---|---|
| 1 | U.S. | 303 | 2.31% |
| 2 | U.S. | 170 | 8.82% |
| 3 | India | 212 | 29.25% |
| 4 | U.S. | 110 | 2.73% |
| 5 | India | 131 | 15.27% |

A couple of observations are worth noting. First, the failure rate for residents from India is very high in general and in comparison to residents from the U.S. (r = 0.899; p < .05). It is unclear why the failure rate is so much higher. Possibly, language or cultural factors may play a role. Second, while the failure rate for U.S. residents is still relatively low, it is unclear why this rate jumped considerably from study one, conducted in August of 2012, to the second study. The quality control questions were slightly different and the HIT paid a different amount. It is unclear if either of these factors contributed to the difference in failure rates.
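One way a reader might gauge the size of the U.S./India gap in Table 2 is a two-proportion z-test on the two first-phase studies (studies two and three). This is not the analysis the authors report; it is a sketch under stated assumptions. The failure counts are reconstructed here by rounding the published rates against the submission totals, and the function name `two_proportion_z` is introduced for illustration.

```python
import math


def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Pooled two-proportion z statistic and two-sided normal p-value."""
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Assumption: counts recovered from the published rates in Table 2
us_fail = round(0.0882 * 170)      # study 2, U.S.
india_fail = round(0.2925 * 212)   # study 3, India

z, p_value = two_proportion_z(us_fail, 170, india_fail, 212)
print(us_fail, india_fail, round(z, 2), p_value)
```

Under these assumptions the z statistic is a little under 5, which supports the observation that the difference between the two populations is unlikely to be a sampling artifact. It does not indicate why the rates differ.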

3.5 Longitudinal studies

Conducting longitudinal studies is an important method for many different types of research. The discussion earlier on assigning qualifications to workers is necessary for fixed-sample longitudinal studies. Basically, those that successfully complete the first phase of the study are given a qualification for future phases. Then, when creating subsequent projects on the MTurk platform, the requestor will limit the HIT to only those with the requisite qualification. This was done for a longitudinal study examining the Edward Snowden situation. However, workers will not necessarily see the HIT as anything special. In other words, they will generally be as likely to view the HIT as any other HIT. In our follow-up survey, we increased the pay from $0.50 to $0.65. Workers from India responded quite well, which may be due in part to the relatively high value the HIT pays as well as possibly fewer HITs available overall to residents of India. Approximately 85 workers from India completed the second survey within 48 hours compared to only 18 from the U.S. In a comparative cross-cultural longitudinal study, these numbers become quite problematic. The API and associated tools do provide a mechanism to send an email to workers. However, this was not as simple using the Web-based MTurk platform. Email messages could be sent through the platform, but involved navigating to the project that contained the results from phase one, filtering out those that did not pass the quality control question, and finally emailing only those that had not yet completed phase two. Nonetheless, completing this process helped immensely. Messages were sent to the appropriate workers informing them of the HIT with the URL. The number of U.S. participants increased from approximately 18 to over 90 within 48 hours. Likewise, the number of participants from India increased from approximately 85 to 120.
A second message was sent to both groups of workers, which resulted in a total of 110 submissions from the U.S. and 131 from India. Therefore, it is possible to conduct longitudinal research using the MTurk platform, but extra steps may be required to obtain an acceptable number of responses. Other longitudinal designs that do not require a fixed-sample would alleviate some of these issues, but would also introduce new ones in the process.
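The final submission counts can be turned into rough retention rates for the fixed-sample design. The sketch below rests on two assumptions that the text does not state outright: that the eligible pool for the follow-up equals the first-phase submissions minus the quality-control failures implied by Table 2, and that every follow-up submission came from that pool. The function name `retention_rate` is introduced here for illustration.

```python
def retention_rate(phase1_submissions, phase1_failure_rate, phase2_submissions):
    """Share of eligible phase-one workers who submitted the follow-up survey."""
    # Assumption: failures are recovered by rounding the published rate.
    failed = round(phase1_submissions * phase1_failure_rate)
    eligible = phase1_submissions - failed
    return phase2_submissions / eligible


us = retention_rate(170, 0.0882, 110)     # study 2 -> study 4
india = retention_rate(212, 0.2925, 131)  # study 3 -> study 5
print(f"U.S.: {us:.1%}, India: {india:.1%}")
```

On these assumptions, roughly seven in ten eligible U.S. workers and nearly nine in ten eligible workers from India returned for the second phase after the reminder messages.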


3.6 International populations

Throughout the discussion, we have examined specific facets of using MTurk as a research platform. Within this discussion, there has been some comparison between participants from the U.S. and those from India. In summary, participants from India have a significantly higher failure rate on quality control questions, but are also more likely to participate in subsequent phases of a study when a fixed-sample longitudinal design is employed. Originally, workers from other countries were sought in addition to the U.S. and India to participate in the Snowden study. However, it became untenable when only one worker from the U.K. completed the HIT for the first phase of the study over a 16-hour time period. The amount paid for the HIT was increased from $0.50 to $0.75, but no additional workers participated. Likewise, there was only one worker from Sweden that responded to the survey over a 50-hour time period. Thus, consistent with other research that has used MTurk, most workers are from the U.S. and India (Horton et al. 2011; Ipeirotis et al. 2010). While the studies examined here did not rule out other countries (e.g., Canada), the evidence at least suggests that requestors will be the most successful when recruiting workers from either India or the U.S.

3.7 Demographics

Earlier, we discussed the demographics of the MTurk workers based on prior research. In this section, we will revisit the issue of demographics and examine the composition of MTurk workers from four different studies we conducted with a more in-depth analysis of workers that completed the qualifying survey. In the table that follows, we look at the age, gender, and educational attainment levels of MTurk workers from four different studies. What may be the most interesting are the changes in gender distribution between males and females. Other research has generally found a greater percentage of female workers than males (Horton et al. 2011; Ipeirotis et al. 2010; Paolacci et al. 2010), but this is not found in studies four and five. It may be due to the limited time the HIT was available. In other words, females may be more apt to complete a HIT during certain hours of the day, while males may be more apt to do so during other hours of the day. The qualifying survey shows that approximately 18.4 percent of MTurk workers are unemployed, retired, or homemakers and 10.3 percent are students. The study that would largely mitigate this possible effect is the qualifying survey due to the length of time it was available and the overall sample size. While the gender distribution is not heavily male as it is in studies four and five, it is also not heavily female as it has been in other studies and as is reflected in study one. In fact, the ratio is almost 50:50 and largely consistent with the U.S. population as a whole.

Table 3: Age, gender, and educational attainment levels

| | U.S. Sample, Study 1 | U.S. Sample, Study 4 | India Sample, Study 5 | Qualifying Survey | U.S. Population |
|---|---|---|---|---|---|
| Sample Size | 296 | 101 | 107 | 702 | |
| Time Period | August 2012 | August 2013 | August 2013 | July 2013 | 2011/2012 |
| Age | | | | | |
| 18-29 | 41.55% | 37.62% | 41.12% | 48.4% | 22.0% |
| 30-39 | 25.34% | 30.69% | 40.19% | 25.7% | 17.0% |
| 40-49 | 16.89% | 14.85% | 8.41% | 12.2% | 18.2% |
| 50-59 | 11.49% | 9.90% | 4.67% | 10% | 18.1% |
| 60+ | 4.73% | 6.93% | 5.61% | 3.6% | 24.7% |
| Gender | | | | | |
| Male | 42.6% | 63.37% | 64.49% | 50.4% | 49.1% |
| Female | 57.1% | 36.63% | 35.51% | 49.3% | 50.9% |
| Education | | | | | |
| Some High School | N/A | 0.99% | 0.93% | 1.1% | 8.58% |
| High School (or GED) | N/A | 8.91% | 0.93% | 11.1% | 30.01% |
| Some College | N/A | 32.67% | 9.35% | 29.4% | 19.46% |
| College Graduate | N/A | 47.52% | 57.01% | 48.8% | 27.59% |
| Master’s / Professional Degree | N/A | 6.93% | 31.78% | 7.7% | 8.4% |
| Doctorate | N/A | 2.97% | | 1.9% | 1.36% |

Source: U.S. Census Bureau, 2011/2012

If we further examine the data from the qualifying survey, we can explore a few additional demographic indicators. For example, the pie chart that follows illustrates the diversity in household income levels from MTurk workers. As we can see, there is a broad spectrum of household income levels represented.

Figure 1: Annual household income distribution

Finally, we will examine the geographic distribution of MTurk workers. A table that compares the geographic distribution of the U.S. population with the MTurk workers that completed the qualifying survey follows.

Table 4: Regional distribution of U.S. MTurk workers

| Region | Qualifying Survey | U.S. Population |
|---|---|---|
| Northeast | 19.4% | 18.01% |
| Midwest | 24.7% | 21.77% |
| South | 32.2% | 36.91% |
| West | 23.6% | 23.31% |

The geographic distribution of MTurk workers in the U.S. closely resembles that of the U.S. population. To the extent the MTurk workers do not closely resemble the U.S. population, they do nonetheless provide a much greater degree of diversity on key demographic indicators than the typical college sophomore (Sears 1986).
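How closely the two regional distributions in Table 4 agree can be summarized with a single number. The sketch below computes an index of dissimilarity (half the sum of absolute differences), which is the share of workers, in percentage points, that would need to change region for the two distributions to match. This measure is introduced here for illustration and is not one the authors report; the function name `dissimilarity_index` is likewise hypothetical.

```python
# Regional shares (percent) from Table 4
mturk = {"Northeast": 19.4, "Midwest": 24.7, "South": 32.2, "West": 23.6}
us_pop = {"Northeast": 18.01, "Midwest": 21.77, "South": 36.91, "West": 23.31}


def dissimilarity_index(a, b):
    """Half the sum of absolute differences between two percentage distributions."""
    return sum(abs(a[region] - b[region]) for region in a) / 2


d = dissimilarity_index(mturk, us_pop)
print(f"{d:.2f} percentage points")
```

The result is under five percentage points, with most of the gap coming from the South being underrepresented among the MTurk workers, which is consistent with the description of the distributions as closely matched.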


4. Conclusion

The MTurk platform is a relatively new method that can be used to recruit participants in not only survey research, but several other types of research activities as well. Similar to most recruitment methods, using MTurk to conduct research does have its drawbacks (e.g., incentives, motivation, quality) (Horton et al., 2011; Mason and Watts, 2010). Nonetheless, the evidence does not indicate that these drawbacks are either significant enough to preclude the use of such a platform, or in any way more significant than the drawbacks associated with other techniques (Buhrmester et al. 2011; Gosling et al. 2004; Sears 1986). In fact, quality is generally high, the cost is low, and the turnaround time is minimal. We do not suggest that MTurk should replace other techniques for participant recruitment, but rather that it deserves to be part of the discussion.

References

Amazon, 2013. Amazon Mechanical Turk: Requestor FAQs [WWW Document]. URL https://requester.mturk.com/help/faq#how_fees_cost_computed (accessed 5.5.13).
Amazon Web Services LLC, 2011. Requestor Best Practices Guide [WWW Document]. URL http://mturkpublic.s3.amazonaws.com/docs/MTURK_BP.pdf (accessed 4.15.13).
Anderson, C.L., Agarwal, R., 2010. Practicing Safe Computing: A Multimethod Empirical Examination of Home Computer User Security Behavioral Intentions. MIS Q. 34, 613–643.
Armstrong, J., Overton, T., 1977. Estimating nonresponse bias in mail surveys. J. Mark. Res. 14, 396–402.
Atzori, L., Iera, A., Morabito, G., 2010. The internet of things: A survey. Comput. Networks 54, 2787–2805.
Buhrmester, M., Kwang, T., Gosling, S.D., 2011. Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? Perspect. Psychol. Sci. 6, 3–5.
Burke, R., 2002. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact. 12, 331–370.
Chen, G., Kotz, D., 2000. A survey of context-aware mobile computing research. Technical Report TR2000-381, Dept. of Computer Science, Dartmouth College.
Crossler, R.E., 2010. Protection Motivation Theory: Understanding Determinants to Backing Up Personal Data. Koloa, Kauai, Hawaii, p. 10.
Fricker, S., Galesic, M., Tourangeau, R., Yan, T., 2005. An Experimental Comparison of Web and Telephone Surveys. Public Opin. Q. 69, 370–392.
Gosling, S.D., Vazire, S., Srivastava, S., John, O.P., 2004. Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. Am. Psychol. 59, 93.
Horton, J.J., Rand, D.G., Zeckhauser, R.J., 2011. The online laboratory: Conducting experiments in a real labor market. Exp. Econ. 14, 399–425.
Howe, J., 2006. The Rise of Crowdsourcing. Wired 14.
Ipeirotis, P.G., Provost, F., Wang, J., 2010. Quality management on Amazon Mechanical Turk, in: Proceedings of the ACM SIGKDD Workshop on Human Computation. ACM, Washington DC, pp. 64–67.
Janssen, J., 1978. The Early State in Ancient Egypt, in: Claessen, H.J., Skalník, P. (Eds.), The Early State. Walter de Gruyter, pp. 213–234.
Kaplowitz, M.D., Hadlock, T.D., Levine, R., 2004. A comparison of web and mail survey response rates. Public Opin. Q. 68,
Kempf, A.M., Remington, P.L., 2007. New Challenges for Telephone Survey Research in the Twenty-First Century. Annu. Rev. Public Health 28, 113–126.
Kittur, A., Chi, E.H., Suh, B., 2008. Crowdsourcing user studies with Mechanical Turk, in: Proceedings of the Twenty-sixth Annual SIGCHI Conference on Human Factors in Computing Systems. ACM, Florence, Italy, pp. 453–456.
Krathwohl, D., 2004. Methods of educational and social science research: an integrated approach, 2nd ed. Waveland Press, Long Grove, Ill.
LaRose, R., Rifon, N.J., Enbody, R., 2008. Promoting Personal Responsibility for Internet Safety. Commun. ACM 51, 71–76.
Liang, H., Xue, Y., 2010. Understanding Security Behaviors in Personal Computer Usage: A Threat Avoidance Perspective. J. Assoc. Inf. Syst. 11, 394–413.
Mahmoud, M.M., Baltrusaitis, T., Robinson, P., 2012. Crowdsourcing in emotion studies across time and culture, in: Proceedings of the ACM Multimedia 2012 Workshop on Crowdsourcing for Multimedia. ACM, Nara, Japan, pp. 15–16.
Mason, W., Watts, D.J., 2010. Financial incentives and the performance of crowds. ACM SigKDD Explor. Newsl. 11, 100–
Paolacci, G., Chandler, J., Ipeirotis, P., 2010. Running experiments on amazon mechanical turk. Judgm. Decis. Mak. 5, 411–
Ross, J., Irani, L., Silberman, M., Zaldivar, A., Tomlinson, B., 2010. Who are the crowdworkers?: shifting demographics in mechanical turk, in: Proceedings of the 28th of the International Conference Extended Abstracts on Human Factors in Computing Systems. ACM, pp. 2863–2872.
Schutt, R., 2012. Investigating the Social World: The Process and Practice of Research, 7th ed. Sage Publications, Thousand Oaks, California.
Sears, D.O., 1986. College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. J. Pers. Soc. Psychol. 51, 515.
Sheehan, K.B., 2001. E-mail Survey Response Rates: A Review. J. Comput.-Mediat. Commun. 6.
Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S., 2009. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. Pattern Anal. Mach. Intell. IEEE Trans. 31, 39–58.

Extracting and Visualizing Relevant Data From Internet Traffic to Enhance Cyber Security Teresa Escrig, Jordan Hanna, Shane Kwon, Andrew Sorensen, Don McLane and Sam Chung Institute of Technology, University of Washington, Tacoma, WA, USA mtescrig@uw.edu jhanna01@uw.edu shanekw@uw.edu andrewx@uw.edu dmclane@uw.edu chungsa@uw.edu

Abstract: Society is facing an increasingly severe problem: the lack of cyber security. Cyber security is a very hard, multifaceted problem. To be effective, cyber security solutions need to: a) Obtain automatic knowledge from data to support automatic decision processes and predictions; b) Process large amounts of network traffic in real-time; c) Include intelligence, to automatically identify known and unknown attacks; d) Provide evidence or explanation for the decision that a suspicious Internet activity corresponds to a new unknown attack; and e) Be scalable, meaning that a solution provided for one part of the Internet system should extend straightforwardly to the whole system. To provide the intelligence necessary for cyber security solutions, we argue that the type of knowledge needed is common sense, which is what people use to encode experiences in life. Qualitative Models have been demonstrated to be the best approach to model common sense knowledge, by transforming uncertain and incomplete data into knowledge. In this article we introduce a novel approach, which integrates Qualitative Models (representation, reasoning and learning) to automate cyber security analysis. Our final goal is to develop a scalable automatic analysis of a particular aspect of network traffic, with the intent of detecting and preventing attacks. Keywords: Cyber security, common sense knowledge, graph databases, data visualization, ontologies, qualitative models, system architecture, automatic analysis

1. Introduction The Internet's takeover of the global communication landscape has been almost instant in historical terms (slightly over a decade): the term “Internet” was first defined in 1995; in 2000 it accounted for 51% of the information flowing through two-way telecommunication networks; and in 2007 for more than 97% (Wikipedia, 2013). This fast growth has also created a huge problem for cyber security. Because the problem is so urgent, the cyber security landscape consists of an ad hoc patchwork of solutions with no globally satisfactory result. The current solutions have failed to prevent cybercrime or fraud losses, which amount to billions of dollars each year. Computer attacks against the Pentagon currently average 5,000 each day, and President Obama has declared that the “Cyber-threat is one of the most serious economic and security challenges we face as a nation” (WhiteHouse.gov, 2013). Existing security tools provide marginal protection “at best”, i.e. if they are correctly used. Security management is in a state of profound change. Cyber security is a very hard and multifaceted problem (Hively, 2011). There is an obvious need for a holistic solution to the cyber security problem, which includes greater system intelligence. Artificial Intelligence (AI) seems to be the research field that can provide such a holistic solution (Morel, 2011; Pradhan, 2012). However, AI has already been used extensively in cyber security for over 20 years (Pradhan, 2012). An Intrusion Detection System (IDS) monitors the events that take place in a computer system or computer network, and analyzes them for signs of attempts to compromise information security, or to bypass the security mechanisms of a computer or network. AI has been used for both approaches to IDS: Anomaly Detection (AD) and Misuse Detection (MD) (Pradhan, 2012). For AD, several AI methods have been used: statistical models (Javitz, 1993; Bace, 2000), expert systems (Axelsson, 1999), neural networks (Henning, 1990), and others. 
For MD, other AI methods have been used: rule-based languages (Lindqvist, 1999), Petri automata (Kumar, 1994), and genetic algorithms (Ludovic, 1998). Although current AI techniques used in IDS provide good results for some aspects of the problem, they lack in other areas. Current techniques are not scalable, do not focus on the problem as a whole, and some act as "black boxes," in the sense that they provide solutions with no explanation to help justify decisions or report results. Moreover, the techniques used in an IDS are passive, and do not do anything to stop attacks.

Intrusion Prevention Systems (IPS) are an emerging technology intended to overcome some of these IDS drawbacks. An IPS is proactive and functions as a radar that monitors the stream of network traffic, where it detects, identifies and recognizes patterns of security violation. Furthermore, unlike an IDS, it is capable of preventing the attack before it happens (Stiawan, 2012). To be effective, cyber security solutions need to: 

Obtain automatic knowledge from data. Converting raw data into information (data in the context of other data) and hence into knowledge (information in the context of other information), is critical to support automatic decision processes and predictions. Knowledge-based decisions cannot process arbitrary instructions and therefore are not hackable (Hively, 2011).



Process network traffic Big Data in real-time. The current approach of storing raw network traffic data for later off-line processing doesn't provide the needed real-time attack detection.



Include intelligence to automatically identify not only known attacks, but also suspicious activity that might correspond to new attacks.



Provide evidence or explanation on the decision that a suspicious Internet activity corresponds to a new unknown attack. A solution based on a “black box” is not acceptable.



Be scalable, meaning that a solution provided for one part of the Internet system should extend straightforwardly to the whole system.

Experts at CMU, such as Dr. Morel (Morel, 2011), argue that cyber security calls for new and specific AI techniques developed with the cyber security application in mind. Morel advocates the use of: 

Knowledge Based Systems (representation)



Probabilistic (reasoning)



Bayesian (learning)

Representation, reasoning and learning are indeed the basic principles of human intelligence, and are therefore necessary to provide a holistic solution to cyber security. However, probabilistic and Bayesian models have been used extensively in other AI areas, such as Robotics and Computer Vision, with very good initial results that have not scaled. The issues with probabilistic approaches are twofold: 

It is a brute force method with high computational cost.



No common sense, or any other cognitive approach, is being used to make sense of the numbers. Therefore, after the first initial promising results, further improvements are limited.

As happened in the Robotics field, we claim that the type of knowledge that cyber space needs to obtain is common sense knowledge, the kind people use in their daily life. Counterintuitively, common sense knowledge is more difficult to model than expert knowledge, which can be quite easily modeled by expert systems. The concept of common sense knowledge is introduced in Section 2. Qualitative Models have been demonstrated to be the best approach to model common sense knowledge (Hernandez, 1994; Štěpánková, 1992; Freksa, 1991; Dague, 1995; Escrig, 2005; Peris, 2005), by transforming incomplete and uncertain data from the environment into knowledge. The key concepts of qualitative models are introduced in Section 3. In order to automate analysis in the cyber security domain, one of the first things we need to do is to “speak the same language”, so that we can organize, search, compare and interchange information. We need to use standards for common enumeration, expression and reporting. We need to have in place vocabulary management and data interoperability to be able to handle the complexity of the cyber security domain. We need ontologies. The key concepts of ontologies are introduced in Section 4.3. In this article we introduce a novel approach, which integrates Qualitative Models (representation, reasoning and learning) to Automate Cyber Security Analysis, as described in Section 4.

Our final goal is to develop a scalable automatic analysis of a particular aspect of network traffic, with the intent of preventing and stopping attacks before they happen, or at the very least to learn from the ones that do happen, so they do not occur a second time.

2. Common sense knowledge A significant part of common sense knowledge encodes our experience with the physical world we live in (Štěpánková, 1992; Shapiro, 1987; Kuipers, 1979; Davis 1987). Common sense is defined as "the common knowledge that is possessed by every schoolchild and the methods for making obvious inferences from this knowledge" (Davis, 1987). Common sense reasoning relaxes the strongly mathematical formulation of physical laws. The aquarium metaphor (Freksa, 1991) illustrates the essence of commonsense: two people situated close together are looking at an aquarium and they try to speak about how wonderful the fish are; they need to identify the fish by their relative position (Figure 1).

Figure 1: The aquarium metaphor to describe common sense knowledge The aquarium metaphor emphasizes a few issues, which are present in all spatial perception, representation, and identification situations (Werthner, 1994): 

The perceptual knowledge is necessarily limited with regard to resolution, features, completeness, and certainty;



There is a more or less well-defined context;



Perceptions are finite;



The neighborhood of objects and conceptual neighborhood of relations between objects provide very useful information for spatial reasoning.

Besides representation, common sense also has to do with reasoning. The human reasoning mechanism is efficient, robust and trustworthy enough to solve important problems, and humans seem to just "pick it up without any effort" (Kuipers, 1979). This adaptability and flexibility of the human reasoning process in the face of incomplete knowledge has been defined formally and logically, and it is called commonsense reasoning (Štěpánková, 1992). Our argument in this paper is that common sense can also be used to formalize automatic processes that deal with cyber space and cyber-attacks.

3. Qualitative models to represent common sense knowledge The method most widely used to model commonsense reasoning in space and time domains is qualitative models (Hernandez, 1994; Štěpánková, 1992; Freksa, 1991; Werthner, 1994; Dague, 1995). Qualitative models help to express poorly defined problem situations, reason with partial information, support the solution process, and lead to a better interpretation of the final results (Werthner, 1994). A qualitative representation can be defined (Hernandez, 1994) as that representation which makes only as many distinctions as necessary to identify objects, events, situations, etc. in a given context. In other words, it provides relevant information from the environment. Qualitative models do not structure domains homogeneously, (i.e. with uniform granularity of physical entities) as quantitative representations do. Rather, they focus on the boundaries of concepts: the representation may be viewed as having low resolution for different values corresponding to the same quality, and high resolution near the concept boundaries (Werthner, 1994). Thus, qualitative representations may be viewed as regions from the viewpoint of quantitative representations.
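As a minimal sketch of the idea that a qualitative representation only distinguishes regions separated by concept boundaries, the following Python fragment maps a quantitative packet rate onto a qualitative label. The boundary values, the label names and the function `qualitative_rate` are illustrative assumptions and do not come from the system described in this paper.

```python
from bisect import bisect_right

# Hypothetical concept boundaries, in packets per second.
# These numbers are illustrative assumptions only.
BOUNDARIES = [10, 100, 1000]
LABELS = ["idle", "low", "moderate", "high"]


def qualitative_rate(packets_per_second):
    """Return the qualitative region that contains the quantitative value."""
    # bisect_right counts how many boundaries are <= the value,
    # which is exactly the index of the region the value falls in.
    return LABELS[bisect_right(BOUNDARIES, packets_per_second)]


# 42 and 57 differ quantitatively but share the same quality;
# 99 and 100 sit on opposite sides of a concept boundary.
for rate in (42, 57, 99, 100):
    print(rate, "->", qualitative_rate(rate))
```

Values well inside a region are deliberately indistinguishable, while a small change across a boundary changes the label. This is the low-resolution and high-resolution behavior described above.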

Qualitative methods have many advantages, which include: 

It might be expensive, time-consuming, impossible, or too complex to get complete information (Davis, 1987). Therefore, a reduction of data without loss of information remains an important goal (Freksa, 1991).



High-precision quantitative measurements are not as universally useful for the analysis of complex systems, as was believed at the beginning of the computer age (Freksa, 1991).



Qualitative knowledge is robust under transformations (Werthner, 1994).



Reasoning with partial information allows the inference of general rules that apply across a wide range of cases (Davis, 1987). Therefore, qualitative methods possess a higher power of abstraction (Hernandez, 1994), which can be viewed as that aspect of knowledge, which critically influences decisions (Freksa, 1993).



Qualitative representations handle vague knowledge by using coarse granularity levels, which avoid having to be committed to specific values in a given dimension. Only a refinement of what is already known is needed (Hernandez, 1994).



Qualitative representations are independent of specific values and granularities of representation. In this way, qualitative representations allow for top-down approach to characterizing situations, in comparison to bottom-up approaches suggested by quantitative representation (Werthner, 1994).

Qualitative methods have one drawback: they are not deterministic, in the sense that a single qualitative description might correspond to many "real" situations (Hernandez, 1994). Qualitative Models can be used in many application areas from everyday life in which spatial knowledge plays a role, particularly in those that are characterized by uncertainty and incomplete knowledge (Hernandez, 1994). A survey of the techniques and applications of Qualitative Reasoning can be found in (Dague, 1995). Our claim is that Qualitative Models can also play an important role in providing a better landscape of solutions for cyber security.

4. Qualitative models to automate cyber security analysis (QMACSA) approach We propose in this paper a novel approach to automate cyber security analysis using qualitative models (Qualitative Models to Automate Cyber Security Analysis, QMACSA) (Figure 2). After obtaining a network traffic capture, we extract the most relevant features that would help in finding network-related attacks. The selection of features is done without consideration of the particular type of application using the network at the time, and it will remain this way for several more iterations of our research. The qualitative representation is done in the Data Visualization module, which presents the most relevant features from a graph database (sub-section 4.1) and visualizes them with different web-based data visualization techniques (sub-section 4.2). The result of this module is a particular representation of relevant features that feeds the Cognitive Deep Analysis module. In the Cognitive Deep Analysis module, ontologies (Mattos-Rosa, 2013) are used to automatically create models of normal and abnormal behavior corresponding to regular traffic data and cyber-attacks, respectively. The relevant features extracted in the Data Visualization module are constantly compared with the existing models of normal traffic and attacks. If a set of relevant features does not correspond to any normal traffic or known attack, it is classified as a suspicious activity. The traffic associated with a suspicious activity is tested in a sandbox environment that reproduces the protected system, so that the test has no dangerous consequences. The result of testing the suspicious traffic data in the sandbox (attack or normal traffic) is meta-data that feeds the Cognitive Deep Analysis module, which automatically creates new models of attacks or normal traffic data, respectively. Sub-section 4.4 explains the function of the sandbox in more detail.
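The comparison and learning loop described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the Cognitive Deep Analysis module itself: feature sets are encoded as frozensets of qualitative labels, matching is by exact equality, and `ModelStore`, `analyze` and the `sandbox_verdict` callback are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ModelStore:
    """Known qualitative patterns, each a frozenset of feature labels (assumed encoding)."""
    normal: set = field(default_factory=set)
    attacks: set = field(default_factory=set)

    def classify(self, features):
        """Compare a feature set with the existing models."""
        if features in self.attacks:
            return "attack"
        if features in self.normal:
            return "normal"
        return "suspicious"

    def learn(self, features, was_attack):
        """Store the sandbox result as a new model of attack or normal traffic."""
        (self.attacks if was_attack else self.normal).add(features)


def analyze(features, store, sandbox_verdict):
    """Classify a feature set; unknown patterns go to the sandbox and become new models."""
    label = store.classify(features)
    if label == "suspicious":
        # sandbox_verdict is a hypothetical callback returning True for an attack.
        store.learn(features, sandbox_verdict(features))
    return label


store = ModelStore(normal={frozenset({"few_sources", "low_rate"})})
flood = frozenset({"many_sources", "syn_only", "high_rate"})
print(analyze(flood, store, lambda f: True))  # first sighting: unknown pattern
print(analyze(flood, store, lambda f: True))  # second sighting: matches the learned model
```

Exact set equality is the crudest possible matching rule. It is used here only to keep the control flow visible.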


Figure 2: System architecture for Qualitative Models to Automate Cyber Security Analysis (QMACSA)

4.1 Graph databases Graph databases are databases that store data using native graphs and support graph data models for access and query methods. There are three types of models commonly referred to as graph databases: hypergraphs, property graphs, and triple-stores. Hypergraphs consist of relationships (hyper-edges), which can have multiple start and end nodes. Property graphs are similar, but each relationship has only one start node and one end node. Triple-stores use the form subject-predicate-object for storing data and are technically not graph databases. The most common model used today is the property graph, because of its practicality (Robinson, 2013). Unlike relational data models, graph representations can be conceptually more natural to use in many domains such as social networks and linked data (Cudré-Mauroux, 2011). This is because for these applications the most salient features of the data are the entities and their relations to each other rather than a specific normalized schema. In particular, graph databases are optimized for traversal and adjacency queries. Although graph databases are not always more efficient than relational databases, their performance in graph operations is substantially better. In addition, relational databases such as MySQL have indexing performance issues compared to Neo4j and its index-free adjacency when working with dense graph data (Buerli, 2012). Objective tests on query performance run by Batra et al. also indicate better performance for the graph database than for MySQL (Batra, 2012). Graph databases are a natural fit for analyzing network traffic. Network traffic flows have been modeled as graphs for threat detection and traffic profiling by various researchers (Le, 2012; Iliofotou, 2011; Collins, 2011) and these techniques have been noted as being accurate for these purposes. Additionally, there has been some work using graphs for anomaly detection in network traffic datasets (Noble, 2003). 
A proof-of-concept proprietary graph database was used in the implementation of the Graph-based Intrusion Detection System (Staniford-Chen, 1996). The graph database will allow us to represent large volumes of network traffic. The graph database technology chosen for this research is Neo4j (Neo4j, 2013) due mainly to its fast growing community of users. The system offers high-speed processing, configurable data entry from multiple sources, and the management of networks with billions of nodes and connections from a desktop PC. Users can quickly and easily identify interrelated records by formulating queries based on simple values such as names and keywords. Until now, this was possible to a certain extent using database technology, but Neo4j extracts new information from interrelated data and improves the speed and the capacity to perform complex queries in large data networks. Neo4j is a schema-less NoSQL database that uses the property graph model for data. It consists of three main elements: nodes, relationships, and properties. Nodes and relationships correspond to vertices and edges, and both can contain one or more properties, which are basically key-value pairs (Robinson, 2013). The previous works, the intrinsic graph structure of network traffic and its high volume of data show that a graph representation and the use of a graph database are appropriate for our approach. Also, as noted earlier, there may be performance advantages over using a relational database.
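To make the property graph model concrete, the following toy in-memory structure stores nodes and relationships that both carry key-value properties. It is an illustrative sketch only. It does not use or imitate the Neo4j API, and the class `PropertyGraph`, the relationship type `PACKET` and the sample addresses are assumptions made for this example.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes and relationships both hold key-value properties."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties dict
        self.outgoing = defaultdict(list)  # node id -> [(type, end node id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes.setdefault(node_id, {}).update(properties)

    def add_relationship(self, start, rel_type, end, **properties):
        # In a property graph each relationship has exactly one start and one end node.
        self.add_node(start)
        self.add_node(end)
        self.outgoing[start].append((rel_type, end, properties))

    def neighbors(self, node_id):
        """Adjacency query: read the relationships stored with the node itself."""
        return [end for _, end, _ in self.outgoing.get(node_id, [])]


graph = PropertyGraph()
graph.add_node("10.0.0.5", ip="10.0.0.5")
graph.add_relationship("10.0.0.5", "PACKET", "10.0.0.9", flags="SYN", size=60)
print(graph.neighbors("10.0.0.5"))
```

Because each node keeps its own relationship list, finding neighbors does not require a join or a global index lookup. This is the intuition behind the traversal advantage discussed in this sub-section.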

4.2 Web-based data visualization In the area of network security, large amounts of data are not a big surprise. Thousands of packets move every second in even modestly sized networks. Therefore, when attacks occur, quick analysis of large amounts of data is required. The purpose of data visualization is to convey meaningful information in reference to potentially large amounts of data (Ferebee, 2008). Visual data analysis helps us to perceive patterns, trends, structures, and exceptions in even the most complex data sources (Shiravi, 2012). We analyzed several tools and libraries capable of data visualization, but only one fit our needs. Our final decision came down to a few core aspects that were of priority for our design: 

Ability to perform in real-time: Considering the final goal is to actively prevent network attacks, real-time analysis is a key component in our visualization.



Dynamically interactive: For data visualization to truly become a tool, it should be capable of real-time change upon user interaction. This aspect transforms a security visualization into a security tool.



Flexible: Our visualization techniques and approach may alter considerably as we further develop our system of analysis. This adds the requirement of our tools being as flexible as our approach and techniques.

After reviewing what we needed our data visualization to offer, we found that the JavaScript library D3.js (Bostock, 2013) was a perfect fit. D3.js (Data-Driven Documents) is a web-based JavaScript library, and requires no client-side extensions. It works seamlessly with existing web technologies, and provides the flexibility of known HTML, CSS, and SVG APIs to create real-time interactive visualizations. This inherent flexibility of D3 is further enhanced by the large amount of documentation, examples, community, and accessibility of the development team. Being web-based and backed by JavaScript, D3 is highly capable of integrating interaction with dynamic responses. With the large strides in JavaScript rendering made by recent browser technologies, D3 looks to be a leading contender in web-based security visualization.

4.3 Cyber ontology There are many efforts to standardize the expression and reporting of cyber-security-related information, such as Making Security Measurable (Kumar, 1994) by MITRE, which leads the development of several open community standards. These standards are primarily designed to support security automation and information interoperability, as well as to facilitate human security analysis across much of the Cyber-Security Information and Event Management (CSIEM) lifecycle. Some of the major security-related activities supported by the standards are: vulnerability management, intrusion detection, asset management, configuration guidance, incident management and threat analysis. Federal government organizations and security tool vendors are moving toward adoption of Security Content Automation Protocol (SCAP) validated products to ensure baseline security data and tool interoperability (Stiawan, 2012). While Making Security Measurable (MSM)-related standards are valuable for enabling security automation, their adoption is impeded by insufficient vocabulary management and data interoperability methods, and by domain complexity that exceeds current representation capabilities. To solve this problem, a few ontologies have been developed in the cyber security domain, which will enable data integration across disparate data sources (Parmelee, 2010; Obrst, 2012; Swimmer, 2008). Formally defined semantics will make it possible to execute precise searches and complex queries. Initially, this effort focused on malware, which is one of the most prevalent threats to cyber security. Ultimately, due to the complexity of the cyber security space, the ontology architecture, which is required to include all cyber security expressions, will contain a number of modular sub-ontologies.
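As a hedged illustration of how formally defined semantics support precise queries, the fragment below encodes a tiny class hierarchy as subject-predicate-object triples and answers a transitive "is a kind of" query. The class names are invented for this example and are not taken from any MITRE standard or from the ontologies cited above.

```python
# Hypothetical mini-ontology; every class name here is an illustrative assumption.
TRIPLES = {
    ("Trojan", "is_a", "Malware"),
    ("Worm", "is_a", "Malware"),
    ("Malware", "is_a", "Threat"),
    ("SynFlood", "is_a", "DenialOfService"),
    ("DenialOfService", "is_a", "Threat"),
}


def ancestors(term, triples=TRIPLES):
    """Return every class reachable from `term` by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def kinds_of(cls, triples=TRIPLES):
    """Return every term that is, directly or transitively, a kind of `cls`."""
    subjects = {subject for subject, _, _ in triples}
    return {term for term in subjects if cls in ancestors(term, triples)}


print(sorted(kinds_of("Malware")))
print(sorted(ancestors("SynFlood")))
```

A query for "Threat" finds "Trojan" even though no triple links the two directly. Keyword matching over unstructured reports cannot offer this kind of inference.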


4.4 Sandboxing Sandboxing is a technique to isolate the effects of a particular process or event on a system’s general state (Greamo, 2011). When implemented correctly, the sandboxing technique can be used to prevent otherwise harmful operations from affecting our system. Our models alone cannot determine whether unknown patterns of traffic are malicious or not. By replaying traffic in a sandboxed environment, we are able to determine whether or not a particular request to our service has caused any damage to the application. The results of the sandbox can then be used to create models for the system to use when detecting future occurrences of the same attack. Sandboxing requires knowledge of the application level, because it relies on running multiple instances of the software and on a way to keep changes contained to one particular instance. The amount of data we need to create a model of an attack depends on the number of cases we want to cover with the model, but it must include whether or not the replayed event was a successful attack. Sandboxing can be implemented in a number of ways, mainly by the program implementing a sandbox in itself, or by running an unmodified version of the program in a hypervisor (Greamo, 2011). In order to attain the greatest possible level of compatibility, we will likely choose the hypervisor model. Our sandboxing system must meet a few additional requirements, which will be left as future work for our project: 

The sandbox must be identical to the production system (though on its own copy of data/resources) so that the execution results will be the same as they were on the production system.



The sandbox cannot be exploited in itself (i.e. denial of service) by specially crafted requests.



The implementation of the sandbox must be simple enough to be formally verified.
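The replay step can be sketched with an in-process toy. It only illustrates the control flow (replay against a copy, then check for damage) and provides none of the isolation a hypervisor would give. The names `replay_in_sandbox`, `toy_handler` and `users_intact`, and the request format, are hypothetical.

```python
import copy


def replay_in_sandbox(production_state, handler, request, invariant):
    """Replay a request against a copy of the state and report whether it did damage.

    Returns True when the replay raised an exception or broke the invariant
    (treated here as a successful attack), and False otherwise.
    """
    sandbox_state = copy.deepcopy(production_state)  # production data is never modified
    try:
        handler(sandbox_state, request)
    except Exception:
        return True
    return not invariant(sandbox_state)


def toy_handler(state, request):
    """Stand-in for the protected application (illustrative only)."""
    if request["op"] == "read":
        return state["users"].get(request["user"])
    if request["op"] == "drop":
        state["users"].clear()


def users_intact(state):
    """Damage check: the user table must not be empty."""
    return len(state["users"]) > 0


production = {"users": {"alice": 1}}
print(replay_in_sandbox(production, toy_handler, {"op": "read", "user": "alice"}, users_intact))
print(replay_in_sandbox(production, toy_handler, {"op": "drop"}, users_intact))
print(production)
```

The boolean returned here is the minimum piece of information the text requires from a replay: whether or not the event was a successful attack.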

5. Results The preliminary system uses a Python script to import the packet capture (pcap) data into a Neo4j database, assigning IP addresses as properties of the nodes and packets as relationships. The relationships have the TCP flags and packet size as properties. Using a Node.js web application, we perform a Cypher query and transform the results into a JavaScript Object Notation (JSON) object, which is parsed by the Data-Driven Documents library to render the visualization. The visualization is constructed as follows. Nodes are represented as blue circles, and the size of each node is determined by the number of relationships attached to it. Relationships are individual packets, and the size of each relationship is determined by the packet size. Figure 3 shows the resulting pattern obtained from visualizing a SYN flood Distributed Denial of Service (DDoS) attack. The data was obtained from an example pcap file from Wireshark.org. The central, and largest, node is the DDoS victim and the smaller outlying nodes are the attackers. The visualization results in a distinctive structure that can be profiled and measured as evidence for analytical systems.
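The transformation from packets to a renderable graph can be sketched in plain Python as follows. This is not the import script or the Node.js application described above: it skips pcap parsing and Neo4j entirely, the packet records are made-up sample data, and the nodes/links layout is an assumption modeled on the shape commonly fed to D3 force layouts.

```python
import json
from collections import Counter

# Hypothetical packet records: (source IP, destination IP, TCP flags, size in bytes).
PACKETS = [
    ("10.0.0.2", "10.0.0.1", "SYN", 60),
    ("10.0.0.3", "10.0.0.1", "SYN", 60),
    ("10.0.0.4", "10.0.0.1", "SYN", 60),
    ("10.0.0.1", "10.0.0.2", "SYN,ACK", 58),
]


def to_graph(packets):
    """Build a nodes/links dictionary: one node per IP address, one link per packet."""
    degree = Counter()
    for src, dst, _, _ in packets:
        degree[src] += 1
        degree[dst] += 1
    ips = sorted(degree)
    index = {ip: position for position, ip in enumerate(ips)}
    # Node size follows the number of attached relationships (its degree).
    nodes = [{"ip": ip, "degree": degree[ip]} for ip in ips]
    # Link size follows the packet size; TCP flags are kept as a property.
    links = [
        {"source": index[src], "target": index[dst], "flags": flags, "size": size}
        for src, dst, flags, size in packets
    ]
    return {"nodes": nodes, "links": links}


graph = to_graph(PACKETS)
largest = max(graph["nodes"], key=lambda node: node["degree"])
print(json.dumps(graph, indent=2))
print("largest node:", largest["ip"])
```

In this sample, the node with the highest degree is the address that every other host sends SYN packets to, which corresponds to the central victim node in the star-shaped pattern.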

Figure 3: Resulting pattern obtained from visualizing a SYN flood DDoS attack


6. Conclusions Currently, a machine or a network is far easier to attack than it is to defend. The most urgent aspects to improve in the cyber security area are: 

We need to obtain automatic knowledge from data.



We need to process network traffic big data in real-time.



We need to have more intelligence in the automatic network traffic analysis.



We need to provide evidence or explanation on the decisions taken.



The solution that we provide needs to be scalable.

Although many AI techniques have been applied to cyber security, there is no evidence of an approach that includes all the previously mentioned aspects. In the area of Robotics, qualitative models have been successfully applied to automatically implement human common sense. In this paper we have defined a new architecture to start using human common sense in cyber security, which includes the use of graph databases, 3D data visualization, ontologies, and sandboxing. The first results have been obtained by visualizing a SYN flood DDoS attack.

References Axelsson, S., (1999) “Research in intrusion-detection systems: a Survey,” TR 98-17. Goteborg, Sweden: Department of Computer Engineering, Chalmers University of Technology. Bace, R. G. (2000). Intrusion detection. Sams. Batra, Shalini, and Charu Tyagi. (2012) “Comparative Analysis of Relational and Graph Databases.” International Journal of Soft Computing, Vol 2. Bostock, M. (2013) “Data Driven Documents”, [online], http://d3js.org/. Buerli, Mike, and Cal Poly San Luis Obispo. (2012) “The Current State of Graph Databases.” [online] https://wiki.csc.calpoly.edu/csc560/raw-attachment/wiki/Buerli_Bibliography/Graph_Databases_Survey.pdf Collins, M. Patrick. (2011) “Graph-based analysis in network security”, Proceedings of Military Communications Conference 2011 (MILCOM 2011), Baltimore, MD, USA, pp 1333-1337. Cudré-Mauroux, Philippe, and Sameh Elnikety. (2011) “Graph data management systems for new application domains.” International Conference on Very Large Data Bases (VLDB). “Cybersecurity”. (2013). In WhiteHouse.gov. Retrieved from http://www.whitehouse.gov/administration/eop/nsc/cybersecurity Dague, P. (1995). “Qualitative Reasoning: A Survey of Techniques and Applications”. AI communications, Vol 8, No. 3, pp 119-192. Davis, E., “Commonsense reasoning,” in Shapiro, E., (editor). Encyclopedia of Artificial Intelligence. Wiley, pp 1288-1294, 1987. “The Digital Universe Is Still Growing”. (2011). In EMC Corp., Retrieved from www.emc.com/leadership/digitaluniverse/expandingdigitaluniverse.htm. Escrig, M.T., Peris, J.C, (2005) “The use of a reasoning process to solve the almost SLAM problem at the Robocup legged league”, Catalonian Conference on Artificial Intelligence. Escrig, M.T., Toledo, F., (1998) “Qualitative Spatial Reasoning: Theory and Practice. Application to Robot Navigation,” IOS Press, Frontiers in Artificial Intelligence and Applications. Amsterdam. Ferebee, D. and Dasgupta, D. 
(2008) “Security Visualization Survey”, Proceedings of the 12th Colloquium for Information Systems Security Education, University of Texas, Dallas. Freksa, C. (1991). “Qualitative spatial reasoning”. Cognitive and Linguistic Aspects of Geographic Space, pp 361-372. Kluwer Academic Publishers, Dordrecht. Freksa, C., & Röhrig, R. (1993). “Dimensions of qualitative spatial reasoning”. Graduiertenkolleg Kognitionswiss., Univ. Hamburg. Greamo, C.; Ghosh, A., (2011) “Sandboxing and Virtualization: Modern Tools for Combating Malware,” Security & Privacy, IEEE, Vol 9, No. 2, pp 79-82. Henning, K. L. F. R. R., Reed, J. H., & Simonian, R. P. (1990). “A Neural Network Approach Towards Intrusion Detection”. In NIST (Ed.) Proceedings of the 13th National Computer Security Conference, pp 25-134. Hernandez, D. (1994). “Qualitative representation of spatial knowledge”, Vol 804. Springer. “History of the Internet”. (2013). In Wikipedia. Retrieved from http://en.wikipedia.org/wiki/History_of_the_Internet Hively, L., Sheldon, F., & Squicciarini, A. (2011). “Toward Scalable Trustworthy Computing Using the Human-Physiology-Immunity Metaphor”. Security & Privacy, IEEE, Vol 9, No. 4, pp 14-23. Iliofotou, M. (2011). Analyzing Network-Wide Interactions Using Graphs: Techniques and Applications. (Doctoral dissertation).

Teresa Escrig, Jordan Hanna, Shane Kwon Javitz, H. S., Valdes, A., & NRaD, C. (1993). “The NIDES statistical component: Description and justification”. Contract, Vol 39, No. 92-C, 0015. Kuipers, B., (1979) “Commonsense Knowledge of Space: Learning from Experience,” in Proceedings of the 6th International Joint Conference on Artificial Intelligence, pp 499-501. Los Altos, California. Morgan Kaufman. Kumar, S., & Spafford, E. H. (1994). “A pattern matching model for misuse intrusion detection.” In NIST (Ed.), Proceeding of the 17th National computer security conference. National Institute of Standards and Technology (NIST). pp 11-21. Le, D. Q., Jeong, T., Roman, H. E., & Hong, J. W. K. (2012). “Communication patterns based detection of anomalous network traffic.” In Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on, pp 185-185. Lindqvist, U., Porras, P.A., (1999) “Detecting Computing and networking misuse through the production-based expert system toolset (P-BEST),” in L. Gong & M. Reiter (eds.) Proceeding of the IEEE symposium on security and privacy, IEEE Computer Society, pp 146-161. Ludovic, M., (1998) “ASSATA: A Genetic algorithm as an Alternative tool for security Audit Trails Analysis,” RAID. Mattos-Rosa, Th., Olivo-Santin, A, Malucelli, A., (2013) “Mitigationg XLM Injection Zero-Day Attack through Strategy-based Detection system”, in IEEE Security & Privacy, Vol 11, Issue 1. Morel, B., (2011) “Artificial Intelligence a Key to the Future of CyberSecurity,” 4th ACM workshop on Security and artificial intelligence. “The Neo4j Manual.” (2013). In Neo4j. Retrieved from http://docs.neo4j.org/ Noble, Caleb C., and Diane J. Cook. (2003) "Graph-based anomaly detection." In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 631-636. Obrst, L., Chase, P., Markeloff R. (2012). “Developing an Ontology of the Cyber Security Domain”. 
In Proceedings of the Seventh International Conference on Semantic Technologies for Intelligence, Defense, and Security, Fairfax, VA, Vol 966, pp 49-56. Parmelee, M. (2010) “Toward an Ontology Architecture for Cyber- Security Standards.” George Mason University, Fairfax, V A: Semantic Technologies for Intelligence, Defense, and Security (STIDS). Peris, J.C, Escrig, M.T., (2005) “Cognitive Maps for mobile Robot Navigation: A Hybrid Representation Using Reference Systems”, 19th International Workshop on Qualitative Reasoning, Graz, Austria, pp 179-185. Graz, Austria. Pradhan, M., Pradhan, S., Sahu, S., (2012) “A Survey on Detection Methods in Intrusion Detection Systems,” International Journal of Computer Application, Issue 2, Volume 3. Robinson, I., Webber, J., Eifrem, E. (2013) Graph Databases (Early Release). O’Reilly. Shapiro, E., (1987). Encyclopedia of Artificial Intelligence. Wiley. Sheldon, F. T., & Vishik, C. (2010). “Moving toward trustworthy systems: R&D Essentials. Computer,” Vol 43, No. 9, pp 3140. Shiravi, H., Shiravi, A., and Ghorbani, A. (2012) "A Survey of Visualization Systems for Network Security", IEEE Transactions on Visualization and Computer Graphics, Vol. 18, No. 8, August. Staniford-Chen, S., Cheung, S., Crawford, R., Dilger, M., Frank, J., Hoagland, J., & Zerkle, D. (1996). “GrIDS-a graph based intrusion detection system for large networks.” In Proceedings of the 19th national information systems security conference, Vol 1, pp 361-370. Štěpánková, O. (1992). “An Introduction to Qualitative Reasoning. In Advanced Topics in Artificial Intelligence.” pp 404-418. Springer Berlin Heidelberg. Stiawan, D., Yaseen, A., Idris, M., Bakar, K., Abdullah, A. (2012) “Intrusion Prevention System: a Survey,” Journal of Theoretical and Applied Information Technology, Vol 40, No. 1, pp 44-54. Swimmer, M. (2008) “Towards An Ontology of Malware Classes.” [Online] http://www.scribd.com/doc/24058261/Towards-an-Ontology-of- Malware-Classes. Werthner, H. (1994). 
“Qualitative reasoning: modeling and the generation of behavior.” Springer. “Why Antivirus Companies Like Mine Failed to Catch Flame and Stuxnet.” (2012). In Wired. Retrieved from http://www.wired.com/threatlevel/2012/06/internet-security-fail/

Needed: A Strategic Approach to Cloud Records and Information Storage
Patricia Franks
School of Library and Information Science, San José State University, San José, CA, USA
patricia.franks@sjsu.edu

Abstract: In recent years, a number of peer-reviewed papers have been published related to the adoption of cloud storage by business, industry, and government organizations. Many of the papers describe the perceived benefits and risks of cloud storage, and some explore the technological solutions that can be implemented by the cloud storage providers. In addition to scholarly publications, because of the rapidly developing nature of cloud computing, materials presented by professional organizations and vendors also contain information that can be used to provide guidance to those developing a cloud storage strategy and selecting cloud storage solutions. This paper is exploratory in nature and based upon a review of both current scholarly literature and materials made available by vendors and professional organizations. It seeks to examine a number of issues surrounding the selection of a trusted cloud storage solution that meets the organization’s security needs, including statutory compliance, privacy and confidentiality, integrity of the data, access policies, and information governance. It also considers legal issues, such as ownership of data stored on third-party servers and policies and procedures for addressing legal hold and e-discovery requests. This paper then builds upon this research by presenting steps that can be taken to develop a cloud storage strategy before entering into an agreement to store records and information in the clouds.

Keywords: cloud storage, security, privacy and confidentiality, governance, risk, compliance

1. Introduction
Increasingly, organizations worldwide are turning to the cloud to reduce costs, increase operational efficiency, and improve their business processes. According to a Gartner survey, about 19% of organizations use the cloud for production computing, while 20% use public cloud storage services (Butler, 2013). Since cloud storage providers constitute an emergent sector, some will likely fail or be forced to change their business models, which could result in changes to their terms-of-service policies and the functionality provided. Due diligence is advised, but it is not always easy to predict future events. In the case of cloud computing, it is essential to identify the options available for outsourcing storage to the clouds and the accompanying benefits and risks. This requires an understanding of the security, compliance, and governance requirements faced by the organization.
This article describes cloud computing, including three original categories of web-hosted services as well as specializations recently added to the original list. It provides a review of peer-reviewed and non-peer-reviewed literature, including research identifying the key drivers for cloud computing. It also reviews literature citing the risks involved in utilizing cloud storage, including information from a document published by ARMA International, the professional association for records and information managers. Finally, the information gathered is used to develop a framework that can provide guidance for developing a strategy to meet the organization’s goals for secure cloud storage solutions.

2. Cloud computing context
The National Institute of Standards and Technology (NIST) describes cloud computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell and Grance, 2011). Cloud computing involves web-based hosted services that can be divided into the following three original categories:

Software as a service (SaaS): Software as a service means delivering software over the Internet, eliminating the need to install the software on the organization’s own computers. For example, a number of SaaS-hosting providers are equipped to deploy Microsoft Office as a hosted service in a secure and reliable environment.


Platform as a service (PaaS): The best-known example comes from Salesforce.com, which has been providing customer relationship management (CRM) applications since 1999. Salesforce.com offers a set of tools and application services called Force.com that Internet service vendors and corporate IT departments can use to build new and better applications for tasks such as human resource management (HRM), supply chain management (SCM), and enterprise resource planning (ERP).



Infrastructure as a service (IaaS): Infrastructure as a service is the delivery of computer infrastructure, generally virtualized platform environments, as a service. This service typically is considered a utility, like electricity and water, which is billed based on the amount of resources consumed. Amazon.com Web Services and Rackspace are two examples of this type of cloud service (Franks, 2013).

As the cloud market matures, specializations are being added to the original list of services, including Desktop-as-a-Service (DaaS), Metal-as-a-Service (MaaS), Disaster Recovery-as-a-Service (DRaaS), and Storage-as-a-Service (STaaS). Cloud adoption is growing: according to a recent survey of 1,242 IT professionals, 39% of organizations either implemented or maintained a cloud solution in 2012, up from 28% in 2011. Respondents to the survey were divided into eight industrial sectors: small business, medium business, large business, federal government, state/local government, healthcare, higher education, and K-12 education. Three of the sectors (large business, federal government, and healthcare) indicated conferencing and collaboration as their top priority. Five of the sectors (small business, medium business, state/local government, higher education, and K-12 education) indicated that the top service or application they have either migrated or are in the process of migrating to the clouds is storage (CDW, 2013). This last service, Storage-as-a-Service (STaaS), is the focus of this paper.

3. Literature review
A recent search using the term “cloud storage” in the EBSCO database returned 295 peer-reviewed articles. A number of the articles were mathematically or scientifically oriented, such as “Data Migration from Grid to Cloud Computing”. Other articles were specific to an environment, such as “Digital Storage and Archiving in Today’s Libraries”, or application specific, such as “Geopot: a Cloud-based geolocation data service for mobile applications”. Articles that apply more broadly to cloud storage were consulted in writing this article.
One of these, “A survey on secure storage in cloud computing”, written by Rajathi and Saravanan and published in the Indian Journal of Science and Technology, cites the major risks of storage in the cloud as security, data integrity, network dependency, and centralization. The authors present and analyze various cloud storage techniques for providing security, including identity-based authentication, an effective and secure storage protocol, and optimal cloud storage systems (Rajathi and Saravanan, 2013).
Another, “Privacy-preserving public auditing for secure cloud storage”, written by Wang, Chow, Wang, Ren, and Lou and published in 2013, suggests that data integrity protection is challenging because users no longer have possession of the outsourced data. Rather than place the onus of investigating the third-party provider on the user, the authors suggest public auditability for cloud storage, with a third-party auditor (TPA) checking the integrity of the outsourced data. This proposal suggests outsourcing not only the data but its integrity checking as well (Wang et al., 2013).
In addition to peer-reviewed scholarly literature, non-peer-reviewed literature from practitioners and white papers and other materials from vendors can shed light on the types of cloud services available and the results of their practical application.
One such article, “Cloud Formations: The nature of cloud-based storage and networking”, explains the origin of the cloud symbol and describes the differences between cloud services and cloud storage in simple terms. According to the author, cloud storage is a repository for bits, while cloud services are an extended set of capabilities that can include functionality such as content delivery management, transcoding, processing, and even “cloud editing” (Paulsen, 2012).
Cloud storage, also known as Storage-as-a-Service or STaaS, falls under the umbrella of cloud computing. STaaS provides backup and storage services on the Internet. Think of your own computer system with all of your important files. A hard drive crash would cause irreparable damage unless you had a backup of those files. Rather than synchronize your work to a second hard drive linked to your computer (as I had done for years), you can now back up all of your files to the equivalent of a hard drive in the clouds. Free or low-fee options suitable for individuals include Dropbox (www.dropbox.com) and Box.com (www.box.com). Enterprise licenses are also available.

3.1 Benefits of cloud storage
“Storing Information in the Cloud – A Research Project,” published in the Journal of the Society of Archivists in 2011, reported on a survey of 41 information professionals that investigated the management, operational, and technical issues surrounding the storage of information in the cloud (Ferguson-Boucher and Convery, 2011). The key drivers for cloud computing are shown in Figure 1.

Figure 1: Key drivers for cloud computing. Source: “Storing Information in the Cloud – A Research Project”, published in the Journal of the Society of Archivists, 2011.

According to the same researchers, “storing information in the cloud can range from simple storage and repository approaches for inactive records to applications that have document or even records management functionality similar to traditional in-house Electronic Document and Records Management Systems (EDRMS). Outsourcing information storage can free up internal computing resources, save cost and enable information management specialists to concentrate on the management of active, vital information” (Ferguson-Boucher and Convery, 2011). In a separate survey of 1,300 American and UK companies, 88 percent of respondents claimed that their small business saved money through employing cloud services (LaPonsie, 2013). The fact that reduced ICT spending is not shown as a benefit in Figure 1 therefore requires further analysis to determine the reasons for this wide discrepancy.
In addition to commercial storage providers, it might surprise some to realize that if they are using popular social media technology, their files are stored in the cloud as well. According to Tim Regan, Microsoft Research, “The greatest trick the cloud’s creators ever pulled was convincing the world that it doesn’t exist.” Douglas Heaven, New Scientist reporter, adds that “54% of people claim to never use cloud computing. Yet 95% of them actually do” (Heaven, 2013).


3.2 Risks of cloud storage
Cloud providers constitute an emergent sector, and some cloud providers will likely fail or be forced to change their business models, resulting in a reduction of the functionality delivered for a specific price. In 2011, one well-known storage provider discontinued its public cloud storage business. The firm said the end date for the service would be no sooner than 2013 and offered to help customers migrate to another provider or return their data. A competitor offered to migrate the data for customers to its own cloud storage network free of charge. The original company continues to offer a higher-value service that combines archiving with indexing and classification capabilities, but customers of the commodity-based service were forced to make other arrangements (Franks, 2013).
Loss of access to files can also occur without prior warning, as illustrated by the experience of Kyle Goodwin, an Ohio videographer who lost data crucial to his business when Megaupload was raided in January of 2012 by the United States Department of Justice. Goodwin, who runs a business videotaping high school sports across Ohio, kept his video archive on a personal hard drive and backed it up on Megaupload. In January of 2012, his hard drive crashed, and Megaupload’s servers were seized. He called on the assistance of the Electronic Frontier Foundation (EFF) to negotiate with the federal government for the return of his data. The EFF has asked the courts to establish a precedent for innocent third parties to have their property returned. A decision has not yet been made, but according to the federal government, Goodwin forfeited his property rights when he uploaded the files to Megaupload (Heaven, 2013).
In 2010, ARMA International published a document, Guideline for Outsourcing Records Storage to the Cloud. In addition to the immediate loss of your files if the servers on which they are stored are confiscated, and the necessity of moving your files to a new provider if the terms of service provided by the original cloud provider change, the ARMA document presents the following risks that could be mitigated before entering into an agreement with the cloud storage provider:

Accessibility – Providers may state they provide 24/7 access. Find out what they are doing to prevent access outages, e.g., mirroring of servers at different locations, alternate Internet routing for network outages, etc.



Data security – The security of the organization’s data and access to the application is completely dependent on the service provider’s policies, controls, and staff. Determine what controls are in place and if they are sufficient. For example, is there a protocol and agreement with the provider to lock down the data (initiate a legal hold) in the face of an obligation to preserve it to avoid spoliation issues and unwanted sanctions?



Data location – Sharing resources can mean that data and applications are not in a specific, physically identifiable location. Information may be stored in foreign countries and subject to those privacy and confidentiality rules. Determine if your data will be transferred across borders. If so, take steps to mitigate risks, e.g., U.S. firms can register for Safe Harbor protection to certify annually that the organization complies with privacy principles consistent with those of the European Union.



Data Segregation – Multiple organizations may share an application and a resource (such as a server). If so, it is critical to know and understand how data is segregated and protected. Commingling of data can make segregation problematic, and confidential data may be inadvertently shared with others.



Data Integrity – The storage provider should back up data and its associated applications. If a third party is used to back up the data and applications, it must meet the same standards as the cloud storage provider. If records are stored, it is necessary to destroy backup data in compliance with the retention schedule, and an audit trail is necessary to prove integrity.



Data Ownership – This question goes back to the first two examples of risk. Who owns the data under different circumstances? For example, in the event of a contract dispute, can the cloud provider hold the data hostage? What happens if the cloud storage provider goes bankrupt or is acquired by another organization? (ARMA International, 2010)
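As an illustration of the data integrity item in the list above, the following minimal Python sketch records a digest for each record before upload and later compares retrieved copies against those digests. It is a simplified stand-in chosen for illustration, not the public auditing protocol of Wang et al. and not a procedure taken from the ARMA guideline; the function names and the sample records are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(records: dict[str, bytes]) -> dict[str, str]:
    """Record a digest for each record before it leaves the organization."""
    return {name: sha256_hex(content) for name, content in records.items()}


def verify(manifest: dict[str, str], retrieved: dict[str, bytes]) -> dict[str, str]:
    """Compare retrieved copies with the manifest and report a status per record."""
    report = {}
    for name, expected in manifest.items():
        if name not in retrieved:
            report[name] = "missing"
        elif sha256_hex(retrieved[name]) != expected:
            report[name] = "altered"
        else:
            report[name] = "ok"
    return report


# Hypothetical records used only to exercise the functions.
original = {"contract.pdf": b"signed contract", "minutes.txt": b"board minutes"}
manifest = build_manifest(original)
retrieved = {"contract.pdf": b"signed contract", "minutes.txt": b"board minutes (edited)"}
print(verify(manifest, retrieved))
```

Keeping the manifest outside the provider's control is what lets the report serve as part of an audit trail; a manifest stored alongside the data could be altered together with it.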

It is easy to become overwhelmed with the volume of information available on cloud storage. Once information has been gathered, it should be used to develop a strategic approach to storing both records and information in the clouds.


4. Develop a cloud storage strategy
Because of the large volume of digital information for which organizations are responsible, there is a growing debate over the need to distinguish between records and information. The result of the debate will have an impact on an organization’s storage needs and related expenses. Organizations faced with e-discovery requests cannot distinguish between records and information. They are responsible for all of their digital data and must comply with requests to present the relevant information. However, organizations that have legally defensible records retention schedules can dispose of a great deal of their Electronically Stored Information (ESI) in a legally defensible and consistent manner before a discovery request is made. This will reduce the cost not only of storage but also of retrieval, review by legal staff, and redaction of personally identifiable information (PII) and confidential information.
A strategy can be thought of as a roadmap to take the organization where it wishes to go, in this case in relation to cloud storage. The best way to develop a strategy is to write it in a way that leads to actions that can be taken to meet the organization’s goals. Start by asking which of the many perceived benefits of cloud storage are the most important to your organization; for example, are you most interested in lowering ICT cost or in backing up data for disaster recovery and business continuity? Once you have answered that question, you can proceed with steps similar to the following:
Determine the current state of storage of digital data/information:

Examine your organization’s policies and practices regarding digital data, especially your organization’s records and information management policies and records retention schedule.



Identify your organization’s existing digital information, including type, volume, owners and current location.



Determine which data/information is suitable for storage in the clouds, what volume will be stored, and the anticipated rate of growth.
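The inventory steps above can be sketched in a few lines of Python. The sketch below assumes a retention schedule expressed in whole years per records series and a simple legal-hold flag. The series names, retention periods, and sample records are hypothetical; a real schedule would come from the organization's records and information management policies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    name: str
    series: str        # records series, e.g. "invoices"
    closed_on: date    # date on which the retention clock started
    size_gb: float
    legal_hold: bool = False


# Hypothetical retention periods (in years) per records series.
RETENTION_YEARS = {"invoices": 7, "email": 3}


def eligible_for_disposition(rec: Record, today: date) -> bool:
    """True when the retention period has elapsed and no legal hold applies."""
    if rec.legal_hold or rec.series not in RETENTION_YEARS:
        return False
    target_year = rec.closed_on.year + RETENTION_YEARS[rec.series]
    try:
        cutoff = rec.closed_on.replace(year=target_year)
    except ValueError:  # 29 February mapped onto a non-leap year
        cutoff = rec.closed_on.replace(year=target_year, day=28)
    return today >= cutoff


def volume_to_store(records: list[Record], today: date) -> float:
    """Total size of the records that still need to be stored."""
    return sum(r.size_gb for r in records if not eligible_for_disposition(r, today))


records = [
    Record("inv-2005", "invoices", date(2005, 1, 1), 10.0),
    Record("mail-2012", "email", date(2012, 6, 1), 5.0),
    Record("inv-2004-hold", "invoices", date(2004, 1, 1), 2.0, legal_hold=True),
]
today = date(2013, 10, 17)
print(volume_to_store(records, today))
```

Records under legal hold are never reported as eligible, whatever their age, which mirrors the point that disposition must happen in a legally defensible manner.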

Do your homework: 

Explore legal issues, such as jurisdiction in which data can/must be stored.



Investigate regulations that may impact the way data must be stored in your industry; for example, is your firm governed by Health Insurance Portability and Accountability Act (HIPAA) regulations?



Identify vendor issues that may pose risks, such as cost of moving to another service.



Analyze risks identified, determine their consequences and likelihood of occurring, and then determine what is acceptable based on the firm’s risk appetite.
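The risk analysis step above can be made concrete with a small scoring sketch. It assumes a common convention that is not prescribed by this paper: likelihood and consequence are each rated from 1 to 5, the score is their product, and any score above the firm's risk appetite calls for mitigation. The threshold and the sample risks are hypothetical.

```python
RISK_APPETITE = 8  # hypothetical threshold; scores above it need mitigation


def risk_score(likelihood: int, consequence: int) -> int:
    """Score a risk as likelihood x consequence, each rated 1 (low) to 5 (high)."""
    if not (1 <= likelihood <= 5 and 1 <= consequence <= 5):
        raise ValueError("ratings must be between 1 and 5")
    return likelihood * consequence


def assess(risks, appetite=RISK_APPETITE):
    """Return (name, score, decision) tuples, highest score first."""
    rated = [(name, risk_score(l, c)) for name, l, c in risks]
    rated.sort(key=lambda item: item[1], reverse=True)
    return [
        (name, score, "mitigate" if score > appetite else "accept")
        for name, score in rated
    ]


# Hypothetical risks: (name, likelihood, consequence).
risks = [
    ("vendor lock-in", 3, 4),
    ("provider outage", 2, 3),
    ("cross-border transfer", 4, 5),
]
for row in assess(risks):
    print(row)
```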

Prepare a list of criteria the cloud storage provider must satisfy and then evaluate vendors: 

Are you looking for a public, private, or hybrid cloud solution?



What data types do you wish to store (e.g., text, images, audio)?



How can your data be classified (e.g., PII, confidential, vital records)?



What type of access do you require (e.g., secure, public)?



Must users be validated? Are permissions reviewed regularly?

Gather vendor information including: 

Do the vendor’s terms of service agreements satisfy the firm’s goals and objectives? If not, can they be negotiated?



How are audits of the vendor’s systems and processes handled?



Where are the vendor’s servers located? Does the vendor outsource storage? If so, where are the third party’s servers located?



How long has the vendor been in business? In the cloud storage business?



Can the vendor provide a list of clients you can contact?



Does the vendor provide public, private or hybrid cloud solutions?


What is the vendor’s backup strategy? (medium used? Location of backed up files?)



If using a public cloud solution, are data encrypted at rest? In transit?



Can the vendor provide a disaster recovery and/or business continuity plan?



How are records management issues—such as legal holds, retention, and disposition—handled?



Is there a cost to move your data to another provider?



Are there provisions for you to download your data without cost if another firm acquires the vendor, the vendor ceases to offer the same services, or the vendor enters into bankruptcy?
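One way to turn the criteria and vendor questions above into a comparison is sketched below. It assumes that some answers are treated as mandatory (a vendor failing any of them is excluded) and that the remaining answers carry weights. The criterion names, weights, and vendors are hypothetical and would be replaced by the organization's own list.

```python
# Hypothetical criteria: a vendor must satisfy every mandatory item.
MANDATORY = {"encrypts_at_rest", "encrypts_in_transit", "supports_legal_hold"}
# Hypothetical weights for desirable but optional items.
WEIGHTS = {"free_data_export": 3, "disaster_recovery_plan": 2, "client_references": 1}


def evaluate(answers: dict[str, bool]) -> dict:
    """Check mandatory criteria and total the weights of satisfied optional ones."""
    missing = sorted(c for c in MANDATORY if not answers.get(c, False))
    score = sum(w for c, w in WEIGHTS.items() if answers.get(c, False))
    return {"qualified": not missing, "missing": missing, "score": score}


def shortlist(vendors: dict[str, dict[str, bool]]) -> list[str]:
    """Return qualified vendors, best optional score first."""
    results = {name: evaluate(ans) for name, ans in vendors.items()}
    qualified = [n for n, r in results.items() if r["qualified"]]
    return sorted(qualified, key=lambda n: results[n]["score"], reverse=True)


vendors = {
    "Vendor A": {"encrypts_at_rest": True, "encrypts_in_transit": True,
                 "supports_legal_hold": True, "free_data_export": True},
    "Vendor B": {"encrypts_at_rest": True, "encrypts_in_transit": True,
                 "free_data_export": True, "disaster_recovery_plan": True},
    "Vendor C": {"encrypts_at_rest": True, "encrypts_in_transit": True,
                 "supports_legal_hold": True, "free_data_export": True,
                 "disaster_recovery_plan": True, "client_references": True},
}
print(shortlist(vendors))
```

Separating mandatory items from weighted ones keeps a high optional score from masking a failure on a requirement such as legal hold support.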

Make a decision and monitor results: 

Select the most appropriate cloud storage provider for your firm’s needs.



Develop a method to evaluate the results of that decision.



Keep an eye on emerging trends in cloud storage and on changes in the regulatory and legal environment in which you operate to determine if changes must be made to your cloud storage strategy.

The above steps will be modified and expanded based upon the firm’s size, industry, appetite for risk, and other criteria. However, they can provide the organization with a place to start when developing its own cloud storage strategy.

5. Conclusion
A cloud storage strategy involves considering not only the benefits of cloud storage but also the risks. Organizations that have implemented an information governance program will have the structure in place to bring together all of the stakeholders to provide input into the cloud storage strategy. If there is no information governance committee in place, the decision should be made with the input of information technology, records management, business unit, and legal professionals. The cloud computing environment, as well as the legal and regulatory landscape, is evolving rapidly, so an evaluation process must be included in the strategy to ensure that the cloud storage solution selected continues to meet the organization’s needs, including statutory compliance, privacy and confidentiality, integrity of the data, data controls and access policies, and information governance.

References
ARMA International. (2010) Guideline for Outsourcing Records Storage to the Cloud. ARMA International, Overland Park, KS.
Butler, B. (2013) “Gartner: Top 10 cloud storage providers”, NetworkWorld, 3 January. http://www.networkworld.com/news/2013/010313-gartner-cloud-storage-265459.html
CDW. (2013) “Silver Linings and Surprises: CDW’s 2013 State of the Cloud Report”. http://webobjects.cdw.com/webobjects/media/pdf/CDW-2013-State-Cloud-Report.pdf
Ferguson-Boucher, K. and Convery, N. (2011) “Storing Information in the Cloud – A Research Project”, Journal of the Society of Archivists, Vol 32, No. 2, October, pp 227-228.
Franks, P. C. (2013) Records and Information Management, ALA/Neal-Schuman, Chicago, IL, p 161.
Heaven, D. (2013) “Lost in the clouds”, New Scientist, Vol 216, No. 2910, 30 March, pp 35-37.
LaPonsie, M. (2013) “Can cloud services improve small business profits?”, Small Business Computing.com, 11 March. http://smallbusinesscomputing.com/tipsforsmallbusiness/can-cloud-services-improve-small-business-profits.html
Mell, P. and Grance, T. (2011) The NIST Definition of Cloud Computing. SP800-145. National Institute of Standards and Technology, September. http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Paulsen, K. (2012) “Cloud formations: The nature of cloud-based storage and networking”, Digital Video, October.
Rajathi, A. and Saravanan, N. (2013) “A survey on secure storage in cloud computing”, Indian Journal of Science and Technology, Vol 6, No. 4, April.
Wang, C., Chow, S., Wang, Q., Ren, K. and Lou, W. (2013) “Privacy-preserving public auditing for secure cloud storage”, IEEE Transactions on Computers, Vol 62, No. 2, February, pp 362-374.

Proxy Impersonation Safe Conditional Proxy Re-Encryption
Dheeraj Gandhi, Pandu Rangan, Sharmila Deva Selvi and Sree Vivek
Indian Institute of Technology, Madras, Chennai, India
dheerajgandhi@gmail.com
prangan55@gmail.com
ssreevivek@gmail.com
sharmioshin@gmail.com

Abstract: Proxy Re-Encryption (PRE) allows a proxy to convert a ciphertext encrypted under the public key of a user A into a ciphertext encrypted under the public key of another user B, without learning the plaintext. If A wishes a message encrypted under his public key and stored in the cloud to be readable by another user B, a protocol involving A, B and the proxy is run to generate a re-encryption key. The proxy may then convert any message encrypted under the key of A into a ciphertext encrypted under the key of B. In order to prevent the proxy from converting all the encrypted messages, the notion of Conditional Proxy Re-Encryption (CPRE) was introduced in the literature. In CPRE, the user A specifies not only the target user B, but also the type of messages that the proxy is allowed to re-encrypt for B. One obvious security requirement for such a scheme is that the proxy should not be able to obtain the secret key of A or B by colluding with B or A respectively. Designing a collusion resistant CPRE is an interesting and challenging task. While the existing ID based CPRE schemes have the collusion resistance property, they lack another important security property: resistance to what we refer to as Proxy Impersonation (PI). Suppose B gets a re-encrypted ciphertext through a proxy. If this enables B to convert this encrypted message from A into another message for user C (without the involvement of the proxy or A), then B is said to impersonate the proxy. If such an impersonation is possible, it would lead to violation of distribution rights over encrypted content, specifically in the context of media content streaming and networked file storage on the cloud. We first show that the existing ID based CPRE scheme suffers from the Proxy Impersonation weakness. Then we design a novel ID based CPRE that is secure against Proxy Impersonation. We formally prove the security property in the random oracle model.

Keywords: Identity based encryption, proxy re-encryption, bilinear pairing, proxy impersonation, collusion resistant, conditional proxy re-encryption

1. Introduction
Proxy Re-Encryption (PRE) is a cryptographic primitive in which a semi-trusted proxy, acting on behalf of a delegator, re-encrypts a message encrypted under the delegator's public key into one encrypted under the public key of a delegatee. Let us consider the example of e-mail forwarding. As we can see in Fig. 1, Proxy Re-Encryption involves three participants: Alice (delegator), Bob (delegatee) and a semi-trusted proxy (third party). Suppose Alice goes on vacation and wants Bob to look after her mail in her absence without Bob learning her secret key. In this scenario, PRE facilitates translation of a ciphertext encrypted under Alice's public key into another ciphertext under Bob's public key without the proxy getting to read the plaintext.

Figure 1: Email Forwarding
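To make the e-mail forwarding example concrete, the following Python sketch shows a toy ElGamal-style bidirectional PRE in the spirit of the early bidirectional constructions. It is not the identity-based conditional scheme developed in this paper. The group parameters are tiny values chosen only for illustration and offer no security, and the function names are hypothetical.

```python
import secrets

# Toy parameters (insecure): P = 2*Q + 1 with P, Q prime; G generates the order-Q subgroup.
P = 2039
Q = 1019
G = 4


def keygen():
    """Return (secret key, public key) with the secret key in [1, Q-1]."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    """Encrypt m (an integer in [1, P-1]) as (m * g^r, pk^r)."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def rekey(sk_a, sk_b):
    """Re-encryption key rk = b / a mod Q (requires both secret keys as input)."""
    return (sk_b * pow(sk_a, -1, Q)) % Q


def reencrypt(rk, ct):
    """Proxy step: turn g^(a*r) into g^(b*r) without touching the plaintext."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def decrypt(sk, ct):
    """Recover g^r from the second component, then divide it out of the first."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)
    return (c1 * pow(g_r, -1, P)) % P


alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()
mail = 1234                                  # message encoded as a group element
ct_alice = encrypt(alice_pk, mail)           # sender encrypts for Alice
ct_bob = reencrypt(rekey(alice_sk, bob_sk), ct_alice)  # proxy translates for Bob
print(decrypt(bob_sk, ct_bob))
```

The proxy only exponentiates the second ciphertext component, so it never sees the message; Bob decrypts the translated ciphertext with his own secret key.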

Conditional Proxy Re-Encryption (CPRE) is a derivative of PRE in which a proxy is given a re-encryption key tied to a particular condition 'c'; it re-encrypts ciphertexts encrypted under an identity and condition 'c' into ciphertexts under a different identity, retaining the same plaintext. During re-encryption the proxy is not allowed to see the plaintext. Considering the example in Fig. 2, the delegator provides the proxy with three re-encryption keys, one for each of the conditions A, B and C. The proxy re-encrypts incoming ciphertexts with condition 'A' into ciphertexts under the identity of delegatee A. Similarly, incoming ciphertexts with conditions 'B' and 'C' are re-encrypted by the proxy into ciphertexts under the identities of delegatees B and C respectively. Here, the delegatees get only those messages which are relevant to them, and the privacy of the delegator is maintained, since not all of the delegator's messages are delegated.

Figure 2: CPRE

PRE schemes are generally vulnerable to attacks at various levels that may lead to leakage of secret data or keys. Specifically, Identity-Based PRE (IBPRE) schemes involve a trusted third party called the Private Key Generator (PKG), which can generate the secret key for any identity, since it holds the master secret key. Further, the proxy holds the re-encryption key, which is derived from the delegator's secret key and the delegatee's public key. If the proxy and the delegatee collude and share their information, they may be able to compute either the secret key of the delegator or the re-encryption key for another delegatee, which is not desired by the delegator. These types of attacks are broadly classified as collusion attacks: the proxy contributes the components of the re-encryption key and the delegatee contributes the components of its secret key in order to compute the secret key of the delegator. It is desirable for any PRE scheme to be Collusion Resistant (CR). Here, we discuss another important security requirement, which we refer to as resistance to Proxy Impersonation. Suppose B receives a re-encrypted ciphertext through a proxy. If this enables B to convert this re-encrypted message into another message for user C (without the involvement of the proxy or A), then B is said to impersonate the proxy. We shall demonstrate this weakness in subsequent sections.
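A collusion attack is easy to see in a toy ElGamal-style bidirectional PRE, where the re-encryption key from a delegator with secret key a to a delegatee with secret key b is rk = b / a mod q. The sketch below is an illustration under that assumption only; it does not describe the identity-based schemes analysed in this paper, and the group order and key values are hypothetical.

```python
Q = 1019  # hypothetical small prime order of the group (insecure, illustration only)


def rekey(sk_a: int, sk_b: int) -> int:
    """Re-encryption key of the toy scheme: rk = b / a mod Q."""
    return (sk_b * pow(sk_a, -1, Q)) % Q


def colluders_recover_delegator_key(rk: int, sk_b: int) -> int:
    """Proxy (holding rk) and delegatee (holding b) compute a = b / rk mod Q."""
    return (sk_b * pow(rk, -1, Q)) % Q


sk_alice = 123   # delegator's secret key (hypothetical value)
sk_bob = 456     # delegatee's secret key (hypothetical value)
rk = rekey(sk_alice, sk_bob)
print(colluders_recover_delegator_key(rk, sk_bob) == sk_alice)
```

A collusion resistant scheme is designed so that no such computation on the re-encryption key and the delegatee's secret key yields the delegator's secret key.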

2. Related Work

Mambo and Okamoto proposed a scheme for delegating decryption rights in (Okamoto, 1997). The first PRE scheme was presented by (M. Blaze, 1998) and it was bidirectional in nature. The first unidirectional proxy re-encryption scheme, based on bilinear pairings, was presented by (G. Ateniese K. F., 2005) and (G. Ateniese K. F., 2006). (Hohenberger, 2007) proposed a construction of a CCA-secure bidirectional PRE scheme using bilinear pairings. This marked the advent of a large body of research in the field of proxy re-encryption. Subsequently, (Verged, 2008) presented a replayable CCA-secure (RCCA) unidirectional PRE scheme from bilinear pairings. RCCA is a more relaxed security definition than CCA security. In CANS 2008, (R.H. Deng, 2008) proposed a CCA-secure bidirectional PRE scheme without pairings, secure in the random oracle model. Later, (J. Weng M. C., 2010) proposed an efficient CCA-secure unidirectional proxy re-encryption scheme in the standard model. At Pairing 2008, (B Libert, 2008) proposed the idea of traceable proxy re-encryption, where malicious proxies colluding and sharing their re-encryption keys can be identified. In Asia CCS'09, (Jian Weng, 2009) proposed conditional proxy re-encryption, which regulates the proxy at a fine-grained level so that only selected ciphertexts can be re-encrypted from delegator to delegatee, as per the condition defined by the delegator. One of the first identity-based proxy re-encryption schemes was proposed by (Matsuo, 2007). Based on Boneh and Franklin's identity-based encryption system (Boneh D. a., 2001), (Ateniese, 2007) presented CPA and CCA-secure identity-based PRE schemes in the random oracle model. Later, (Chu, 2007) presented constructions of CPA and CCA-secure identity-based PRE schemes without random oracles. (Jian Weng, 2009) proposed a CCA-secure CPRE scheme in the PKI setting. Later, (Vivek, 2011) proposed a more efficient CPRE scheme with reduced pairings. At IEEE ICC 2011, (Jun Shao, 2011) proposed a CCA-secure CPRE scheme based on bilinear pairings and proved it secure in the random oracle model.

3. Definitions

In this section we briefly discuss the preliminary concepts required, including bilinear pairings, the associated hard problems, and the security and correctness requirements.

3.1 Bilinear Maps

Let $G_1$ and $G_2$ be two cyclic multiplicative groups of order $q$ for some large prime number $q$. A bilinear map $e : G_1 \times G_1 \rightarrow G_2$ between these two groups must satisfy the following three properties:
• Bilinear: A map $e$ is bilinear iff $e(g^a, h^b) = e(g, h)^{ab}$ for all $g, h \in G_1$ and all $a, b \in \mathbb{Z}_q^*$.
• Non-degenerate: The map does not send all pairs in $G_1 \times G_1$ to the identity in $G_2$.
• Computable: There is an efficient algorithm to compute $e(g, h)$ for any $g, h \in G_1$.
A bilinear map satisfying the three properties above is said to be an admissible bilinear map. $G_1$ is said to be a bilinear group if the group operation in $G_1$ and the bilinear map $e$ are both efficiently computable.
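The bilinearity and non-degeneracy properties can be checked numerically on a deliberately insecure toy map. In this sketch, which is an illustration and not a real pairing, an element $g^x$ of the source group is represented directly by its exponent $x$, and the target group is the order-11 subgroup of $\mathbb{Z}_{23}^*$. All names and parameters are assumptions chosen for the example.

```python
# Toy parameters (far too small for any security): Q = 11 divides P - 1 = 22.
P = 23    # modulus of the target group
Q = 11    # prime order of both groups
G_T = 2   # generator of the order-11 subgroup of Z_23^*


def pairing(x, y):
    """Toy map e(g^x, g^y) = G_T^(x*y) mod P; source elements are given by exponents."""
    return pow(G_T, (x * y) % Q, P)


def check_bilinear():
    # e(u^a, v^b) == e(u, v)^(a*b) for every u = g^x, v = g^y and all a, b in Z_q^*
    for x in range(Q):
        for y in range(Q):
            for a in range(1, Q):
                for b in range(1, Q):
                    lhs = pairing((x * a) % Q, (y * b) % Q)
                    rhs = pow(pairing(x, y), a * b, P)
                    if lhs != rhs:
                        return False
    return True


def check_non_degenerate():
    # e(g, g) must not be the identity of the target group
    return pairing(1, 1) != 1
```

Because the exponents are exposed, discrete logarithms are trivial here; the sketch only demonstrates the algebraic identities, not any hardness property.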

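3.2 Computational Bilinear Diffie Hellman (CBDH) Problem

Let $G_1$ be a cyclic multiplicative group of order $q$ for some large prime number $q$, and let $e : G_1 \times G_1 \rightarrow G_2$ be a bilinear map. Define $e(g, g)^{abc} := T$, where $g \in G_1$ and $a, b, c \in \mathbb{Z}_q^*$. The computational bilinear Diffie-Hellman problem is to compute the value $T$ given $(g, g^a, g^b, g^c)$ for random $a, b, c$. The CBDH assumption asserts that the problem is hard for all PPT algorithms $\mathcal{A}$, i.e. $\Pr[\mathcal{A}(g, g^a, g^b, g^c) = e(g, g)^{abc}] \le \epsilon$ for a negligible $\epsilon$.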

3.3 PI safe CPRE

The proposed CPRE scheme is a tuple of algorithms (Setup, KeyGen, Encrypt, ReKeyGen, ReEncrypt, Decrypt1, Decrypt2).

Setup($\lambda$). This algorithm takes as input the security parameter $\lambda$ and outputs the master public parameters, i.e. the message space and the ciphertext space, the master public key, which is distributed to the users, and the master secret key $msk$, which is kept secret by the PKG.

KeyGen($msk$, $id$). This algorithm is executed by the PKG. A user submits his identity $id$ as input. The PKG uses the master secret key $msk$ to compute a secret key $sk_{id}$ corresponding to the identifier $id$. This secret key is transferred to the user over a secure channel.

Encrypt($params$, $id$, $M$, $c$). In this algorithm, a user computes the ciphertext corresponding to a plaintext message $M$ and condition $c$. The inputs are the identity $id$, the plaintext message $M$ and the condition $c$, and the output is the first level conditional ciphertext $C$.

ReKeyGen($params$, $sk_{id}$, $id'$, $c$). This algorithm is executed by the delegator in order to compute the re-encryption key. The inputs are the secret key corresponding to the identity $id$ (delegator), the condition $c$ and the identity $id'$ (delegatee). The output is the re-encryption key from $id$ to $id'$, which is used by the proxy (third party) for re-encryption of the first level conditional ciphertext $C$.

ReEncrypt($params$, $C$, re-encryption key, $c$). Executed by the proxy, this algorithm takes as input the re-encryption key, the condition $c$ and the first level ciphertext $C$ under identity $id$, and outputs the second level conditional ciphertext $C'$ encrypted under the identity of the delegatee.

Decrypt1($params$, $C$, $sk_{id}$, $c$). This algorithm is executed by the user with identifier $id$ (delegator) to decrypt the first level conditional ciphertext $C$, encrypted under his identity and condition $c$, using his secret key. The algorithm outputs the plaintext message $M$.

Decrypt2($params$, $C'$, $sk_{id'}$, $c$). Executed by the user with identifier $id'$ (delegatee), this algorithm decrypts any re-encrypted conditional ciphertext $C'$ under the identifier $id'$ and outputs the plaintext message $M$.
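The seven algorithms can be written down as an abstract interface, which makes explicit who runs each one and what flows in and out. This is only a sketch: the class name, method names, and parameter names are hypothetical and are not taken from the paper, which specifies the algorithms mathematically.

```python
from abc import ABC, abstractmethod


class ConditionalPRE(ABC):
    """Hypothetical interface mirroring the seven CPRE algorithms."""

    @abstractmethod
    def setup(self, security_parameter):
        """PKG: return (params, msk)."""

    @abstractmethod
    def keygen(self, msk, identity):
        """PKG: return the secret key for `identity`."""

    @abstractmethod
    def encrypt(self, params, identity, message, condition):
        """Sender: return a first level conditional ciphertext."""

    @abstractmethod
    def rekeygen(self, params, sk_delegator, delegatee_identity, condition):
        """Delegator: return a re-encryption key bound to `condition`."""

    @abstractmethod
    def reencrypt(self, params, ciphertext, rekey, condition):
        """Proxy: return a second level ciphertext for the delegatee."""

    @abstractmethod
    def decrypt1(self, params, ciphertext, sk_delegator, condition):
        """Delegator: return the plaintext of a first level ciphertext."""

    @abstractmethod
    def decrypt2(self, params, ciphertext, sk_delegatee, condition):
        """Delegatee: return the plaintext of a re-encrypted ciphertext."""


try:
    ConditionalPRE()
    instantiable = True
except TypeError:
    instantiable = False
```

A concrete scheme would subclass this interface; the abstract base itself cannot be instantiated.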

3.4 Correctness

A PRE scheme is correct if decrypting, with the Decrypt1 algorithm, a ciphertext C produced by the Encrypt algorithm outputs the original message M. Further, if this ciphertext is re-encrypted by the ReEncrypt algorithm using the re-encryption key generated by the ReKeyGen algorithm, and the re-encrypted ciphertext C' is decrypted by the Decrypt2 algorithm, we must still obtain the original message M. We formally prove the correctness of the scheme in subsequent sections.
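The two correctness conditions can be exercised on a much simpler scheme than the one proposed in this paper. The sketch below uses the ElGamal-style bidirectional PRE of Blaze, Bleumer and Strauss cited in the related work. It is not identity-based and has no conditions, its re-encryption key needs both secret keys, and the parameters are toy-sized and insecure. Function names are illustrative.

```python
import secrets

P = 2039   # safe prime, P = 2*Q + 1 (toy size, not secure)
Q = 1019   # prime order of the subgroup of squares mod P
G = 4      # generator of that subgroup


def keygen():
    sk = secrets.randbelow(Q - 1) + 1               # sk in [1, Q-1]
    return sk, pow(G, sk, P)


def encrypt(pk, m):
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)    # (m * g^r, g^(a*r))


def rekeygen(sk_a, sk_b):
    return (sk_b * pow(sk_a, -1, Q)) % Q            # b / a mod Q


def reencrypt(rk, ct):
    c1, c2 = ct
    return c1, pow(c2, rk, P)                       # g^(a*r) becomes g^(b*r)


def decrypt(sk, ct):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)                # recover g^r
    return (c1 * pow(g_r, -1, P)) % P


def check_correctness(m):
    sk_a, pk_a = keygen()
    sk_b, _pk_b = keygen()
    ct = encrypt(pk_a, m)
    first_level_ok = decrypt(sk_a, ct) == m
    second_level_ok = decrypt(sk_b, reencrypt(rekeygen(sk_a, sk_b), ct)) == m
    return first_level_ok, second_level_ok
```

Calling `check_correctness(49)` runs both conditions: direct decryption by the delegator, and decryption by the delegatee after re-encryption. The modular inverses via `pow(x, -1, n)` require Python 3.8 or later.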

3.5 Security notion

Security of our ID based CPRE scheme is defined as per the following IND-CPRE-CCA game.

• Setup: The challenger runs the Setup algorithm and hands over the public parameters to the adversary $\mathcal{A}$.

• Phase 1: Upon receiving the public parameters, $\mathcal{A}$ makes the following queries:
 KeyGen($id$): $\mathcal{A}$ submits an identifier $id$ and the challenger returns the secret key $sk_{id}$ by executing the KeyGen query.
 ReKeyGen($id$, $id'$, $c$): The challenger accepts from the adversary an identifier pair ($id$, $id'$), executes the ReKeyGen query and returns the re-encryption key.
 ReEncrypt($params$, $C$, $id$, $id'$, $c$): For a given ciphertext $C$, submitted by $\mathcal{A}$ and encrypted under $id$ and condition $c$, the challenger runs the ReEncrypt query and returns the re-encrypted conditional ciphertext encrypted under $id'$.
 Decrypt($id$, $C$, $c$): The challenger accepts from the adversary a ciphertext $C$, encrypted under the identifier $id$ and condition $c$, executes the Decrypt query and returns the corresponding plaintext $m$.

• Challenge: $\mathcal{A}$ submits an identifier $id^*$ and two equal length messages $(m_0, m_1)$ as the challenge. The challenger confirms that the adversary is playing fair by checking that the following queries have not been made:
 Extract($id^*$)
 {RKExtract($id^*$, $id'$, $c$) and Extract($id'$)} for any identifier $id'$
Once confirmed, the challenger picks a random $b \in \{0, 1\}$ and passes the ciphertext $C^*$ of $m_b$ to the adversary.

• Phase 2: Phase 1 is revisited, with the constraint that $\mathcal{A}$ cannot make the following queries:
 Extract($id^*$);
 ReKeyGen($id^*$, $id'$, $c$) and Extract($id'$) for an identifier $id'$;
 ReKeyGen($id^*$, $id'$, $c$) and Decrypt($id'$, $C'$, $c$) for an identifier $id'$ and ciphertext $C'$;
 ReEncrypt($C^*$, $id^*$, $id'$, $c$) and Extract($id'$) for an identifier $id'$;
 Decrypt($id^*$, $C^*$, $c$);
 Decrypt($id'$, $C'$, $c$) for an identifier $id'$ and $C'$ = ReEncrypt($C^*$, $id^*$, $id'$, $c$).

• Guess: $\mathcal{A}$ outputs its guess $b'$.

The advantage of $\mathcal{A}$ can be denoted as $Adv_{\mathcal{A}} = |\Pr[b' = b] - \frac{1}{2}|$. The scheme is said to be IND-CPRE-CCA secure if no probabilistic polynomial time adversary has a non-negligible advantage in the above IND-CPRE-CCA game.
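The advantage in an indistinguishability game of this shape, the distance of the adversary's success probability from one half, can be estimated empirically. The harness below is a heavily simplified sketch: it models only the hidden coin and the guess, omits all oracles and query restrictions, and uses a dictionary as a stand-in for the challenge ciphertext. All names are hypothetical.

```python
import random


def play_ind_game(adversary_guess, trials, seed=0):
    """Return the empirical advantage |Pr[b' = b] - 1/2| over `trials` runs."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        b = rng.randrange(2)           # challenger's hidden coin
        challenge = {"leak": b}        # stand-in for the challenge ciphertext
        if adversary_guess(challenge, rng) == b:
            wins += 1
    return abs(wins / trials - 0.5)


def blind_adversary(challenge, rng):
    return rng.randrange(2)            # ignores the challenge entirely


def leaking_adversary(challenge, rng):
    return challenge["leak"]           # models a completely broken scheme
```

An adversary that learns nothing from the challenge ends up with an advantage close to zero, while one that can read the coin from the challenge attains the maximum advantage of one half.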

4. Demonstration of Proxy Impersonation

In this section we outline a critical security weakness that remains unaddressed in a host of PRE and CPRE schemes, namely 'Proxy Impersonation'. We explain the weakness with the following scenario (refer to Fig. 3 and Fig. 4). An online music store keeps a large number of encrypted music files in its central database. Whenever a user logs a purchase request, the online store computes a re-encryption key and hands it over to the proxy, which re-encrypts the requested file under the identity of the customer and sends the re-encrypted file to the customer. The customer can play this re-encrypted file on his device, since it is encrypted under his device's identity. However, the system prevents the user from extracting the plaintext file from the received re-encrypted file. Under a scheme that is not Proxy Impersonation safe, the customer can further re-encrypt this file under the identity of any arbitrary users and forward it to them, thus violating copyright laws. This illegitimate re-encryption takes place without the knowledge of either the Online Store Manager or the Proxy.

Figure 3: Online Music Request

Figure 4: Impersonation

To formally demonstrate the property, we consider the scheme in (Jun Shao, 2011). We argue that a delegatee is able to re-encrypt a ciphertext, originally re-encrypted under its identity, into another ciphertext under the identity of another entity without:
• Decrypting the ciphertext.
• Colluding with the proxy.
• Colluding with further delegatees.

The Re-encrypted ciphertext •

in scheme (Jun Shao, 2011) has the following five components:

• • • • where Computing we write

and as follows :as

Since the delegatee knows the components of the received ciphertext and his own secret key, he can solve the above equation for the remaining unknown value and subsequently compute the value derived from it. With this knowledge, the delegatee computes another re-encryption of the re-encrypted ciphertext under another identity. The new recipient uses his secret key to decrypt it and recover the original message. We show the correctness of the re-re-encrypted ciphertext by using the Decrypt algorithm from the same scheme:

Compute

•

Compute

•

Compute

• • • •

Compute Compute Check if If yes, output,

by an

operation between

and

and as the message, else output .

Since the above conditions are satisfied, the delegatee has successfully impersonated the proxy and has re-encrypted the re-encrypted ciphertext without the knowledge of the delegator or the original proxy.

5. Proxy Impersonation safe CPRE

As demonstrated in the previous sections, resistance to Proxy Impersonation is a strongly desirable security property for any PRE scheme. This applies specifically to those schemes where the re-encrypted ciphertext contains components of the re-encryption key that involve a hash function on some random variables. In what follows, we propose a CPRE scheme that is motivated by the PRE schemes in (Ateniese, 2007) and (Woo Kwon Koo, 2012) and provides resistance against Proxy Impersonation. The scheme has the following algorithms:

Setup(λ) This algorithm takes the security parameter λ as input. It selects a generator and at random and then computes . The public parameters and master secret key   

are defined as : and

  •

KeyGen ($msk$, $id$) Given an identity $id$ and the master secret key, the algorithm returns the corresponding secret key $sk_{id}$.

•

Encrypt (params, id, M, c) Given an identity id, a plaintext message M from the message space, and a condition c, the algorithm selects a random value and computes the ciphertext components from it. The ciphertext is computed as follows:


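• ReKeyGen (params, $sk_{id}$, $id'$, c) The delegator chooses a random value satisfying the constraint required by the scheme, and then the re-encryption key is generated from it.

• ReEncrypt (params, conditional ciphertext, re-encryption key, c) The proxy executes this algorithm. It gets as input the conditional ciphertext, the re-encryption key and the condition c, and computes the re-encrypted ciphertext as follows:
 Parse the conditional ciphertext into its components.
 Parse the re-encryption key into its components.
 Compute the verification value and check the validity condition on the ciphertext.
 If the above condition is true, compute the re-encrypted component.
 Return the result as the re-encrypted conditional ciphertext under the identity of the delegatee.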

Decrypt :- The Decrypt algorithm has two variants: One for decrypting the first level conditional ciphertext and another for decrypting the second level, re-encrypted ciphertext.  Decrypt1 (params, ) Given the first level ciphertext condition c, the algorithm computes the plaintext as follows :Parse Parse

    

and

Compute Compute Select a random

and compute the following :.



, secret key

o o Check the following conditions for correctness and validity of ciphertext :o o



o If all the above conditions hold good, then output

as the plaintext, else return .

 Decrypt2 (params, ) Given the second level ciphertext condition c, the algorithm computes the plaintext as follows : Parse as

, secret key



Parse



Compute



Compute



Compute

 

Compute Check the following conditions for correctness and validity of ciphertext :o

and




o If all the above conditions hold good, then output

as the plaintext, else return .

5.1 Correctness Formally, let ciphertext. Then, and = ReKeyGen (params, skid , • Decrypt1 (params, • Decrypt2(params,

be a properly formed ) and KeyGen (msk, )

KeyGen (msk, ), the following holds good: )=

We shall first show the correctness of the first level conditional ciphertext . Let be the first level ciphertext of message and 

For a random

and

under identity

, with

let us compute

and by properties of bilinear pairing . .

  

Compute Compute



Since all conditions for Decrypt are satisfied, the algorithm returns

For the second level ciphertext

, under

 

Applying Decrypt2(params, Compute



Compute

, parse it as

, which is equal to and secret key

) and



    

Substituting the values of Compute

and by properties of bilinear pairing,

 

Checking the conditions for Decrypt2, Since the above conditions are satisfied, the algorithm outputs

Compute which is equal to

5.2 Security Proof

The proposed PI safe CPRE scheme is IND-CPRE-CCA secure in the random oracle model. This follows from Theorem 1.

Theorem 1: The proposed CPRE scheme is secure in the random oracle model, assuming the CBDH problem is intractable in group $G_1$. Formally, if there exists a type 1 adversary $\mathcal{A}$ who makes at most $q_{H_i}$ random oracle queries to $H_i$ and breaks the $(t, \epsilon)$-IND-CPRE-CCA security of our scheme, then there exists an algorithm $\mathcal{B}$ which can break the CBDH assumption in $G_1$ with advantage $\epsilon'$ in time $t'$.

Proof: If there exists a $t$-time adversary $\mathcal{A}$ that can break the IND-CPRE-CCA security of our scheme with advantage $\epsilon$, then we demonstrate how to construct an algorithm $\mathcal{B}$ that can break the $(t', \epsilon')$-CBDH assumption in $G_1$. Suppose $\mathcal{B}$ is given as input a challenge tuple $(g, g^a, g^b, g^c)$, with unknown $a, b, c$. The goal of $\mathcal{B}$ is to output $e(g, g)^{abc}$. Algorithm $\mathcal{B}$ first gives the public parameters to $\mathcal{A}$. Next, $\mathcal{B}$ acts as a challenger and plays the IND-CPRE-CCA game with adversary $\mathcal{A}$ in the following manner:

• Oracle Queries: $\mathcal{B}$ simulates the random oracle queries as follows. For each hash oracle, $\mathcal{B}$ maintains a list of tuples. On receipt of a new query, $\mathcal{B}$ looks up the corresponding list for a matching tuple and returns the stored value if one is available. Otherwise, $\mathcal{B}$ picks a random value (or sets the value as the case analysis of the simulation requires), stores the new tuple in the list and returns the value.
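The list-based bookkeeping used to simulate a random oracle (return the stored answer if the query was seen before, otherwise sample a fresh value and record it) can be sketched as follows. The class and method names are hypothetical, and the `program` method illustrates the common proof technique of fixing an answer in advance; the exact values the simulator programs in this proof are not reproduced here.

```python
import secrets


class LazyRandomOracle:
    """Answers each distinct query with a fresh random value, remembered for repeats."""

    def __init__(self, output_bits=128):
        self.output_bits = output_bits
        self.table = {}   # the "list" of (query, answer) tuples

    def query(self, x):
        if x not in self.table:
            # First time this input is seen: sample and record the answer.
            self.table[x] = secrets.randbits(self.output_bits)
        return self.table[x]

    def program(self, x, value):
        """Fix the answer for x in advance; fails if x was already answered."""
        if x in self.table:
            raise ValueError("query already answered")
        self.table[x] = value


oracle = LazyRandomOracle()
first = oracle.query(b"id1")
second = oracle.query(b"id1")
oracle.program(b"target", 7)
programmed = oracle.query(b"target")
```

Repeated queries are answered consistently, which is what lets the simulated oracle look like a fixed function to the adversary.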

The simulation proceeds as follows:

• Setup phase: The challenger sets the public parameters so that they embed a value the challenger does not know, and fixes the total number of allowed queries and the length of the message. The challenger then provides the public parameters params to the adversary.

• Phase 1: The adversary makes the following queries in phase 1.

 KeyGen. The adversary submits an identity. If the identity triggers the abort condition, abort the simulation, else:




Retrieve

Output list.

from the

list corresponding to

as secret key for

and compute

in K

and store the tuple

ReKeyGen. o submits the re-encryption key query for the tuple

. If

ReKeyGen algorithm from the scheme and output else if o

Select

o o o

Retrieve Set Select

Compute

Output

from the

, run the actual corresponding to

:list corresponding to

and .

and

store

the

tuple

in the R list.  Re-Encrypt. o submits the first level conditional ciphertext

, re-encryption key

, identities

and condition . o

Challenger retrieves the re-encryption key corresponding to identities from the identity

list and computes the second level re-encrypted ciphertext as

 Decrypt1. Adversary decryption. o Parse

(

) andreturns to the adversary

submits the first level conditional ciphertext

and condition for

from K list.

, retrieve

Compute Decrypt1(

Return

o o

Else, if : Check the validity of input ciphertext

Retrieve

Compute

o o o

Retrieve from list corresponding to . Compute Return as the plaintext corresponding to

as the plaintext corresponding to

corresponding to

under the

 Decrypt2. o Parse

and condition

and .

. ,

from the

list.

................................................... (1)

and to


Search tuples

     If yes, return •

, , , , , such that following conditions hold good :-

, otherwise return

Challenge.  After the completion of Phase 1, outputs two equal length messages condition . Challenger selects an identity and responds as follows : Retrieve the tuple from the K list.  Compute    Select a , Compute and into Select a



and a

store

the

tuple

list.

,Compute

Return

 •

and store the tuple

into list. as the challenge ciphertext.

• Phase 2. The adversary continues to issue queries as in phase 1, subject to the restrictions prescribed in the IND-CPRE-CCA game that prevent trivial decryption of the challenge ciphertext. The challenger responds to all other queries as in phase 1.

• Guess. After the completion of phase 2, the adversary returns a guess to the challenger. The challenger picks a random tuple from the oracle list and outputs the corresponding value as its answer to the CBDH problem.

6. Applications

Conditional Proxy Re-encryption is a fast evolving cryptographic technique with numerous applications, such as secure email forwarding, spam filtering, and several cloud related applications like Distribution Rights Management and Role Based Access Control in networked file storage on the cloud.

7. Conclusion

In this work, we introduced a new construction of an efficient and collusion resistant CPRE scheme that is CCA secure in the random oracle model. It further strengthens the notion of security of an ID based PRE scheme by introducing another security property, 'Proxy Impersonation' resistance. It would be interesting to find an efficient construction of an ID based CPRE scheme that also provides security against Transferability attacks, and further to develop an ID based CPRE scheme secure in the standard model.

References

Ateniese, M. G. (2007). Identity-Based Proxy Re-encryption. ACNS 2007, (pp. 288-306).
B Libert, D. V. (2008). Tracing malicious proxies in proxy re-encryption. Pairing-Based Cryptography - Pairing 2008 (pp. 332-353). Springer.
Boneh, D. a. (2011). Efficient selective identity-based encryption without random oracles. Journal of Cryptology, 659-693.
Boneh, D. a. (2001). Identity-based encryption from the Weil pairing. Advances in Cryptology - CRYPTO 2001, (pp. 213-229).
Chu, C.-K. a.-G. (2007). Identity-based proxy re-encryption without random oracles. In Information Security (pp. 189-202). Springer.
G. Ateniese, K. F. (2006). Improved proxy re-encryption schemes with applications to secure distributed storage. ACM Transactions on Information and System Security (TISSEC), Vol.9, No.1, (pp. 1-30).
G. Ateniese, K. F. (2005). Improved proxy re-encryption schemes with applications to secure distributed storage. Proc. of NDSS 2005, San Diego, California, (pp. 29-43).
Hohenberger, R. C. (2007). Chosen-ciphertext secure proxy re-encryption. Proc. of ACM CCS 2007, ACM Press, Alexandria, VA, USA, (pp. 185-194).
J. Weng, M. C. (2010). CCA-secure unidirectional proxy re-encryption in the adaptive corruption model without random oracles. Science China: Information Science, Vol.53, No.3, (pp. 593-606).
J. Weng, Y. Y. (2009). Efficient conditional proxy re-encryption with chosen-ciphertext security. Proc. of ISC'09, Springer-Verlag, LNCS 5735, Pisa, Italy, (pp. 151-166).
Jian Weng, R. H.-K. (2009). Conditional proxy re-encryption secure against chosen-ciphertext attack. ASIACCS 2009, (pp. 322-332).
Jun Shao, G. W. (2011). Identity-Based Conditional Proxy Re-Encryption. In Proceedings of IEEE International Conference on Communications 2011 (pp. 1-5).
M. Blaze, G. B. (1998). Divertible protocols and atomic proxy cryptography. Proc. of Eurocrypt'98, Springer-Verlag, LNCS 1403, (pp. 127-144).
Matsuo, T. (2007). Proxy Re-encryption Systems for Identity-based Encryption. IACR Cryptology ePrint Archive 2007, (p. 361).
Okamoto, M. M. (1997). Proxy cryptosystems: Delegation of the power to decrypt ciphertexts. Electronics Communications and Computer Science, Vol.E80-A, No.1, (pp. 54-63).
R.H. Deng, J. W. (2008). Chosen-ciphertext secure proxy re-encryption without pairings. Proc. of CANS'08, Springer-Verlag, LNCS 5339, Hong Kong, China, (pp. 1-17).
Verged, B. L. (2008). Unidirectional chosen-ciphertext secure proxy re-encryption. Proc. of PKC'08, Springer-Verlag, LNCS 4929, Barcelona, Spain, (pp. 360-379).
Vivek, S. a. (2011). Conditional Proxy Re-Encryption - A More Efficient Construction. Advances in Network Security and Applications, Vol. 196, (pp. 502-512).
Woo Kwon Koo, J. Y. (2012). Collusion-Resistant Identity-Based Proxy Re-encryption.

Clearing the Air in Cloud Computing: Advancing the Development of a Global Legal Framework
Virginia Greiman and Barry Unger
Boston University, Boston, USA
ggreiman@bu.edu
unger@bu.edu

Abstract: Cloud Computing Management has introduced a whole new meaning to “globalism.” Control over national security, criminal conduct, medical records, trade, intellectual property, privacy and a host of other important rights and responsibilities is governed by a paradigm that is conducted in the internet “cloud.” Cloud computing depends upon a level of inter-connectedness that crosses legal borders, yet has no boundaries. Cloud-computing regulations in one jurisdiction may have no application in another jurisdiction. However, important laws such as the USA PATRIOT Act, the General Agreement on Trade in Services (GATS), and the European Union privacy laws are applicable regardless of the location of the cloud service provider. A global legal framework that balances national and private interests would enhance confidence and improve legal certainty in the global electronic marketplace. This article examines the existing legal and ethical concepts and doctrines for cyber law and organizes a conceptual legal structure for cloud computing management. Based on empirical research, international legal theory, and an analysis of international and national legal regimes and case decisions, this article explores the advantages and the challenges of establishing a global legal framework for cloud computing that advances cooperation and innovation in the cloud, while protecting the rights of the communities of users on earth.

Keywords: Cloud computing, cyber law, cloud privacy, cyber security, cybercrime

1. Introduction

With more and more sensitive commercial and personal data being stored on the cloud, regulators and authorities around the world have responded to concerns about the security of cloud computing by introducing new laws, regulations and compliance requirements that attempt to mitigate the perceived security and data privacy risks associated with its use. Businesses dispute that regulation is necessary, while law enforcement agencies highlight the challenges of investigating and securing electronic evidence when the data is stored in the cloud. The stringency of cloud security requirements has led some organizations to shun the adoption of cloud computing solutions, citing the web of legal and regulatory requirements and the costs associated with ensuring compliance as prohibitive factors. The goal of this research is to explore the jurisdictional space in the cloud and how legal frameworks can be used to provide a clearer understanding of the rights and obligations of the users of that space. This paper provides a conceptual framework for categorizing, delineating, and benchmarking cloud computing related governmental, corporate, and individual activity, together with the relevant national and governmental legal provisions and constraints (both civil and criminal) and private company and individual rights of ownership and privacy. The framework serves two purposes. First, base level areas of agreement within the mainstream world community can be identified, made a matter of international cooperation and removed from jurisdictional dispute and corporate planning uncertainty. Second, it provides a more transparent understanding of the remaining areas of jurisdictional difference, and of areas that are best addressed in private contracts among private companies, between companies and their customers, and between companies and the nations in which they do business. Rather than incremental reform that seeks to limit the private user’s control of the cloud space, or the international system's control of the national interest, this article argues for the private and public, and the international and local (i.e. the multinational provider and the individual user), to come together to create a hybrid framework that best serves the needs of all.

2. What is cloud computing and its security implications?

Cloud computing has been broadly defined as an infrastructural paradigm shift that is sweeping across the enterprise IT world (Geelan 2009). Others define “cloud” as an evolving paradigm that is a matter of degree, not just a complete shift. The cloud is actually physical servers within some jurisdiction. The commonly referenced cloud model as defined by the National Institute of Standards and Technology (NIST) is composed of five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service), three service models (software as a service, platform as a service, infrastructure as a service), and four deployment models (private cloud, community cloud, public cloud and hybrid cloud). The NIST definition characterizes important aspects of cloud computing and is intended to serve as a means for broad comparisons of cloud services and deployment strategies, and to provide a baseline for discussion from what is cloud computing to how to best use cloud computing. The broad definition of cloud computing raises global concerns ranging from governmental interests in national security, to private industry’s interest in economic competitiveness, to the individual’s right to privacy in the data secured in the cloud. Cloud computing management requires an understanding of the legal regime in which it exists to prevent the growing threat of cyber security and privacy breaches in a space that has not been clearly identified. The economic aspects of the Internet must be balanced against social policy and the rights of the individual to protect goodwill, reputation and, most important, confidentiality. Cloud computing and its privacy and security implications are at the forefront of news media debate around the world. In developing countries, civil society has advocated for strong data protection laws and heightened enforcement of criminal laws, contending that only regulators from developed countries are discussing its privacy and security policy. To address these concerns, cloud computing security risks have been raised by many governmental organizations including the Federal Trade Commission, the Council of Europe, and the Organisation for Economic Co-operation and Development (OECD), yet no universal solution has been adopted. Some have contended that cloud computing is a new spin on old problems. In reality, the cloud has evolved over time and thus the question has arisen whether today’s legal and regulatory frameworks are appropriate. Importantly, because technology is advancing at such a rapid pace, the law is continuously struggling to keep up. Moreover, the alleged obsolescence of legal rules in computers and the Internet, among other technologically advanced fields, is well recognized in legal scholarship (Moses 2007; Downing 2005). The literature reflects many challenges to operating in the cloud, and though many of these challenges have existed for decades, the scale of the damage that can be done has dramatically increased. A few of the most pressing challenges are: (1) severely underdeveloped legal frameworks to deal with cloud management; (2) ambiguity as to what state, federal and international civil and criminal laws are relevant to the cloud; and (3) the absence of a treaty framework to address the broad potential reach of cloud computing.

3. National and international data protection regimes Data that is stored in a cloud may move across multiple jurisdictions depending on the scale of the cloud service provider and each jurisdiction may have its own set of rules or unique requirements. Cloud users often have limited visibility over where their data is at any point in time which would greatly impact their ability to ensure continuing regulatory compliance and to manage any non-compliance risks. A lack of consistency in privacy laws across jurisdictions makes monitoring compliance with regulatory requirements and assessing risk of non-compliance difficult and expensive for the cloud user. Governments have recognized the importance of privacy and have legislated on this issue. For example, in Europe, the model for privacy of personal information is principally covered by two directives – 95/46/EC on personal data processing, and 2002/58/EC on privacy of electronic communications. These directives provide a common approach however laws vary in detail from country to country. Moreover, the European Convention on Human Rights guarantees a right to privacy under Article 8 and has been adopted by many countries including the UK. Another model is Singapore, where the government has chosen to adopt a "cloud friendly" policy as seen by the Singapore government's own adoption of cloud computing for government services and a light-handed approach in terms of legislating the adoption of cloud computing. The Singapore government has also recently passed the Personal Data Protection Act 2012 (PDPA) which will be a boost to Singapore's ambition of becoming a data center hub for the region. In comparison to other countries, the US does not have a comprehensive overarching national data privacy regime and has taken a sector specific approach only when required by specific industries or circumstances. Data security protection in the United States emanates primarily from state legislation. 
For instance, state data breach notification laws, starting with California’s milestone SB 1386, marked a turning point in corporate data protection requirements. Now as many as 46 states, DC, Puerto Rico, and the U.S. Virgin Islands have data breach notification statutes for breaches of sensitive information (NCSL 2012). One of the most rigid laws in the United States is the 2010 Massachusetts data protection law (201 CMR 17.00). The law goes far beyond previous state mandates in requiring companies to have comprehensive information security programs in place. The regulations provide that businesses must “take reasonable steps to select and retain” third-party service providers. Though not clarified by the regulators, this could be read to impose an audit or review requirement before a business can use a particular cloud provider. While a fundamental legal right to data protection is recognized both in national laws and in certain regional legal doctrines, on the international level there is no binding data protection law. Instead, instruments such as the Universal Declaration of Human Rights of 1948 (UDHR) and the International Covenant on Civil and Political Rights of 1966 (ICCPR) have been relied upon to fill this gap. The International Law Commission (ILC), established in 1947 under a United Nations resolution, is charged with promoting the development and harmonization of international law. In a report entitled Protection of Personal Data in Transborder Flow of Information, it set out a number of core principles of data protection that unfortunately have not won broad recognition among member States. Microsoft, in its 2010 proposal for a Cloud Computing Advancement Act (CCAA), called for the reconciliation of conflict of law issues in data protection through a multilateral framework established by treaty or similar instrument. Researchers and scholars have also urged the adoption of universal protection, though they have not succeeded in advancing statutory schemes. Treaties that resolve conflicts between nations’ civil laws have been used effectively to harmonize contract law through the United Nations Convention on Contracts for the International Sale of Goods (CISG), and to harmonize the enforcement of IP protection through the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS Agreement).
However, an international treaty to harmonize cloud-computing regulations would pose unique challenges. Its protections must be balanced against the needs of users: excessive protections could deter cloud providers from serving certain areas because of the expense and difficulty of moving data freely between platforms.

4. International standards

Notably, despite efforts by the technical community, national and inter-governmental organizations, and international standards organizations, there has been little movement on legislation for cloud computing or on uniform standards. Some of the more significant efforts in developing international standards have emanated from the U.S. Department of Homeland Security, the U.S. Department of Commerce, the European Commission, and several prominent organizations, including the National Institute of Standards and Technology (NIST), IEEE, the Cloud Security Alliance (CSA), the Organisation for Economic Co-operation and Development (OECD) and the International Telecommunication Union (ITU). These efforts are important because they can serve as models for treaties, future national legislation, international agreements and private contracting. Standards have proven to be a successful starting point for major initiatives and should be reviewed and prioritized by all nations interested in finding agreement and developing a framework for harmonization of the law, clarifying the role of the provider and the required technical capabilities, and reducing conflict among countries, individuals, and public and private organizations interested in advancing the economic and social benefits of cloud computing.

5. International extraterritorial jurisdiction

In the absence of an international treaty that guarantees universal protection of data in the cloud, a key solution is for national regimes to establish extraterritorial jurisdiction under international law. This would mean that a state’s jurisdiction over parties is extended to those acts that are commenced in another state’s territory but produce harmful consequences in the territory of the party extending jurisdiction. This is commonly known as the “effects doctrine.” Thus, if a cloud provider had stored the data outside of the jurisdiction of the victim, the state would have jurisdiction over the foreign party based on “reasonableness” and “interest balancing” standards. However, as legal scholars have noted, if these standards fail, that leaves traditional international law rules that mark the outer limits of a State’s ability to adjudicate and enforce its own laws for conduct that occurs outside its state borders (Bederman 2010, p. 185). In comparison, Professor Jack Goldsmith has taken the position that traditional international legal tools are fully capable of handling all jurisdictional problems in cyberspace (Goldsmith 1998).

6. International cloud policy and registration center

Another possible approach for global cloud management is the development of an international cloud policy and registration center (ICPRC). The organization’s authority would derive from its member countries, and it could be structured similarly to the World Trade Organization or the Internet Corporation for Assigned Names and Numbers (ICANN). It could operate as a private organization or an inter-governmental organization. Cloud Computing Service Providers (CCSPs) could register with the organization, and the ICPRC would be charged with overseeing the coordination of the cloud system. Its role could be expanded or reduced relative to the role of ICANN, depending on the policy interests of the participating countries. The organization could study cloud protocols and technical aspects of cloud management, and could define policies for how cloud space is allocated and how to manage overlapping cloud jurisdiction. A Board of Directors would be responsible for governance of the organization and would vote on all policy recommendations of its membership. Depending on the interests of its members, the organization could serve as an international regulatory and supervisory body for the cloud system, or it could take on a role similar to that of the OECD, serving as a global standards organization and providing guidance on harmonization of the laws. There are many examples of successful international supervisory bodies, including the Basel Committee on Banking Supervision, the Financial Action Task Force on Money Laundering, and the International Organization of Securities Commissions, as well as the call by European leaders for an international financial regulatory body to ensure “cross-border supervision of financial institutions.” All of these organizations and many others could serve as models for the development of an international cloud policy and registration center that would encourage innovation in the cloud while at the same time protecting the privacy of the individuals staking their reputations on the reliability of this evolving technological innovation.

7. The cyber cloud

One interesting approach offered in the cloud scholarship to address the problem of conflict in data protection laws is the establishment of an international system for cloud-computing and internet regulation that is modeled on the United Nations Convention on the Law of the Sea (UNCLOS) (Narayanan 2012). UNCLOS is a treaty that addresses disputes over sea territory, most recently raised in the South China Sea disputes. It was established to delineate clear boundary rights of coastal countries, and it dictates that exploitation of the seabed and ocean floor shall be carried out for the benefit of mankind as a whole. Unlike under UNCLOS, the extensions of jurisdiction in the cloud would not be delineated by physical boundaries, but rather would be determined by the subject matter of the data at issue (Narayanan 2012). For example, data protection laws would be extended extraterritorially when the data was highly sensitive or particular to the country. The Law of the Sea has also been referenced in the criminal context by proponents of universal jurisdiction for cybercrime, who draw analogies to piracy as a method of justification (Cade 2012; Gable 2010).

8. Global Harmonization of Criminal Jurisdiction in the Cloud

Users of the cloud not only need data privacy protection, but also protection from the criminals that roam the places where their data resides. This requires an examination of the existing framework for cybercrime. One of the most important efforts at harmonization of national laws in the cyber law regime is the enactment of the first international treaty on internet crimes, the Council of Europe Convention on Cybercrime. Of the fifty-one signatories to the Convention, 39 have ratified it. Four of the signatories are non-member States of the Council of Europe: Canada, Japan, South Africa and the United States. To understand the benefits and limitations of this important Treaty as it applies to the cloud, it is essential to understand the process of treaty-making and enforcement in the United States. Of great significance in the observance of an international agreement is how the treaty will be interpreted, as treaties may be more vague, ambiguous, and otherwise difficult to interpret than other kinds of legal authority (Bederman 2010). Also of concern is the notion that treaties are really soft law and are unenforceable because they lack penalties and other mechanisms for punishing States and their actors for misconduct (Goldsmith and Posner 2005; Shackelford 2009). The Cybercrime Treaty has received criticism from the outset. Some argue that its provisions are too weak, while others claim it raises constitutional privacy concerns under the Fourth Amendment (Lessig 1995). Others have praised the law as providing the best legal framework for the international community (Crook 2008), and for recognizing the need for a mechanism allowing law enforcement to investigate offenses and obtain evidence efficiently, while respecting each nation’s sovereignty and constitutional and human rights (Rustad 2009).
Though existing instruments like the Convention on Cybercrime remain very much valid as they relate to “cloud crime,” there are particular jurisdictional issues that arise on data protection. For example, if law enforcement in country A needs to access data in country B, is this legal under the Convention on Cybercrime? A preliminary review of the Convention reveals a very limited entry point to data access without going through the mutual legal assistance process. The second scenario is that law enforcement has access to data secured in country A, but the data is owned by a citizen of country B. The law provides a lower level of protection in this instance. The third scenario is that law enforcement of country A asks a service provider of country A to provide information on data stored in country B. In the Belgian Yahoo case, the concept of “electronic communication service provider” was interpreted very broadly, requiring detailed disclosure by the provider of the personal data of email users (de Hert and Kopcheva 2011). As discussed herein, the Convention on Cybercrime provides an important starting point in developing a criminal code for the cloud. However, due to the scale of the cloud, to be effective the treaty must clearly define the cloud space and address the complex jurisdictional issues, as well as provide stiffer penalties for intrusions affecting not only the cloud providers, but also the billions of users of the cloud.

9. Extraterritorial application of criminal law in the cloud

Since statutes are subject to constitutional challenges and review by the U.S. Supreme Court and the lower courts when there is a dispute, it is important that Congress gets these statutes right. In order to establish ethical standards in cyberspace, penal laws must be enacted with as much clarity and specificity as possible, and must not rely on vague interpretations of existing legislation. “With cybercrime laws, perpetrators will be convicted for their explicit acts and not by existing provisions stretched in the interpretation, or by provisions enacted for other purposes covering only incidental or peripheral conduct” (Schjølberg and Hubbard 2005). The complexity of developing a cloud crime regime is best demonstrated by the fact that the U.S. government can charge cyber-related crimes under more than forty-five different federal statutes. The most prominent of these statutes is the Computer Fraud & Abuse Act (CFAA), which makes it a crime to access a protected computer without authorization. Other statutes that must be analyzed include the Wiretap Act, the Electronic Communications Privacy Act, and the USA PATRIOT Act, superseded by the current Foreign Intelligence Surveillance Act (FISA) Amendments Act of 2008. In addition to federal statutes, state legislatures began enacting computer crime statutes in 1978. Since then, every state has enacted some form of computer-specific legislation (Kleindienst 2009). Prosecution of cybercrimes under state statutes continues to increase (Perry 2006). A review of all federal and state computer crime statutes is required to determine the gaps in the law concerning management of the new cloud regime.

10. Policing the cloud: the cop on the beat

Law enforcement on the cloud creates significant challenges for our current model of criminal law. As described by one scholar, “the hierarchical, reactive model we use to control real-world crime is not an effective means of dealing with cybercrime” (Brenner 2005). An emerging concern in criminal prosecutions is the lack of case law that fleshes out the various elements of federal and state statutes governing criminal activity involving cloud computing. The origin of the attack, the identity of the offenders, traffic data, and subscriber information are among the critical evidentiary elements for a successful cloud prosecution. An investigation may require interception, search orders, protection orders, mutual legal assistance and other urgent measures, particularly if it involves the preservation of evidence in other countries. If we are serious about reducing crime in the clouds, uniform international rules need to be developed concerning access to evidence, and law enforcement needs better tools. Technology is essential to modern society; however, it also changes the way criminals conduct their activities, and vulnerabilities in our ICT infrastructure are fertile ground for criminal exploitation. The real challenge for international law is developing enforcement mechanisms that can keep pace with very dynamic technological change (Zagaris 2009). The leading case of U.S. v. Gorshkov (2001) has raised much controversy over the methods utilized to obtain enforcement jurisdiction over foreign defendants. Because evidence of cybercrime must be obtained urgently, it is essential to establish the circumstances and conditions under which transborder searches and seizures are permissible (Brenner and Koops 2004).
Broader aspects of cybercrime and forensic training that must be addressed include ways of bridging the gap between national law and international standards, the jurisdictional authority of the courts, enforcement tools, collaboration with the private sector and service providers, mutual legal assistance and the role of competent authorities. Key questions involving cloud investigations include: (1) the ability of law enforcement to access stored communications controlled by a third party such as a service provider or an employer; (2) whether an interception can include acquisition of stored communications; (3) the definition of electronic storage; (4) the use of surveillance in national security investigations; (5) the admissibility of electronic evidence; (6) expedited preservation of computer data; and (7) the difficulties of cross border searches.

11. International public private partnerships

We can’t solve problems by using the same kind of thinking we used when we created them. - Albert Einstein

Governments, scholars and professionals worldwide agree that if we are going to reduce or eliminate illegal activity in the cloud, we must join together the efforts of the public and private sectors to bridge the gap between national policy-making and the operational realities on the ground. Mutual assistance and partnering between public and private sector organizations around the world already help protect against serious intrusions, national security threats and serious criminal activity. Agencies including the FBI and INTERPOL, and private corporations such as Microsoft, Google, eBay, and American Express, have been active in building information sharing alliances. The important question to be analyzed is not whether there should be public private partnerships but what form these partnerships should take. Shawn Henry, former Assistant Director, Cyber Division, FBI, in an address to the Foreign Press Center (FPC) at the U.S. Department of State, described the extent of the problem this way: “the unique threats across all networks, both here in the United States and information that we get from “our partners,” is that it’s widespread around the world.” To enhance the partnership between cloud computing service providers and the government, the following questions must be explored: (1) Should the role of the service provider be merely data preservation, or is data retention a prerequisite to effective cloud crime enforcement? (2) What are the minimum technical capabilities that service providers should possess to assist the government in gathering data concerning cloud crime activity? (3) Should the United States adopt the rigid standards of the German law, which requires the service provider to have specific technical capability to gather data? (4) How proactive should service providers be in investigating potential criminal activity? (5) What is the intent required for a provider to be liable for a criminal activity in the cloud?

12. Summary of cloud legal framework

Summarized in Table 1 are the key areas of concern discussed in this paper for future analysis and the benefits the recommended frameworks would provide.

Table 1: Summary

Recommended framework: Benefits

Multilateral Treaty on Data Protection: Guarantees universal protection of data, consistent application across borders, reduces privacy and confidentiality concerns.

Recognition of Extraterritorial Jurisdiction: Extends a State’s jurisdiction over foreign parties who caused harmful consequences based on “reasonableness” and “interest balancing” standards.

International Cloud Policy and Registration Center (ICPRC): Provides a forum for the study of cloud protocol and technical aspects of cloud management and could define policies and establish a registration system for how cloud space is allocated and how to better manage data protection and overlapping cloud jurisdiction.

Multilateral Treaty on Cloud Crime: Establishes a mechanism for addressing complex jurisdictional issues. For example, where the perpetrator of the crime is in one jurisdiction, the stored data is in another jurisdiction, and the victim is in a third jurisdiction.

International Rules on Evidence and Enforcement: Permits better control over complex cross border investigations and the circumstances and conditions for investigating and securing electronic evidence and recognition of a subpoena or extradition request.

International Cloud Standards: Allows for harmonization of law, resolution of cross border conflicts and clarifies the role of the service provider and the specific technical capabilities they should provide.

Global Partnerships: Advances collaboration between public and private organizations worldwide to protect against serious intrusions, national threats to security and criminal activity in the cloud.


13. Conclusion

The time for a global legal framework for cloud computing has definitely arrived. The present legal and regulatory landscape around cloud computing is in flux. New laws are being proposed that could change the responsibilities of both cloud computing users and providers. Innovation is required not only in the cloud, but also in the laws that will protect cloud providers and users from unwanted storms and unwelcome intruders. This paper explored and delineated where frameworks are needed and the emerging legal issues. The next step is to develop an international framework based on minimally acceptable standards, to enshrine these in international treaties where feasible, and only from there to move to defining what must still be negotiated on an individual country basis or through contractual arrangements. Cloud computing that employs a hybrid, community or public cloud model is today rapidly creating the possibility for amazing innovations and better services to consumers and customers, but it also creates new dynamics in the relationship between an organization and its information. It thus requires a well-planned, comprehensive framework that will remove uncertainty and foster the advancement of an innovative and prosperous partnership in the cloud.

References

Bederman, D.J. (2010) International Law Frameworks, 3rd ed., Foundation Press, New York.
Brenner, S.W. (2005) “Distributed Security: Moving Away from Reactive Law Enforcement”, 9 Int’l J. Comm. L & Pol’y 11.
Brenner, S.W. and Koops, B. (2004) “Approaches to Cybercrime Jurisdiction”, Journal of High Technology Law, 4 J. High Tech. L. 1, 2-4.
Cade, N.W. (2012) “Note: An Adaptive Approach for an Evolving Crime: The Case for an International Cyber Court and Penal Code”, 37 Brooklyn J. Int’l L. 1139, pp. 1166-1168.
Crook, J.R. (2008) “Contemporary Practice of the United States Relating to International Law: U.S. Views on Norms and Structures for Internet Governance”, The American Society of International Law, American Journal of International Law, 102 A.J.I.L. 648, 650.
de Hert, P. & Kopcheva, M. (2011) “International Mutual Legal Assistance in Criminal Law Made Redundant: A Comment on the Belgian Yahoo! Case”, Computer Law and Security Review, Volume 27, Issue 3, June, pp. 291-297.
Downing, R. W. (2005) “Shoring up the Weakest Link: What Lawmakers around the World Need to Consider in Developing Comprehensive Laws to Combat Cybercrime”, 43 Colum. J. Transnat’l L. 705, 716-19.
Gable, K. A. (2010) “Cyber-Apocalypse Now: Securing The Internet Against Cyberterrorism and Using Universal Jurisdiction as a Deterrent”, 43 Vand. J. Transnat’l L. 57, 60-66.
Geelan, J. (2009) Twenty-one experts define cloud computing, January 24, Cloud-Expo, SYS-CON Media, Inc.
Goldsmith, J. L. (1998) Against Cyberanarchy, 65 U. Chi. L. Rev. 1199.
Goldsmith, J.L. and Posner, E.A. (2005) The Limits of International Law, Oxford University Press, New York.
Kleindienst, K., Coughlin, T. M., and Paswuarella, J.K. (2009) “Computer Crimes”, American Criminal Law Review, 46 Am. Crim. L. Rev. 315, 350.
Lessig, Lawrence (1995) The Path of Cyberlaw, 104 Yale L.J. 1743, 1743-45.
Moses, L.B. (2007) “Recurring Dilemmas: The Law’s Race to Keep up with Technological Change”, 7 U. Ill. J.L. Tech. & Policy 239, 241-243.
Narayanan, V. (2012) “Harnessing the Cloud: International Law Implications of Cloud-Computing”, 12 Chi. J. Int’l L. 783, Winter, Chicago Journal of International Law, The University of Chicago, pp. 806-809.
(NCSL) National Conference of State Legislatures (2012) State Security Breach Notification Laws, August 20, Denver, CO.
Perry, S.W. (2006) Bureau of Justice Statistics, DOJ, National Survey of Prosecutors.
Rustad, M. (2009) Internet Law in a Nutshell, Thomson Reuters, St. Paul, MN.
Schjølberg, S. and Hubbard, A.M. (2005) Harmonizing National Legal Approaches on Cybercrime, International Telecommunications Union, Geneva.
Shackelford, S. (2009) “From Nuclear War to Net War: Analogizing Cyber Attacks in International Law”, 27 Berkeley J. Int’l L. 192, 251.
U.S. v. Gorshkov, 2001 WL 1024026 at 1 (W.D. WA May 23, 2001).
Zagaris, B. (2009) “European Council Develops Cyber Patrols, Internet Investigation Teams and Other Initiatives Against Cyber Crime”, International Enforcement Law Reporter, Cybercrime; Vol. 25, No. 3.

Picture Steganography in the Cloud Era

Dan Ophir
Afeka, Tel-Aviv Academic College of Engineering, Israel
dano@afeka.ac.il

Abstract: Steganography (Zoran D., 2004; Cheddad A., Condell J., Curran K. & McKevitt P., 2009) is a cryptographic method that transfers data in unnoticeable ways. It uses multimedia carriers, such as image, audio and video files. The working assumption is that if the hidden data is visible or discoverable, the point of attack is evident. Therefore, the goal of a good steganographic algorithm is to conceal the existence of the embedded data. Other parameters used for classifying and weighing the strengths of a given steganographic algorithm are robustness (resistance to various image processing methods and compression) and the capacity for hidden data. Another encryption method is the use of a cloud system. This method decomposes large files into small parts, enabling the transmission of encrypted data from several different sites, after which the data is collected and recomposed. Such decentralization makes it difficult to decipher the significance of the gathered data, because visualizing the whole picture based on disparate parts is very challenging. The advantages of using the cloud system for encryption may make steganography redundant. Steganography is a sophisticated method of cryptography, in that it is far more difficult to hide something unnoticeably than to hide something without such a constraint. The cloud system is a relatively new trend and will probably arouse much interest in the coming years. The possible combination of the two methods will yield better encryption methods but will also curtail certain current uses of steganography. One of the main conclusions reached during the research stage of this work was that the number and diversity of the available steganographic algorithms is so high that it is virtually impossible to create a system capable of clearly and unequivocally dealing with and decoding all of the existing variations of steganographic methods. A small change in a known concealing algorithm is enough to prevent the matching revealing algorithm from extracting the hidden data.

Keywords: Encryption, steganography, watermarking, advertising, cloud computing, distributed computing

1. Introduction: Picture Steganography

This work analyzes the implications of the current state of cloud computing for steganography (Hayati P., Potdar V. & Chang E., 2005; further references are given below) that is based on signal and image processing algorithms. The concealed information can be images, text or any type of binary data. This work concentrates on the use of steganography for several purposes, which include:

Data hiding – images or any other type of data can be concealed in another image, leaving the manipulated image as visually similar as possible to the initial image.

Analyzing and detecting – an image can be analyzed for the existence of hidden data. If such hidden data is found, it can be extracted and saved externally.

There are two modes of steganography:

Hiding – in this mode the purpose is to hide the information in the image. The information may be another image, text or any other binary information.

Decoding – the action of interpreting the hidden information.

Figure 1: An example of steganographic manipulation: The image (a) is an overlap of an original image with image (b). In order to see this superimposition, the observer has to look at the picture at a distance of about 50 cm (The image is a private acquisition).


2. Steganographic applications

There are three main purposes for the use of steganography:

Prohibited communications – communications between several parties whose activities are illegal or shunned by society. Steganography is used by underground organizations.



Watermarking – steganography can be used as a digital signature that asserts copyright: data concealed in an image makes it possible to track the image's origin after it changes hands.



Advertising – Steganography can be used to appeal to the subconscious to advertise a commercial product or as a part of a campaign.

3. Historical background

The origins of hiding a message go back to antiquity. Herodotus, the early Greek historian, relates a story about Histiaeus, the ruler of Miletus, who wanted to send a message urging a revolt against the Persians. Histiaeus shaved the head of his most trusted slave, tattooed the message on the slave's scalp, waited for the slave’s hair to grow back and sent the slave to his friend Aristagoras with the message safely concealed. During WWII, the Germans perfected the micro-dot, a message that was photographed, reduced to the size of a dot on a typewritten page and concealed in an innocuous letter. Other methods, including the use of invisible inks and concealing messages in plain sight (see Figure 2), were used throughout modern history (Szczypiorski K. 2003; Kundur D. and Ahsan K. 2003; Petitcolas, FAP; Anderson RJ; Kuhn MG 1999).

Figure 2: A steganogram from 1945. A secret message was inserted using Morse code, camouflaged as innocent-looking grass blades along the river (The image is from Zoran D., 2004).

4. Steganography Methods

There are literally hundreds of different methods and algorithms used for coding and decoding steganographic messages. In this section, we will give a few examples:

Writing a message

Writing a message on a homogenous background as seen in Figure 2 

Using a template masking matrix

Template masking is shown in Figure 3: (a) is placed on (b) and hides the unnecessary letters. The uncovered letters reveal the significant message.

Figure 3: A steganogram: (a) the empty template matrix; (b) the visible steganographic text containing the hidden message; (c) the hidden message (The image is from Zoran D., 2004).

Using the least significant bit

Manipulation of the least significant bit of one picture and the most significant bit of another picture is represented in Figure 4.

Visual steganography decoding

The hidden picture becomes visible when the image is viewed at a reduced resolution (Figure 1a); at full resolution it remains hidden inside another picture (Figure 1b). Decoding the hidden image is easy: the observer views the picture from a distance of about 50 cm. Seeing a different picture at a distance is analogous to changing the image resolution: at a greater distance, neighboring pixels are seen as one pixel of the smaller image. The advantage of this kind of steganography lies in the simplicity of its decoding, which can be done without any computing. The double-scaled image can also be exploited to appeal to the subconscious of the potential observer.
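The distance effect described for visual decoding can be imitated numerically. The following is a minimal Python sketch, assuming a grayscale image stored as a list of rows of integer intensities; the function name `downscale` and the block-averaging rule are illustrative assumptions and are not taken from the paper.

```python
def downscale(image, factor):
    """Average each factor x factor block of a grayscale image into one pixel.

    This mimics viewing the image from a distance, where neighboring
    pixels merge into a single pixel of a smaller image.
    """
    height, width = len(image), len(image[0])
    small = []
    # Ignore incomplete blocks at the right and bottom edges.
    for top in range(0, height - height % factor, factor):
        row = []
        for left in range(0, width - width % factor, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


# Hypothetical 4x4 image: fine detail up close, a coarse diagonal from afar.
fine = [
    [255, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 255],
]
coarse = downscale(fine, 2)
```

At full resolution the image shows individual bright pixels; after averaging, only the coarse pattern of bright and dark blocks remains, which is the picture a distant observer would perceive.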

5. The Linkage between Steganography and Cloud Architecture

Cloud computing (Chappell, D. 2008) decreases the advantages of formal steganography. An image can be divided into many data packets, making the original file difficult to decode. The paradigm of steganography is to hide the data unnoticeably, and the paradigm of cloud computing is to manipulate the pieces of data separately and eventually gather the corresponding output pieces and reintegrate them into the original file. Merging steganography and cloud architecture requires, in the case of decoding, supplying the decoder-key with the encoded data, which is of course unreasonable. By supplying the decoder-key with the data, the two “golden rules” of steganography are broken:

Attaching the decoder-key to the data shows that the apparently harmless data is not what it seems;



Attaching the decoder-key to the data breaks the confidentiality of the data, especially if it was encoded with the help of the decoder-key.


Figure 4: Steganography and steganalysis using the most and the least significant bits (the images are from Zoran D., 2004): visualization and explanation of the respective stages.
A – the carrier, (A0, A1, A2, …, An-1)
B = (A0, A1, A2, A4)
C – the image to be hidden, (C0, C1, …, Cn-1)
D = (Cn-1, A1, A2, A3, …, An-1)
E = (D0)
F = (Cn-4, Cn-3, Cn-2, Cn-1, …, A4, A5, …, An-1)
G = (F0, F1, F2, F3)
H = A - G
I = A - F
J – the hidden picture that was extracted from D
K – the hidden picture that was extracted from F
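The bit-plane manipulation summarized in Figure 4 can be sketched in a few lines of Python. This is a hedged illustration rather than the author's implementation: it assumes 8-bit grayscale pixels given as flat lists of integers, reads index 0 as the least significant bit plane, and uses the hypothetical names `embed` and `extract`. Under that reading, `bits=1` corresponds to stage D and `bits=4` to stage F, while `extract` plays the role of stages J and K.

```python
def embed(carrier, secret, bits=1, depth=8):
    """Hide the `bits` most significant bits of each secret pixel in the
    `bits` least significant bits of the matching carrier pixel."""
    low_mask = (1 << bits) - 1
    keep_mask = ((1 << depth) - 1) & ~low_mask  # clears the carrier's low bits
    return [(a & keep_mask) | (c >> (depth - bits))
            for a, c in zip(carrier, secret)]


def extract(stego, bits=1, depth=8):
    """Recover an approximation of the secret: take the low bits of each
    stego pixel and shift them back to the most significant positions."""
    low_mask = (1 << bits) - 1
    return [(d & low_mask) << (depth - bits) for d in stego]


# Hypothetical pixel values, chosen only for illustration.
carrier = [200, 13, 255, 0]
secret = [255, 0, 128, 127]

stego_1 = embed(carrier, secret, bits=1)   # barely changes the carrier
stego_4 = embed(carrier, secret, bits=4)   # more capacity, more distortion
recovered_4 = extract(stego_4, bits=4)     # secret quantized to 4 bits
```

The trade-off is visible in the parameter `bits`: each carrier pixel changes by less than `2**bits`, so one bit is nearly invisible, whereas four bits give a much better reconstruction of the hidden picture at the cost of a more noticeable change to the carrier.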

6. Distributed computing

The distributed computing [13] paradigm is to divide a program, or data, into parts and run those parts separately on copies of available resources. Some approaches consider distributed computing synonymous with cloud computing.

The relation of distributed computing to steganography lies in the method of dividing the carrier, with its incorporated message, into separate parts (see Figure 5), which are treated as distinct computing-platforms.

Figure 5: (a) Partitioning the picture into a coarse grid focusing on the head (striped rectangle); (b) Fine grid: the striped rectangle of (a) is enlarged and refined (The image is a private acquisition).

Such a distributed computation (Andrews, G. R. 2000) is composed of two categories of modules: fork and join (Figure 6). The fork module performs the analysis: the input, together with the computation work to be done on it, is partitioned among copies of the computation program, which run in parallel on their corresponding inputs and each give a respective output. The join module performs the synthesis of all the outputs received from the origin-vertices that split from the same fork-vertex.
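The fork and join modules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the image is a list of pixel rows, the fork step cuts it into horizontal bands, the per-part work is a placeholder (pixel inversion), and Python's standard `concurrent.futures.ThreadPoolExecutor` stands in for the separate computing platforms. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fork(image, tile_rows):
    """Fork step: partition the image into horizontal bands of tile_rows rows."""
    return [image[r:r + tile_rows] for r in range(0, len(image), tile_rows)]


def work(band):
    """Per-part computation (placeholder: invert 8-bit pixel values)."""
    return [[255 - p for p in row] for row in band]


def join(bands):
    """Join step: reassemble the processed bands in their original order."""
    return [row for band in bands for row in band]


def fork_join(image, tile_rows=2, workers=4):
    parts = fork(image, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the inputs.
        results = list(pool.map(work, parts))
    return join(results)


image = [[0, 10], [20, 30], [40, 50]]
result = fork_join(image)
```

Each worker sees only its own band, which mirrors the observation that a single part of a partitioned carrier reveals little about the whole image. Only the join step, which knows the order of the parts, can reconstruct it.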

Figure 6: “Fork-Join” diagram: task dividing and unification of the results (The image is a private acquisition).

This methodology of parallel computing processes is illustrated in Figure 7, an example corresponding to the hierarchical processing of the input image of Figure 5. Some parts of the image are less important, and their computation is performed on a coarse grid, whereas the computation for the detailed part of the image, such as the face, is performed on the finer grid. The relation of distributed computing to steganography lies in the fact that encoding and decoding hidden information in the sub-picture is less critical because, in general, the small file pieces of the grid are mostly unreadable, and therefore there is much less need for hiding the encrypted message. Distributed computation neutralizes the possibility of using steganography to appeal to the subconscious of the viewer, since only an unrecognizable part of the image is seen. This makes it too difficult for the brain to reconstruct the object hidden in the full image.

The join-vertices aggregate the sub-images into one greater image which is more readable than the small ones. The management of the whole flow of the information coming into the join-vertices is complicated. The module performing the join-reconstruction should know the addresses and the linkages of the processes taking part in the whole distributed computation.

Figure 7: Fork-Join diagram related to Figure 5, in which the rectangle-vertices represent the rectangles in Figure 5 (The image is a private acquisition).

For example, the SETI project (Stanford project, see the references; see also [14]) is a kind of distributed computation in which the members of the project individually supply one piece of the information. The project management performs a superposition of the inputs into a global picture of the celestial map. Each individual member has no general view of what is going on. Similarly, possessing one piece of information from the whole multi-image process does not convince the individual of being part of a steganographic message. Therefore, the hidden information remains hidden. The “Dropbox” service is an additional example of distributed computation, enabling an easier way to transmit data by using shared space in the network. Such algorithms make it more difficult to keep track of suspected material.

7. Conclusions

The use of steganography in the era of cloud computing is a mixed bag. Certain uses of steganography become difficult with cloud computing, such as subconscious uses that hide an image within an image. Overall, however, the difficulty of identifying “steganographied” material and decoding it increases dramatically with the packet method of cloud computing. Cloud computing separates and distributes small parts of the original data, which impedes the recognition of the hidden information. Messages which are divided into corresponding sub-messages become more difficult to detect. This poses the question of whether partitioning the message into very small pieces, to be reconnected in the last computation stage, is sufficient to make sure that the data is undecipherable. The progress of contemporary decoding, due to huge increases in computation speed, may require parallel progress in encoding messages. The answer lies in combining methods. The increasing complexity of cloud computing, in addition to classical steganography, may be a suitable upgrade for creating more complex encoding and decoding processes.


References

Andrews, G. R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison–Wesley, ISBN 0201-35752-6.
Chappell, D. (2008). "A Short Introduction to Cloud Platforms" (PDF). Retrieved on 2008-08-20.
Cheddad A., Condell J., Curran K. & McKevitt P., (2009), Digital image steganography: Survey and analysis of current methods, Signal Processing, Elsevier.
Hayati P., Potdar V. & Chang E., (2005), A Survey of Steganographic and Steganalytic Tools for the Digital Forensic Investigator, Curtin University of Technology, Australia.
Hideki N., Michiharu N. & Eiji K., (2006), High-performance JPEG steganography using quantization index modulation in DCT domain, Pattern Recognition Letters 27, Elsevier, pp. 455-461.
Hong L.J., Masaaki F., Yusuke S. & Hitoshi K., (2007), A Data Hiding Method for JPEG 2000 Coded Images Using Modulo Arithmetic, Electronics and Communications in Japan, Part 3, Vol. 90, No. 7.
McBride B.T., Peterson G.L. & Gustafson S.C., (2005), A new blind method for detecting novel steganography, Digital Investigation 2, Elsevier, pp. 50-70.
Kundur D. and Ahsan K. (2003). Practical Internet Steganography: Data Hiding in IP. Texas Wksp. Security of Information Systems.
Petitcolas, FAP; Anderson RJ; Kuhn MG (1999). Information Hiding: A survey, Proceedings of the IEEE (special issue) 87 (7): 1062–78.
Qingzhong L., Andrew H.S., Bernardete R., Mingzhen W., Zhongxue C. & Jianyun X., (2008), Image complexity and feature mining for steganalysis of least significant bit matching steganography, Information Sciences 178, Elsevier, pp. 2136.
Stanford project, investigating the cosmos, 189 Bernardo Ave, Suite 100, Mountain View, CA 94043, http://setistars.org/
Szczypiorski K. (2003). Steganography in TCP/IP Networks. State of the Art and a Proposal of a New System - HICCUPS. Institute of Telecommunications Seminar.
Zax R. & Adelstein F., (2009), FAUST: Forensic artefacts of uninstalled steganography tools, Digital Investigation 6, Elsevier, pp. 25-28.
Zhang X. & Wang S., (2004), Vulnerability of pixel-value differencing steganography to histogram analysis and modification for enhanced security, Pattern Recognition Letters 25, Elsevier, pp. 331-339.
Zoran D., (2004), Information Hiding: Steganography & Steganalysis, George Mason University, Department of Computer Science, USA.

Forensic Readiness for Cloud-Based Distributed Workflows

Carsten Rudolph1, Nicolai Kuntze1 and Barbara Endicott-Popovsky2
1 Fraunhofer SIT, Germany
2 University of Washington, USA
rudolphc@sit.fraunhofer.de
kuntze@sit.fraunhofer.de
endicott@uw.edu

Abstract: Distributed workflows in the physical world can be documented by so-called process slips, where each action in the process is assigned to the responsible person and progress or completion of sub-tasks is confirmed using signatures on the process slip. The paper version creates a paper-based audit trail that documents who has done which part of the process and when. In the digital world, electronic process slips have been proposed that use digital signatures to achieve a similar behaviour in distributed service-based processes. This also provides a trail of linked digital signatures to represent the process. When moving such distributed workflows to the cloud (at least partly), steps might be fully automatic or only initiated by the user without any clear control over the execution of the process. Therefore, documenting the user interaction is not sufficient. This paper proposes to extend the idea of electronic process slips by hardware-based security to control the cloud server and to securely document the execution of particular steps in the process. The concept is based on Trusted Platform Modules (TPM) as specified by the Trusted Computing Group (TCG). The result is an electronic audit trail that provides reliable and secure information on the execution of the electronic process and ensures the satisfaction of specific requirements for forensic readiness in distributed workflows including cloud-based services. The composition concept remains as powerful as in the original version of the electronic process slip.

Keywords: Forensic-ready processes, cloud service

1. Introduction

A Workflow Management System (WFMS) is often used to support the automated execution of business processes. Web services and cloud-based services provide many opportunities to create highly distributed business processes. A standard for specifying such workflow processes is the Web Services Business Process Execution Language (WSBPEL) (Alves, et al., 2007), or BPEL for short (http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html). A workflow in this scenario can be defined as a set of interacting cloud-based services that are composed to achieve a common goal in the process. The composition defines which services participate in the process, the order of their interactions and which data is transferred during the process. Compositions are used to automate the coordination between participating “partners”, thereby increasing the efficiency of the whole process. There exist two different types of interaction between single services in a workflow:

• Service orchestration (centralized) refers to those workflows in which one central service receives client requests, makes the required data transformations and invokes other services as required by the workflow.

• Service choreography (decentralized) refers to the workflows in which there are multiple engines, each executing a part of the workflow at distributed locations. The engines communicate directly with each other (rather than through a central coordinator) to transfer data and control when necessary in an asynchronous manner.

In the centralized system model there are one or more workflow engines, each of which is able to interpret the process definition, interact with workflow participants and, where required, invoke the use of IT tools and applications. Centralized systems provide centralized monitoring and auditing, simpler synchronization mechanisms and overall design simplicity. Thus, audit trails can be created by the central entity. However, the centralized approach suffers from drawbacks (Yan, et al., 2006). Centralized service architectures are designed as classic client/server applications in which the server provides most of the functionality of the system while the computational potential at the client side is barely used. This results in heavyweight systems when many parallel instances need to be executed.

A decentralized workflow has various advantages. Each partner is aware of the current state of the workflow and its involvement in the workflow. The decentralized WFMS should be able to distribute the tasks to the appropriate partners and ensure specified task dependencies by sending the tasks to the predetermined partners only when all prerequisite conditions are satisfied. However, decentralized execution of inter-organizational workflows raises a number of security issues including integrity, non-repudiation and confidentiality.

A decentralized workflow execution mechanism that ensures the correctness of the control flow and the satisfaction of the main security requirements was defined by Rudolph et al. (Rudolph, et al., 2009). This composition approach ensures that each web service can access only the information needed for the correct execution of the invoked operations, and it provides an execution proof of the fulfilled assignments. Integrity and authenticity of the execution proof can be verified by a central verification unit. The main idea is that messages exchanged between the web services participating in the composition are based on a particular data structure, called the process slip. This electronic process slip was defined for Web Service workflows, where services can be running in controlled environments, e.g. accessible through a single enterprise service bus. In the case of cloud-based workflows, this level of control is no longer possible. Nevertheless, enterprise service buses with automatic and dynamic service orchestration have not become the backbone of enterprise IT, while cloud-based services are heavily used and are integrated into workflows (Anstett, et al., 2009) (Wei, et al., 2010).

This paper proposes to use hardware-based security in the cloud as well as on the client side. The process slip is extended to combine hardware-based cryptography identifying a device with user-based cryptography that authenticates a user and encrypts data that is only used in other steps of the process. The functionality of the hardware security module is that of the Trusted Platform Module (TPM) as defined by the Trusted Computing Group (TCG) (http://www.trustedcomputinggroup.org/).
Signatures created on the digital process slip document the execution of steps in the workflow and combine this information with reliable information on the status of the particular cloud server or other device used in the process. The process slip including these signatures creates a resilient and reliable digital trail of the workflow execution that can be built to satisfy strong requirements on forensic readiness.

2. Related work

The WS-* family of protocols covers a variety of middleware-related topics like messaging (OASIS, 2004), transactions, and security (OASIS, 2004). Best practices have existed for many years, providing patterns and tools for the application of security and trust mechanisms for these protocols (Tatsubori, et al., 2004). In complex environments, process definition languages like BPEL (Alves, et al., 2007) are used together with additional concepts as defined, for example, by the Enterprise Service Bus (Chappell, 2004). This allows for flexible solutions with respect to the integration of new services and changes in the processes supported by the system design. However, all these techniques concentrate on the protection of single services, although the need for better security and auditability for distributed workflows is widely acknowledged (see e.g. (Patel 2013)). Only very few approaches have been proposed that base security on the process rather than on single steps in the process (Rudolph, et al. 2009).

More recently, the issue of securing distributed processes and workflows has received more attention, and approaches have been proposed that take a process view of security. Hadjichristofi and Fugini (Hadjichristofi, et al. 2010) combine a service-based view with repudiation schemes in order to support a secure orchestration using the example of scientific workflows. Baouab develops the initial idea of a secure process slip into a more complete approach for governance and auditing of decentralized workflows (Baouab 2013). This approach also supports event-driven runtime verification for decentralized and inter-organizational workflows (Baouab, et al. 2011). Lim et al. have developed an approach for documenting business processes via workflow signatures (Lim, et al. 2012). Some work also exists on improving the process-oriented property view in business processes (Sheng, 2011).
All these approaches are based on the assumption that the different platforms involved in a distributed process are protected from malicious influences and act in accordance with the service specifications. Further, security remains on the service level, and security mechanisms do not provide any information on the underlying platform. Work on creating trusted cloud platforms or on reporting the status of cloud servers and platforms exists, but there is no integration with the process models and the workflow view. Examples of work on trust and security for cloud scenarios include the following. Ko et al. review existing technologies and discuss the different issues for accountability and auditability in cloud scenarios (Ko, et al. 2011). The Certicloud approach (Bertholon, et al. 2011) develops a solution to ensure hardware-based security for cloud platforms. With Certicloud, cloud servers can provide reliable information on their identity and their current status with respect to software running and configuration parameters.

However, Certicloud is targeted at infrastructure as a service (IaaS) and therefore concentrates on securing infrastructure components. There is no integration into a process view and the technology is currently not integrated into distributed workflows. Nevertheless, the Certicloud approach can build a basis for secure, trusted and auditable cloud-based distributed workflows. Another example of a component that can support secure distributed workflows by sealing (i.e. securely storing and binding) data to particular cloud platforms was developed for the Eucalyptus cloud (Santos, et al. 2012).

3. Requirements for Forensic Readiness for Cloud-Based Distributed Workflows

A reliable documentation of distributed workflows needs to show that all security policies for the process have been satisfied and that no manipulations of the documentation have occurred. These policies are not always technical policies but can also involve user actions or responsibilities of companies or individuals responsible for segments of the process. Therefore, the documentation record for a distributed workflow needs to include the following information:

• Evidence showing the satisfaction of process security requirements.

• Evidence documenting the status and identity of the computing platforms involved.

• Evidence showing the actual execution order and steps of the workflow and any deviations that occurred.

3.1 Security requirements for distributed workflows

A wide variety of security requirements can occur for workflows and even more for distributed workflows. Usually these requirements are either identified based on static trust relations on the organizational or IT infrastructure level, or for single services within the workflow. However, in particular in the case of decentralized workflows, it is not trivial to derive the correct combination of security mechanisms that satisfies all requirements of the different partners involved in the workflow. Therefore, it is necessary to precisely define security requirements on the level of the workflow itself. A full formal framework for the specification of security requirements for workflows can be based on a generic framework for security requirements (Gürgens, et al., 2005) which is used in the EU FP7 projects SERENITY and ASSERT4SOA (http://www.assert4soa.eu/) for the formal specification of workflow security requirements in the context of security services and for the specification of security properties of single services. Workflow security requirements can be, for example, concerned with the authenticity of an entity performing a particular service in a workflow, integrity or confidentiality of data that is transported between entities involved in the workflow, or the enforcement of particular (distributed) sequences of actions (or services) in the workflow. A combination of security mechanisms for single services as well as for the overall workflow can be required to satisfy all these different security properties. The secure electronic process slip as proposed by Rudolph et al. (Rudolph, et al. 2009) defines a data structure that combines various security mechanisms to enforce and document various process-based security requirements.
The following paragraphs briefly review different classes of security requirements for distributed workflows, concentrating on those requirements that can be satisfied by the original electronic process slips. Four classes of properties are distinguished: authenticity, non-repudiation, confidentiality and enforcement of workflow sequences.

3.1.1 Authenticity

In general, authenticity of a particular action is satisfied for one element (human partner or automated component) in a workflow if this element can deduce that the action has occurred from its knowledge about the global behaviour of the workflow system (including all partners and also including possible malicious behaviour) and from its view of the current behaviour. Stronger authenticity requirements can restrict the occurrence of the action to be authentic to the current instance of the workflow or even to a particular phase of the current instance of the workflow.

3.1.2 Non-repudiation

Non-repudiation is strongly related to authenticity but in addition requires that one partner can prove to other partners that a particular action or sequence of actions has occurred. Thus, the partner executing those actions cannot repudiate this execution. In decentralized workflows, partners have to collect evidence and might even have to rely on other partners for the enforcement of non-repudiation requirements. If security is based on such mutual trust between partners in a distributed workflow, overall trust requirements have to be considered in the design of the workflow and the security policies.

3.1.3 Confidentiality

A large variety of information can be required to be confidential in a workflow. This information can include data transferred or computed within the workflow, identity information of the partners involved, order of execution of workflow actions, parts of the workflow specification, security policies, cryptographic keys, and audit trails or evidence collected for non-repudiation. Formalization of confidentiality is often based on non-interference and information flow properties formalizing the requirement that the occurrence of some actions cannot interfere with the behaviour of those partners not allowed to gain knowledge about these actions. Non-interference properties can be used to formalize confidentiality of workflow data towards external attackers. However, non-interference properties are not suitable for properties within a workflow where a partner might know about the occurrence of some or all workflow actions but might not be allowed to know the values of all parameters in the workflow data. The notion of parameter confidentiality (Gürgens, et al., 2005) is more suitable to formalize confidentiality requirements for workflows.

3.1.4 Enforcement of sequences

Security of a workflow very often depends on the order of actions in the workflow. A particular action can depend on a number of other actions that occurred before, or a particular binding phase can only be finished if all goals of the involved partners are satisfied. The lack of central control in distributed workflows increases the complexity of the problem.
It might be even impossible to achieve the most efficient decentralization if control has to be given to an un-trusted entity and if it is necessary to actually prevent a deviation from the workflow specification. Additional communication between partners might be required to release confidential data that is necessary to continue with the workflow. A realization is easier if it is satisfactory to detect violations of the requirement after they have occurred. Many of these different combinations of security requirements can be realized by the electronic process slip.
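The weaker variant mentioned above, detecting sequence violations after they have occurred, can be sketched as a check of a recorded execution order against declared prerequisites. This is an illustrative assumption about how such a check might look, not part of the process slip specification; the action names and the function `find_violations` are hypothetical.

```python
def find_violations(executed, prerequisites):
    """Return (action, missing) pairs for every action that was executed
    before one or more of its prerequisite actions."""
    seen = set()
    violations = []
    for action in executed:
        missing = sorted(prerequisites.get(action, set()) - seen)
        if missing:
            violations.append((action, missing))
        seen.add(action)
    return violations


# Hypothetical workflow: payment requires approval and verification.
prerequisites = {
    "approve": {"submit"},
    "pay": {"approve", "verify"},
}
ok = find_violations(["submit", "verify", "approve", "pay"], prerequisites)
bad = find_violations(["submit", "pay", "approve", "verify"], prerequisites)
```

Such a check can only be trusted if the recorded order itself is authentic, which is why the audit entries in the process slip are signed.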

3.2 Requirements for forensic readiness

The original electronic process slip uses formal representations of security requirements and cryptography (digital signatures, asymmetric and symmetric encryption) to enforce security policies for distributed workflows and to document workflow instances. However, this process slip can only be considered secure if all computing platforms (e.g. cloud platforms) are secure and trusted and all humans involved act in compliance with regulations. Technical enforcement cannot be based only on security mechanisms on the level of service and cloud interfaces. For forensic readiness, the information in the process slip needs to satisfy the requirements for secure digital evidence. Currently, there is no generally accepted notion of secure evidence. However, one possible definition of Secure Digital Evidence was proposed by Rudolph et al. (Rudolph, et al. 2012). It states that a data record can be considered secure if it was created authentically by a device for which the following holds:

• The device is physically protected to ensure at least tamper-evidence.

• The data record is securely bound to the identity and status of the device (including running software and configuration) and to all other relevant parameters (such as time, temperature, location, users involved, etc.).

• The data record has not been changed after creation.

These requirements can also be applied to cloud scenarios.
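The second and third conditions can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the cited definition's mechanism: the device status is reduced to a byte string, and a keyed hash (HMAC from the Python standard library) stands in for a signature created inside protected hardware. Unlike a real digital signature, an HMAC uses a shared key and therefore does not provide non-repudiation. Physical protection, the first condition, cannot be shown in code. All names are hypothetical.

```python
import hashlib
import hmac
import json


def create_record(payload, device_id, device_state, device_key):
    """Bind the payload to the device identity and state, then authenticate it."""
    body = {
        "payload": payload,
        "device_id": device_id,
        "state_digest": hashlib.sha256(device_state).hexdigest(),
    }
    encoded = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(device_key, encoded, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_record(record, device_key, expected_state):
    """Check that the record is unchanged and was created in the expected state."""
    encoded = json.dumps(record["body"], sort_keys=True).encode()
    tag = hmac.new(device_key, encoded, hashlib.sha256).hexdigest()
    unchanged = hmac.compare_digest(tag, record["tag"])
    state_ok = (record["body"]["state_digest"]
                == hashlib.sha256(expected_state).hexdigest())
    return unchanged and state_ok


key = b"demo-device-key"  # placeholder; a TPM would keep its key inside the chip
record = create_record("step 3 completed", "server-17", b"os=v1;cfg=a", key)
valid = verify_record(record, key, b"os=v1;cfg=a")
```

Verification fails if the payload is altered after creation or if the record was produced while the device was in a different software or configuration state.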

4. The secure process slip

Traditional paper-based workflows use signed reports carrying the relevant information. These reports are produced wherever it is required to grant the owner access to certain services. The original version of the electronic process slip applies this token-based concept to digital business processes by establishing a trustworthy data structure as a technical basis for the workflow. There is no central authority in a decentralised workflow. Thus, it is required to distribute the control of the workflow among the managers which are involved in the single (sub-)workflows. To link the parts, a control token travels from manager to manager; it is a digital form of a report describing the execution of a workflow process. It transfers data between the managers and stores their digital signatures as proofs for the executed tasks. Moreover, the workflow uses the process slip to provide additional security benefits to these execution records, such as restricted access, integrity, or non-repudiation. The process slip must be generated prior to the initiation of each instance of the workflow. The original process slip structure contains four sub-containers (see Figure 1): Data, Audit Data, Security Policies and Workflow Description.

Figure 1: Data components of a process slip 

• Data - This sub-container stores input data transferred between the steps of the process. Therefore, these data can vary from step to step. This leads to a structured container format that supports addressing the data to certain steps or participants of the workflow. Depending on the required security level, some (or all) data can be encrypted and signed for the corresponding receiver to ensure confidentiality and integrity of the data. Access control to the encrypted data is regulated by using, for example, tickets, a Public Key Infrastructure (PKI), identity management systems, or pre-shared keys. As an alternative to embedding the data in the sub-container, it is also possible to include an indirection to an alternative source providing the data. This could be implemented by using a ticket scheme.

• Audit Data - In the audit trail data sub-container, each involved manager should write log data for the recently performed tasks, as required by the workflow or security definition. Depending on the type of service, audit data contains TPM-based digital signatures providing reliable information on the identity and the status (software running, configuration, hardware, etc.) of the server or servers performing the tasks. Each service is only allowed to add process information and documentation data relevant for the service's assignments and their output. The documentation data is encrypted for special recipients like a central verification unit or other services, again to fulfill the minimal need-to-know principle. Additionally, each report entry in this section must be signed by its author to guarantee the integrity of the written information. By adding an authenticated log of each step to the final result, it is later possible to track the process and assign responsibilities. This provides non-repudiation as one main advantage.

• Security Policy - Security policies are kept in a separate data structure which is logically linked to the workflow definition. They specify security boundaries for each step or sequence of steps in the workflow, such as the permitted activities that a given partner can apply to specified data and the execution order of the assignments in the workflow. Each manager has to be able to interpret and enforce these policies accordingly, as there is no central authority which guarantees it. To ensure that each service has access only to the policy rules needed to execute the assigned tasks and to fill in the information needed by the next partners, the initiator could encrypt the rules for the corresponding authorized service. By encrypting the rules, each step in the workflow is limited to the absolute minimum of knowledge regarding the workflow it is working in. Additional data structures are required to support the creation of audit data in this case.



• Workflow Description - The whole process is controlled by a workflow description in the container. This static data structure is not changed during the execution of the workflow and can either be a full description of the workflow or a reference to a location where the definition is kept. Depending on the required security level, the partner services may have access only to the description of their own activities. This implies that during the generation of the container some (or all) parts of the workflow description must be encrypted for the corresponding services, so that they are able to invoke the corresponding actions but do not have access to the activities of the other partners. Authentication and integrity may be added by XML Signature, XML Security, or WS-Security.

Whenever a web service or a cloud service participating in the composition receives a process slip container, it performs the following tasks. First, it verifies that the sender is an authorized partner. Second, it extracts from the slip the audit data of the sender and its security policy in order to check whether all requirements for its current assignments are met. It then decrypts the needed input data, invokes the corresponding operations and fulfils the assignments. When all operations have terminated, it modifies the embedded process slip. Finally, it sends the modified slip container to the partner(s) associated with the next control structure(s).
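The handling steps above can be sketched as follows. This is a deliberately simplified illustration, not the data format of the original process slip: the four sub-containers are plain Python fields, the workflow is a linear list of service names, the policy only lists permitted senders, encryption is omitted, and an HMAC stands in for each service's digital signature. The class `ProcessSlip` and the functions `sign` and `handle` are hypothetical names.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


@dataclass
class ProcessSlip:
    workflow: list                      # workflow description: ordered service names
    policies: dict                      # security policy: service -> permitted senders
    data: dict = field(default_factory=dict)    # data sub-container
    audit: list = field(default_factory=list)   # audit data sub-container


def sign(key, entry):
    """Stand-in for a digital signature over one audit entry."""
    encoded = json.dumps(entry, sort_keys=True).encode()
    return hmac.new(key, encoded, hashlib.sha256).hexdigest()


def handle(slip, service, sender, key, operation):
    """Check the sender, run the operation, log it, and return the next partner."""
    if sender not in slip.policies.get(service, set()):
        raise PermissionError(f"{sender} may not invoke {service}")
    slip.data[service] = operation(slip.data)
    entry = {"service": service, "step": len(slip.audit),
             "output": slip.data[service]}
    slip.audit.append({"entry": entry, "signature": sign(key, entry)})
    position = slip.workflow.index(service)
    if position + 1 < len(slip.workflow):
        return slip.workflow[position + 1]
    return None  # last step of the workflow


slip = ProcessSlip(
    workflow=["order", "billing"],
    policies={"order": {"client"}, "billing": {"order"}},
    data={"amount": 40},
)
next_service = handle(slip, "order", "client", b"order-key",
                      lambda d: d["amount"] * 2)
last = handle(slip, "billing", "order", b"billing-key",
              lambda d: d["order"] + 1)
```

After both steps the slip carries one signed audit entry per service, so a verifier holding the keys can later check who produced which output and in which order.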

5. Extending the process slip for forensic readiness in cloud processes

One approach to extending the process slip is to combine the service-level view of the slip with lower-level security mechanisms, in particular hardware-based security. In the design for a forensic-ready process slip we assume that all servers involved are equipped with efficient ways to use a technology that securely identifies the particular server and provides reliable information on the current status of the server including software running, configuration, etc. Trusted Computing as defined by the Trusted Computing Group (TCG, http://www.trustedcomputinggroup.org/) is a suitable technology that, however, needs to be refined to support the possibly massively parallel use required by cloud servers. TPM chips can store information on the current status in so-called platform configuration registers (PCRs) and report this information in a reliable way with digital signatures created inside the chip with a key generated by the chip. Further, TPMs can provide (optionally pseudonymous) identities for servers that cannot be manipulated by software attacks. Keys cannot be used on other platforms, and usage of encryption keys can be bound to particular values in the PCRs. Thus, encrypted data can only be decrypted on the correct platform and if the platform is in the correct state. Using so-called certified migratable keys, the transfer of data can be restricted to platforms with known TPM-based identities and controlled by migration authorities. Efficient real hardware-based trusted computing based on current TPM version 1.2 implementations probably requires a cluster of TPM chips supporting one single server. Key distribution can also use the security functions of the TPM. Cloud servers can be identified using the so-called endorsement key that is unique for a TPM. Then, client-specific or even process-specific attestation identity keys can be generated on the fly and used in the process.
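How PCR values come to represent a platform state, and how key usage can depend on them, can be simulated in software. The sketch below is only a model: a real TPM performs these operations inside the chip and actually encrypts sealed data. The extend rule shown (new value = SHA-1 of the old value concatenated with the measurement digest, starting from 20 zero bytes) follows the TPM 1.2 design; the component names and the functions `boot` and `unseal` are hypothetical.

```python
import hashlib


def extend(pcr, measurement):
    """TPM 1.2 style extend: new PCR = SHA-1(old PCR || SHA-1(measurement))."""
    return hashlib.sha1(pcr + hashlib.sha1(measurement).digest()).digest()


def boot(components):
    """Measure components in order, starting from an all-zero 20-byte register."""
    pcr = bytes(20)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def unseal(sealed, current_pcr):
    """Release the secret only if the platform is in the state it was sealed to."""
    if current_pcr != sealed["pcr"]:
        raise PermissionError("platform state does not match sealed PCR value")
    return sealed["secret"]


good = boot([b"bootloader-1.0", b"kernel-3.8", b"workflow-service-2.1"])
sealed = {"pcr": good, "secret": b"session key"}
tampered = boot([b"bootloader-1.0", b"kernel-3.8",
                 b"workflow-service-2.1-patched"])
```

Because each extend hashes the previous value, the final register depends on every measured component and on their order, so a modified service yields a different value and the sealed data stays inaccessible.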

5.1 TPM-based Extensions for the Electronic Process Slip

Hardware-based extensions to the process slip are relevant for all four parts of the process slip, namely data, audit data, security policy, and workflow description.

Data - The use of hardware-based security in the data slot can strengthen the enforcement of the protection of data exchanged within the process. Parts of the data can be bound to a particular target platform. Thus, even when malicious entities can get hold of access credentials it is impossible to decrypt confidential data without actually using the TPM on the target platform for the decryption. Also the origin of data can be related to a particular platform when a TPM is used to sign or tick-stamp data in the process. TPM-based tick-stamps deliver relative time information that can also be used to stamp data with real time information when at least one tick value of the current boot cycle has been synchronized with a real-time clock. This information shall then be used in subsequent steps in the workflow to decide on the correct next steps.


Audit Data - Information in the audit data slot is most relevant for increasing the forensic readiness of the documentation of distributed workflows. The original process slip already keeps track of digital signatures that document core steps of the workflow. However, as the users responsible for these signatures have no direct control over the cloud servers involved in the distributed workflow, they cannot verify that those steps have actually been executed as defined in the workflow description. Time stamps and attestation signatures documenting the status and identity of the platform can enrich the audit track. This information is central to workflow audit information. It clearly shows how the workflow was distributed and which contributing servers were in the correct state. In addition to the actual workflow documentation, audit data also needs to include documentation of auxiliary protocols such as key distribution protocols, remote attestation for the documentation of the status of the cloud servers, and authorisation protocols such as ticket-based protocols for access control (e.g. Kerberos).

Security Policy - The extension of the security policies needs to include all information on interactions with the TPM. It is also linked to the workflow description. In addition to the original process slip, the policy also needs to detail which platforms and servers in the workflow need to be securely identified and where and when reliable time information and status information needs to be included in the audit track.


Workflow Description - The functional part of the workflow should in principle remain unchanged from the original version. However, when TPMs or other hardware-based security shall be used the workflow description needs to be extended by segments that define the interaction with the TPM. The details depend on how integrated TPM-based parts are with the actual functional segments of the distributed workflow. Also, decisions on provider (e.g. cloud provider) can depend on the security functionality available. Therefore, these steps need to be described as part of the workflow description and need to be closely linked to the security policies.

All-in-all, the integration of data from hardware-based security into the data structures of the process slip is more-or-less straightforward. However, practical realisations require standardized interfaces for TPM access in distributed workflows and adequate efficient implementations available in cloud infrastructures.
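The tick-stamp synchronization described for the data slot can be illustrated with a small sketch. It assumes that one tick value of a boot cycle has been paired with a trusted real-time reading, so that any other tick value of the same cycle can be mapped to real time. The `TickStamp` class, its `session` field, and the `seconds_per_tick` parameter are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickStamp:
    session: str  # identifies the boot cycle the tick counter belongs to
    ticks: int    # relative tick counter value


def to_real_time(stamp, sync_stamp, sync_unix_time, seconds_per_tick):
    """Map a relative tick stamp to real time via one synchronized tick value."""
    if stamp.session != sync_stamp.session:
        # Tick values from another boot cycle cannot be related to this sync point.
        raise ValueError("tick values belong to different boot cycles")
    return sync_unix_time + (stamp.ticks - sync_stamp.ticks) * seconds_per_tick


sync = TickStamp("boot-1", 1000)
signed_step = TickStamp("boot-1", 1500)
real_time = to_real_time(signed_step, sync, 1_000_000.0, 0.5)
```

A stamp taken after a reboot carries a different session identifier and is rejected, which mirrors the requirement that the synchronized tick value must stem from the current boot cycle.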

6. Using Process Slip Information During and After Workflow Execution

Very often, collecting audit data means additional effort and cost. In the case of distributed workflows the situation is slightly different. Admittedly, the audit data slot itself is of limited use during the actual execution of the workflow. However, the complete process slip has the potential to improve management and efficiency as well as security at run-time for distributed workflows. Many security requirements are obvious, and requirements like non-repudiation or confidentiality in particular need cryptography in the workflow even without an explicit process slip. The TPM-enhanced process slip can use the lightweight public key infrastructure and the key generation and protection functions of the TPM. Enforcement of security properties or run-time monitoring of service level agreements can be supported by TPM-enhanced electronic process slips. One example is the trust that cloud users need to put into the cloud provider. Various conditions can be fixed in a contract between user and cloud provider. For example, the provider can guarantee in the contract that only cloud servers with a particular set of properties are used or that some workflows are restricted to cloud servers located in a particular country. The set of suitable servers can be identified in the contract. However, cloud users currently have no means to verify that the cloud provider always complies with these conditions. Furthermore, in the case of violations the user has no means to collect evidence showing these violations. Hardware-based digital signatures using non-migratable TPM keys can be used to build suitable chains of evidence showing compliance. Workflow management systems (WFMS) can make use of process slips at run-time to control the process as well as for documentation and audit trails for compliance.
In a centralized workflow with services running within the perimeter of a company network, a WFMS might have sufficient access to all steps of the workflow to collect information for audit trails. In a decentralized workflow this direct access is no longer possible. The WFMS can use the process slip to create audit trails for the complete workflow even when large parts of the workflow are completely moved to the cloud. Depending on the character and type of the actual application and workflow, various other uses of the process slip for auditing and forensic evaluation are possible.

7. Conclusion

Distributing workflow execution offers advantages for two key characteristics of WFMSs, availability and information governance, because it removes the central entity that is in charge of the workflow execution. Hardware-based security mechanisms provide reliable information for the audit trail, which increases the forensic readiness of this audit information. A proper notion of digital evidence requires such reliable information and secure identification of the components (e.g. cloud servers) involved in the workflow.

The security treatment (e.g. allowing access to data or encrypting data for specific partners) is determined by explicitly defined security policies. Thus the proper specification of security policy rules and their systematic generation are crucial for the fulfilment of the specified security requirements. The generation process for security policies should therefore be able to ensure a proper level of security in the common case. Automated processes between machines are increasingly required by industry to meet the trend towards highly flexible environments. In this machine-to-machine (M2M) case each entity performs actions on behalf of its owner. By combining the presented concept with hardware-based security concepts, signatures stating the integrity of each entity become possible, which opens the way for non-repudiation in M2M processes. Until now little effort has been dedicated to the verification of modelled business processes. For example, there is no support to detect possible deadlocks, to detect parts of the process that are not viable, or to verify specific security requirements. Using asynchronous product automata (APA) and the simple homomorphism verification tool (SHVT) (Ochsenschläger, et al., 2007) (Ochsenschläger, et al., 2000) (Ochsenschläger, et al., 2000), developed by Fraunhofer SIT, formal specification, simulation and verification of security policies for process slips and audits are possible for distributed and cloud-based workflows.

References

Alonso G. et al. (2004) Web Services: Concepts, Architecture and Applications. Springer Verlag.
Alves A. et al. (2007) Web Services Business Process Execution Language Version 2.0.
Anstett T. et al. (2009) Towards BPEL in the cloud: Exploiting different delivery models for the execution of business processes. World Conference on Services, pp. 670-677.
Baouab, A. (2013). Gouvernance et supervision décentralisée des chorégraphies inter-organisationnelles (Doctoral dissertation, Université de Lorraine).
Baouab, A., Perrin, O., & Godart, C. (2011). An event-driven approach for runtime verification of inter-organizational choreographies. In Services Computing (SCC), 2011 IEEE International Conference on (pp. 640-647). IEEE.
Bertholon, B., Varrette, S., & Bouvry, P. (2011, July). Certicloud: a novel TPM-based approach to ensure cloud iaas security. In Cloud Computing (CLOUD), 2011 IEEE International Conference on (pp. 121-130). IEEE.
Bilal M., Thomas J.P., Mathews T. and Subil A. (2005) Fair BPEL Processes Transaction using Non-Repudiation Protocols. SCC '05: Proceedings of the 2005 IEEE International Conference on Services Computing.
Biskup J. et al. (2007) Towards Secure Execution Orders for Composite Web Services. Web Services, 2007. ICWS 2007. IEEE International Conference on, pp. 489-496.
Chappell D.A. (2004) Enterprise service bus.
Charfi A. and Mezini M. (2005) An aspect-based process container for BPEL. AOMD '05: Proceedings of the 1st workshop on Aspect oriented middleware development.
Charfi A. and Mezini M. (2005) Using Aspects for Security Engineering of Web Service Compositions. ICWS. IEEE Computer Society.
Gürgens S., Ochsenschläger P. and Rudolph C. (2005) Abstractions preserving parameter confidentiality. European Symposium On Research in Computer Security (ESORICS 2005).
Gürgens S., Ochsenschläger P. and Rudolph C. (2005) On a formal framework for security properties. International Computer Standards & Interface Journal (CSI).
Hafner M. et al. (2005) Modelling inter-organizational workflow security in a peer-to-peer environment. Web Services, Proceedings of ICWS 2005.
Hadjichristofi, G. C., & Fugini, M. (2010). A dynamic web service-based trust and reputation scheme for scientific workflows. In Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services (pp. 56-63). ACM.
Hofreiter B. and Huemer C. (2004) Transforming UMM Business Collaboration Models to BPEL. OTM Workshops.
Laddad R. (2003) AspectJ in Action: Practical Aspect-Oriented Programming.
Langworthy D. (2004) WS-BusinessActivity.
Langworthy D. (2004) WS-Transaction.
Lim, H. W., Kerschbaum, F., & Wang, H. (2012). Workflow Signatures for Business Process Compliance. Dependable and Secure Computing, IEEE Transactions on, 9(5), 756-769.
Mendling J. and Hafner M. (2005) From Inter-organizational Workflows to Process Execution: Generating BPEL from WSCDL. OTM Workshops.
Montagut F. and Molva R. (2007) Enforcing integrity of execution in distributed workflow management systems. SCC 2007, 4th IEEE International Conference on Services Computing.
Myers M. et al. (1999) X.509 Internet Public Key Infrastructure Online Certificate Status Protocol - OCSP. IETF.
OASIS (2004) WS-Reliability.
OASIS (2004) OASIS WS-Security Version 1.0.
Ochsenschläger P., Repp J. and Rieke R. (2000) Abstraction and composition -- a verification method for co-operating systems. Journal of Experimental and Theoretical Artificial Intelligence.
Ochsenschläger P. and Rieke R. (2007) Abstraction Based Verification of a Parameterised Policy Controlled System. Computer Network Security, Fourth International Conference on Mathematical Methods, Models and Architectures for Computer Network Security, MMM-ACNS 2007.
Ochsenschläger P., Repp J. and Rieke R. (2000) The SH-Verification Tool. Proc. 13th International Florida Artificial Intelligence Research Society Conference (FLAIRS-2000).
OMG (2000) Workflow Management Facility Specification, V1.2. Object Management Group Inc.
Patel, J. (2013). Forensic Investigation Life Cycle (FILC) using 6 ‘R’ Policy for Digital Evidence Collection and Legal Prosecution. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS).
Piechalski J. and Schmidt A.-U. (2006) Authorised Translations of Electronic Documents. Proceedings of the ISSA 2006 From Insight to Foresight Conference / Venter H. S. [et al.]. Information Security South Africa (ISSA).
Rudolph, C., Kuntze N., Alva, A., Endicott-Popovsky, B., Christiansen, J., Kemmerich, T. (2012). On the creation of reliable digital evidence. In S. Shenoi. IFIP WG 11.09 (Eds.), Advances in Digital Evidence VIII.
Rudolph C., Velikova Z. and Kuntze N. (2009) Secure Web Service Workflow Execution. VODCA 2008, Electronic Notes in Theoretical Computer Science, Vol. 236, pp. 33-46.
Santos, N., Rodrigues, R., Gummadi, K. P., & Saroiu, S. (2012). Policy-sealed data: A new abstraction for building trusted cloud services. In Usenix Security.
Sheng, L., Gang, X., Dong, F., & Jianshi, Y. (2011). An optimized modeling method based on BPEL4WS. In Service Operations, Logistics, and Informatics (SOLI), 2011 IEEE International Conference on (pp. 57-61). IEEE.
Sun S., Kumar A. and Yen J. (2006) Merging workflows: a new perspective on connecting business processes. Decis. Support Syst. Amsterdam, Elsevier Science Publishers B. V.
Tatsubori M., Imamura T. and Nakamura Y. (2004) Best-Practice Patterns and Tool Support for Configuring Secure Web Services Messaging. Proceedings of the IEEE International Conference on Web Services. Washington, DC, USA: IEEE Computer Society.
Vagts H.-H. (2007) Control Flow Enforcement in Workflows in the Presence of Exceptions. Master's thesis, TU Darmstadt.
Wei Y. and Blake M.B. (2010) Service-oriented computing and cloud computing: Challenges and opportunities. Internet Computing, IEEE, Vol. 14, No. 6, pp. 72-75.
Yan J., Yang Y. and Raikundalia G. K. (2006) SwinDeW - A p2p-Based Decentralized Workflow Management System. IEEE Transactions on Systems, Man, and Cybernetics Part A - Systems and Humans, pp. 922-935.

A Quantitative Threat Modeling Approach to Maximize the Return on Security Investment in Cloud Computing

Andreas Schilling and Brigitte Werners
Faculty of Management and Economics, Ruhr-University, Bochum, Germany
andreas.schilling@rub.de

Abstract: The number of threats to cloud-based systems increases and likewise does the demand for effective approaches to assess and improve security of such systems. The loss, manipulation, disclosure, or simply the unavailability of information may lead to expenses, missed profits, or even legal consequences. This implies the need for effective security controls as well as practical methods to evaluate and improve cloud security. Due to the pervasive nature of cloud computing, threats are not limited to the physical infrastructure but permeate all levels of an organization. Most research in cloud security, however, focuses on technical issues regarding network security, virtualization, data protection, and other related topics. The question of how to evaluate and, in a second step, improve organization-wide security of a cloud has been subject to little research. As a consequence, insecurity remains among organizations regarding protection needs of cloud-based systems. To support decision makers in choosing cost-effective security controls, a stochastic cloud security risk model is introduced in this paper. The model is based on the practical experience that a threat agent is able to penetrate web-based cloud applications by successfully exploiting one of many possible attack paths. Each path originates from the combination of attack vectors and security weaknesses and results, if successfully exploited, in a negative business impact. Although corresponding risks are usually treated by an organization in its risk management, existing approaches fail to evaluate the problem in a holistic way. The integrated threat model presented in this paper leverages quantitative modeling and mathematical optimization to select security controls in order to maximize the Return on Security Investment (ROSI) according to the complete threat landscape. The model is designed to be applied within the framework of an existing risk management and to quantify security risks using expert judgment elicitation. The results indicate that already small security investments yield a significant risk reduction. This characteristic is consistent with the principle of diminishing marginal utility of security investments and emphasizes the importance of profound business decisions in the field of IT security.

Keywords: cloud computing, threat modeling, return on security investment, risk management

1. Introduction Cloud computing is a computing paradigm which is the result of an evolution in computing and information technology. It leads to enhancements in collaboration, agility, scaling, and availability, enabled by centralized and optimized computing resources (Cloud Security Alliance, 2011). By pooling resources it becomes possible to achieve significant cost savings and enable convenient and on-demand network access. Services offered cover the entire information technology (IT) spectrum and include servers, storage, applications, services, or entire networks (Mell & Grance, 2011). When providing computing resources as a utility, the payment model changes from a fixed rate to a pay-per-use model. As a result, capital expenditures for IT investments are substantially reduced and transformed into operational expenditures (Vaquero et al., 2008). Many organizations still have concerns about security and privacy of data which are processed or stored in a cloud infrastructure (Armbrust et al., 2009). The reason for this is that in a typical public cloud environment some systems or subsystems are outside of the immediate control of the customer. Many organizations feel more secure when they have greater influence on security aspects and may intervene if they deem it necessary (Jansen & Grance, 2011). In cloud computing this freedom is reduced and shifted to a certain degree to the service provider. How much control is left depends on the service delivery model and, in case of Software as a Service (SaaS), is very limited. Although this characteristic of cloud services seems problematic at first, it has the potential to strengthen security. A specialized data center operator is usually better able to establish and maintain a high security infrastructure than most organizations could do. 
Despite the fact that potential consumers are raising numerous security concerns, the specialization and standardization trend of cloud service providers potentially increases security over the long term. However, the assumption that security is always taken care of in cloud computing would be very dangerous. Just like most traditional solutions, cloud services still face several threats which arise from the integration into an organization’s IT infrastructure. An understanding of the context in which an organization operates is crucial to determine if risks associated with specific cloud services are acceptable. Default offerings may not be suitable to meet an organization’s security needs. In such a case, additional security controls can be deployed to reduce the risk to an acceptable level (Jansen & Grance, 2011). In the following, a novel approach to evaluate and increase the security of cloud-based systems is proposed. The corresponding model is designed to support the decision maker in selecting cost-effective security controls with respect to the uncertainty of threats. The approach is based on the principle of attack paths to identify how a threat arises. By applying quantitative modeling, it is possible to calculate business impacts and derive optimal investment decisions. The simulation of realistic threat scenarios shows that the proposed model can significantly reduce damages resulting from security incidents.

2. Related work

Security challenges in cloud computing have been addressed by several authors and organizations. According to the Cloud Security Alliance (2011), cloud applications face threats which go beyond what traditional applications are exposed to. This underlines the importance of enterprise risk management to identify and implement appropriate organizational structures, processes, and controls. They also state that not only the service provider but also the customer is responsible for security. Jansen & Grance (2011) surveyed common problems in cloud computing and published their results as guidelines on security and privacy in cloud computing. According to these guidelines, insufficient security controls of the cloud provider may have a negative impact on the confidentiality, privacy, or integrity of the customers’ data. As a consequence, an organization can “employ compensating security and privacy controls to work around identified shortcomings”. Tsiakis (2010) analyzed the problem of determining appropriate security expenditures and emphasizes the importance of security measurement. Although qualitative approaches can contribute to this evaluation, they do not allow a solid cost/benefit analysis. A quantitative approach, on the other hand, can provide concrete measures such as the Return on Investment (ROI) and gives a clear indication to the decision maker. Sonnenreich et al. (2006) propose a practical quantitative model which is based on ROI but takes the specifics of information security into account. The same idea is also shared by Böhme & Nowey (2005). To support the understanding of the economics of security investments in general and in cloud computing, a first model has been proposed by Ben Aissa et al. (2010). The model is designed to calculate the mean failure cost by taking into account requirements of different stakeholders, multiple system components, and a threat vector. Rabai et al. (2013) apply this model directly to cloud computing, which represents a first approach to quantify and measure risks in economic terms. However, the model only evaluates the current state of the system and provides no decision support. In addition, it is not possible to derive any information on how to improve security.

3. The structure of security threats in cloud computing

The following model is designed to support the establishment of cost-effective security controls in cloud computing and is based on a component-based view of security controls. This approach is motivated by the fact that, although security controls can be deployed individually, they do not work in isolation. Each control affects overall security in a specific way and only the conjunction of all implemented controls reflects the actual state of security (Sonnenreich et al., 2006). A modular approach to system design in general is well known and there are several modular engineering methods available to support profound design decisions. In addition, by encapsulating individual parts of a system it becomes easier to acquire and to utilize expert information about certain parts of the overall system (Levin & Danieli, 2005). In cloud computing the customer is not involved in the administration and maintenance of the infrastructure and hence has little to no control over its configuration. As a consequence, risk assessments can focus on direct risks originating from the application and organizational layer. Threats in this regard have a flatter structure and may directly exploit a vulnerability to cause damage. An organization only needs to invest in security to complement the efforts of its service provider. According to the Open Web Application Security Project (OWASP), a threat arises in the form of a path through an application or system, starting from an attacker and resulting in a negative business impact. This view of threats seems appropriate, as most cloud applications are browser-based and are used directly by the end user from a workstation or mobile device. To strengthen the security of such a system, it is necessary to identify each path and evaluate its probability and impact. When combined, these factors determine the overall risk of a security breach (Chen et al., 2007; Open Web Application Security Project, 2013). Figure 1 illustrates this concept.

Figure 1: The concept of attack paths to assess information security risks constitutes a multi-stage attack model. Source: Figure based on Open Web Application Security Project (2013)

To reduce the success probability of threats, security controls can be implemented which affect corresponding vulnerabilities. By applying the introduced threat model, it is now possible to consider all identified attack paths and choose the most effective controls. To decide on investments, the management requires a concrete cost-benefit analysis, including a comprehensible measurement of security. In addition, there normally is a conflict between the objective to lower security investments and the objective to achieve the highest possible degree of security. To resolve this conflict, the Return on Security Investment (ROSI), initially presented by Berinato (2002), is used as the decision criterion. ROSI is a practical measure to calculate the Return on Investment (ROI) for security solutions. It is inspired by the original ROI where the cost of a purchase is weighed against its expected returns:

\[
\mathrm{ROI} = \frac{\text{Expected returns} - \text{Cost of investment}}{\text{Cost of investment}} \tag{1}
\]

To calculate the ROSI, the equation has to be modified to reflect that a security investment does not yield any profit. The expected return is replaced by the monetary loss reduction and leads to the following definition (Berinato, 2002; European Network and Information Security Agency, 2012):

\[
\mathrm{ROSI} = \frac{\text{Monetary loss reduction} - \text{Cost of solution}}{\text{Cost of solution}} \tag{2}
\]

A common measure of monetary loss is the annual loss expectancy (ALE). It is calculated by multiplying the single loss expectancy (SLE) by the annual rate of occurrence (ARO) (Bojanc & Jerman-Blažič, 2008):

\[
\mathrm{ALE} = \mathrm{SLE} \cdot \mathrm{ARO}. \tag{3}
\]

Both values, SLE and ARO, are difficult to obtain because reliable historical data do not exist or are not available for use. This is because few companies track security incidents and, even if they do, the data are often inaccurate due to unnoticed incidents or inaccurate quantification of damages. To address this problem, Ryan et al. (2012) demonstrated the feasibility of applying expert judgment elicitation to the field of IT security. Their work shows that the utilization of expert judgment is particularly useful when quantitative data is missing or of insufficient quality. The following model, therefore, relies on experts who have significant experience with the technologies and systems in question. It should be noted that possible inaccuracies in the risk estimation cannot be completely eliminated using this approach. However, Sonnenreich et al. (2006) state that an inaccurate scoring can be effective if it is repeatable and consistent. This means that if different investment decisions are compared based on the same input parameters, the resulting outcome may not be perfectly accurate in terms of financial figures, but the evaluation of different investment strategies can provide a coherent basis for decision-making. The following model provides such an estimation in a consistent way to assess different investment decisions and even determines an optimal one with respect to ROSI.
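As a small worked example of equations (2) and (3), the following Python sketch computes the ALE before and after a control is introduced and derives the ROSI from the resulting loss reduction. All figures are hypothetical expert estimates chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
def annual_loss_expectancy(sle, aro):
    """Equation (3): ALE = SLE * ARO."""
    return sle * aro


def rosi(monetary_loss_reduction, cost_of_solution):
    """Equation (2): net loss reduction relative to the cost of the solution."""
    if cost_of_solution <= 0:
        raise ValueError("cost of solution must be positive")
    return (monetary_loss_reduction - cost_of_solution) / cost_of_solution


# Hypothetical estimates, for illustration only.
sle = 40_000.0     # single loss expectancy per incident
aro_before = 0.5   # expected incidents per year without the control
aro_after = 0.125  # expected incidents per year with the control
cost = 5_000.0     # yearly cost of the control

ale_before = annual_loss_expectancy(sle, aro_before)
ale_after = annual_loss_expectancy(sle, aro_after)
print(rosi(ale_before - ale_after, cost))
```

With these numbers the ALE drops from 20000 to 5000, so the loss reduction of 15000 exceeds the cost of 5000 and the script prints 2.0.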


4. Mathematical cloud security model

The uncertainty about the ARO of threats is modeled based on the principle of attack paths. Each path consists of the probability of the threat to emerge, the probability of the threat to exploit specific vulnerabilities, and the chance of a security control to prevent such an event.

4.1 The uncertainty of attack paths

Let $T = (T_1, \ldots, T_I)$ be a multidimensional random variable and let each $T_i$ model an isolated threat which can either be successful or not without respect to its underlying attack path. This probability can be viewed as a measure of the hardness or complexity of an attack. Each $T_i$ is Bernoulli distributed:

\[
T_i \sim B\left(1; p_i^t\right). \tag{4}
\]

A threat requires at least one vulnerability which is suitable to be exploited to cause damage (Open Web Application Security Project, 2013). To model this, for each threat $i$, multiple vulnerabilities are introduced as $V_i = (V_{i1}, \ldots, V_{iJ})$ where each $V_{ij}$ is Bernoulli distributed. This means that threat $i$ exploits vulnerability $j$ with probability $p_{ij}^v$ and fails to exploit vulnerability $j$ with probability $1 - p_{ij}^v$:

\[
V_{ij} \sim B\left(1; p_{ij}^v\right). \tag{5}
\]

It is assumed that each vulnerability $j$ has a number of controls associated with it, which are modeled by $C_j = (C_{j1}, \ldots, C_{jK})$ where each $C_{jk}$ is again Bernoulli distributed:

\[
C_{jk} \sim B\left(1; p_{jk}^c\right). \tag{6}
\]

4.2 Derivation of success probability of threats

To cause damage, a threat has to exploit a suitable vulnerability which may have multiple security controls associated with it. As illustrated in Figure 2, the probability that all security controls fail is the joint probability that each individual control fails, which is

\[
P\left(C_{j1} = 0, \ldots, C_{jK} = 0\right) = \prod_{k=1}^{K} \left(1 - p_{jk}^c\right).
\]

Figure 2: Tree representation of the uncertainty associated with the effectiveness of security controls. (Complementary events are omitted.)

Based on this, the probability $p_{ij}^{ve}$ that threat $i$ causes damage while exploiting vulnerability $j$ is:

\[
p_{ij}^{ve} = p_{ij}^v \cdot \prod_{k=1}^{K} \left(1 - p_{jk}^c\right). \tag{7}
\]
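Equation (7) can be checked numerically with a few lines of Python. The sketch below is only an illustration; the function names and the example probabilities are hypothetical.

```python
import math


def p_all_controls_fail(p_c):
    """Joint probability that every control of a vulnerability fails."""
    return math.prod(1.0 - p for p in p_c)


def p_exploit_with_damage(p_v, p_c):
    """Equation (7): exploitation succeeds and no associated control prevents it."""
    return p_v * p_all_controls_fail(p_c)


# A vulnerability exploited with probability 0.5 and guarded by two controls
# that each prevent the exploit with probability 0.5.
p_ve = p_exploit_with_damage(0.5, [0.5, 0.5])
```

Here `p_ve` evaluates to 0.125. Without any controls the empty product is 1, so the result reduces to the plain exploitation probability.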

Accordingly, the overall probability that a threat occurs and in consequence causes damage is the joint probability that a threat emerges and the event that at least one vulnerability is successfully exploited. The probability that a threat exploits at least one vulnerability can be derived from the probability of the complementary event, which is the event that a threat exploits no vulnerability. From this follows the probability $p_i^{\mathrm{ARO}}$ that a threat occurs once:

\[
p_i^{\mathrm{ARO}} = p_i^t \cdot \left(1 - \prod_{j=1}^{J} \left(1 - p_{ij}^{ve}\right)\right). \tag{8}
\]

The actual ARO is modeled by a multidimensional random variable $T^{\mathrm{ARO}} = \left(T_1^{\mathrm{ARO}}, \ldots, T_I^{\mathrm{ARO}}\right)$. The variable is used to derive for each threat the number of successful occurrences within one year. For this purpose, it is assumed that the expected number of occurrences $n_i$ of a threat within one year can be estimated. Every time a threat emerges, it can either be successful or not with success probability $p_i^{\mathrm{ARO}}$. To model this, $T_i^{\mathrm{ARO}}$ is considered to be binomial distributed ($T_i^{\mathrm{ARO}} \sim B\left(n_i; p_i^{\mathrm{ARO}}\right)$) with $n_i$ number of trials. The corresponding probability mass function $f_{T_i^{\mathrm{ARO}}}$ is given by

\[
f_{T_i^{\mathrm{ARO}}}(s) = \binom{n_i}{s} \left(p_i^{\mathrm{ARO}}\right)^{s} \left(1 - p_i^{\mathrm{ARO}}\right)^{n_i - s}, \quad \text{for } s = 1, \ldots, n_i. \tag{9}
\]
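Equations (8) and (9) can be illustrated with a short Python sketch. The function names and input values are hypothetical. The sketch also evaluates the case of zero successful occurrences so that the probabilities sum to one.

```python
import math


def p_aro(p_t, p_ve):
    """Equation (8): the threat emerges and exploits at least one vulnerability."""
    p_no_exploit = math.prod(1.0 - p for p in p_ve)
    return p_t * (1.0 - p_no_exploit)


def pmf_successes(s, n, p):
    """Equation (9): probability of exactly s successful occurrences in n trials."""
    return math.comb(n, s) * p**s * (1.0 - p) ** (n - s)


# A threat that emerges with probability 0.5 and faces two vulnerabilities,
# each exploitable with damage with probability 0.5.
p = p_aro(0.5, [0.5, 0.5])  # 0.5 * (1 - 0.25)

# Distribution of successful occurrences for n_i = 4 expected occurrences.
dist = [pmf_successes(s, 4, p) for s in range(0, 5)]
expected = sum(s * q for s, q in enumerate(dist))
```

The value of `expected` agrees with the product of the number of trials and the success probability up to floating-point rounding, which is the expected value of a binomial random variable.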

4.3 Determining the annual loss expectancy

Based on these results, the ALE of a threat can be derived from the expected value of $T_i^{\mathrm{ARO}}$ and the SLE of the threat. By definition the expected value of $T_i^{\mathrm{ARO}}$ is $n_i \cdot p_i^{\mathrm{ARO}}$ and the SLE can be estimated or approximated by any suitable random variable and its corresponding distribution (e.g., normal distribution). Let $\mathrm{E}[\cdot]$ represent the expected value; then the ALE of all threats is:

\[
\mathrm{ALE} = \sum_{i=1}^{I} \mathrm{E}\left[\mathrm{SLE}_i\right] \cdot n_i \cdot p_i^{\mathrm{ARO}}. \tag{10}
\]
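Equation (10) amounts to a weighted sum over all threats, as the following Python sketch shows. The dictionary keys and the example values are hypothetical.

```python
def total_ale(threats):
    """Equation (10): sum of E[SLE_i] * n_i * p_i^ARO over all threats."""
    return sum(t["expected_sle"] * t["n"] * t["p_aro"] for t in threats)


# Two illustrative threats with made-up estimates.
threats = [
    {"expected_sle": 1_000.0, "n": 4, "p_aro": 0.25},
    {"expected_sle": 200.0, "n": 10, "p_aro": 0.5},
]
ale = total_ale(threats)
```

Each threat contributes 1000 in this example, so `ale` is 2000.0.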

4.4 Model formulation

The introduced understanding of security controls as individual components leads to the problem of selecting the most appropriate ones. The results on attack paths are now being used as basis for a novel approach to model cloud security. Properties of threats, vulnerabilities, and security controls are combined in accordance with the definition of ALE to form the foundation of the ROSI calculation (2). To derive the decision criterion, the ALE as introduced in (3) is used to calculate ROSI. It is assumed that the initial investment and the yearly maintenance costs of each security control can be estimated. Let $\lambda$ be the planning period in years, then the cost of solution can be computed as

\[
\text{Cost of solution} = \sum_{k=1}^{K} c_k^0 \cdot sc_k + \lambda \cdot \sum_{k=1}^{K} c_k^y \cdot sc_k \tag{11}
\]

with $c_k^0$ being the amount of the initial security investment of control $k$ and $c_k^y$ being the yearly maintenance cost of the same control. The decision variable $sc_k \in \{0,1\}$ indicates whether a control is selected ($sc_k = 1$) or not ($sc_k = 0$).

The monetary loss reduction is the difference between the upper bound ALEU , which is the worst case scenario in terms of financial damage, and the ALE to be optimized. The resulting objective function (12) expressing the ROSI is to be maximized. The corresponding deterministic counterpart of the described stochastic model utilizes the mathematical expectation of all random variables and is referred to as the Security Controls Selection Problem (SCSP) in the following.

Indices and sets
$I$: Index set of threats (index $i$)
$J$: Index set of vulnerabilities (index $j$)
$K$: Index set of security controls (index $k$)

Parameters
$ALE^L$: Lower bound annual loss expectancy
$ALE^U$: Upper bound annual loss expectancy
$n_i$: Number of occurrences of threat $i$ within one year
$p_i^{t}$: Probability that threat $i$ is successful
$p_{ij}^{v}$: Probability that threat $i$ is successfully exploiting vulnerability $j$
$p_{jk}^{c}$: Probability that control $k$ prevents a threat from exploiting vulnerability $j$
$c_k^{0}$: Initial security investment for security control $k$
$c_k^{y}$: Yearly security investment for security control $k$
$\delta$: Maximum deviation from best case in percent
$\lambda$: Planning period in years
$SLE_i$: Single loss expectancy of threat $i$

Decision variables
$ALE$: Annual loss expectancy to be optimized
$sc_k$: Selection of security control $k$ to be established, $sc_k \in \{0,1\}$
$C$: Cost of solution corresponding to the current solution

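(SCSP)

$$\max \; \frac{ALE^U - ALE - C}{C} \qquad (12)$$

subject to

$$ALE = \sum_{i=1}^{I} E[SLE_i] \cdot n_i \cdot p_i^{t} \cdot \left(1 - \prod_{j=1}^{J} \left(1 - p_{ij}^{v} \cdot \prod_{k=1}^{K} \left(1 - p_{jk}^{c} \cdot sc_k\right)\right)\right) \qquad (13)$$

$$ALE^L = \sum_{i=1}^{I} E[SLE_i] \cdot n_i \cdot p_i^{t} \cdot \left(1 - \prod_{j=1}^{J} \left(1 - p_{ij}^{v} \cdot \prod_{k=1}^{K} \left(1 - p_{jk}^{c}\right)\right)\right) \qquad (14)$$

$$ALE^U = \sum_{i=1}^{I} E[SLE_i] \cdot n_i \cdot p_i^{t} \cdot \left(1 - \prod_{j=1}^{J} \left(1 - p_{ij}^{v}\right)\right) \qquad (15)$$

$$ALE \le (1 + \delta) \cdot ALE^L \qquad (16)$$

$$C = \sum_{k=1}^{K} c_k^{0} \cdot sc_k + \lambda \cdot \sum_{k=1}^{K} c_k^{y} \cdot sc_k \qquad (17)$$

$$sc_k \in \{0,1\} \quad \forall k \in K \qquad (18)$$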

In constraint (13), the ALE is calculated by using the expected value of SLE and ARO. In (14-15) it is calculated again, once with $sc_k = 1, \forall k$ and once with $sc_k = 0, \forall k$, to obtain the lower and upper bounds, respectively. As stated before, the upper bound $ALE^U$ is used to determine the monetary loss reduction required for the ROSI calculation. The lower bound $ALE^L$, on the other hand, is needed to guarantee a certain quality of the solution. By choosing parameter $\delta$, it is guaranteed that the solution deviates at most $\delta$ percent from the best possible outcome (16). In (17) the cost of solution is calculated.
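For very small instances, the structure of SCSP can be explored by plain enumeration of all control selections. The sketch below is not the authors' solution method and makes several assumptions: all data are hypothetical, `delta` is treated as a fraction rather than a percentage, and the selection without any control is skipped because ROSI is undefined for zero cost.

```python
from itertools import product
from math import prod

def ale(sc, e_sle, n, p_t, p_v, p_c):
    """Constraint (13): expected annual loss for a 0/1 control selection sc."""
    total = 0.0
    for i in range(len(p_t)):
        none_exploited = 1.0
        for j in range(len(p_v[i])):
            # Probability that no selected control prevents the exploit of j.
            not_prevented = prod(1.0 - p_c[j][k] * sc[k] for k in range(len(sc)))
            none_exploited *= 1.0 - p_v[i][j] * not_prevented
        total += e_sle[i] * n[i] * p_t[i] * (1.0 - none_exploited)
    return total

def solve_scsp(e_sle, n, p_t, p_v, p_c, c0, cy, lam, delta):
    """Enumerate all selections; return (rosi, sc, ale, cost) with maximal ROSI."""
    data = (e_sle, n, p_t, p_v, p_c)
    num_controls = len(c0)
    ale_u = ale([0] * num_controls, *data)  # (15): no controls selected
    ale_l = ale([1] * num_controls, *data)  # (14): all controls selected
    best = None
    for sc in product((0, 1), repeat=num_controls):
        cost = sum((c0[k] + lam * cy[k]) * sc[k] for k in range(num_controls))  # (17)
        if cost == 0:
            continue  # ROSI (12) is undefined without any investment
        loss = ale(sc, *data)
        if loss > (1.0 + delta) * ale_l + 1e-9:  # quality constraint (16)
            continue
        rosi = (ale_u - loss - cost) / cost  # objective (12)
        if best is None or rosi > best[0]:
            best = (rosi, sc, loss, cost)
    return best

# Hypothetical data (assumptions): 2 threats, 2 vulnerabilities, 3 controls.
E_SLE = [20_000.0, 20_000.0]
N = [10, 5]
P_T = [0.8, 0.5]
P_V = [[0.5, 0.1], [0.1, 0.8]]
P_C = [[0.8, 0.0, 0.5], [0.0, 0.8, 0.5]]
C0 = [30_000.0, 15_000.0, 10_000.0]
CY = [2_500.0, 2_500.0, 5_000.0]

best = solve_scsp(E_SLE, N, P_T, P_V, P_C, C0, CY, lam=3, delta=0.5)
print(best)
```

Enumeration grows as $2^K$ and is therefore only suitable for illustration; larger instances call for an optimization solver.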

5. Application example and data evaluation

Some exemplary threats, vulnerabilities, security controls, and model parameters are introduced in the following to demonstrate the quality of the approach. In any real-life application these values would be obtained by conducting a risk assessment of the cloud application or service. For the purpose of this example the data are based on the judgment of the authors.

Although the model can in principle be applied to any type of threat, the following example addresses the field of identity and access management (IAM). Related threats are particularly important in cloud computing due to the ubiquitous access opportunities. In Tables 1 to 3, a number of threats, vulnerabilities, and security controls are presented based on a literature review. In any practical application this information is usually gathered during a risk assessment.

Table 1: Examples of IAM related threats. Source: Todorov (2007)

Symbol   Threat
T1       Exploiting default passwords
T2       Password guessing: Dictionary, brute force, and rainbow attacks
T3       Shoulder surfing
T4       Social engineering
T5       Dumpster diving and identity theft

Table 2: Examples of vulnerabilities which are exploitable by IAM related threats. Source: German Federal Office for Information Security (2005)

Symbol   Vulnerability
V1       Lack of, or insufficient, rules
V2       Inadequate sensitization to IT security
V3       Non-compliance with IT security safeguards
V4       Hazards posed by cleaning staff or outside staff
V5       Inappropriate handling of passwords
V6       Inadequate checking of the identity of communication partners

Table 3: Possible security controls to reduce the exploitability of IAM related vulnerabilities.

Symbol   Security Control
C1       Suitable storage of official documents and data media
C2       Provisions governing the use of passwords
C3       Supervising or escorting outside staff/visitors
C4       Clean desk policy
C5       Training on IT security safeguards
C6       Log-out obligation for users
C7       Change of preset passwords
C8       Secure log-in
C9       Using encryption, checksums or digital signatures
C10      Use of one-time passwords

To connect the three stages of the corresponding attack paths, Tables 4 to 7 contain exemplary probabilities, damages, and costs.

i/ j

20K

0.5

0.1

20K

0.8

0.5

0.1

0.5

0.1

0.8

0.5

0.1

0.8

0.1

0.5

0.8

pit

SLE i

0.8

100

0.5

0.1

20K

0.5

20K

0.1

20K

0.1

Table 4: Example probabilities and SLE values of threats in USD.

Table 5: Example probabilities that a vulnerability is successfully exploited by a threat ($p_{ij}^{v}$).

0.8

0.1

0.8

0.1

0.8

0.5

0.1

0.5

0.1

0.8

0.5

0.1

0.5

0.1

0.5

0.1

0.8

0.5

j/k

Table 6: Example probabilities that a security control prevents a threat from exploiting a vulnerability.

30K

15K

10K

15K

25K

15K

2.5K

40K

2.5K

50K

10K

k 0 k y k

Table 7: Initial and yearly investments for security controls in USD.

To solve the problem, the model is implemented using the standard optimization software Xpress Optimization Suite. The nonlinear problem can be solved by applying successive linear approximation provided by the Xpress-SLP solver (FICO, 2012). Figure 3 shows multiple optimal solutions for δ = 0, ..., 6 and illustrates the relations between ROSI, ALE, and cost of solution for a planning period of λ = 3 years. The costs decrease steadily and the ALE increases when the quality of the solution is reduced by choosing larger δ values. Although this behavior is expected, it is notable that ROSI in fact decreases with higher security. This seems to be an unwanted property, since security should be as high as possible, but it is in fact a desired property of ROSI, as it measures the return on investment and not the security of the system. To utilize ROSI, the decision maker is therefore required to choose how much security is desired before applying the model. Once security requirements have been fixed by choosing δ, the model is capable of calculating the optimal selection of security controls with respect to this requirement.

Figure 3: Relation between ROSI, ALE, and cost of solution for decreasing quality of solution.

To demonstrate how this approach contributes to efficiently increasing security, SCSP is solved optimally with δ = 1, and the solution (sc_2 = 1, sc_5 = 1, sc_6 = 1) is analyzed by applying a Monte Carlo simulation, which is based on repeated random sampling to obtain concrete results for the stochastic parameters. For this purpose, the stochastic model is implemented using @RISK by Palisade Corporation and the simulation is conducted with 10,000 iterations (Palisade Corporation, 2013). In each iteration, the probability distributions take on a specific value which is used for the calculation.
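The sampling idea can be sketched in plain Python. This is not the @RISK model used by the authors; it assumes binomially distributed successful occurrences, a normally distributed SLE truncated at zero, and hypothetical input values.

```python
import random

def simulate_damage(n, p_aro, sle_mean, sle_sd, iterations=10_000, seed=42):
    """Return one simulated yearly damage value per iteration."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        total = 0.0
        for n_i, p_i, mu, sd in zip(n, p_aro, sle_mean, sle_sd):
            # Binomial draw: each of the n_i occurrences succeeds with p_i.
            successes = sum(rng.random() < p_i for _ in range(n_i))
            for _ in range(successes):
                # Assumed SLE distribution: normal, truncated at zero.
                total += max(0.0, rng.gauss(mu, sd))
        totals.append(total)
    return totals

# Hypothetical inputs (assumptions): two threats.
totals = simulate_damage(
    n=[10, 5],
    p_aro=[0.44, 0.25],
    sle_mean=[20_000.0, 5_000.0],
    sle_sd=[2_000.0, 500.0],
)
mean_damage = sum(totals) / len(totals)
share_no_damage = sum(t == 0.0 for t in totals) / len(totals)
print(mean_damage, share_no_damage)
```

With enough iterations, the sample mean approaches the deterministic value from (10), while the list of totals additionally gives access to the tail of the damage distribution.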

To examine the implementation of the optimal solution for δ = 1, the density of the maximal damage is first depicted in Figure 4. The maximal damage is obtained by calculating the damage without the implementation of any security controls. As expected, the mean value ($1,917,694) is almost identical to the result of the deterministic model. The probability density plot, however, shows that a damage realization of more than $1.917 million is very likely, with a probability of 71.6%. This fact emphasizes the need for effective controls. The gap between $1 million and $1.5 million is caused by the structure of the threats: T1 causes a damage shift due to its high success probability (p_1^t = 0.8).

Figure 4: Probability density plot with cumulative overlay of maximal damage with no implementation of controls.

Figure 5: Probability density plot with cumulative overlay of actual damage based on the implementation of the optimal solution for δ = 1.

When implementing the optimal solution of SCSP for δ = 1, the simulation produces a completely different outcome. As can be seen in Figure 5, the probability density of the actual damage shows significantly smaller realizations compared to the maximal damage. There is in fact a 30.4% chance that no damage occurs. The probability of a realization of more than $1.917 million is reduced to 3.4% when implementing controls according to the solution. The shape of the density function is now flattened out, and high realizations in particular are very rare. The previous gap is hardly visible anymore, as the applied solution consists of controls that have a distinct influence on vulnerabilities which are exploitable by T1. The superiority of this solution is particularly obvious when examining the cumulative distributions of both cases, as the actual damage is clearly ranked as superior to the maximal damage.

6. Conclusion

With the emergence of cloud computing, organizations are confronted with a new situation which requires, more than before, a solid evaluation of their systems and the establishment of proper security controls. The security of cloud-based systems is to a large extent assured by the service provider but still needs to be complemented by additional security controls which have to be implemented by the consumer of the service. In this paper, a quantitative approach is introduced to support the decision maker in selecting such compensating controls in a cost-effective manner. The corresponding model takes into account how threats arise in a cloud context and adds the element of uncertainty. By defining the stochastic structure it becomes possible to determine how much damage can be expected with respect to different security solutions. In addition, by leveraging well-established methods of mathematical optimization it is possible to select the best possible investment strategy. Simulations of possible threat scenarios have shown that the developed approach provides a significant improvement of security. However, the modeling still leaves room for improvement with respect to the representation of uncertainty. In case of extreme events, the utilization of expected values in the deterministic counterpart of the model may cause undesired properties of the solution. To avoid this, the applicability of other modeling approaches and decision criteria is currently under consideration. Possible approaches include the use of different measures of dispersion, other risk measures like Value at Risk, and multiple criteria decision making.

Acknowledgment This work was supported by the Horst Görtz Foundation.

References

Armbrust, M. et al. (2009) “Above the Clouds: A Berkeley View of Cloud Computing”. Berkeley: EECS Department, University of California.
Ben Aissa, A., Abercrombie, R.K., Sheldon, F.T. & Mili, A. (2010) “Quantifying security threats and their potential impacts: A case study” Innovations in Systems and Software Engineering, Vol. 6, No. 4, December, pp. 269-281.
Berinato, S. (2002) “Finally, a real return on security spending”, CIO Magazine.
Böhme, R. & Nowey, T. (2005) “Economic Security Metrics” In: I. Eusgeld, F. C. Freiling & R. Reussner, eds. Dependability Metrics. Berlin: Springer, pp. 176-187.
Bojanc, R. & Jerman-Blažič, B. (2008) “An economic modelling approach to information security risk management” International Journal of Information Management, Vol. 28, No. 5, October, pp. 413-422.
Chen, Y., Boehm, B. & Sheppard, L. (2007) “Value Driven Security Threat Modeling Based on Attack Path Analysis” Proceedings of the 40th Annual Hawaii International Conference on System Sciences, January, p. 280a.
Cloud Security Alliance (2011) Security Guidance for Critical Areas of Focus in Cloud Computing V3.0. [online] Available at: https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf.
European Network and Information Security Agency (2012) Introduction to Return on Security Investment - Helping CERTs assessing the cost of (lack of) security, Heraklion: ENISA.
FICO (2012) FICO Xpress-SLP. [online] Available at: http://www.fico.com/en/products/dmtools/xpressoverview/pages/xpress-slp.aspx.
German Federal Office for Information Security (2005) IT-Grundschutz Catalogues, Bonn: BSI.
Jansen, W. & Grance, T. (2011) Guidelines on Security and Privacy in Public Cloud Computing, Gaithersburg: National Institute of Standards and Technology.
Levin, M.S. & Danieli, M.A. (2005) “Hierarchical Decision Making Framework for Evaluation and Improvement of Composite Systems (Example for Building)” Informatica, Vol. 16, No. 2, April, pp. 213-240.
Mell, P. & Grance, T. (2011) The NIST Definition of Cloud Computing, National Institute of Standards and Technology, Gaithersburg.
Open Web Application Security Project (2013) OWASP Top Ten Project. [online] Available at: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project.
Palisade Corporation (2013) @RISK: Risk Analysis Software using Monte Carlo Simulation for Excel. [online] Available at: http://www.palisade.com/risk/.
Rabai, L.B.A., Jouini, M., Ben Aissa, A. & Mili, A. (2013) “A cybersecurity model in cloud computing environments” Journal of King Saud University - Computer and Information Sciences, Vol. 25, No. 1, January, pp. 63-75.
Ryan, J.J. et al. (2012) “Quantifying information security risks using expert judgment elicitation” Computers & Operations Research, Vol. 39, No. 4, April, pp. 774-784.
Sonnenreich, W., Albanese, J. & Stout, B. (2006) “Return On Security Investment (ROSI) - A Practical Quantitative Model” Journal of Research and Practice in Information Technology, Vol. 38, No. 1, February, pp. 55-66.
Todorov, D. (2007) Mechanics of User Identification and Authentication: Fundamentals of Identity Management, Auerbach Publications, Boca Raton.
Tsiakis, T. (2010) “Information Security Expenditures: A Techno-Economic Analysis” International Journal of Computer Science and Network Security, Vol. 10, No. 4, April, pp. 7-11.
Vaquero, L.M., Rodero-Merino, L., Caceres, J. & Lindner, M. (2008) “A Break in the Clouds: Towards a Cloud Definition” ACM SIGCOMM Computer Communication Review, Vol. 39, No. 1, January, pp. 50-55.

PhD Research Papers

Security as a Service using Data Steganography in Cloud

Anitha Balaji Ramachandran, Pradeepan Paramjothi and Saswati Mukherjee
Dept of Information Science and Technology, Anna University, Chennai, India
anitabalajim@yahoo.com

Abstract: In the cloud era, data is stored away from the user, so the privacy and integrity of the data play a crucial role. This paper proposes a practical and efficient method for providing security to the data stored at the data server through metadata. The method provides security using a cipher key which is generated from the attributes of the metadata. In the proposed model, the time required for generating the cipher key is proportional to the number of attributes in the metadata as well as to the algorithms used for cipher key generation. Our design enforces security by providing two novel features. 1. The encryption and decryption keys cannot be compromised without the involvement of the data owner and the metadata server (MDS), which makes the data owner comfortable about the data stored. 2. The cipher key generated using the modified Feistel network exhibits the avalanche effect, as each round of the Feistel function depends on the previous round value. The proposed model also reassures the data owner about the data stored in the cloud in both cases: (i) data at rest and (ii) data in motion. The steganographic method used is automated based on the input from the metadata server and the user. We have implemented a security model incorporating these ideas and have evaluated the performance and scalability of the proposed model.

Keywords: Security, data storage, metadata, cloud, Feistel function, steganography

1. Introduction

Cloud computing has become one of the most attractive fields in industry and in research. The demand for cloud computing has increased recently because software and hardware can be used with less investment (R. Anitha 2011). A recent survey by IDC regarding the use of cloud services highlights security as the greatest challenge for the adoption of cloud computing technology (Kuyoro 2011). Four key components of data security in cloud computing are data availability, data integrity, data confidentiality, and data traceability. Data traceability means that the data transactions and data communication are genuine and that the parties involved are authorized persons (Anup Mathew 2012). As shown in several studies, data traceability mechanisms have been introduced in various forms, ranging from data encryption to intrusion detection or role-based access control, and are successfully protecting sensitive information. However, the majority of these concepts are centrally controlled by administrators, who themselves are one of the major threats to security (Johannes Heurix 2012). In some modern distributed file systems, data is stored on devices that can be accessed through the metadata, which is managed separately by one or more specialized metadata servers (Michael Cammert 2007). Metadata is data about data: structured information that describes, explains, and locates an information resource and makes it easier to retrieve, use, or manage. The metadata file holds the information about a file stored in the data servers. In cloud computing, users hand over their data to the cloud service provider for storage. Data owners in a cloud computing environment want to make sure that their data are kept confidential from outsiders, including the cloud service provider; this is the major data security requirement. The author (Johannes Heurix 2012) describes a security protocol for data privacy that is strictly controlled by the data owner, where PERiMETER makes use of a layer-based security model which protects the secret cryptographic key used to encrypt each user's metadata. Several studies have been carried out relating to security issues in cloud computing, but the proposed work presents a detailed analysis of data security in the cloud environment through metadata, and of the related challenges. The novel scheme supports security through metadata attributes, and access control of the data is limited to the keys used in this model. Furthermore, the ability to generate and compare the keys between the user and the metadata server plays a major role. The security policy in the proposed model depends on 1) the user's key, which is used to encrypt the original data, and 2) the cipher key generated using metadata attributes stored in the MDS. The security policy is constructed so that both the data owner and the MDS are involved in retrieving the original data. As the data owner and the MDS are involved in the key creation and sharing policies, the model prevents unauthorized access to data. The model also makes the data owner confident about the security of the data stored, since the encryption and decryption keys cannot be compromised without the involvement of the data owner and the MDS. Our contributions can be summarized as follows:

• We propose a model to create a cipher key C based on the attributes of the stored metadata using a modified Feistel network, and support the user in accessing the data in a secured mode.

• We also propose a novel security policy which involves the data owner, the MDS and the data server by means of key creation and sharing policies, thereby ensuring that the model prevents unauthorized access to data.

The rest of the paper is organized as follows: Section 2 summarizes the related work and the problem statement. Section 3 describes the system architecture model and discusses the detailed design of the system model. Section 4 describes the modified Feistel network structure design and issues of the proposed model. Section 5 explains the data security at the data server location. The performance evaluation based on the prototype implementation is given in Section 6, and Section 7 concludes the paper.

2. Related works

This section discusses previous work carried out in the area of cloud security and how metadata is used in cloud computing environments.

2.1 Metadata in distributed storage systems

Recently, a large amount of work has been pursued in data analytics for cloud storage (Michael Cammert 2007). (Abhishek Verma 2010) proposed metadata management using the Ring file system. In this scheme, metadata for a file is stored based on hashing its parent location, and a replica is stored in its successor metadata server. (Yu Hua 2011) proposed scalable and adaptive metadata management in ultra-large-scale file systems. In (Michael Cammert 2007), the metadata is divided into two types: static and dynamic metadata. The author suggested a publish-subscribe architecture that enables an SSPS to provide metadata on demand and to handle metadata dependencies successfully. (Anitha.R 2011) described that data retrieval using metadata in a cloud environment is less time consuming than retrieving data directly from the data server.

2.2 Security Schemes

(Chirag Modi 2013) presented a survey discussing factors affecting the adoption of cloud computing storage, vulnerabilities, and attacks. The authors also identified relevant solution directives to strengthen security and privacy in the cloud environment. They further discuss various threats such as abusive use of cloud computing, insecure interfaces, data loss and leakage, identity theft, and metadata spoofing attacks. (J. Ravi Kumar 2012) shows that a third-party auditor can be used to periodically verify the integrity of data stored at the cloud service provider without retrieving the original data. The security is provided by creating metadata for the encrypted data. (Shizuka Kaneko 2011) proposed a query-based information hiding scheme using a Bloom filter. The given query is processed and the attributes of the query are used for key generation. The generated key is used to hide confidential information. (Marcos K. Aguilera 2003) proposed a practical and efficient method for adding security to network-attached disks (NADs). The design specifies a protocol for providing access to remote block-based devices. The security is provided by means of an access control mechanism.

2.3 Bloom filter schemes

The Bloom filter is a space-efficient probabilistic data structure that supports set membership queries. The data structure was conceived by Burton H. Bloom in 1970. The structure offers a compact probabilistic way to represent a set that can result in false positives (claiming an element to be part of the set when it was not inserted), but never in false negatives (reporting an inserted element to be absent from the set). This makes Bloom filters useful for many different kinds of tasks that involve lists and sets. The basic operations involve adding elements to the set and querying for element membership in the probabilistic set representation. (Shizuka Kaneko 2011) discusses the usage of Bloom filters in query processing.
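The two basic operations described above can be illustrated with a minimal Bloom filter in Python. This is a generic sketch, not the secured Bloom filter index of the proposed model; the class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 digests are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives possible, false negatives not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )

bf = BloomFilter()
for name in ("report.pdf", "budget.xlsx", "notes.txt"):
    bf.add(name)
print("report.pdf" in bf)
```

An inserted element is always reported as present; an element that was never inserted is usually, but not always, reported as absent.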

2.4 Steganography Security Schemes

(Wawge P.U. 2012) describes that steganography comes from the Greek words Steganos (covered) and Graptos (writing). The term steganography came into use in the 1500s after the appearance of Trithemius' book on the subject, Steganographia. The word steganography technically means covered or hidden writing. (Jasvinder Kaur 2012) proposed a new data hiding scheme using a matrix matching method. On the basis of the matching factor of columns, particular bits may be changed such that the change in image quality is minimal. Thus the original content is hidden. (Sharon Rose Govada 2012) proposed text steganography with multi-level shielding, a method that is more reliable and secure than existing algorithms. The method is a combination of word shifting text steganography and synonym text steganography. (Nirmalya Chowdhury 2012) proposed an efficient method of steganography using a matrix approach. He discusses that the goal of steganography is to hide messages inside other 'harmless' messages in a way that does not allow any enemy to even detect that a second message is present. Least significant bit (LSB) insertion, which he uses, is a common and simple approach to embed information in a cover object. The design uses matrix-based steganography which modifies the bits inside the matrix by means of adding and modifying.
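LSB insertion as mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the matrix-based scheme of the cited work; the function names are hypothetical, and a byte string stands in for the cover object (for example, raw pixel values).

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of each cover byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover object is too small for the message")
    stego = bytearray(cover)
    for idx, bit in enumerate(bits):
        stego[idx] = (stego[idx] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)

def extract_lsb(stego: bytes, message_len: int) -> bytes:
    """Recover message_len bytes from the least significant bits."""
    out = bytearray()
    for b in range(message_len):
        value = 0
        for bit_idx in range(8):
            value = (value << 1) | (stego[b * 8 + bit_idx] & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(256))          # stand-in cover object (assumption)
secret = b"hidden"
stego = embed_lsb(cover, secret)
print(extract_lsb(stego, len(secret)))
```

Each cover byte changes by at most one, which is why the modification is hard to notice in image data.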

3. System architecture

The architecture diagram of the proposed system model is shown in Figure 1. Each block in the architecture shows how the data is encrypted and how the keys are shared between the user, the MDS and the DS.

Figure 1: Architecture Diagram

The system model provides security for the data using a modified Feistel network where the metadata attributes are taken as input in the form of matrices. In this model, the user uploads the file encrypted with the key X1. The metadata for the file is created and, based on the attributes of this metadata, the cipher key Cmxn is created. The metadata server sends the cipher key Cmxn to the user. Using Cmxn as key, the user encrypts the key X1 and generates X2. While downloading the file, the keys X2 and Cmxn are used to retrieve X1, and the file is decrypted. This model proposes a modified Feistel function F which introduces matrix operations like transpose, shuffle, addition and multiplication along with the key matrix. The cryptanalysis carried out in this paper indicates that this cipher cannot be broken by a brute force attack. The model provides high strength to the cipher, as the encryption key induces a significant amount of matrix obfuscation into the cipher. The avalanche effect discussed shows the strength of the cipher key Cmxn. Secured Bloom filter indexing is used to generate the stego data in order to protect the data at the data server location. The stego data is generated using the Bloom filter index value and the user key K. Thus the original data is secured at the data server location. The proposed system model also ensures that the data is maintained unchanged by making use of the cipher key C during any operation such as transfer, storage, or retrieval.
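The key flow described above (file encrypted with X1, X1 wrapped with the cipher key C to give X2, and the reverse on download) can be traced with a toy sketch. The paper does not specify the encryption primitive at this point, so the XOR keystream below is purely a stand-in assumption and is not secure; all names and values are hypothetical.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(out[:length])

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Symmetric toy cipher: applying it twice with the same key restores data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Upload: the user encrypts the file with X1, then wraps X1 with C.
x1 = b"user-secret-key-X1"                    # hypothetical user key
cipher_key_c = b"cipher-key-from-MDS"         # stand-in for Cmxn
plaintext = b"contents of the uploaded file"
encrypted_file = xor_with_key(plaintext, x1)  # stored at the data server
x2 = xor_with_key(x1, cipher_key_c)           # kept instead of X1

# Download: the MDS supplies C again, the user unwraps X1 and decrypts.
recovered_x1 = xor_with_key(x2, cipher_key_c)
recovered_file = xor_with_key(encrypted_file, recovered_x1)
print(recovered_file)
```

The sketch only shows that neither X2 alone nor C alone is sufficient to decrypt the file, which is the property the model relies on.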

Figure 2: File upload and download using the security key.

1. Uploads the encrypted file. 2. Encrypted file sent to the data server. 3. Sends the cipher key Cmxn to the user.

File Access: When a user sends a request for data stored in the cloud environment, the request is given to the metadata server, which provides the most recent cipher key Cmxn to the user. Using Cmxn, the user decrypts the key X2 and gets X1. Using X1, the encrypted file from the cloud storage is decrypted to get the original data. By providing the most recent cipher key Cmxn, the data integrity is also verified. Our system methodology uses the following functionalities:

• File uploading.
• Data preprocessing.
• Construction of the modified Feistel network.
• Generation of the cipher key C.
• Generation of the secured Bloom filter index.
• Creation of the stego data.

5. Modified Feistel Network

Feistel ciphers are a special class of iterated block ciphers where the cipher text is calculated from the attributes of metadata by repeated application of the same transformation or round function.

5.1 Development of the cipher key “Cmxn” using Modified Feistel Function

In this paper we propose a procedure for generating the cipher key “Cmxn” based on matrix manipulations, which could be introduced in symmetric ciphers. The proposed cipher key generation model offers two advantages. First, the procedure is simple to implement, while the key is hard to determine through cryptanalysis. Second, the procedure produces a strong avalanche effect, causing many values in the output block of a cipher to change when one value in the secret key changes. As a case study, the matrix-based cipher key generation procedure has been introduced in this cloud security model and the key avalanche has been observed. Thus the cloud security model is improved by providing a novel mechanism using a modified Feistel network where the cipher key Cmxn is generated with the matrix-based cipher key generation procedure.

5.2 Procedure for generating Cipher Key Cmxn

The cipher key generation procedure is based on a matrix initialized using the secret key and the modified Feistel function F. The input values used in the various Feistel rounds are taken from the previous round. The selection of rows and columns for the creation of the matrix is based on the number of attributes of the metadata, the secret key matrix “Kmxn” and the other functional logic as explained in the following subsections.

5.2.1 Data Preprocessing

Data preprocessing converts the metadata attributes into matrix form using the SHA-3 cryptographic algorithm. The matrix contains m rows and n columns, where m is the number of attributes of the metadata and n is the size of the SHA-3 output. The matrix is split into four equal matrices m1, m2, m3 and m4. Matrix obfuscation is carried out in order to make the data opaque to an attacker: the matrices m1, m3 and m2, m4 are concatenated. This obfuscated matrix is fed as input to the Feistel network structure, where the concatenated value of m1, m3 is the left value and that of m2, m4 is the right value of the Feistel network.
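The preprocessing step can be sketched as follows. The text does not fix the SHA-3 variant, the matrix entry type, or how the matrix is split into four equal parts, so the sketch assumes SHA3-256, one byte per matrix entry, and a split into four equal column blocks; the function name and sample attributes are hypothetical.

```python
import hashlib

def preprocess(attributes):
    """Hash each metadata attribute into one matrix row and build L and R."""
    # One row per attribute; SHA3-256 yields n = 32 byte-valued columns.
    matrix = [list(hashlib.sha3_256(a.encode("utf-8")).digest()) for a in attributes]
    n = len(matrix[0])
    q = n // 4
    # Assumed split: four equal column blocks m1..m4.
    m1, m2, m3, m4 = (
        [row[b * q:(b + 1) * q] for row in matrix] for b in range(4)
    )
    left = [r1 + r3 for r1, r3 in zip(m1, m3)]    # L = m1 || m3
    right = [r2 + r4 for r2, r4 in zip(m2, m4)]   # R = m2 || m4
    return matrix, left, right

# Hypothetical metadata attributes.
attrs = ["filename=report.pdf", "owner=alice", "size=48213"]
matrix, left, right = preprocess(attrs)
print(len(matrix), len(matrix[0]), len(left[0]), len(right[0]))
```

Under these assumptions L and R each have m rows and n/2 columns and together contain every entry of the original matrix exactly once.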

Figure 3: Model for Data Preprocessing.

5.2.2 Modified Feistel Network Structure

The matrix Lmxn, which is the concatenated value m1 || m3, is considered as the left value of the Feistel network structure, and the matrix Rmxn = m2 || m4 is considered as the right value of the Feistel network structure. Using the MD5 cryptographic hash algorithm, the key matrix Kmxn is generated, whose size is m x n, where “m” is the number of attributes of the metadata and “n” is the size of the MD5 output. The cipher key is developed in the Feistel network through a number of rounds until the stopping condition is satisfied. In this symmetric block cipher, matrix obfuscation operations are performed in multiple rounds using the key matrix and the right-side value of the Feistel network structure. The function F plays a very important role in deciding the security of block ciphers. The concatenated value of Lmxn and Rmxn in the last round is the cipher key Cmxn. Figure 4 below represents one round of the modified Feistel network structure.

Figure 4: One Round of Modified Feistel Network

5.3 Definition of feistel function F: Let R be a function variable and let K be a hidden random seed, then the function f is defined as, F(R, K) = FK(R) where F is a modified feistel function. The procedure for developing the function is described below. Each round has its own feistel function F. The function f is considered to be varied based on the right side value of the feistel network i.e. the function F is indexed by the matrix Rmxn for that round. In this modified feistel network structure, the function for each round depends on the previous round i.e. Roundi (Li, Ri) = ( Ri-1, F( K, Li-2 ) )

The above formula shows that a small change in one round affects the entire Feistel network. In each round the value of R is compressed, so at some point the Feistel rounds stop automatically, depending on the size of the attributes. The procedure for deriving function F is explained in the following steps.

Algorithm 1: Creation of cipher key
Begin
1. Read Metadata attribute
2. Apply SHA-3
3. Generate Matrix Mmxn, split the matrix, and generate Lmxn and Rmxn
4. Left value of Feistel = Lmxn and Right value of Feistel = Rmxn
   For i = 1 to n Repeat till n / 2 = 1
   Begin
   4.1 Split Rmxn into equal matrices R1mxn, R2mxn
   4.2 Transpose R1mxn, R2mxn as R1nxm, R2nxm
   4.3 Apply matrix addition of R1nxm, R2nxm = Tmxn
   4.4 Transpose Tmxn
   4.5 Matrix multiplication of Tnxm * Kmxn = RVmxn /* condition for multiplication is verified */
   4.6 New Lmxn = RVmxn
   4.7 New Rmxn = Old value of Lmxn
   End
5. Repeat the step till n takes odd value
6. Write(C) Cipher key C = Lmxn || Rmxn /* || represents concatenation */
End

Algorithm 2: Creation of Feistel function F
Begin
1) Read Matrices Lmxn and Rmxn
   a. Assign Left value = Lmxn
   b. Right value = Rmxn
2) Split Rmxn = R1mxn and R2mxn
3) Transpose (R1mxn) = R1nxm
4) Transpose (R2mxn) = R2nxm
5) Tnxm = R1nxm + R2nxm
6) RVmxn = Tnxm * Kmxn
7) Re-assign
   a. Lmxn = RVmxn
   b. Rmxn = Lmxn
8) Go to Step 2 till n = odd value
End
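A single round of Algorithm 2 can be sketched in Python as follows. This is one possible reading, not the authors' implementation: the algorithms leave the exact matrix dimensions ambiguous, so the sketch assumes that R is split into two column halves, that each half is transposed before the addition, and that the sum T (of size n/2 x m) is multiplied by a key matrix K with m rows. The helper names and the small integer matrices are hypothetical.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def mat_mul(a, b):
    """Matrix product a * b; checks the multiplication condition first."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def feistel_f(r, k):
    """Steps 2-6 of Algorithm 2: split R, transpose the halves, add, multiply by K."""
    half = len(r[0]) // 2                      # assumed: split by columns
    r1 = [row[:half] for row in r]
    r2 = [row[half:] for row in r]
    t = mat_add(transpose(r1), transpose(r2))  # T has n/2 rows and m columns
    return mat_mul(t, k)                       # RV = T * K

def feistel_round(left, right, k):
    """Step 7 of Algorithm 2: new L = RV, new R = old L."""
    return feistel_f(right, k), left

# Toy 2 x 4 matrices chosen only to keep the arithmetic easy to follow.
left = [[0, 0, 0, 0], [1, 1, 1, 1]]
right = [[1, 2, 3, 4], [5, 6, 7, 8]]
key = [[1, 0, 1, 0], [0, 1, 0, 1]]
new_left, new_right = feistel_round(left, right, key)
print(new_left)   # [[4, 12, 4, 12], [6, 14, 6, 14]]
```

In this toy case m = n/2, so the result keeps the m x n shape; in general, under the assumptions above, RV has n/2 rows, which matches the remark that R is compressed from round to round.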

5.4 Analysis of cipher key Cmxn

Avalanche effect: The modified Feistel network exhibits the avalanche effect, since each round depends on the value from the previous round. The avalanche effect is an important characteristic of an encryption algorithm: a small change in a metadata attribute should produce a large change in its cipher key. Ideally, changing one bit of the plaintext changes at least half of the bits of the ciphertext. Because a one-bit change leads to such a large change in the output, it is hard to analyse the ciphertext when trying to devise an attack. The avalanche effect is calculated by the formula,
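A widely used definition, assumed here rather than taken from the paper, expresses the avalanche effect as the percentage of output bits that change when a single input bit is flipped. The Python sketch below measures it with SHA3-256 standing in for the cipher key generation; the input string is hypothetical.

```python
import hashlib

def bit_difference_percent(a: bytes, b: bytes) -> float:
    """Percentage of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    changed = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 100.0 * changed / (8 * len(a))

def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position `index` inverted."""
    out = bytearray(data)
    out[index // 8] ^= 1 << (index % 8)
    return bytes(out)

# Hypothetical metadata string; SHA3-256 is a stand-in for the cipher key procedure.
original = b"owner=alice;file=report.pdf"
modified = flip_bit(original, 0)
effect = bit_difference_percent(
    hashlib.sha3_256(original).digest(),
    hashlib.sha3_256(modified).digest(),
)
print(f"avalanche effect: {effect:.1f}%")
```

A value near 50% indicates that a one-bit input change flips about half of the output bits.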

5.5 Generation of Secured Bloom filter Lookup

The second level of security in the metadata layer is provided by the secured Bloom filter lookup table. The generation of the lookup table is shown in Fig. 5. A secured Bloom filter index is created from the metadata attributes using the cipher key Cmxn and the user's key K, which is X2. HMAC1 is applied to every attribute A of the metadata using the key from the user, and HMAC2 is then applied to the output of this first level using the cipher key Cmxn. Hence the index cannot be compromised without the involvement of the user and the metadata attributes.

Figure 5: Secured Bloom Filter Index Generation.
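The two-level HMAC construction can be sketched in Python as follows. The filter size, the number of bit positions per attribute, the use of SHA-256 inside HMAC, and the way positions are derived from the digest are all assumptions made for illustration; the keys and attributes are hypothetical.

```python
import hashlib
import hmac

FILTER_BITS = 256    # assumed filter size
NUM_POSITIONS = 3    # assumed number of bit positions set per attribute

def positions(attribute: str, user_key: bytes, cipher_key: bytes):
    """HMAC1 with the user's key, then HMAC2 with the cipher key, mapped to bit positions."""
    level1 = hmac.new(user_key, attribute.encode("utf-8"), hashlib.sha256).digest()
    level2 = hmac.new(cipher_key, level1, hashlib.sha256).digest()
    # Assumed mapping: successive 4-byte chunks of the digest, reduced modulo the filter size.
    return [
        int.from_bytes(level2[4 * i:4 * i + 4], "big") % FILTER_BITS
        for i in range(NUM_POSITIONS)
    ]

def build_index(attributes, user_key: bytes, cipher_key: bytes):
    """Set the derived bit positions for every metadata attribute."""
    bits = [0] * FILTER_BITS
    for attribute in attributes:
        for p in positions(attribute, user_key, cipher_key):
            bits[p] = 1
    return bits

def might_contain(bits, attribute: str, user_key: bytes, cipher_key: bytes) -> bool:
    """Bloom filter lookup: False means definitely absent, True means possibly present."""
    return all(bits[p] for p in positions(attribute, user_key, cipher_key))

# Hypothetical keys and attributes.
index = build_index(["alice", "report.pdf"], b"user-key-x2", b"cipher-key-c")
print(might_contain(index, "alice", b"user-key-x2", b"cipher-key-c"))
```

Because both keys enter the computation, a lookup with only one of them does not, in general, reproduce the stored bit positions.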

6. Data Steganography at Data Server location

This section describes data security at the data server location. Because data in the cloud is kept on a data server that is away from the user, the security of data at rest plays a major role. The generation of stego data is shown in Fig. 6. The original data is converted into stego data at the time of storage. The conversion is carried out in four steps:

1. Data to matrix converter
2. The matrix is added with the SBF value
3. The output is added with the key from the user
4. Matrix to data converter

Thus the original data is turned into stego data and uploaded to the data server location. Mathematical operations are used for this purpose: the data block is converted into matrix form, and the original matrix is modified using the SBF. To hide the original information, straight message insertion may transform every bit value of the original information, i.e., by embedding some bit values into the original value. Each of these techniques is applied to provide security to the data by hiding the original data. For steganography, we have used matrix operations to hide the original information.

Figure 6: Generation of StegoData.

Figure 7: Conversion of StegoData to Original Data.

7. Implementation and results

The experiments were carried out in a cloud setup using Eucalyptus, which contains a cloud controller and Walrus as the storage controller. In our experiment, hundreds of files are uploaded into the storage and then downloaded based on the user’s requirement. When the user uploads the encrypted file using key X1, the model first generates the metadata and, based on the attributes, the MDS shares the cipher key Cmxn with the user. The performance results show that the time taken for encryption and decryption is low and that the security model reduces the acceptance of false requests from unidentified users. The experiment also measures the effectiveness of the system model by means of the False Acceptance Rate and False Rejection Rate of the proposed security model. The experimental results show that the model provides adequate security to the stored data without the involvement of a TPA. Results demonstrate that our design is highly effective and efficient in providing security with reduced time complexity.
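For reference, the two rates are commonly computed as shown in the Python sketch below; the paper does not give its own definitions, so the conventional ones are assumed here. The sample of access attempts is made up purely to exercise the function and is not experimental data.

```python
def far_frr(attempts):
    """Compute (FAR, FRR) from a list of (is_legitimate, was_accepted) pairs.

    FAR = accepted illegitimate requests / all illegitimate requests
    FRR = rejected legitimate requests / all legitimate requests
    A rate is reported as 0.0 when its denominator is empty.
    """
    illegitimate = [accepted for legit, accepted in attempts if not legit]
    legitimate = [accepted for legit, accepted in attempts if legit]
    far = sum(illegitimate) / len(illegitimate) if illegitimate else 0.0
    frr = sum(not accepted for accepted in legitimate) / len(legitimate) if legitimate else 0.0
    return far, frr

# Made-up attempts: four illegitimate requests (one accepted), two legitimate (one rejected).
sample = [
    (False, False), (False, False), (False, True), (False, False),
    (True, True), (True, False),
]
far, frr = far_frr(sample)
print(far, frr)   # 0.25 0.5
```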

7.1 Experimental analysis

In our experiment we used the KDD-CUP datasets from 2003 to the most recent version and investigated the effect of metadata on file access performance with respect to response time. Figure 8 compares the response time for accessing a file with and without metadata. Figures 9 and 10 illustrate the performance of the security model.

Figure 8: Comparison with respect to response time with and without using metadata

Figure 9: Time taken to create a Cmxn.

Figure 10: Avalanche effect of Modified Feistel Algorithm.

8. Conclusion

This paper investigates the problem of data security in cloud data storage, where the data is stored away from the user. We have studied the problem of the privacy of stored data and proposed an efficient and secure protocol for storing data on cloud storage servers. We believe that data storage security in the cloud era is full of challenges, especially when the data is at rest at the data location. Our method provides privacy for the stored data, and the security policy is constructed so that both the data owner and the MDS must be involved in order to retrieve the original data. The model also gives the data owner confidence in the security of data stored in the cloud environment, since the encryption and decryption keys cannot be compromised without the involvement of both the data owner and the MDS, even when the data is at rest.

An Analytical Framework to Understand the Adoption of Cloud Computing: An Institutional Theory Perspective

Rania El-Gazzar1 and Fathul Wahid1,2
1 Department of Information Systems, University of Agder, Kristiansand, Norway
2 Department of Informatics, Universitas Islam Indonesia, Yogyakarta, Indonesia
rania.f.el-gazzar@uia.no
fathul.wahid@uia.no

Abstract: Cloud computing (CC) offers a new information technology service model for organizations. In spite of its possible benefits, however, it also poses some serious concerns. Why do organizations adopt CC in spite of its potential risks? Drawing on several core concepts from institutional theory, we propose an analytical framework to better understand the adoption of CC by organizations. We focus on the concepts of field-level changes, organizational fields, institutional isomorphism, and institutional strategic responses within the context of CC adoption. We identify a number of organizations that form the organizational field and bring about changes (i.e., CC providers, peer organizations, business partners, professional and industry associations, and regulators) that may trigger institutional pressures (i.e., coercive, normative, and mimetic) on the adopting organizations. We conclude by presenting possible strategic responses (i.e., acquiescence, compromise, avoidance, defiance, and manipulation) to address the institutional pressures related to CC adoption.

Keywords: cloud computing, institutional theory, adoption, organizational field, institutional isomorphism, strategic responses

1. Introduction

The concept of cloud computing (CC) has received considerable attention in academic and technical literature over the past several years (Timmermans et al. 2010; Yang & Tate 2012). The extant literature reports various benefits that CC may provide for organizations, including simplicity, cost efficiency, reduced demand for skilled labor, and scalability (Armbrust et al. 2010; Venters & Whitley 2012; Garrison et al. 2012). However, the literature also forewarns adopters to pay attention to potential risks associated with the implementation, management, and use of CC services (Marston et al. 2011). Notwithstanding these potential risks, several sources indicate that the adoption of CC has been growing significantly (Catteddu 2010; Lee et al. 2011; CSA & ISACA 2012). CC offers a compelling solution for small- and medium-sized enterprises (SMEs) due to its low entry barriers, both technical and financial, for using such sophisticated services. In contrast, large enterprises (LEs) possess surplus resources and can afford to implement an in-house information technology (IT) infrastructure (Weinhardt et al. 2009; Gordon et al. 2010; Son et al. 2011; Li et al. 2012). However, some questions regarding the use of CC have not been clearly addressed in the literature to date. For example, does CC deliver on its promises to adopters, what factors affect the decision to adopt or not to adopt CC, and do these factors affect the way the adopters manage the potential risks and/or exploit the promising benefits? As a preliminary effort to address these issues, we propose an analytical framework to better understand the process of CC adoption by organizations. As the extant literature pays more attention to the benefits of CC than to its risks, we expect that the framework will be useful for answering the question: Why do organizations adopt CC in spite of its potential risks?
To develop the framework for this study, we have relied on the concept of institutional theory, which is well-suited for gaining a better understanding of the various stages of the IT institutionalization process and the interactions between IT and the institution (Swanson & Ramiller 2004; Mignerat & Rivard 2009). In addition, institutional theory equips us with various concepts for better understanding the impact of internal and external factors on organizations that are engaged in IT-induced changes (Mignerat & Rivard 2009; Weerakkody et al. 2009). The theory is also able to capture the notion of “irrationality” in decision-making processes (Meyer & Rowan 1977; DiMaggio & Powell 1983; Mouritsen 1994), such as when an organization adopts CC to keep up with the industry hype and not just to reduce costs.

The remainder of the paper is structured as follows: In Section 2, we describe the concept of CC, along with its associated benefits and risks. The underlying concepts of institutional theory are presented in Section 3. In Section 4, we develop an analytical framework by connecting the institutional concepts supported with arguments from the extant literature. Section 5 ends the paper with conclusions and possible ways to use the framework in future research.

2. Cloud computing

The CC paradigm has emerged from previous distributed computing technologies such as grid computing and virtualization (Sultan 2011). CC is classified as a form of IT outsourcing through which shared IT resources are pooled in large external data centers and made accessible by users through the Internet (Venters & Whitley 2012). Commonly, CC is defined as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell & Grance 2011, p. 2). CC services are delivered by the provider to users via various models, such as Software as a Service (SaaS), Platform as a Service (PaaS), and/or Infrastructure as a Service (IaaS) (Mell & Grance 2011). The SaaS model provides Internet-based access to applications created by the CC provider. The PaaS model provides programming languages and tools supported by the CC provider via the Internet to develop and deploy user-created applications. The IaaS model provides computing resources (e.g., processing power, storage, and network). Further, CC can be deployed in various forms, such as public, private, community, and hybrid clouds (Mell & Grance 2011). A public cloud infrastructure is accessible by the general public and is owned by the CC provider. A private cloud infrastructure is owned and managed by the user/organization. A community cloud is a private one shared by several organizations that have common concerns. The hybrid cloud infrastructure is a combination of two or more private, public, or community clouds that are linked together by standardized technology to ensure data and application portability.
In addition to CC providers and users, there are “enablers” or “intermediaries” that manage the relationships between cloud providers and users and that facilitate CC adoption and use (Marston et al. 2011). Regarding CC adoption, “there are significant technical, operational, and organizational issues which need to be tackled […] at the enterprise level” (Marston et al. 2011, p. 184). Hence, there are two views regarding the emergence of CC: optimistic and pessimistic. From an optimistic viewpoint, CC may bring economic, strategic, and technological benefits to organizations (Garrison et al. 2012). Organizations can increase their productivity and focus on their core business activities due to the decreased need to set up in-house IT infrastructure, thus saving IT-related capital expenditures while achieving business agility (Ernst & Young 2011; Kepes 2011; Garrison et al. 2012). Since CC services are scalable, they adequately suit different users’ needs and environments on a pay-as-you-go subscription basis (Durkee 2010; Mell & Grance 2011). Nevertheless, we cannot neglect the more pessimistic viewpoint that focuses on the potential risks and problems of CC adoption. Commonly identified CC risks are privacy (this includes control over the data, as well as trust, legal, and ethical issues), cultural differences at both corporate and geographical levels, and switching costs resulting from the vendor lock-in problem (Dillon et al. 2010; Ernst & Young 2011; Yang & Tate 2012). To comprehensively evaluate and understand CC adoption by organizations, it is important to look at both the benefits and risks of this process.

3. Theoretical basis

Institutional theory is rooted in the social sciences with contributions from various disciplines including economics, political science, organization science, and information systems (IS)/IT studies (Scott 2004; Currie 2009; Mignerat & Rivard 2009). Regarding IS/IT-related phenomena, it is argued that institutional theory has relevance to “understanding the impact of internal and external influences on organizations that are engaged in […] IT-induced change” (Weerakkody et al. 2009, p. 355). In the context of IS research, many studies have utilized institutional theory to “examine IS/IT-related phenomena exemplified in IT innovation, IS development and implementation, and IT adoption and use” (Mignerat & Rivard 2009, p. 1).

The rationale for choosing institutional theory to construct our analytical framework is twofold: It increases our understanding of “how institutions influence the design, use, and consequences of technologies, either within or across organizations” (Orlikowski & Barley 2001, p. 153) and it captures the notion of irrationality in decision-making through which organizational actors seek legitimacy more than efficiency (Avgerou 2000; Orlikowski & Barley 2001; Mignerat & Rivard 2009). This legitimacy is gained when these actors “accept and follow social norms unquestioningly, without any real reflection” (Tolbert & Zucker 1996, p. 176). In constructing the analytical framework, we focus on several core concepts that are germane to the understanding of CC adoption: field-level changes, isomorphic pressures, strategic responses, and institutional impacts. Each of these concepts is succinctly described below. The relationships among these concepts are depicted in Figure 1.

Figure 1: The core concepts

Field-level changes. To obtain acceptance and legitimacy, organizations are required to conform to a set of rules and requirements at the organizational field level (Wooten & Hoffman 2008). The organizational field is defined as “a community of organizations that partakes of a common meaning system and whose participants interact more frequently and fatefully with one another than with actors outside the field” (Scott 2001, p. 84). This may include government, critical exchange partners, sources of funding, professional and trade associations, special interest groups (e.g., industry level), and the general public (Scott 1991). Nonetheless, the concept of an organizational field has been broadened beyond geography and goals to encompass organizations that produce similar services or products (e.g., competitors), consumers, suppliers, and regulatory agencies (DiMaggio & Powell 1983). Changes at this organizational field level trigger various isomorphic pressures on organizations operating in that field.

Institutional isomorphism. At the field level, organizations confront powerful forces (i.e., isomorphic pressures) that cause them to become more similar to one another, thus achieving isomorphism (DiMaggio & Powell 1983). Institutional isomorphism is argued to be “a useful tool for understanding the politics and ceremony that pervade much modern organizational life” (DiMaggio & Powell 1983, p. 150). Institutional isomorphism manifests in three forms (DiMaggio & Powell 1983): coercive, normative, and mimetic. Coercive isomorphism results from both formal (e.g., regulations) and informal (e.g., culture) pressures exerted on organizations by their legal environment. Normative isomorphism results from pressures exerted by professional associations that define normative rules about organizational and professional behavior. Likewise, universities and professional training institutions produce individuals with similar orientations and educational backgrounds; for instance, an organization might decide to adopt cloud services because its managers learn that cost reduction is a good thing. Mimetic isomorphism results from uncertainties (e.g., goal ambiguity or poor awareness of organizational innovation); organizations are influenced by their competitors in the field and tend to imitate them, expecting similar success. These various isomorphic pressures force organizations to respond accordingly and strategically.

Strategic responses. A key theme of institutional theory is that “an organization's survival requires it to conform to social norms of acceptable behavior” (Covaleski & Dirsmith 1988, p. 563). At the organizational level, organizations may enact five strategies expressed through tactics to cope with various isomorphic pressures in order to gain, maintain, or repair their legitimacy (Oliver 1991; Suchman 1995). While “early adoption decisions of organizational innovations are commonly driven by a desire to improve performance” (DiMaggio & Powell 1983, p. 148), as innovations diffuse, the adoption decision becomes driven by the desire to achieve legitimacy rather than to improve performance (Meyer & Rowan 1977). Legitimacy is defined as the “congruence between the social values associated with or implied by [organizational] activities and the norms of acceptable behavior in the larger social system” (Dowling & Pfeffer 1975, p. 122). These strategic responses are dependent on how organizations interpret the isomorphic pressures that they should conform to. According to Oliver (1991), organizations may (1) just conform to institutional norms through an acquiescence strategy, (2) balance themselves with their institutional environment through a compromise strategy when they confront a conflict between institutional norms and internal organizational objectives, (3) preclude the need for conformity to institutional norms through an avoidance strategy, (4) resist the institutional norms by using a defiance strategy, or (5) seek to import, influence, or control institutional constituents with a manipulation strategy. By relying on these concepts drawn from institutional theory, we build our analytical framework as follows.

4. Constructing an analytical framework

In constructing the analytical framework, we place the concepts of institutional theory into the context of CC adoption. In this conceptual paper, our arguments are more descriptive than normative. We examine the plausibility of the framework by bringing in relevant literature on the use of CC in particular, and on enterprise systems in general, since our focus is on the adoption of CC at the organizational level.

Field-level changes. We start by identifying relevant organizations that form the organizational field. Field-level changes, such as the enactment of new government regulations, the ways of collaborating between business partners, and the advent of new CC services, trigger various isomorphic pressures. In the context of CC adoption, it is important to understand “how technologies are embedded in complex interdependent social, economic, and political networks, and how they are consequently shaped by such broader institutional influences” (Orlikowski & Barley 2001, p. 154). Based on our review of the extant literature, we identify the relevant organizations at the field level, which are summarized in Table 1.

Table 1: Organizations at the field level

Organization | Description
CC providers | Various forms of CC (SaaS, PaaS, and IaaS) offered by CC providers, along with their promised benefits and associated potential risks, affect CC adoption.
Peer organizations | Organizations develop trust in CC providers by asking their peers about their perceptions of CC providers’ capabilities and reputations.
Business partners | Business partners (e.g., customers and suppliers) may affect the organization’s decision to adopt CC services in order to maintain their partnership.
Professional and industry associations | Professional and industry associations may develop guidelines to facilitate CC adoption, as well as evaluation criteria to select appropriate CC providers.
Regulators | Regulators may enact obligations on CC providers to inform the adopting organizations about the protection of data security, privacy, and integrity. This is more important among government agencies.

References: Armbrust et al. (2010); Altaf & Schuff (2010); Heart (2010); Yao et al. (2010); Li et al. (2012); Badger et al. (2011); Kshetri (2012); Marston et al. (2011); Kshetri (2012)

Institutional isomorphism. As stated previously, various isomorphic pressures may be the result of changes at the field level. Coercive pressures may be exerted through compulsory power by other organizations, such as parent organizations or trading partners with greater bargaining power (Chong & Ooi 2008). Other organizations may adopt CC because of their learning process, such as adhering to professional standards or observing earlier adopters. This process, which enables them to see potential benefits that may be harvested (Herhalt & Cochrane 2012), creates a normative isomorphic pressure. They assess and explore the value proposition of CC before making a decision.

Mimetic pressures may emerge from industry trends, the media, and consultants’ influence (Benders et al. 2006). For example, SMEs may lack internal IT expertise, and, consequently, the easiest way for them to make a decision about adopting CC is to follow the industry hype or what is suggested by, for example, the media, white papers, and consultants. Table 2 summarizes three types of isomorphic pressure, which result from field-level changes and influence organizations’ decisions to adopt or not to adopt CC.

Table 2: Institutional isomorphism

Isomorphism | Description
Coercive | Organizations adopt CC for regulatory compliance reasons or because they are forced by other organizations through compulsory power.
Normative | Organizations adopt CC because they are influenced by learning processes or the convincing power of other organizations.
Mimetic | Organizations adopt CC to become similar to other adopting organizations, without a thorough reflection process.

References: Chong & Ooi (2008); Zielinski (2009); Low et al. (2011); Herhalt & Cochrane (2012); Li et al. (2012); Yao et al. (2010); Low et al. (2011); Herhalt & Cochrane (2012); Benders et al. (2006); Parakala & Udhas (2011); Sultan (2011)

Strategic responses. The type of isomorphic pressure influences, to a great extent, the set of possible strategic responses that an organization may choose from (see Table 3). An organization that faces a coercive isomorphic pressure from either its parent or a regulatory body most likely has no other choice than to adopt CC (Chong & Ooi 2008). Thus, it will adopt an acquiescence response. This response may also be the result of a proper study conducted by potential adopters before they decide on full implementation of CC (Herhalt & Cochrane 2012). At the other extreme, some organizations choose a defiance strategic response by deciding not to adopt CC for reasons such as being unsure about the validity of the promises of CC, a lack of customization opportunities, or dissatisfaction with the offerings/pricing by the vendors (Yao et al. 2010; Herhalt & Cochrane 2012). The other possible strategic responses that exist between these two extremes include compromise, avoidance, and manipulation.

Table 3: Strategic responses

Strategy | Example of response
Acquiescence | Organizations adopt CC with or without any reflection. Some of them conduct a proper study and decide to choose full implementation, while others do so simply by following the norms, business hype, and/or regulatory force.
Compromise | Organizations develop an adoption strategy, such as by adopting CC to run parts of their strategic information systems or by combining public and private/community clouds.
Avoidance | Organizations adopt partial implementation and conduct testing of a proof of concept, such as using CC to run parts of their nonstrategic information systems.
Defiance | Organizations decide not to adopt CC at all.
Manipulation | Organizations establish their own private or community CC.

References: Chong & Ooi (2008); Herhalt & Cochrane (2012); Parakala & Udhas (2011); Herhalt & Cochrane (2012); Herhalt & Cochrane (2012); Lin & Chen (2012); Ernst & Young (2012); Herhalt & Cochrane (2012); Marston et al. (2011); Brian et al. (2012); Herhalt & Cochrane (2012)

For the compromise strategic response, organizations may develop an adoption strategy (Herhalt & Cochrane 2012); for example, they may decide to adopt hybrid clouds by keeping mission-critical applications on the private/community cloud and transferring noncritical applications to the public cloud (Parakala & Udhas 2011). Some organizations may use an avoidance strategic response by adopting partial implementation of CC for purposes of trialability (Herhalt & Cochrane 2012). When an organization decides to adopt a manipulation strategic response, it may establish its own private or community cloud (Herhalt & Cochrane 2012). This strategy is most likely to be adopted by LEs or a group of SMEs that want to have full control over their privacy and service quality. A previous study pointed out that LEs are concerned about the service quality of CC and control over their data; hence, they may implement private CC although it requires capital expenditures (Marston et al. 2011).

To sum up, based on the core concepts of institutional theory and the extant literature, we have contextualized the analytical framework of CC adoption (see Figure 2, an extension of Figure 1). Our framework provides insights to better understand how and why organizations adopt or do not adopt CC to support their business. We have identified a number of relevant organizations that comprise the organizational field: CC providers, peer organizations, business partners, professional and industry associations, and regulators. We have also revealed possible isomorphic pressures that are relevant to studying the adoption of CC. Further, we have attempted to translate the five institutional strategic responses proposed by Oliver (1991) into the context of CC adoption.

Figure 2: The analytical framework

5. Concluding remarks

We have presented an analytical framework based on core concepts drawn from institutional theory to understand the adoption of CC among organizations. Although it is supported by plausible arguments based on the extant literature, it should not be viewed as a simple checklist and used mechanically. To avoid this trap, future studies may delve further into each concept by tracing the CC adoption process (Suddaby 2010). It is important to note that while some IT innovations have become successfully embedded and routinized in organizations, some are only used at the ceremonial level to gain legitimacy and are decoupled from everyday practices (Meyer & Rowan 1977; DiMaggio & Powell 1983; Boxenbaum & Jonsson 2008; Currie 2009).

This analytical framework does not explicitly pay attention to the how and why of CC adoption. It is important to understand the process of how an organization interprets the field-level changes, and it is equally important to gain insights into why an organization decides to adopt a certain strategic response over others. Both external and internal factors may be considered in this process. These voids could be addressed through empirical investigation and by bringing in other concepts, either from institutional theory, such as institutional work (Lawrence & Suddaby 2006) or institutional logic (Thornton & Ocasio 2008), or from other relevant theories, such as stakeholder theory (Mitchell et al. 1997). Additionally, this phenomenon can be studied by engaging in interpretive research (Suddaby 2010) and by conducting multiple case studies (Mills et al. 2006) with carefully selected organizations from various contexts (such as from developed and developing countries and from different industry sectors). Our hope is that the proposed analytical framework can be validated, fine-tuned, and extended by future research.

References

Altaf, F. and Schuff, D. (2010) “Taking a Flexible Approach to ASPs”, Communications of the ACM, Vol. 53, No. 2, pp. 139–143.
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I. and Zaharia, M. (2010) “A View of Cloud Computing”, Communications of the ACM, Vol. 53, No. 2, pp. 50–58.
Avgerou, C. (2000) “IT and Organizational Change: An Institutionalist Perspective”, Information Technology & People, Vol. 13, No. 4, pp. 234–262.
Badger, L., Bohn, R., Chu, S., Hogan, M., Liu, F., Kaufmann, V., Mao, J., Messina, J., Mills, K., Sokol, A., Tong, J., Whiteside, F. and Leaf, D. (2011) US Government Cloud Computing Technology Roadmap Volume I Release 1.0, National Institute of Standards and Technology, United States.
Benders, J., Batenburg, R. and Van der Blonk, H. (2006) “Sticking to Standards: Technical and Other Isomorphic Pressures in Deploying ERP-systems”, Information & Management, Vol. 43, No. 2, pp. 194–203.
Boxenbaum, E. and Jonsson, S. (2008) “Isomorphism, Diffusion and Decoupling”, in Greenwood, R., Oliver, C., Suddaby, R. and Sahlin-Andersson, K. (eds.) The SAGE Handbook of Organizational Institutionalism, Sage, London, pp. 78–98.
Brian, H. A. Y. E. S., Brunschwiler, T., Dill, H., Christ, H., Falsafi, B., Fischer, M. et al. (2008) “Cloud Computing”, Communications of the ACM, Vol. 51, No. 7, pp. 9–11.
Catteddu, D. (2010) “Cloud Computing: Benefits, Risks and Recommendations for Information Security”, in Serrão, C., Díaz, V.A. and Cerullo, F. (eds.) Web Application Security, Springer, Berlin Heidelberg.
Chong, A.Y.-L. and Ooi, K.-B. (2008) “Adoption of Interorganizational System Standards in Supply Chains: An Empirical Analysis of RosettaNet standards”, Industrial Management & Data Systems, Vol. 108, No. 4, pp. 529–547.
Covaleski, M.A. and Dirsmith, M.W. (1988) “An Institutional Perspective on the Rise, Social Transformation, and Fall of a University Budget Category”, Administrative Science Quarterly, Vol. 33, No. 4, pp. 562–587.
CSA & ISACA (2012) Cloud Computing Market Maturity, CSA & ISACA.
Currie, W. (2009) “Contextualising the IT Artefact: Towards a Wider Research Agenda for IS Using Institutional Theory”, Information Technology & People, Vol. 22, No. 1, pp. 63–77.
Dillon, T., Wu, C. and Chang, E. (2010) “Cloud Computing: Issues and Challenges”, Proceedings of the 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), pp. 27–33.
DiMaggio, P. and Powell, W. (1983) “The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields”, American Sociological Review, Vol. 48, No. 2, pp. 147–160.
Dowling, J. and Pfeffer, J. (1975) “Organizational Legitimacy: Social Values and Organizational Behavior”, The Pacific Sociological Review, Vol. 18, No. 1, pp. 122–136.
Durkee, D. (2010) “Why Cloud Computing Will Never be Free”, Communications of the ACM, Vol. 53, No. 5, p. 62.
Ernst & Young (2011) Cloud Computing Issues and Impacts, Ernst & Young, United Kingdom.
Ernst & Young (2012) Ready for Takeoff: Preparing for Your Journey Into the Cloud, Ernst & Young, United Kingdom.
Garrison, G., Kim, S. and Wakefield, R.L. (2012) “Success Factors for Deploying Cloud Computing”, Communications of the ACM, Vol. 55, No. 9, pp. 62–68.
Gordon, J., Hayashi, C., Elron, D., Huang, L. and Neill, R. (2010) Exploring the Future of Cloud Computing: Riding the Next Wave of Technology-driven Transformation, World Economic Forum, Geneva.
Heart, T. (2010) “Who is Out There? Exploring the Effects of Trust and Perceived Risk on SaaS Adoption Intentions”, The DATA BASE for Advances in Information Systems, Vol. 41, No. 3, pp. 49–68.
Herhalt, J. and Cochrane, K. (2012) Exploring the Cloud: A Global Study of Governments’ Adoption of Cloud, KPMG.
Kepes, B. (2011) Cloudonomics: The Economics of Cloud Computing, Diversity Limited, United States.
Kshetri, N. (2012) “Privacy and Security Issues in Cloud Computing: The Role of Institutions and Institutional Evolution”, Telecommunications Policy, Vol. 37, Nos. 4/5, pp. 372–386.
Lawrence, T.B. and Suddaby, R. (2006) “Institutions and Institutional Work”, in Clegg, S., Hardy, C., Lawrence, T.B. and Nord, W.R. (eds.) The SAGE Handbook of Organization Studies, Sage, London, pp. 215–254.
Lee, C., Mckean, J. and King, J. (2011) CIO Global Cloud Computing Adoption Survey Results, CIO/IDG Research Services.
Li, M., Yu, Y., Zhao, L.J. and Li, X. (2012) “Drivers for Strategic Choice of Cloud Computing as Online Services in SMEs”, Proceedings of the International Conference on Information Systems (ICIS).
Lin, A. and Chen, N. (2012) “Cloud Computing as an Innovation: Perception, Attitude, and Adoption”, International Journal of Information Management, Vol. 32, No. 6, pp. 533–540.
Low, C., Chen, Y. and Wu, M. (2011) “Understanding the Determinants of Cloud Computing Adoption”, Industrial Management & Data Systems, Vol. 111, No. 7, pp. 1006–1023.
Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J. and Ghalsasi, A. (2011) “Cloud Computing: The Business Perspective”, Decision Support Systems, Vol. 51, No. 1, pp. 176–189.
Mell, P. and Grance, T. (2011) The NIST Definition of Cloud Computing, National Institute of Standards and Technology, United States.
Meyer, J.W. and Rowan, B. (1977) “Institutionalized Organizations: Formal Structure as Myth and Ceremony”, American Journal of Sociology, Vol. 83, No. 2, pp. 340–363.
Mignerat, M. and Rivard, S. (2009) “Positioning the Institutional Perspective in Information Systems Research”, Journal of Information Technology, Vol. 24, No. 4, pp. 369–391.
Mills, M., Van De Bunt, G.G. and De Bruijn, J. (2006) “Comparative Research: Persistent Problems and Promising Solutions”, International Sociology, Vol. 21, No. 5, pp. 619–631.
Mitchell, R.K., Agle, B.R. and Wood, D.J. (1997) “Toward a Theory of Stakeholder Identification and Salience: Defining the Principle of Who and What Really Counts”, Academy of Management Review, Vol. 22, No. 4, pp. 853–886.
Mouritsen, J. (1994) “Rationality, Institutions And Decision Making: Reflections on March and Olsen’s Rediscovering Institutions”, Accounting Organizations and Society, Vol. 19, No. 2, pp. 193–211.
Oliver, C. (1991) “Strategic Responses to Institutional Processes”, Academy of Management Review, Vol. 16, No. 1, pp. 145–179.
Orlikowski, W.J. and Barley, S.R. (2001) “Technology and Institutions: What Can Research on Information Technology and Research on Organizations Learn From Each Other?”, MIS Quarterly, Vol. 25, No. 2, pp. 145–165.
Parakala, K. and Udhas, P. (2011) The Cloud: Changing the Business Ecosystem, KPMG.
Scott, W.R. (2004) “Institutional Theory: Contributing to a Theoretical Research Program”, in Smith, K.G. and Hitt, M.A. (eds.) Great Minds in Management: The Process of Theory Development, pp. 460–484, Oxford University Press, Oxford, UK.
Scott, W.R. (2001) Institutions and Organizations, Sage, London.
Scott, W.R. (1991) “Unpacking Institutional Arguments”, in Powell, W.W. and DiMaggio, P.J. (eds.) The New Institutionalism in Organizational Analysis, University of Chicago Press, Chicago, pp. 143–163.
Son, I., Lee, D., Lee, J.N. and Chang, Y.B. (2011) “Understanding the Impact of IT Service Innovation on Firm Performance: The Case of Cloud Computing”, Proceedings of the Pacific Asia Conference on Information Systems (PACIS).
Suchman, M.C. (1995) “Managing Legitimacy: Strategic and Institutional Approaches”, Academy of Management Review, Vol. 20, No. 3, pp. 571–610.
Suddaby, R. (2010) “Challenges for Institutional Theory”, Journal of Management Inquiry, Vol. 19, No. 1, pp. 14–20.
Sultan, N.A. (2011) “Reaching for the “cloud”: How SMEs Can Manage”, International Journal of Information Management, Vol. 31, No. 3, pp. 272–278.
Swanson, E.B. and Ramiller, N.C. (2004) “Innovating Mindfully with Information Technology”, MIS Quarterly, Vol. 28, No. 4, pp. 553–583.
Thornton, P.H. and Ocasio, W. (2008) “Institutional Logics”, in Greenwood, R., Oliver, C., Suddaby, R. and Sahlin-Andersson, K. (eds.) The SAGE Handbook of Organizational Institutionalism, Sage, London, pp. 99–129.
Timmermans, J., Ikonen, V., Stahl, B.C. and Bozdag, E. (2010) “The Ethics of Cloud Computing: A Conceptual Review”, Proceedings of the IEEE Second International Conference on Cloud Computing Technology and Science, pp. 614–620.
Tolbert, P.S. and Zucker, L.G. (1996) “The Institutionalization of Institutional Theory”, in Clegg, S.R., Hardy, C. and Nord, W.R. (eds.) Handbook of Organization Studies, Sage, London, pp. 175–190.
Venters, W. and Whitley, E.A. (2012) “A Critical Review of Cloud Computing: Researching Desires and Realities”, Journal of Information Technology, Vol. 27, No. 3, pp. 179–197.
Weerakkody, V., Dwivedi, Y.K. and Irani, Z. (2009) “The Diffusion and Use of Institutional Theory: A Cross-Disciplinary Longitudinal Literature Survey”, Journal of Information Technology, Vol. 24, No. 4, pp. 354–368.
Weinhardt, C., Anandasivam, D.I.W.A., Blau, B., Borissov, D.I.N., Meinl, D.M.T., Michalk, D.I.W.W. and Stößer, J. (2009) “Cloud Computing: A Classification, Business Models, and Research Directions”, Business Information Systems Engineering, Vol. 1, No. 5, pp. 391–399.
Wooten, M. and Hoffman, A.J. (2008) “Organizational Fields: Past, Present and Future”, in Greenwood, R., Oliver, C., Suddaby, R. and Sahlin-Andersson, K. (eds.) The SAGE Handbook of Organizational Institutionalism, Sage, London, pp. 130–147.
Yang, H. and Tate, M. (2012) “A Descriptive Literature Review and Classification of Cloud Computing Research”, Communications of the Association for Information Systems, Vol. 31, No. 1, pp. 35–60.
Yao, Y., Watson, E. and Khan, B.K. (2010) “Application Service Providers: Market and Adoption Decisions”, Communications of the ACM, Vol. 53, No. 7, pp. 113–117.
Zielinski, D. (2009) “Be Clear on Cloud Computing Contracts”, HR Magazine, Vol. 54, No. 11, pp. 63–65.

A Privacy Preserving Profile Searching Protocol in Cloud Networks

Sangeetha Jose, Preetha Mathew and Pandu Rangan
Indian Institute of Technology Madras, Chennai, Tamil Nadu, India – 600 036
sangeethajosem@gmail.com
preetha.mathew.k@gmail.com
prangan55@gmail.com

Abstract: Cloud providers and users must each ensure the security of their own resources, which leads to a myriad of security issues. This paper is concerned with the security of users who demand privacy. In cloud networks, a substantial amount of information needs to be shared and communicated while the privacy of each user is preserved. Privacy preserving profile searching (PPPS) is one of the key problems in cloud networks: users who have permission can search and access private information that has been encrypted by other users and stored in different clouds. The literature shows that PPPS is closely related to the concept of oblivious transfer with hidden access control (OT-HAC). OT-HAC can be constructed from a blind anonymous identity based encryption (BAIBE) scheme, and BAIBE can be obtained from an anonymous identity based encryption (IBE) scheme with a blind key extraction (BKE) protocol. In BKE, the secret key of the user is extracted in a blinded manner. Moreover, the IBE scheme must be anonymous, so that there is no way to link the ciphertext to the identity used to encrypt the message. In this paper we propose an efficient, provably secure BAIBE scheme that can be used for PPPS in cloud networks. The construction of BKE is based on the BLS (Boneh, Lynn and Shacham) blind signature, which is combined with a variant of the Boneh-Franklin anonymous IBE to achieve BAIBE. We prove the security of the system by reduction to number theoretic hardness assumptions. With minimal management effort, this system provides secure and convenient cloud network access while preserving the privacy of the user. In short, the proposed BAIBE scheme ensures security, integrity and privacy in cloud networks.

Keywords: Cloud Computing, Identity Based Encryption, Blind Key Extraction, Blind Signature, Blind Anonymous Identity Based Encryption

1. Introduction

Cloud computing makes the long-held dream of utility computing a reality. It is an abstraction based on the notion of pooling physical resources and presenting them as a virtual resource (Sosinsky, 2011). According to the National Institute of Standards and Technology (NIST), cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources. It inherits all the vulnerabilities associated with internet applications, along with vulnerabilities that arise from pooled, virtualized and outsourced resources. The troublesome areas highlighted in cloud computing include auditing, data integrity, privacy, recovery, etc. This paper investigates measures to protect the privacy of users in the cloud network. One such area is privacy preserving profile searching (PPPS): only users who have permission can access the private information of other users, which is already encrypted and stored.

The PPPS problem can be defined as follows (Lin, Chow, Xing, Fang, & Cao, 2011). Suppose there are two persons, P1 and P2, and P1 wants to access the profile of P3, who is a friend in the friend list of P2. The friends in the friend list of P2 are hidden from P1 (hidden access control, HAC (Frikken, Atallah, & Li, 2004)). If P2 has the profile of the requested friend P3 and is ready to transfer the information, P1 obtains only the profile of P3 from P2. In this case the sender P2 should remain oblivious to which profile has been transferred to P1 (oblivious transfer, OT (Crepeau, 2006), (Kilian, 1988), (Rabin, 2005)). Therefore PPPS is closely related to the concept of oblivious transfer with hidden access control (OT-HAC) (Camenisch, Dubovitskaya, Neven, & Zaverucha, 2011).

In the cloud network, a user U1 wants to access encrypted data stored by another user U2, which is hidden from U1. U1 sends a request to the cloud and, if U2 has that data and is ready to transfer it, U1 obtains the data. U1 learns nothing other than the requested data, and U2 does not learn which information was transferred to U1. This is the same as oblivious transfer with hidden access control.

OT-HAC can be constructed from anonymous identity based encryption (IBE) with a blind key extraction (BKE) protocol (Green & Hohenberger, 2007), (Lin, Chow, Xing, Fang, & Cao, 2011). An IBE scheme is anonymous (Abdalla, et al., 2005) if there is no way to link the ciphertext to the identity used to encrypt the message. That is, on seeing the ciphertext, no one can link it to the identity (ID) of a particular recipient. In the proposed system, the identity used to encrypt a message cannot be determined by viewing the ciphertext; hence the system is anonymous. In addition, the private key generator (PKG) generates the secret key without knowing the user’s identity. Therefore, the resulting system is a blind anonymous IBE. The proposed system is also efficient because it reduces the number of pairing operations and the number of ciphertext and key components.

The security of a provably secure cryptosystem needs to be reduced to number theoretic hard problems, such as the discrete logarithm (DL) problem and the computational Diffie-Hellman (CDH) problem in the case of multiplicative groups. When elliptic curves are used, the hardness assumptions are the bilinear computational Diffie-Hellman (BDH) assumption, the decisional bilinear Diffie-Hellman (DBDH) assumption and their variants. Security is proven either in the random oracle model or in the standard model (complexity-based proof). Bellare and Rogaway (Bellare & Rogaway, 1993) formalised the well-known random oracle model (ROM), in which one assumes that the hash function is replaced by a publicly accessible random function (the random oracle). This means that the adversary cannot compute the result of the hash function by himself; he must query the random oracle.

1.1 Related work

Identity based encryption (IBE) was proposed by (Shamir, 1984) and realised by (Boneh & Franklin, 2001). In IBE, a user obtains his/her secret key from the PKG by providing his/her identity to the key extraction protocol. (Green & Hohenberger, 2007) proposed a blind key extraction (BKE) protocol for extracting the secret key of a user in a blinded manner (blinded from the PKG). They formalized this notion as blind IBE and discussed many applications, such as privacy preserving delegated keyword search, partially blind signature schemes, temporary anonymous identities, and so on. They also constructed an oblivious transfer (OT) scheme from the blind IBE scheme under the DBDH (Decisional Bilinear Diffie-Hellman) assumption. (Boyen & Waters, 2006) proposed an anonymous IBE scheme that is selective-identity secure in the standard model. Camenisch et al. (Camenisch, Kohlweiss, Rial, & Sheedy, 2009) designed a committed blind anonymous IBE scheme based on the anonymous IBE of (Boyen & Waters, 2006). Lin et al. (Lin, Chow, Xing, Fang, & Cao, 2011) proposed another BKE protocol for the anonymous IBE of (Ducas, 2010), in which the protocol uses a zero knowledge proof of knowledge (ZKPoK) with increased efficiency. The security of this scheme relies on the inverse symmetric external Diffie-Hellman (SXDH) assumption.

1.2 Motivation

Because of its shared nature, cloud computing must pay close attention to the privacy and security of data. Data has to be transmitted and stored securely across cloud networks while the privacy of the user is preserved. The computation and communication costs in the cloud environment also need to be reduced. Even though there are a few blind anonymous identity based encryption schemes in the literature (Green & Hohenberger, 2007), (Camenisch, Kohlweiss, Rial, & Sheedy, 2009), (Lin, Chow, Xing, Fang, & Cao, 2011), the existing systems are not suitable for cloud networks because of their computation and communication complexity. Short signatures are needed in the cloud environment because of the bandwidth constraints of the communication channel. By using the blinded version of the BLS signature, 1024-bit security can be obtained with a signature that is 160 bits long. This is considered to be the shortest among all signature schemes. The proposed scheme has fewer secret key components and fewer ciphertext components, and its decryption process requires simpler computation than existing schemes. The comparison with existing schemes is given in Section 4. The security proof of the proposed system is given in the random oracle model.

1.3 Organization of the paper

Section 2 deals with the technical preliminaries related to the hardness assumptions used in the paper. Section 3 gives the definitions of IBE and blind anonymous IBE. Section 4 presents the proposed blind anonymous IBE (BAIBE), its proof of security and the comparison with the existing systems. The paper concludes with its future scope in Section 5.

2. Preliminaries

2.1 Bilinear pairing

Let $G_1$ be an additive cyclic group of prime order $q$ with generator $P$, and let $G_2$ also be an additive cyclic group of the same prime order $q$. A map $e: G_1 \times G_1 \rightarrow G_2$ is said to be a bilinear pairing if the following properties hold.

Bilinearity: For all $P \in G_1$ and $a, b \in_R \mathbb{Z}_q^*$, $e(aP, bP) = e(P, P)^{ab}$.

Non-degeneracy: For all $P \in G_1$, $e(P, P) \neq I_{G_2}$, where $I_{G_2}$ is the identity element of $G_2$.

Computability: There exists an efficient algorithm to compute $e(P, P)$ for all $P \in G_1$.
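The bilinearity and non-degeneracy properties can be checked numerically in a toy model. The sketch below is an illustration only and is not part of the proposed scheme. It assumes a representation in which the point $aP$ of $G_1$ is stored as the integer $a \bmod q$ (so $P = 1$), and in which $G_2$ is modelled as the order-$q$ subgroup of $\mathbb{Z}_p^*$, written multiplicatively to match the $e(P,P)^{ab}$ notation. The parameters `Q = 1019`, `P_MOD = 2039` and `G = 4` are hypothetical and far too small to be secure; since discrete logarithms are exposed in this representation, it offers no security at all. Real pairings are computed on elliptic curves.

```python
import secrets

# Toy parameters (assumptions for illustration only, not secure).
Q = 1019        # prime order q of both groups
P_MOD = 2039    # prime modulus with P_MOD = 2 * Q + 1
G = 4           # generator of the order-Q subgroup of Z_{P_MOD}^*


def pairing(x: int, y: int) -> int:
    """Toy map e: G1 x G1 -> G2, where the point aP is stored as a mod Q."""
    return pow(G, (x * y) % Q, P_MOD)


def rand_scalar() -> int:
    """Sample uniformly from Z_q^* = {1, ..., Q - 1}."""
    return secrets.randbelow(Q - 1) + 1


P = 1  # the generator P of G1 in this representation
a, b = rand_scalar(), rand_scalar()
aP, bP = (a * P) % Q, (b * P) % Q

# Bilinearity: e(aP, bP) = e(P, P)^(ab)
bilinear_ok = pairing(aP, bP) == pow(pairing(P, P), a * b, P_MOD)

# Non-degeneracy: e(P, P) is not the identity element of G2
nondegenerate_ok = pairing(P, P) != 1
```

Both flags evaluate to `True` for every choice of `a` and `b`, because `G` has order `Q` in the toy target group.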

Security proof of the proposed scheme is based on bilinear Diffie-Hellman (BDH) and chosen-target computational Diffie-Hellman (CT-CDH) assumptions.

2.2 Computational Diffie-Hellman (CDH) problem and assumption

The CDH problem states that given $(P, aP, bP)$, compute $abP$, where $P \in G_1$ and $a, b \in_R \mathbb{Z}_q^*$.

Definition 1 (CDH assumption): The advantage of any probabilistic polynomial time algorithm $A$ in solving the CDH problem in $G_1$ is defined as

$$\mathrm{Adv}_A^{\mathrm{CDH}} = \Pr[\, abP \leftarrow A(P, aP, bP) \mid P \in G_1 \text{ and } a, b \in_R \mathbb{Z}_q^* \,]$$

The computational Diffie-Hellman (CDH) assumption is that, for any probabilistic polynomial time algorithm $A$, the advantage $\mathrm{Adv}_A^{\mathrm{CDH}}$ is negligibly small ($\varepsilon$).

2.3 Chosen-target computational Diffie-Hellman (CT-CDH) problem and assumption

Boldyreva (Boldyreva, 2003) proposed the chosen-target computational Diffie-Hellman (CT-CDH) problem and assumption as follows.

Definition 2 (Chosen-target CDH problem and assumption): Let $G_1$ be a cyclic additive group of prime order $q$ with generator $P$. Let $s \in_R \mathbb{Z}_q^*$ and let $P_{pub} = sP$. Let $H$ be a random instance of a hash function family $H_1: \{0,1\}^* \rightarrow G_1^*$. The adversary $A$ is given $(q, P, P_{pub}, H)$ and has access to a target oracle $T_0$ that returns random points $Q_i \in_R G_1$ and to a helper oracle $s(\cdot)$ that multiplies a queried point by $s$. Let $q_t$ and $q_h$ be the number of queries made to the target oracle and the helper oracle respectively. The CT-CDH advantage of the adversary attacking the chosen-target CDH problem, $\mathrm{Adv}_A^{\mathrm{CT\text{-}CDH}}$, is defined as the probability that $A$ outputs a set $V$ of, say, $l$ pairs $\{(v_1, j_1), \dots, (v_l, j_l)\}$, where for all $1 \le i \le l$ there exists $1 \le j_i \le q_t$ such that $v_i = sQ_{j_i}$, all $v_i$ are distinct, and $q_h < q_t$.

The chosen-target CDH assumption states that there is no polynomial-time adversary $A$ with non-negligible $\mathrm{Adv}_A^{\mathrm{CT\text{-}CDH}}$. If the adversary makes one query to the target oracle, then the chosen-target CDH assumption is equivalent to the standard CDH assumption. The chosen-target CDH problem is assumed to be hard in all groups where the standard CDH problem is hard.

2.4 The bilinear Diffie-Hellman (BDH) problem and assumption

Let $G_1, G_2$ be two groups of prime order $q$. Let $e: G_1 \times G_1 \rightarrow G_2$ be a bilinear map and let $P$ be a generator of $G_1$. The BDH problem is as follows: given $(P, aP, bP, cP)$, compute $W = e(P, P)^{abc} \in G_2$, where $P \in G_1$ and $a, b, c \in_R \mathbb{Z}_q^*$.

Definition 3 (BDH assumption): The advantage of any probabilistic polynomial time algorithm $A$ in solving the BDH problem in $G_1$ is defined as

$$\mathrm{Adv}_A^{\mathrm{BDH}} = \Pr[\, e(P, P)^{abc} \leftarrow A(P, aP, bP, cP) \mid P \in G_1 \text{ and } a, b, c \in_R \mathbb{Z}_q^* \,] \geq \varepsilon$$

The bilinear Diffie-Hellman (BDH) assumption is that, for any probabilistic polynomial time algorithm $A$, the advantage $\mathrm{Adv}_A^{\mathrm{BDH}}$ is negligibly small ($\varepsilon$).

3. Definition of blind anonymous identity based encryption

3.1 Identity based encryption

The definition of identity-based encryption (IBE) (Boneh & Franklin, 2001) is as follows. An identity based encryption scheme $\Pi$ consists of four probabilistic polynomial-time algorithms (SetUpIBE, KeyExtractIBE, EncryptIBE, DecryptIBE):

SetUpIBE($1^\kappa$): Given a security parameter $\kappa$, the private key generator (PKG) generates the public parameters params and the master key pair (MSK, MPK) of the system.

KeyExtractIBE(params, MSK, ID): Given an identity ID of a user, the PKG computes the corresponding secret key $D_{ID}$ for identity ID and transmits it to the user in a secure way.

EncryptIBE(M, params, ID): This algorithm takes the public parameters params, a user identity ID and a message M as input. The output of this algorithm is the ciphertext C encrypting M under ID.

DecryptIBE(C, params, $D_{ID}$): This algorithm outputs the message M, decrypted from C with the help of the user's secret key $D_{ID}$.

3.2 Blind anonymous identity based encryption 

An IBE scheme is said to be anonymous if it is not possible to identify the receiver by viewing the ciphertext. A blind anonymous identity based encryption scheme can be constructed by running the KeyExtract algorithm of an anonymous IBE in a blind way. In IBE schemes, the PKG generates the secret key of the user with identity ID by executing the key extraction algorithm KeyExtractIBE. Green et al. (Green & Hohenberger, 2007) formalized the notion of blind key extraction in blind IBE. In blind key extraction, the user obtains his secret key without revealing his ID to the PKG. Thus a blind IBE can overcome the key escrow problem of standard IBE. Blind IBE consists of the same three algorithms (SetUpIBE, EncryptIBE, DecryptIBE) as standard IBE, except that it replaces KeyExtractIBE with the BlindKeyExtract algorithm. The protocol for BlindKeyExtract is as follows.



BlindKeyExtract(PKG(params, MSK), U(params, ID)): An honest user U with identity ID obtains the secret key DID while obscuring his ID from the PKG; otherwise the protocol outputs an error message.



The security requirement for BlindKeyExtract should satisfy the following two properties:



Leak-free extract: A malicious user cannot learn anything by executing the BlindKeyExtract protocol with an honest PKG, other than the user’s secret key DID that is generated.

Selective-failure blindness: A malicious authority (PKG) cannot learn anything about the user’s choice of identity during the BlindKeyExtract protocol, because the ID is provided to the PKG only after blinding.



BlindKeyExtract can be constructed with the help of blind signatures. In IBE, the extracted secret key is the signature of the PKG on the public key ID of the user. Therefore, the unforgeability property of the signature ensures that the user cannot generate the secret key DID by himself without the PKG’s help. The blindness property of the blind signature ensures that the signer (PKG) does not learn any information about the message during the signing process.



The security notions widely used in the provably secure approach for encryption schemes are derived from a subset of the cross product of the goals to be achieved and the attack models. The goals belong to the set of invertibility (INV), indistinguishability (IND) and non-malleability (NM). The attack models are CPA (chosen plaintext attack) and CCA (chosen ciphertext attack). Mixing and matching the goals (INV, IND, NM) and attack models (CPA, CCA) in any combination gives rise to the notions of security.

4. Proposed blind anonymous IBE (BAIBE) 

This construction uses the Boneh-Franklin IBE system (Boneh & Franklin, 2001), which is known to be inherently anonymous; the anonymity of their scheme follows directly from the proof of semantic security. BKE is designed from a variant of the blind version of the BLS signature scheme (Boneh, Lynn, & Shacham, 2001), (Boldyreva, 2003). The algorithms of the scheme are SetUpBAIBE, BlindKeyExtractBAIBE, EncryptBAIBE and DecryptBAIBE.

SetUpBAIBE($1^\kappa$)

This algorithm takes the security parameter $1^\kappa$ as input and generates the system parameters, called params, along with the master key pair (MSK, MPK) of the PKG, which helps to generate the secret key for a user.

Let $G_1$ and $G_2$ be two additive groups of prime order $q$ and let $e: G_1 \times G_1 \rightarrow G_2$ be a bilinear map. Choose a generator $P \in G_1$. Select $s \in_R \mathbb{Z}_q^*$ and compute $P_{pub} = sP$, where $s$ is the master secret key (MSK) and $P_{pub}$ is the master public key (MPK).

Choose two cryptographic (collision and pre-image resistant) hash functions $H_1$ and $H_2$, where $H_1: \{0,1\}^* \rightarrow G_1$ and $H_2: G_2 \rightarrow \{0,1\}^n$. The message space is $M = \{0,1\}^n$, where $n$ is the length of the string. Thus this algorithm outputs the system parameters

$$params = (q, G_1, G_2, e, n, P, P_{pub}, H_1, H_2).$$

BlindKeyExtractBAIBE(params, MSK, ID)

Blind key extraction is an interactive protocol between the PKG and the user. It takes as input the user's unique identity $ID \in \{0,1\}^*$, which may be an email address or some other unique information. The user gives the ID to the PKG in blinded form; the PKG signs the blinded ID with its master secret key MSK and returns a blind signature. The user then unblinds this signature, and the result is the user's private key $D_{ID}$. The protocol is as follows.

Blinding: The user randomly chooses $r \in_R \mathbb{Z}_q^*$, computes $ID' = H_1(ID) + rP$ and sends $ID'$ to the PKG.

Signing: The PKG computes $\sigma' = sID'$ and sends $\sigma'$ back to the user.

Unblinding: The user unblinds $\sigma'$ to $\sigma$ as $\sigma = \sigma' - rP_{pub} = sH_1(ID)$.

The user can verify the validity of $(ID, \sigma)$ by checking $e(P_{pub}, H_1(ID)) = e(P, \sigma)$. To show the correctness of the verification, the equation can be expanded as follows.

Note that

$$\sigma = \sigma' - rP_{pub} = sID' - rsP = s(H_1(ID) + rP) - rsP = sH_1(ID).$$

Therefore, for the check $e(P_{pub}, H_1(ID)) = e(P, \sigma)$,

$$\text{L.H.S} = e(sP, H_1(ID)) = e(P, sH_1(ID)) = e(P, \sigma) = \text{R.H.S}$$



Public key of the user: $PK = (p_0, p_1) = (Q_{ID}, rQ_{ID})$, where $Q_{ID} = H_1(ID)$.

Secret key of the user: $SK = D_{ID} = (d_0, d_1) = (sH_1(ID), r) = (sQ_{ID}, r)$.

EncryptBAIBE(M, params, PK)

The message is encrypted under the user's public key PK. Select a random $k \in_R \mathbb{Z}_q^*$ and compute the ciphertext as

$$C = (U, V) = (kP,\; M \oplus H_2(e(p_0 + p_1, P_{pub})^k)).$$

DecryptBAIBE(C, params, SK)

The ciphertext $C = (U, V)$ is decrypted with the user's secret key SK by computing

$$V \oplus H_2(e(d_0 + d_1 d_0, U)) = M.$$

 

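Correctness of decryption is shown as follows.

$$\begin{aligned}
\text{L.H.S} &= V \oplus H_2(e(d_0 + d_1 d_0, U)) \\
&= M \oplus H_2(e(p_0 + p_1, P_{pub})^k) \oplus H_2(e(d_0 + d_1 d_0, U)) \\
&= M \oplus H_2(e(p_0 + p_1, P_{pub})^k) \oplus H_2(e(sH_1(ID) + r\,sH_1(ID), kP)) \\
&= M \oplus H_2(e(p_0 + p_1, P_{pub})^k) \oplus H_2(e(Q_{ID} + rQ_{ID}, ksP)) \\
&= M \oplus H_2(e(p_0 + p_1, P_{pub})^k) \oplus H_2(e(p_0 + p_1, P_{pub})^k) \\
&= M = \text{R.H.S}
\end{aligned}$$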
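The algebra of the four algorithms can be checked with a small sketch. It is only an illustration under loud assumptions: instead of a real pairing group it uses a toy model in which the point $aP$ is stored as the integer $a \bmod q$ and $e(aP, bP) = g^{ab}$ in a small subgroup of $\mathbb{Z}_p^*$. This model is bilinear but completely insecure (discrete logarithms are visible), and the parameter values and function names are hypothetical, not taken from the paper.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration only, NOT secure):
Q = 1019        # prime order q of the toy groups
P_MOD = 2039    # prime 2*Q + 1; G2 is the order-Q subgroup of Z_P_MOD^*
G = 4           # element of order Q in Z_P_MOD^*
P = 1           # generator P of G1; the point aP is stored as the integer a mod Q

def pairing(a, b):
    """Toy bilinear map e(aP, bP) = G^(a*b)."""
    return pow(G, (a * b) % Q, P_MOD)

def h1(identity):
    """H1: {0,1}* -> G1 (always a nonzero element)."""
    digest = hashlib.sha256(identity.encode()).digest()
    return int.from_bytes(digest, "big") % (Q - 1) + 1

def h2(g2_element, n):
    """H2: G2 -> n-byte mask."""
    return hashlib.shake_256(str(g2_element).encode()).digest(n)

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1          # uniform in Z_q^*

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def setup():
    s = rand_scalar()
    return s, (s * P) % Q                        # (MSK, P_pub = sP)

def blind(identity):
    r = rand_scalar()
    return r, (h1(identity) + r * P) % Q         # ID' = H1(ID) + rP

def sign_blinded(s, blinded_id):
    return (s * blinded_id) % Q                  # sigma' = s ID'

def unblind(sigma_blinded, r, p_pub):
    return (sigma_blinded - r * p_pub) % Q       # sigma = sigma' - r P_pub

def encrypt(message, pk, p_pub):
    p0, p1 = pk
    k = rand_scalar()
    mask = h2(pow(pairing((p0 + p1) % Q, p_pub), k, P_MOD), len(message))
    return (k * P) % Q, xor(message, mask)       # C = (U, V)

def decrypt(ciphertext, sk):
    u, v = ciphertext
    d0, d1 = sk
    mask = h2(pairing((d0 + d1 * d0) % Q, u), len(v))
    return xor(v, mask)

def demo(identity="alice@example.com", message=b"cloud record"):
    s, p_pub = setup()
    r, blinded = blind(identity)
    sigma = unblind(sign_blinded(s, blinded), r, p_pub)
    # Verification equation e(P_pub, H1(ID)) = e(P, sigma)
    assert pairing(p_pub, h1(identity)) == pairing(P, sigma)
    q_id = h1(identity)
    pk, sk = (q_id, (r * q_id) % Q), (sigma, r)
    return decrypt(encrypt(message, pk, p_pub), sk)
```

Calling `demo()` runs blind key extraction, the verification check, encryption and decryption in sequence and returns the recovered message. A real deployment would replace the toy model with a pairing library over an elliptic curve.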



4.1 Proof of Security

To preserve the privacy of the user in the cloud network, the proposed BAIBE system consists of two main parts: the BlindKeyExtractBAIBE protocol for the blind extraction of the user's secret key, and an anonymous IBE as the encryption system. The protocol is built on a blind signature. A secure blind signature must satisfy two properties, unforgeability and blindness. Blindness ensures that the signer learns no information about the message during the signing process; here the message is the identity (ID) of the user whose secret key is being extracted. Unforgeability ensures that the user's secret key can only be generated with the help of the PKG's master secret key, so the user cannot generate the secret key alone. In this paper a blind version of the BLS signature, which is secure against one-more forgery under the chosen-target CDH assumption, is used as the BKE. Blindness holds because the signer receives only a random element $ID'$ of $G_1$, obtained by blinding the ID with a random element, so the ID of the user is not revealed. The security proof for the unforgeability of this blind signature is given in (Boldyreva, 2003).

The unblinding process yields one component of the user's secret key, $d_0$. The proposed system differs slightly in that it needs one more secret key component, $d_1$, which is bound to the public key component $p_1$ of the user. This prevents the signer from knowing the user's full secret key, which overcomes the key escrow problem, an inherent drawback of IBE systems. This feature is essential in a cloud network in order to preserve the privacy of the user. The anonymous IBE scheme used in the proposed scheme is a variant of Boneh-Franklin IBE (Boneh & Franklin, 2001).
A crucial observation is that in the proposed system only one ciphertext component, $V = M \oplus H_2(e(p_0 + p_1, P_{pub})^k)$, depends on the recipient's identity. The other ciphertext component, $U = kP$, is just a random group element in $G_1$ and gives no information about the recipient's identity. $V$ is bound to a random element $k$, so no one can learn the recipient's identity from the ciphertext without solving the bilinear Diffie-Hellman problem. Therefore anonymity is ensured. Semantic security (IND-CPA) of the scheme can be reduced to the BDH assumption in the random oracle model, based on the fact that $V$ is indistinguishable from random without the private key $D_{ID}$, as in Boneh-Franklin IBE. There is only a slight difference in the input of the hash function: the argument of $H_2$ involves two public key components $(p_0, p_1)$ rather than the single public key $Q_{ID}$ of Boneh-Franklin IBE. $H_2$ hash queries can nevertheless be simulated in the same way as in Boneh-Franklin IBE. The scheme can be made IND-CCA secure by using the transformation proposed in (Galindo, 2005).

5. Comparison

As shown in Table 1, the proposed system has several advantages over existing schemes. It has fewer secret key components and fewer ciphertext components. It also uses fewer pairings in decryption, which reduces the computation cost, a crucial requirement in a cloud environment.

Table 1: Comparison with existing systems. Schemes compared: Camenisch et al. (Camenisch, Kohlweiss, Rial, & Sheedy, 2009), Green et al. (Green & Hohenberger, 2007), Lin et al. (Lin, Chow, Xing, Fang, & Cao, 2011) and the proposed scheme. Columns: # secret key components, # ciphertext components, # pairings in decryption.

6. Conclusion

In this paper an efficient blind anonymous identity based encryption (BAIBE) scheme for preserving the privacy of the user in the cloud network is proposed, and its security is proven. It can be used for reading a database anonymously without a breach of security, and also for controlling access to a sensitive resource such as a medical database. Whether blind anonymous IBE (BAIBE) can be extended to a blind anonymous hierarchical identity based encryption scheme (BAHIBE) is an open problem. BAHIBE without random oracles is another relevant open research problem.

References

Abdalla, M., Bellare, M., Catalano, D., Kiltz, E., Kohno, T., Lange, T., et al. (2005). Searchable Encryption Revisited: Consistency Properties, Relation to Anonymous IBE, and Extensions. Advances in Cryptology, CRYPTO'05. Lecture Notes in Computer Science, Springer.
Bellare, M., & Rogaway, P. (1993). Random Oracles are Practical: A Paradigm for Designing Efficient Protocols. Proceedings of the 1st ACM Conference on Computer and Communications Security, CCS'93 (pp. 62-73).
Boldyreva, A. (2003). Threshold Signatures, Multisignatures and Blind Signatures Based on the Gap-Diffie-Hellman-Group Signature Scheme. Public Key Cryptography (pp. 31-46). Lecture Notes in Computer Science, Springer.
Boneh, D., & Franklin, M. (2001). Identity-Based Encryption from the Weil Pairing. CRYPTO'01 (pp. 213-229). Lecture Notes in Computer Science, Springer.
Boneh, D., Lynn, B., & Shacham, H. (2001). Short Signatures from the Weil Pairing. ASIACRYPT (pp. 514-532). Lecture Notes in Computer Science, Springer.
Boyen, X., & Waters, B. (2006). Anonymous Hierarchical Identity-Based Encryption (Without Random Oracles). CRYPTO'06 (pp. 290-307). Volume 4117 of Lecture Notes in Computer Science, Springer.
Camenisch, J., Dubovitskaya, M., Neven, G., & Zaverucha, G. M. (2011). Oblivious Transfer with Hidden Access Control Policies. Public Key Cryptography (pp. 192-209).
Camenisch, J., Kohlweiss, M., Rial, A., & Sheedy, C. (2009). Blind and Anonymous Identity-Based Encryption and Authorised Private Searches on Public Key Encrypted Data. Public Key Cryptography'09 (pp. 196-214). Lecture Notes in Computer Science, Springer.
Crepeau, C. (2006). Equivalence Between Two Flavours of Oblivious Transfers. Advances in Cryptology (pp. 350-354). Proceedings of Crypto '87, volume 293 of Lecture Notes in Computer Science, Springer-Verlag.
Ducas, L. (2010). Anonymity from Asymmetry: New Constructions for Anonymous HIBE. CT-RSA 2010 (pp. 148-164). Lecture Notes in Computer Science, Springer.
Frikken, K., Atallah, M., & Li, J. (2004). Hidden Access Control Policies With Hidden Credentials. WPES '04. Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society.
Galindo, D. (2005). Boneh-Franklin Identity Based Encryption Revisited. ICALP (pp. 791-802). Lecture Notes in Computer Science, Springer.
Green, M., & Hohenberger, S. (2007). Blind Identity-Based Encryption and Simulatable Oblivious Transfer. ASIACRYPT'07 (pp. 265-282). Lecture Notes in Computer Science, Springer.
Kilian, J. (1988). Founding Cryptography on Oblivious Transfer. STOC 88 (pp. 20-31).
Lin, H., Chow, S. S., Xing, D., Fang, Y., & Cao, Z. (2011). Privacy Preserving Friend Search over Online Social Networks. IACR Cryptology ePrint Archive 2011:445.
Rabin, M. O. (2005). How To Exchange Secrets with Oblivious Transfer. Cryptology ePrint Archive 2005:187.
Shamir, A. (1984). Identity-Based Cryptosystems and Signature Schemes. CRYPTO'84 (pp. 47-53). Volume 196 of Lecture Notes in Computer Science.
Sosinsky, B. (2011). Cloud Computing Bible. Wiley Publishing, Inc.


On Providing User-Level Data Privacy in Cloud

Madhuri Revalla1, Ajay Gupta1 and Vijay Bhuse2
1 Western Michigan University, Kalamazoo, MI, USA
2 East Tennessee State University, Johnson City, TN, USA
madhuri.revalla@wmich.edu
ajay.gupta@wmich.edu
bhuse@etsu.edu

Abstract: Privacy and security of user data are a major concern in cloud computing because of factors such as the sharing of resources, the use of virtualized environments and access to services through the Internet. Because the user does not control the data, many privacy and security concerns arise. It is important to consider how the data is encrypted and who is responsible for the encryption. The data can be encrypted either by the user or by the cloud provider; if the provider performs the encryption, the user has to trust the provider. To ensure data privacy, it is preferable that the user encrypts the data before storing it in the cloud. Public Key Infrastructure (PKI) provides strong encryption, but does not replace the need for authentication. Hence, a strong authentication system is required along with encryption. In this paper, we analyze and propose multi-factor authentication along with PKI to achieve the most desirable security properties: authentication, integrity, and confidentiality. The mechanism is based on three-factor authentication: a cryptographic key that the user knows, a smart card that the user has, and a biometric that the user is. Our research investigates various types of cloud-specific security and privacy issues and identifies techniques that enhance security in the cloud environment. This paper therefore also discusses some preliminary ideas on issues such as isolation failure and service interruption that arise from virtualized environments. Multiple users share the cloud infrastructure, and this co-tenancy also introduces vulnerabilities in cloud computing. A malicious user can exploit co-tenancy to perform side-channel attacks that obtain another user's confidential information via information leakage. Our research also explores different types of security threats specifically related to the cloud-computing environment.

Keywords: Cloud computing, Confidentiality, Data privacy, Integrity, Threat and risk models

1. Introduction

In cloud computing, a cloud provider delivers scalable and dynamic services, applications, and computing resources to users over the Internet on an on-demand basis. All of these resources and services are delivered from a data center, which typically hosts a large number of applications accessed by many users. Cloud computing has become very popular because of its flexibility, the instant deployment of the services a user needs, its low cost and its pay-per-use model. NIST defines cloud computing as follows: “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Peter & Timothy, 2011). NIST also specifies five characteristics of cloud computing (Peter & Timothy, 2011):

On-demand self-service



Broad network access



Resource pooling



Rapid elasticity



Measured service (pay-per-use)

Traditional security techniques are not sufficient to handle security in cloud computing, for reasons such as the virtualized nature of the cloud and the user's lack of control over the data. In this paper, we first present a brief summary of threat models and security requirements, along with previous work on addressing security concerns in cloud computing. We then propose a multi-factor authentication mechanism using PKI to enhance trust and the privacy of the user's data.

1.1 Traditional networks vs. Cloud computing

A demilitarized zone (DMZ) contains all externally accessible (Internet-accessible) systems and protects the internal network. The DMZ can be separated from the Internet and from the internal network by two firewalls, as shown in Figure 1 (Bauer, 2007). Services offered to an external network, such as web servers, mail servers, DNS servers and FTP servers, should be located in the DMZ, a secure area (Frey, 2009). One firewall sits between the internal network and the DMZ, and the other between the Internet and the DMZ.

Figure 1: Traditional network setup with DMZ and two firewalls (Frey, 2009)

In traditional networks, the servers are directly connected to the DMZ network, and firewalls protect their connections to the internal and external networks. To reach the internal network, an attacker has to get through the firewalls. Even if a host or server in the DMZ is compromised, the attacker still needs to get through a firewall to reach the internal network (Siebert, 2011). Firewalls help to prevent unauthorized access to the internal network from machines or people outside the network.

In cloud computing, users should still use firewalls to protect their own network, but the traditional method of placing servers does not work for the cloud. Cloud users use physical (hardware) firewalls to protect the systems in their local network (Rack space, 2011), but these are not suitable for virtual machines. Customers should install a software firewall on each individual machine to protect that particular machine.

This paper investigates various types of attacks and the risks associated with the threats. The rest of the paper is organized as follows: Section 2 provides the threat model used to analyze and assess the threats. Section 3 presents the security requirements for cloud computing. Section 4 discusses background work related to security for cloud computing. Section 5 presents the proposed approach, and Section 6 concludes and discusses some future work.

2. Threat model

A threat model helps to analyze a security problem, design mitigation strategies, and evaluate solutions (Priya & Geeta, 2011). Analyzing and assessing threats requires a systematic approach.

Step 1: Identify threats by examining assets, vulnerabilities, and attackers.

Assets – In cloud computing, an attacker can target the data stored in the cloud, the computations performed on the cloud, the VMs running on the cloud, and the cloud infrastructure.

Vulnerabilities – Check the vulnerabilities of cloud applications.

Attackers – In cloud computing, attackers can be insiders or outsiders. An insider can be any valid user of the cloud, such as a cloud provider or a cloud consumer. Network-based firewalls protect the internal network from outsiders, but they cannot protect the network from malicious insiders, because a network firewall cannot distinguish between the hosts behind it (Abhinay, et al., n.d.). Software firewalls installed on host machines help to analyze traffic flow inside the network and protect against unauthorized access. However, software firewalls cannot completely prevent insider attacks: a malicious insider can detect and terminate the firewall running on a host machine (Rumelioglu, 2005) (Microsoft TechNet, 2010). Access Control Lists (ACLs) and Network Access Control (NAC) help to protect the internal network. By granting specific permissions to users or objects, ACLs can limit the traffic flow within the internal network and reduce the exposure to malicious insiders (Mullins, 2003). NAC protects against threats that originate within the network, such as rogue access points or unauthorized systems. A host within the network is denied access if it is compromised or does not follow the security procedures (Black Box Network Services, 2009) (Vernier Networks, n.d.).

An outsider is an intruder with limited attack options, such as hackers, botnets, spammers, and malware (Klara & Roy, 2012), or network attackers.
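The way an ACL limits traffic inside the internal network can be illustrated with a minimal first-match sketch. The rule set, the address ranges and the function name below are hypothetical; real ACLs are configured on network devices rather than written in Python.

```python
import ipaddress

# Hypothetical rule set: (action, source network, destination port or None for any port).
ACL_RULES = [
    ("permit", "10.0.1.0/24", 443),   # one subnet may reach HTTPS services
    ("permit", "10.0.2.0/24", 22),    # another subnet may use SSH
    ("deny", "10.0.0.0/8", None),     # all other internal traffic is blocked
]

def acl_decision(src_ip, dst_port, rules=ACL_RULES):
    """Return the action of the first matching rule, or an implicit deny."""
    address = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if address in ipaddress.ip_network(network) and port in (None, dst_port):
            return action
    return "deny"
```

With these rules, a host at 10.0.1.7 is permitted on port 443 but denied on port 22, which is how an ACL restricts what an insider's machine can reach.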

Step 2: Analyze the threats using STRIDE threat model (Microsoft MSDN, n.d.).

S – Spoofing, T – Tampering, R – Repudiation, I - Information disclosure, D - Denial of service, E - Elevation of privilege 

Spoofing identity: An attacker pretends to be a valid user by spoofing the actual user's identity, or makes a machine appear to be a valid/trusted machine. For example, an attacker mounts attacks after illegally obtaining a cloud user's identity.

Tampering with data: This involves unauthorized modification of data by a malicious user, for example modification of cloud data in a database by an unauthorized user, or alteration of data in transit over the network between cloud users. Consider an organization that stores its data with a cloud storage service operated by a cloud system administrator. The cloud service provider has full access to the data and is able to modify it. If an employee of the organization wants to download data from the cloud storage, the following questions arise: 1. How can the employee be confident that the downloaded data is the same as the data stored by the organization? 2. How can one check that the data has not been tampered with? (Jun, et al., 2010).

Repudiation: A repudiation threat involves performing an illegal action in such a way that there is no proof that the user was involved in the transaction. For example, a cloud provider performs an illegal action on data stored in the cloud database by impersonating a user with that user's credentials. In the example above, if the employee finds that the data has been modified, is there any proof that the cloud provider is responsible for the tampering? (Jun, et al., 2010)

Information disclosure: Information disclosure threats involve stealing secret information or revealing it to unauthorized individuals. Examples include an attacker who is able to read a cloud user's data or steal a user's credentials. In the example above, a malicious user who steals the employee's user name and password is able to fetch data from the cloud storage.

Denial of service: Denial of service (DoS) attacks interrupt services. For example, an attack on a single server of the cloud infrastructure causes a service interruption.

Elevation of privilege: This threat involves a user obtaining more privileged access than intended and using these permissions to compromise or destroy the entire system. For example, in cloud computing, users get access to cloud services with limited capabilities that depend on their needs. If a cloud user is able to obtain administrative access to these services, the user can use them without any limitations.
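One common way to answer the two tampering questions above is a keyed message authentication code. The sketch below is an assumption for illustration and is not a method proposed in the paper: the organization computes an HMAC tag with a key that the cloud provider never sees, keeps the tag, and the employee recomputes it after download. The function names are hypothetical.

```python
import hashlib
import hmac

def make_tag(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the data before it is uploaded."""
    return hmac.new(key, data, hashlib.sha256).digest()

def is_untampered(key: bytes, downloaded: bytes, stored_tag: bytes) -> bool:
    """Recompute the tag on the downloaded data and compare in constant time."""
    return hmac.compare_digest(make_tag(key, downloaded), stored_tag)
```

Because the key is shared inside the organization, this detects tampering but does not by itself prove who modified the data; the repudiation question needs digital signatures instead.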



Step 3: Assess the risks associated with the threats and rank the threats.

Use a scale of High, Medium and Low for each threat. Rank the threats according to the risk level and address the most critical threats first (Meier, et al., 2003). Cloud Security Alliance (CSA) conducted a survey and identified the most critical threats to cloud security (Cloud Security Alliance, 2013). 

Step 4: Select mitigation techniques and build solutions.

Review existing solutions for the threats, select mitigation techniques that resolve the security issues, and apply them (Priya & Geeta, 2011).
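The ranking in Step 3 can be sketched as a simple sort over the High/Medium/Low scale. The ratings in the usage note are made-up illustrative values, not results from the paper or from the CSA survey, and the function name is hypothetical.

```python
# Lower number means the threat is addressed earlier.
RISK_ORDER = {"High": 0, "Medium": 1, "Low": 2}

def rank_threats(ratings):
    """Return threat names sorted from highest to lowest risk.

    `ratings` maps a threat name to "High", "Medium" or "Low".
    Threats with the same rating keep their original order.
    """
    return sorted(ratings, key=lambda name: RISK_ORDER[ratings[name]])
```

For example, `rank_threats({"Denial of service": "Medium", "Information disclosure": "High", "Repudiation": "Low"})` places information disclosure first, so that its mitigation is selected first in Step 4.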

3. Security requirements for cloud

Managing security is a big challenge in cloud computing. The authors of (Ramgovind, et al., 2010) specify the requirements for providing information security in the cloud. The following information security requirements were originally defined by (Eloff, et al., 2009).

Identification & authentication: In cloud computing, users get permission to access cloud services depending on the type of cloud and the delivery model. It is important to verify and validate each cloud user's identity.

Authorization: In cloud computing, authorization is an important security requirement. Cloud users may have access privileges to only certain services or applications. Cloud system administrators or the authorization applications should authorize users exactly according to their privileges, no more and no less.

Confidentiality: Cloud data is stored in databases across the cloud infrastructure. It is necessary to protect the confidentiality of users' personal profiles and their data. The cloud provider should assure users that only authorized users can access their information and data.

Integrity: In cloud computing, it is important to ensure the integrity of cloud users' data. The data must not be modified while the user accesses it or while it flows across the cloud environment.

Non-repudiation: Cloud computing uses digital signatures, audit trails, and time stamps to prevent illegal operations or actions and thereby ensure non-repudiation.

Availability: Availability is one of the most important security requirements in cloud computing. A service level agreement between the cloud provider and the cloud user is used to ensure that the appropriate cloud services are available to the corresponding user. The service level agreement specifies the availability of services and resources.

To ensure that the system meets these security requirements, the threats identified with STRIDE modeling are mapped to the requirements, as shown in Table 1.

Table 1: Mapping of threats and requirements (Hernan, et al., 2006)

| Threat | Security Requirement |
| --- | --- |
| Spoofing | Identification & Authentication |
| Tampering | Integrity |
| Repudiation | Non-repudiation |
| Information disclosure | Confidentiality |
| Denial of service | Availability |
| Elevation of privilege | Authorization |
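The mapping in Table 1 is a plain lookup, which the following sketch encodes. The dictionary contents come from the table; the function name is a hypothetical helper for illustration.

```python
# STRIDE threat category -> security requirement, as listed in Table 1.
STRIDE_TO_REQUIREMENT = {
    "Spoofing": "Identification & Authentication",
    "Tampering": "Integrity",
    "Repudiation": "Non-repudiation",
    "Information disclosure": "Confidentiality",
    "Denial of service": "Availability",
    "Elevation of privilege": "Authorization",
}

def requirements_for(threat_categories):
    """Return the distinct requirements for the given STRIDE categories, in order."""
    requirements = []
    for category in threat_categories:
        requirement = STRIDE_TO_REQUIREMENT[category]
        if requirement not in requirements:
            requirements.append(requirement)
    return requirements
```

A threat classified as both information disclosure and elevation of privilege therefore calls for confidentiality and authorization controls.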

4. Related work

Security for cloud computing is critical, and a great deal of research is being carried out in this field. Junjie & Sen (2011) present the major security concerns and discuss the corresponding security measures for cloud computing; see Table 2.

Table 2: Security concerns and measures

| Security concern | Measure |
| --- | --- |
| Network attacks | Strengthen the anti-attack capability |
| Data security | Information encryption |
| Lack of safety standards | Establishing uniform safety standards |
| Privacy of information is difficult to ensure | Selecting reputable service providers |

Data, applications, and services in cloud computing are vulnerable to Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks. An Intrusion Detection System (IDS) can be used against these types of attacks. In general, IDSs are host-based, network-based or distributed. A host-based IDS monitors individual host systems, a network-based IDS monitors all network traffic in a network segment, and a distributed IDS monitors both host machines and network traffic. Because cloud infrastructure carries more network traffic, traditional IDSs cannot handle the large traffic flow. In a traditional network, an IDS can be deployed on the network at the user's site to monitor, detect, and alert the user. In cloud computing, the cloud provider administers and manages the IDS, which is placed at the cloud server site. The user therefore depends on the provider for notification of any loss, yet cannot rely on the provider completely, because the provider may not want to damage its reputation by revealing the loss (Gul & Hussain, 2011). Gul & Hussain (2011) proposed an efficient, reliable, multi-threaded distributed cloud IDS to handle the issues that a traditional IDS cannot resolve. In this distributed model, a third-party ID monitoring service administers and monitors the IDS. The third-party service is responsible for sending alerts to the cloud user and notifications to the cloud service provider. The model uses multi-threading to improve efficiency in the cloud network.
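As a rough illustration of one signal a network-based IDS might use against DoS traffic, the sketch below counts requests per source in a sliding time window and flags sources that exceed a limit. It is an assumption for illustration only, not the multi-threaded design of Gul & Hussain (2011); the class name, threshold and window values are hypothetical.

```python
from collections import defaultdict, deque

class RateMonitor:
    """Flag sources that send more than `max_requests` within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)   # source -> timestamps inside the window

    def observe(self, source, timestamp):
        """Record one request; return True if the source now exceeds the limit."""
        events = self.history[source]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        return len(events) > self.max_requests
```

A monitoring service would call `observe` for every request and raise an alert to the cloud user when it returns True.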

Virtualization plays a major role in cloud computing. Standard virtualization security techniques are not enough to handle virtualization security in cloud computing, and applying virtualization to cloud computing may in fact introduce additional security risks. Xiangyang et al. (2011) analyze the security risks associated with virtualization and present safety measures for those risks.

Resource access control: With virtualization, a cloud user can access resources from anywhere, but does not know the exact storage location of the information. Secret information may leak if an attacker exploits a virtualization vulnerability, and a hacker who attacks the virtualization platform may gain administration privileges.

Countermeasures: Use directory services to manage identities and one-time permission grants.

DoS attack: A large number of users use virtualization services. A malicious user can make the virtualization platform less effective, or even stop the service at times, by launching an application with many services that need a lot of resources.

Countermeasures: When the number of requested services increases, use mitigation techniques such as moving to an alternate work area and checking for malicious applications. This attack requires fast recovery by restoring the services.

Virtualization platform in building network: Users share the virtual interface in the same network. Because they use the same network stack, a user can see the server's traffic and all other users' traffic. An attacker can access all user information by attacking a single system.

Countermeasures: Set up an administrator with two processes: one running in a secure area for virus and Trojan killing, and another running in an isolation area that communicates with the outside world to download and update the latest vulnerability software.

Virtualization platform's security management: In a virtualization platform, the virtual network administrator cannot monitor and troubleshoot all virtual devices. The question of how to manage all virtual machines introduces a new risk, called management risk.

Countermeasures: This is a virtualization platform design issue. Firewalls, intrusion prevention and detection techniques, integrity monitoring and log checking need to be deployed in each virtual machine.

Mohammed et al. (2012) proposed a new approach called the Multi-clouds Databases Model (MCDM) to reduce security issues in cloud computing. This model uses multiple clouds instead of a single cloud service provider, together with Shamir's secret sharing algorithm. It also uses the triple modular redundancy (TMR) technique with a sequential method to improve the reliability of the system. The data is replicated among several clouds, and TMR and secret sharing are used to enhance security and privacy in cloud computing.

Zhidong & Qiang (2010) proposed a method to build a trusted computing environment for cloud computing. In this method, the authors integrate a trusted computing platform (TCP) into cloud computing to establish a secure environment. TCP provides authentication, communication security and data protection for cloud computing.

Another major concern in providing security is energy consumption. Nitin & Ashutosh (2010) analyze security implementations for cloud computing and show that they increase energy consumption, because more processing cycles and operations need to be performed. The authors also propose some techniques to reduce energy consumption in cloud computing.

According to the Cloud Security Alliance (CSA) survey report of 2013, the nine critical threats, in order of severity, are listed in Table 3. The table also gives the STRIDE category of each threat and the requirements for securing against it.

Table 3: Top 9 critical threats in cloud computing (Cloud Security Alliance, 2013)

| Threat | STRIDE | Security requirements |
| --- | --- | --- |
| Data breaches | Information disclosure | Confidentiality |
| Data loss | Repudiation, Denial of service | Availability, Non-repudiation |
| Account hijacking | Tampering with data, Repudiation, Information disclosure, Elevation of privilege, Spoofing identity | Integrity, Non-repudiation, Confidentiality, Availability, Identification, Authentication, Authorization |
| Insecure APIs | Tampering with data, Repudiation, Information disclosure, Elevation of privilege | Integrity, Non-repudiation, Confidentiality, Authorization |
| Denial of service | Denial of service | Availability |
| Malicious insiders | Spoofing, Tampering, Information disclosure | Identification, Integrity, Authentication, Confidentiality |
| Abuse of cloud services | Not applicable | |
| Insufficient due diligence | ALL | ALL |
| Shared technology issues | Information disclosure, Elevation of privilege | Confidentiality, Authorization |
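The secret sharing step used by the Multi-clouds Databases Model discussed earlier in this section can be illustrated with a minimal sketch of Shamir's scheme: a secret is split into shares, one per cloud, and any `threshold` of them recover it. The field prime, function names and parameters are assumptions for illustration and are not taken from Mohammed et al. (2012).

```python
import secrets

PRIME = 2**127 - 1   # Mersenne prime used as the field modulus (an assumption)

def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points on a random polynomial of degree threshold-1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for coefficient in reversed(coefficients):   # Horner evaluation at x
            y = (y * x + coefficient) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(value, 3, 5)`, each of five clouds stores one share, and any three of them reconstruct the value, while fewer than three shares are not enough to determine it. The modular inverse via `pow(x, -1, p)` requires Python 3.8 or later.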

5. PKI in cloud computing

Data breach is the most critical threat in cloud computing according to the CSA report. A major concern for cloud consumers is the alteration or deletion of their data when no backup of the original exists. A lost encoding key or unauthorized access can also cause loss of data. According to (Gundeep, et al., 2012), this is the main concern for businesses: data loss or leakage damages their reputation, and they are obligated by law to keep the data safe. In the following, we propose a multi-factor authentication scheme that uses the public key infrastructure framework, smart cards and biometrics to address the data breach issue in clouds.

A Public Key Infrastructure (PKI) is a framework that provides a secure infrastructure through its methods, technologies, and techniques. It relies on cryptography with public and private keys (ArticSoft, n.d.). A Certificate Authority (CA) creates a public and private key pair using an asymmetric-key algorithm such as RSA and issues a digital certificate. PKI provides (Heena & Deepak, 2012):



Confidentiality by encrypting the data



Integrity by providing encryption and authentication



Authentication by verifying the identity of the user



Non-repudiation by issuing digital certificates with the public key
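How a public/private key pair supports both confidentiality and non-repudiation can be shown with textbook RSA on tiny numbers. This is a teaching sketch under stated assumptions: the primes, exponent and function names are illustrative, there is no padding, and real systems use vetted libraries with keys of at least 2048 bits.

```python
# Textbook RSA with tiny primes (illustration only, NOT secure).
P_PRIME, Q_PRIME = 61, 53
N = P_PRIME * Q_PRIME                   # public modulus
PHI = (P_PRIME - 1) * (Q_PRIME - 1)     # Euler's totient of N
E = 17                                  # public exponent
D = pow(E, -1, PHI)                     # private exponent (Python 3.8+)

def rsa_encrypt(message):
    """Confidentiality: anyone can encrypt with the public key (E, N)."""
    return pow(message, E, N)

def rsa_decrypt(ciphertext):
    """Only the holder of the private key D can decrypt."""
    return pow(ciphertext, D, N)

def rsa_sign(digest):
    """Non-repudiation: only the private key holder can produce the signature."""
    return pow(digest, D, N)

def rsa_verify(digest, signature):
    """Anyone can verify the signature with the public key."""
    return pow(signature, E, N) == digest
```

Messages and digests here must be integers smaller than `N`; the same two operations, with the key roles swapped, give encryption and signing.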

5.1 Our proposed solution for data breach

A Public Key Infrastructure (PKI) uses a certificate authority to identify and verify each individual user, and its certificate management system holds all the issued certificates (ArticSoft, n.d.). At RSA Conference Europe 2011, Jaimee & Peter presented a trust model for the CA in cloud computing (Jaimee & Peter, 2011). In this model, an Enterprise Certificate Authority (ECA) issues certificates to a restricted community, such as an organization or an application, and the CA sits inside the enterprise boundary to build trust in the CA.

PKI plays a vital role in enhancing data security in the cloud environment: with PKI, data can be encrypted before it is stored in the cloud. However, PKI alone is not sufficient to authenticate the user. A malicious user can access the data if (s)he somehow gets hold of a user's key. We therefore propose a multi-factor authentication mechanism with PKI to enhance trust and the privacy of the user's data. Our technique combines the following three factors (Kevin, 2006):

Something the user knows – Cryptographic / Private key from CA



Something the user has – Such as a smart card



Something the user is - Biometrics
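A combined check of the three factors listed above can be sketched as follows. Everything specific here is an assumption for illustration: knowledge of the key is shown with an HMAC challenge-response standing in for a private-key signature, the biometric template is assumed to be stored on the smart card, and matching uses a Hamming distance with a made-up threshold. The class and function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass
class SmartCard:
    card_id: str
    biometric_template: bytes   # enrolled template, assumed to be stored on the card

def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def authenticate(user_key, challenge, response, card, registered_card_id,
                 captured_biometric, max_distance=4):
    """Grant access only if all three factors check out."""
    # Factor 1: something the user knows (proof of knowledge of the key).
    expected = hmac.new(user_key, challenge, hashlib.sha256).digest()
    knows_key = hmac.compare_digest(expected, response)
    # Factor 2: something the user has (the registered smart card).
    has_card = card is not None and card.card_id == registered_card_id
    # Factor 3: something the user is (captured biometric close to the template).
    is_user = (
        has_card
        and len(captured_biometric) == len(card.biometric_template)
        and hamming_distance(captured_biometric, card.biometric_template) <= max_distance
    )
    return knows_key and has_card and is_user
```

The tolerance `max_distance` reflects the trade-off between false acceptances and false rejections: a larger value accepts noisier captures but also makes spoofing easier.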

Using smart cards with PKI is one way to authenticate the user. It provides authentication, but the level of security is not strong enough to secure the data, because keys and cards can be stolen. Another solution is to use biometrics with PKI, which is more secure than smart cards with PKI. In this approach, the user presents the cryptographic key and a biometric to access the data in the cloud. This method does not provide strong authentication either, since an attacker can mount a spoofing attack and steal the key. Biometrics also suffer from false acceptances and false rejections (Nagappan, 2009).

We therefore use both smart cards and biometrics with PKI, which addresses the weaknesses of combining PKI with only a smart card or only a biometric. Smart cards are readily available at low cost, and today's systems can easily handle the overhead of biometrics. The user provides the cryptographic key, the smart card and a biometric to access the data. The smart card contains the user's biometric template; to access data in the cloud, the captured biometric must match the template stored on the smart card (David, et al., 1999). It is then extremely difficult for an attacker to simultaneously obtain the key that only the user knows, steal the smart card that the user has, and spoof the user's biometric. This provides strongly authenticated data access and helps to prevent data breaches. Figure 2 compares the relative strength of the various options.

Figure 2: Level of security with respect to our proposed solution for data breach prevention (Nagappan, 2009)

In an enterprise, not every user needs the strongest form of authentication. Each user in an organization can be assigned a role, and the level of security applied depends on that role. For example, an administrator role needs a highly secured mechanism, a clerk role needs a secured mechanism, and an ordinary user needs a normal security level. In this way, the overhead caused by smart card and biometric hardware can be limited to the roles that need it.
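The role-dependent scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the role names follow the text, but the exact set of factors required for the clerk and ordinary user roles, and the names `Factor`, `REQUIRED_FACTORS` and `is_authenticated`, are assumptions made purely for illustration. The sketch also assumes that each factor has already been verified by its own subsystem (PKI, card reader, biometric matcher).

```python
from enum import Enum


class Factor(Enum):
    KEY = "something the user knows (private key from the CA)"
    SMART_CARD = "something the user has (smart card)"
    BIOMETRIC = "something the user is (biometric)"


# Hypothetical policy: which factors each role must present.
# Only the administrator mapping (all three factors) follows directly
# from the text; the other two rows are illustrative assumptions.
REQUIRED_FACTORS = {
    "administrator": frozenset({Factor.KEY, Factor.SMART_CARD, Factor.BIOMETRIC}),
    "clerk": frozenset({Factor.KEY, Factor.SMART_CARD}),
    "user": frozenset({Factor.KEY}),
}


def is_authenticated(role, verified_factors):
    """Return True only if every factor required for the role was verified."""
    required = REQUIRED_FACTORS.get(role)
    if required is None:
        # Unknown roles are rejected rather than given a default policy.
        return False
    return required <= set(verified_factors)
```

Under this assumed policy, an administrator who presents only a key and a smart card is rejected, while a clerk presenting the same two factors is accepted, so the biometric hardware is needed only where the role demands it.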

6. Conclusions and future work
Cloud computing faces more threats than traditional systems because of its virtualized environments and shared infrastructure. For example, an adversary could perform side-channel attacks by placing malicious virtual machines in the cloud system. Virtual machines run on the same physical hardware, so one user’s actions have an impact on another’s activities, and co-tenancy allows an attacker to obtain another user’s confidential information through information leakage. Side-channel attacks exploit this leakage by, for example, measuring cache usage, estimating traffic rates and timing keystrokes (Thomas, et al., 2009). Cloud services run on shared infrastructure and are shared among multiple users, so the failure of any service interrupts service for all of them. Virtualization also brings new security vulnerabilities to cloud computing, such as VM sprawl, inter-VM attacks and VM escape attacks. In virtualization, a Virtual Machine (VM) is created and controlled by a Virtual Machine Monitor or Manager (VMM). An inappropriate virtual machine management policy causes VM sprawl: a continuous increase in the number of VMs while most of them are idle or never wake from sleep, which wastes the resources of the host machine (Fu & Li, 2011). Isolation failure, due to weak isolation or an improper access control policy, results in attacks between two VMs or between a VM and the VMM. An attacker can perform a VM escape attack by running a program in a virtual machine and gaining access to the host system via the VMM (Shengmei, et al., 2011).


In this paper, we have contrasted cloud computing with traditional computing from a security perspective. A threat model for cloud computing using STRIDE has been presented and discussed. We have explored different cloud-specific issues and presented some of them. We proposed a solution to one of these issues, namely data breach. We address the problem of data privacy at the user level and analyze different solutions for providing security. In order to keep the data secure, the user should have control over the data and needs to analyze the security technique. To provide strong security and privacy for the data, we proposed an incremental method using PKI, smart cards and biometrics, in which the user also has control over the level of security mechanism desired to ensure data privacy. We plan to continue working on cloud-specific security issues and to continue our investigation of security and privacy in cloud environments. Our objective is to present effective and efficient yet low-overhead solutions to a number of identified security issues in the cloud computing space.

References
Abhinay, K., Devendra, D. & Sapna, B., n.d. The Structure of Firewalls. [Online] Available at: http://abhinaykampasi.tripod.com/TechDocs/Firewall.pdf [Accessed March 2013].
ArticSoft, n.d. An Introduction to PKI. [Online] Available at: http://www.articsoft.com/public_key_infrastructure.htm [Accessed December 2012].
Bauer, M., 2007. Paranoid Penguin - Linux Firewalls for Everyone. Linux Journal, April.
Black Box Network Services, 2009. Stop inside attacks on your school's network. [Online] Available at: https://www.eandi.org/pdf/BlackBox_VeriNAC_12.09.pdf [Accessed March 2013].
Cloud Security Alliance, 2013. The Notorious Nine Cloud Computing Top Threats in 2013, s.l.: Cloud Security Alliance.
David, C., David, S. & Bob, H., 1999. Moving to Multi-factor Authentication. Linux Journal.
Eloff, J., Eloff, M., Dlamini, M. & Zielinski, M., 2009. Internet of People, Things and Services - The Convergence of Security, Trust and Privacy, s.l.: s.n.
Frey, C., 2009. COTS Security Guidance (CSG) (CSG-06\G) Firewalls. [Online] Available at: http://www.cse-cst.gc.ca/its-sti/services/csg-cspc/csg-cspc06g-eng.html [Accessed March 2013].
Fu, W. & Li, X., 2011. The Study on Data Security in Cloud Computing. IEEE.
Gul, I. & Hussain, M., 2011. Distributed Cloud Intrusion Detection Model. International Journal of Advanced Science and Technology, September, Volume 34, pp. 71-81.
Gundeep, B. S., Prashant, S. K., Krishen, K. K. & Seema, K., 2012. Cloud security: Analysis and risk management of vm images. s.l., s.n., pp. 646 - 651.
Heena, K. & Deepak, C. S., 2012. Building Trust In Cloud Using Public Key Infrastructure. International Journal of Advanced Computer Science and Applications, 3(3), pp. 26-31.
Hernan, S., Lambert, S., Ostwald, T. & Shostack, A., 2006. Threat Modeling. [Online] Available at: http://msdn.microsoft.com/en-us/magazine/cc163519.aspx [Accessed January 2013].
Jaimee, B. & Peter, R., 2011. PKI Reborn in The Cloud, s.l.: RSA Conference.
Jun, F., Yu, C., Wei-Shinn, K. & Pu, L., 2010. Analysis of Integrity Vulnerabilities and a Non-repudiation Protocol. IEEE Computer Society, pp. 251-258.
Jun-jie, W. & M.Sen, 2011. Security Issues and Countermeasures in Cloud Computing. s.l., s.n., pp. 843 - 846.
Kevin, U., 2006. Moving to Multi-factor Authentication. [Online] [Accessed December 2012].
Klara, N. & Roy, C., 2012. Security for Cloud Computing, Urbana-Champaign: University of Illinois.
Meier, J. et al., 2003. Improving Web Application Security: Threats and Countermeasures. [Online] Available at: http://msdn.microsoft.com/en-us/library/ff648644.aspx#c03618429_011 [Accessed January 2013].
Microsoft MSDN, n.d. Overview of Web Application Security Threats. [Online] Available at: http://msdn.microsoft.com/en-us/library/f13d73y6(v=vs.100).aspx [Accessed January 2013].
Microsoft TechNet, 2010. Firewall Types. [Online] Available at: http://technet.microsoft.com/en-us/library/ff602917(v=ws.10).aspx [Accessed March 2013].
Mohammed, A. A., Ben, S. & Eric, P., 2012. A New Approach Using Redundancy Technique to Improve Security in Cloud Computing. s.l., s.n., pp. 230-235.
Mullins, M., 2003. Protect your network from internal attacks with access control lists. Tech Republic, December.
Nagappan, R., 2009. Stronger / Multi-factor Authentication for Enterprise Applications, Hartford: OWASP Seminar.
Nitin, C. S. & Ashutosh, S., 2010. Energy Analysis of Security for Cloud Application. IEEE.
Peter, M. & Timothy, G., 2011. The NIST Definition of Cloud Computing, s.l.: s.n.
Priya, M. & Geeta, S., 2011. Privacy Issues and Challenges in Cloud computing. International Journal of Advanced Engineering Sciences and Technologies, 5(1), pp. 001 - 006.
Rack space, 2011. Cloud Security and what vendors and customers need to do to stay secure, s.l.: Diversity Limited.
Ramgovind, Eloff & Smith, 2010. The management of security in Cloud computing. s.l., s.n.
Rumelioglu, S., 2005. Evaluation of The Embedded Firewall System, s.l.: s.n.
Shengmei, L., Zhaoji, L., Xiaohua, C. & Zhuolin, Y. J. C., 2011. Virtualization security for cloud computing service. International Conference on Cloud and Service Computing, pp. 174-179.
Siebert, E., 2011. How to connect virtual environments to DMZ network architecture. SearchNetworking.
Thomas, R., Eran, T., Hovav, S. & Stefan, S., 2009. Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. ACM.
Vernier Networks, n.d. Stopping the Insider Threat with Network Access Control (NAC), s.l.: s.n.
Xiangyang, L. et al., 2011. Virtualization Security Risks and Solutions of Cloud Computing via Divide-Conquer Strategy. s.l., s.n., pp. 637-641.
Zhidong, S. & Qiang, T., 2010. The Security of Cloud Computing System enabled by Trusted Computing Technology. s.l., s.n., pp. V2-11 - V2-15.


Cloud Security: A Review of Recent Threats and Solution Models
Betrand Ugorji, Nasser Abouzakhar and John Sapsford
School of Computer Science, University of Hertfordshire, College Lane, Hatfield, UK
b.ugorji2@herts.ac.uk n.Abouzakhar@herts.ac.uk j.sapsford@herts.ac.uk

Abstract: The most significant barrier to the wide adoption of cloud services has been attributed to perceived cloud insecurity (Sundareswaran, Squicciarini and Lin, 2012). In an attempt to review this subject, this paper explores some of the major security threats to the cloud and the security models employed in tackling them. Access control violations, message integrity violations, data leakages, inability to guarantee complete data deletion, code injection, malware and lack of expertise in cloud technology rank among the major threats. The European Union invested €3m in City University London to research the certification of cloud security services. This and more recent developments are significant in addressing increasing public concerns regarding the confidentiality, integrity and privacy of data held in cloud environments. The current cloud security models adopted to address cloud security threats include encryption of all data in storage and during transmission. The Cisco IronPort S-Series web security appliance was among the security solutions used to address cloud access control issues. Two-factor authentication with RSA SecurID and close monitoring appeared to be the most popular solutions to authentication and access control issues in the cloud. Database Activity Monitoring, File Activity Monitoring, URL filters and Data Loss Prevention were solutions for detecting and preventing unauthorised data migration into and within clouds. There is as yet no guarantee of complete deletion of data by cloud providers on client request; however, FADE may be a solution (Tang et al., 2012).

Keywords: Cloud Security, Security Threats, Security Solutions

1. Introduction
Cloud computing is a new technology paradigm that promises huge benefits to its users. It is all about sharing computing resources to increase efficiency while reducing the overhead of administration and other IT costs. Cloud computing provides convenient, ubiquitous, on-demand access to a highly elastic pool of computing resources in the form of computing infrastructure, platform or software. NIST has defined cloud computing as a model to enable ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources, such as networks, servers, storage, applications and services, that can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance, 2011). Amazon Web Services (AWS), launched in 2006, was among the first cloud computing implementations, and Eucalyptus was the first open source platform for cloud deployment. Cloud computing has become very attractive, especially to individuals and start-up companies, due to its scalability, accessibility, relative inexpensiveness and service-oriented Pay-As-You-Use model (Wang, 2012). This implies that start-up companies can get up and running with huge computational resources in a matter of minutes by signing up for accounts with cloud computing vendors, and pay only when they utilise those resources. That way, depending on their subscription, cloud consumers may not have to worry about huge upfront investments in IT staff, IT infrastructure, software, maintenance etc.; instead, these are taken care of by the Cloud Service Provider (CSP). Many organisations have signed up with CSPs and migrated their data to the cloud, and more are considering cloud computing given its promise of numerous benefits (Marinos and Sfakianakis, 2012). Governments are not left out of this shift to cloud computing. For instance, on the 8th of February 2011, the United States released its Federal Cloud Computing Strategy and proposed to commit USD 20 billion of government spending to cloud computing (Kundra, 2011). In 2009, revenue generated from cloud computing was estimated at around USD 17 billion and forecast to reach USD 44.2 billion by 2013; however, by 2010, revenues were estimated at about USD 68.3 billion and forecast to reach USD 148.8 billion by 2014 (Catteddu and Hogben, 2009) (Pettey and Goasduff, 2010). This is an indication of how rapidly individuals, organisations and governments are adopting cloud computing. There are four types of cloud deployments (Zhao et al., 2012).

Private cloud

The cloud infrastructure and its management are located on a specific organisation’s premises or elsewhere, and the cloud is operated for and available to that organisation only.


Community cloud

A cloud infrastructure shared by a specific number of organisations with a common goal (e.g. security enforcement agencies and governments). It may be provided by different CSPs. It may be deployed on the premises of any of the participating organisations or externally, and may be managed internally or externally (Jadeja and Modi, 2012).

Public cloud

A public cloud may be owned and provided by a single CSP and is available to anyone who signs up for an account with the CSP.

Hybrid cloud

A mix of two or more cloud deployment types (e.g. a private cloud that accesses public clouds). The constituent clouds may be provided by different CSPs but interoperate, e.g. for load balancing. Figure 1 shows a model of the cloud deployment types as described above.

Figure 1: Cloud deployment types.

Cloud deployment types adopt three Service Delivery Models (SDM):

Infrastructure as a Service (IaaS)

The CSP provisions, and the cloud consumer rents and manages, computing resources such as storage, networks, operating systems and applications. The consumer has no control over the underlying infrastructure.

Platform as a Service (PaaS)

PaaS provides the capability for a cloud consumer to deploy software and applications on the CSP’s cloud platform. 

Software as a Service (SaaS)

A consumer uses applications provided by the CSP, such as cloud-based email, spreadsheets, ERPs, UI design applications etc. Composite as a Service has also been categorised as a cloud SDM in some articles. It is common for CSPs to brand their products with N + aaS names, e.g. Security as a Service (SaaS). Figure 2 shows cloud Service Delivery Models with some CSPs and some customers.


Figure 2: Cloud service delivery models, CSPs & their customers

2. Cloud security threats
In the early years of cloud computing, there were no cloud computing standards. Different CSPs therefore used different frameworks and platforms to implement and deploy their clouds, mainly based on Service Oriented Architectures, Utility Computing and Virtualization. However, driven by the need for cloud interoperability, increased consumer adoption, security and privacy, and service standards, standards bodies and programmes such as the Cloud Security Alliance (CSA), National Institute of Standards and Technology (NIST), Open Cloud Manifesto (OCM), Federal Information Security Management Act (FISMA) and the Federal Risk and Authorization Management Program (FedRAMP) are now engaged in cloud computing accreditation, certification or standardization (Tan and Ai, 2011) (McCaney, 2011). A great deal of research is under way to explore the cloud computing paradigm. However, because of the cost and complexity of carrying out research within a live cloud computing environment, the majority of cloud computing research is carried out as simulations using cloud simulation tools such as CloudSim, GreenCloud, Open Cirrus and iCanCloud (Zhao et al., 2012). Dr. Lawrence Chung’s research team at UT Dallas claims to have found how to build an efficient cloud and has secured grants from Google to use Google App Engine (GAE) to verify its benchmarking and simulation project in a real cloud environment (Ladson, 2013). As is common with all new technologies, there are advantages and disadvantages. The most significant barrier to the wide adoption of cloud services has been attributed to perceived cloud insecurity (Sundareswaran, Squicciarini and Lin, 2012).
According to the IDC, 87.5% of CIOs cite security and privacy as top concerns hindering their adoption of cloud computing, and the Government Technology Research Alliance has reported that the major impediment to adopting cloud computing is security and loss of privacy (Behl and Behl, 2012). The security threat posed by cloud computing to individuals and businesses cannot be overemphasized, as is evident from some recent cloud computing security events: the Amazon cloud failure, which caused Distributed Denial of Service to other clouds, web hosting companies, organisations and individuals that depended on it, and the Google and Sony cloud data leakages that affected the confidentiality and privacy of thousands of their customers in 2009 (Kantarcioglu, Bensoussan and Hoe, 2011) (Tan and Ai, 2011). Jason Hart, vice-president of cloud solutions at SafeNet, reported that 89% of security personnel around the world are not clear on how to keep information in the cloud protected at all times (The Chartered Institute for IT, 2013). With cloud computing in the hands of malicious users, catastrophic malware and virus attacks could be launched against other cloud infrastructures and internet users, turning clouds into the biggest botnets. The security threats that clouds pose to cloud customers are, among other reasons, a result of security threats to the clouds themselves. With a view to addressing increasing public concerns regarding the confidentiality, integrity and privacy of data held in cloud environments, the security industry, CSPs, academia, governments and various stakeholders have continued to work towards a secure cloud environment. For instance, the European Union invested €3m in City University London to research the certification of cloud security services (City University London, 2012). The UK Department for Business, Innovation and Skills, together with the Engineering and Physical Sciences Research Council, is investing £7.5 million to train data security experts at University of London and Oxford University (The Chartered Institute for IT, 2013). In an attempt to review this subject, this paper explores some of the major security threats to the cloud and the security models employed in tackling them. In September 2012, the European Network and Information Security Agency (ENISA) top threats publication listed the following threats against cloud computing as emergent and on the increase:

Code injection and directory traversal



Drive by exploits



Information leakage (especially with increased use of mobile devices)



Insider threats



Identity theft (especially with increased use of mobile devices)



Targeted attacks

There is a basic lack of control over cloud-based environments and uncertainty over how to deal with insider threats, compliance duties and mobile access (The Chartered Institute for IT, 2013). According to the CSA, the notorious nine cloud computing top threats in 2013 in order of severity are: 

Data Breaches



Data Loss



Account Hijacking



Insecure APIs



Denial of Service



Malicious Insiders



Abuse of Cloud Services



Insufficient Due Diligence



Shared Technology Issues

Other security threats we identified were: 

Incorrect Hypervisor security implementations in non-private IaaS cloud environments.



Wrong firewall usages and wrong firewall and access control implementations.



Predominant use of Vulnerable Distributed Database Systems (DDS).



Malware.

The next section discusses some of the threats listed above under the following headings: Abuse of cloud services, Lack of expertise in cloud technology, Inability to guarantee complete data deletion, Message integrity violations, Data leakages, Authentication issues, Inadequate monitoring, Insecure interfaces and APIs, Social engineering, drive-by exploits and insider threats, and Access control violations.

1.1 IaaS threats 

Abuse of cloud services

Currently, purchasing cloud services involves no physical interaction, and most CSPs do not routinely inspect the activities of their IaaS users because of privacy rules. Even where privacy rules do not apply, attackers’ patterns and strategies change over time, making attack discovery and protection even more difficult. Malicious users with forged or stolen bank details could purchase IaaS cloud services and use them to serve malware or perform phishing attacks against other cloud users, turning the cloud, as mentioned earlier, into the biggest botnet (Grosse et al., 2010). This threat also applies to PaaS cloud infrastructure.




Lack of expertise in cloud technology

Research has also shown that there are not enough competent security personnel to secure the volumes of data held in cloud environments. Several cloud security issues have also been attributed to incorrect security configurations in the cloud (The Chartered Institute for IT, 2013) (Abouzakhar, 2013). For instance, (1) defining security configurations on a hypervisor poses a security threat, as VMs completely lose their security when they are relocated (Brenton, 2011), and (2) DDS API calls are vulnerable to SQL-injection attacks (Markey, 2011).

1.2 PaaS threats 

Inability to guarantee complete data deletion

Due to the distributed storage nature of the cloud and collaborations between several cloud service providers (e.g. cloud bursting), guaranteeing the complete deletion of a client’s data on the client’s request is still an issue. This affects several cloud consumers by causing unwanted vendor lock-ins and privacy issues. 

Message integrity violations

Research has shown that some CSPs do not apply sufficient encryption and some do not encrypt all data stored in the cloud, due to the overhead of encryption and the inability to process encrypted data respectively. This has led to message integrity violations, which pose a serious security threat to cloud consumers’ privacy (Intel IT Centre, 2011). The privacy of cloud consumers’ files stored in the cloud is also known to be threatened by deduplication attacks: malicious users can identify other users’ files, learn the contents of those files and save the files in a remote control centre (Harnik, Pinkas and Shulman-Peleg, 2010).
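To make the deduplication side channel more concrete, the following Python sketch illustrates the randomised-threshold countermeasure attributed in this paper to Harnik, Pinkas and Shulman-Peleg (2010): the store signals deduplication to the client only once the number of uploads of a file exceeds a secret, per-file random threshold. The class name `DedupStore`, the `max_threshold` parameter and the use of SHA-256 to identify files are assumptions for illustration, not details taken from the cited work.

```python
import hashlib
import secrets


class DedupStore:
    """Toy store that hides deduplication below a secret per-file threshold."""

    def __init__(self, max_threshold=20):
        if max_threshold < 2:
            raise ValueError("max_threshold must be at least 2")
        self.max_threshold = max_threshold
        self._copies = {}     # file digest -> number of uploads seen so far
        self._threshold = {}  # file digest -> secret random threshold

    def upload(self, data: bytes) -> bool:
        """Record an upload; return True if deduplication is signalled."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._threshold:
            # Threshold drawn uniformly from [2, max_threshold], kept secret.
            self._threshold[digest] = 2 + secrets.randbelow(self.max_threshold - 1)
        self._copies[digest] = self._copies.get(digest, 0) + 1
        # Deduplicate only when the copy count exceeds the threshold;
        # below it, the client is made to transfer the full file.
        return self._copies[digest] > self._threshold[digest]
```

Because the threshold is at least 2 and unknown to clients, an attacker who uploads a candidate file once and sees no deduplication cannot conclude that nobody else holds it, which makes it harder to identify other users' files from upload behaviour alone. The price is extra bandwidth for the uploads that fall below the threshold.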

1.3 SaaS threats 

Inadequate monitoring

As mentioned earlier, fine-grained auditing and traceback are difficult to realise in cloud environments because log files are shared. In networked environments, hardware association could be used to establish traceback; however, virtualization makes such association more difficult to establish. Privacy rules also contribute to the security threat, as most cloud customers want their activities to be monitored for charging and accounting purposes only (Kumar and Kumar, 2010).

1.4 Common threats 

Authentication Issues

Existing password-based authentication has inherent limitations, causes several drawbacks and poses significant risks. Trust propagation for an authenticated user’s security context between various interoperating clouds is yet to be fully realised within cloud environments.

Data leakages

As mentioned previously, the multi-tenancy characteristic of cloud environments poses a significant data leakage threat. Breaking out of one VM into another on the same host is well documented, as is URL traversal, making it possible for a malicious tenant to access other tenants’ data. Data leakage during cloud bursting is also common due to heterogeneity between different clouds. The differing security requirements and trust relations of different tenants are thought to make a multi-tenant cloud a single point of compromise, owing to complex security and trust issues (Takabi, Joshi and Ahn, 2010).

Insecure interfaces and APIs

The cause of most application security issues has been attributed to bad application development practices. Cloud environments also use applications that were built for in-house deployment and therefore suffer from any bugs in those applications too (Armond, 2009) (Kumar and Kumar, 2010) (Markey, 2013). DDS systems such as CouchDB, FlockDB, Hive, MongoDB, RavenDB and SimpleDB use API calls (NoSQL) instead of SQL to perform CRUD operations for applications. Documented vulnerabilities in DDS include limited support for Dynamic Application Security Testing (DAST) tools, lack of support for native encryption and hashing, and SQL-injection (Markey, 2011).
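The injection risk mentioned above arises whenever untrusted input is spliced into a query string. The sketch below shows the difference between string concatenation and a parameterised query. It uses Python's built-in `sqlite3` module purely as a stand-in, not any of the DDS products named above, and the table `tenant_files` and the function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database holding files for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tenant_files (tenant TEXT, filename TEXT)")
    conn.executemany(
        "INSERT INTO tenant_files VALUES (?, ?)",
        [("alice", "payroll.xlsx"), ("bob", "notes.txt")],
    )
    return conn


def files_unsafe(conn, tenant):
    # Vulnerable: the tenant string becomes part of the query text,
    # so a crafted value can change the meaning of the WHERE clause.
    query = "SELECT filename FROM tenant_files WHERE tenant = '" + tenant + "'"
    return [row[0] for row in conn.execute(query)]


def files_safe(conn, tenant):
    # Parameterised: the driver passes the value separately from the
    # query text, so it is always treated as data.
    query = "SELECT filename FROM tenant_files WHERE tenant = ?"
    return [row[0] for row in conn.execute(query, (tenant,))]
```

With the input `x' OR '1'='1`, `files_unsafe` returns every tenant's files, whereas `files_safe` returns nothing because no tenant has that literal name. The same principle, keeping data out of the query text, applies to API-based stores, although the exact mechanism differs per product.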


Social engineering, drive by exploits and insider threats

The Web is rich with deceptive content that lures users into downloading malware. ENISA and Provos, Rajab and Mavrommatis (2009) express concerns over compromises to cloud servers due to Social engineering attacks, Drive by exploits or Drive-by downloads. Due to the human factor, Social engineering, Drive-by downloads and insider threats still pose a high security threat to cloud computing. 

Access control violations

The ability to emulate hardware via software (virtualisation) is one of the underlying technologies that drive cloud computing. It provides the ability to run several virtual machines (VMs) on the emulation software stack (hypervisor), which creates the possibility of multi-tenancy and the ability to reallocate resources as needed for elasticity and scalability. However, resource reallocation creates the possibility that storage sectors containing, for instance, one tenant’s deleted files are reassigned to a co-tenant’s VM and recovered by that co-tenant. In addition, IP addresses of VMs or hosts that are allowed access into the cloud environment may, when freed during server relocations, return to public pools and be used by malicious attackers to access the cloud environment. Heterogeneity in cloud access control interfaces is another serious access control security issue; it makes it difficult for organisations to move their access control policies along when they switch cloud service providers. Existing literature has shown that even when individual domain policies are verified, security violations can easily occur during integration (Takabi, Joshi and Ahn, 2010). Access control violations threaten all cloud computing service delivery models. Table 1 shows the various security threats some cloud vendors are concerned about (Intel IT Centre, 2011).

Table 1: Cloud security threats
CSP | Security threats
Carpathia | Lack of, or immature, trust extensions in hypervisors and cloud ecosystem.
Cisco | Lack of cloud standards, inadequate security for cloud automation, lack of automation for security service provisioning, inadequate encryption key management and the overhead of encryption.
Citrix | Administrative mistakes and lack of approved workflow.
Expedient | Insufficient end-to-end chains of trust and inadequate encryption key management.
HyTrust | Inadequate virtualization security, authentication and access control.
McAfee | Distributed data storage problems, access control issues, cloud consumer lock-ins and incomplete data deletion.
OpSource | Constantly changing security requirements and lack of cloud security standards.
Trapezoid | Assurance that data is secure irrespective of its constantly changing location.
Virtustream | Incomplete data deletion, software integrity assurance issues, insufficient data encryption in motion and transference of application authentication.

2. Cloud security solutions
There are as yet no generally accepted security standards for cloud computing; however, some cloud security standards from organisations such as the CSA, NIST and FISMA are beginning to see widespread adoption by CSPs. Examples of such standards are the Cloud Controls Matrix (CCM) and the Security, Trust and Assurance Registry (STAR) created by the CSA. Our research findings show that CSPs apply different cloud security solutions, although there are similarities in their use of security mechanisms and appliances. Privacy rules do not favour thorough monitoring of cloud consumer activity; however, Service Level Agreements (SLA) are now used to balance the responsibilities and expectations of both the CSP and the cloud service consumer. This goes a long way towards alleviating the threat posed by the abuse of cloud computing infrastructure and issues of service availability. Defining security policies in the cloud on a per-host basis is being used as a solution to the threat posed by hypervisor-level security implementation. The advantage is that a security configuration remains with a cloud host as it travels (Brenton, 2011). Several organisations and governments have invested in training security experts to address the shortage of competent cloud security experts (The Chartered Institute for IT, 2013). There is as yet no guarantee of complete deletion of data by cloud providers on client request; however, FADE has been shown to be a promising solution to this threat (Tang et al., 2012). Assigning a random threshold to every file stored in the cloud, and performing deduplication only if the number of copies of the file exceeds this threshold, has been shown to solve some data integrity violation issues in the cloud (Harnik, Pinkas and Shulman-Peleg, 2010). Many CSPs have also deployed Security Information and Event Management (SIEM) systems within their clouds and are therefore now able to generate and manage log reports. This is largely made possible by the use of Big Data for general threat analysis (Marsh, 2012) (Intel IT Centre, 2011) (Brenton, 2011). The Cisco IronPort S-Series web security appliance was among the security solutions used to address cloud access control issues. Two-factor authentication with RSA SecurID and close monitoring appeared to be the most popular solutions to authentication and access control issues in the cloud. Database Activity Monitoring, File Activity Monitoring, URL filters and Data Loss Prevention were solutions for detecting and preventing unauthorised data migration into and within clouds (Intel IT Centre, 2011) (Cloud Security Alliance, 2011). Encryption of all data in storage and during transmission, the use of hypervisor security at the hardware level and the use of query re-writers at the database level are being used as solutions to the multi-tenancy and data leakage security threats. Almorsy, Grundy and Ibrahim (2011) demonstrated how to utilise existing security automation efforts to facilitate the Cloud Service Security Management Process, using a SaaS application to secure a multi-tenant cloud environment. Table 2 summarises the security solutions against the cloud security threats identified by CSPs, as shown earlier in Table 1.
Table 2: Cloud Security Solutions CSPs

Security solutions

Security threats

Carpathia

Lack or immature trust extensions in hypervisors and cloud ecosystem.

Use of modern CPUs and chipsets paired with a policy engine controlling orchestration to allow a chain of trust from the hardware to the hypervisor to the operating system by using modern appliances such as the Intel TXT.

Cisco

Lack of cloud standards, automation problems and inadequate encryption key management.

Standardizations as CSA, (ISO) 27001 and 27002 and FISMA, encryption of all data at rest and in transit, Cisco ASA 5585-X Appliance â&#x20AC;&#x201C; Firewall, Cisco IronPort S-Series web security appliance and use of SAML to secure SaaS applications.

Citrix

Administrative mistakes and lack of approved workflow. Insufficient end-to-end chains of trust and inadequate encryption key management.

Enforcement of a workflow enabled administrative solution.

HyTrust

Inadequate virtualization security, authentication and access control.

McAfee

Distributed data storage problems, access control issues, cloud consumer lock-ins and incomplete data deletion.

2-factor authentication with RSA SecurID or smart cards, root password vaulting, accountability and leveraging any pre-existing investment in LDAP or Microsoft Active Directory. Cloud Identity Manager, which auto-provisions and deprovisions cloud accounts, use of Single-Sign-On, policy-based enforcement and 2-factor authentication.

OpSource

Constantly changing security requirements and lack of cloud security standards.

A combination of SAS 70, PCI, SSAE 16, ISO 27001 27002, FISMA and CSA security standards.

Trapezoid

Assurance that data is secure irrespective of its constantly changing location.

Adoption of SecRAMP Security implementations, use of Intel TXT and Cisco Unified Computing System platform.

Incomplete data deletion, software integrity assurance issues, insufficient data encryption in motion and transference of application authentication.

Encryption of all data at rest using the Intel AES-NI (in the CPU) and encryption of data during transmission.

Expedient

Virtustream

Use of Intel Trusted Execution Technology (Intel TXT). Protection of identity through process and governance.

As a solution to security threats due to insecure interfaces and APIs, cloud API developers validate and sanitize all inputs on both the client and server sides of their source code, encrypt or hash data before inserting it into a DDS, and adopt a secure software development lifecycle (Abouzakhar, 2013) (Markey, 2013). There is as yet no fool-proof approach to tackling insider threats, but sound physical access control procedures, least privilege access, sanctions, and proper security and criminal record checks on cloud security personnel are widely practiced. Distributed application firewalls and application-level proxies implemented inside a perimeter firewall, which are based on decentralized information flow control (DIFC) models that support decentralized creation and management of security classes at runtime, are being used as a solution to access control and trust propagation issues (Marsh, 2012). Intel TXT has been widely adopted by several CSPs as a solution for trust propagation from the hardware to the hypervisor and to the operating system. Multi-threaded IDS deployed across IaaS, PaaS and SaaS cloud infrastructures have proved effective in mitigating malware attacks in cloud environments. Amazon only recently announced its Hardware Security Module service (CloudHSM) as a solution to the problem of proper digital certificate, encryption and cryptographic key management, but this service is relatively expensive and may be unaffordable for individuals and small firms. With AWS CloudHSM, customers maintain full ownership, control and access to keys and sensitive data, while Amazon manages the HSM appliances in close proximity to their applications and data for maximum performance (Rashid, 2013).

3. Conclusion

In this paper we reviewed recent cloud security threats and solution models. This is a starting point towards identifying and further exploring the security threats facing the cloud. An understanding of recent security threats in the cloud and their current solutions is important in finding lasting solutions to cloud security issues. Solutions to cloud security threats are not the responsibility of any single organisation or industry. Security experts in industry, academia, various organisations and government need to collaborate towards finding a lasting solution to cloud security problems. More research is therefore needed for a better appreciation of the security issues facing the cloud. This review serves to raise awareness of the recent security threats in cloud computing and the current solution models employed in tackling them and, most importantly, as a starting point for our on-going research into intrusion detection systems (IDS) in cloud-based environments. We recommend the adaptation and reuse of standard SOA security frameworks such as SAML and WS-Trust for authenticating and federating trust in cloud environments. Perhaps the only way to guarantee the security of data at all times is to invest in the latest solutions on the market that can be used to ensure it is not breached by hackers or other unauthorised individuals (The Chartered Institute for IT, 2013).

References

Abouzakhar, N.S. (2011) Critical Infrastructure Cybersecurity: A Review of Recent Threats and Violations.
Almorsy, M., Grundy, J. and Ibrahim, A.S. (2011) Collaboration-Based Cloud Computing Security Management Framework, 2011 IEEE 4th International Conference on Cloud Computing, Washington DC, pp 364-371.
Armond, C. (2009) Avanade Perspective: A Practical Guide to Cloud Computing Security, What you need to know now about your business and cloud security, Accenture and Microsoft, http://www.avanade.com/Documents/Research%20and%20Insights/practicalguidetocloudcomputingsecurity574834.pdf.
Behl, A. and Behl, K. (2012) An Analysis of Cloud Computing Security Issues, 2012 World Congress on Information and Communication Technologies, October-November, pp 109-114.
Brenton, C. (2011) Hypervisor vs. Host-based security, A comparison of the strengths and weaknesses of deploying cloud security with either a hypervisor or agent based model, Cloud Security Alliance, https://cloudsecurityalliance.org/wp-content/uploads/2011/11/hypervisor-vs-hostbased-security.pdf.
Catteddu, D. and Hogben, G. (2009) Cloud Computing: Benefits, Risks and Recommendations for Information Security, ENISA (European Network and Information Security Agency).
City University London. (2012) City University London wins European Union grant for Cloud security research, City University News, 27 September, http://www.city.ac.uk/news/2012/sep/city-university-london-wins-european-uniongrant-for-cloud-security-research.
Cloud Security Alliance. (2011) Security guidance for critical areas of focus in cloud computing V3.0, https://cloudsecurityalliance.org/guidance/csaguide.v3.0.pdf [accessed 4th May 2013].
Cloud Security Alliance. (2013) “The notorious nine cloud computing top threats in 2013”, https://downloads.cloudsecurityalliance.org/initiatives/top_threats/The_Notorious_nine_Cloud_Computing_Top_Threats_in_2013.pdf [accessed on 7th April 2013].
Grosse, E., Howie, J., Ransome, J., Reavis, J. and Schmidt, S. (2010) Cloud Computing Roundtable, IEEE Security and Privacy, Vol. 8, No. 6, 3 December, pp 17-23.
Harnik, D., Pinkas, B. and Shulman-Peleg, A. (2010) Side Channels in Cloud Services: Deduplication in Cloud Storage, Security and Privacy, IEEE, Vol. 8, No. 6, November/December, pp 40-47.
Intel IT Centre. (2011) Cloud Security: Vendors answer IT’s Questions about Cloud Security, Intel IT Centre Vendor Round Table, October, pp 2-44.
Jadeja, Y. and Modi, K. (2012) Cloud Computing - Concepts, Architecture and Challenges, 2012 International Conference on Computing, Electronics and Electrical Technologies [ICCEET], pp 877-880.
Kantarcioglu, M., Bensoussan, A. and Hoe, S. (2011) Impact of Security Risks on Cloud Computing Adoption, Forty-Ninth Annual Allerton Conference, Allerton House, UIUC, Illinois, USA, 28-30 September, pp 670-674.
Kumar, S.D. and Kumar, U.M. (2010) Designing Dependable Service Oriented Web Service Security Architectures Solutions, International Journal of Engineering and Technology, Vol. 2, No. 2, pp 81-86.
Kundra, V. (2011) Federal Cloud Computing Strategy, The White House, Washington, http://www.whitehouse.gov/sites/default/files/omb/assets/egov_docs/federal-cloud-computing-strategy.pdf [accessed 23rd February 2013].
Ladson, L. (2013) Cloud Computing Project Wins First-of-its-Kind Google Award, The University of Texas at Dallas News Centre, 4 March, http://www.utdallas.edu/news/2013/3/4-22431_Cloud-Computing-Project-Wins-First-of-its-KindGoo_article-wide.html.
Marinos, L. and Sfakianakis, A. (2012) ENISA Threat Landscape: Responding to evolving threat Environment.
Markey, S.C. (2013) Extend your secure development process to the cloud and big data, IBM developerWorks, http://www.ibm.com/developerworks/cloud/library/cl-extenddevtocloudbigdata/ [accessed 14th April 2013].
Markey, S.C. (2011) Auditing Distributed Databases, How to assess risk posture and secure distributed databases, Cloud Security Alliance, https://cloudsecurityalliance.org/wp-content/uploads/2011/11/CSA_Distributed_Dbs_v2.pdf.
Marsh, J. (2012) Effective Measures to Deal with Cloud Security, CIO Update, 14 September, http://www.cioupdate.com/technology-trends/effective-measures-to-deal-with-cloud-security.html [accessed 14th April 2013].
McCaney, K. (2011) Amazon cloud service gets approval under fisma, GCN, 16 September, http://gcn.com/articles/2011/09/16/amazon-ec2-cloud-fisma.aspx.
Mell, P. and Grance, T. (2011) The NIST Definition of Cloud Computing, National Institute of Standards and Technology, US Department of Commerce, Special Publication 800-145, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf [accessed on 31st March 2013].
Pettey, C. and Goasduff, L. (2010) Gartner Says Worldwide Cloud Services Market to Surpass $68 Billion in 2010, Press Release, 22 June, Gartner, Stamford, Connecticut, http://www.gartner.com/newsroom/id/1389313.
Provos, N., Rajab, M.A. and Mavrommatis, P. (2009) Cybercrime 2.0: When the Cloud Turns Dark, ACM QUEUE, February/March, Vol. 9, No. 2, pp 48-53, http://delivery.acm.org/10.1145/1520000/1517412/p46-provos.pdf.
Rashid, F.Y. (2013) Amazon Offers Appliance-based Encryption Key Management Solution, 28 March, http://www.securityweek.com/amazon-offers-appliance-based-encryption-key-management-solution [accessed 4th May 2013].
Sundareswaran, S., Squicciarini, A.C. and Lin, D. (2012) Ensuring Distributed Accountability for Data Sharing in the Cloud, IEEE Transactions on Dependable & Secure Computing, Vol. 9, No. 4, pp 555-567.
Takabi, H., Joshi, J.B.D. and Ahn, G. (2010) Security and Privacy Challenges in Cloud Computing Environments, Security and Privacy, IEEE, Vol. 8, No. 6, November/December, pp 24-31.
Tan, X. and Ai, B. (2011) The Issues of cloud computing security in high-speed railway, 2011 International Conference on Electronic & Mechanical Engineering and Information Technology, August, pp 4358-4363.
Tang, Y., Lee, P.P.C., Lui, J.C.S. and Perlman, R. (2012) Secure Overlay Cloud Storage with Access Control and Assured Deletion, IEEE Transactions on Dependable and Secure Computing, Vol. 9, No. 6, November/December, pp 903-916.
The Chartered Institute for IT. (2013) Cloud computing 'poses daunting challenge to security chiefs', BCS Latest Industry News, 8 May, http://www.bcs.org/content/conWebDoc/50504?utm_medium=email&utm_source=BCS+The+Chartered+Institute+for+IT&utm_campaign=2493324_securityspecialmay13&dm_i=9U7,1HFV0,9QGEZI,51JJQ,1.
The Chartered Institute for IT. (2013) Data security experts 'to be trained at university', BCS Latest Industry News, 9 May, http://www.bcs.org/content/conWebDoc/50515?utm_medium=email&utm_source=BCS+The+Chartered+Institute+for+IT&utm_campaign=2493324_securityspecialmay13&dm_i=9U7,1HFV0,9QGEZI,51JJQ,1.
The Chartered Institute for IT. (2013) Employees may pose data breach threat, BCS Latest Industry News, 14 May, http://www.bcs.org/content/conWebDoc/50546 [accessed 15th May 2013].
Wang, S. (2012) Are enterprises really ready to move into the cloud? An analysis of the pros and cons of moving corporate data into the cloud, https://cloudsecurityalliance.org/wp-content/uploads/2012/02/Areenterprisesreallyreadytomoveintothecloud.pdf [accessed on 11th May 2013].
Zhao, W., Peng, Y., Xie, F. and Dai, Z. (2012) Modeling and Simulation of cloud computing: A Review, 2012 IEEE Asia Pacific Cloud Computing Congress (APCloudCC), November, pp 20-24.


Masters Research papers


Malware Analysis on the Cloud: Increased Performance, Reliability, and Flexibility

Michael Schweiger1, Sam Chung1 and Barbara Endicott-Popovsky2
1 Institute of Technology, University of Washington, Tacoma, WA, USA
2 Center for Information Assurance and Cybersecurity, University of Washington, Seattle, WA, USA
mpschwe@uw.edu
chungsa@uw.edu
endicott@uw.edu

Abstract: Malware has become an increasingly prevalent problem plaguing the Internet and computing communities. According to the 2012 Verizon Data Breach Investigations Report, there were 855 incidents of breach reported in 2011, with a massive 174 million records compromised in the process; 69% of those breaches incorporated malware in some way, which was 20% higher than the figure for 2010 (Verizon, 2012). Clearly, there is a need to analyze malware effectively and efficiently. Unfortunately, there are two major problems with malware analysis: it is incredibly resource intensive to deploy en masse, and it tends to be highly customized, requiring extensive configuration to create, control, and modify an effective lab environment. This work attempts to address both concerns by providing an easily deployable, extensible, modifiable, and open-source framework to be deployed in a private-cloud based research environment for malware analysis. Our framework is written in Python and is based on the Xen Cloud Platform. It utilizes the Xen API, allowing for automated deployment of virtual machines, coordination of host machines, and overall optimization of the resources available. Our primary goal is for our framework to help guide the flow of data as a sample is analyzed using different methods. Each part of the malware analysis process can be identified as a discrete component, and this fact is heavily relied upon. Additional functionality and modifications are completed through the use of custom modules. We have created a sample implementation that includes basic modules for each step of the analysis process, including traditional anti-virus checks, dynamic analysis, tool output aggregation, database interactions for storage, and classification. Each of these modules can be expanded, disabled, or completely replaced. We show, through the use of our sample implementation, an increase in performance, reliability, and flexibility compared to an equivalent lab environment created without the use of our framework.

Keywords: Malware, xen, cloud, framework, analysis

1. Introduction

Malware defense and detection is an incredibly hot topic among computer security researchers, industry, government, and even the lay population. There is merit behind the hype, as malware has become an increasingly prevalent problem throughout the Internet and the computing community at large. Motivation has shifted from general mischief to major financial gain, prompting an explosive growth in the rate of new malware released and an advancement of the techniques utilized by malware authors to thwart analysis and detection (Microsoft, 2007; Paleari, 2010; Egele et al., 2008). “The widespread diffusion of malicious software is one of the biggest problems the Internet community has to face today” (Paleari, 2010). Trends in malicious software show that we are clearly losing the battle between those who are detecting and defending against malicious code and those who are creating and releasing it. The explosion of malicious software, coupled with the inefficiencies of signature-based detection schemes, is the primary motivation behind much of the work done in this field.

It has been shown that performing the fine-grained analysis required to obtain usable results and monitor malicious software during execution has a non-negligible impact on the resources of the executing system (Paleari, 2010). Due to this overhead, research into moving the analysis environments off end-user host machines and into the cloud has led to the creation of cloud-based malware analysis frameworks (Paleari, 2010). However, existing frameworks, such as those proposed in (Paleari, 2010; Martignoni, Paleari and Bruschi, 2009), attempt to solve the problem by shifting the majority of the computational load into the cloud but do not account for the aggregation of results from multiple analysis engines (Martignoni, Paleari and Bruschi, 2009).
Our work attempts to answer the following question: how can we overcome the shortcomings of current cloud-based analysis frameworks in order to provide increased performance, accuracy, and flexibility? We attempt to provide a solution that combines the best aspects of existing frameworks while maintaining a modular design that is capable of growing to match additional requirements at a later time.

Our work sets out to provide a holistic system that integrates previous efforts on dynamic analysis tools with modules for merging the results from the different engines into one resource and modules for using that resource with various data classification techniques to produce a final answer (malicious or benign). We begin by developing a new framework that allows modules to be added to fulfill each of these goals, and then produce a proof-of-concept system based on that framework. Our work yields the following contributions to the field of malware analysis:

• We propose a system loosely based on previous work in this field that can be extended with additional modules for added functionality, such as adding a new tool that is not included in the example implementation.
• We propose modules to apply data mining classification techniques to produce a final answer to the original question of “Is this software malicious?”
• Finally, there seems to be a scarcity of open-source solutions to address this problem; this work will be released as open source (the exact license has not been determined at this point) upon completion of the alpha testing phase.

2. Related work

To the best of our knowledge, no modular, open-source cloud-based framework for malware analysis currently exists. We have, however, taken general inspiration from some other works, which we briefly discuss here. Moving the analysis environment into the cloud is not a new concept. Martignoni, et al. produced a framework for performing malware analysis in the cloud while mitigating some of the flaws involved with synthetic analysis environments (Paleari, 2010; Martignoni, Paleari and Bruschi, 2009). Zheng, et al. produced a system for shifting the majority of antivirus processing into the cloud by using a lightweight host agent to proxy files into the cloud for analysis by a cluster of traditional detection engines and an artificial immune system to detect malicious activity (Zheng and Fang, 2010). Cha, et al. developed SplitScreen, a distributed malware detection system that greatly increases the performance of signature-based detection methods through distributed computing (Cha et al., 2011). Finally, Schmidt, et al. produced a framework for preventing rootkits during malware detection in cloud computing environments (Schmidt et al., 2011). The systems from Zheng, et al. and Cha, et al. are modeled on a pre-existing online virus service known as VirusTotal (VirusTotal, 2013).

Our work is influenced more by the work of Martignoni, et al. and Schmidt, et al. than by the other systems compared; both, however, have a few weaknesses that our work attempts to correct. The former is not designed to be modular, does not deal with the outputs from multiple engines, does not deal with malware that gains root privileges, and is not open-source (Martignoni, Paleari and Bruschi, 2009). The latter does handle multiple analysis engines and moves the monitoring component outside the virtual machine to deal with malware that gains root privilege, but is still neither modular in design nor open-source (Schmidt et al., 2011). Our work is open-source and modular to help deal with these problems. We address the handling of multiple engines by creating a custom module designed to combine the outputs from multiple engines into one format for analysis, and a classification module that uses data mining techniques to predict whether a file is malicious. We currently have only one analysis engine, but because the processing module is modular, more engines can be added later; the module should either handle them as-is or be easily modifiable to do so.

3. Design and Implementation

Our framework is designed to be modular, to be cloud-based, and to utilize multiple analysis methods. These are key attributes of our framework. The nature of the cloud and cloud technologies helps mitigate the high computational load of the analysis process, as well as allowing for a greater level of concurrency. The modular design allows for greater flexibility; modules can be added or removed as needed. Finally, the fact that multiple tools can be used allows (theoretically) for greater accuracy and reliability of the system.

3.1 Overview

At the most abstract level, this system simply coordinates the execution of predefined tasks in a pipeline or concurrent fashion, depending on configuration, available resources, and category of the task. We’ve broken down tasks into 5 categories that cover the range of the analysis process. These are, in order:

• Submission stage
• Static analysis stage
• Dynamic analysis stage
• Post-analysis and pre-classification stage
• Classification stage

The submission stage includes the method of actually submitting the sample. Static analysis tasks include simple things such as checking a database to see if the file has already been analyzed, checking the sample using traditional signature-based antivirus tools, using some sort of disassembler (if automatable), or any other analysis task that doesn’t involve executing the sample. Dynamic analysis tasks usually involve using automated tools to execute the sample in a controlled environment to monitor several different types of behavior. The post-analysis and pre-classification stage is where any log merging, logical processing, or otherwise administrative jobs are done before the result of the prior stages is sent to the final classification stage. Classification is the final stage in the process. Figure 1 shows the data flow as it is processed.
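The five stages described above can be illustrated with a minimal Python sketch. It is not the framework's actual code: the stage names follow the list in the text, while the function names, the shared `context` dictionary, and the toy classification rule are assumptions made only for illustration.

```python
# Stage names follow the five categories in the text; everything else
# (function names, the shared "context" dict) is hypothetical.
STAGES = ["submission", "static", "dynamic", "post_analysis", "classification"]


def run_pipeline(sample_path, modules):
    """Run every registered module, stage by stage, and return the shared context."""
    context = {"sample": sample_path, "log": []}
    for stage in STAGES:
        for module in modules.get(stage, []):
            module(context)  # each module reads and updates the context
            context["log"].append((stage, module.__name__))
    return context


def submit(ctx):
    ctx["submitted"] = True


def signature_check(ctx):
    # Placeholder for a signature-based antivirus lookup.
    ctx["known_bad"] = False


def sandbox_run(ctx):
    # Placeholder for a dynamic analysis tool running the sample.
    ctx["behavior"] = {"files_written": 3}


def merge_logs(ctx):
    # Post-analysis: combine earlier outputs into one feature record.
    ctx["features"] = dict(ctx["behavior"], known_bad=ctx["known_bad"])


def classify(ctx):
    # Toy rule standing in for a trained classification model.
    suspicious = ctx["features"]["files_written"] > 2
    ctx["verdict"] = "malicious" if suspicious else "benign"


DEFAULT_MODULES = {
    "submission": [submit],
    "static": [signature_check],
    "dynamic": [sandbox_run],
    "post_analysis": [merge_logs],
    "classification": [classify],
}

if __name__ == "__main__":
    result = run_pipeline("sample.exe", DEFAULT_MODULES)
    print(result["verdict"], result["log"])
```

Keeping the stage order in one list and the modules in a dictionary means a module can be added, removed, or replaced without touching the coordinating loop, which is the property the modular design relies on.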

Figure 1: The general analysis flow (left) and the intra-category flow (right).

3.1.1 Tools

Our primary tool used during analysis is the Cuckoo Sandbox. Cuckoo is an automated analysis tool that can be integrated into larger solutions, such as our framework, and monitors a sample during execution (in a sandbox) so that a behavior report can be generated (Guarnieri, n.d.).

3.1.2 Infrastructure

We have a total of four physical machines in our small cloud test environment. Three of the machines are used for processing, while one of them has been transformed into the Network Attached Storage (NAS) supporting the cloud. The NAS is shared storage that allows virtual machines to be started on different hosts when the image is stored on the NAS. The primary virtual machine (VM) that runs the analysis framework (referred to as the control VM) is a “floating” machine, meaning that it is stored on the NAS and loaded over the network. This is important because it allows the system to be restored (restarted) easily in the event of a physical host failure.

3.2 Testing

In order to test our framework, we need a bare minimum of one module to fit into each category, although modules may cover more than one category. In order to maintain simplicity, we have only implemented the following modules:

We break the processing into two general scenarios: training a classification model and general use. During the first stage (training), we have the following steps.

Preparation step 1
• Update the database table that holds known classification data about a test sample set.

Preparation step 2
• Analyze the entire set using Cuckoo and upload the results to another database table.

Model training
• Merge the data from the two preparation steps and format the data to match what the classification engine is expecting. For our implementation, we are using the DecisionTree libraries at [DECI13], so the data is formatted for that library.
• Create the decision tree model and store the data in a storage file to avoid needing to re-compute the model for each use.
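The training steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the DecisionTree library cited in the text, a one-level decision stump stands in for the full decision tree, dictionaries stand in for the two database tables, and the feature names are invented.

```python
import pickle


def merge_tables(labels, features):
    """Join known labels (step 1) with analysis features (step 2) on sample id."""
    return [
        (features[sample_id], label)
        for sample_id, label in sorted(labels.items())
        if sample_id in features
    ]


def train_stump(rows):
    """Pick the (feature, threshold) pair that misclassifies the fewest rows.

    The learned rule is: predict 'malicious' when the feature value > threshold.
    A one-level stump is a stand-in for a full decision tree.
    """
    best = None
    for name in rows[0][0]:
        for threshold in sorted({feats[name] for feats, _ in rows}):
            errors = sum(
                (feats[name] > threshold) != (label == "malicious")
                for feats, label in rows
            )
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    return {"feature": best[1], "threshold": best[2]}


def predict(model, feats):
    return "malicious" if feats[model["feature"]] > model["threshold"] else "benign"


def save_model(model, path):
    """Store the trained model so it need not be recomputed for each use."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical data: labels from preparation step 1, features from step 2.
LABELS = {"a": "malicious", "b": "benign", "c": "malicious", "d": "benign"}
FEATURES = {
    "a": {"registry_writes": 9, "dll_injections": 2},
    "b": {"registry_writes": 1, "dll_injections": 0},
    "c": {"registry_writes": 7, "dll_injections": 1},
    "d": {"registry_writes": 2, "dll_injections": 0},
}
```

Storing the model with `pickle` mirrors the idea of a storage file that avoids re-computing the model; any serialization format the classification library supports would serve the same purpose.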

The steps are broken apart for training because changes must be made to the configuration files between each of them. During use, the system can be left to run continually unless changes must be made (in which case, it must be restarted to pick up the changed code and settings). This allows one process per sample (with some maximum number of concurrent processes) that analyzes the sample to create the data necessary for classification and feeds that data straight into the classification model built in the training steps. Our current implementation of the general operation stages is as follows (for each sample queued by the system):

• Analyze the sample using the same Cuckoo modules used during the training stages to gather the necessary data for classification.
• Upload the data to the same table that was used during training preparation step 2.
• Retrieve the data from the database table from the last step and format it into the expected form for the decision tree model.
• Classify the data using the model.
• Finally, update the results table and return the results to the user.
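The per-sample steps above, run with a cap on concurrency, can be sketched as follows. This is a simplified illustration, not the framework's implementation: threads are used for brevity where the text describes one process per sample, a dictionary stands in for the database table, and `analyze` is a placeholder for the Cuckoo run.

```python
from concurrent.futures import ThreadPoolExecutor

# The text reports 2 concurrent processes per machine; threads are used
# here only to keep the sketch short.
MAX_CONCURRENT = 2


def analyze(sample):
    """Stand-in for the sandbox analysis; returns made-up feature data."""
    return {"sample": sample, "size": len(sample)}


def process_sample(sample, results_table, model):
    features = analyze(sample)                    # 1. analyze the sample
    results_table[sample] = features              # 2. upload (dict stands in for the table)
    stored = results_table[sample]                # 3. retrieve and format
    verdict = model(stored)                       # 4. classify using the model
    results_table[sample] = dict(stored, verdict=verdict)  # 5. update results
    return sample, verdict


def run_queue(samples, model):
    """Process every queued sample with at most MAX_CONCURRENT workers."""
    results_table = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        verdicts = dict(
            pool.map(lambda s: process_sample(s, results_table, model), samples)
        )
    return verdicts, results_table


def toy_model(features):
    # Hypothetical rule standing in for the trained decision tree.
    return "malicious" if features["size"] > 5 else "benign"


if __name__ == "__main__":
    print(run_queue(["a.exe", "longname.exe"], toy_model)[0])
```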

Currently, the system must be stopped and configuration changes made to transition from training to non-training mode. While this implementation is simple, it adequately fills in each step of the whole process to allow a sample to go from submission to answer.

4. Evaluation

In order to evaluate our implementation of the framework, we have used a small collection of malicious samples (malware) and a small collection of non-malicious samples. Combined, all the samples will be referred to as the test base. The malicious samples are from the repositories hosted at (Malware Dump, n.d.). The non-malicious samples are several small (less than 10KB) executable files gathered from a clean Windows 7 machine. The majority of the malicious samples used in the training/testing set are from the Mandiant Report APT1 (Mandiant, n.d.) and include members of the AURIGA, BANGAT, and BISCUIT malware families. AURIGA samples use services to achieve persistence and DLL injection to infect cmd.exe, allowing for keystroke logging, process creation/deletion, and other malicious behavior (Mandiant, n.d.). BANGAT is similar to AURIGA but also uses VNC for connectivity, and BISCUIT builds on BANGAT with added SSL encryption (Mandiant, n.d.).

There are three metrics of interest: performance, flexibility, and reliability. Each metric is influenced by a different attribute of the project. We are not examining the effects of the malware on a cloud environment, but rather the benefits of using a cloud environment to analyze the malware samples.

4.1 Performance evaluation

We define higher performance as a shorter time requirement to analyze the same test base. The primary influencing attribute for this metric is the fact that the framework is designed to take advantage of the cloud and utilize parallelism. The major performance boost is from the higher level of concurrency with a larger cloud. During evaluation, we measure the following times:

• The time required for the training preparation step 1 with both the clean and malicious sample sets.
• The time required for the training preparation step 2 using the entire test base and 1 machine, 2 machines, and 3 machines.
• The time required to generate the classification model.
• Finally, the time required to analyze the entire test base during general operation mode using 1 machine, 2 machines, and 3 machines.

Only one time measurement is taken for the first and third items above because those steps are not distributed and therefore do not gain (and are not designed to gain) any advantage from using multiple machines. The other two steps both utilize Cuckoo for analysis and are distributed; therefore, the number of machines impacts the overall time.

Table 1: Total time (in seconds) to process the sample sets for the training model preparation step 1.

| Number of cloud machines (down) / size of test base (right) | 76 malicious (training) | 24 clean (training) | Combined (not counting transition time) |
|---|---|---|---|
| N/A for this test | 115.6 | 43.5 | 159.1 |

Table 1 above shows that the time required for the initial preparation step before building the model is actually quite short. Preparing the table in this step does not use any distributed/cloud technology, although it actually could and would not be that difficult to implement.

Table 2: Total time (in seconds) to process the test base for the training model preparation step 2 using 1, 2, and 3 machines.

| Number of cloud machines (down) / size of test base (right) | Full test base (100 samples) |
|---|---|
| 1 machine | 7557.0 |
| 2 machines | 4179.9 |
| 3 machines | 3090.4 |

The data in Table 2 is a good example of why the cloud helps. It allows for multiple machines to each be simultaneously processing an individual sample per process. In our environment, our physical hardware is not overly powerful, so it can only support 2 processes per machine. This means that with one machine, 2 samples are processed at a time; with two machines, 4 are processed at a time; and with three machines, 6 are processed at a time.

Table 3: The total time (in seconds) required to build the classification model using the data generated in the training model preparation steps 1 and 2.

| Time to build the classification model | 25.1 |

As can be seen from the data in Table 2, running our framework within a distributed environment (cloud) yields great improvements. During the training model preparation step 2, the test base was processed in about 126 minutes on one machine, while it took only about 70 minutes on two and about 51 minutes on three. We did notice a slightly higher number of timed-out Cuckoo analyses during the test with 3 machines, although steps could be taken to help prevent this without much of a performance hit.

Table 4: The total time required for processing the test base using the general operation mode.

| Number of cloud machines (down) / size of test base (right) | Full test base (100 samples) |
|---|---|
| 1 machine | 11410 |
| 2 machines | 6266 |
| 3 machines | 6480 |
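The timings reported in Tables 2 and 4 can be turned into speedup and parallel-efficiency figures with a few lines of Python. The sketch below only re-uses the numbers from those tables; the function and variable names are illustrative, and efficiency is defined here as speedup divided by the number of machines.

```python
def speedup(times):
    """Return {machines: (speedup, efficiency)} relative to the 1-machine time."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


# Timings copied from the tables in the text (machines -> total time).
TRAINING_PREP_2 = {1: 7557.0, 2: 4179.9, 3: 3090.4}   # Table 2
GENERAL_OPERATION = {1: 11410, 2: 6266, 3: 6480}      # Table 4

if __name__ == "__main__":
    for label, times in (("training prep 2", TRAINING_PREP_2),
                         ("general operation", GENERAL_OPERATION)):
        for n, (s, e) in speedup(times).items():
            print(f"{label}: {n} machine(s), speedup {s:.2f}, efficiency {e:.2f}")
```

For training preparation step 2 this gives a speedup of roughly 1.8 with two machines and 2.4 with three. For general operation the speedup is roughly 1.8 with two machines, and the third machine brings no further gain in the reported figures.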

Table 4 shows that we obtain similar results in the general operation mode as in the training stages. An area for future research would be overall optimization; for instance, each analysis process currently loads its own copy of the classification model. The model is read from the file system and is fairly large, so loading the model takes more time than using it. Since the model is used in a read-only fashion, there could be a single (thread-safe and multi-process-safe) instance, loaded only once and shared amongst processes.
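One small step towards the optimization suggested above is to cache the loaded model so that repeated use within a process does not re-read the file. The sketch below is an assumption-laden illustration, not the framework's code: it assumes the model is stored with `pickle`, and it only removes repeated loads inside one process; sharing a single copy across processes would need a further mechanism, such as loading the model before worker processes are created.

```python
import functools
import os
import pickle
import tempfile


@functools.lru_cache(maxsize=None)
def get_model(path):
    """Load the pickled model once per process; later calls reuse the cached object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def write_demo_model(model):
    """Helper for the example: store a model in a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(model, handle)
    return path


if __name__ == "__main__":
    demo_path = write_demo_model({"feature": "registry_writes", "threshold": 2})
    first = get_model(demo_path)
    second = get_model(demo_path)   # served from the cache, no file I/O
    print(first is second, get_model.cache_info())
    os.remove(demo_path)
```

Because the cached object is shared by every caller in the process, it should be treated as read-only, which matches how the text says the model is used.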


4.2 Flexibility evaluation

We define greater flexibility as the ease with which additional functionality can be added or existing functionality can be modified. The flexibility of this system comes from the fact that all of the important virtual machines are floating and settings can be changed easily. Adding and removing tools can take a little work, depending on the tool, although the work involved in adding or removing a tool should not be any more difficult than installing, configuring, and using that tool without the framework. This metric is difficult to measure with hard figures, especially since it will vary greatly depending on what is being added or removed. The ultimate goal of this metric is to ease the work required of users to integrate tools into the overall process. The general process of putting tools together into one seamless process involves three distinct steps in our framework:

• Configure the tool to work independently.
• Add an entry to the framework’s main configuration file.
• Program the module to determine how to use the tool.

Step one above is outside the scope of this work, but would be required with any integration system: if the tool isn’t installed and configured, the analysis process can’t use it.

Step two involves adding and/or modifying approximately 6 lines in the framework’s main configuration file. The most difficult part of this step is simply deciding at what point in the process the tool should run and what inputs and outputs should be provided to the programmed module from step three. Exactly what to modify is listed in the framework’s usage instructions. Custom modules are always provided the file path to the sample and a copy of the configuration parser. The acceptable values for other variables defined in the configuration are None or any name that begins with an underscore. The variables are converted into temporary files, and the paths to those files are provided to the modules that use them. Variables that are named identically are mapped to the same temporary file. For instance, the cuckoo module has an output variable of “_cuckoo” and the cuckoo_db module has an input variable of “_cuckoo”, so the path to the same temporary file is provided to both modules. This allows writes from one module to be read by another (these operations are not thread safe, though, so modules that share a variable must run in different stages).

An optional step at this point is to create a new configuration file in the conf/extras directory. At the initial loading of the framework, the framework configuration file is parsed, followed by all files (that can be parsed) in the extras folder. If a new file is created, place all options in a section with a unique title; we recommend the id or name of the tool. Figure 2 below shows the general syntax of a custom configuration file; these options are then accessible within the module.
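The mapping from configuration variables to temporary files described above can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: the function `map_variables`, the dictionary layout with `input` and `output` keys, and the use of `tempfile.mkstemp` are all hypothetical. Only the rules come from the text (values are None or underscore-prefixed names, and identical names share one file).

```python
import os
import tempfile


def map_variables(module_vars, workdir):
    """Resolve underscore-prefixed variable names to temporary file paths.

    module_vars maps a module name to {"input": name_or_None,
    "output": name_or_None}. Identical names resolve to the same path.
    """
    paths = {}      # variable name -> temp file path, shared across modules
    resolved = {}
    for module, variables in module_vars.items():
        resolved[module] = {}
        for role, name in variables.items():
            if name is None:
                resolved[module][role] = None
                continue
            if not name.startswith("_"):
                raise ValueError(f"{name!r} must be None or start with '_'")
            if name not in paths:
                fd, path = tempfile.mkstemp(prefix=name + "_", dir=workdir)
                os.close(fd)
                paths[name] = path
            resolved[module][role] = paths[name]
    return resolved


workdir = tempfile.mkdtemp()
resolved = map_variables(
    {
        "cuckoo": {"input": None, "output": "_cuckoo"},
        "cuckoo_db": {"input": "_cuckoo", "output": None},
    },
    workdir,
)
# Both modules now hold the path of the same temporary file.
print(resolved["cuckoo"]["output"] == resolved["cuckoo_db"]["input"])
```

Because the shared file is not protected against concurrent access, a scheduler built on such a mapping would still have to place the writer and the reader in different stages, as noted above.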

Figure 2: A custom configuration options file used by the training_set_db_builder module.

Finally, step three is to actually program a Python module that determines the behavior of the tool and its interactions with the rest of the process. This can be as simple as calling a specific command (command line interface) or as complex as integrating custom logic, parsing input files, and uploading data to or retrieving data from a database. Figure 3 below shows a simple database interaction module. We’ve attempted to move as much of the processing logic as possible out into the framework and the parent class AbstractTool to simplify the work involved in adding new functionality.


Figure 3: The code for the module to build part of the training set information in the database.

As can be seen from figure 3, the code does not have to be long or complex. This example builds a table (`malicious_answer`) in the database that holds two columns: the SHA256 hash of the sample and the preset known value of whether or not the sample is malicious. At a high level, this example establishes a connection to the database, computes the hash value, extracts the malicious value from the configuration, builds the SQL string, and executes it to store the data. The framework also provides an object-oriented structure and some utilities to help simplify common tasks. Our evaluation of this metric is that the framework alleviates some of the work required to integrate separate tools into the process compared to alternatives. We feel that it provides the structure and tools to simplify adding or modifying functionality in the analysis process.
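The steps listed above (connect, hash, store) can be illustrated with a short stand-alone sketch. It is not the code of figure 3: the interface of AbstractTool is not shown in the text, so the sketch uses plain functions, uses SQLite from the standard library as a stand-in for the database, and assumes the column names `sha256` and `malicious`. It also passes values as query parameters instead of building the SQL string by hand.

```python
import hashlib
import sqlite3


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_malicious_answer(conn, sample_path, is_malicious):
    """Record the known label of a training sample, keyed by its hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS malicious_answer ("
        "sha256 TEXT PRIMARY KEY, malicious INTEGER NOT NULL)"
    )
    sample_hash = sha256_of_file(sample_path)
    conn.execute(
        "INSERT OR REPLACE INTO malicious_answer (sha256, malicious) "
        "VALUES (?, ?)",
        (sample_hash, int(is_malicious)),
    )
    conn.commit()
    return sample_hash
```

In the framework itself, the sample path and the known label would come from the arguments and configuration parser that every custom module receives.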

4.3 Reliability evaluation

Our evaluation of reliability is based solely on our implementation of a very naïve decision-tree-based classification model. The decision tree is built using only the information returned from VirusTotal via Cuckoo. This is meant to serve as a proof-of-concept implementation of the framework. The test base for this metric is the same 100-sample test base used for the performance test, plus additional clean and malicious files that were not used for training. The total size of this test base is 147 files. The decision tree library for the classification was retrieved from (Kak, 2013). We use it along with information pulled from the database to build the model. Evaluating this metric involves three basic steps:

1. Build the model

2. Test the model

3. Check the results

Step one is done by fetching all rows from the `virustotal` and `malicious_answer` database tables, removing irrelevant columns (such as the file’s hash), formatting the remaining data, writing the formatted results out to a temporary file, and passing the file to the decision tree library to build the model. Step two consists of retrieving the VirusTotal results from the database based on the sample’s SHA256 hash value and classifying the sample with the model. Finally, step three consists of reviewing the tables and comparing the result returned from step two with the expected value. We are using a known test set, so the correct result is known.


Table 5: The results of the reliability testing using all 3 machines. True malicious value (down), classified malicious value (right)

True value       Classified Malicious    Classified Non-Malicious
Malicious        80 (TP)                 2 (FN)
Non-Malicious    1 (FP)                  20 (TN) + 44 unclassified

Table 5 shows the evaluation of the module’s reliability as an accuracy matrix of the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). As a brief description, the following definitions are used:

True positive: A malicious sample is classified as malicious.



True negative: A non-malicious sample is classified as non-malicious.



False positive: A non-malicious sample is classified as malicious.



False negative: A malicious sample is classified as non-malicious.

The “good” results are the “true” results, while the “false” results are bad. Note that the sum of all four categories and any samples that could not be classified must add up to the total number of samples (147).
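The counts in Table 5 can be turned into the usual summary rates with a few lines of arithmetic. The sketch below uses only the reported counts. The derived rates are not stated in the text, and counting the 44 unclassified samples as misses in the overall accuracy is an assumption made here for illustration.

```python
# Counts reported in Table 5.
tp, fp, fn, tn, unclassified = 80, 1, 2, 20, 44
total = 147

# All categories plus the unclassified samples must cover the test base.
assert tp + fp + fn + tn + unclassified == total

classified = tp + fp + fn + tn                  # samples that received a label
accuracy_classified = (tp + tn) / classified    # correct among labelled samples
accuracy_overall = (tp + tn) / total            # unclassified counted as misses
precision = tp / (tp + fp)                      # labelled malicious and correct
recall = tp / (tp + fn)                         # malicious samples detected
coverage = classified / total                   # share of samples labelled

print(f"classified: {classified} of {total} ({coverage:.1%})")
print(f"accuracy (classified only): {accuracy_classified:.1%}")
print(f"accuracy (all samples):     {accuracy_overall:.1%}")
print(f"precision: {precision:.1%}, recall: {recall:.1%}")
```

The gap between the two accuracy figures (about 97% of classified samples versus about 68% of all samples) shows how much the result depends on the unclassified samples.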

The accuracy of the model should grow as larger sample bases are used to regenerate the model over time. In addition, we will be investigating other means of classification (such as using support vector machines) and utilizing additional behavioral features from Cuckoo to generate more sophisticated classification models that will allow more samples to successfully be classified in future research.

5. Future work

While our classification engine was fairly naïve (it basically just used the results from traditional AV solutions via VirusTotal and Cuckoo), the framework is complete enough to allow modifications and more sophisticated models to be used. Using this project as a starting base, we would like to pursue several avenues of related research. We would like our future research to be able to answer the following questions:

How can we optimize the existing framework to allow for greater productivity on fewer resources with better accuracy?



How can we utilize the output from Cuckoo for a more diverse feature set with which to build a more sophisticated classification model?



Can we achieve better results using a different virtualization platform or a different programming language?



Can we make the process of modifying, adding, or removing tools from the processing pipeline even easier for the user?

These questions are related, but each represents a slightly different research approach. We would like to focus on improving the current system and framework before we branch out and start comparing alternate technologies and other tools. Our current classification model most certainly needs work in order to be more effective. The fact that 44 of the 147 samples in the test base could not be classified needs to be addressed.

6. Conclusion

We began this project with one primary goal: to create a modular, extensible framework to help create a malware analysis lab in a cloud environment. We wanted to help answer the question of “how can we overcome the shortcomings of current cloud-based analysis frameworks in order to provide increased performance, accuracy, and flexibility?” We believe we have answered that question with our framework and our implemented system. Our research showed that effective and efficient malware analysis was, and still is, needed. Due to this need, many researchers, companies, and public institutions across the planet are working to further the science of malware analysis. Our framework provides a starting place for small research organizations to build a specialized analysis environment in-house for testing ideas and theories. It’s written in Python, and currently utilizes the Xen Cloud Platform as the underlying operating system on the physical hosts, although it is flexible enough that it could easily be moved to a different virtualization platform. Each part of the malware analysis process can be identified as a discrete component, and our framework relies upon this fact. The functionality of the system and the integration of existing external tools depend on the use of custom modules, although our implementation provides a base system on which to build. We have created a sample implementation that includes basic modules for each step of the analysis process, including traditional anti-virus checks, dynamic analysis, tool output aggregation, and classification. The Cuckoo module provides both anti-virus checking and dynamic analysis, the decision tree classification model provides the end-result classification, and the modules that tie together Cuckoo, the classification engine, and the database serve to aggregate the data. The beauty of our design is that each of these modules can be expanded, disabled, or completely replaced.
We have shown that utilizing distributed resources through cloud computing provides a boost to the overall performance of our system, and have advocated for the benefits of using the system rather than building another solution from scratch. While it is a fairly rudimentary system, our solution allows for a modular design and is capable of growing to match additional requirements at a later time. There is always room for improvement, but the fundamental goals were met.

Acknowledgements

We would like to acknowledge and thank VirusTotal for graciously providing us with a private API key for use during our project. Using the public key could have drastically altered our timing results, as the public key only allows for 4 requests per minute from a particular IP address (VirusTotal, 2013).

References

Cha, S.K., Moraru, I., Jang, J., Truelove, J., Brumley, D. and Anderson, D.G. (2011) 'Splitscreen: Enabling Efficient, Distributed Malware Detection', Journal of Communication and Networks, vol. 13, no. 2, pp. 187-200.
Egele, M., Scholte, T., Kirda, E. and Kruegel, C. (2008) 'A Survey on Automated Dynamic Malware-Analysis Techniques and Tools', ACM Computing Survey, vol. 44, no. 2, March, pp. 6:1-6:42.
Guarnieri, C. Cuckoo, [Online], Available: http://www.cuckoosandbox.org [1 January 2013].
Idika, N. and Mathur, A.P. (2007) 'A Survey of Malware Detection Techniques', February.
Kak, A. (2013) Decision Tree, 171st edition.
Malware Dump, [Online], Available: http://contagiodump.blogspot.com [15 May 2013].
Mandiant, [Online], Available: http://contagiodump.blogspot.com/2013/03/mandiant-apt1-samples-categorized-by.html [3 March 2013].
Martignoni, L., Paleari, R. and Bruschi, D. (2009) 'A Framework for Behavior-Based Malware Analysis in the Cloud'.
Microsoft (2007) Understanding Anti Malware Technologies, [Online].
Paleari, R. (2010) Dealing With Next Generation Malware, Universita degli Studi Di Milano.
Schmidt, M., Baumgartner, L., Graubner, P., Bock, D. and Freisleben, B. (2011) 'Malware Detection and Kernel Rootkit Prevention in Cloud Computing Environments', 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 603-610.
Verizon (2012) 2012 Data Breach Investigations Report, [Online], Available: http://www.verizonenterprise.com/resources/reports/rp_data-breach-investigations-report-2012-ebk_en_xg.pdf [1 January 2013].
VirusTotal (2013) Main, [Online], Available: http://www.virustotal.com [5 February 2013].
Zheng, Z. and Fang, Y. (2010) 'An AIS-Based Cloud Security Model', International Conference of Intelligent Control and Information Processing, Dalian, China, 153-158.


A Cloud Storage System based on Network Virtual Disk
Liang Zhang1, Mingchu Li1, Xinxin Fan1, Wei Yang2, Shuzhen Xu1
1 School of Software, Dalian University of Technology, Dalian, China
2 Department of Computer Engineering, Zhonghuan Information College Tianjin University of Technology, Tianjin, China
zlcrypto@hotmaill.com
li_mingchu@yahoo.com
xinxinyuanfan@gmail.com
Jsyangw3@gmail.com
dlut_riddleleo@foxmail.com

Abstract: Cloud computing is a promising computing model that enables convenient, on-demand network access to a shared pool of configurable computing resources. Cloud storage emerged as a novel storage service mode after the concept of cloud computing was proposed. In this paper, we focus on the construction of a cloud storage system based on network virtual disk. We present a client based on virtual disk technology and design a cloud storage server based on the Hadoop cloud platform. We also present the definition of an auditing scheme for the integrity of the storage system. The gateway acts as a bridge between client and server, and also as an impartial auditor. The design of the virtual disk driver ensures efficient communication, the Hadoop platform provides server stability and security, and the auditing scheme guarantees the integrity of user data. The proposed cloud storage system performs efficiently on Windows systems.

Keywords: Cloud storage; Virtual disk; Auditing; Gateway

1. Introduction

Since the concept of cloud computing was raised, cloud storage has become a novel computing model which enables convenient, on-demand network access to a shared pool of computing resources (Mell, P. and Grance, T, 2009: 2). Cloud storage plays an important role among cloud computing services, offering space for data owners to move their data from local storage to the cloud. Many companies have developed service platforms for cloud storage, such as HDFS, GFS, SkyDrive and Amazon S3. Most cloud storage services are based on a network file system, which means that the user needs to download the whole file to process it and upload the whole file to save it. To solve this problem, we propose a cloud storage system based on network virtual disk. The client sends requests through the Internet. The cloud storage server provides management and data storage services. The gateway acts as a bridge between client and server, and also as an auditor for integrity checking. In this paper, we mainly propose the cloud storage system itself, which follows the development trend of cloud computing and combines several technologies to solve the technical problems involved. We introduce the details in the following sections. The main contributions of our work can be summarized as follows:

We propose a cloud storage system based on network virtual disk technology.

We propose an auditing protocol to ensure the integrity of cloud storage.

We are the first to propose a system for Windows that uses gateway technology with HDFS for cloud storage.

We evaluate the performance of our system.

The rest of the paper is organized as follows. Section 2 reviews related work on cloud storage. Section 3 analyzes the system environment and properties. Section 4 presents the design of the system, discusses the necessity of auditing, and proposes the auditing model. Section 5 discusses feasibility in a real scenario and describes the experiment with the designed system. Section 6 concludes the paper.

2. Background and related work

Online storage has become increasingly popular with the development of the Internet. Cloud storage has been an active research direction since the concept of cloud computing was proposed. There are many cloud computing and cloud storage service providers, such as Google, Amazon, Microsoft, IBM and NetApp. There are also more and more cloud storage products on the market, such as Google Drive, Microsoft SkyDrive, Dropbox and Apple iCloud.


In recent years, as cloud storage has developed, academic research related to cloud storage has included the following. Kevin D. Bowers et al. (2010: 187) proposed a strong, formal adversarial model for HAIL, together with rigorous analysis and parameter choices, and carried out experiments on its safety and efficiency. Brody, Tim et al. (2009) proposed an implementation of a repository storage layer which can dynamically handle and manage a hybrid storage system. The papers (Robert L Grossman and Yunhong Gu, 2008:920 and Yunhong Gu and Robert L.Grossman, 2009: 2429) proposed and implemented Sector Sphere, a high-performance computing and storage cloud based on wide area networks. The papers (James Broberg et al, 2009:178 and James Broberg et al, 2009:1012) presented MetaCDN, which integrates the cloud storage services of different providers to give content creators uniform, high-performance, low-cost content storage and distribution services. Takahiro Hirofuchi et al (2009:69) proposed a live storage migration mechanism over WAN, which can serve as a reference for distributed storage migration.

A similar system already exists in Linux. NFS technology is taken into account in the design of the gateway system. Shepler S, Callaghan B, Robinson D, et al (2002) proposed the NFS protocol and described its design in detail. Gupta D, Sharma V. (2002:71) designed a gateway which exports an “FTP file system” to NFS clients. We take advantage of this gateway concept. Sheepdog (K Morita: 2010: 5-6) is also an implementation of gateway technology.

There are also many successful existing storage services. Dropbox is a prominent service provider. Houston D, Ferdowsi (2004) gives an overview of Dropbox and describes its design in detail. Drago I, Mellia M, Munafo M, et al. (2012) present a characterization of Dropbox as a personal cloud storage service.

Cloud storage is a novel service for data storage that provides storage to users directly, and many mechanisms exist to guarantee that users can access their data securely. We design a system which meets these requirements.

3. Assumption and goals

We propose a cloud storage system that takes advantage of virtual disk and gateway technology to meet the requirements of efficient data processing, data security and data integrity. We first introduce the target environment in which the system runs efficiently, and then the properties of the system with respect to that environment.

3.1 Target environment

The cloud storage system based on virtual disk and gateway technology is mainly intended to transmit data efficiently, which requires high bandwidth. We therefore propose a local network, which can guarantee the bandwidth, as the ideal environment. Many organizations need such a system, for example enterprises and campuses that need to share data and have ample bandwidth. We also provide an Internet access interface for users, as other cloud computing products do; this method works on the browser/server (B/S) model. We mainly describe the local network mode. The assumptions about the environment are briefly as follows.

All links in the local network should have a high transmission rate, which ensures the system performance.

Storage devices can communicate with each other in order to set up the cloud storage.

The environment should have massive data to store and process, most of which can be shared in the system.

The environment should ensure the service quality of the system.

All client operating systems should be Windows if the user runs the client software.

The ideal environment is a campus, which meets these requirements and offers a convenient test bed for experiments.

3.2 System goals

The cloud storage system based on virtual disk and gateway technology aims to relieve users of the capacity limits of the physical local disk. It also aims to keep data stable and durable and to meet the requirements of data integrity. First, the system’s data space is unlimited, so users need not worry about the volume of the storage space. The system also provides convenient access to data, so users need not worry about complex application operations. Second, the durability of data should be guaranteed, so users need not worry about data loss. Third, the storage procedure is easy to operate: users need not download the whole file to edit it and then upload the whole file again.


An additional goal is data integrity: the system should ensure the integrity of the data, so we need an auditing protocol. We place the auditing in the gateway for two reasons: (1) server efficiency can be assured, and (2) an impartial auditor guards against cheating by the server.

3.2.1 Desired properties

We now specify the properties and features of the system, which follow from the system goals stated above. We analyze the properties from the client side to the server side.

Client:

Data operations are as easy as local operations, which means the system is developed in kernel mode.

To transport data efficiently, data communication should be based on kernel mode.

To share data, the system should have a concurrency mechanism that handles multiple users accessing a single data space.

To lighten the burden on the server, the data should be easy for the server to analyze.

To ensure data security and integrity, the client needs to run an auditing protocol in the data initialization phase.

Server: 

There is no volume limitation on the data owner’s data space.

There is a backup service to ensure data durability.

The server should be easy to manage.

To keep data transparent, all data paths should be hidden from the user.

To ensure data integrity, the server should keep both the data and the metadata.

Gateway: 

The gateway acts as a bridge between client and server.

To resist attacks by an adversary, the gateway should have a verification mechanism.

To perform auditing, the gateway should store the verification data, challenge the server, and verify the server’s response.

4. Design

Having analyzed the goals, we now sketch the design of the cloud storage system based on network virtual disk. We describe the main framework of the client application, then the server workflow for storing data and the server management. Finally, we introduce the auditing scheme.

4.1 Overview

The cloud storage system based on virtual disk and gateway technology involves three roles: client, gateway and server. The client runs the virtual disk (Yuezhi Zhou et al, 2006: 26) software. In the system deployment, the client must have a verified identity to log in to the data space. The client also runs an initial data update operation for later auditing, and the client application transmits data for access and storage. The server is deployed on HDFS (Shvachko, K et al, 2010: 2-5 and HDFS Architecture). To keep the data secure, the server obtains the user verification from the gateway. The server also communicates with the client for storage and access, and runs the auditing part that offers proofs for the gateway’s integrity checking. The gateway acts as a bridge in the system. If a user wants to access a data space, the gateway verifies the user’s identity. The gateway also transfers data from server to client, and acts as an auditor to check data integrity.


Having introduced the main system and its functions, we next introduce some related concepts and then the detailed architecture of the client, server, and gateway.

4.2 Client design

The client consists of two parts. One is on the application layer and is used to manage the client program and control concurrent file access. The other is the network virtual disk driver, which is used for data transmission. We mainly explain the virtual disk driver part. The workflow of the network virtual disk can be seen in figure 1. The user can operate on the data in the cloud storage just like local data. All communication with the server runs under the driver layer.

4.3 Cloud storage server

We can derive the server’s characteristics from the above assumptions. First, users’ storage space has no limit: the server should extend a user’s storage space dynamically to ensure that the user has enough space. Second, all user data should be backed up, so users need not worry about data loss. Third, the storage server cooperates with the client to control concurrent file access. In conclusion, we should design the server on cloud storage to meet these requirements. After researching cloud computing technology, we decided to use the Hadoop platform as the cloud storage system infrastructure.

4.3.1 Data storage

The server is mainly used to offer storage space for users; data storage is its basic service. The server offers unlimited storage space for users. We design the server based on HDFS, which ensures the storage volume and the management of the server. The server listens for client requests at all times. When the server receives a request, it analyzes the request and sends a response. The storage service is the work of transferring the data to the client.

Figure 1: Architecture of network virtual disk (components shown: Win32 Application, Runtime DLL, Win32 DLL, System Service, Object Management, IRP, IO Stack, Filesystem Driver, Virtual Disk Driver, Local Virtual Disk Driver, Network Virtual Disk, TDI Transport Interface, Client Management, Server Management, NameNode Server, DataNode Server; client and server communicate through the Internet)

4.3.2 Data concurrency

This mechanism is designed to control a group space when more than one user operates on the same data space. The system must consider how to order the operations. We design a queue strategy to handle concurrent access. The server allocates one lock for each group space. A user who wants to write data to the space must obtain the lock first; the user then has the right to operate on the space.
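The one-lock-per-group-space rule can be sketched in a few lines. This is an in-process illustration under stated assumptions, not the server's implementation: the class `GroupSpaceLocks` and the function `write_to_space` are hypothetical names, and `threading.Lock` serializes writers without guaranteeing the first-come, first-served order that a real queue strategy would add.

```python
import threading
from collections import defaultdict


class GroupSpaceLocks:
    """Hands out exactly one lock per group space identifier."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock table
        self._locks = defaultdict(threading.Lock)  # space id -> its lock

    def lock_for(self, space_id):
        with self._guard:
            return self._locks[space_id]


def write_to_space(locks, spaces, space_id, data):
    """Append data to a group space while holding that space's lock."""
    with locks.lock_for(space_id):
        spaces.setdefault(space_id, []).append(data)


locks = GroupSpaceLocks()
spaces = {}
writers = [
    threading.Thread(target=write_to_space, args=(locks, spaces, "group-a", i))
    for i in range(8)
]
for w in writers:
    w.start()
for w in writers:
    w.join()
print(sorted(spaces["group-a"]))
```

Writers to different group spaces never block each other, because each space has its own lock.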


4.3.3 Server management

The management part responds to client application requests and manages the server’s data and security. Server management includes load balancing, data security, data auditing and data redundancy.

4.4 Gateway

The client uses the network virtual disk to access the cloud storage just like local data. We therefore need a bridge between client and cloud storage; a gateway is the ideal mechanism between the client and HDFS. The gateway consists of file management, access control, security management, and a read/write cache. File management maps the files stored in the cloud storage to the client. Access control enables multiple users to access a shared data space: all users can access the data, but only authenticated users can modify it. Security management protects the information from intrusion. The read/write cache keeps read and write operations running smoothly. The gateway also acts as an auditor for integrity checking; we introduce the auditing in a separate section.

4.5 Auditing Model

In the cloud storage system, the gateway is not only a bridge between client and server but also an auditor. To the best of our knowledge, third-party auditing is preferred, so the gateway plays the auditor. As Figure 2 shows, the model can be divided into two phases. In the initialization phase, the data owner communicates with the gateway and the server, computes the metadata of the data, and establishes the cryptographic keys with the gateway and the cloud server. The second phase is the query phase, which follows initialization. The query is run as a challenge-response auditing protocol (Kan Yang and Xiaohua Jia, 2012: 412). The procedure, shown in Figure 3, contains three steps: Challenge, Proof and Verification. When the gateway needs to check the integrity of data stored in the cloud storage, it generates a challenge and sends it to the cloud storage. After receiving the challenge, the cloud storage generates a proof and sends it back to the gateway. The gateway then runs the verification to check the correctness of the proof and stores the result for the user to review.

4.5.1 Discussion of Auditing

Hosting data in the Cloud brings new security challenges. Data owners worry about their data stored in the Cloud. It is therefore desirable to have a storage auditing service that assures data owners that their data are correctly stored in the Cloud. Considering the various factors, the gateway is an ideal auditor, since it can audit impartially.

4.5.2 Definition

We adopt the DPDP definition from (Erway, C et al., 2009: 215) and extend it with a gateway element that acts as the auditor.

KeyGen (1)→{sk, pk} is a probabilistic algorithm run by the client. It takes a secure key as input, and outputs a secret key sk and a public key pk. The client stores the secret and public keys, sends the secret and public keys to the gateway and sends the public key to the cloud storage.



UpdateRequest ()→{F, info} is an program run by the client when the user does an update requirement. The client sends the file F with the operation info of the update to act.



PrepareUpdate (sk, pk, F, info, Mc)→{e(F), e(info), e(M), Mc’} is an algorithm run by the gateway to prepare update part of the file. As input, it takes secret and public keys update part of the file F with the definition info of the update to be performed which is sent by the client, and the previous metadata Mc. The output is an “encoded” version of the updating part of file e(F). Along with the information e(info) about the update, and the new metadata e(M) to the cloud storage, and the output new version metadata Mc’, the updating part of file F, the definition info of update to be performed sent to the gateway.



PerformUpdate (pk, Fi-1, Mi-1, Mi, e(F), e(info), e(M))→{Fi, Mi,PMc’} is an algorithm run by the cloud storage in response to an update request from the client. It takes as input the public key, the previous version of the file Fi-1, the metadata Mi-1 and the message e(F), e(info), e(M) which sent by the gateway. We should notice that the values e(F), e(info), e(M) are the values sent by PrepareUpdate. The output is

140

Liang Zhang, Mingchu Li, Xinxin Fan et al the new version of the file Fi and the metadata Mi, along with metadata sent to the gateway PMc’. The cloud storage sends PMc’ to the gateway; 

VerifyUpdate (sk, pk, F, info, Mc, Mc’, PMc’)→{accept, reject} is run by the gateway to verify the cloud storage behavior during the update. It takes as all inputs of public and secret keys, the updating part of file F, the definition info of updating part of file performed, previous metadata Mc, updating metadata Mc’ and the proof PMc’. It outputs acceptance (F should be deleted in the case) or rejection signals.



Challenge (pk,sk,Mc)→{c} is a probabilistic procedure run by the gateway to create a challenge for the cloud storage. It takes the secret and public keys, along with the user’s latest metadata Mc, as input and outputs a challenge c, which is sent to the cloud storage.



Prove (pk,Fi,Mi,c)→{P} is the program run by the cloud storage on receiving a challenge from the gateway. It takes the public key pk, the latest version of the file and metadata, and the challenge c as input. It generates a proof P, which is sent to the gateway.



Verify (sk,pk,Mc,c,P)→{accept,reject} is the procedure run by the gateway on receiving the proof P from the cloud storage. It takes as input the secret key sk and public key pk, the gateway metadata Mc (the user’s metadata), the challenge c, and the proof P sent by the cloud storage. An output of accept indicates that the cloud storage still keeps the file intact.
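The Challenge, Prove and Verify steps above can be illustrated with a deliberately simplified sketch. It is not the scheme used in this paper: it assumes a single symmetric key and per-block HMAC tags instead of the public/secret key pair and metadata Mc, and the proof returns whole blocks rather than a compact value. All function names and the block layout are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 16 * 1024  # 16 KB blocks, an assumed default


def keygen() -> bytes:
    """Generate a random secret key (stand-in for KeyGen)."""
    return secrets.token_bytes(32)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def tag(sk: bytes, index: int, block: bytes) -> bytes:
    """Bind a block to its position with an HMAC-SHA256 tag."""
    return hmac.new(sk, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def challenge(num_blocks: int, sample_size: int) -> list:
    """Gateway side: pick random block indices to audit."""
    rng = secrets.SystemRandom()
    return rng.sample(range(num_blocks), min(sample_size, num_blocks))


def prove(blocks: list, tags: list, c: list) -> list:
    """Storage side: return the challenged blocks with their stored tags."""
    return [(i, blocks[i], tags[i]) for i in c]


def verify(sk: bytes, c: list, proof: list) -> bool:
    """Gateway side: accept only if every challenged block matches its tag."""
    if [i for i, _, _ in proof] != list(c):
        return False
    return all(hmac.compare_digest(tag(sk, i, b), t) for i, b, t in proof)
```

In this sketch the gateway only needs to keep the key; the tags are stored next to the blocks at the cloud storage, and a modified block fails verification whenever its index is part of the challenge.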

Figure 2: System Model of Auditing

5. Feasibility and experiment

In this section, we explore whether the cloud storage system based on a network virtual disk is feasible. We consider the cost of devices, availability to users, and the complexity of deployment. First, most users run the Windows operating system, so the client only needs to install the client application; no additional device is required. The server can be deployed on a single machine that runs several virtual machines, or on separate machines if conditions allow, in which case the storage cloud processes requests more efficiently. Next, we introduce our experiment and show the result of running the system. The client can be any PC in the lab that runs Windows; a user can use the PC after installing the client application. The server is deployed on the lab server, which runs Linux, to ensure a high quality of service. The Namenode is deployed on the host of the lab server and the Datanode on a virtual machine of the lab server, which ensures communication between the Namenode and the Datanode. The gateway also runs on the host of the lab server. After setting up the equipment, we deploy the program. We illustrate the use of the software below.

If a user wants to use the system, he should first register as an authorized user.

The user starts the program and clicks the login button to verify his identity.

Once identified, the user can operate the software. The available operations include: mount group disk, mount personal disk, unmount, review the list of data spaces, extend data space, and apply for access control.

After successfully mounting a data space, the user can operate the virtual disk as a local disk and can work on the data with Win32 applications without intervention from the client application.

If the user accesses a personal data space, he can modify the data without any additional operation. If the user accesses a shared data space, he can only read the data; to modify it, he needs to apply for write access. This is because the gateway already enforces access control on the shared data space.

If a user applies for write access to a shared space, a strategy is needed to reclaim that write access afterwards. Such a strategy ensures that write access control runs efficiently.

The server checks the free space of each data space regularly. If the free space falls below a specified value, the server extends the space automatically. If a user needs more free space than the current data space provides, he can apply for an extension immediately.

5.1 System performance

We now test the time performance of the transparent cloud storage presented as a network virtual disk. We measure the time taken by read, write and delete operations, using file sizes ranging from 1 KB to 2048 KB, and report the average of five runs. Table 1 shows that the average read time grows approximately linearly with file size: the larger the file, the longer the read takes. The average read rate increases between 1 KB and 500 KB and then settles to a roughly stable value. The read performance is plotted in Figure 3.

Table 1: Performance of Read Data (file size in KB, time in ms)

FileSize               1      10     100     500    1024    2048
1st                 3468    3012    8147   22435  107640  155757
2nd                 3457    3054   11258   28976   49568  143921
3rd                 3486    2910   19874   19785   81485  203512
4th                 3439    3145    9720   28746   67458  139874
5th                 3470    2874    7061   24492   94680  137426
avg                 3464    2999    9212   24887   80161  156098
Performance(KB/S)   0.28    3.34   10.85   20.09   12.77   13.12
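The Performance(KB/S) row can be reproduced from the reported averages by dividing the file size by the average time. The short sketch below does this for the read measurements; it assumes the times are in milliseconds, which is consistent with the reported rates, and the function and variable names are illustrative only.

```python
# Average read times in ms, keyed by file size in KB (values from Table 1).
READ_AVG_MS = {1: 3464, 10: 2999, 100: 9212, 500: 24887, 1024: 80161, 2048: 156098}


def mean(values):
    """Arithmetic mean of a list of measurements."""
    return sum(values) / len(values)


def throughput_kb_per_s(size_kb: float, elapsed_ms: float) -> float:
    """Transfer rate in KB/s for a file of size_kb handled in elapsed_ms."""
    return size_kb / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    for size_kb, avg_ms in READ_AVG_MS.items():
        print(f"{size_kb:>5} KB: {throughput_kb_per_s(size_kb, avg_ms):6.2f} KB/s")
```

The same two functions apply unchanged to the write measurements.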

Figure 3: Read Performance

Table 2 shows that the average write time also grows approximately linearly with file size: the larger the file, the longer the write takes. The average write rate increases between 5 KB and 500 KB and then settles to a roughly stable value. The write performance is plotted in Figure 4.

Table 2: Performance of Write Data (file size in KB, time in ms)

FileSize               5      10     100     500    1024    2048
1st                 1895    1432    4798   13068   52496  119876
2nd                 1745    1752    4985   14874   58726   98763
3rd                 1688    1858    5283   11953   55954  129071
4th                 1921    1535    5065   14985   53749   89412
5th                 1741    1561    5127   13685   56248  101543
avg                 1798    1628    5052   13713   55435  107733
Performance(KB/S)   2.78    6.14   19.79   36.46   18.47   19.01


Figure 4: Write Performance

Table 3 shows that the average time to delete a file does not depend on file size. The average delete time stays between about 1400 ms and 1800 ms, so delete efficiency is assured: the cost is small and roughly constant. Figure 5 shows the trend of the time cost and leads to the same conclusion.

Table 3: Performance of Delete Data (file size in KB, time in ms)

FileSize      1      10     100     500    1024    2048
1st        1480    1586    1684    1468    1548    1412
2nd        1468    1524    1764    1487    1486    1428
3rd        1486    1561    1541    1538    1586    1297
4th        1432    1535    1671    1498    1524    1862
5th        1477    1561    1586    1366    1423    1466
avg        1469    1553    1649    1471    1513    1493

Figure 5: Delete Performance

From the test results we conclude that the transparent cloud storage presented as a network virtual disk runs stably for data storage and data access.

5.2 Auditing Performance

In the design of the system, the gateway plays the role of auditor and acts as a bridge for communication between client and server. We evaluate the gateway in terms of overhead and auditing performance.

5.2.1 Gateway Overhead

We evaluate the overhead of our auditing scheme in terms of storage cost. We consider a scenario in which the server stores a 1 GB file. Detecting a 1% fraction of incorrect data with 99% confidence requires challenging a constant number of blocks, approximately 460. The proof size is shown in Figure 6. The proof size grows almost linearly with block size, and it is a small percentage of the size of the stored file, so it is an acceptable overhead for the gateway. We also conclude that a block size of 16 KB is a good balance point in terms of cost.

5.2.2 Gateway performance

We evaluate the performance of the gateway by checking for incorrect data at different confidence levels, and then test the performance of gateway auditing. We detect a 1% fraction of incorrect data in a 1 GB file divided into 16 KB blocks, and repeat the test 100 times. Table 4 shows that the hit rate is similar at 97% and 99% confidence, while the overhead is considerably lower at 97% confidence. When strict confidence is not required, 97% confidence is therefore the better choice.

Table 4: Performance of Gateway

Fraction of incorrect data   Confidence   Challenging time   Proof Size(KB)
1%                           99%          460                225
1%                           97%          120                128

Figure 6: Size of proof of possession for a 1 GB file

We have given a comprehensive evaluation of the cloud storage system. First, we evaluated the performance of read, write and delete operations. Then, we evaluated the gateway in terms of overhead and checking performance. The overall performance of the system is acceptable.
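The figure of approximately 460 challenged blocks quoted in Section 5.2.1 can be checked with a short calculation. The sketch below assumes that challenged blocks are sampled independently and uniformly, so that a challenge of c blocks misses all corrupted blocks with probability (1 - f)^c, where f is the corrupted fraction. This sampling model and the function names are assumptions made for illustration, not details taken from the scheme itself.

```python
import math


def detection_probability(corrupt_fraction: float, num_challenged: int) -> float:
    """Probability that at least one challenged block is corrupted."""
    return 1.0 - (1.0 - corrupt_fraction) ** num_challenged


def blocks_needed(corrupt_fraction: float, confidence: float) -> int:
    """Smallest number of challenged blocks that reaches the given confidence."""
    if not (0.0 < corrupt_fraction < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - corrupt_fraction))


if __name__ == "__main__":
    print(blocks_needed(0.01, 0.99))          # blocks for 1% corruption at 99% confidence
    print(detection_probability(0.01, 460))   # confidence reached with 460 blocks
```

Under this model the required number of blocks depends only on the corrupted fraction and the target confidence, not on the file size, which is why the text can speak of a constant number of challenged blocks.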

6. Conclusions

As a new mode of data storage, cloud storage will become a main way for people to access services and store data. The cloud storage system based on virtual disk and gateway technology can be regarded as an architecture for realizing cloud storage. It is easy and convenient for the client to use. An evaluation of a cloud storage system's performance should cover durability, reliability, general standards of read and write efficiency, and access control of data. The cloud storage system based on virtual disk and gateway technology can meet these demands. It also fulfils the auditing requirement of ensuring data integrity. Many aspects still need improvement, for example the scheduling algorithms for gateway read and write requests and responses. We leave this research to future work. At present the system runs only in the experimental environment. With these improvements, we plan to run the system on the internet and deploy it in campus and enterprise settings.


Non Academic Papers


Disaster Recovery on Cloud - Security Compliance Challenges and Best Practices

Sreenivasan Rajendran, Karthik Sundar and Rajat Maini
Infrastructure Management Services, Infosys Limited, Bangalore, India
Sreenivasan_r02@infosys.com
Karthik_Sundar@infosys.com
Rajat_Maini@infosys.com

Abstract: Business continuity and disaster recovery (DR) are crucial for any enterprise. As the IT infrastructure landscape grows more complex and business processes depend ever more on IT systems, ensuring high availability becomes more challenging. Traditional DR methods, namely hardware duplication and data-only backup, fail to meet the corporate objective of "Invest Less = Gain More": they are extremely capital and labor intensive. Cloud based disaster recovery is an alternative to this traditional approach that can reduce TCO by more than 40 percent. By reducing the carbon footprint it also supports Green IT, without compromising on RTO and RPO requirements. It also eases DR testing, which is much neglected across enterprises. A major concern for CIOs is security compliance and the regulatory constraints they face. Even though cloud computing has evolved over the years, its security aspects are still at a nascent stage. An investment in DR on cloud will become more attractive once these security compliance concerns are addressed. In recent years we have observed that corporations are keen on virtualizing their server landscape. Since virtualizing resources paves the way for tapping the potential of cloud computing, DR on cloud becomes a key area of interest for enterprises. This holds especially true in the current scenario, in which data center consolidation is a key focus across enterprises irrespective of industry. Since ensuring security on cloud is quite challenging, it is vital that corporates have a deep understanding of the various security aspects related to cloud. This paper therefore captures operational-level inputs from professionals working on cloud based solutions and outlines the challenges faced and the best practices to consider when adopting a cloud based DR model.

Keywords: Disaster Recovery; Cloud; Cloud Security

1. Introduction

A disaster can strike your IT infrastructure at any time, and without prior planning the consequences can be devastating. Investment in disaster recovery is therefore critical for enterprises the world over. As business operations depend more and more on IT systems, setting up an effective DR infrastructure becomes essential. Traditional approaches to disaster recovery revolve around two concepts: 1) DR by duplicating the infrastructure and 2) DR by backup (data-only protection). These approaches require labor-intensive backups and expensive standby equipment whose capacity is wasted. As the infrastructure landscape grows, the traditional methods also scale poorly. Today, cloud-based DR is poised to redefine these legacy approaches and offer corporations a strong alternative. Instead of enterprises buying resources in anticipation of a disaster, the pay-per-use pricing model of the cloud ecosystem provides protection at the lowest cost. This paper examines DR on cloud with a specific focus on the cloud based security threats, security features, compliance and security standards that are key when opting for DR on cloud.

Cloud Ecosystem - Overview of IaaS, PaaS, SaaS:

Before we get into the concept of cloud models, it is important to understand the conceptual difference between the terms “virtualization” and “cloud”. Virtualization is the ability to emulate hardware resources with the help of software. The hardware resources are emulated by loading a software stack called a hypervisor over the base operating system. The hypervisor creates simulated environments in which multiple virtual instances (servers) run on a single hardware resource. Cloud, on the other hand, can be defined as an “aaS” (As A Service) model, wherein IT resources (e.g., networks, servers, storage, applications, and services) can be provisioned and released with minimal management effort or service provider interaction (Peter 2011). Virtualization is a key driver for cloud, but not all IT systems on cloud need to be virtual. Three service models are available in the cloud ecosystem (Peter 2011), namely,

IaaS – Infrastructure as a Service: Provision processing, storage, networks, and other fundamental computing resources (Example – AWS)

PaaS – Platform as a Service: Deployment of in-house or acquired applications in the cloud (Example – Vblock™ Infrastructure Platforms)

SaaS – Software as a Service: Make use of applications running on cloud (Example – Microsoft Office 365)

Figure 1 gives a holistic picture of the cloud models described above.

Key Note: IaaS plays a key role in the DR on cloud model. The discussion in the rest of this paper is restricted to leveraging IaaS for DR.

Disaster Recovery – RTO and RPO defined

Disaster recovery is defined as the technology aspect of the BCP (business continuity plan). It focuses primarily on the recovery and restoration of business functions after a disaster, ensuring that the business returns to normal within a short time span and with minimal loss.

RPO: Recovery Point Objective (The Measure of Data Loss)

The recovery point objective describes the point in time to which data must be recovered. Achieving the RPO is a very high priority for organizations, as no enterprise can afford to lose data.

RTO: Recovery Time Objective (The Measure of Downtime)

The recovery time objective measures the amount of time a system or application can be down before the outage becomes intolerable for the organization. RTO varies with the criticality of the application. Mission critical applications may be subject to an RTO of a few minutes to a few hours, as their availability is considered vital for the organization. RTO is used to determine the type of backup and the disaster recovery plans and processes that should be implemented to protect applications.

Challenges with Traditional DR approach:

According to Forrester Research (Rachel 2012), the two prime challenges that the traditional approach to DR poses are

Increased complexity of IT Infrastructure Landscape



Budget constraints and Cost associated in maintaining the DR set up

Figure 2: Traditional DR approach - Challenges 

DR on Cloud:

The challenges faced by the traditional approach to DR give rise to the concept of DR on cloud. With the advances in virtualization technology in particular, DR on cloud is an option worth exploring. There are 3 types of DR on cloud models,

Hot Cloud – Suited for Mission critical applications (RPO and RTO – Very minimum)

Warm Cloud – Suited for Business critical and Business Important applications

Cold Cloud – Suited for Business supporting applications

Table 1: Mapping of Applications to respective DR Cloud models

Application Type                    RTO Range (in hrs)     RPO Range (in hrs)     DR Cloud model
                                    Minimum    Maximum     Minimum    Maximum     Recommended
Mission Critical Applications       0          4           0          1           Hot Cloud
Business Critical Applications      4          12          1          12          Warm Cloud
Business Important Applications     12         24          12         24          Warm Cloud
Business Supporting Applications    24         Max*        24         Max*        Cold Cloud

Note – Data points are based on an internal study and secondary research on DR on cloud offerings by vendors such as SunGard, Xerox and Citrix.
*Max – Limit based on requirement

Table 1 above maps the application types within an organization to the suitable cloud DR model.
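The mapping in Table 1 can be expressed as a small lookup. The sketch below is one possible reading of the table, not a vendor rule: it assumes that the stricter of the two objectives decides the tier and that the upper bound of each range is inclusive. The function name is hypothetical.

```python
def recommend_dr_model(rto_hours: float, rpo_hours: float) -> str:
    """Suggest a DR cloud model from the required RTO and RPO (in hours).

    Assumption: whichever objective is stricter determines the tier, and
    boundary values (4 h / 1 h, 24 h / 24 h) belong to the stricter tier.
    """
    if rto_hours < 0 or rpo_hours < 0:
        raise ValueError("RTO and RPO must be non-negative")
    if rto_hours <= 4 or rpo_hours <= 1:
        return "Hot Cloud"    # mission critical applications
    if rto_hours <= 24 or rpo_hours <= 24:
        return "Warm Cloud"   # business critical / business important applications
    return "Cold Cloud"       # business supporting applications


if __name__ == "__main__":
    for rto, rpo in [(2, 0.5), (12, 12), (48, 48)]:
        print(rto, rpo, recommend_dr_model(rto, rpo))
```

An application that tolerates 12 hours of data loss but must be back within 2 hours is still mapped to Hot Cloud here, because a warm site cannot meet the 2 hour RTO.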

1.1 Hot Cloud Recovery

A hot cloud is more or less synonymous with the hot site of the traditional approach. In hot cloud based recovery, the DR virtual machines (VMs) run constantly and are kept up to date using replication or backups, depending on the desired RPO. RTOs in this approach usually range from a few minutes to a maximum of 4 hours. This model does not exploit one of the key advantages of cloud: the ability to run or stop VMs on demand. Here the protection of the applications is of utmost priority, and hence there is regular replication between the primary site and the secondary DR site.

1.2 Warm Cloud Recovery

All business critical and business important applications fall under this category. A warm cloud recovery site is a cloud that contains up-to-date versions of production VMs that are kept offline and idle. During a disaster or test, VMs can be spun up quickly from the offline images, resulting in RTOs usually of 4 to 24 hours and RPOs ranging from a few hours to a maximum of 24 hours.

1.3 Cold Cloud Recovery

This model is recommended for less critical, business supporting applications. In the cold cloud recovery model there is no dedicated hardware resource. Resources are identified and provisioned only in the event of a disaster. The only cost incurred is for the storage infrastructure where the backup images of the environment are saved.

Cloud readiness: Cloud Fit Assessment

Even though virtualization is not a prerequisite for adopting a cloud platform, it brings a unique ability to share resources efficiently, which in turn results in cost savings for the enterprise. One of the preliminary steps in adopting cloud based DR is to evaluate how well each application fits the cloud. Not every application is fit to be in the cloud. A typical example is an ERP application, whose performance is likely to be impacted if it is moved to the cloud. Similarly, an application with high throughput requirements will experience performance degradation on cloud. A cloud fit assessment is therefore one of the prerequisites for moving towards cloud based DR.

Key challenges in Going for DR on cloud:

According to Forrester research “IBM 2012”, the key barriers to cloud based DR adoption are

Status Quo



Security Concerns

Status quo concerns the mindset of management and other stakeholders in the corporate system. Security concerns, however, are the most challenging aspect of opting for cloud based DR. Security on cloud becomes even more challenging in the context of DR because investment in DR is predominantly made for mission critical and business critical applications. As DR is designed for the most critical applications, ensuring security becomes even more crucial.

Security on Cloud – How safe is your DATA?

According to ISO 27002:2005, Information Security is defined as the preservation of

CONFIDENTIALITY - Ensuring that information is accessible only to those authorized to have access to it.



INTEGRITY - Safeguarding the accuracy and completeness of information and processing methods



AVAILABILITY - Ensuring that authorized users have access to information and associated assets when required

1.4 Components of information Security

[Figure 3 labels: People (who use the information); Process (work practices within your organization); Technology (Physical Security Components, Network Security, Application Security, Access Devices)]

Figure 3: Cloud Security Pyramid

Security is not confined to technology. People and processes are two other critical components that play a crucial role in ensuring cloud security. PEOPLE may be internal, such as employees, management, customers and clients, or external stakeholders such as the cloud service provider and regulators. Access controls and other restrictions help address the threats posed by the PEOPLE component. PROCESS refers to the work practices in the system. Help desk, service management, change requests and identity management are some process areas. Tools deployed in executing security processes include ArcSight Security Information and Event Management (SIEM), Symantec™ Endpoint Protection 12 and Qualys Guard for Vulnerability Management. TECHNOLOGY covers everything used to support daily operations. It includes all the infrastructure components: network (routers, firewalls, VPNs etc.), physical security components (CCTV cameras, biometrics, electricity and power backup etc.), application level security (customized to the application in use) and access devices (desktops, VDIs, PDAs, laptops, printers, scanners etc.). The technology component is discussed further in the following sections.

2. Protection on cloud

2.1 Virtualization

Virtualization is available in two types,

Host OS Based Virtualization

Bare Metal Virtualization

In the Host OS based model, virtualization runs as a service administered by the host operating system. This results in performance overhead, as the amount of code executed by the host OS increases manifold. Bare metal virtualization, on the other hand, deploys an operating system (mostly Linux based) that exclusively performs the virtualization function. Bare metal virtualization with a customized operating system is thus less vulnerable to security attacks than Host OS based virtualization.

2.2 Virtual Instance Protection

In virtualization, each virtual instance is connected via a software bridge. This connected architecture between the virtual instances running on a single host makes ensuring security more challenging. It is even more challenging in a cloud based DR setup, because the virtual instances on the same host machine might be shared by multiple corporations. This poses a severe vulnerability, especially if the hypervisor is compromised. It is therefore important to understand virtual instance protection in order to address the security concerns of cloud based DR. Even though the cloud based virtualized model provides many advantages, such as cost effectiveness, agentless monitoring and high visibility into the systems, it is important to understand the security challenges it brings. A physical partition (for example, a dedicated storage disk mapped to a particular virtual instance) greatly reduces the chance of a security threat but comes with an extra cost burden. Logical partitions, the underlying principle of effective virtualization, call for enhanced security features.

Cloud Security Framework

Putting robust security in place is of the highest priority, because a security breach may result in any of the following situations,

Reputation loss.



Financial loss.



Intellectual property loss.



Legislative Breaches leading to legal actions.



Loss of customer confidence.



Business interruption costs.



Loss of Goodwill.

It is therefore important to protect the most valuable asset of any corporation: its data. Not all data is critical, but investments in DR are made predominantly for critical applications. This calls for increased security features at the DR site. Security becomes even more demanding when the infrastructure is shared and virtualized, as in any cloud offering. Technological advancements over the years have nevertheless helped corporations build highly secure infrastructure on cloud. A secure environment is one that uses the right mix of technology tools combined with compliance standards and international frameworks. It is the blend of these three (technological tools, compliance standards and frameworks) that can ensure a highly secure environment.

Figure 4: DR Cloud Security Framework

Figure 4 depicts the security framework model that blends technology, compliance standards and frameworks. In working with a cloud based model, there are 2 protection checkpoints to be considered. They are

Data in transit: Protection layer 01 (* Forward/Backward movement to/from Cloud)

Data at rest: Protection layer 02 (* On the cloud and Vaulted location)

2.3 DATA in Transit

Protection layer 01 deals with the network infrastructure, or more precisely with network connectivity. Tampering with data in the network pipeline poses a severe security threat. VPNs, dedicated connectivity, and encryption of data using cryptographic keys are some of the proactive measures to be deployed to address these security concerns.
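As one illustration of protecting data in transit, the sketch below shows how a replication client might open an encrypted, certificate-verified channel to the DR site using Python's standard ssl module. The function names, the host name in the usage note and the choice of TLS 1.2 as the minimum version are assumptions made for this example; they are not prescribed by this paper or by any particular DR product.

```python
import socket
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Create a client-side TLS context with certificate and hostname checks."""
    ctx = ssl.create_default_context()            # verifies certificates by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy: refuse older protocols
    return ctx


def open_replication_channel(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Connect to the (hypothetical) DR endpoint and wrap the socket in TLS."""
    raw = socket.create_connection((host, port), timeout=10)
    return ctx.wrap_socket(raw, server_hostname=host)
```

A caller would use it as `open_replication_channel("dr.example.com", 443, build_tls_context())`, where the host name is a placeholder. The handshake fails if the server certificate cannot be verified, so tampered or impersonated endpoints are rejected before any data is sent.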

2.4 DATA at REST

Once the data reaches the cloud via replication, it is stored on the appropriate infrastructure servers in the cloud. Here the people, process and technology components of the security pyramid play a crucial role. The security measures that can be used are outlined in the sections below. Log management acts as an important monitoring mechanism for tracing the root cause of any problem. Effective log management aids incident and problem management. Readers can find more detailed information on these topics on the official ITIL website.

Physical and Environmental Security

Environmental management Systems: Weather and Temperature



Fire Detection and Suppression



CCTV Cameras



Clock in systems / Biometrics



Electricity / Power backup



Monitoring/Management



Storage Device Decommissioning - Standard procedure that prevents data theft

Network Security

Host-based firewalls



Host-based intrusion detection/prevention



Encryption at each level



Secure Access Points: Firewall defined with specific Port Numbers based on criticality of applications



Transmission Protection - Secure Sockets Layer (SSL), a cryptographic protocol



Secure Shell (SSH) network protocol



Virtual Private Network connectivity between primary and DR site



Automated network monitoring system - monitor server and network usage, port scanning activities, application usage, and unauthorized intrusion attempts



Account Review and Audit



Identity and Access Management - create multiple users and manage the permissions for each of these users within specified Account

End Point Security 

AV - Endpoint Antivirus



HDLP - Host Data Loss Prevention Endpoints



HIPS - Host-based Intrusion Prevention Systems

2.4.1 IDAM (Application Identity and Access Management)

Authentication



Authorization



Federation



Entitlement

Application Security 

Application testing / Application Penetration testing



Application Scan



Network Scan

3. GRC (Governance, Risk management and Compliance)

Governance, risk management and compliance are key checkpoints for organizations to ensure the confidentiality, integrity and availability (CIA) of information.

3.1 RISK Assessment

Performing a risk assessment helps eliminate or mitigate most of the challenges one can foresee with cloud based DR. This section describes a DR specific risk assessment template that helps corporations identify, document and mitigate risks.

Risk Category – Broadly classifies risks into IT Infrastructure specific risks and Non-IT Infrastructure specific risks.

Risk Groups – Subcategories of the main Risk Category.

Risks – The individual risks under each group that can affect the business.

Likelihood is estimated on a scale from 0 to 10, with 0 being not probable and 10 highly probable. The likelihood that something happens should be considered in a long plan period, such as 3, 5, 10 years.



Impact is estimated on a scale from 0 to 10, with 0 being no impact and 10 being an impact that threatens the company's existence. Impact is highly sensitive to time of day and day of the week.

Restoration Difficulty Index = RTO Achievability Index × RPO Achievability Index

RTO Achievability Index: let X be the restoration time taken in the event of a particular disaster, expressed as a percentage of the RTO.
Case 01: If X < 60% of RTO, RTO Achievability Index = 1 to 4
Case 02: If 60% ≤ X ≤ 90% of RTO, RTO Achievability Index = 5 to 7
Case 03: If X > 90% of RTO, RTO Achievability Index = 8 to 10

RPO Achievability Index: let Y be the time taken to restore data in the event of a particular disaster, expressed as a percentage of the RPO.
Case 01: If Y < 85% of RPO, RPO Achievability Index = 1 to 4
Case 02: If 85% ≤ Y ≤ 95% of RPO, RPO Achievability Index = 5 to 7
Case 03: If Y > 95% of RPO, RPO Achievability Index = 8 to 10

Risk Categorization

Figure 5: Risk Categorization calculations

Figure 6: Risk Assessment template
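The Restoration Difficulty Index defined in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the treatment of the boundary values (exactly 60%, 90%, 85% and 95%) as belonging to the middle band is an assumption, and because the text gives each index as a band rather than a single value, the sketch returns the lowest and highest products the bands allow.

```python
def achievability_band(pct_of_target, low, high):
    """Map a restoration time (as % of the RTO or RPO) to an index band.

    Assumption: values equal to `low` or `high` fall in the middle band.
    """
    if pct_of_target < low:
        return (1, 4)
    if pct_of_target <= high:
        return (5, 7)
    return (8, 10)


def restoration_difficulty_range(x_pct_of_rto, y_pct_of_rpo):
    """Return the (min, max) Restoration Difficulty Index.

    The index is the product of the RTO and RPO achievability indices,
    so the range is the product of the band endpoints.
    """
    rto_band = achievability_band(x_pct_of_rto, 60, 90)
    rpo_band = achievability_band(y_pct_of_rpo, 85, 95)
    return (rto_band[0] * rpo_band[0], rto_band[1] * rpo_band[1])


# Example: restoration takes 50% of the RTO and 90% of the RPO.
print(restoration_difficulty_range(50, 90))
```

An assessor would still pick a single value within each band based on judgement; the sketch only shows how the two bands combine.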

3.2 Compliance standards
The table below outlines the regulatory compliance standards that apply to specific industries.

Table 2: Regulatory/compliance standards

Best Practices for Cloud Based DR 

Logging all activity for security review



Define key operational metrics for network monitoring and protection and notify personnel accordingly



Background checks for CSP employees who will get access to DR infrastructure



Robust password policy



Robust change management – Routine, emergency and configuration changes to existing client infrastructure should be authorized, logged, tested, approved and documented in accordance with industry norms for similar systems.

Multi-factor authentication – Provide single-use codes, similar to one-time passwords, in addition to standard user name and password credentials.

Physically isolated instances at the host hardware level (although this is not cost-effective)
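As a minimal illustration of the single-use codes mentioned under multi-factor authentication, the sketch below computes a counter-based one-time password following the HOTP algorithm of RFC 4226. The paper does not say which scheme a CSP would use, so the choice of HOTP, the function name hotp and the sample secret are assumptions made for illustration only.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP code (RFC 4226) for a shared secret and counter."""
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    # Take 4 bytes at that offset and clear the top bit.
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Reduce to the requested number of decimal digits, zero-padded.
    return str(value % 10 ** digits).zfill(digits)


# Sample secret from the RFC 4226 test vectors (illustrative only).
shared_secret = b"12345678901234567890"
print(hotp(shared_secret, 0))  # first single-use code
print(hotp(shared_secret, 1))  # next code after the counter advances
```

The server and the user's token share the secret and the counter; a code is accepted once and the counter then advances, which is what makes each code single-use.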

4. Benefits of Cloud Based DR
Cost savings – A minimum 40% reduction in total cost of ownership (TCO), as shown in Figure 7.

Figure 7: Cost saving in Cloud based DR 

Faster recovery – Hardware provisioning and replication technologies can be upgraded at no extra cost, because the CSP continually updates its technology to keep pace with market trends.



Ease of DR testing – DR testing becomes less burdensome with the Cloud based DR



Scalable – Highly scalable, with on-demand expansion possible.

5. DR Roadmap
As corporations look to consolidate their existing IT landscape through virtualization, it is high time they reconsidered their traditional DR methodology with its multiple DR instances. As discussed in the initial sections, the cloud readiness of each application is a key assessment area to address before moving to cloud-based DR. Figure 8 describes a roadmap for organizations moving from their current DR setups to DR on the cloud.

Figure 8: Smart DR Roadmap

6. Conclusion
There is growing concern about and focus on DR, and it is just a matter of time before DR becomes an integral part of organizations' IT systems. With stringent regulatory constraints and the need to safeguard reputation, DR is becoming a priority area for investment. With the advent of virtualization technology and its adoption by corporations, DR solutions have become simpler. Our internal study predicts that DR on cloud will be a key focus area in the years to come. Cloud service providers' adoption of standardized reporting such as SOC 1 (SSAE 16/ISAE 3402) and SOC 2 increases transparency and reduces cloud-based security concerns. As stated in the initial sections, only the right blend of technology, compliance standards and frameworks can ensure that security concerns are addressed. We expect a clear movement of enterprises towards cloud-based DR in the years to come.

Acknowledgement
We would like to extend our wholehearted thanks to our Management, Infrastructure Management Services, Manufacturing vertical – Infosys Limited.

References
Amazon Web Services white paper, “Amazon Web Services – Security Best Practices”, January 2011.
Amazon Web Services white paper, “Amazon Web Services – Risk and Compliance”, January 2013.
Amazon Web Services white paper, “Amazon Web Services – Overview of Security Processes”, March 2013.
Chris Brenton, “Basics of Virtualization Security”, Cloud Security Alliance presentation.
IBM commissioned study, “Cloud-Based Disaster Recovery Barriers And Drivers In The Enterprise”, Forrester study, March 2012.
Peter Mell and Timothy Grance, “The NIST Definition of Cloud Computing – Recommendations of the National Institute of Standards and Technology”, September 2011.
Rachel A. Dines and Lauren E. Nelson, “An Infrastructure And Operations Pro’s Guide To Cloud-Based Disaster Recovery Services”, Forrester study, March 20, 2012.


Work in Progress Papers


Records in the Cloud (RiC): Profiles of Cloud Computing Users
Georgia Barlaoura, Joy Rowe and Weimei Pan
University of British Columbia, Vancouver, Canada
georgiabarlaoura@alumni.ubc.ca
joy.rowe@alumni.ubc.ca
weimei.pan@alumni.ubc.ca

Abstract: Cloud computing is an evolving technological concept that offers flexibility and scalability to its users. This paper describes a survey of the users of cloud services that is part of the “Records in the Cloud” research project of the University of British Columbia. The survey contributes to the discussion about cloud services and tries to determine why small and medium-sized organizations choose cloud technology, the issues they encounter and their future expectations. The survey participants identified cost and collaboration as their main motives for using cloud computing, while security risks and legal implications were their greatest concerns. For the users, steps need to be taken towards a better regulated and simpler cloud environment.

Keywords: cloud computing, survey, cloud users, security risks, challenges, legal framework

1. Background
Advanced technological systems have led to new opportunities for individuals and all kinds of organizations. Besides facilitating communications, state-of-the-art technological models present challenges and raise a number of issues. Within this environment, the impressive quantity of information created, exchanged and used is scattered across jurisdictions and business contexts with different requirements and mandates. Cloud computing is an evolving technological environment that offers many advantages to its users, e.g. flexibility and scalability. Despite its popularity, though, there are aspects of the cloud environment that need clarification and that raise concerns that must be addressed. Since its introduction, the cloud environment has been the subject of extended discussions, research and surveys. These different approaches tend to examine cloud technology with regard to a specific issue (Derek 2011), challenge (KPMG 2013) or professional field (Stuart and Bromage 2010). In addition, the studies examine the progress of the technology in the market or its maturity (Cloud Security Alliance 2012), or they attempt to capture the perspectives of all parties involved (users, providers, vendors etc.). Records in the Cloud (RiC) is an international research project, a collaboration between the University of British Columbia and a number of universities in North America and Europe, that researches the issues surrounding the management and storage of records in the cloud environment. More specifically, it investigates the challenges, goals and expectations regarding cloud computing from the points of view of users and providers. The survey presented in this paper was developed as part of our research on the users of the cloud, in order to allow them to reflect on the reasons they chose this technology.

Our focus was on companies and industries that are gradually moving the whole of their records, applications and informational assets to the cloud. During the first steps of our research, we examined many examples of satisfied cloud users. These cases of successful implementations of cloud applications were presented by cloud service providers. Therefore, we decided to dedicate part of our research to identifying the concerns and degree of satisfaction experienced by the users of cloud computing, as well as the challenges they faced. Another reason for developing this survey tool is that, as stated above, the existing research, to our knowledge, either examines the perspectives of all parties involved simultaneously or presents the cloud environment through the lens of a specific issue or professional field.

2. Methodology
Initially, we developed a questionnaire that went through a number of revisions and testing stages. The goal was to get a clearer view of the terms, the extent of technical details and the general character of the questions we could include. Through the comments gathered from the pilot respondents, we concluded that the breadth and depth of knowledge about cloud computing varied significantly. Our challenge was to contact as many types of cloud users as possible, and we revised our terminology and questions to make the survey accessible. The final survey included thirty-four questions intended to elicit basic information about the views of current and past users of cloud technology, as well as of those who are considering using it in the future. We posted the survey to archival, library and records management listservs with international scopes, as well as to various legal, IT and cloud security listservs primarily in North America. Besides using the listservs, we circulated the survey via social media networks such as LinkedIn, Facebook and Twitter. Both listservs and social media networks received one message when the survey opened and a reminder when the survey deadline approached.

3. Survey outcomes
While there is pervasive discussion and marketing about cloud computing, our survey shows that the use of cloud computing is far from mature. In terms of both the rate of cloud computing use and of users’ capacity to exploit its potential, the cloud computing market is still developing. While this may be normal in a technology’s developing stage, the absence of related regulations, standards, laws, and security assurances can also help to explain users’ hesitancy to embrace cloud computing. Among those who already use cloud computing, only 38% of survey respondents work for organizations with more than three years’ experience with cloud computing, which indicates that the majority of these organizations are still in the exploration stage of forming a comprehensive understanding of cloud computing (Figure 1). Perhaps reflecting their relative inexperience with cloud computing, most survey respondents also gave a neutral answer when asked their degree of satisfaction with cloud computing, with 40% selecting the middle choice of a five-point scale (Figure 2).

Response                 Percentage
More than five years     18%
More than three years    20%
More than one year       36%
Less than one year       12%
I don't know             14%

Figure 1: How long has your organization been using cloud computing?

Response                      Percentage
[label not recoverable]       10%
[label not recoverable]       40%
[label not recoverable]       19%
I don't know                  27%

Figure 2: To what degree is your organization satisfied with cloud computing?

Respondents stated that their organizations’ primary motivations for adopting cloud computing were to reduce cost and to increase collaboration, storage capacity and performance (Figure 3). Among organizations that are considering moving to the cloud, the top four motivations are identical: cost, collaboration, storage and performance (Figure 4). When asked to comment freely on their motivations for moving to the cloud, respondents noted their reasons as the desire to have “access to files from multiple computers” and to “send and receive files”. Apparently, while the consensus that cloud computing increases performance demonstrates a belief in the technology’s transformative power, the current use of cloud computing is still limited to the personal level and is not directed towards increasing cooperation within the whole organization.

Response                       Percentage
Reduce cost                    54%
Increase performance           30%
Improve security               15%
Increase storage capacity      33%
Increase collaboration         54%
Keep pace with the industry    21%
Other (Please specify)         21%

Figure 3: What are the primary reasons your organization uses cloud computing? (Select all that apply)

Response                                 Percentage
Reduce cost                              57%
Increase organizational performance      51%
Improve security                         16%
Increase storage capacity                54%
Increase collaboration                   41%
Keep pace with the industry              27%
Drive business process transformation    22%
Enhance new market entry
Other (Please specify)

Figure 4: What are your organization’s motivations for considering cloud computing? (Select all that apply)

One of the reasons for this situation can be deduced from users’ concerns about cloud computing. Six of the eight options were each cited as a concern by more than 30% of respondents (Figure 5). Moreover, the narrow range among these options (32% to 56%) indicates that it is the whole cloud-computing environment that requires regulation, rather than a particular aspect.
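The two figures quoted in this paragraph can be checked directly against the percentages reported in Figure 5. The short sketch below does so; the dictionary simply transcribes the Figure 5 values, and the variable names are illustrative.

```python
# Percentages transcribed from Figure 5 (reasons for not considering cloud).
concerns = {
    "Security risk": 56,
    "Privacy risk": 40,
    "Technological complexity": 20,
    "Legal implications": 48,
    "Loss of control of data": 44,
    "Cost": 32,
    "We don't know about cloud computing": 32,
    "We don't trust cloud computing": 16,
}

# Options selected by more than 30% of respondents.
over_30 = [name for name, pct in concerns.items() if pct > 30]

# Range of percentages among those options.
low = min(concerns[name] for name in over_30)
high = max(concerns[name] for name in over_30)

print(len(over_30), "of", len(concerns), "options exceed 30%")
print("range:", low, "to", high)
```

Only technological complexity (20%) and lack of trust (16%) fall below the 30% mark, which is what leaves six of the eight options in the 32% to 56% range.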

Response                                Percentage
Security risk                           56%
Privacy risk                            40%
Technological complexity                20%
Legal implications                      48%
Loss of control of data                 44%
Cost                                    32%
We don’t know about cloud computing     32%
We don’t trust cloud computing          16%

Figure 5: Why is your organization not considering cloud computing? (Select all that apply)

Despite respondents’ limited experience with cloud computing (less than three years), 36% of users have already encountered issues (Figure 6). These issues are mostly related to access (34%); in addition, 26% of respondents reported challenges regarding the management of information in the cloud environment (Figure 7).

Response                      Percentage
Yes                           36%
[label not recoverable]       31%
I don't know                  32%

Figure 6: Has your organization experienced any issues with cloud computing?

Response                                           Percentage
Legal issues                                       18%
Unexpected costs                                   11%
Internal security issues                           18%
External security breach                           15%
Privacy issues                                     23%
Loss of information                                18%
Information or records management (IRM) issues     26%
Access issues                                      34%
Forensic issues
I don't know                                       34%
Other (Please specify)                             14%

Figure 7: Which of the following issues has your organization experienced using cloud computing? (Select all that apply)

It is also interesting that users tend not to negotiate service level agreements with cloud providers (62%) but instead sign standardized contracts. However, they are highly interested in seeing the agreements address issues related to data ownership (93%), security (87%) and information backup and recovery (87%). These practices indicate that users know their needs and the risks they need to mitigate, but do not feel confident enough to define the parameters of the cloud services they are provided. By the end of the month during which the survey was accessible on listservs and social media, we had received approximately 400 responses, with the majority of respondents representing organizations with over 500 employees.

4. Conclusion
The initial processing of these responses indicated that, despite the popularity of cloud computing, users are hesitant to use its services and have only a basic understanding of its potential. The users identify as paramount the need to establish a clear regulatory environment that would provide the necessary trusted foundation and allow them to take full advantage of the benefits of cloud computing services. There is a common understanding of the benefits of cloud computing, but users choose to take small steps towards this technological environment and to make low-risk decisions when using it. The lack of security assurances and regulations is the reason why the majority of users opt for Infrastructure as a Service (IaaS) and why, when possible, they choose to develop private cloud environments (North Bridge Venture Partners 2013). The broad and undefined legal framework and its associated business risks are not encouraging for those considering using the cloud in the future (Cloud Security Alliance 2010). Moreover, the fact that many have already encountered issues with the cloud, despite its early stage of development, does not improve the impression that users have of it. The review of the literature and of existing surveys and studies shows that there is a unanimous request for a clear legal framework (Narayanan 2012) and for international standards and regulations. The existing standards, guidelines (Australian Government 2011) and fragments of legislation, mainly at the national level, are not enough to control and secure the evolution of the cloud. In order for users to build their trust in the cloud, become familiar with its components and feel confident to choose what best serves their needs, they must know that there are legal mechanisms capable of protecting their interests and of helping them mitigate risks.
In addition, the establishment of explicit standardized requirements will facilitate the work of cloud providers and vendors, enabling them to develop optimized models and products and to introduce them properly to users. Given that there are differences among the countries, professionals, functional requirements and financial capacities, the development of a commonly accepted legal “umbrella” is imperative in order to allow cloud computing to reach maturity.

References
Australian Government (2011) “Cloud Computing Strategic Direction Paper: Opportunities and applicability for use by the Australian Government”, Canberra, Australia.
Cloud Security Alliance, Information Systems Audit and Control Association (2010) “Top Threats to Cloud Computing V1.0”, Rolling Meadows, Illinois.
Cloud Security Alliance, Information Systems Audit and Control Association (2012) “2012 Cloud Computing Market Maturity Study Results”, Rolling Meadows, Illinois.
Derek, Mohammed (2011) “Security in Cloud Computing: An Analysis of Key Drivers and Constraints”, Information Security Journal: A Global Perspective, Vol. 20, No. 3, pp 123–127.
KPMG International (2013) “The cloud takes shape, Global cloud survey: the implementation challenge”.
Narayanan, Vineeth (2012) “Harnessing the Cloud: International Law Implications of Cloud-Computing”, Chicago Journal of International Law, Vol. 12, No. 2, pp 783–809.
North Bridge Venture Partners (2013) “Future of Cloud Computing Survey”, Waltham, Massachusetts.
Stuart, Katharine and Bromage, David (2010) “Current State of Play: Records Management and the cloud”, Records Management Journal, Vol. 20, No. 2, pp 217–225.


Records in the Cloud – A Metadata Framework for Cloud Service Providers Dan Gillean1, Valerie Leveillé2 and Corinne Rogers2 1 Artefactual Systems, Inc. Canada 2 University of British Columbia daniel.gillean@gmail.com valerieleve@gmail.com corinne.rogers@gmail.com

Abstract: The Records in the Cloud project (http://recordsinthecloud.org/) involves exploring both the benefits and risks of the creation, use, management, storage, long-term preservation and access of records in the "cloud", or virtual computing infrastructure. The research seeks to weigh the risks versus benefits of keeping records in the cloud within Canadian legal, administrative and value systems by exploring questions of confidentiality, security, organizational forensic readiness, information governance, and the ability to prove records’ evidentiary capacity by establishing their accuracy, reliability and authenticity. This poster will focus primarily on the project’s initiative of establishing a framework for a standard of trust for the cloud, one that guarantees the authenticity, reliability and integrity of the data and records that users entrust to cloud infrastructures. The current goal is to work towards consolidating user interests into one trusted, standardized framework that could be applied by cloud providers as a best practice standard in the design of their infrastructure and the deployment of their services. In this poster we present preliminary results of our research into identifying the provisions that would need to be addressed within such a framework, such as questions of custody, control, and privacy, as well as of integrity, authenticity and reliability. Keywords: cloud computing; records; security; trust

1. Introduction
The cloud has become one of the fastest growing business models for pervasive computing. While reduced initial costs, scalability, and collaboration capabilities are depicted as “advantages” that have motivated some organizations to deploy their records, documents, and data to the cloud, they are also factors that pose a series of security concerns that continue to be brought to the attention of both vendors and consumers when considering cloud computing as an alternative or complementary record-keeping strategy. At the University of British Columbia (UBC), an SSHRC-funded research project entitled Records in the Cloud (RiC) is exploring both the benefits and risks of the creation, use, management, storage, access, and long-term preservation of records in the cloud. While research conducted to date has produced significant results, it has also highlighted equally important gaps and, therefore, opportunities for future research within the project. This paper will begin to explore one of these topics: cloud metadata, or more specifically, integrity metadata generated within a cloud environment. Building on previous research to create a metadata application profile for authenticity metadata (Tennis & Rogers, 2013), this paper presents initial research toward creating a platform-neutral method of ensuring appropriate metadata capture with vendors and providers who are interested in increasing the trustworthiness of their services. It will begin with a summary of the concerns related to the proper capture and maintenance of this metadata by cloud service providers (CSPs), drawing upon current literature, past research conducted at UBC, and the results of the research conducted to date through the RiC project. The goal is to explore the potential for a metadata schema that could be integrated as an abstracted layer to existing cloud computing environments via an application protocol.
Such a protocol could interact with existing provider metrics systems to automatically capture information required by the InterPARES Chain of Preservation (COP) Model for maintaining authentic records over time (Duranti & Preston, 2008).

2. Metadata, digital preservation, and trustworthiness in the cloud
In traditional archival theory, the concept of trustworthiness is derived from “the rationalist tradition of legal evidence scholarship … the modernist tradition of historical criticism … and the diplomatic tradition of documentary criticism,” (MacNeil, 2002). Trustworthiness is the quality of being dependable, reliable, and able to produce consistent results over time. As Duff (1998) points out, the “mere existence of a record does not ensure that it will faithfully represent a transaction or an event; its credibility must be ensured through the establishment of reliable methods and procedures for its creation, maintenance, and use over time.”

Trustworthiness comprises several interrelated archival concepts, each of which depends in some way on the context of the record’s environment: reliability (the trustworthiness of a record as a statement of fact) depends on the degree of control over a record’s procedure of creation; accuracy (the correctness and precision of a record’s content) depends in part on the controls placed upon the content’s recording and transmission; and authenticity (the trustworthiness of a record to be what it purports to be) depends in part on the reliability of the records system in which the record resides, and can be presumed when there is an unbroken chain of responsible and legitimate custody. These concepts are mirrored in the juridical context of legal admissibility, where for example, the best evidence rule asserts that an electronic record’s admissibility may be determined by establishing the reliability of the recordkeeping system in which it resides, and demonstrating a trustworthy chain of custody over the record throughout this recordkeeping system (Canada Evidence Act, s. 31.2, 31.3). As such, the recordkeeping environment in which records reside can “either increase or decrease their reliability and trustworthiness” (Duff, 1996), thereby affecting the admissibility of the records. In the digital environment, the metadata associated with a record can be an effective means of establishing, maintaining, and assessing its trustworthiness over time, and increasingly, automated metadata capture is becoming a crucial feature in records management environments. Metadata are the machine- and human-readable assertions about information resources that allow for physical, intellectual and technical control over those resources. Users create and attach, and then maintain and preserve metadata, automatically and/or manually, when maintaining their digital records, documents, and data.
These metadata may be technical, administrative, or descriptive. They codify and track the identity and integrity of the material over time and across technological change. However, in cloud environments, technological change no longer means simply refreshing the material or migrating it to new media under the control of the creator of the material. When these records are entrusted to cloud systems, this creator-generated metadata, remaining inextricably linked to the records, are also stored, and CSPs assume control of the material. Within this new environment, these user records will acquire additional metadata from the CSP that will be indicative of a number of important elements, including, but not limited to, storage locations, access controls, security or protection measures, failed or successful manipulations or breaches, etc. CSPs may also outsource some components of their services to other third parties, who may also generate service metadata that provide assertions about the maintenance and handling of the material, and about their own actions taken in the course of handling the material. While these metadata are linked to the users’ records, much of it remains proprietary to the provider and not the user. Consequently, proprietary CSP metadata present a sort of event horizon, beyond which the ability to establish an unbroken chain of custody is lost to the owner of the records. CSPs remain reluctant to share information about the cloud environment itself, the movements of a client’s data within the system, and when the provider (or its contracted third parties) might have access to the data. Additionally, the network of third-party subcontracts employed by a provider may make it impossible for them to know such information. Nevertheless, these metadata remain invaluable to the user in assessing and ensuring the accuracy, reliability, and integrity of the material over the whole service lifecycle (Castro-Leon et al. 2013, Smit et al. 2012). 
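To make the idea of integrity metadata supporting an unbroken chain of custody more concrete, the sketch below records custody events in a hash-chained log, so that any later alteration of an earlier event can be detected. It is a minimal illustration under stated assumptions: the field names (actor, action, record_id, timestamp), the function names and the use of SHA-256 are all hypothetical choices, and they do not represent the InterPARES COP Model, the Tennis & Rogers application profile, or any CSP's actual metadata.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first event


def _digest(body):
    """Hash an event body deterministically (keys sorted)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log, actor, action, record_id, timestamp):
    """Append a custody event that is linked to the previous event's hash."""
    body = {
        "actor": actor,
        "action": action,
        "record_id": record_id,
        "timestamp": timestamp,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    log.append({**body, "entry_hash": _digest(body)})


def chain_is_unbroken(log):
    """Return True if every event links to its predecessor and is unaltered."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


# Hypothetical events: a client upload followed by a provider-side migration.
custody_log = []
append_event(custody_log, "client", "upload", "rec-001", "2013-10-17T09:00:00Z")
append_event(custody_log, "csp", "migrate-storage", "rec-001", "2013-10-18T02:30:00Z")
print(chain_is_unbroken(custody_log))
```

If a provider shared such a log with the client, the client could verify the sequence of custody events independently, without needing to see how the provider's internal systems work.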
Is there a way in which a balance might be struck between a provider’s desire to protect the confidentiality of their business processes and trade secrets, and a client’s need to ensure trustworthy records in the cloud? Much of the reluctance to engage cloud services might be mitigated by transparent and standardized metadata that is collected, managed, and then shared with users by CSPs (Castro-Leon et al. 2013). The RiC project has already begun initial qualitative research with both vendors/providers and users of cloud services; this has been achieved through the design of semi-structured interviews. Approximately eight interviews have been conducted to date with representatives of various cloud service providers, based in North America, Europe and China. While this phase of the research is still in its early stages, the results from the interviews conducted to date have been twofold. First, they have confirmed the trends identified in current cloud computing-related literature, as well as those deduced from a cloud user survey distributed by RiC researchers earlier in 2013. The survey identified the major concerns of users who have expressed reluctance to entrust their records to the cloud: loss of control of their data, potential legal implications (i.e. e-discovery, legal holds, data storage in international jurisdictions), and risks to the protection and security of their information and data are among the most frequently recurring user concerns (RiC, 2013). However, these concerns are not necessarily the result of a lack of security measures employed by CSPs. Whereas the provider interview questionnaires were designed to highlight the presumed shortcomings of cloud design and provider services, the results were quite the opposite of what researchers had expected to find.

Research revealed that cloud providers were often aware of their clients’ concerns regarding security and access control in the cloud environment and had adopted preventative measures to address them; these measures included conducting background checks on employees who regularly accessed clients’ materials in the cloud (Pan, 2013; Mawjee, 2013), requiring a multi-step authentication process for CSP employees accessing records in the cloud (Pan, 2013), providing clients with up-to-date and transparent access control lists (Bäriswil, 2013) or allowing clients to maintain their own (Croisier, 2013), allowing clients the opportunity to conduct an audit of the provider (Bäriswil, 2013) or supplying clients with the results of a recent audit (Mawjee, 2013; Gupta, 2013), and, in some cases, even allowing the client to choose the storage location of their data (Bäriswil, 2013; Croisier, 2013). However, the results of provider interviews also revealed important gaps; while it may be true that these CSPs were adopting such measures to address client concerns with regard to security and access, the actual proof of such procedures remained almost entirely embedded within the metadata, a topic that was rarely mentioned, and the importance of which only became clear to researchers following further research on the topic.

3. Next Steps: Developing a Standardized Framework for the Capture of Cloud Metadata
Recent research conducted on cloud environments shows some promising developments that will impact the direction of this research. Castro-Leon et al. (2012), in outlining advances made in automated metadata capture and machine-to-machine exchanges, explore the concept of service metadata exchange, “the standardize exchange of information describing a service between a potential service provider and subscriber during service setup and operations.” Similarly, Smit et al. (2012) have developed a “metadata service listing the available cloud services, their properties, and some basic cross-cloud metrics for comparing instances,” organizing this into provider-level metadata, resource-level metadata, and resource-level metrics. These efforts suggest that automated capture of relevant cloud metadata is possible even from outside the cloud environment. While Smit et al. intend to gather and make accessible information about the service providers and their services in an agnostic and standardized manner, a similar methodology can also be applied to the concept of modeling service metadata generated by the provider but useful to the consumer for the purposes of assurance of security, reliability, and integrity. Previous research conducted by the InterPARES 2 project at the University of British Columbia led to the creation of the COP Model, which used Integrated Definition Function Modeling to depict “all the activities and the inputs and outputs that are needed to create, manage and preserve reliable and authentic digital records” (Eastwood et al., 2008). Since then, Tennis & Rogers (2012) have created a metadata application profile that captures the necessary and sufficient information needed to satisfy the requirements of the InterPARES COP Model.
Future research will thus seek to explore the potential for this metadata schema to be integrated as an abstracted layer on top of existing cloud computing environments via an API that could interact with existing provider metrics systems to automatically capture the information required by the InterPARES model for maintaining authentic records over time. However, such an approach would require support and participation from vendors and providers and would need to balance the concerns around protecting business-sensitive information with the client need for system accountability and authentic recordkeeping practices. The next step in this research will therefore seek out vendor and provider input and feedback regarding the potential integration of a cloud metadata schema, specifically with regard to the implications that such an application could have for these CSPs and their system design. Questions will explore which metadata CSPs would or would not be willing to provide to clients if requested; which metadata generated by the CSP are proprietary; whether clients have expressed any previous concern regarding the transfer of metadata; whether integrity metadata generated within the cloud is addressed in the Service Level Agreements drafted between the CSP and the client; and whether CSPs would be willing to adopt a metadata schema application that would automatically provide clients with this metadata. From here, RiC can begin to draft the framework of an application protocol that would allow for the maintenance of authentic, and therefore trustworthy, records over space and time.

The soaring popularity of cloud computing technologies has given rise to major concerns for users, specifically with regard to data security, legal implications, breach of privacy and access control capabilities. As this paper has shown, both the user surveys and the provider interviews conducted to date by the RiC project have helped confirm these concerns.
Additional issues that have arisen concern access to resources, records and information management, including guaranteeing the integrity and authenticity of records entrusted to the cloud, and protection of privacy (Bushey, 2013). Many of the issues that arise around managing records in the cloud – i.e. access, security, chain of custody, legal and e-discovery capabilities, etc. – are either helped or hindered by the metadata that is produced and attached to the records while they are within a specific cloud infrastructure. They are helped because within the metadata lie the answers to the main concerns that clients have about the integrity of their records; they are hindered when the rights and ownership of that metadata are maintained by the CSP and the metadata therefore becomes unavailable to clients in the event of e-discovery, data migration or departure from the cloud altogether. The truth – and therefore the trust – lies in the details.

References

Bäriswil, S. 2013. Unpublished research interview with Basma Shabou, Records in the Cloud. July 12, 2013.
Bushey, Jessica. 2013. Records in the Cloud. Presentation at the International Council on Archives Section on University and Research Institution Archives, Barbados, June 2013. http://www.icasuv2013.com/.
Castro-Leon, Enrique, Mrigank Shekhar, John M. Kenney, Jerry Wheeler, Robert R. Harmon, Javier Martinez Elicegui, and Raghu Yeluri. “On the Concept of Metadata Exchange in Cloud Services.” Intel Technology Journal 16, no. 4 (2012): 58–77.
Croisier, S. 2013. Unpublished research interview with Basma Shabou, Records in the Cloud. July 12, 2013.
Duff, Wendy. Ensuring the Preservation of Reliable Evidence: A Research Project Funded by the NHPRC. Archivaria 42 (Fall 1996), p. 28–45.
Duff, Wendy. Harnessing the Power of Warrant. American Archivist 61:1 (Spring 1998), p. 88–105.
Eastwood, Terry, Hans Hofman and Randy Preston. “Part Five—Modeling Digital Records Creation, Maintenance and Preservation: Modeling Cross-domain Task Force Report,” [electronic version]. In International Research on Permanent Authentic Records in Electronic Systems (InterPARES) 2: Experiential, Interactive and Dynamic Records, Luciana Duranti and Randy Preston, eds. (Rome, Italy: Associazione Nazionale Archivistica Italiana, 2008). Accessed September 1, 2013. http://www.interpares.org/display_file.cfm?doc=ip2_book_part_5_modeling_task_force.pdf
Gupta, S. 2013. Unpublished research interview with Gabriela Andaur-Gomez, Georgia Barlaoura, and Rachel Pan, Records in the Cloud. February 22, 2013.
Jie, L. 2013. Unpublished research interview with Vivian Zhang, Records in the Cloud. March 23, 2013.
Jun, Y. 2013. Unpublished research interview with Vivian Zhang, Records in the Cloud. March 25, 2013.
Li, Y. 2013. Unpublished research interview with Gabriela Andaur, Weimei Pan & Joy Rowe, Records in the Cloud. March 10, 2013.
MacNeil, Heather. Trusting Records in a Post-Modern World [web page]. Institute for Advanced Technology in the Humanities, University of Virginia, May 2002. Accessed September 1, 2013. http://www2.iath.virginia.edu/sds/macneil_text.htm.
Mawjee, R. 2013. Unpublished research interview with Basma Shabou, Records in the Cloud. July 5, 2013.
Pan, H. 2013. Unpublished research interview with Gabriella Andaur, Georgia Barloura, Justin Brecese, Luciana Duranti, Valerie Léveillé, and Casey Rogers, Records in the Cloud. March 28, 2013.
Smit, Michael, Przemyslaw Pawluk, Bradley Simmons, and Marin Litoiu. “A web service for cloud metadata.” In Services (SERVICES), 2012 IEEE Eighth World Congress on, pp. 361–368. IEEE, 2012. Accessed September 2, 2013. http://api.cloudymetrics.com/servicescup.pdf
Tennis, J. T., & Rogers, C. (2012). Authenticity Metadata and the IPAM: Progress toward the InterPARES Application Profile. In Proceedings of the International Conference on Dublin Core and Metadata Applications (pp. 38–45). Presented at the DCMI International Conference on Dublin Core and Metadata Applications, Kuching, Sarawak, Malaysia: DCMI. Retrieved from http://dcevents.dublincore.org/index.php/IntConf/dc-2012/schedConf/presentations


2nd International Conference on Cloud Security Management, Defence Academy of the United Kingdom, University of Cranfield, Shrivenham, UK

info@academic-conferences.org +44-(0)-118-972-4148