Instant download Big data management and analytics brij b gupta & mamta pdf all chapter

Page 1


https://ebookmass.com/product/big-data-management-and-

Instant digital products (PDF, ePub, MOBI) ready for you

Download now and discover formats that fit your needs...

Convergence of Cloud with AI for Big Data Analytics: Foundations and Innovation 1st Edition Danda B. Rawat

https://ebookmass.com/product/convergence-of-cloud-with-ai-for-bigdata-analytics-foundations-and-innovation-1st-edition-danda-b-rawat/

ebookmass.com

Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets Maria C. Mariani

https://ebookmass.com/product/data-science-in-theory-and-practicetechniques-for-big-data-analytics-and-complex-data-sets-maria-cmariani/

ebookmass.com

BIG DATA ANALYTICS: Introduction to Hadoop, Spark, and Machine-Learning Raj Kamal

https://ebookmass.com/product/big-data-analytics-introduction-tohadoop-spark-and-machine-learning-raj-kamal/

ebookmass.com

As Regras de Ouro Napoleon Hill

https://ebookmass.com/product/as-regras-de-ouro-napoleon-hill/

ebookmass.com

Modern

Matt Taddy

https://ebookmass.com/product/modern-business-analytics-practicaldata-science-for-decision-making-matt-taddy/

ebookmass.com

The Roots of Modern Psychology and Law: A Narrative History Thomas Grisso

https://ebookmass.com/product/the-roots-of-modern-psychology-and-lawa-narrative-history-thomas-grisso/

ebookmass.com

Alternative Approaches in Conflict Resolution 1st Edition Martin Leiner

https://ebookmass.com/product/alternative-approaches-in-conflictresolution-1st-edition-martin-leiner/

ebookmass.com

Lippincott Illustrated Reviews: Biochemistry (Lippincott Illustrated Reviews Series) 7th Edition, (Ebook PDF)

https://ebookmass.com/product/lippincott-illustrated-reviewsbiochemistry-lippincott-illustrated-reviews-series-7th-edition-ebookpdf/ ebookmass.com

The Philosophical Journey: An Interactive Approach 7th Edition William Lawhead

https://ebookmass.com/product/the-philosophical-journey-aninteractive-approach-7th-edition-william-lawhead/

ebookmass.com

How to Get Your PhD : A Handbook for the Journey 1st

https://ebookmass.com/product/how-to-get-your-phd-a-handbook-for-thejourney-1st-edition-gavin-brown/

ebookmass.com

World Scientific Series on Future Computing Paradigms

and Applications

Series Editor: Brij B Gupta (Asia University, Taichung, Taiwan)

Published:

Vol. 1 Big Data Management and Analytics by Brij B Gupta and Mamta

Published by

World Scientific Publishing Co. Pte. Ltd.

5 Toh Tuck Link, Singapore 596224

USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601

UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data

Names: Gupta, Brij, 1982– author. | Mamta, author.

Title: Big data management and analytics / Brij B Gupta, Asia University, Taiwan, Mamta, Thapar Institute of Engineering and Technology, India.

Description: New Jersey : World Scientific, [2024] |

Series: World Scientific series on future computing paradigms and applications ; vol. 1 | Includes bibliographical references and index.

Identifiers: LCCN 2023017153 | ISBN 9789811257117 (hardcover) | ISBN 9789811257124 (ebook for institutions) | ISBN 9789811257131 (ebook for individuals)

Subjects: LCSH: Big data. | Database management. | Data mining.

Classification: LCC QA76.9.B45 G86 2024 | DDC 005.7--dc23/eng/20230601

LC record available at https://lccn.loc.gov/2023017153

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

Copyright © 2024 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

For any available supplementary material, please visit https://www.worldscientific.com/worldscibooks/10.1142/12869#t=suppl

Desk Editors: Sanjay Varadharajan/Amanda Yun

Typeset by Stallion Press

Email: enquiries@stallionpress.com

Printed in Singapore

Dedicated to my parents and family for their constant support during the course of this book.

Dedicated to my parents, beloved husband, my son and my mentor for their motivation throughout the journey of completion of this book.

This page intentionally left blank

Foreword

In the dynamic and ever-changing world of data management and analytics, Big Data Management and Analytics stands out as a prominent source of lucidity and a vast reservoir of expertise. In the current epoch of digital transformation, it is of utmost importance to comprehend the fundamental principles and complexities associated with big data. This book serves as an invaluable resource, offering the necessary guidance to attain this objective.

Within the domain of big data, characterized by an overwhelming abundance of information in terms of volume, velocity, and variety, this literary work emerges as a beacon of guidance, catering to a wide-ranging audience. This work is expected to provide significant benefits to a wide range of individuals, including students, data analysts, data scientists, and IT professionals. What is particularly noteworthy about this is that it operates under the assumption that the user possesses no prior understanding or expertise in the field of big data management and analytics. As a result, it is designed to be easily understood and utilized by individuals at all levels of proficiency, including those who are new to the subject matter as well as those who have extensive experience in the field.

The book’s structure is characterized by the inclusion of 10 carefully constructed chapters, each designed to guide readers through a comprehensive journey. The initial chapter serves as an introduction to big data, providing readers with a foundational understanding of the subject matter. This introductory chapter serves the purpose of elucidating the concept of big data, aiming to dispel any misconceptions or uncertainties surrounding it. Additionally, it delves into the diverse origins of big data, its

viii Big Data Management and Analytics

distinctive attributes, and the various challenges that arise in relation to its effective management and analysis. Furthermore, this provides valuable insight into the practical implementations of big data in various industries and sectors.

The subsequent chapters of this research work delve into a comprehensive examination of Big Data Management and Modelling, Big Data Processing, Big Data Analytics and Machine Learning, and Big Data Analytics Through Visualization. The aforementioned chapters not only explicate the theoretical concepts but also offer valuable practical perspectives on effectively tackling the challenges that manifest within these specific domains.

The chapter titled “Taming Big Data with Spark 2.0” provides an indepth exploration of Apache Spark, emphasizing its architectural design and constituent elements. This study aims to provide a comprehensive understanding of the significance of Resilient Distributed Datasets (RDDs) and their utilization in the processing and analysis of large-scale datasets, commonly referred to as Big Data. The acquisition of this knowledge is of utmost significance in contemporary society where the utilization of real-time analytics and processing holds great prominence.

The chapter titled “ Managing Big Data in Cloud Storage” provides a thorough examination of data storage systems, specifically focusing on Hadoop Distributed File System (HDFS) and cloud storage. This chapter aims to equip readers with the necessary understanding to effectively navigate the complexities associated with storing data in the cloud.

The book’s utility is further expanded through an in-depth exploration of specific industry applications, which serves as a testament to its practical significance in real-world contexts. The chapters titled “Big Data in Healthcare” and “Big Data in Finance” offer valuable insights into the significant impact of big data within these respective industries. The following topics will be explored in this text: personalized medicine, clinical decision support, population health management, fraud detection, risk management, trading strategies, and customer segmentation.

The concluding chapter, titled “Enabling Tools and Technologies for Big Data Analytics,” presents a comprehensive collection of essential tools and technologies that are imperative for individuals aiming to leverage the complete capabilities of big data.

The publication accommodates a wide range of readers, rendering it appropriate for individuals with diverse educational and occupational

backgrounds. The provided resource employs a systematic methodology to ensure that individuals acquire a comprehensive comprehension of the big data ecosystem, starting from its foundational elements and progressing incrementally. In addition, the book presents various real-world application scenarios that serve to provide readers with the requisite skills and knowledge required to effectively tackle the complexities associated with the management and analysis of big data.

This book will be regarded as an invaluable resource by publishers and industry professionals alike. In the current era characterized by the widespread adoption of data-driven decision-making practices, the content presented in this book serves as a valuable resource for organizations seeking to harness the full potential of their data. By leveraging the insights and methodologies outlined within, organizations can effectively enhance their decision-making processes, leading to improved performance outcomes and ultimately gaining a competitive edge in their respective industries.

In summary, Big Data Management and Analytics is a highly valuable resource that merits inclusion in the personal libraries of individuals with a keen interest in the field of data management and analytics. Presented before us is a remarkable occasion to commence a voyage that shall profoundly influence our comprehension of data and unlock avenues to prospects that were previously unattainable. The commencement of the expedition is now underway.

Professor, School of Artificial Intelligence, Guilin University of Electronic Technology, 510004, China

Department of Electrical Engineering, University of Chile, 8370451, Chile

This page intentionally left blank

Preface

In today’s digital age, data has become an integral part of our lives, both personally and professionally. The amount of data generated every day is massive, and traditional methods of data management and analysis are no longer sufficient to handle this volume of information. This is where big data management and analytics come into play. Big data management refers to the efficient handling, storage, and processing of large and complex datasets, while big data analytics involves the use of advanced analytical techniques to extract valuable insights and patterns from this data. Together, they have the potential to transform businesses and industries, enabling organizations to make data-driven decisions and gain a competitive edge.

This book on Big Data Management and Analytics aims to provide a comprehensive and practical guide to the principles, technologies, and tools involved in the effective management and analysis of big data. It covers a range of topics, including data modeling, data storage, data processing, data analysis, and data visualization, including the relevant applications and use cases.

This book is designed for a broad audience, including students, data analysts, data scientists, and IT professionals. It assumes no prior knowledge of big data management and analytics and provides a step-by-step approach to learning the key concepts and techniques involved. By the end of the book, readers will have a thorough understanding of the big data ecosystem and be equipped with the skills and knowledge necessary to effectively manage and analyze big data in real-world scenarios.

xii Big Data Management and Analytics

This book contains 10 chapters, with each chapter covering different aspects of big data management and analytics. We provide a detailed overview of the topics covered by each chapter in the following:

Chapter 1: Introduction to Big Data — This chapter introduces the concept of big data and discusses various sources of big data, its characteristics, and the challenges associated with its management and analysis. It also provides an overview of the popular use cases of big data.

Chapter 2: Big Data Management and Modeling — This chapter provides a comprehensive overview of the various techniques and strategies used to manage and model large datasets. Further, this chapter covers the challenges associated with managing and modeling big data.

Chapter 3: Big Data Processing — This chapter provides an overview of big data processing, including its definition, characteristics, and challenges. It covers the various requirements for big data processing and discusses in detail the big data processing pipeline. Finally, it discusses the role of Splunk and Datameer in big data processing.

Chapter 4: Big Data Analytics and Machine Learning — This chapter covers the process of collecting and preprocessing data for big data analytics. It further covers data transformation techniques and data quality issues that need to be addressed before analysis. It also provides an overview of the role of machine learning in big data analytics and the challenges of implementing machine learning on big data, including scalability and data preprocessing.

Chapter 5: Big Data Analytics Through Visualization — This chapter discusses the role of visualization in big data analytics. It covers the various types of visualizations that can be used to explore and analyze big data with special attention to graph analytics.

Chapter 6: Taming Big Data with Spark 2.0 — This chapter provides an overview of Apache Spark. It discusses the key features and benefits of Spark 2.0, such as the architecture of Spark 2.0, including its components, such as Spark Core, Spark SQL, Spark Streaming, and MLlib. It discusses the role of resilient distributed datasets (RDDs) in Spark 2.0 and how they can be used to process and analyze big data.

Preface xiii

Chapter 7: Managing Big Data in Cloud Storage — This chapter focuses on the storage and retrieval of big data. It covers the different types of data storage systems, such as Hadoop distributed file system (HDFS), Hue, and cloud storage.

Chapter 8: Big Data in Healthcare — This chapter provides a comprehensive introduction to the role of big data in healthcare. It further discusses the architectural framework of big data in healthcare. Finally, it covers some common applications of big data in healthcare, such as personalized medicine, clinical decision support, and population health management.

Chapter 9: Big Data in Finance — This chapter discusses the role of big data in finance and how big data is transforming the financial sector. It covers the various sources of financial data, such as transaction data, market data, social media data, and news feeds, and throws light on the challenges associated with processing and analyzing such data. The chapter also details some common applications of big data in finance, such as fraud detection, risk management, trading strategies, and customer segmentation.

Chapter 10: Enabling Tools and Technologies for Big Data Analytics — This chapter provides a comprehensive introduction to various tools and technologies that enable big data analytics. It covers the tools and technologies for big data management, modeling, integration, processing, analytics and visualization.

This page intentionally left blank

About the Authors

Brij B. Gupta is the Director of the International Center for AI and Cyber Security Research and Innovations and a Distinguished Professor at the Department of Computer Science and Information Engineering (CSIE), Asia University, Taiwan. In more than 17 years of his professional experience, he has published over 500 papers in journals/conferences, including 35 books and 12 Patents with over 25,000 citations. He has received numerous national and international awards, including the Canadian Commonwealth Scholarship (2009), the Faculty Research Fellowship Award (2017), the Visvesvaraya Young Faculty Research Fellowship Award from the Indian Ministry of Electronics and Information Technology, Government of India, the IEEE GCCE outstanding and Women in Engineering (WIE) paper awards, and National Institute of Technology Kurukshetra, India’s Best Faculty Award (2018 and 2019), respectively. Prof. Gupta was selected for Clarivate’s Web of Science Highly Cited Researchers in Computer Science (top 0.1% researchers in the world) consecutively in 2022 and 2023. He was also listed in Stanford University’s ranking of the world’s top 2% of scientists in 2020, 2021, 2022, and 2023.

Dr Gupta is also a visiting/adjunct professor with several universities worldwide and an IEEE Senior Member (2017). He was selected as the 2021 Distinguished Lecturer in IEEE Consumer Technology Society (CTSoc). Dr Gupta is also serving as a Member-at-Large of IEEE

Consumer Technology Society’s Board of Governors (2022–2024). He is also the Editor-in-Chief of the International Journal on Semantic Web and Information Systems, International Journal of Software Science and Computational Intelligence, Sustainable Technology and Entrepreneurship, and International Journal of Cloud Applications and Computing, the Lead Editor of a Book Series with CRC and IET Press, and the Associate/ Guest Editor of various journals and transactions. He also served as a member of the technical program committee of more than 150 international conferences. His research interests include information security, cyber physical systems, cloud computing, blockchain technologies, intrusion detection, AI, social media, and networking.

Mamta received her BTech (with honors) in computer engineering in 2011 from the University Institute of Engineering and Technology, Kurukshetra University, India, and her MTech in computer engineering (with distinction) in 2014 from the National Institute of Technology, Kurukshetra, India. She received her PhD in computer engineering from the National Institute of Technology, Kurukshetra, India, in 2020. At present, Dr. Mamta is an assistant professor in the Department of Computer Science and Engineering at Punjab Engineering College, Chandigarh, India. Her research interests include applied cryptography, searchable encryption, information security, big data security, and cloud computing. She has published several research articles with various reputed publishers, including IEEE, Springer, and Wiley.

Acknowledgments

First and foremost, we would like to express our deepest and most sincere gratitude to the Almighty for granting us the strength and guidance to complete this book. We would like to thank our families for their unwavering support and encouragement. We are also grateful to our friends and colleagues who have provided invaluable feedback and helped us refine our ideas.

Further, our deepest appreciation goes to the team at World Scientific Publishing, who believed in this project and worked tirelessly to bring it to fruition.

Finally, we would like to acknowledge the countless individuals who have shared their knowledge and expertise with us through interviews, research, and personal interactions. Their insights have been invaluable in shaping this book and we are deeply grateful for their contributions.

We thank all for the support and encouragement, without which this book would not have been possible.

This page intentionally left blank

6.3.4

7.1

7.1.1

7.2

7.2.1

7.2.2

7.2.3

7.2.4

7.3

8.2

8.2.1

8.2.2

8.2.3

8.3

8.6

Chapter

9.1

This page intentionally left blank

List of Figures

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.