The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be used on even the largest datasets.
Anand Rajaraman is a serial entrepreneur, venture capitalist, and academic based in Silicon Valley. He has founded two successful start-ups: Junglee, acquired by Amazon.com, and Kosmix, acquired by Walmart. As a Founding Partner of two early-stage venture capital firms, Milliways Labs and Cambrian Ventures, he has been the earliest investor in many successful companies. Anand was, until recently, Senior Vice President at Walmart Global eCommerce and co-head of @WalmartLabs, where he worked at the intersection of social, mobile, and commerce. As an academic, Anand’s research has focused on the intersection of database systems, the World Wide Web, and social media. His research publications have won several awards at prestigious academic conferences, including two retrospective 10-year Best Paper awards at ACM SIGMOD and VLDB. He is also a co-inventor of Amazon Mechanical Turk, which pioneered the concept of crowdsourcing. You can follow Anand on Twitter at @anand_raj.
General computer science 1
Statistical theory and methods 2
Knowledge management, databases, data mining 3
Pattern recognition and machine learning 6
Algorithmics, complexity, computer algebra and computational geometry 6
Cryptography, cryptology and coding 7
Discrete mathematics, information theory and coding theory 8
Probability theory and stochastic processes 9
Control systems and optimization 10
Optimization, OR and risk analysis 11
Computer hardware, architecture and systems 13
Communications and signal processing 13
Wireless communications 16
Artificial intelligence and natural language processing 18
Social, educational and philosophical aspects of computing, e-publishing, HCI 20
Computer graphics, image processing, robotics and computer vision 22
Engineering design, kinematics, and robotics 23
Image processing and machine vision 24
Programming languages and applied logic 24
Application development and software engineering 25
Logic, categories and sets 26
Computational biology and bioinformatics 26
Computational science 28
Numerical analysis 28
Also of interest 29
Information on related journals Inside back cover

This second edition includes new and extended coverage on social networks, machine learning and dimensionality reduction. Written by leading authorities in database and web technologies, it is essential reading for students and practitioners alike.
Jeffrey David Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus) and is currently the CEO of Gradiance. His research interests include database theory, data mining, and education using the information infrastructure. He is one of the founders of the field of database theory, and was the doctoral advisor of an entire generation of students who later became leading database theorists in their own right. Recent awards include the Knuth Prize (2000) and the SIGMOD E. F. Codd Innovations Award (2006). Ullman is also the co-recipient (with John Hopcroft) of the 2010 IEEE John von Neumann Medal, for “laying the foundations for the fields of automata and language theory and many seminal contributions to theoretical computer science.”
Mining of Massive Datasets
It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically. The tricks of locality-sensitive hashing are explained. This body of knowledge, which deserves to be more widely known, is essential when seeking similar objects in a very large collection without having to compare each pair of objects. Stream processing algorithms for mining data that arrives too fast for exhaustive processing are also explained. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering, each from the point of view that the data is too large to fit in main memory, and two applications: recommendation systems and Web advertising, each vital in e-commerce.
Jure Leskovec is Assistant Professor of Computer Science at Stanford University. His research focuses on mining large social and information networks. Problems he investigates are motivated by large-scale data, the Web and online media. This research has won several awards including a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, Okawa Foundation Fellowship, and numerous best paper awards. Leskovec has also authored the Stanford Network Analysis Platform (SNAP, http://snap.stanford.edu), a general-purpose network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges. You can follow him on Twitter @jure.
see page 4
Leskovec Rajaraman Ullman
9781107077232 Leskovec, Rajaraman & Ullman
Contents
Jure Leskovec Anand Rajaraman Jeffrey David Ullman
Mining of Massive Datasets SECOND EDITION
Cover illustration: © S. Ullman 2011
Based on successful courses taught by the authors, and liberally sprinkled with examples and exercises, this comprehensive textbook describes not only the theoretical issues underlying the Semantic Web, but also algorithms, optimisation ideas and implementation details. The book will therefore be valuable to practitioners as well as students, indeed to anyone who is interested in Internet technology, knowledge engineering or description logics.
• The first comprehensive textbook on the Semantic Web designed for higher education courses
• Includes a full introduction to description logics, the formalism underlying the Semantic Web
• Contains numerous exercises to aid teaching and learning
• Further resources are available from the author’s website at http://www.swexpld.org
The Semantic Web Explained
Supplementary materials available online include the source code of program examples and solutions to selected exercises.
Szeredi, Lukácsy and Benkő
9780521700368 Szeredi, Lukacsy & Benko
The Semantic Web is a new area of research and development in the field of computer science, aimed at making it easier for computers to process the huge amount of information on the web, and indeed other large databases, by enabling them not only to read but also to understand the information.
Péter Szeredi Gergely Lukácsy Tamás Benkő
see page 4
The Semantic Web Explained
The Technology and Mathematics behind Web 3.0
Cover designed by Zoe Naylor
• Covers both core methods and cutting-edge research
• Algorithmic approach with open-source implementations
• Minimal prerequisites, as all key mathematical concepts are presented, as is the intuition behind the formulas
• Short, self-contained chapters with class-tested examples and exercises that allow for flexibility in designing a course and for easy reference
• Supplementary online resource containing lecture slides, videos, project ideas, and more
“The authors are world-class experts that provide encyclopedic coverage of all data mining topics, from basic statistics to fundamental methods (clustering, classification, frequent item sets) to advanced methods (SVD, SVM, kernels, spectral graph theory). For each concept, this book thoughtfully balances the intuition, the arithmetic examples, and the rigorous math details. It can serve as both a textbook and a reference book.” – Professor Christos Faloutsos, Carnegie Mellon University, and winner of the ACM SIGKDD Innovation Award

“This book by Mohammed Zaki and Wagner Meira Jr. is a great option for teaching a course in data mining or data science. It covers both fundamental and advanced data mining topics, emphasizing the mathematical foundations and the algorithms; it also includes exercises for each chapter and provides data, slides, and other supplementary material on the companion Web site.” – Gregory Piatetsky-Shapiro, Editor, KDnuggets.com, and Founder, ACM SIGKDD
Data Mining and Analysis
Key Features:
ZAKI MEIRA JR.
The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.
Data Mining and Analysis
see page 4
FUNDAMENTAL CONCEPTS AND ALGORITHMS
MOHAMMED J. ZAKI WAGNER MEIRA JR.
Cover image: La Mezquita de Córdoba, Mohammed J. Zaki, 1992
Cover design: Alice Soloway
The growth of social media over the last decade has revolutionized the way individuals interact and industries conduct business. Individuals produce data at an unprecedented rate by interacting, sharing, and consuming content through social media. Understanding and processing this new type of data to glean actionable patterns presents challenges and opportunities for interdisciplinary research, novel algorithms, and tool development. Social Media Mining integrates social media, social network analysis, and data mining to provide a convenient and coherent platform for students, practitioners, researchers, and project managers to understand the basics and potentials of social media mining. It introduces the unique problems arising from social media data and presents fundamental concepts, emerging issues, and effective algorithms for network analysis and data mining. Suitable for use in advanced undergraduate and beginning graduate courses as well as professional short courses, the text contains exercises of different degrees of difficulty that improve understanding and help apply concepts, principles, and methods in various scenarios of social media mining.
SOCIAL MEDIA MINING
“This is a delightful exploration of a multi-disciplinary field in its simple and straightforward style. Social Media Mining introduces and connects underlying concepts with clarity and enables you to explore this amazing field further with confidence.” – Philip Yu, University of Illinois at Chicago
Zafarani • Abbasi • Liu
“This is an exceptionally well-constructed book on social media that will be useful to academia and industry alike. The book covers the entire area of social network analysis in a comprehensive and understandable way.” – Charu Aggarwal, IBM T. J. Watson Research Center
SOCIAL MEDIA MINING
see page 5
An Introduction
Reza Zafarani Mohammad Ali Abbasi Huan Liu
Cover Image: Shutterstock / Jozsef Bagota
Cover design by Alice Soloway
MULTIMEDIA COMPUTING
Gerald Friedland Ramesh Jain
see page 22