Data Mining and Data Warehousing
Principles and Practical Techniques
Parteek Bhatia
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, vic 3207, Australia
314 to 321, 3rd Floor, Plot No.3, Splendor Forum, Jasola District Centre, New Delhi 110025, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108727747
© Cambridge University Press 2019
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2019
Printed in India
A catalogue record for this publication is available from the British Library ISBN 978-1-108-72774-7 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To my parents, Mr Ved Kumar and Mrs Jagdish Bhatia my supportive wife, Dr Sanmeet Kaur loving sons, Rahat and Rishan
6.
5.7
5.8
5.6.6
5.6.7
5.6.8
5.6.9
5.6.10
5.7.1 Applying Naïve Bayes classifier to the ‘Whether Play’ dataset
5.7.2
5.8.1
5.8.4
5.8.5
5.8.6
5.8.7
5.8.8
5.8.9
6.1.5
6.1.6
6.2
6.3
6.4
6.5
7.
7.6
7.6.1.
7.6.2
7.6.3
7.7
7.7.2
7.7.3
7.7.4
7.7.5
7.7.6
8.
8.2
8.4
8.4.1
8.4.3
8.5
8.6
9.1
9.2
9.3
9.4
8.6.1
9.5
9.6
9.6.1
9.7
9.9
15.3
15.4
15.3.1
15.3.2
3.19
3.25
3.26
3.27
3.28
3.29
3.30
3.31
4.1
5.1
5.2
5.14
5.15
5.16
5.17
5.18
5.22
5.23
5.24
5.25 Decision tree after splitting of attribute Y having value ‘1’
5.26 Decision tree after splitting of attribute Y value ‘0’
5.27 Dataset for play prediction based on given day weather conditions
5.28 Selection of Outlook as root attribute
5.29 Data splitting based on Outlook attribute
5.30 Humidity attribute is selected from dataset of Sunny instances
5.31 Decision tree after spitting data on the Humidity attribute
5.32 Decision tree after analysis of Sunny and Overcast datasets
5.33 Decision tree after analysis of Sunny, Overcast and Rainy datasets
5.34 Final decision tree after analysis of Sunny, Overcast and Rainy datasets
5.35 Prediction of play for unknown instance
5.36 Dataset for play prediction based on a given day’s weather conditions
5.37 Probability of whether play will be held or not on a Sunny day
5.38 Summarization of count calculations of all input attributes
5.39 Probability of play held or not for each value of attribute
5.40 Probability for play ‘Yes’ for an unknown instance
5.41 Probability for play ‘No’ for an unknown instance
5.42 Probability of play not being held when outlook is overcast
5.43 Values of attributes after adding Laplace estimator
5.44 Probability of play held or not for each modified value of attribute
5.45 Attribute values for given example instance
5.46 Confusion matrix for bird classifier
5.47 Confusion matrix for tumor prediction
6.1 Classification using Weka’s decision tree
6.2 Classification of an unknown sample using Weka’s decision tree
6.3 Working of the decision tree
6.4 Loading the iris.arff file
6.5 Selecting Weka J48 algorithm
6.6 Selection of the Weka J48 algorithm
6.7 Selection of percentage split test option
6.8 Saving output predictions
6.9 Original Fisher’s Iris dataset
6.10 Building the decision tree
6.11 Decision tree accuracy statistics
6.12 Visualization of the tree
6.13 Decision tree for the Iris dataset
6.14 Confusion matrix
6.15 Decision tree showing condition for Setosa
6.16 Decision tree showing conditions for Virginica
6.17 Decision tree showing condition for Versicolor
6.18 Rules identified by the decision tree
6.19 Classification of an unknown sample according to decision tree rules
6.20 Size and leaves of the tree
6.21 Selecting dataset file
6.22 Selecting the classifier and setting classifier evaluation options
6.23 Classifier results after processing
6.24 Naïve Bayes classifier with Laplace estimator
6.25 Selecting ArffViewer option
6.26 Opening the arff file
6.27 Selecting and deleting the rows
6.28 Setting the values of the testing record
6.29 Setting class attribute to blank
6.30 Saving the test.arff file
6.31 Opening the weather.nominal.arff file
6.32 Building the Naïve Bayes classifier
6.33 Supplying test dataset for predicting the value of known instance(s)
6.34 Predicting the value of an unknown sample as Play: Yes
6.35 Building J48 on the training dataset
6.36 Predicting the value of an unknown sample as Play: Yes by J48
6.37 Implementation of the decision tree in R
6.38 Decision tree rules in R
6.39 Plotting the decision tree
6.40 Prediction by the
6.41 Summary
6.42 Instances
6.43
6.44
7.1 Characteristics of
7.2 Representation of Euclidean distance
7.3 Representation of Manhattan distance
7.4
7.5
7.8
7.9 Plot of
7.10
7.11 Raw
7.12
7.13
7.14
7.15
7.16
7.17 Hierarchical tree of clustering of students on the basis of
7.18 (a) Distance matrix at m: 0 (b) Data objects split after two
7.19 Concept of neighborhood and MinPts
7.20
7.21 Concept of
7.22
7.23
7.24 Some
8.1
8.4
8.5 Applying the simple k-means algorithm in Weka:
8.6
8.7
8.8
8.9
8.10
8.11
8.14
8.15
8.16
8.17
8.18
8.19
8.20
8.32
8.33
8.34
8.35
9.7
9.8
9.9
9.10
9.14 Process by the Aprioiri algorithm
9.15 Process by the DHP algorithm
9.16 Identifying frequent item pairs and groups by Apriori algorithm
9.17 FP-tree for first transaction, i.e., 100 only
9.18
9.21 The final FP-tree for the example database after
9.22 Illustrating the step-by-step creation of the FP-tree
9.23 Final FP-tree for database given in Table 9.105
9.24 Illustrating the step-by-step creation of the FP-tree
10.1
11.12
12.4
12.5
12.6
12.7
12.8
13.1
13.6
13.7
13.9
15.1
15.4
15.5
15.8
7.4
7.5
7.6
7.7
7.8
7.16
7.17 Final allocation
7.18 Within (intra) cluster and between (inter) clusters distance
7.19 Calculations for within-cluster and between-cluster variance using Euclidean distance
7.20 Chemical composition of wine samples
7.21 Input distance matrix (L = 0 for all the clusters)
7.22 Input distance matrix, with m: 1
7.23 Input distance matrix, with m: 2
7.24 Input distance matrix, with m: 3
7.25 Input distance matrix, with m: 4
7.26 Record of students’
7.27 Distance matrix at m: 0
7.28 Cells involved in C1
7.29 Input distance matrix, with m: 2
7.30 Cells involved in C2
7.31 Input distance matrix, with m: 3
7.32 Cells involved in creating C3
7.33 Input distance matrix, with m: 4
7.34 Cells involved in creating C4
7.35 Input distance matrix, with m: 5
7.36
7.37
7.38
7.39
C5
with m: 6
with m: 7
7.40 Cells involved in creating C7
7.41 Input distance matrix, with m: 8
7.42 Cells involved in creating C8
7.43 Input distance matrix, with m: 9
7.44 Record of students’
7.45
7.46 Distance matrix for cluster C1
7.47 Splitting of cluster C1 into two new clusters of S7 and S8
7.48 Distance matrix for cluster C2
7.49 Splitting of cluster C2 into two new clusters of S3 and S9
7.50 Distance matrix for cluster C4
7.51 Splitting of cluster C4 into two new clusters of S6 and S8
9.6
9.7
9.8 Sale record of
9.9 List of all itemsets and their
9.10 The set of all frequent items
9.11 All possible combinations with nonzero frequencies
9.12 Frequencies of all itemsets with nonzero frequencies
9.13 A simple representation of transactions as an item list
9.14 Horizontal storage representation
9.15 Vertical storage representation
9.16 Frequency of item pairs
9.17 Transactions database
9.18 Candidate one itemsets C1
9.19 Frequent items L1
9.20 Candidate item pairs C2
9.21 Frequent two item pairs L2
9.22 L1 for generation of C2 having only one element in each list
9.23 L2 for generation of C3 (i.e., K=3) having two elements in each list
9.24 L3 for generation of C4 (i.e., K = 4) having three elements in each list
9.25 L1
9.26 L2
9.27 L3
9.28 L1
9.29 Generation of C2
9.30 Generated C2
C1
9.42 Generation of C3
9.43 Pruning of candidate itemset C3
9.44 Pruned C3
9.45 C4
9.46 Pruned C4
9.47
9.48 Frequent 2-itemsets, i.e., L2
The frequent 1-itemset or L1
The 21 candidate 2-itemsets or C2
9.54 Frequency count of candidate 2-itemsets
9.55 The frequent 2-itemsets or L2
9.56 Candidate 3-itemsets or C3
Another random document with no related content on Scribd:
TO BOYS AND GIRLS WHO LIKE GOOD STORIES
W I was a girl, taking many a long journey in the land of storybooks, my favorite stories were of two kinds. One was about boys and girls who lived in the country, spending long, happy days wading in rollicking brooks, riding on fragrant loads of hay, picking blueberries, playing in the great barn, making pets of turtles and field mice and all sorts of creatures. The reason I dearly loved these stories was because, in summer, I was a country girl myself, on the beautiful Massachusetts farm of my great-great-grandfathers. I loved to hear my mother tell stories of her girlhood, about her good times with the boys and girls of the little red schoolhouse, about singing school and cattle show and sugaring off and endless pleasures delightfully unlike those of my own experience. The queer, oldfashioned clothes that we children found, on rainy days in grandmother’s attic, the spinning wheels and candle molds and quilting frames, the quaint cradle, the hair trunk, the “till chest,” the yellowed diary and account books in the brass-handled desk, all made us children feel very close to those bygone days which our elders told about on evenings when nearby uncles and aunts and cousins gathered in grandmother’s sitting room. How small and quiet we children tried to make ourselves those evenings in the sweet summer dark, hoping our parents would forget to say “Bed-time for
the young fry,” and drive us away from their jolly and thrilling reminiscences of old times. Our best-loved story was one about plucky great-grandmother and how she frightened a bear away without a gun. And how we envied our parents when we heard that they had played Indian and early settler in the ruins of the very blockhouse which our forbears and their neighbors had built for refuge from King Philip’s redskins back in the sixteen-seventies.
You see, it was quite natural that stories of old times, both of country and city life, should have a special charm for me.
My other favorite storybooks were about English children. “A Sea Change,” “A York and a Lancaster Rose,” “The Story of a Short Life,” “Merrie England” were among those I read again and again. As for “The Prince and the Pauper,” it would take a volume larger than this one if I were to try to tell you how I felt about that beautiful story.
One of the queer things about grown-ups which boys and girls do not often suspect is that, while we seem so different from you, with our grey heads and bald heads and glasses, our shortness of breath when we run for the train, our strange preference for an easy-chair and a book by the fire rather than the chance to dance all night or to skate all day, inside of us there is something that never grows up. And because a part of us always stays “boy” or “girl,” what we particularly loved when we were children we keep on particularly loving as long as we live.
So, when I am hunting for books for the children’s room shelves of our public libraries, although I try to find those on every kind of interesting subject—because reading in ruts is bad for any one, young or old—I confess to keen delight when I come upon first-rate stories of the sort that were my favorites when I was a girl.
A few years ago an old Brooklyn library was preparing to move from its ancient quarters into a spick-and-span new building. Going over the dusty shelves one day, I found a shabby little book in faded red covers with funny, old-fashioned pictures among the yellowed pages. The title of the book at once caught my fancy, “Memoirs of a London Doll.”
In two seconds I was miles and miles away from the dusty shelves of the prosaic library on the clattering, commonplace Brooklyn street. I was up in the Sprats’ garret room, under the eaves of the dingy tenement on the dusky London street where the Sprat family, father, mother, and three children, ate and slept and worked at their trade of making jointed, wooden dolls. I followed with absorbed interest the fortunes of what must have been the most remarkable doll ever turned out by the Sprats, the one whose first little mother named her “Maria Poppet.”
Maria Poppet was a doll of character who kept her eyes open and who never neglected an opportunity to learn from every event of her varied life; who was not puffed up by association with rank and wealth nor cast down by harrowing experiences; who valued loving hearts above jewels and titles and the glitter and show of fashion.
Maria Poppet had fine gifts as a story-teller too. If she had been required by one of her little mothers’ governesses to write a composition, the task would have offered no difficulties to her. What she saw in the London of nearly a hundred years ago she makes us see—the Twelfth-night customs, the Lord Mayor’s Show, Punch and Judy, the Christmas Pantomime, the Zoo, the life of little Lady Flora, waited upon by governess and maids and powdered footmen, and the lives of the little milliner girls, driven by cruel Aunt Sharpshins from six o’clock in the morning until eight o’clock at night.
How delightfully puzzling are some of the quaint old words Maria Poppet uses. She speaks of the “turbans” worn by the ladies of her day. Did you think turbans belonged to “Arabian Nights” characters only? Maria wore “a frock and trousers” and “stays”—or no, she wore “a small under-bodice of white jean instead of stays”; her frock was made of “lemon-colored merino”; her little mother pattered about the room in “list” shoes. And Maria’s mothers did not go to the dry-goods store nor the grocer’s, nor did they buy pies at the baker’s. They visited the “draper’s” and the “green-grocer’s” and bought “raspberry tarts” at the “pastry-cook’s.” And what do you suppose a “teetotum” is? And a “tinkerum”? If you have a great-grandmother perhaps she can tell you; and she may sing, as did my grandmother, the quaint
old tunes, “They’re all nodding” and “Cherry ripe” and others which Maria heard the London street organ play.
Sometimes boys and girls look rather scornfully upon old-fashioned things. They think that nothing which is not “up to date” can possibly be as fine as modern shows. Well, I, for one, never saw a Fifth Avenue window display—and I love to gaze into Fifth Avenue shops —more dazzling than the pastry-cook’s window on Twelfth-night; nor a more gorgeous parade than the Lord Mayor’s; nor a play more enchanting than the New Grand Christmas Pantomime which the London Doll saw at Drury-Lane Theatre in the “old-fashioned” days of the story. And I do not believe that any American child, visiting one of our enormous, bewildering toy departments at Christmas time, sees treasures more truly satisfying than Lady Flora found in the London toyshops years and years ago.
You will not wonder that when I finished reading Maria Poppet’s most entertaining “Memoirs” I was eager to find copies of her story to place in all our Brooklyn children’s libraries. I searched the shops in vain. The little book had been “out of print” for many years, the booktrade people reported.
So I treasured in my office the shabby copy I had found in the old library, hoping that some day something would happen that would give to all our children the fun of reading the charming story. That something has happened at last.
Miss Seaman, as delighted as I with the London Doll’s “Memoirs,” has persuaded The Macmillan Company to bring out a new edition of the old story; and to make everything as nice as possible the new volume is to be of the same size as the original book and its pages are to be printed in the same type. Only the pictures will be different, and very much prettier than the old pictures. I feel certain that Maria would approve of the new dress in which her story appears.
I expect that when boys glance at the title they will immediately decide that this story cannot possibly be interesting to them. They will miss some good entertainment if they make that mistake. Maria Poppet was no ordinary, coddled baby-girl doll, but a young person
who saw more interesting sights in a short time than falls to the lot of one boy in a thousand.
I am quite sure that whoever reads this story to the last page will close the book eager to read at once “Memoirs of a Country Doll” which Maria hoped would sometime be made public. Alas! I fear no search in the bookshops of London or New York would bring to light that story. I suspect that the Country Doll, like many people we know, was better at telling a story to her friends than in setting it down on paper.
C W H
LIST OF ILLUSTRATIONS