Marketing_Gupta.qxd

1/28/05

1:36 PM

Page i

Applied Statistics for the Six Sigma Green Belt


Also Available from ASQ Quality Press:

Design of Experiments with MINITAB
Paul Mathews

Six Sigma for the Shop Floor: A Pocket Guide
Roderick A. Munro

Six Sigma for the Office: A Pocket Guide
Roderick A. Munro

Defining and Analyzing a Business Process: A Six Sigma Pocket Guide
Jeffrey N. Lowenthal

Six Sigma Project Management: A Pocket Guide
Jeffrey N. Lowenthal

The Six Sigma Journey from Art to Science
Larry Walters

The Six Sigma Path to Leadership: Observations from the Trenches
David H. Treichler

Failure Mode and Effect Analysis: FMEA From Theory to Execution, Second Edition
D. H. Stamatis

Customer Centered Six Sigma: Linking Customers, Process Improvement, and Financial Results
Earl Naumann and Steven Hoisington

Design for Six Sigma as Strategic Experimentation: Planning, Designing, and Building World-Class Products and Services
H.E. Cook

To request a complimentary catalog of ASQ Quality Press publications, call 800-248-1946, or visit our Web site at http://qualitypress.asq.org.


Applied Statistics for the Six Sigma Green Belt

Bhisham C. Gupta
H. Fred Walker

ASQ Quality Press
Milwaukee, Wisconsin


American Society for Quality, Quality Press, Milwaukee 53203
© 2005 by American Society for Quality
All rights reserved. Published 2005
Printed in the United States of America

12 11 10 09 08 07 06 05    5 4 3 2 1

Library of Congress Cataloging-in-Publication Data

Gupta, Bhisham C., 1942–
Applied statistics for the Six Sigma Green Belt / Bhisham C. Gupta, H. Fred Walker.— 1st ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-87389-642-4 (hardcover : alk. paper)
1. Six sigma (Quality control standard) 2. Production management. 3. Quality control. I. Walker, H. Fred, 1963– II. Title.
TS156.G8673 2005
658.4'013—dc22
2004029760

ISBN 0-87389-642-4

No part of this book may be reproduced in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

Publisher: William A. Tony
Acquisitions Editor: Annemieke Hytinen
Project Editor: Paul O’Mara
Production Administrator: Randall Benson

ASQ Mission: The American Society for Quality advances individual, organizational, and community excellence worldwide through learning, quality improvement, and knowledge exchange.

Attention Bookstores, Wholesalers, Schools, and Corporations: ASQ Quality Press books, videotapes, audiotapes, and software are available at quantity discounts with bulk purchases for business, educational, or instructional use. For information, please contact ASQ Quality Press at 800-248-1946, or write to ASQ Quality Press, P.O. Box 3005, Milwaukee, WI 53201-3005.

To place orders or to request a free copy of the ASQ Quality Press Publications Catalog, including ASQ membership information, call 800-248-1946. Visit our Web site at www.asq.org or http://qualitypress.asq.org.

Printed on acid-free paper


In loving memory of my parents, Roshan Lal and Sodhan Devi.
—Bhisham

In loving memory of my father, Carl Ellsworth Walker.
—Fred


THE NORMAL LAW OF ERROR STANDS OUT IN THE EXPERIENCE OF MANKIND AS ONE OF THE BROADEST GENERALIZATIONS OF NATURAL PHILOSOPHY • IT SERVES AS THE GUIDING INSTRUMENT IN RESEARCHES IN THE PHYSICAL AND SOCIAL SCIENCES AND IN MEDICINE AGRICULTURE AND ENGINEERING • IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT

—W. J. Youden


Contents

List of Figures . . . xiii
List of Tables . . . xviii
Preface . . . xx
Acknowledgments . . . xxii
Introduction . . . xxiv

Chapter 1 Setting the Context for Six Sigma . . . 1
  1.1 Six Sigma Defined as a Statistical Concept . . . 1
  1.2 Now, Six Sigma Explained as a Statistical Concept . . . 2
  1.3 Six Sigma as a Comprehensive Approach and Methodology for Problem Solving and Process Improvement . . . 3
  1.4 Understanding the Role of the Six Sigma Green Belt as Part of the Bigger Picture . . . 5
  1.5 Converting Data into Useful Information . . . 6

Chapter 2 Getting Started with Statistics . . . 9
  2.1 What Is Statistics? . . . 9
  2.2 Populations and Samples . . . 10
  2.3 Classification of Various Types of Data . . . 11
    2.3.1 Nominal Data . . . 12
    2.3.2 Ordinal Data . . . 12
    2.3.3 Interval Data . . . 13
    2.3.4 Ratio Data . . . 13

Chapter 3 Describing Data Graphically . . . 15
  3.1 Frequency Distribution Table . . . 15
    3.1.1 Qualitative Data . . . 15
    3.1.2 Quantitative Data . . . 18
  3.2 Graphical Representation of a Data Set . . . 20
    3.2.1 Dot Plot . . . 20
    3.2.2 Pie Chart . . . 22
    3.2.3 Bar Chart . . . 23
    3.2.4 Histograms and Related Graphs . . . 27


    3.2.5 Line Graph . . . 33
    3.2.6 Stem and Leaf Diagram . . . 34
    3.2.7 Measure of Association . . . 39

Chapter 4 Describing Data Numerically . . . 45
  4.1 Numerical Measures . . . 45
  4.2 Measures of Centrality . . . 46
    4.2.1 Mean . . . 46
    4.2.2 Median . . . 48
    4.2.3 Mode . . . 50
  4.3 Measures of Dispersion . . . 52
    4.3.1 Range . . . 53
    4.3.2 Variance . . . 53
    4.3.3 Standard Deviation . . . 55
    4.3.4 Coefficient of Variation . . . 57
  4.4 Measures of Central Tendency and Dispersion for Grouped Data . . . 57
    4.4.1 Mean . . . 58
    4.4.2 Median . . . 58
    4.4.3 Mode . . . 59
    4.4.4 Variance . . . 60
  4.5 Empirical Rule (Normal Distribution) . . . 60
  4.6 Certain Other Measures of Location and Dispersion . . . 63
    4.6.1 Percentiles . . . 63
    4.6.2 Quartiles . . . 64
    4.6.3 Interquartile Range . . . 64
  4.7 Box Whisker Plot . . . 66
    4.7.1 Construction of a Box Plot . . . 66
    4.7.2 How to Use the Box Plot . . . 67

Chapter 5 Probability . . . 71
  5.1 Probability and Applied Statistics . . . 71
  5.2 The Random Experiment . . . 72
  5.3 Sample Space, Simple Events, and Events of Random Experiments . . . 73
  5.4 Representation of Sample Space and Events Using Diagrams . . . 75
    5.4.1 Tree Diagram . . . 75
    5.4.2 Permutation and Combination . . . 77
  5.5 Defining Probability Using Relative Frequency . . . 83
  5.6 Axioms of Probability . . . 86
  5.7 Conditional Probability . . . 88

Chapter 6 Discrete Random Variables and Their Probability Distributions . . . 93
  6.1 Discrete Random Variables . . . 93
  6.2 Mean and Standard Deviation of a Discrete Random Variable . . . 99
    6.2.1 Interpretation of the Mean and the Standard Deviation . . . 101


  6.3 The Bernoulli Trials and the Binomial Distribution . . . 101
    6.3.1 Mean and Standard Deviation of a Bernoulli Distribution . . . 102
    6.3.2 The Binomial Distribution . . . 102
    6.3.3 Binomial Probability Tables . . . 105
  6.4 The Hypergeometric Distribution . . . 107
    6.4.1 Mean and Standard Deviation of a Hypergeometric Distribution . . . 110
  6.5 The Poisson Distribution . . . 110

Chapter 7 Continuous Random Variables and Their Probability Distributions . . . 115
  7.1 Continuous Random Variables . . . 115
  7.2 The Uniform Distribution . . . 118
    7.2.1 Mean and Standard Deviation of the Uniform Distribution . . . 120
  7.3 The Normal Distribution . . . 121
    7.3.1 Standard Normal Distribution Table . . . 123
  7.4 The Exponential Distribution . . . 129
    7.4.1 Mean and Standard Deviation of an Exponential Distribution . . . 130
    7.4.2 Distribution Function F(x) of the Exponential Distribution . . . 130
  7.5 The Weibull Distribution . . . 132
    7.5.1 Mean and Variance of the Weibull Distribution . . . 133
    7.5.2 Distribution Function F(t) of Weibull . . . 133

Chapter 8 Sampling Distributions . . . 137
  8.1 Sampling Distribution of Sample Mean . . . 138
  8.2 The Central Limit Theorem . . . 141
    8.2.1 Sampling Distribution of Sample Proportion . . . 147
  8.3 Chi-Square Distribution . . . 148
  8.4 The Student's t-Distribution . . . 153
  8.5 Snedecor's F-Distribution . . . 155
  8.6 The Poisson Approximation to the Binomial Distribution . . . 158
  8.7 The Normal Approximation to the Binomial Distribution . . . 159

Chapter 9 Point and Interval Estimation . . . 165
  9.1 Point Estimation . . . 166
    9.1.1 Properties of Point Estimators . . . 167
  9.2 Interval Estimation . . . 171
    9.2.1 Interpretation of a Confidence Interval . . . 172
  9.3 Confidence Intervals . . . 172
    9.3.1 Confidence Interval for Population Mean When the Sample Size Is Large . . . 173
    9.3.2 Confidence Interval for Population Mean When the Sample Size Is Small . . . 177
  9.4 Confidence Interval for the Difference between Two Population Means . . . 180


    9.4.1 Large Sample Confidence Interval for the Difference between Two Population Means . . . 180
    9.4.2 Small Sample Confidence Interval for the Difference between Two Population Means . . . 183
  9.5 Confidence Intervals for Population Proportions When Sample Sizes Are Large . . . 187
    9.5.1 Confidence Interval for p the Population Proportion . . . 188
    9.5.2 Confidence Interval for the Difference of Two Population Proportions . . . 189
  9.6 Determination of Sample Size . . . 192
  9.7 Confidence Interval for Population Variances . . . 195
    9.7.1 Confidence Interval for a Population Variance . . . 196

Chapter 10 Hypothesis Testing . . . 201
  10.1 Basic Concepts of Testing Statistical Hypotheses . . . 201
  10.2 Testing Statistical Hypotheses about One Population Mean When Sample Size Is Large . . . 208
    10.2.1 Population Variance Is Known . . . 208
    10.2.2 Population Variance Is Unknown . . . 213
  10.3 Testing Statistical Hypotheses about the Difference Between Two Population Means When the Sample Sizes Are Large . . . 216
    10.3.1 Population Variances Are Known . . . 216
    10.3.2 Population Variances Are Unknown . . . 219
  10.4 Testing Statistical Hypotheses about One Population Mean When Sample Size Is Small . . . 222
    10.4.1 Population Variance Is Known . . . 223
    10.4.2 Population Variance Is Unknown . . . 226
  10.5 Testing Statistical Hypotheses about the Difference Between Two Population Means When Sample Sizes Are Small . . . 229
    10.5.1 Population Variances σ₁² and σ₂² Are Known . . . 230
    10.5.2 Population Variances σ₁² and σ₂² Are Unknown But σ₁² = σ₂² = σ² . . . 232
    10.5.3 Population Variances σ₁² and σ₂² Are Unknown and σ₁² ≠ σ₂² . . . 235
  10.6 Paired t-Test . . . 237
  10.7 Testing Statistical Hypotheses about Population Proportions . . . 240
    10.7.1 Testing of Statistical Hypotheses about One Population Proportion When Sample Size Is Large . . . 240
    10.7.2 Testing of Statistical Hypotheses about the Difference Between Two Population Proportions When Sample Sizes Are Large . . . 242
  10.8 Testing Statistical Hypotheses about Population Variances . . . 244
    10.8.1 Testing Statistical Hypotheses about One Population Variance . . . 245
    10.8.2 Testing Statistical Hypotheses about the Two Population Variances . . . 247
  10.9 An Alternative Technique for Testing of Statistical Hypotheses Using Confidence Intervals . . . 250


Chapter 11 Computing Resources to Support Applied Statistics . . . 255
  11.1 Using MINITAB, Version 14 . . . 255
    11.1.1 Getting Started . . . 256
    11.1.2 Calculating Descriptive Statistics . . . 258
    11.1.3 Probability Distributions . . . 269
    11.1.4 Estimation and Testing of Hypotheses about Population Mean and Proportion . . . 273
    11.1.5 Estimation and Testing of Hypotheses about Two Population Means and Proportions . . . 276
    11.1.6 Estimation and Testing of Hypotheses about Two Population Variances . . . 280
    11.1.7 Testing Normality . . . 282
  11.2 Using JMP, Version 5.1 . . . 284
    11.2.1 Getting Started with JMP . . . 284
    11.2.2 Calculating Descriptive Statistics . . . 286
    11.2.3 Estimation and Testing of Hypotheses about One Population Mean . . . 295
    11.2.4 Estimation and Testing of Hypotheses about Two Population Variances . . . 300
    11.2.5 Normality Test . . . 301
  11.3 Web-based Computing Resources . . . 303

Glossary . . . 305

Appendix . . . 311
  Table I Binomial probabilities . . . 312
  Table II Poisson probabilities . . . 315
  Table III Standard normal distribution . . . 317
  Table IV Critical values of χ² with ν degrees of freedom . . . 318
  Table V Critical values of t with ν degrees of freedom . . . 320
  Table VI Critical values of F with numerator and denominator degrees of freedom ν₁, ν₂ respectively (α = 0.10) . . . 321

Bibliography . . . 329
About the Authors . . . 331
Index . . . 333


Figures

Figure 1.1 The normal distribution . . . 2
Figure 1.2 Six Sigma (Motorola definition) . . . 3
Figure 1.3 Current Six Sigma implementation flow chart . . . 4
Figure 1.4 Six Sigma support personnel . . . 6
Figure 1.5 Histogram . . . 7
Figure 2.1 Classifications of statistical data . . . 12
Figure 3.1 Dot plot for the data on defective motors that are received in 20 shipments . . . 21
Figure 3.2 Pie chart for defects associated with manufacturing process steps . . . 23
Figure 3.3 Bar chart for annual revenues of a company over the period of five years . . . 25
Figure 3.4 Bar graph for the data in Example 3.7 . . . 26
Figure 3.5 Bar charts for types of defects in auto parts manufactured in Plant I (P1) and Plant II (P2) . . . 26
Figure 3.6 Frequency histogram for survival time of parts under extreme operating conditions . . . 29
Figure 3.7 Relative frequency histogram for survival time of parts under extreme operating conditions . . . 30
Figure 3.8 Frequency polygon for the data in Example 3.9 . . . 30
Figure 3.9 Relative frequency polygon for the data in Example 3.9 . . . 31
Figure 3.10 A typical frequency distribution curve . . . 31
Figure 3.11 Three types of frequency distribution curves . . . 32
Figure 3.12 Cumulative frequency histogram for the data in Example 3.9 . . . 32
Figure 3.13 Ogive curve for the survival data in Example 3.9 . . . 33
Figure 3.14 Line graph for the data on lawn mowers given in Example 3.10 . . . 34
Figure 3.15 Ordinary and ordered stem and leaf diagram for the data on survival time for parts in extreme operating conditions in Example 3.9 . . . 36
Figure 3.16 Ordered stem and leaf diagram for the data in Table 3.10 . . . 37
Figure 3.17 Ordered two-stem and leaf diagram for the data in Table 3.12 . . . 38
Figure 3.18 Ordered five-stem and leaf diagram . . . 38


Figure 3.19 MINITAB display depicting eight degrees of correlation: (a) represents strong positive correlation, (b) represents strong negative correlation, (c) represents positive perfect correlation, (d) represents negative perfect correlation, (e) represents positive moderate correlation, (f) represents negative moderate correlation, (g) represents a positive weak correlation, and (h) represents a negative weak correlation . . . 41
Figure 4.1 Frequency distributions showing the shape and location of measures of centrality . . . 51
Figure 4.2 Two frequency distribution curves with equal mean, median and mode values . . . 53
Figure 4.3 Application of the empirical rule . . . 61
Figure 4.4 Amount of soft drink contained in a bottle . . . 62
Figure 4.5 Dollar value of units of bad production . . . 62
Figure 4.6 Salary data . . . 64
Figure 4.7 Quartiles and percentiles . . . 65
Figure 4.8 Box-whisker plot . . . 67
Figure 4.9 Example box plot . . . 68
Figure 4.10 Box plot . . . 70
Figure 5.1 Tree diagram for an experiment of testing a chip, randomly selecting a part, and testing another chip . . . 76
Figure 5.2 Venn diagram representing the sample space S and the event A in S . . . 80
Figure 5.3 Venn diagram representing the union of events A and B (shaded area) . . . 81
Figure 5.4 Venn diagram representing the intersection of events A and B (shaded area) . . . 81
Figure 5.5 Venn diagram representing the complement of an event A (shaded area) . . . 81
Figure 5.6 Venn diagram representing A∪B = {1, 4, 5, 6, 7, 8, 9, 10}, A∩B = {7}, Ā = {2, 3, 5, 9, 10}, B̄ = {1, 2, 3, 4, 6, 8} . . . 82
Figure 5.7 Two mutually exclusive events, A and B . . . 83
Figure 5.8 Venn diagram showing the phenomenon of P(A∪B) = P(A) + P(B) − P(A∩B) . . . 87
Figure 6.1 Graphical representation of probability function in Table 6.2 . . . 96
Figure 6.2 Graphical representation of probability function f(x) in Table 6.3 . . . 97
Figure 6.3 Graphical representation of the distribution function F(x) in Example 6.2 . . . 98
Figure 6.4 Location of mean μ and the end point of interval (μ − 2σ, μ + 2σ) . . . 100
Figure 6.5 Binomial probability distribution with n = 10, p = 0.80 . . . 105
Figure 7.1 An illustration of a density function of a continuous random variable X . . . 116
Figure 7.2 Graphical representation of F(x) = P(X ≤ x) . . . 117
Figure 7.3 Uniform distribution over the interval (a, b) . . . 118
Figure 7.4 Probability P(x1 ≤ X ≤ x2) . . . 119
Figure 7.5 The normal density function curve with mean μ and standard deviation σ . . . 122
Figure 7.6 Curves representing the normal density function with different means, but with the same standard deviation . . . 122
Figure 7.7 Curves representing the normal density function with different standard deviations, but with the same mean . . . 123


Figure 7.8 The standard normal density function curve . . . 123
Figure 7.9 Probability (a ≤ Z ≤ b) under the standard normal curve . . . 124
Figure 7.10 Shaded area equal to P(1 Z 2) . . . 125
Figure 7.11 Two shaded areas showing P(−1.50 ≤ Z ≤ 0) = P(0 ≤ Z ≤ 1.50) . . . 125
Figure 7.12 Two shaded areas showing P(−2.2 ≤ Z ≤ −1.0) = P(1.0 ≤ Z ≤ 2.2) . . . 125
Figure 7.13 Showing P(−1.50 ≤ Z ≤ .80) = P(−1.50 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.80) . . . 126
Figure 7.14 Shaded area showing P(Z 0.70) . . . 126
Figure 7.15 Shaded area showing P(Z 1.0) . . . 126
Figure 7.16 Shaded area showing P(Z 2.15) . . . 127
Figure 7.17 Shaded area showing P(Z 2.15) . . . 127
Figure 7.18 Converting normal N(6,4) to standard normal N(0,1) . . . 128
Figure 7.19 Shaded area showing P(0.5 Z 2.0) . . . 128
Figure 7.20 Shaded area showing P(−1.0 ≤ Z ≤ 1.0) . . . 128
Figure 7.21 Shaded area showing P(-1.50 Z 0.50) . . . 129
Figure 7.22 Graphs of exponential density function for λ = 0.1, 0.5, 1.0, and 2.0 . . . 130
Figure 7.23 Curves of three hazard rate functions . . . 132
Figure 7.24 Hazard function h(t) with 1; 0.5, 1, 2 . . . 133
Figure 7.25 Weibull density function (a) 1, 0.5 (b) 1, 1 (c) 1, 2 . . . 134
Figure 8.1 Shaded area showing P(−2 ≤ Z ≤ 2) . . . 142
Figure 8.2 Shaded area showing P(Z 1) . . . 142
Figure 8.3 Shaded area showing P(−2.28 ≤ Z ≤ 2.28) . . . 143
Figure 8.4 Shaded area showing P(Z 1.14) . . . 143
Figure 8.5 Shaded area showing P(−1.5 ≤ Z ≤ 1.5) . . . 144
Figure 8.6 Shaded area showing P(−1.6 ≤ Z ≤ 1.6) . . . 144
Figure 8.7 Shaded area showing P(−2 ≤ Z ≤ 2) . . . 144
Figure 8.8 Shaded area showing P(−1.71 ≤ Z ≤ 1.71) . . . 146
Figure 8.9 Shaded area showing P(−2.23 ≤ Z ≤ 2.23) . . . 147
Figure 8.10 Chi-square distribution with different degrees of freedom . . . 149
Figure 8.11 Chi-square distribution with upper-tail area α . . . 149
Figure 8.12 Chi-square distribution with upper-tail area α = 0.05 . . . 150
Figure 8.13 Chi-square distribution with lower-tail area α . . . 150
Figure 8.14 Chi-square distribution with lower-tail area α = 0.10 . . . 151
Figure 8.15 Frequency distribution function of t-distribution with, say n = 15 degrees of freedom and standard normal distribution . . . 154
Figure 8.16 t-distribution with shaded area under the two tails equal to P(T tn,) P(T tn,) . . . 154
Figure 8.17 A typical probability density function curve of Fν₁, ν₂ . . . 156
Figure 8.18 Probability density function curve of Fν₁, ν₂ with upper-tail area α . . . 157
Figure 8.19 Probability density function curve of Fν₁, ν₂ with lower-tail area α . . . 157
Figure 8.20 Comparison of histograms for various binomial distributions (n = 15, p = 0.2, 0.3, 0.4, 0.5) . . . 160
Figure 8.21 (a) Showing the normal approximation to the binomial. (b) Replacing the shaded area contained in the rectangles by the shaded area under the normal curve . . . 163
Figure 9.1 An interpretation of a confidence interval . . . 172
Figure 9.2 Standard normal-curve with tail areas equal to α/2 . . . 174


Figure 9.3 (a) Standard normal curve with lower-tail area equal to α (b) Standard normal curve with upper-tail area equal to α
Figure 9.4 Student’s t-distribution with tail areas equal to α/2
Figure 9.5 Chi-square distribution with two tail areas each equal to 0.025
Figure 9.6 F-distribution curve (a) shaded area under two tails each equal to 0.025 (b) shaded area under left tail equal to 0.05 (c) shaded area under the right tail equal to 0.05
Figure 10.1 Critical points dividing the sample space of θ̂ in two regions, the rejection region and the acceptance region
Figure 10.2 OC-curves for different alternative hypotheses
Figure 10.3 Power curves for different hypotheses
Figure 10.4 Rejection regions for hypotheses (i), (ii), and (iii)
Figure 10.5 Lower-tail rejection region with α = 0.01
Figure 10.6 Two-tail rejection region with α = 0.01
Figure 10.7 Power curve for the test in Example 10.2
Figure 10.8 Rejection regions for hypotheses (i), (ii), and (iii)
Figure 10.9 Rejection region under the lower tail with α = 0.05
Figure 10.10 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance
Figure 10.11 Rejection region under the upper tail with α = 0.05
Figure 10.12 Rejection regions under the two tails with α = 0.05
Figure 10.13 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance
Figure 10.14 Rejection regions for a two-tail test with α = 0.05
Figure 10.15 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance
Figure 10.16 Rejection region under the lower tail with α = 0.05
Figure 10.17 Rejection regions under the two tails with α = 0.05
Figure 10.18 Rejection regions for testing hypotheses (i), (ii), and (iii) at the given α
Figure 10.19 Rejection region under the upper tail with α = 0.05
Figure 10.20 Rejection region under the upper tail with α = 0.01
Figure 10.21 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α level of significance
Figure 10.22 Rejection region under the upper tail with α = 0.025
Figure 10.23 Rejection regions for testing the hypotheses (i), (ii), and (iii) at the α level of significance
Figure 10.24 The rejection region under the two tails with α = 0.01
Figure 10.25 Rejection region under the lower tail with α = 0.05
Figure 10.26 Rejection regions under the two tails with α = 0.05
Figure 10.27 Rejection regions for testing hypotheses (i), (ii), and (iii) at the α = 0.05 level of significance
Figure 10.28 Rejection region under the chi-square distribution curve for testing hypotheses (i), (ii), and (iii) at the α level of significance
Figure 10.29 Rejection region under the lower tail with α = 0.05
Figure 10.30 Rejection region under the F-distribution curve for testing hypotheses (i), (ii), and (iii) at the α level of significance
Figure 10.31 Rejection region under the two tails with α = 0.05
Figure 10.32 Rejection region under the right tail with α = 0.05
Figure 11.1 The screen that appears first in the MINITAB environment
Figure 11.2 MINITAB window showing the menu command options
Figure 11.3 MINITAB window showing input and output for Column Statistics


Figure 11.4 MINITAB window showing various options available under Stat command
Figure 11.5 MINITAB display of histogram for the data given in Example 11.3
Figure 11.6 MINITAB window showing Edit Bars dialog box
Figure 11.7 MINITAB display of histogram with 5 classes for the data in Example 11.3
Figure 11.8 MINITAB output of Dotplot for the data in Example 11.4
Figure 11.9 MINITAB output of Scatterplot for the data given in Example 11.5
Figure 11.10 MINITAB display of box plot for the data in Example 11.6
Figure 11.11 MINITAB display of graphical summary for the data in Example 11.7
Figure 11.12 MINITAB display of bar graph for the data in Example 11.8
Figure 11.13 MINITAB display of pie chart for the data in Example 11.9
Figure 11.14 MINITAB printout of 95% Bonferroni confidence interval for standard deviations
Figure 11.15 MINITAB display of normal probability graph for the data in Example 11.19
Figure 11.16 The screen that appears first in the JMP environment
Figure 11.17 JMP menu command options
Figure 11.18 JMP window showing input and output for Column Statistics
Figure 11.19 JMP Distribution dialog box
Figure 11.20 JMP display of histogram for the data given in Example 11.21
Figure 11.21 JMP printout of stem and leaf for the data given in Example 11.21
Figure 11.22 JMP display of box plot with summary statistics for Example 11.21
Figure 11.23 JMP display of graphical summary for the data in Example 11.22
Figure 11.24 JMP display of bar graph for the data in Example 11.23
Figure 11.25 JMP printout of pie chart for the data in Example 11.24
Figure 11.26 JMP printout of 1-sample t-test for the data in Example 11.25
Figure 11.27 JMP printout of 1-sample z-test for the data in Example 11.26
Figure 11.28 JMP printout of 2-sample t-test for the data in Example 11.27
Figure 11.29 JMP printout of paired t-test for the data in Example 11.28
Figure 11.30 JMP printout of test of equal variances in Example 11.29
Figure 11.31 JMP display of normal quantile graph for the data in Example 11.30


Tables

Table 1.1 Process step completion times
Table 1.2 Descriptive statistics
Table 3.1 Annual revenues of 110 small to midsize companies in the midwestern United States
Table 3.2 Frequency distribution table for 110 small to midsize companies in the midwestern United States
Table 3.3 Complete frequency distribution table for the 110 small to midsize companies in the midwestern United States
Table 3.4 Complete frequency distribution table for the data in Example 3.2
Table 3.5 Frequency table for the data on rod lengths
Table 3.6 Understanding defect rates as a function of various process steps
Table 3.7 Frequency distribution table for the data in Example 3.7
Table 3.8 Frequency distribution table for the survival time of parts
Table 3.9 Data on survival time (in hours) in Example 3.9
Table 3.10 Number of parts produced by each worker per week
Table 3.11 Cholesterol levels and systolic BP of 30 randomly selected U.S. men
Table 4.1 Age distribution of group of 40 people watching a basketball game
Table 5.1 Classification of technicians by qualification and gender
Table 5.2 Classification of technicians by qualification and gender
Table 6.1 Probability distribution of a random variable X
Table 6.2 Probability distribution of random variable X defined in Example 6.1
Table 6.3 Probability function of X
Table 6.4 Portion of Table I of the appendix for n = 5
Table 6.5 Portion of Table II of the appendix
Table 7.1 A portion of standard normal distribution Table III of the appendix
Table 8.1 Population with its distribution for the experiment of rolling a fair die
Table 8.2 All possible samples of size 2 with their respective means
Table 8.3 Different sample means with their respective probabilities


Table 8.4 A portion of the t-table giving the value of tn,α for certain values of n and α
Table 8.5 Comparison of approximate probabilities to the exact probabilities (n = 5, p = 0.4, 0.5)
Table 8.6 Showing the use of continuity correction factor under different scenarios
Table 10.1 Presenting the view of type I and type II errors
Table 10.2 Confidence intervals for testing various hypotheses
Table I Binomial probabilities
Table II Poisson probabilities
Table III Standard Normal Distribution
Table IV Critical values of χ² with ν degrees of freedom
Table V Critical values of t with ν degrees of freedom
Table VI Critical values of F with numerator and denominator degrees of freedom ν1, ν2 respectively (α = 0.10)


Preface

Applied Statistics for the Six Sigma Green Belt was written as a desk reference and instructional aid for individuals involved with Six Sigma project teams. As Six Sigma team members, Green Belts will help select appropriate statistical tools, collect data for those tools, and assist with data interpretation within the context of the Six Sigma methodology.

Composed of steps or phases titled Define, Measure, Analyze, Improve, and Control (DMAIC), the Six Sigma methodology calls for the use of many more statistical tools than is reasonable to address in one large book. Accordingly, the intent of this book is to provide Green Belts with the benefit of a thorough discussion relating to the underlying concepts of “basic statistics.” More advanced topics of a statistical nature will be discussed in three other books that, together with this book, will comprise a four-book series. The other books in the series will discuss statistical quality control, introductory design of experiments and regression analysis, and advanced design of experiments.

While it is beyond the scope of this book and series to cover the DMAIC methodology specifically, we do focus this book and series on concepts, applications, and interpretations of the statistical tools used during, and as part of, the DMAIC methodology.

Of particular interest in this book, and indeed the other books in this series, is an applied approach to the topics covered while providing a detailed discussion of the underlying concepts. This level of detail in providing the underlying concepts is particularly important for individuals lacking a recent study of applied statistics as well as for individuals who may never have had any formal education or training in statistics.

In fact, one very controversial aspect of Six Sigma training is that, in many cases, this training is targeted at the Six Sigma Black Belt and is all too commonly delivered to large groups of people with the assumption that all trainees have a fluent command of the underlying statistical concepts and theory. In practice this assumption commonly leads to a good deal of concern and discomfort for trainees who quickly find it difficult to keep up with and successfully complete black belt–level training. This concern and discomfort becomes even more serious when individuals involved with Six
Sigma training are expected to pass a written and/or computer-based examination that so commonly accompanies this type of training. So if you are beginning to learn about Six Sigma and are either preparing for training or are supporting a Six Sigma team, the question is: How do I get up to speed with applied statistics as quickly as possible so I can get the most from training or add the most value to my Six Sigma team? The answer to this question is simple and straightforward—get access to a book that provides a thorough and systematic discussion of applied statistics, a book that uses the plain language of application rather than abstract theory, and a book that emphasizes learning by examples. Applied Statistics for the Six Sigma Green Belt has been designed to be just that book. This book was organized so as to expose readers to applied statistics in a thorough and systematic manner. We begin by discussing concepts that are the easiest to understand and that will provide you with a solid foundation upon which to build further knowledge. As we proceed with our discussion, and as the complexity of the statistical tools increases, we fully intend that our readers will be able to follow the discussion by understanding that the use of any given statistical tool, in many cases, enables us to use additional and more powerful statistical tools. The order of presentation of these tools in our discussion then will help you understand how these tools relate to, mutually support, and interact with one another. We will continue this logic of the order in which we present topics in the remaining books in this series. 
Getting the most benefit from this book, and in fact from the complete series of books, is consistent with how many of us learn most effectively—start at the beginning with less complex topics, proceed with our discussion of new and more powerful statistical tools once we learn the “basics,” be sure to cover all the statistical tools needed to support Six Sigma, and emphasize examples and applications throughout the discussion. So let us take a look together at Applied Statistics for the Six Sigma Green Belt. What you will learn is that statistics aren’t mysterious, they aren’t scary, and they aren’t overly difficult to understand. As in learning any topic, once you learn the “basics” it is easy to build on that knowledge—trying to start without a knowledge of the basics, however, is generally the beginning of a difficult situation!


Acknowledgments

We would like to thank Professors John Brunette and Cheng Peng of the University of Southern Maine, and Ramesh Gupta and Pushpa Gupta of the University of Maine, Orono, for reading the final draft line-by-line. Their comments and suggestions have proven to be invaluable. We would like to thank Professor Joel Irish of the University of Southern Maine for help in writing a computer program in Mathematica that was used to prepare all the figures in this book. We thank graduate students Mohamad Ibourk, Seetha Shetty and Melanie Thompson for help preparing the chapter on computer resources, as well as Mary Ellen Costello and Stacie Santomango for general manuscript preparation. Also, we thank Laurie McDermott, administrative assistant of the Department of Mathematics and Statistics of the University of Southern Maine, for help in typing the various drafts of the manuscript.

We would like to thank the several anonymous reviewers whose constructive suggestions greatly improved the presentation. We also want to thank Annemieke Hytinen, acquisitions editor, and Paul O’Mara, project editor, of ASQ Quality Press for their patience and cooperation throughout the preparation of this project.

We thank Minitab Inc. for permitting us to print MINITAB® screen shots in this book. MINITAB® and the MINITAB logo® are registered trademarks of Minitab Inc. We also thank SAS Institute Inc., of Cary, North Carolina, for permitting us to reprint screen shots of JMP v. 5.1 (© 2004 SAS Institute Inc. SAS, JMP and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration).

Most of all, the authors would like to thank their families. Bhisham is grateful to his wife, Swarn, daughters Anita and Anjali, and son, Shiva, for their deep love and support. He is grateful to his son-in-law, Mark, for his expressed curiosity. Last but not least, he is grateful to his first grandchild,
Priya, for reminding him that there is always time for play. Fred would like to sincerely thank his wife, Julie, and sons, Carl and George, for their love, support, and patience as he worked on this and two previous books. Without their encouragement, such projects would not be possible or meaningful.

Bhisham C. Gupta
H. Fred Walker


Introduction

Whenever a process is not producing products or services at a desired level of quality, an investigation is launched to better understand and improve the process. In many instances such investigations are launched to rapidly identify and correct underlying problems as part of a problem-solving methodology; one such methodology is commonly known as “root cause analysis.” Many problem-solving methodologies, such as root cause analysis, rely on the study of numerical (quantitative) or non-numerical (qualitative) data as a means of discovering the true cause of one or more problems negatively impacting product or service quality. The problem-solving methodologies, however, are all too commonly used to investigate problems that need a quick solution and thus are not afforded the time or resources needed for a particularly detailed or in-depth analysis. Further, problem-solving methodologies are also all too commonly used to investigate problems without sufficient analysis of a series of costs associated with a given problem as they relate to lost profit or opportunity, human resources needed to investigate the problem, and so forth.

Let us not have the wrong impression of problem-solving methodologies such as root cause analysis! Each of these methodologies has a proper place in quality and process improvement; however, the scope or size of the problem needs also to be considered. In this context, when problems are smaller and are easier to understand, we can effectively use less rigorous, complicated, and thorough problem-solving methodologies. When problems become large, complex, and expensive, a more detailed and robust problem-solving methodology is needed, and that problem-solving methodology is Six Sigma.

While it is beyond the intended scope of this book to discuss, in detail, the Six Sigma methodology as an approach to problem solving, it is the explicit intent of this book to describe the concepts and application of tools and techniques used to support the Six Sigma methodology. Next we give a brief description of the topics discussed in the book, followed by where in the Six Sigma methodology you can expect to use these tools and techniques.

In Chapter 1 we introduce the concept of Six Sigma from both statistical and quality perspectives. We briefly describe what we need for converting data into information. In statistical applications we come across various
types of data that require specific analyses that depend upon the types of data we are working with. It is therefore important to distinguish between different types of data. In Chapter 2 we discuss and provide examples for different types of data. In addition, terms such as population and sample are introduced.

In Chapter 3 we introduce several graphical methods found in descriptive statistics. These graphical methods are some of the basic tools of statistical quality control (SQC). These methods are also very helpful in understanding the pertinent information contained in very large and complex datasets.

In Chapter 4 we learn about the numerical methods of descriptive statistics. Numerical methods that are applicable to both sample and population data provide us with quantitative or numerical measures. Such measures further enlighten us about the information contained in the data.

In Chapter 5 we proceed to study the basic concepts of probability theory and see how probability theory relates to applied statistics. We also introduce the random experiment and define sample space and events. In addition, we study certain rules of probability and conditional probability.

In Chapter 6 we introduce the concept of a random variable, which is a vehicle used to assign some numerical values to all the possible outcomes of a random experiment. We also study probability distributions and define mean and standard deviation of random variables. Specifically, we study some special probability distributions of discrete random variables such as Bernoulli, binomial, hypergeometric, and Poisson distributions, which are encountered frequently in many statistical applications. Finally, we discuss under what conditions (e.g., the Poisson process) these probability models are applicable.

In Chapter 7 we continue studying probability distributions of random variables. We introduce the continuous random variable and study its probability distribution. We specifically examine uniform, normal, exponential, and Weibull continuous probability distributions. The normal distribution is the backbone of statistics and is extensively used in achieving Six Sigma quality characteristics. The exponential and Weibull distributions form an important part of reliability theory. The hazard or failure rate function is also introduced.

Having discussed probability distributions of data as they apply to discrete and continuous random variables in Chapters 6 and 7, in Chapter 8 we expand our study to the probability distributions of sample statistics. In particular, we study the probability distribution of the sample mean and sample proportion. We then study Student’s t, chi-square, and F distributions. These distributions are an essential part of inferential statistics and, therefore, of applied statistics.

Estimation is an important component of inferential statistics. In Chapter 9 we discuss point estimation and interval estimation of population mean and of difference between two population means, both when sample size is large and when it is small. Then we study point estimation and interval estimation of population proportion and of difference between two population proportions when the sample size is large. Finally, we study the estimation of a population variance, standard deviation, ratio of two population variances, and ratio of two population standard deviations.


Table 1 Applied statistics and the Six Sigma methodology.

Six Sigma Phase    Tool or Technique                      Where in this book?
Define             Descriptive Statistics                 Chapter 2
                   Graphical Methods                      Chapter 3
                   Numerical Descriptions                 Chapter 4
Measure            Sampling                               Chapter 8
                   Point & Interval Estimation            Chapter 9
Analyze            Probability                            Chapter 5
                   Discrete & Continuous Distributions    Chapters 6 & 7
                   Hypothesis Testing                     Chapter 10
Improve
Control

In Chapter 10 we study another component of inferential statistics, which is the testing of statistical hypotheses. The primary aim of testing statistical hypotheses is to either refute or support an existing theory, that is, what is believed to be true, based upon the information contained in sample data. This further enhances good procedures. In this chapter we discuss the techniques of testing statistical hypotheses for one population mean and for differences between two population means, both when sample sizes are large and when they are small. We also discuss techniques of testing hypotheses for one population proportion and for differences between two population proportions when sample sizes are large. Finally, we discuss testing of statistical hypotheses for one population variance and for ratio of two population variances under the assumption that the populations are normal. The results of Chapter 9 and this chapter are frequently used in statistical quality control (SQC) and design of experiments (DOE).

In Chapter 11 we consider computer-based tools for applied statistical support. Computing resources were purposefully included at the end of the book so as to encourage readers not to rely on computers until after they have gained a mastery of the statistical content presented in the preceding chapters.

But where in the Six Sigma methodology do we use these tools and techniques? The answer is throughout the methodology! Let’s take a closer look.

The information contained in Table 1 will help us better relate specific tools and techniques to phases of the Six Sigma methodology as they relate to the intended scope and purpose of this book: a basic level of applied statistics. Additional topics will be discussed in later books in this series. As topics are discussed in later books, they will be added to the content of Table 1, and readers can use the table to help associate specific tools with the Six Sigma methodology.

The array of topics as they relate to the Six Sigma methodology is helpful in understanding where you may use these tools and techniques. It is important to note, however, that any of these tools and techniques may come into play in more than one phase of the Six Sigma methodology, and in fact, should be expected to do so. What is presented in Table 1 is the first point in the methodology at which you may expect to use these tools and techniques. From here it’s time to get started! Enjoy!


1 Setting the Context for Six Sigma

It is important to begin our discussion of applied statistics by recognizing that Six Sigma (6σ) has come to refer simultaneously to two related but different ideas. The first idea is that of the technical definition as a statistical concept; this technical definition will be provided and explained in sections 1.1 and 1.2, respectively. The second idea is that of a comprehensive approach and methodology for problem solving and process improvement. This comprehensive approach and methodology will be briefly outlined in section 1.3; however, a thorough discussion of the 6σ approach and methodology is beyond the scope of this book. The remainder of the chapter will be devoted to describing how the green belt contributes to 6σ efforts (section 1.4) and how the green belt goes about the task of converting data into useful information (section 1.5).

1.1 Six Sigma Defined as a Statistical Concept

Six Sigma is a measure of process quality wherein the distance between a target value and the upper or lower specification limit is at least six standard deviations. The most widely publicized consequence of a 6σ process is that there are 3.4 defects per million opportunities (DPMO). DPMO is defined not as a count of defects alone, but rather as a ratio of defects compared to the number of opportunities for defects to occur. Since most operators, service providers, technicians, engineers, and managers are trained to think in terms of counting total defects, the concept of comparing defects to opportunities for defects to occur is counterintuitive. In fact, determining what constitutes an “opportunity” for a defect to occur has, in some circles, become controversial.

Combining these ideas, namely the 3.4 DPMO figure, the comparison of defects to opportunities for those defects to occur, and the lack of universal agreement on what an opportunity is, we arrive at a statistical concept (i.e., 6σ) that is difficult for a great many people to understand, even for professionals with advanced levels of statistical training and education!


Not to worry! We can readily understand the meaning of this 6σ concept if we avoid the unnecessary rigor of a theoretical discussion and focus on its application.
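
The DPMO ratio described in section 1.1 can be sketched in a few lines of Python. This is only an illustration: the function name `dpmo`, its parameters, and the sample counts are assumptions made for the example and do not come from the text. How many opportunities to count per unit is left to the practitioner, since, as noted above, that definition is not universally agreed upon.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities: a ratio, not a raw defect count."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("there must be at least one opportunity for a defect")
    # Multiply first so that whole-number cases stay exact.
    return defects * 1_000_000 / total_opportunities


# Hypothetical counts: 17 defects found in 1,000 units,
# each unit offering 5 opportunities for a defect.
example_rate = dpmo(defects=17, units=1000, opportunities_per_unit=5)
print(example_rate)
```

Note that the same 17 defects would give a different DPMO if each unit were judged to have, say, 10 opportunities instead of 5, which is why the definition of an opportunity matters so much in practice.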

1.2 Now, Six Sigma Explained as a Statistical Concept

In its purest statistical form, 6σ refers to six standard deviations and describes the variability of a process in what is commonly referred to as a “measure of dispersion.” In this case, three standard deviations would be located above some “measure of location” such as a mean or average, and three standard deviations would be located below the same measure of location, as illustrated in Figure 1.1. As you can see from Figure 1.1, the standard deviations are combined to form the boundaries of what is referred to as a “normal” distribution; this normal distribution is also commonly referred to as the “bell-shaped” curve. It is important to note that a much more detailed discussion of the topics identified above, and related topics, will be provided where they are appropriate later in this book. For now, let us continue with our explanation of 6σ.

As was stated above, 6σ refers to a defect rate equivalent to 3.4 DPMO; this is where understanding the term and concept of 6σ can become unnecessarily difficult. And while some people take great satisfaction in being able to explain 6σ at an excruciating level of technical detail, such detail is not necessary to grasp a general understanding of the concept. To avoid an unnecessary level of complexity, while still being able to understand the concept, let us think of 6σ as illustrated in Figure 1.2.

In Figure 1.2 we can readily see there is a normal or bell-shaped distribution. What makes the distribution interesting is that the width of the distribution that describes variability is quite narrow compared to some limits, for example, specification limits. These specification limits are generally provided

µ − 3 σ

Figure 1.1

The normal distribution.

µ

µ +3 σ

Marketing_Gupta.qxd

1/28/05

1:36 PM

Page 3

Setting the Context for Six Sigma 3

0 +-1.5 σ

LSL

USL Cp=2

Cpk=1.5 3.4 DPMO

3.4 DPMO

Cpk=1.5

Cp=2 Cp=Cpk=2

−6 σ

−5 σ

−4 σ

−3 σ

−2 σ

6 σ to LSL

Figure 1.2

−1σ

0

+1σ

+2 σ

+3 σ

+4 σ

+5 σ

6 σ to USL

Six Sigma (Motorola definition).

by customers in the form of tolerances and describe the values for which products or services must conform to be considered “good” or acceptable. There is more to this explanation, however. Again, looking at Figure 1.2 we see that because the width of the distribution is so much smaller than the width of the limits that it is possible for the location of the distribution to move around, or vary, within the limits. This movement or natural variation is inherent in any process, and so anticipating the movement is exactly what we want to be able to do! In fact, we want as much room as possible for the distribution to move within the limits so we do not risk the distribution moving outside these limits. Now someone may ask, “Why would the distribution move around within the limits?” and “How much movement would we expect?” Both are interesting questions, and both questions help us better understand this concept called 6 as it refers to quality. When a process is operating, whether that process involves manufacturing operations or service delivery, variation within that process is to be expected. Variation occurs both in terms of the measures of dispersion (i.e., the width of a process) and measures of location (i.e., where the center of that process lies). During normal operation we would expect the location of a process (described numerically by the measure of location) to vary or move / 1.5 standard deviations. Herein lies explanation of 6. Our goal is to reduce the variability of any process as compared to the process limits to a point where there is room for a / 1.5 standard deviation move, accounting for the natural variability of the process while containing all that variability within the limits. Such a case is referred to as a 6 level of quality, wherein no more than 3.4 DPMO would be expected to fall outside the limits.
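To see where the 3.4 DPMO figure comes from, the short Python sketch below computes the expected fraction of a normal distribution that falls outside limits placed six standard deviations from the target, first with the process centered and then with its mean shifted by 1.5 standard deviations. It assumes each opportunity is a single normally distributed measurement; the function names are illustrative, and the Cp and Cpk formulas used are the standard capability-index definitions, which the text above does not spell out.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def dpmo_outside_limits(limit_sigma: float, shift_sigma: float) -> float:
    """Expected DPMO outside limits at +/- limit_sigma from the target
    when the process mean has moved shift_sigma toward one limit."""
    beyond_near_limit = upper_tail(limit_sigma - shift_sigma)
    beyond_far_limit = upper_tail(limit_sigma + shift_sigma)
    return (beyond_near_limit + beyond_far_limit) * 1_000_000


def cp(limit_sigma: float) -> float:
    """Cp = (USL - LSL) / (6 sigma), with limits expressed in sigma units."""
    return (2 * limit_sigma) / 6


def cpk(limit_sigma: float, shift_sigma: float) -> float:
    """Cpk = distance from the mean to the nearer limit / (3 sigma)."""
    return (limit_sigma - abs(shift_sigma)) / 3


centered = dpmo_outside_limits(6.0, 0.0)
shifted = dpmo_outside_limits(6.0, 1.5)

print(f"centered: {centered:.4f} DPMO, Cp={cp(6.0)}, Cpk={cpk(6.0, 0.0)}")
print(f"shifted 1.5 sigma: {shifted:.1f} DPMO, Cp={cp(6.0)}, Cpk={cpk(6.0, 1.5)}")
```

With the mean shifted, nearly all of the defects come from the tail beyond the nearer limit, which is now only 4.5 standard deviations away; the contribution from the far limit is negligible.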

1.3 Six Sigma as a Comprehensive Approach and Methodology for Problem Solving and Process Improvement

Having been mystified or confused about the technical definition of 6σ, many people never fully develop an understanding that 6σ really refers more to a comprehensive approach and methodology for problem solving and process improvement than to a statistical concept. Developing such an understanding is necessary sooner, rather than later, because implementation of 6σ is based on the use of a wide variety of tools and techniques (some statistical in nature and some not) where they are appropriate to support each of several phases in the methodology.

Originally developed as a phased approach to problem solving and process improvement, 6σ started as a sequential progression of phases titled Measure, Analyze, Improve, and Control (MAIC). Six Sigma was later expanded to include a Define phase, as it became apparent more attention was needed to identify, understand, and adequately describe problems or opportunities. In what is now known as the DMAIC approach and methodology, 6σ continues to be improved upon, and the addition of new phases as formal components of the methodology is being discussed in various venues. In its current form of implementation, however, Six Sigma is practiced as identified in Figure 1.3.

Figure 1.3 Current Six Sigma implementation flow chart. Boxes shown: Commitment made to implement Six Sigma; Champion team formed; Potential projects identified and evaluated; Begin/Charter projects with DMAIC methodology; Define phase, Measure phase, Analyze phase, Improve phase, Control phase; Complete projects with DMAIC methodology; Verify financial payback criteria have been met; Complete project involvement and documentation; Discontinue consideration of project; Reconsider project selection criteria. Decision points (each with Yes and No branches): Do projects meet criteria? Is phase review successfully completed? Have financial payback criteria been met?

As 6σ evolves, it is clear that several levels of stakeholders, participants, and team members will be needed to apply the tools and techniques as they are called for within the methodology. And as a percentage of the total number of people involved with 6σ efforts, green belts will continue to represent one of the largest groups of stakeholders, participants, and team members.

1.4 Understanding the Role of the Six Sigma Green Belt as Part of the Bigger Picture

Green belts constitute one of the largest groups of contributors to 6σ efforts, as highlighted in Figure 1.4.

Figure 1.4 Six Sigma support personnel. Levels shown: 6σ Master Black Belts; 6σ Black Belts; 6σ Green Belts; Process operators and service delivery personnel.

As seen in Figure 1.4, green belts are close to process operations and work directly with shop floor operators and service delivery personnel. Green belts most commonly collect data, make initial interpretations, and begin to formulate recommendations that are fed to black belts. Black belts then perform more thorough analyses, generally with additional data and input from other sources, and make recommendations to master black belts and project champions.

The flow of involvement and responsibilities described above is the essence of how 6σ has been implemented to date. What is interesting, though, is not how 6σ has been implemented to date, but how the implementation of 6σ is changing. A current trend consistent with administration of quality and certain management functions is to push responsibility to lower levels within organizations. For 6σ, this means that greater responsibility for problem or opportunity identification, data collection, analysis, and corrective action is being placed on green belts.

To support that trend, many consultants providing 6σ training now include green belts and black belts in the same classes. This means that, in many cases, green belts receive training on all the tools and techniques, as do black belts, and the expectation is that green belts will assume more responsibility for day-to-day operation of 6σ efforts. So we see a redefinition of responsibilities wherein the green belts no longer simply collect data as prescribed by black belts, but rather green belts are rapidly being tasked with collecting data and, more importantly, converting that data into useful information.

1.5 Converting Data into Useful Information

What does this mean, “converting data into useful information”? It implies that data and information are somehow different things, and they are! Data represent raw facts. Raw facts by themselves do not convey much meaning to us. Consider Table 1.1.

Table 1.1 Process step completion times.

24  21  28  30  20
22  26  25  29  23
29  20  27  24  27
27  28  24  26  25
29  24  31  23  26

Table 1.1 has several rows and columns of numbers. These numbers correspond to measurements of the average time to complete a process step. As a collection of numbers, the data in Table 1.1 do not help us understand much about the process. To really understand the process, we need to convert the data into information, and to convert the data we use appropriate tools and techniques. In this case we can use simple descriptive statistics to help us quantify certain parameters and we can use graphics to help us visually convert the data into information, as shown in Table 1.2 and Figure 1.5, respectively.

Table 1.2 Descriptive statistics.

Mean              25.52
Std Dev           3.0430248
Std Err Mean      0.608605
upper 95% Mean    26.776099
lower 95% Mean    24.263901
N                 25

Figure 1.5 Histogram (horizontal axis marked at 17.5, 20, 22.5, 25, 27.5, 30, and 32.5).

Table 1.2 indicates the mean (or average) is 25.52 and the standard deviation is 3.0430248. Now that the data have been processed to give us a pair of quantitative values, we can better understand the process. Figure 1.5 indicates that the data appear to be distributed in a manner that looks like the normal distribution, a bell-shaped curve. And while we do gain some understanding of any given process by converting data into information such as the mean and standard deviation, we generally also gain very useful information by presenting the same data graphically. And so begins the job of the 6σ green belt: converting data into information.

As a final thought in this chapter, it is worth noting that not all information is “useful” information. You will read about many tools and techniques in the following chapters. It is important to note that these tools and techniques are what we call “blind to mistakes and misinterpretations.” This is to say that the tools and techniques will not tell you whether the information you create is good or bad. Nor will the tools and techniques give you guidance on how to interpret the information; for that you will have to learn the lessons contained in this book and be careful what information to use, how to use that information, and when.
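As a rough illustration of this conversion, the following Python sketch recomputes the values in Table 1.2 from the raw data in Table 1.1 using only the standard library, and tallies the data into bins for a simple text histogram. Two things here are assumptions rather than facts from the text: the t critical value 2.0639 (the usual table value for a 95% interval with 24 degrees of freedom) and the bin edges, which are taken from the axis marks of Figure 1.5.

```python
import math
import statistics

# Raw data from Table 1.1 (process step completion times).
times = [
    24, 21, 28, 30, 20,
    22, 26, 25, 29, 23,
    29, 20, 27, 24, 27,
    27, 28, 24, 26, 25,
    29, 24, 31, 23, 26,
]

n = len(times)
mean = statistics.mean(times)
std_dev = statistics.stdev(times)   # sample standard deviation (n - 1 divisor)
std_err = std_dev / math.sqrt(n)    # standard error of the mean

# Assumption: t critical value for 95% confidence with n - 1 = 24 degrees
# of freedom, read from a t table (the standard library has no t quantile).
T_CRIT = 2.0639
lower_95 = mean - T_CRIT * std_err
upper_95 = mean + T_CRIT * std_err

print(f"N              {n}")
print(f"Mean           {mean:.2f}")
print(f"Std Dev        {std_dev:.7f}")
print(f"Std Err Mean   {std_err:.6f}")
print(f"lower 95% Mean {lower_95:.4f}")
print(f"upper 95% Mean {upper_95:.4f}")

# Assumption: bins of width 2.5 starting at 17.5, matching the axis of Figure 1.5.
edges = [17.5 + 2.5 * i for i in range(7)]
counts = [sum(lo <= t < hi for t in times) for lo, hi in zip(edges, edges[1:])]

for lo, hi, count in zip(edges, edges[1:], counts):
    print(f"{lo:5.1f} to {hi:5.1f} | {'*' * count}")
```

The summary values describe the center and spread of the data, while the bar lengths show the roughly bell-shaped pattern that the numbers alone do not reveal.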

Applied statistics for the six sigma green belt

Published on Feb 6, 2014

a set of statistical tools inside the six sigma improvement framework
