This article was downloaded by: [University of Isfahan ] On: 19 February 2013, At: 10:34 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Technometrics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/utch20

A General Strategy for Analyzing Data From Split-Plot and Multistratum Experimental Designs

Peter Goos (a, b) & Steven G. Gilmour (c)

a Faculty of Applied Economics & StatUa Center for Statistics, Universiteit Antwerpen, 2000 Antwerpen, Belgium
b Erasmus School of Economics, Erasmus Universiteit Rotterdam, 3000 DR Rotterdam, The Netherlands
c Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, SO17 1BJ, United Kingdom

Accepted author version posted online: 29 May 2012. Version of record first published: 28 Nov 2012.

To cite this article: Peter Goos & Steven G. Gilmour (2012): A General Strategy for Analyzing Data From Split-Plot and Multistratum Experimental Designs, Technometrics, 54:4, 340-354 To link to this article: http://dx.doi.org/10.1080/00401706.2012.694777


Supplementary materials for this article are available online. Please go to http://www.tandfonline.com/r/TECH

Downloaded by [University of Isfahan ] at 10:34 19 February 2013

A General Strategy for Analyzing Data From Split-Plot and Multistratum Experimental Designs

Peter Goos
Faculty of Applied Economics & StatUa Center for Statistics, Universiteit Antwerpen, 2000 Antwerpen, Belgium (peter.goos@ua.ac.be) and Erasmus School of Economics, Erasmus Universiteit Rotterdam, 3000 DR Rotterdam, The Netherlands

Steven G. Gilmour
Southampton Statistical Sciences Research Institute, University of Southampton, Southampton SO17 1BJ, United Kingdom (s.gilmour@soton.ac.uk)

Increasingly, industrial experiments use multistratum designs, such as split-plot and strip-plot designs. Often, these experiments span more than one processing stage. The challenge is to identify an appropriate multistratum design, along with an appropriate statistical model. In this article, we introduce Hasse diagrams in the response surface context as a tool to visualize the unit structure of the experimental design, the randomization and sampling approaches used, the stratum in which each experimental factor is applied, and the degrees of freedom available in each stratum to estimate main effects, interactions, and variance components. We illustrate their use on several responses measured in a large study of the adhesion properties of coatings to polypropylene. We discuss quantitative, binary, and ordered categorical responses, for designs ranging from a simple split-plot to a strip-plot that involves repeated measurements of the response. The datasets discussed in this article are available online as supplementary materials, along with sample SAS programs.

KEY WORDS: Binary data; Cumulative logit regression; Generalized linear mixed model; Hasse diagram; Lifetime data; Ordered categorical data; Poisson regression; Separation problem.

1. INTRODUCTION

Factorial designs make a major contribution to industrial research, and it is increasingly recognized that many, if not most, industrial experiments have some factors that are hard to set (often called “hard-to-change,” although in a completely randomized design, they should be reset between runs even if the factor level does not change). Hard-to-set factors lead naturally to multistratum designs, with different factors applied in different strata through restricted randomization, as in split-plot designs. This has been an area of much research in the last 15 years: see, for example, Letsinger, Myers, and Lentner (1996), Mee and Bates (1998), Trinca and Gilmour (2001), Goos (2002), Gilmour and Trinca (2003), Jones and Goos (2009), Paniagua-Quiñones and Box (2009), Vivacqua and Bisgaard (2009), and Arnouts, Goos, and Jones (2010), as well as Goos and Jones (2011, chaps. 10 and 11).

Modeling data from multistratum designs requires careful thought about many different aspects. In this article, we propose a general strategy to model data from multistratum designs. First, we provide the tools to describe multistratum designs’ unit structure adequately, as well as their treatment structure. We show how combining this information suggests a skeleton analysis of variance and a linear predictor for the model. Second, we describe how to choose a response model and a link function that, with the linear predictor, define a generalized linear mixed model (GLMM), or, if appropriate, a nonlinear mixed model. Generalized linear models (GLMs) have become increasingly accepted as the standard analyses for categorical response data (see, e.g., Myers, Montgomery, and Vining 2002; and Lee and Nelder 2003). GLMs with categorical responses have been successfully used in practice, for example, using the failure amplification method (Joseph and Wu 2004) to find optimum manufacturing settings for printed circuit boards (Jeng, Joseph, and Wu 2008) and for optimizing the synthesis of cadmium selenide nanostructures (Dasgupta et al. 2008). The presence of random effects in the linear predictor implies the use of GLMMs.

Our suggested strategy for analyzing data from multistratum designs is outlined in Figure 1, the different steps of which we describe in detail in Section 2. Each step involves challenging aspects, some of which were discussed by Robinson, Myers, and Montgomery (2004) and Robinson et al. (2006). In this article, we focus on the aspect that, in a response surface context, has received the least attention, namely the determination of the unit structure and the definition of the random effects to be used in the model. We propose Hasse diagrams as a tool to visualize the structure in the experimental units and any sampling done within the experimental units. So far, Hasse diagrams have been used only for continuous responses and orthogonal designs. We demonstrate that they are also useful in a response surface context with other responses and with nonorthogonal designs. To the best of our knowledge, our work is the first detailed account of how to build general response surface models for a broad range of response types with general multistratum design structures.

The article is organized as follows. In Section 2, we discuss the general analysis strategy and describe how to construct Hasse diagrams in general. In Section 3, we describe the polypropylene experiment that motivated the work in this article and which we use to illustrate the general analysis strategy. In Sections 4 through 7, we apply the Hasse diagrams to the polypropylene experiment, starting with simple unit structures and ending with the most complicated case. In Section 4, we consider a surface tension response and a lifetime response obtained from a split-plot experiment. In Sections 5 and 6, we study a binary response as well as an ordinal categorical response from a split-plot experiment, involving repeated measurements within each experimental unit. This adds an extra level of complication to the Hasse diagram. In Section 7, we discuss the analysis of a strip-plot type of experiment involving an ordered categorical response, again with repeated measurements. Finally, we draw some conclusions and discuss design issues with the polypropylene experiment in Section 8.

© 2012 American Statistical Association and the American Society for Quality
TECHNOMETRICS, NOVEMBER 2012, VOL. 54, NO. 4
DOI: 10.1080/00401706.2012.694777

2. A GENERAL ANALYSIS STRATEGY

When planning an industrial multistratum experiment or modeling data from such an experiment, the first aspects to consider are the design’s unit structure and treatment structure. In the general analysis strategy we propose in Figure 1, these two important aspects are shown in the boxes labeled “Unit Structure” and “Treatment Structure.” The treatment design in industrial experiments typically has a multifactorial structure, such as a full or fractional factorial design, a response surface design, or a mixture design. Typically, the design is chosen to efficiently fit some particular model, which we refer to as the a priori model in Figure 1. This a priori model defines the fixed effects in our mixed model.

A substantial portion of the literature on the design and analysis of industrial experiments has been concerned with the choice of treatment designs and the analysis of the resulting data for given unit structures, with most of the attention devoted to completely randomized designs, followed by block designs and split-plot designs. In contrast, the choice of a unit structure or strategies for modeling data from complicated multistratum experiments is a topic that has received much less attention. We propose the use of Hasse diagrams to visualize and gain understanding of the unit structure of multistratum designs.

Figure 1. General analysis strategy. [Flowchart not reproduced. Box titles: Unit Structure; Treatment Structure; Response Structure; Outline Analysis; Modeling. Elements shown: Randomization; Sampling; Treatment Design; Unit Structure (Hasse Diagrams); A Priori Treatment Model (Fixed Effects); Random Effects; Nature of Measurements or Observations; Skeleton Analysis of Variance; Response Model; Linear Predictor; Link Function; Initial Mixed Model; Model Selection; Final Model.]

2.1 Constructing Hasse Diagrams

The uses of Hasse diagrams for identifying structure in experiments to determine appropriate analysis of variance models were described by Taylor and Hilton (1981) and Lohr (1995). However, following Bailey (2007, chap. 10), we prefer to use separate Hasse diagrams to identify the structure in the experimental units, which provides insights into the random effects needed in the model, and to identify the treatment structure, which tells us what fixed effects should be included in the model. In response surface experiments, the treatment structure is usually clear, so we concentrate on the construction of Hasse diagrams to visualize the structure in the experimental units. The Hasse diagram is very convenient for calculating degrees of freedom (df) in each stratum.

The structure in the experimental units is determined by the randomization. The experimental units to be used and perhaps one or more blocking factors are randomized by randomly allocating labels to the appropriate units, for example, by randomly assigning the labels 1, . . . , b to b blocks. If a blocking factor has treatments applied to it, we assume that the design has been produced with specific treatments attached to each block label so that the randomization automatically randomizes the treatments to the units. For example, if the main-plot design has been written down with specific treatment combinations attached to each label 1, . . . , b, then randomly assigning these labels to the main plots achieves the appropriate randomization.

A Hasse diagram is a simple graph, with nodes representing blocking factors and edges representing nesting relationships between blocking factors. Each level of randomization, and any sampling that is done within experimental units, determines a node in the Hasse diagram. The rules for constructing Hasse diagrams are:

• if blocking factor B is nested within blocking factor A, the node for B appears below the node for A, with an edge between the two nodes;
• if blocking factors C and D are crossed, their nodes appear at the same level and are both connected to their supremum (i.e., the node that corresponds to the finest grouping in which both C and D are nested), which appears above them, and their infimum (i.e., the node that corresponds to the coarsest grouping that is nested within both C and D, or, equivalently, the combinations of their levels), which appears below them;
• a node called the universe, U, representing the entire experiment, appears above all blocking factors; and
• the node corresponding to the observational units appears below all other blocking factors (i.e., the lowest stratum generally represents the observational units).

The Hasse diagrams have two numbers next to every node: the number of levels of the corresponding blocking factor and (in brackets) the corresponding degrees of freedom, obtained by subtracting the degrees of freedom for higher factors from the number of levels of the factor under consideration.

Hasse diagrams for several standard designs can help to illustrate the idea. In a completely randomized design with 16 runs, the only randomization is of run labels to runs, so the Hasse diagram looks as shown in Figure 2(a), with only the entire experiment and runs as nodes and an edge between them showing that runs are nested within the entire experiment. In a randomized block design, blocks are nested within the entire experiment and runs are nested within blocks, producing a Hasse diagram like that shown in Figure 2(b), which is for a design with six blocks, each of four runs. A split-plot design with 12 whole plots, each containing three runs, produces a Hasse diagram as shown in Figure 2(c). This diagram has the same structure as that for the randomized block design, which illustrates the fact that a split-plot design can be considered as a randomized block design with main effects confounded with blocks (Mead, Gilmour, and Mead 2012, chap. 18). A row–column design has two crossed blocking factors and so produces a Hasse diagram like that shown in Figure 2(d), which is for a 6 × 9 row–column structure. The rows and columns are both nested within the entire experiment, while the runs (row × column combinations) are nested within rows and within columns. However, rows are not nested within columns and columns are not nested within rows. Finally, a split–split-plot design, with eight whole plots, each containing four subplots, each of which contains four runs, produces a Hasse diagram as shown in Figure 2(e).

Figure 2. Hasse diagrams for five commonly used unit structures.

Each node in the Hasse diagram represents a stratum in the analysis. Except for the universe and the observational units, each stratum implies a set of random effects in the model. The universe is represented by a fixed intercept parameter, and the variation in the observational units is accounted for by the distributional assumption made.
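The degrees-of-freedom rule stated above (levels of a node minus the degrees of freedom of all higher nodes) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `hasse_degrees_of_freedom`, the node labels, and the dictionary-based representation of the diagram are assumptions made here, not part of the article or its supplementary SAS programs. The two examples use the split-plot and row–column structures described in the text.

```python
from typing import Dict, List, Set


def hasse_degrees_of_freedom(
    levels: Dict[str, int], parents: Dict[str, List[str]]
) -> Dict[str, int]:
    """Return the degrees of freedom of every node in a Hasse diagram.

    levels  : number of levels of each blocking factor (node).
    parents : for each node, the nodes directly above it (none for U).
    """

    def nodes_above(node: str) -> Set[str]:
        # Collect every node reachable by following edges upward.
        found: Set[str] = set()
        stack = list(parents.get(node, []))
        while stack:
            current = stack.pop()
            if current not in found:
                found.add(current)
                stack.extend(parents.get(current, []))
        return found

    df: Dict[str, int] = {}

    def node_df(node: str) -> int:
        # df = number of levels minus the df of all higher nodes.
        if node not in df:
            df[node] = levels[node] - sum(node_df(a) for a in nodes_above(node))
        return df[node]

    for node in levels:
        node_df(node)
    return df


# Split-plot design: 12 whole plots, each containing three runs.
split_plot = hasse_degrees_of_freedom(
    {"U": 1, "whole plots": 12, "runs": 36},
    {"whole plots": ["U"], "runs": ["whole plots"]},
)

# 6 x 9 row-column design: rows and columns are crossed blocking factors.
row_column = hasse_degrees_of_freedom(
    {"U": 1, "rows": 6, "columns": 9, "runs": 54},
    {"rows": ["U"], "columns": ["U"], "runs": ["rows", "columns"]},
)

print(split_plot)
print(row_column)
```

Under this rule, the split-plot structure gives 11 degrees of freedom for whole plots and 24 for runs, and the row–column structure gives 5 for rows, 8 for columns, and 40 for runs. Crossed factors need no special handling in this sketch: the runs node simply lists both rows and columns as parents, and the shared universe node is counted once.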

2.2 Skeleton Analysis of Variance

The nodes in the Hasse diagram and the corresponding degrees of freedom serve as input for an approximate skeleton analysis of variance (see the box labeled “Outline Analysis” in Figure 1) and for writing down the random-effects part of the mixed model. We call the skeleton analysis of variance “approximate” because the exact degrees of freedom depend on the orthogonality of the design for the treatment factors to the blocking factors, and because some of the models we discuss are nonlinear mixed models. In these cases, the degrees of freedom differ to some extent from the approximate skeleton analysis of variance.

The (linear and quadratic) main effects of any given treatment factor appear in the stratum defined by the blocking factor within which that treatment factor is randomized. Interactions appear in the stratum defined by the infimum of the blocking factors within which the parent treatment factors are randomized. The next step is to investigate how many degrees of freedom in each stratum are used for estimating fixed effects, and how many are left for estimating the variance components in that stratum. Therefore, the approximate skeleton analysis of variance allows us to check whether or not there are enough degrees of freedom to estimate some effect or variance component and to perform hypothesis tests.

After the skeleton analysis of variance has been drafted, the next step is to write down the mixed model

$$\eta = \beta_0 \mathbf{1}_n + X\beta + Z\gamma,$$

where $\beta_0$ represents the intercept, $\mathbf{1}_n$ is an $n \times 1$ vector of ones, $n$ is the total number of observational units, $\beta$ contains the fixed effects of the treatment factors, and $\gamma$ contains the random effects corresponding to the different strata of the design. This mixed model, obtained as a direct result of the randomization and usually termed the “linear predictor,” is then appropriate for the analysis, as best justified by the randomization.

2.3 Other Issues

Consideration of the nature of the response variable of interest usually leads to an obvious choice of distribution for the model, as in GLMs. Thus, for example, dichotomous data suggest a Bernoulli distribution, counts with no upper limit suggest a Poisson distribution, and polytomous data suggest a multinomial distribution. Typically, we would initially use the canonical link function (e.g., the logit link function for dichotomous data, or the log link function for count data) unless specific knowledge about how the responses arose suggests we should do otherwise. For polytomous data, we further have to consider the relationship between the linear predictors for the different response categories, for example, whether or not to assume proportional odds. In our general analysis strategy in Figure 1, the choice of a distribution for the response and the link function is shown in the box labeled “Response Structure.” The assumed distribution and link function, together with the linear predictor developed, give us our initial mixed model, as shown in the box labeled “Modeling” in Figure 1.

Now, we are finally ready to fit a model to our data. Unless the data are very rich, model selection will be an important but difficult issue. It is, for instance, very common to find that one or more variance components corresponding to the higher strata are estimated to be zero, and sometimes there can be convergence problems. Dealing with these to find a model that gives a simple description of the data is not entirely a prescriptive process. Knowledge of the precise objectives of the experiment and of the system is important, as well as of the standard tools of modeling, including fitting treatment models of different orders, backward elimination or forward selection starting from some specific model, merging categories in the response, changing the link function and changing the assumed distribution.

The question of how to deal with variance components that are estimated to be zero was discussed by Gilmour and Goos (2009). Sometimes, a model can be obtained in which some treatment effect seems to be highly significant, but, with variance components estimated to be zero, this can be misleading. It certainly cannot be taken as strong evidence that such an effect is active. We recommend trying to drop such effects and considering the overall quality of the resulting model. Sometimes, dropping some fairly highly significant effect can lead to a variance component becoming estimable, which in turn can make some previously highly significant effects nonsignificant. Therefore, we would recommend performing backward elimination beyond the level that would usually be done, for example, to a significance level even lower than 1%, at least as an exploratory tool. If a plausible model can be found, which allows estimation of all variance components, then we would have confidence that such a model contains truly active effects. Of course, other effects can be declared possibly active if their significance depends on which random effects are in the model.

In the next section, we describe an experiment, which we will use in the remainder of this article to illustrate the general analysis strategy.

3. THE POLYPROPYLENE EXPERIMENT

In 2004 and 2005, four Belgian companies ran an experiment to investigate the impact of several additives and a gas plasma surface treatment on the adhesive properties of polypropylene. The experiment was of interest to car manufacturers who increasingly use polypropylene because it is inexpensive and light, and because it can be recycled. The experiment was financially supported by Flanders’ Drive, a technological platform that stimulates innovation in the automotive industry in Flanders.

An undesirable property of polypropylene is that glues and coatings do not adhere well to its surface unless it undergoes a surface treatment, such as a gas plasma treatment. The goal of the experiment was to search for economical plasma treatments that lead to a good adhesion for various kinds of coatings. Because polypropylene is often compounded with additives to tailor the plastic to a specific end-use, the effects of several additives in the polypropylene were studied, as well as several plasma treatment factors. Seven additives, coded as X1–X7, were included in the experiment, each at two levels: ethylene propylene diene monomer (EPDM) rubber, ethylene copolymer, talcum, mica, lubricant, UV stabilizer, and ethylene vinyl acetate (EVA). Four plasma treatment factors, coded as X8–X11, were included, each at three levels: power, gas flow rate, processing time, and type of gas used. The levels and units used for each of these 11 factors are presented in Table 1.

Table 1. Levels of factors studied in the polypropylene experiment

                                           Levels
Factor                  Units   −1         0              1
EPDM (X1)               %       0                         10
Ethylene (X2)           %       0                         15
Talcum (X3)             %       0                         20
Mica (X4)               %       0                         20
Lubricant (X5)          %       0                         1.5
UV stabilizer (X6)      %       0                         0.8
EVA (X7)                %       0                         1.5
Power (X8)              Watts   500        1000           2000
Gas flow rate (X9)      sccm    1000       1500           2000
Processing time (X10)   min     2          8              15
Type of gas (X11)               Etching    Activation 1   Activation 2

NOTE: sccm = standard cubic centimeters per minute.

The polypropylene experiment is essentially a mixture-process variable experiment. A slack-variable approach (e.g., see Cornell 2002, chap. 6) was used for designing the mixture part of the experiment and for modeling the data. The slack variable in the polypropylene experiment was the polypropylene, whose proportion in the experiment was between 51.2% and 100%, depending on what additives were compounded with it.

The entire polypropylene experiment involves a complicated randomization, but it is based on a D-optimal split-plot design for a linear model including the main effects of the seven additives, the six two-factor interactions involving EPDM and each of the other additives, the main effects of the gas type, the flow rate, the power and the processing time, all two-factor interactions of these four factors, the quadratic effects of the flow rate, the power and the processing time, and all two-factor interactions between the seven additives and the four plasma treatment factors. Of all the two-factor interactions among the additives, only those involving EPDM were a priori believed to be important. The model of interest included 66 terms. The sequential construction of the D-optimal split-plot design, which was based on a linear mixed model because the primary response was continuous, is discussed in Jones and Goos (2007).

The complicated randomization is due to the fact that the complete experiment was carried out in several stages:

1. First, 20 batches of polypropylene plates were produced according to the whole-plot design for the seven additives. Each of the batches contains several dozen polypropylene plates with the same settings for the seven additives. Each of the plates was stored individually in identical conditions. For each of the following stages, the appropriate number of plates was removed from storage immediately prior to the further processing.
2. Next, three to seven samples were selected from each of the 20 batches and processed according to the subplot design. Although no formal randomization took place, the selection was essentially random. The subplot design consisted of 100 gas plasma treatments applied in 100 independent oven runs, in a random order. After each of the 100 oven runs, the surface tension and the lifetime of the treated sample were measured. This stage of the experiment thus involves a continuous response, surface tension, and a lifetime response measuring the number of days for which the treated sample has a desirable surface tension, and a split-plot design with seven whole-plot factors and four subplot factors.
3. At a later point in time, three to seven sets of three new samples were randomly selected from each of the 20 batches. The three samples in each set were processed together in one oven run, using one gas plasma treatment from the subplot design. Each set of three samples was processed in a separate run of the oven. For logistical reasons, the order in which the gas plasma treatments were applied to the sets of samples was the same as in Stage 2. A fixed number of days after the gas plasma treatment, coating 1 was applied to each of the three samples in a set. A six-level categorical response, related to the success of the coating’s adhesion to the plastic, was measured as soon as the coating was dry. This was done using a cross-cut test, which involved carving a regular grid on the plates, applying tape, and pulling it off. The adhesion was then assessed visually using the American Standard Test Method (ASTM) score, a six-point scale (0–5) considered the standard for this type of assessment. The coating was considered acceptable if it resulted in an ASTM score of at least 3.
4. Stage 3 was repeated on four more occasions for four additional types of coating.

The sequence of events is illustrated in Figure 3. A summary of the ASTM scores obtained in the polypropylene experiment is given in Table 2. For coating 3, two-thirds of the measurements resulted in an ASTM score of 5, whereas for coating 4, more than two-thirds of the measurements resulted in an ASTM score of 0. For coatings 1, 2, and 5, none of the outcome categories is so dominant. For coatings 4 and 5, there are small numbers of missing observations. From the polypropylene experiment, the investigators sought answers to many research questions, one of which was whether the effects of the different factors were substantially different between coatings.
An answer to this question requires the combined analysis of the data from the five coatings involved in the experiment. The type of coating is then treated as a twelfth experimental factor, having five levels. In this case, the polypropylene experiment becomes a strip-plot type of experiment, with additional complications such as the three repeats, the fact that the order of the oven runs corresponding to the subplot design was not randomized separately for every coating, and the categorical nature of the responses.

In all the analyses we conducted, we used two orthogonal contrasts to model the impact of the three-level categorical factor X11. The first, labeled “Type of gas,” was constructed so that it quantifies the difference between the etching gas and the two activation gases. The second, labeled “Activation gas,” measures the extent to which the effect of the first activation gas is different from that of the second activation gas.

Table 2. Distribution of the ASTM scores across the levels 0–5 for the five coatings used in the polypropylene experiment

ASTM    Coating 1   Coating 2   Coating 3   Coating 4   Coating 5
0       61          39          40          220         69
1       13          14          11          7           17
2       26          29          19          13          29
3       34          38          23          35          32
4       31          57          7           9           123
5       135         123         200         10          28
Total   300         300         300         294         298

Figure 3. Relationships between factors and stages in the polypropylene experiment.

4. SURFACE TENSION AND LIFETIME RESPONSES

From each oven run, the initial surface tension and a lifetime response were measured. As outlined in Section 3, the study of these two responses involved a split-plot design with seven whole-plot factors and four subplot factors. This study did not involve any repeated measurements. The Hasse diagram in Figure 4 shows the two strata of the split-plot design, that is, the runs nested within the batches, which themselves are nested within the universe U. Here, the observational units are identical to the experimental units, since each oven run produced only a single surface tension and lifetime observation.

The Hasse diagram naturally leads to the approximate skeleton analysis of variance in Table 3, which clearly shows the allocation of the degrees of freedom from the Hasse diagram to the treatment factors’ effects and to the residual variance. The approximate skeleton analysis of variance ignores the nonorthogonality of the effects of the treatment factors X8, . . . , X11 to the batch blocking factor. The nonorthogonality implies that some of the residual degrees of freedom in the batches’ stratum will be taken up by interbatch information on the effects of X8, . . . , X11, leaving fewer degrees of freedom for estimating the variance component for batches. We find it beneficial to sketch such analyses to clarify the relationship between the randomization and the mixed model. In an orthogonal design and for a normally distributed response, this analysis of variance table would give exactly the same analysis as a linear mixed model analyzed using residual maximum likelihood (REML) and generalized least squares (GLS) (Letsinger, Myers, and Lentner 1996).

Figure 4. Hasse diagram for the surface tension and lifetime responses.

Table 3. Approximate skeleton analysis of variance for the surface tension and lifetime responses

Stratum   Source of variation                     df
Mean      Total                                    1
Batch     X1, . . . , X7                           7
          X1 × X2, . . . , X1 × X7                 6
          Residual                                 6
          Total                                   19
Run       X8, X8², X9, X9², X10, X10², X11         8
          X8 × X9, . . . , X10 × X11               9
          X1 × X8, . . . , X7 × X11               35
          Residual                                28
          Total                                   80
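The allocation of degrees of freedom in Table 3 can be retraced with a short calculation. The sketch below is illustrative only: the variable names and the dictionary layout are assumptions introduced here, while the factor structure (seven two-level additives, three quantitative plasma factors with quadratic effects, a three-level gas type, 20 batches, and 100 oven runs) is taken from the text.

```python
from itertools import combinations

# Treatment structure as described in Section 3.
n_additives = 7                                      # X1..X7, two levels each
plasma_df = {"X8": 1, "X9": 1, "X10": 1, "X11": 2}   # X11 has three levels
n_quadratic = 3                                      # X8^2, X9^2, X10^2

# Stratum totals from the Hasse diagram: 20 batches, 100 oven runs.
batch_total = 20 - 1
run_total = 100 - 20

batch_effects = {
    "additive main effects": n_additives,
    "EPDM x other additives": n_additives - 1,
}
run_effects = {
    "plasma main and quadratic effects": sum(plasma_df.values()) + n_quadratic,
    "plasma two-factor interactions": sum(
        plasma_df[a] * plasma_df[b] for a, b in combinations(plasma_df, 2)
    ),
    "additive x plasma interactions": n_additives * sum(plasma_df.values()),
}

# Residual df = stratum total minus df used by fixed effects in that stratum.
batch_residual = batch_total - sum(batch_effects.values())
run_residual = run_total - sum(run_effects.values())

# Number of model terms, including the intercept.
n_terms = 1 + sum(batch_effects.values()) + sum(run_effects.values())

print(batch_effects, batch_residual)
print(run_effects, run_residual)
print(n_terms)
```

The calculation leaves 6 residual degrees of freedom in the batch stratum and 28 in the run stratum, as in Table 3, and the term count of 66 agrees with the size of the a priori model stated in Section 3. As the text notes, these figures are approximate because the design is not orthogonal to the batch blocking factor.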

The continuous surface tension response can be analyzed by fitting a linear mixed model, with random effects included for each stratum implied by the randomization. In this case, there are just two strata: one corresponding to the randomization of batches and one corresponding to the randomization of oven runs. The design used is an unbalanced split-plot design of the type analyzed by Letsinger, Myers, and Lentner (1996). Hence, the appropriate model has the form

$$Y_{ij} \mid \delta_i \sim N(\mu_{ij}, \sigma_\varepsilon^2), \tag{1}$$

where $Y_{ij}$ is the surface tension measured for the $i$th batch and $j$th oven run,

$$\mu_{ij} = \beta_0 + \mathbf{x}_{ij}'\boldsymbol{\beta} + \delta_i,$$

$\mathbf{x}_{ij}$ represents the levels of the treatment factors for the $i$th batch ($i = 1, \ldots, 20$) and $j$th oven run ($j = 1, \ldots, n_i$), $\beta_0$ is a fixed intercept, $\boldsymbol{\beta}$ is a vector of fixed parameters, $\delta_i$ is a random batch effect with $\delta_i \sim N(0, \sigma_\delta^2)$, and all random variables are independent. It is common to estimate the variance components using REML and the fixed effects using GLS. We used the SAS MIXED procedure for analyzing the surface tension data.

Because the lifetime response is a count variable, with many counts being equal or close to zero, it makes sense to assume a Poisson distribution. Therefore, we used the model

$$Y_{ij} \mid \delta_i \sim \text{Poisson}(\mu_{ij}), \tag{2}$$

where, this time, $Y_{ij}$ is the lifetime measured for the $i$th batch and $j$th oven run and $\log \mu_{ij} = \beta_0 + \mathbf{x}_{ij}'\boldsymbol{\beta} + \delta_i$. This model is a GLMM involving a log link function, with a linear predictor that is identical in form to that for the normal linear mixed model (1). We estimated model (2) using the SAS procedure GLIMMIX.

After a backward stepwise elimination procedure, which respected the marginality of the models, we obtained the models summarized in Table 4 for the surface tension and lifetime data. Some factors that have a significant impact on the surface tension do not have an impact on the lifetime, and vice versa. Most of the significant two-factor interaction effects in the final models involve the contrast "Type of gas" and are quite large. Nonsignificant main effects for talcum and lubricant are included due to marginality restrictions. For the final surface tension model, $\sigma_\delta^2$ and $\sigma_\varepsilon^2$ were estimated to be 2.7148 and 8.9420, respectively. For the final lifetime data model, $\sigma_\delta^2$, the only variance component, was estimated to be 0.6087. The two estimates for $\sigma_\delta^2$ suggest that there is some batch-to-batch variation.

Note that, in Table 4, Kenward–Roger degrees of freedom are used (for more details, see Kenward and Roger 1997), and that the degrees of freedom for the main effects of ethylene, EPDM, EVA, lubricant, and UV are substantially smaller than those for the main effects of power, flow, time, and type of gas, and for the two-factor interaction effects. This is in line with the approximate skeleton analysis of variance in Table 3, but not identical, because the final model in Table 4 involves fewer effects than the model considered in Table 3 and because of the slight nonorthogonality of the design used.

We also used the negative binomial distribution as an alternative to the Poisson distribution in model (2). The point estimates as well as the p-values from that model are very similar to those of the Poisson model. The negative binomial model indicates that the data exhibit some overdispersion, which means that there is more variability in the response than is expected from the Poisson distribution.

5. SUCCESS OF COATING

Table 4. LMM analysis of the surface tension and GLMM analysis of the lifetime response

Surface tension

| Effect | Estimate | SE | df | t | p-value |
|---|---|---|---|---|---|
| Intercept | 11.1596 | | | | |
| Ethylene | −2.0930 | 0.4820 | 15.4 | −4.34 | 0.0005 |
| EVA | 1.2018 | 0.4838 | 15.6 | 2.48 | 0.0248 |
| Lubricant | −0.3069 | 0.4843 | 15.6 | −0.63 | 0.5355 |
| Talcum | −0.4909 | 0.4968 | 16 | −0.99 | 0.3377 |
| Power | 4.9873 | 0.3554 | 72.3 | 14.03 | <0.0001 |
| Flow | −1.6234 | 0.3633 | 75.2 | −4.47 | <0.0001 |
| Time | 2.0977 | 0.3787 | 72.9 | 5.54 | <0.0001 |
| Type of gas | 8.9690 | 0.4609 | 73.3 | 19.46 | <0.0001 |
| Time × Type of gas | 3.0333 | 0.5356 | 76.9 | 5.66 | <0.0001 |
| Power × Type of gas | 6.5682 | 0.5497 | 78.2 | 11.95 | <0.0001 |
| Flow × Type of gas | −2.8431 | 0.5415 | 74.6 | −5.25 | <0.0001 |
| Power × Flow | −0.9124 | 0.3992 | 74.7 | −2.29 | 0.0251 |
| Time × Talcum | −1.4290 | 0.3755 | 73.2 | −3.81 | 0.0003 |
| EVA × Type of gas | 1.7659 | 0.4697 | 73.3 | 3.76 | 0.0003 |
| Lubricant × Type of gas | −1.8145 | 0.4611 | 72.7 | −3.94 | 0.0002 |

Lifetime

| Effect | Estimate | SE | df | t | p-value |
|---|---|---|---|---|---|
| Intercept | −1.1597 | | | | |
| EPDM | −0.4277 | 0.2155 | 10.7 | −1.98 | 0.0734 |
| Talcum | 0.1057 | 0.2396 | 15.8 | 0.44 | 0.6650 |
| UV | 0.8254 | 0.2647 | 29.2 | 3.12 | 0.0041 |
| Power | 1.3441 | 0.2330 | 89 | 5.77 | <0.0001 |
| Time | 1.0414 | 0.1519 | 89 | 6.85 | <0.0001 |
| Type of gas | 2.2904 | 0.2562 | 89 | 8.94 | <0.0001 |
| Power × Type of gas | −0.6158 | 0.2213 | 89 | −2.78 | 0.0066 |
| Power × Time | −0.6440 | 0.1731 | 89 | −3.72 | 0.0003 |
| Talcum × Type of gas | 0.4400 | 0.1371 | 89 | 3.21 | 0.0018 |
| UV × Type of gas | −0.4722 | 0.1953 | 89 | −2.42 | 0.0177 |

In addition to the 100 plates used for measuring surface tension and lifetime, another several hundred were processed using the plasma treatments, with the same design but on a separate occasion. Several plates were used for each oven run and three of these were randomly selected for coating. In this section, we discuss how to analyze the 300 observed binary responses (three for each of the 100 runs of the experiment) obtained using

$$\text{Success of coating} = \begin{cases} 1, & \text{if the ASTM score} \ge 3, \\ 0, & \text{if the ASTM score} \le 2. \end{cases}$$

We do not recommend collapsing ordinal responses to binary ones, but doing so allows a simple description of the development of the linear predictor and allows us to contrast the results for a binary response with those for an ordered categorical response.

Compared with the design used for the surface tension, there is now an additional complication, because three plates were coated and scored for each of the 100 oven runs. As a result, there are three observational units, often referred to as repeats, for every experimental unit. This leads to the Hasse diagram shown in Figure 5, where we call the experimental units runs and the observational units tests. The corresponding approximate skeleton analysis of variance is given in Table 5. This immediately shows us that, if the response were normally distributed, an appropriate linear mixed model would be

$$Y_{ijk} \mid \delta_i, \varepsilon_{ij} \sim N(\mu_{ij}, \sigma_\phi^2), \tag{3}$$

where $Y_{ijk}$ is the response from the $k$th test, $k = 1, 2, 3$, on the $j$th oven run ($j = 1, \ldots, n_i$) from the $i$th batch ($i = 1, \ldots, 20$), $\mu_{ij} = \beta_0 + \mathbf{x}_{ij}'\boldsymbol{\beta} + \delta_i + \varepsilon_{ij}$, $\delta_i \sim N(0, \sigma_\delta^2)$, $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$, and all random variables are independent.

Figure 5. Hasse diagram for the success of coating and the ASTM score.

We deal with the fact that the success of coating is a binary response variable by assuming it has a Bernoulli distribution and using a logistic link to a linear predictor that has the same form as that of the normal linear mixed model in (3). Thus, we have a GLMM for $Y_{ijk}$, the success (1) or failure (0) of the $k$th test on the $j$th oven run from the $i$th batch. This binary logistic mixed model can be written as

$$Y_{ijk} \mid \delta_i, \varepsilon_{ij} \sim \text{Bernoulli}(\pi_{ij}),$$

where

$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \beta_0 + \mathbf{x}_{ij}'\boldsymbol{\beta} + \delta_i + \varepsilon_{ij}, \tag{4}$$

$\pi_{ij}$ is the probability of success for the $j$th oven run from batch $i$, $\delta_i \sim N(0, \sigma_\delta^2)$, $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$, and all random variables are independent.

For each of the five coatings, we analyzed the success of coating using the binary logistic mixed model (4). The results for coatings 1, 2, and 5 are very similar. The simplified model we obtained for coating 2 is displayed in Table 6. The main effects of the factors EPDM, ethylene, talcum, mica, power, and time are at least borderline significant, as well as the type of gas (etching gas vs. activation gas) and the type of activation gas (activation gas 1 vs. 2). We also found a significant interaction effect between power and the type of activation gas. The two variance components in the binary logit model (4), $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, are estimated to be 3.6670 and 1.1554, respectively. The positive estimate for $\sigma_\delta^2$ again indicates batch-to-batch variation, while that for $\sigma_\varepsilon^2$ confirms that the three observational units (named tests in Figure 5) within every experimental unit (named runs in Figure 5) are dependent. The Kenward–Roger degrees of freedom in Table 6 indicate that less information is available about the additives EPDM, ethylene, talcum, and mica than about the gas plasma treatment factors power, time, and type of gas. These degrees of freedom are comparable with those in the approximate skeleton analysis of variance in Table 5.

Analyzing the data for the coatings 3 and 4 leads to several differences from the results obtained for coatings 1, 2, and 5.
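One way to gauge the size of the two variance components in model (4) is the latent-variable reading of the logit model, in which the test-level residual on the linear-predictor scale has variance $\pi^2/3$. The sketch below applies that convention to the coating 2 estimates quoted above. The convention is an assumption introduced here for illustration; the article reports only the variance components themselves, and the function name is hypothetical.

```python
import math

# Variance of the standard logistic distribution (latent-scale residual).
LOGISTIC_VARIANCE = math.pi ** 2 / 3

def latent_correlations(var_batch, var_run):
    """Latent-scale correlations implied by a logit model with
    random batch and random run effects (illustrative helper)."""
    total = var_batch + var_run + LOGISTIC_VARIANCE
    same_run = (var_batch + var_run) / total   # two tests on the same oven run
    same_batch = var_batch / total             # tests on different runs, same batch
    return same_run, same_batch

# Estimates for coating 2 as reported in the text: 3.6670 (batch), 1.1554 (run).
same_run, same_batch = latent_correlations(3.6670, 1.1554)
print(round(same_run, 3), round(same_batch, 3))
```

Under this reading, two tests on the same oven run have a latent correlation of roughly 0.59, and tests on different runs from the same batch roughly 0.45, which is consistent with the dependence described above.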

Table 5. Approximate skeleton analysis of variance for the success of coating and the ASTM score

| Stratum | Source of variation | df |
|---|---|---|
| Mean | Total | 1 |
| Batch | $X_1, \ldots, X_7$ | 7 |
| | $X_1 \times X_2, \ldots, X_1 \times X_7$ | 6 |
| | Residual | 6 |
| | Total | 19 |
| Run | $X_8, X_8^2, X_9, X_9^2, X_{10}, X_{10}^2, X_{11}$ | 8 |
| | $X_8 \times X_9, \ldots, X_{10} \times X_{11}$ | 9 |
| | $X_1 \times X_8, \ldots, X_7 \times X_{11}$ | 35 |
| | Residual | 28 |
| | Total | 80 |
| Test | Total | 200 |

Table 6. Binary logistic mixed model for the success of coating 2

| Effect | Estimate | SE | df | t | p-value |
|---|---|---|---|---|---|
| Intercept | 4.1593 | | | | |
| EPDM | 1.1961 | 0.6016 | 9.81 | 1.99 | 0.0754 |
| Ethylene | 1.5689 | 0.5882 | 10.25 | 2.67 | 0.0231 |
| Talcum | 2.1786 | 0.7443 | 11.31 | 2.93 | 0.0134 |
| Mica | 1.3322 | 0.6840 | 8.975 | 1.95 | 0.0834 |
| Power | −0.8050 | 0.3737 | 55.17 | −2.15 | 0.0356 |
| Time | 2.6360 | 0.5009 | 91.44 | 5.26 | <0.0001 |
| Type of gas | 2.9872 | 0.6892 | 71.99 | 4.33 | <0.0001 |
| Activation gas | −1.4446 | 0.3792 | 55.10 | −3.81 | 0.0004 |
| Power × Activation gas | 1.3652 | 0.4223 | 72.20 | 3.23 | 0.0018 |
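Because the estimates in Table 6 are on the logit scale, it can help to translate a few of them into conditional odds ratios. The following is a minimal sketch under two assumptions that are not stated in this section: that a one-unit change in a coded factor is a meaningful comparison, and that setting all coded factors and both random effects to zero describes a sensible reference run. The helper names are hypothetical.

```python
import math

# Selected fixed-effect estimates on the logit scale, copied from Table 6.
table6_estimates = {"EPDM": 1.1961, "Power": -0.8050, "Time": 2.6360}
table6_intercept = 4.1593

def odds_ratio(logit_estimate):
    """Multiplicative change in the odds of success per unit of the coded factor,
    conditional on the batch and run effects."""
    return math.exp(logit_estimate)

def inverse_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

odds_ratios = {name: odds_ratio(b) for name, b in table6_estimates.items()}

# Success probability when every coded factor and both random effects are zero.
reference_probability = inverse_logit(table6_intercept)

for name, value in odds_ratios.items():
    print(f"{name}: odds ratio {value:.2f}")
print(f"reference probability {reference_probability:.3f}")
```

These are conditional (batch- and run-specific) quantities; population-averaged effects would be smaller in magnitude because of the random effects.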


For both coatings 3 and 4, $\sigma_\delta^2$ is estimated to be zero, while the estimate of $\sigma_\varepsilon^2$ was similar to that reported for coating 2. For coating 4, we were able to find very few significant effects. This is a logical consequence of the fact that 240 out of the 294 dichotomized responses were zero (see Table 2, where the ASTM score is less than 3 for 240 of the responses for coating 4), so there is very little variability in the response for that coating.

So far, we have simply used the randomization to define the appropriate random effects and the nature of the responses to define appropriate distributions and link functions. In the skeleton analyses of variance produced, we have assumed that we are interested in estimating all relevant interactions, as well as the linear and quadratic main effects of each factor. This is a reasonable aim, but might be rather ambitious for discrete data, given the numbers of parameters involved.

Estimating the binary logistic mixed model turns out to be a real challenge when interactions are included in the model. The inclusion of several interactions together often leads to convergence problems. A workable model selection strategy is a forward selection procedure, where a main-effects model is estimated first and interaction effects involving factors with significant main effects are added one by one. Even then, convergence problems are likely to occur when adding certain interactions. As pointed out by Chipman and Hamada (1996), severe problems with fitting GLMs for categorical data are quite common. In our data, we encountered many convergence problems for coatings 3 and 4. To a large extent, this is because there was a highly uneven distribution of the response over the two outcomes of the success of coating.

Perhaps the most common convergence problem when estimating models for binary regression is the so-called “separation,” which was first described by Albert and Anderson (1984). Allison (2008) gave several small datasets to explain how complete and quasi-complete separation lead to the nonexistence of the maximum likelihood estimator. He also pointed out that the separation problem is quite common in the presence of two-level factors, and that separation problems can be detected by inspecting contingency tables formed by a categorical factor and the response variable. Whenever a cell in such a contingency table contains a zero, the maximum likelihood estimator of $\boldsymbol{\beta}$ does not exist.

In the polypropylene experiment, the factors EPDM, ethylene copolymer, talcum, mica, lubricant, UV stabilizer, and EVA all have two levels, and we also have a categorical factor, the type of gas. Since adding the two-factor interaction effect of EPDM and type of gas caused the convergence to fail when analyzing the success of coating 2, we created the contingency table displayed in Table 7. The contingency table does indeed have a zero cell and suggests that using 10% of EPDM in combination with the etching gas guarantees good adhesion. Etching gas is, however, substantially more expensive than activation gases, so it is only used as a last resort. Otherwise, there would have been no point in modeling the dichotomized response for coating 2.
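The contingency-table check described above is easy to automate. The sketch below uses the counts reported in Table 7 and flags every factor-level combination in which one of the two outcomes never occurs. It is an illustrative screening step only, with hypothetical function names, and it does not replace a formal separation diagnostic.

```python
# Counts from Table 7: (EPDM level, type of gas) -> (failures, successes).
table7 = {
    ("0%", "Etching"): (27, 21),
    ("0%", "Activation 1"): (13, 38),
    ("0%", "Activation 2"): (6, 36),
    ("10%", "Etching"): (0, 42),
    ("10%", "Activation 1"): (14, 43),
    ("10%", "Activation 2"): (22, 38),
}

def dichotomize(astm_score):
    """Success of coating as defined in Section 5: 1 if the ASTM score is at least 3."""
    return 1 if astm_score >= 3 else 0

def zero_cells(table):
    """List (cell, outcome) pairs whose count is zero; outcome 0 = failure, 1 = success."""
    return [
        (cell, outcome)
        for cell, counts in table.items()
        for outcome, count in enumerate(counts)
        if count == 0
    ]

flagged = zero_cells(table7)
print(flagged)
```

For the coating 2 data, the only flagged cell is the combination of 10% EPDM with the etching gas, where no failures were observed. This is the cell that makes the EPDM × Type of gas interaction inestimable in the binary model.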

Table 7. Contingency table explaining why the interaction effect involving EPDM and type of gas causes the convergence to fail for the binary response “Success of coating”

| EPDM | Type of gas | Success of coating = 0 | Success of coating = 1 | Total |
|---|---|---|---|---|
| 0% | Etching | 27 | 21 | 48 |
| 0% | Activation 1 | 13 | 38 | 51 |
| 0% | Activation 2 | 6 | 36 | 42 |
| 10% | Etching | 0 | 42 | 42 |
| 10% | Activation 1 | 14 | 43 | 57 |
| 10% | Activation 2 | 22 | 38 | 60 |
| Total | | 82 | 218 | 300 |

6. ASTM SCORE

Although the experimenters considered an ASTM score of at least 3 acceptable, they were also interested in how to improve it further. We might also expect to get more information by analyzing the actual ASTM scores. Since these are ordinal variables, a natural model is the extension of model (4) to a cumulative logit model (see, for example, Agresti 2002, chaps. 7 and 12; 2010, chaps. 3 and 10). Since the design structure implied by the randomization is exactly as in Figure 5, a suitable model is

$$Y_{ijk} \mid \delta_i, \varepsilon_{ij} \sim \text{Multinomial}(1, \boldsymbol{\pi}_{ij}),$$

where

$$\log\left(\frac{\sum_{b=c+1}^{6} \pi_{bij}}{\sum_{a=1}^{c} \pi_{aij}}\right) = \beta_{0c} + \mathbf{x}_{ij}'\boldsymbol{\beta} + \delta_i + \varepsilon_{ij} \tag{5}$$

and $\boldsymbol{\pi}_{ij} = [\pi_{1ij} \cdots \pi_{6ij}]$, with $\pi_{aij} = P(Y_{ijk} = a)$. The parameters $\beta_{0c}$, $c = 1, \ldots, 5$, are intercept parameters representing the overall levels falling into each category. Furthermore, $\delta_i \sim N(0, \sigma_\delta^2)$, $i = 1, \ldots, 20$, $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$, $j = 1, \ldots, n_i$, and all random variables are independent.

We analyzed the ASTM scores for the five coatings using the mixed cumulative logit model (5). The simplified model obtained for coating 2 is displayed in Table 8. The main effects of the factors EPDM, ethylene, talcum and time are clearly highly significant. Also, the type of gas (etching gas vs. activation gas) and the type of activation gas (activation gas 1 vs. 2) have significant effects. The factor power does not have a significant main effect, but its interactions with the type of gas and the type of activation gas are highly significant. There is also some indication that the interaction between mica and the type of activation gas has an effect on the ASTM score.

Table 8. Mixed cumulative logit model analysis of the ASTM score of coating 2

| Effect | Estimate | SE | df | t | p-value |
|---|---|---|---|---|---|
| Intercept 1 | −0.4988 | | | | |
| Intercept 2 | 1.4035 | | | | |
| Intercept 3 | 3.1170 | | | | |
| Intercept 4 | 4.9161 | | | | |
| Intercept 5 | 6.2085 | | | | |
| EPDM | 0.7413 | 0.3676 | 12.83 | 2.02 | 0.0652 |
| Ethylene | 1.3152 | 0.3646 | 13.99 | 3.61 | 0.0029 |
| Talcum | 1.4867 | 0.4470 | 14.60 | 3.33 | 0.0048 |
| Mica | 0.7568 | 0.4534 | 12.20 | 1.67 | 0.1205 |
| Power | −0.3134 | 0.2964 | 55.21 | −1.06 | 0.2950 |
| Time | 1.9307 | 0.3136 | 67.10 | 6.16 | <0.0001 |
| Type of gas | 2.3826 | 0.4270 | 70.15 | 5.58 | <0.0001 |
| Activation gas | −0.5931 | 0.3075 | 48.91 | −1.93 | 0.0596 |
| Power × Type of gas | 1.2123 | 0.4750 | 63.23 | 2.55 | 0.0131 |
| Power × Activation gas | 0.8434 | 0.3333 | 52.63 | 2.53 | 0.0144 |
| Ethylene × Power | −0.5763 | 0.2886 | 52.54 | −2.00 | 0.0510 |
| Mica × Activation gas | 0.5732 | 0.3033 | 48.29 | 1.89 | 0.0648 |

Comparing Table 8 for the mixed cumulative logit model analysis with Table 6 for the mixed binary logit analysis shows that more significant effects were found using the ASTM score as the response, suggesting that this analysis is indeed more powerful. Some of the newly detected effects are small, but the interaction effects involving the factor power are certainly as large as the significant main effects. The estimates of the effects that the binary and the cumulative logit model have in common have the same signs, indicating that many of the qualitative conclusions from the two models are similar.

The two variance components in the model, $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, are estimated to be 1.2507 and 3.7213, respectively, which suggests substantial batch-to-batch variation as well as dependence between the three observational units (the tests) within every experimental unit (i.e., within every oven run). The impact of the randomization can be seen from the degrees of freedom for the hypothesis tests in Table 8, where 12 to 15 degrees of freedom are used for the whole-plot factor effects and between 48 and 70 degrees of freedom are used for the subplot effects. These degrees of freedom are in line with the approximate skeleton analysis of variance in Table 5.

An overview of the results from the analysis for the ASTM scores for the five coatings is given in Table 9. The table shows that the type of activation gas has a significant impact on the ASTM scores for four of the five coatings, either through a main effect or through one or more interaction effects. Also, the presence of EPDM has a positive impact on four of the ASTM scores. Using the cumulative logit model, however, we also find two significant quadratic effects plus several main effects and interaction effects that were not detected using the binary logit model.

Another difference between the analyses using the cumulative logit models and those using the binary logit models is observed for coating 3. Fitting the cumulative logit model for that coating's ASTM score yields a positive estimate for $\sigma_\delta^2$, which measures the batch-to-batch variation. In the simplified cumulative logit model, the estimate amounts to 2.4521, whereas it is zero in the simplified binary logit model.

The ASTM response data lead to fewer problems with convergence than the binary response data. For instance, for coating 2, the interaction effect involving EPDM and type of gas becomes estimable when using the six-category ordinal response. Its point estimate is positive, but the effect is not significant.

7. COMBINED ANALYSIS FOR DIFFERENT COATINGS

Table 9. Overview of analysis for ASTM score of coatings 1–5

| Effect | Coating 1 Est. | p-value | Coating 2 Est. | p-value | Coating 3 Est. | p-value | Coating 4 Est. | p-value | Coating 5 Est. | p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Intercept 1 | 0.8248 | | −0.4988 | | 2.3275 | | −7.3705 | | −2.7046 | |
| Intercept 2 | 1.9447 | | 1.4035 | | 2.7446 | | −5.8842 | | 1.9182 | |
| Intercept 3 | 3.1880 | | 3.1170 | | 4.1791 | | −3.0111 | | 3.0387 | |
| Intercept 4 | 4.2825 | | 4.9161 | | 5.4898 | | −1.9059 | | 4.2061 | |
| Intercept 5 | 4.9474 | | 6.2085 | | 6.4184 | | −1.2401 | | 5.0899 | |
| EPDM | 0.9183 | 0.0008 | 0.7413 | 0.0652 | 1.6215 | 0.0076 | | | 1.2360 | <0.0001 |
| Ethylene | 0.8533 | 0.0014 | 1.3152 | 0.0029 | 2.0649 | 0.0016 | | | 0.8856 | 0.0002 |
| Talcum | 1.1160 | 0.0007 | 1.4867 | 0.0048 | | | | | 1.7214 | <0.0001 |
| Mica | 0.7599 | 0.0191 | 0.7568 | 0.1205 | | | | | 1.1463 | 0.0001 |
| Lubricant | | | | | | | | | 0.7526 | 0.0013 |
| UV | | | | | | | | | −0.9235 | 0.0001 |
| Power | | | −0.3134 | 0.2950 | 0.3919 | 0.4391 | | | 0.9226 | 0.0007 |
| Time | 1.7875 | <0.0001 | 1.9307 | <0.0001 | 2.8145 | <0.0001 | | | 1.6122 | <0.0001 |
| Type of gas | 2.1018 | <0.0001 | 2.3826 | <0.0001 | 3.5504 | 0.0001 | | | 1.6407 | <0.0001 |
| Activation gas | −0.7779 | 0.0093 | −0.5931 | 0.0596 | −0.8873 | 0.0175 | | | −0.0097 | 0.9700 |
| Power × Type of gas | | | 1.2123 | 0.0131 | 1.8217 | 0.0461 | | | 1.1374 | 0.0045 |
| Power × Act. gas | | | 0.8434 | 0.0144 | | | | | 0.7153 | 0.0191 |
| EPDM × Act. gas | 0.5490 | 0.0614 | | | | | | | | |
| EPDM × Ethylene | 0.7519 | 0.0043 | | | | | | | | |
| Ethylene × Power | | | −0.5763 | 0.0510 | | | | | | |
| Mica × Act. gas | | | 0.5732 | 0.0648 | | | | | | |
| Power$^2$ | | | | | | | | | −1.3002 | 0.0275 |
| Time$^2$ | −1.1530 | 0.0574 | | | | | | | | |
| Batch ($\sigma_\delta^2$) | 0.0000 | | 1.2507 | | 2.4521 | | 0.0000 | | 0.0000 | |
| Run ($\sigma_\varepsilon^2$) | 3.8584 | | 3.7213 | | 4.2720 | | 2.7527 | | 11.1528 | |

The rows of three additional effects for coating 4 (estimates 0.7201, 1.0154, and 1.5205, with p-values 0.0959, 0.0395, and 0.0119) and of one additional effect for coating 5 (estimate −0.5386, p-value 0.0193) are unclear, so these entries are listed here rather than in the table.

In the polypropylene experiment, the investigators were mainly interested in comparing the effects of the factors on different coatings as assessed by the ASTM scores. This necessitates the combined analysis of the data for all five coatings, with a model that includes three-factor interactions involving the coating (this allows us to study whether two-factor interaction effects differ across coatings). We start this section by discussing the randomization structure that led to the combined data involving the 11 factors listed in Table 1 and coating as the twelfth experimental factor.

Although there was no formal randomization of coatings to occasions, the sequence in which they were run can be considered as being essentially random. Also, different random samples of plates were used on each occasion. As explained in Section 3, on each of the five occasions, the 100 oven runs were performed in the same order. This means that the orders and occasions are also crossed. This leads to the Hasse diagram shown in Figure 6, with the combinations of orders and occasions included. The entire experiment can be described by 20 batches and five occasions, which are crossed, since each batch is used on each occasion. Within each batch, there are 3–7 orders and these are also crossed with occasions, since each order is used on each occasion. The combinations of these correspond to the 500 oven runs, which are the experimental units in this case.

Figure 6. Hasse diagram for combined analysis.

The Hasse diagram can be easily translated into the approximate skeleton analysis of variance shown in Table 10, and into a linear mixed model. The skeleton analysis of variance immediately shows that we cannot estimate a random occasion effect in a linear mixed model, because all the degrees of freedom in this stratum are used to estimate the effect of the coatings.

Table 10. Approximate skeleton analysis of variance for the combined analysis of the ASTM scores for all coatings

| Stratum | Source of variation | df |
|---|---|---|
| Mean | Total | 1 |
| Batch | $X_1, \ldots, X_7$ | 7 |
| | $X_1 \times X_2, \ldots, X_1 \times X_7$ | 6 |
| | Residual | 6 |
| | Total | 19 |
| Occasion | Coatings | 4 |
| | Residual | 0 |
| | Total | 4 |
| Batch × Occasion | $X_1 \times$ Coating, $\ldots$, $X_7 \times$ Coating | 28 |
| | $X_1 \times X_2 \times$ Coating, $\ldots$, $X_1 \times X_7 \times$ Coating | 24 |
| | Residual | 24 |
| | Total | 76 |
| Order | $X_8, X_8^2, X_9, X_9^2, X_{10}, X_{10}^2, X_{11}$ | 8 |
| | $X_8 \times X_9, \ldots, X_{10} \times X_{11}$ | 9 |
| | $X_1 \times X_8, \ldots, X_7 \times X_{11}$ | 35 |
| | Residual | 28 |
| | Total | 80 |
| Run | $X_8 \times$ Coating, $\ldots$, $X_{11} \times$ Coating | 32 |
| | $X_8 \times X_9 \times$ Coating, $\ldots$, $X_{10} \times X_{11} \times$ Coating | 36 |
| | $X_1 \times X_8 \times$ Coating, $\ldots$, $X_7 \times X_{11} \times$ Coating | 140 |
| | Residual | 112 |
| | Total | 320 |
| Test | Total | 1000 |

Given the earlier results, we analyze the combined data using the ASTM score as an ordinal response variable. As with the linear mixed model, it is impossible to estimate the variance component corresponding to the occasions when analyzing our categorical responses, even though the multinomial distribution has a fixed scale parameter (unlike in the linear mixed model (3), there is no $\sigma_\phi^2$ to estimate). Therefore, we exclude the random occasion effect from our analysis. An appropriate model is

$$Y_{ijkl} \mid \delta_i, \gamma_{ij}, \lambda_{ik}, \varepsilon_{ijk} \sim \text{Multinomial}(1, \boldsymbol{\pi}_{ijk}),$$

where

$$\log\left(\frac{\sum_{b=c+1}^{6} \pi_{bijk}}{\sum_{a=1}^{c} \pi_{aijk}}\right) = \beta_{0c} + \mathbf{x}_{ijk}'\boldsymbol{\beta} + \delta_i + \gamma_{ij} + \lambda_{ik} + \varepsilon_{ijk}, \tag{6}$$

$\delta_i \sim N(0, \sigma_\delta^2)$, $i = 1, \ldots, 20$, is a random batch effect, $\gamma_{ij} \sim N(0, \sigma_\gamma^2)$, $j = 1, \ldots, n_i$, is a random effect for the orders within batch $i$, $\lambda_{ik} \sim N(0, \sigma_\lambda^2)$, $k = 1, \ldots, 5$, is a random effect for the combinations of batches and occasions, $\varepsilon_{ijk} \sim N(0, \sigma_\varepsilon^2)$ is a random oven run effect, $l = 1, 2, 3$ denotes the test, and all random variables are independent.

We analyzed the ASTM scores for the five different coatings simultaneously using the mixed cumulative logit model (6). In our analysis, we used four different contrasts to capture the effects of the categorical factor “Coating.” The first contrast, which we named “Solvent-based coat 1 vs. 2” ($C_1$), compares the solvent-based two-layer coating 1 (one-component base coat + two-component top coat) with coating 2, which is a solvent-based single-layer coating (using a two-component coat). The second contrast, labeled “Water- vs. solvent-based” ($C_2$), compares the water-based two-layer coating 3 (one-component base coat + two-component top coat) with the two solvent-based coatings 1 and 2. The third contrast, named “UV coat vs. traditional” ($C_3$), compares coating 5, which is a coating that is dried using UV light, with the traditional solvent- and water-based coatings 1–3. The final contrast, labeled “Low-end coat vs. rest” ($C_4$), compares coating 4, which is a water-based single-layer low-end product, with the other four coatings, each of which is a high-quality coating.

The final model we obtained is summarized in Table 11. This model was obtained using a manual stepwise regression, starting from an initial model involving the main effects of the 11 experimental factors listed in Table 1, the four contrasts for the coatings, and the interactions between these contrasts and the experimental factors' main effects. The purpose of the interactions was to quantify the extent to which the main effects differ for the different types of coatings. When fitting the initial model, two of the four variance components, $\sigma_\delta^2$ and $\sigma_\lambda^2$, were estimated to be zero. This led to standard errors (SE) that were smaller than expected and caused some of the Kenward–Roger degrees of freedom to be unjustifiably large compared with the skeleton analysis of variance in Table 10. After dropping some of the nonsignificant effects, the estimates of the variance components became positive and the degrees of freedom produced by the Kenward–Roger method were in line with those suggested by the skeleton analysis of variance.

Note that, in Table 11, we do not show the standard errors and the results for the hypothesis tests for the coating contrasts. This is because there are not enough degrees of freedom in the “Occasions” stratum to estimate the corresponding variance component. It is, therefore, impossible to carry out proper hypothesis tests for the coating contrasts, which are estimated in the “Occasions” stratum (see the skeleton analysis of variance in Table 10).

Most main effects are highly significant. In addition, several two-factor interaction effects are significant or nearly significant at the 5% level. The significance of several interaction effects involving the contrast “UV coat vs. traditional” ($C_3$) implies that there is a significant difference between the UV-dried coating 5, on the one hand, and the traditional solvent-based and water-based coatings, on the other. The interaction involving the factor ethylene and the contrast “Low-end coat vs. rest” ($C_4$), which is borderline significant, suggests that the factor “ethylene” has a smaller positive impact on the adhesion for low-end coatings than for other coatings. Note that only the standard errors and the degrees of freedom for the coating contrasts are affected by the fact that the variance component for the occasions cannot be estimated. Hence, the inference for all other effects, including the interactions with the coating contrasts, does not pose any problems. The identification of important factors helps the manufacturers to identify suitable plasma treatments for different types of polypropylene to improve quality.

The variance component estimates for the final model are given in Table 12. The estimate for $\sigma_\delta^2$ suggests that there is some batch-to-batch variation, and the estimate for $\sigma_\lambda^2$ suggests that the batch-to-batch variation was slightly different between occasions. By far, the largest variance component estimate is that for $\sigma_\varepsilon^2$, indicating that the three repeated observations within each experimental unit are strongly correlated. We include the standard errors in Table 12, but recommend always including estimable variance components in the model.

The main advantage of the combined analysis is that it enables the experimenter to carry out formal hypothesis tests to see whether the factor effects differ from coating to coating. Moreover, if the interaction effects between the experimental factors and the coatings are insignificant, then the information in the data can be pooled across the different coatings to acquire more precise estimates of the effects and have more powerful hypothesis tests for at least some of the remaining factor effects.

8. DISCUSSION

We have presented a general strategy for the analysis of data from multistratum designs, such as split-plot and strip-plot designs. A key step of our approach is the use of Hasse diagrams to visualize the randomization structure of the experiment. We demonstrated the added value of the Hasse diagrams by means of a polypropylene experiment that involved a complicated randomization. The Hasse diagram of an experiment naturally leads to the appropriate GLMM and a skeleton analysis of variance, which allows the experimenter to evaluate the strengths and weaknesses of a particular design and randomization. In the polypropylene experiment, this approach uncovered a weakness of the experimental design used: the variance component corresponding to the occasions on which the coatings were applied was not estimable. As a consequence, no hypothesis tests could be done for the main effects of the coating. It is obvious that this kind of design should be avoided as much as possible. However, many experimenters in industry face tight logistical, time, and budget constraints, which necessitate the use of designs that do not allow proper inference for all of the factor effects and responses of interest. This was also the case in the polypropylene experiment. Another weakness of the design for the polypropylene experiment was that the orders of the 100 oven runs were not separately randomized on each occasion, leading to the Hasse diagram in Figure 6. Separately randomizing the run orders on each occasion would have led to the Hasse diagram shown in Figure 7. In that case, the randomizations of batches and occasions would have been crossed, and their combinations would have defined another stratum, as in a strip-plot design. The units in the “Batches × Occasions” stratum, which would be the lowest stratum in a classical strip-plot design, are split into smaller units (named runs), each of which involves three observational units (the tests). 
Each of these strata corresponds to a node in the Hasse diagram. The impact of this alternative design on the analysis would have been that more degrees of freedom would have been available for testing the effects of the factors applied to the runs stratum, that is, the gas plasma treatment factors. In terms of model complexity, one fewer variance component would have been required. A third weakness of the design used for the polypropylene experiment is that the design used was an 11-factor split-plot design that was D-optimal for the linear mixed model in (1). This design was appropriate for the primary response, the surface TECHNOMETRICS, NOVEMBER 2012, VOL. 54, NO. 4

352

PETER GOOS AND STEVEN G. GILMOUR

Downloaded by [University of Isfahan ] at 10:34 19 February 2013

Table 11. Mixed cumulative logit model analysis of the ASTM scores for all five coatings combined Effect

Estimate

SE

df

t

p-value

Intercept 1 Intercept 2 Intercept 3 Intercept 4 Intercept 5 Water- vs. solvent-based (C2 ) UV coat vs. traditional (C3 ) Low-end coat vs. rest (C4 ) EPDM Ethylene Talcum Mica UV Power Time Type of gas Activation gas Power × Type of gas EPDM × Ethylene Time × Time C3 × UV C3 × Time C3 × Power C3 × Type of gas C3 × Activation gas C3 × EPDM C3 × Ethylene C3 × EPDM × Ethylene C4 × Ethylene

−1.0395 0.9763 2.4495 3.6824 4.5022 1.2177 −1.7073 −4.9763 0.9819 1.1300 1.2519 0.8871 0.0559 0.1385 1.8894 2.0659 −0.7051 0.8238 −0.0424 −0.8425 −0.6982 −0.5631 0.6464 −0.7361 0.6724 0.0488 −0.4075 −0.6143 −0.5733

0.2210 0.2196 0.2676 0.2742 0.2166 0.1891 0.1946 0.2545 0.1885 0.2905 0.2163 0.4110 0.2442 0.2614 0.2585 0.3454 0.2587 0.2448 0.2450 0.2446 0.2940

12.30 12.62 13.46 11.71 12.76 69.81 80.94 82.14 66.58 75.26 12.07 72.59 46.58 257.60 236.30 262.30 221.90 43.44 43.21 43.71 80.93

4.44 5.15 4.68 3.24 0.26 0.73 9.71 8.12 −3.74 2.84 −0.20 −2.05 −2.86 −2.15 2.50 −2.13 2.60 0.20 −1.66 −2.51 −1.95

0.0008 0.0002 0.0004 0.0074 0.8006 0.4663 <0.0001 <0.0001 0.0004 0.0059 0.8478 0.0440 0.0063 0.0321 0.0131 0.0340 0.0100 0.8428 0.1035 0.0158 0.0547

tension. However, it ignored the fact that there was a secondary interest to also estimate other GLMs involving count data and categorical variables as responses and to include coating as a twelfth factor in the analysis. Indeed, it turned out that this secondary analysis was more informative than the intended primary analysis. An attractive design approach would have been to use a composite design criterion, such as the compound cri-

teria in chapter 21 of Atkinson, Donev, and Tobias (2007) or in Woods et al. (2006). Using such a composite design criterion poses several major challenges. First, even if the random effects are ignored completely, the compound criterion would involve the information matrices for various GLMs, on top of the information matrix for the linear mixed model. These information matrices all depend on the 66 unknown parameters in the model

Figure 7. Hasse diagram for combined analysis if orders of oven runs had been randomized.

Table 12. Variance component estimates corresponding to the final model for the combined analysis of the ASTM scores for the five coatings

Variance component                        Estimate   SE
$\sigma_\delta^2$    Batch                0.3449     0.3721
$\sigma_\gamma^2$    Order                0.9761     0.4384
$\sigma_\lambda^2$   Batch × Occasion     0.2201     0.3404
$\sigma^2$           Run                  4.4347     0.5909

(see Section 3 for an enumeration of the terms in the model), so a Bayesian design would be appropriate. The problem with this is that, for any type of response, a 66-dimensional integral has to be evaluated numerically. In the literature on optimum experimental designs, no Bayesian designs have been generated for such complex problems. Second, the consideration of the variance component estimation in the composite optimality criterion adds further complexity. Not only does the dimension of the information matrix increase by 1 for every variance component that has to be estimated, but so does the dimension of the integral that has to be evaluated numerically. Recently, substantial progress has been made in the construction of Bayesian optimal designs for GLMs without random effects (for instance, see Woods et al. 2006; Kessels et al. 2009; Bliemer, Rose, and Hess 2008; Gotwalt, Jones, and Steinberg 2009) and for GLMs involving random effects (for instance, see Sándor and Wedel 2002; Yu, Goos, and Vandebroek 2009), but none of these articles considered various types of responses simultaneously or a dimensionality of the sort we encountered in the polypropylene experiment.

A design approach that is more feasible than the Bayesian composite design criterion described in the previous paragraph is to seek a design that is optimal for a linear model corresponding to the Hasse diagrams in Figures 6 and 7. This would not require the incorporation of prior information on the factor effects and would circumvent the need to evaluate a 66-dimensional integral. Instead, a relatively simple extension of either of the approaches of Trinca and Gilmour (2001) and Gilmour and Trinca (2003) or Jones and Goos (2007, 2009) and Arnouts, Goos, and Jones (2010) is required.

From our study, it should also be clear that the modeling of categorical responses calls for high-quality data.
The split-plot design that was used as the basis for the experiment whose data we analyzed in this article was a D-optimal design involving 20 whole plots and 100 subplots for estimating main effects, a substantial number of interactions, and the quadratic effects of the quantitative subplot factors. Even though it ignored the nonlinear aspect of some of the models of interest, the treatment design was thus a high-quality design and the number of whole plots was substantially larger than the number of whole-plot factor effects in the a priori model. Yet, estimating interaction effects and quadratic effects in the binary logistic mixed models we studied was often impossible due to the separation problem, which means that the maximum likelihood estimates do not exist and software procedures fail to converge. It is known that the separation problem occurs with many datasets involving two-level factors. Optimal experimental designs for linear models involving main effects and interaction effects have only two levels for each factor. As also noted by Woods and van de Ven (2011), such optimal experimental designs, which assume only linear main effects and interactions, might be particularly sensitive to the separation problem. This is a topic that is worthy of further investigation.

Another recurring problem in the analysis of the data from the polypropylene experiment is that some of the variance components are estimated to be zero. With categorical outcomes, such estimates are more likely to be obtained than with continuous responses. When planning experiments with categorical responses, any decisions regarding unit and treatment structures are thus even more important than in situations with continuous responses. Hence, for categorical data, the use of Hasse diagrams as a basis for planning experiments and to ensure sufficient degrees of freedom for variance component estimation is even more important than for continuous data.

In this article, we have also shown how to deal with the problem of multiple observational units within the experimental units. The dependence between repeated observations within a given experimental unit was modeled using an additional random effect in the GLMM. This approach is very useful for industrial experimenters who often take multiple observations, which are commonly called repeats in the literature on industrial experimental design. When repeats are taken from continuous responses, a proper statistical analysis can be done on the average responses over all repeats within an experimental unit. Obviously, this commonly used approach is not appropriate for binary and ordered categorical outcomes such as those we have analyzed here.

SUPPLEMENTARY MATERIALS

The following supplementary materials are contained in a single archive and can be obtained via a single download.
The surface tension and lifetime data: The file “lifetime data.csv” containing all the settings of the experimental factors and the corresponding total surface tension, polar surface tension, and lifetime. This file was used for the analyses described in Section 4. Using the SAS programs 1 and 2 requires importing the data file and naming it “PP.”

SAS program 1: SAS program for the split-plot analysis of the surface tension response, the results of which are reported in Table 4.

SAS program 2: SAS program for the split-plot analysis of the lifetime response, the results of which are reported in Table 4.

The ASTM data: The file “data all coatings.csv” containing all the settings of the experimental factors and the corresponding ASTM scores. This file was used for the analyses described in Sections 5–7. Using the SAS programs 3–5 requires importing the data file and naming it “PP.”

SAS program 3: SAS program for the binary logit split-plot analysis of the dichotomized response “Success of coating” for coating 2, the results of which are reported in Table 6.

SAS program 4: SAS program for the cumulative logit split-plot analysis of the six-category ordinal response, “ASTM score,” for coating 2, the results of which are reported in Table 8.

SAS program 5: SAS program for the combined cumulative logit analysis of the ASTM score for the five coatings, the results of which are reported in Table 11.

ACKNOWLEDGMENTS

This work was supported by the Royal Society International Joint Project number 2007/R2. The authors thank the participants in the Flanders’ Drive project for allowing them to use the data.

[Received May 2010. Revised March 2012.]


REFERENCES

Agresti, A. (2002), Categorical Data Analysis (2nd ed.), New York: Wiley. [348]
——— (2010), Analysis of Ordinal Categorical Data (2nd ed.), New York: Wiley. [348]
Albert, A., and Anderson, J. A. (1984), “On the Existence of Maximum Likelihood Estimates in Logistic Regression Models,” Biometrika, 71, 1–10. [348]
Allison, P. D. (2008), “Convergence Failures in Logistic Regression,” Working Paper No. 360-2008, SAS Global Forum, Available at http://www2.sas.com/proceedings/forum2008/360-2008.pdf [348]
Arnouts, H., Goos, P., and Jones, B. (2010), “Design and Analysis of Industrial Strip-Plot Experiments,” Quality and Reliability Engineering International, 26, 127–136. [340,353]
Atkinson, A. C., Donev, A. N., and Tobias, R. D. (2007), Optimum Experimental Designs, With SAS, New York: Wiley. [351]
Bailey, R. A. (2007), Design of Comparative Experiments, Cambridge: Cambridge University Press. [341]
Bliemer, M. C. J., Rose, J. M., and Hess, S. (2008), “Approximation of Bayesian Efficiency in Experimental Choice Designs,” Journal of Choice Modelling, 1, 98–127. [353]
Chipman, H., and Hamada, M. (1996), “Bayesian Analysis of Ordered Categorical Data From Industrial Experiments,” Technometrics, 38, 1–10. [348]
Cornell, J. (2002), Experiments With Mixtures (3rd ed.), New York: Wiley. [343]
Dasgupta, T., Ma, C., Joseph, V. R., Wang, Z. L., and Wu, C. F. J. (2008), “Statistical Modeling and Analysis for Robust Synthesis of Nanostructures,” Journal of the American Statistical Association, 103, 594–603. [340]
Gilmour, S. G., and Goos, P. (2009), “Analysis of Data From Nonorthogonal Multi-Stratum Designs in Industrial Experiments,” Applied Statistics, 58, 467–484. [343]
Gilmour, S. G., and Trinca, L. A. (2003), “Row-Column Response Surface Designs,” Journal of Quality Technology, 35, 184–193. [340,353]
Goos, P. (2002), The Optimal Design of Blocked and Split-Plot Experiments, New York: Springer. [340]
Goos, P., and Jones, B. (2011), Optimal Design of Experiments: A Case Study Approach, New York: Wiley. [340]
Gotwalt, C. M., Jones, B. A., and Steinberg, D. M. (2009), “Fast Computation of Designs Robust to Parameter Uncertainty for Nonlinear Settings,” Technometrics, 51, 88–95. [353]
Jeng, S-L, Joseph, V. R., and Wu, C. F. J. (2008), “Modeling and Analysis Strategies for Failure Amplification Method,” Journal of Quality Technology, 40, 128–139. [340]
Jones, B., and Goos, P. (2007), “A Candidate-Set-Free Algorithm for Generating D-Optimal Split-Plot Designs,” Applied Statistics, 56, 347–364. [344,353]
——— (2009), “D-Optimal Design of Split-Split-Plot Experiments,” Biometrika, 96, 67–82. [340,353]
Joseph, V. R., and Wu, C. F. J. (2004), “Failure Amplification Method: An Information Maximization Approach to Categorical Response Optimization” (with discussion), Technometrics, 46, 1–31. [340]
Kenward, M. G., and Roger, J. H. (1997), “Small Sample Inference for Fixed Effects From Restricted Maximum Likelihood,” Biometrics, 53, 983–997. [346]
Kessels, R., Jones, B., Goos, P., and Vandebroek, M. (2009), “An Efficient Algorithm for Constructing Bayesian Optimal Choice Designs,” Journal of Business and Economic Statistics, 27, 279–291. [353]
Lee, Y., and Nelder, J. A. (2003), “Robust Design via Generalized Linear Models,” Journal of Quality Technology, 35, 2–12. [340]
Letsinger, J. D., Myers, R. H., and Lentner, M. (1996), “Response Surface Methods for Bi-Randomization Structures,” Journal of Quality Technology, 28, 381–397. [340,345]
Lohr, S. L. (1995), “Hasse Diagrams in Statistical Consulting and Teaching,” The American Statistician, 49, 376–381. [341]
Mead, R., Gilmour, S. G., and Mead, A. (2012), Statistical Principles for the Design of Experiments, Cambridge: Cambridge University Press. [342]
Mee, R. W., and Bates, R. L. (1998), “Split-Lot Designs: Experiments for Multistage Batch Processes,” Technometrics, 40, 127–140. [340]
Myers, R. H., Montgomery, D. C., and Vining, G. G. (2002), Generalized Linear Models: With Applications in Engineering and the Sciences, New York: Wiley. [340]
Paniagua-Quiñones, C., and Box, G. E. P. (2009), “A Post-Fractionated Strip-Strip-Block Design for Multi-Stage Processes,” Quality Engineering, 21, 156–167. [340]
Robinson, T. J., Myers, R. H., and Montgomery, D. C. (2004), “Analysis Considerations in Industrial Split-Plot Experiments When the Responses Are Nonnormal,” Journal of Quality Technology, 36, 180–192. [340]
Robinson, T. J., Wulff, S., Khuri, A. I., and Montgomery, D. C. (2006), “Robust Parameter Design Using Generalized Linear Mixed Models,” Journal of Quality Technology, 38, 65–75. [340]
Sándor, Z., and Wedel, M. (2002), “Profile Construction in Experimental Choice Designs for Mixed Logit Models,” Marketing Science, 21, 455–475. [353]
Taylor, W. H., and Hilton, H. G. (1981), “A Structure Diagram Symbolization for Analysis of Variance,” The American Statistician, 35, 85–93. [341]
Trinca, L. A., and Gilmour, S. G. (2001), “Multi-Stratum Response Surface Designs,” Technometrics, 43, 25–33. [340,353]
Vivacqua, C. A., and Bisgaard, S. (2009), “Post-Fractionated Strip-Block Designs,” Technometrics, 51, 47–55. [340]
Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006), “Designs for Generalized Linear Models With Several Variables and Model Uncertainty,” Technometrics, 48, 284–292. [351,353]
Woods, D. C., and van de Ven, P. (2011), “Blocked Designs for Experiments With Correlated Nonnormal Response,” Technometrics, 53, 173–182. [353]
Yu, J., Goos, P., and Vandebroek, M. (2009), “Efficient Conjoint Choice Designs in the Presence of Respondent Heterogeneity,” Marketing Science, 28, 122–135. [353]