TM-1-2 by everium llc

Measures of Central Tendency

Although a frequency distribution is helpful in organizing data, there are occasions when you need a brief way of describing the performance of a group. The range of scores discussed earlier is one such brief description. Understand, though, that two sets of data with the same range may differ greatly. Consider the class described in Table 1-1. The range of sit-up scores for that class (Class A) was 57 (80-23=57). Suppose another class (Class B) had sit-up scores ranging from 40 to 97. The range in that class would also be 57 (97-40=57). A graphical representation of the sit-up scores for the two classes is given in Figure 1-1.

Would you say that the students in Class A did about the same as those in Class B? Would you expect a "typical" student from Class A and a "typical" student from Class B to have scored the same number of sit-ups? Obviously, the students in Class B are much better at doing sit-ups. Consequently, in this case the range is really not that descriptive. What is needed is a way to find a "typical" or "representative" score for each class. A score which best represents all the scores of a group is called a measure of central tendency, or an average.

Figure 1-1 Sit-up scores for two classes

Mean

The most common and most useful measure of central tendency is the mean. In everyday language the terms "average" and "mean" are often used interchangeably. To compute the mean of a set of scores, add them and then divide by the number of scores. Thus, both the number of scores and the size of each score are used in determining the mean. The symbol for the mean of a population is μ and the symbol for the mean of a sample is $\bar{X}$. Table 1-4 illustrates the procedure for finding the mean push-up score for 6 members of a class of 9th grade girls. Note the special symbol "Σ" (sigma), which means "find the sum of". Of course, X is the symbol for a raw score and N represents the number of all the raw scores in the study.

Table 1-4 Computation of the mean push-up score for six 9th grade girls

Number of Push-ups (X)
    35
    27
    25
    22
    22
    13
ΣX = 144

$$\bar{X} = \frac{\Sigma X}{N}$$

$$\text{Mean} = \frac{\text{sum of all } X\text{s}}{\text{number of all } X\text{s}} = \frac{144}{6} = 24$$
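The computation in Table 1-4 can be sketched in a few lines of Python. This is only an illustration; the names `push_ups` and `mean_of_scores` are made up for the example and are not part of the text.

```python
# Raw push-up scores (X) for the six 9th grade girls in Table 1-4.
push_ups = [35, 27, 25, 22, 22, 13]


def mean_of_scores(scores):
    """Return the mean: the sum of all Xs divided by the number of Xs."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


total = sum(push_ups)                   # sum of all Xs
n = len(push_ups)                       # N, the number of scores
push_up_mean = mean_of_scores(push_ups)

print(total, n, push_up_mean)
```

The same function works for any list of raw scores, so long as the list is not empty.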

Sometimes, when you have a large number of scores, it is more convenient to group the data before finding the mean. On other occasions, the original data used in making the frequency distribution may not be available. In these cases, you need a formula and procedure for finding the mean of a set of grouped data. Table 1-5 illustrates the procedure for finding the mean for the sit-up data given in Table 1-3. Hey! Pay attention here…I am giving you the keys to the universe here.

Table 1-5 Procedure for computing the mean sit-up score of the class of 9th grade boys (Grouped Data)

S.I.     Tally           ƒ      d     ƒd
78-82    /               1      6      6
73-77    //              2      5     10
68-72    ////            4      4     16
63-67    ///             3      3      9
58-62    ///// //        7      2     14
53-57    ///// /         6      1      6   (+61)
48-52    ///// /////    10      0      0
43-47    ///// /         6     -1     -6
38-42    ////            4     -2     -8
33-37    ///             3     -3     -9
28-32    //              2     -4     -8
23-27    //              2     -5    -10   (-41)
                       N=50           Σƒd = 20

The formula for determining the mean for grouped data is as follows. Don't "freak out" on me; it is not that difficult. Let me walk you through it slowly.

$$\bar{X} = GM + \left(\frac{\Sigma fd}{N}\right) I$$

$\bar{X}$   This is the mean for grouped data…that is what we are calculating here.
GM    This is simply the symbol for the guessed mean.
Σƒd   This represents the sum of the ƒd column.
N     Represents the number of cases in the study.
I     Represents the step interval you selected.
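For readers who like to check formulas by machine, here is a minimal sketch of the grouped-data formula in Python, using the step intervals and frequencies from Table 1-5. The function name `grouped_mean` and the variable names are hypothetical, and the sketch assumes equal-width intervals and a guessed mean that is the middle score of one interval.

```python
# Step intervals (low, high) and frequencies (f) copied from Table 1-5.
step_intervals = [
    (78, 82), (73, 77), (68, 72), (63, 67), (58, 62), (53, 57),
    (48, 52), (43, 47), (38, 42), (33, 37), (28, 32), (23, 27),
]
frequencies = [1, 2, 4, 3, 7, 6, 10, 6, 4, 3, 2, 2]


def grouped_mean(intervals, freqs, guessed_mean):
    """Mean of grouped data: GM + (sum of fd / N) * I.

    Assumes equal-width intervals and a guessed mean (GM) that is
    the middle score of one of the intervals.
    """
    interval_size = intervals[0][1] - intervals[0][0] + 1  # I (5 here)
    n = sum(freqs)                                          # N
    sum_fd = 0
    for (low, high), f in zip(intervals, freqs):
        midpoint = (low + high) // 2
        # d = how many step intervals this one lies above or below GM
        d = (midpoint - guessed_mean) // interval_size
        sum_fd += f * d
    return guessed_mean + sum_fd * interval_size / n


result = grouped_mean(step_intervals, frequencies, guessed_mean=50)
print(result)
```

Calling the function with a different guess, such as 60 (the middle score of 58-62), returns the same mean; only the d and ƒd values along the way change.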

Now just plug the values into the formula. The sum of the ƒd column (Σƒd) is 20, the number of cases (N) in the example is 50, the step interval (I) is 5 and the guessed mean (GM) is 50.

$$\bar{X} = GM + \left(\frac{\Sigma fd}{N}\right) I = 50 + \left(\frac{20}{50}\right) 5$$

$$\bar{X} = 50 + (.40)(5)$$

$$\bar{X} = 50 + 2.00$$

$$\bar{X} = 52.00$$

Let me walk you through this thing step by step and you will see it is no big deal. You will notice that the first three columns (headed S.I., Tally and ƒ) are a duplication of Table 1-3. The following steps were used to construct the rest of the table and compute the mean. That's right; I did that for you too. Let's see how I did it. Pay attention to what I am about to tell you. It is not that difficult. Ready?

1. First we construct a frequency distribution (see Table 1-3). We already did that…right? If you don't remember, go back and look at Table 1-3. Go ahead, just don't take all day. I will wait right here for you. Good!

2. Our next step is to "guess" the mean. Take a guess at what the mean is by looking over the data. For convenience, researchers usually choose the middle score of some step interval. It makes no difference which interval is chosen, but the computation is much easier if your guess is close to the true mean. In this case we chose 50, the middle score of 48-52, as the guessed mean (GM).

3. Form the d column. On the line with the S.I. which contains the GM, place a "0". Number the S.I.'s above this one "1, 2, 3, ..." and the S.I.'s below it "-1, -2, -3, ..."

4. Form the ƒd column. Each entry in this column is the product of the corresponding entries in the ƒ and d columns. Notice that some of the products are negative.

5. Find Σƒd. Sum the ƒd column.

6. Now use the following formula to find the mean for grouped data:

$$\bar{X} = GM + \left(\frac{\Sigma fd}{N}\right) I$$

$$\text{Mean} = \text{Guessed Mean} + \left(\frac{\text{sum of } fd \text{ column}}{\text{number of scores}}\right) \times \text{interval size}$$

Here is something that might interest you. The main weakness of the mean is that, with a small sample, extreme scores have a misleading effect. In the following group of scores the mean is 10, and it is representative of the group:

 X
12
11
10
 9
 8
ΣX = 50

$$\bar{X} = \frac{\Sigma X}{N} = \frac{50}{5} = 10$$

However, if we change the highest score from 12 to 30 in this group of scores, the mean becomes 13.6, which raises the measure of central tendency considerably.

 X
30
11
10
 9
 8
ΣX = 68

$$\bar{X} = \frac{\Sigma X}{N} = \frac{68}{5} = 13.6$$

The mean of 13.6 is not representative of the group since it is higher than every score but one. Although this example may be a bit extreme, it illustrates the point that one should always consider the number of cases and look at the data, rather than blindly trusting statistics. In fact, don't trust anything or anybody…your mother probably told you that when you were a kid. Now I am telling you that. If you didn't listen to her, you better listen to me. Remember, I am giving you a grade when all of this is said and done. YES! I do accept bribes, just don't tell the dean. To be a good measure of central tendency, the mean should be in the middle of the scores. I know I already said that, but some things are worth saying more than once.

Mode

Another measure of central tendency is the mode. The mode is the score that occurs most frequently. The mode is a rough measure and is used more for description than for exact analysis. To locate the mode when the scores are ungrouped, you merely need to locate the score or scores that appear most often. Because there may be more than one mode, the mode is the least reliable of the three measures of central tendency that are generally used.

For the class of 9th grade boys, we may obtain the mode by looking at either Table 1-1 or Table 1-2. Look at Table 1-1 and see if you can find the mode. Now look at Table 1-2 and see if you were right. Go ahead and do it…damn it. This is not your mother talking here, and don't come back here until you find it. It was 50…right! Of course it is…the mode is 50 since it is the score with the highest frequency. In case two or more scores tie for the highest frequency, the distribution is bimodal or multimodal. Anywho, for data presented in a frequency distribution, it may be convenient to determine the modal interval. In Table 1-3 we see that the interval 48-52 is the modal interval since it has the highest frequency.

Median

A third measure of central tendency is the median. The median is the middle score of a set of scores. Half the scores fall above it and half below it. For example, the set of data 2, 3, 5, 11, 21 has 5 as its median since there are two scores above 5 and two scores below 5. For the set of data 2, 3, 5, 8, 11, 21 there is no score in the distribution which has the same number of scores above it and below it. Score 5 has two below it and three above it, while 8 has three below it and two above it. The median of this distribution is the mean of its two middle scores: 5+8 = 13, divided by 2, which is 6.5. Isn't that neat how that works?

Extreme scores do not greatly affect the median. Take the same five scores we used to demonstrate the influence of extreme scores on the mean. Remember, we changed the highest score from 12 to 30 and the mean became 13.6, which raised the measure of central tendency considerably. Well, look what happens when we use the median as our measure of central tendency. The median is 10 for both groups of data, so you can plainly see that the extreme score of 30 did not affect the median. Again, extreme scores do not greatly affect the median.

12    30
11    11
10    10    (median for both data sets)
 9     9
 8     8
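The comparison above is easy to check with Python's standard `statistics` module. This is just a sketch; the list names are invented for the illustration.

```python
from statistics import mean, median

# The five scores used in the text, before and after the highest
# score is changed from 12 to 30.
original_scores = [12, 11, 10, 9, 8]
with_extreme_score = [30, 11, 10, 9, 8]

# The six-score set from the text, which has no single middle score.
even_count_scores = [2, 3, 5, 8, 11, 21]

# The mean moves from 10 to 13.6; the median stays at 10.
print("original:    mean", mean(original_scores),
      "median", median(original_scores))
print("with the 30: mean", mean(with_extreme_score),
      "median", median(with_extreme_score))

# With an even number of scores, median() averages the two middle
# scores (5 and 8), which gives 6.5.
print("even count:  median", median(even_count_scores))
```

Note that `median` sorts the scores for you, so the lists do not have to be in order.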

With a large number of scores, the mean is not influenced a great deal by extreme scores either. In fact, the median is the preferred measure of central tendency only when the scores do not conform to the normal distribution. In other words, if the scores cluster at either end of the distribution, the median may be a better central tendency measure. When scores cluster at the upper or lower ends, rather than the middle, the distribution is said to be skewed. We will discuss skewness later in the book.

Which Measure of Central Tendency?

Which measure of central tendency should we use? I am glad you asked. We compute a measure of central tendency because we want one score which will best represent all the scores of the group. However, Table 1-6 illustrates that the mode, median, and mean for a set of data are often different, and each can be called an "average". The measure of central tendency you select will depend on how you plan to use it.

Table 1-6 Measures of central tendency for sit-up and push-up data

Exercise     N     Mode    Median    Mean
Sit-ups      50    50      50        52.00
Push-ups     6     22      23.5      24.00

Looking at Table 1-6, if your intention is to impress another teacher, you would probably use the mean as your measure of central tendency for both sit-ups and push-ups since, in this case, it is the largest of the three measures. If, on the other hand, you were disappointed in your class's performance and wanted to motivate them to work harder, you might choose the mode because it is the lowest of the three.

One common example of choosing the measure of central tendency that best supports your position is in regard to salaries. Suppose the teachers at one school wanted a raise, so they pointed out that their average salary was $24,200. The school board countered by stating that the average salary at that school was almost $25,200. The actual salaries are given in Table 1-7. Both groups were using an average salary, but each picked the measure of central tendency which best supported their case. To avoid misunderstandings, you should specify which measure you are using and be familiar with the proper use of each.

Table 1-7 Teacher Salaries

Salary     # of Teachers
24,100     3
24,200     6
24,300     5
24,400     4
25,200     2
41,500     1

Mode: $24,200
Median: $24,300
Mean: $25,167

As mentioned, the mode is the easiest measure of central tendency to compute, but it is often less descriptive and not very useful. It can be used when you need a "typical" score or a quick rough average.

The median is relatively easy to compute and locates the score in the exact middle of your distribution. Since the median is not affected by extremely high or low scores, it is most useful when a distribution contains a few extreme scores. For the data in Table 1-7, the median salary is $24,300. The teacher making $41,500 could have his salary decreased to $20,000 or increased to $150,000 and the median would not be affected, since the middle salary in the distribution would still be $24,300. The major disadvantage of the median is that it cannot be used in further computations.

The mean is the most commonly used of the three measures of central tendency. It is the most stable and usually is the "truest" of the measures. It is an algebraic procedure and, unlike the mode and median, can be used in subsequent statistical procedures. In short, the mean is generally the best measure of central tendency and should always be used unless there is a special reason not to use it.
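To see the three measures side by side, the sketch below computes them from the salary levels and teacher counts listed in Table 1-7, using Python's standard `statistics` module. The helper names `expand` and `with_top_salary` are hypothetical and exist only for this illustration.

```python
from statistics import mean, median, multimode

# Table 1-7 as (salary, number of teachers) pairs.
salary_table = [
    (24100, 3), (24200, 6), (24300, 5),
    (24400, 4), (25200, 2), (41500, 1),
]


def expand(table):
    """Turn (salary, count) pairs into one salary per teacher."""
    return [salary for salary, count in table for _ in range(count)]


def with_top_salary(salaries, new_top):
    """Copy of the salaries with the single highest one replaced."""
    changed = sorted(salaries)
    changed[-1] = new_top
    return changed


salaries = expand(salary_table)

salary_modes = multimode(salaries)   # a list, since ties are possible
salary_median = median(salaries)
salary_mean = mean(salaries)

print("mode(s):", salary_modes)
print("median: ", salary_median)
print("mean:   ", round(salary_mean, 2))

# Changing only the top salary moves the mean but not the median.
for new_top in (20000, 150000):
    changed = with_top_salary(salaries, new_top)
    print(new_top, "-> median", median(changed),
          "mean", round(mean(changed), 2))
```

`multimode` is used rather than `mode` so that a bimodal or multimodal distribution would show every tied score instead of just one.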