





Frequency Distributions: Tabulating and Displaying Data
Researchers constructing a frequency distribution manually list the data values (the Xs) in a column in the desired order, and then keep a tally next to each value for each occurrence of that value. In Table 2, the tallies are shown in the second column, using the familiar system of four vertical bars and then a slash for the fifth case. The tallies can then be totaled, yielding the frequency (f) or count of the number of cases for each data value.
In constructing a frequency distribution, researchers must make sure that the list of data values is mutually exclusive and collectively exhaustive. The sum of the frequencies must equal the number of cases in the sample.
Σf = N
where Σ = the sum of, f = the frequencies, and N = the sample size
This equation simply states that the sum of (symbolized by the Greek letter sigma, Σ) all the frequencies of score values (f) equals the total number of study participants (N).
A frequency count of data values usually communicates little information in and of itself. In Table 2, the fact that five patients had a heart rate of 70 bpm is not very informative without knowing how many patients there were in total, or how many patients had lower or higher heart rates. Because of this fact, frequency distributions almost always show not only absolute frequencies (i.e., the count of cases), but also relative frequencies, which indicate the percentage of times a given value occurs. The far right column of Table 2 indicates that 5% of the sample had a heart rate of 70. Percentages are useful descriptive statistics that appear in the majority of research reports. A percentage can be calculated easily, using the following simple formula:
% = (f ÷ N) × 100
That is, the percentage for a given value or score is the frequency for that value, divided by the number of people, times 100. The sum of all percentages must equal 100% (i.e., Σ% = 100%). You will probably recall that a proportion is the same as a percentage, before multiplying by 100 (i.e., proportion = f ÷ N).
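The tallying and percentage steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heart rate values are invented for the example (they are not the Table 2 data), and the function name frequency_table is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, f, percent) rows in ascending order of value."""
    counts = Counter(values)      # tally: data value -> frequency (f)
    n = len(values)               # sample size (N)
    rows = []
    for value in sorted(counts):
        f = counts[value]
        rows.append((value, f, f * 100 / n))   # % = (f / N) * 100
    return rows

# Hypothetical heart rates (bpm) for 10 patients; NOT the Table 2 data
heart_rates = [66, 70, 66, 58, 70, 66, 72, 66, 70, 58]
table = frequency_table(heart_rates)

for value, f, pct in table:
    print(f"{value:>3}  f = {f}  {pct:.1f}%")

# The two checks named in the text: frequencies sum to N, percentages to 100
total_f = sum(f for _, f, _ in table)
total_pct = sum(pct for _, _, pct in table)
```

In this made-up sample, total_f equals the number of cases and total_pct equals 100, which is the same consistency check a researcher would apply to a hand-built table.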
Of course, researchers rarely use a tally system or manually compute percentages with their dataset. In SPSS and other statistical software packages, once the data have been entered and variable information has been input, you can proceed to run analyses by using pull-down menus that allow you to select which type of analysis you want to run. For the analyses described in this chapter, you would click on Analyze in the top toolbar, then select Descriptive Statistics from the pull-down menu, then Frequencies. Another commonly used descriptive statistic is cumulative relative frequency, which combines the percentage for the given score value with percentages for all values that preceded it in the distribution. To illustrate, the heart rate data have been analyzed on a computer using SPSS, and the resulting computer printout is presented in Figure 1. (The SPSS commands that produced the printout in Figure 1 are Analyze ➞ Descriptive Statistics ➞ Frequencies ➞ hartrate.)
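For readers working outside SPSS, cumulative relative frequency can be sketched as a running total of the frequencies, converted to a percentage of N. The data and the function name cumulative_percents below are assumptions made for illustration; this is not the SPSS procedure itself.

```python
from collections import Counter
from itertools import accumulate

def cumulative_percents(values):
    """Return (value, cumulative percent) pairs in ascending order of value."""
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)
    # Running total of f for each value and all values that precede it
    running = accumulate(counts[v] for v in ordered)
    return [(v, cum_f * 100 / n) for v, cum_f in zip(ordered, running)]

# Hypothetical heart rates (bpm); NOT the data analyzed in Figure 1
heart_rates = [66, 70, 66, 58, 70, 66, 72, 66, 70, 58]
cum = cumulative_percents(heart_rates)

for value, pct in cum:
    print(f"{value:>3}  cumulative {pct:.1f}%")
```

The last cumulative percentage is always 100, because by that point every case has been counted.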
Grouped Frequencies
FIGURE 2 SPSS printout of a grouped frequency distribution.
for research reports. Two common approaches are ascending or descending order of the frequencies, and alphabetical order of the categories. We ordered the categories in Table 3 in descending order of frequency.
GRAPHIC DISPLAY OF FREQUENCY DISTRIBUTIONS
Frequency distributions can be presented either in a table, as in Tables 2 and 3, or graphically. Graphs have the advantage of communicating information quickly, but are not common in journal articles because of space constraints. By contrast, graphs are excellent means of communicating information in oral and poster presentations at conferences. They can also be useful to researchers themselves early in the analysis process when they are trying to understand their data.
Bar Graphs and Pie Charts
When a variable is measured on a nominal scale, or on an ordinal scale with a small number of values, researchers can construct a bar graph to display frequency information. An example of a bar graph for the marital status data from Table 3 is presented in Figure 3. A bar graph consists of two dimensions: a horizontal dimension (the X axis) and a vertical dimension (the Y axis). In a bar graph, the categories are typically listed along the horizontal X axis, and the frequencies or percentages are displayed on the vertical Y axis. The bars above each category are drawn to the height that indicates the frequency or relative frequency for that category. In a bar graph for categorical data, the bars for adjacent categories should be drawn not touching each other; each bar width and the distance between bars should be equal. Researchers sometimes indicate exact percentages at the top of the bars, as shown in Figure 3.
TABLE 3 Frequency Distribution of a Nominal-Level Variable: Marital Status of Study Participants
FIGURE 3 Example of an SPSS bar graph for a nominal-level variable.
An alternative to a bar graph is a pie chart (sometimes called a circle graph), which is a circle divided into pie-shaped wedges corresponding to the percentages. Figure 4 presents an SPSS-generated pie chart for the marital status data. All the pieces of the pie must add up to 100%. The pie wedges are generally ordered from highest to lowest frequency, with the largest segment beginning at “12 o’clock.”2
FIGURE 4 Example of an SPSS pie chart for a nominal-level variable.
2 To produce Figures 3 and 4, we used the SPSS commands Analyze ➜ Descriptive Statistics ➜ Frequencies ➜ Charts for the variable marstat, opting for the Bar Chart option first and the Pie Chart option next. We could also have used the commands Graphs ➜ Legacy Dialogs ➜ Bar (or Pie).
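The arithmetic behind a pie chart can be sketched as follows. The counts are hypothetical (not the Table 3 data), the function name pie_wedges is made up for this example, and the conversion of a percentage to degrees (a share of the 360° circle) is ordinary geometry rather than something stated in the text.

```python
def pie_wedges(counts):
    """Order categories from highest to lowest frequency and return
    (category, percent, degrees) for each wedge of the pie."""
    n = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Each wedge takes the same share of 360 degrees as its share of N
    return [(cat, f * 100 / n, f * 360 / n) for cat, f in ordered]

# Hypothetical marital status counts; NOT the Table 3 data
marital_status = {"Single": 25, "Married": 50, "Widowed": 10, "Divorced": 15}
wedges = pie_wedges(marital_status)

for category, pct, degrees in wedges:
    print(f"{category:<10} {pct:5.1f}%  {degrees:6.1f} degrees")
```

The same ordered percentages could serve as bar heights in a bar graph; only the display differs.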
General Issues in Graphic Displays
Graphic displays of frequency distributions can communicate information at a glance, but graphs can be constructed in such a way that the information is misleading or ineffective. One issue concerns the grouping of values in a grouped distribution. If the heart rate data were clustered into two class intervals, for example (55–64 and 65–74), the resulting histogram or frequency polygon would not be especially informative. Another issue concerns the height and width of the display. The American Psychological Association (2001) has published guidelines that are used by many nursing research journals. These guidelines suggest that the height of a graph (i.e., the height at the highest frequency) should be about two thirds the width of the X axis.
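The effect of the grouping decision can be seen by counting the same values with two different class-interval widths. The sketch below assumes integer-valued data, uses invented heart rates, and introduces a hypothetical helper named grouped_frequencies; intervals that contain no cases are simply omitted from its output.

```python
from collections import Counter

def grouped_frequencies(values, start, width):
    """Count integer values in class intervals of equal width,
    beginning at `start`. Returns (label, f) rows; empty intervals
    are not listed."""
    counts = Counter((v - start) // width for v in values)
    rows = []
    for k in sorted(counts):
        low = start + k * width
        rows.append((f"{low}-{low + width - 1}", counts[k]))
    return rows

# Hypothetical heart rates (bpm); NOT the chapter's dataset
heart_rates = [55, 58, 61, 63, 64, 66, 66, 67, 70, 72, 74]

wide = grouped_frequencies(heart_rates, start=55, width=10)
narrow = grouped_frequencies(heart_rates, start=55, width=5)
print(wide)
print(narrow)
```

With only two wide intervals, nearly all detail about where the values cluster is lost, which is the concern raised above; narrower intervals preserve more of the distribution's shape.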
SHAPES OF DISTRIBUTIONS
Distributions of quantitative variables can be described in terms of a number of features, many of which are related to the distributions’ physical appearance or shape when presented graphically.
Modality
The modality of a distribution concerns how many peaks or high points there are. A distribution with a single peak—that is, one value with a high frequency—is a unimodal distribution. The distribution of heart rate data (Figure 7) is unimodal, with a single peak at the value of 66.
Multimodal distributions have two or more peaks, and when there are exactly two peaks, the distribution is bimodal. Figure 8 presents six distributions with different shapes. In this figure, the distributions labeled A, E, and F are unimodal, while B, C, and D are multimodal. Distributions B and D have two peaks, and thus can also be described as bimodal.
FIGURE 8 Examples of distributions with different shapes.
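Modality is usually judged by eye, but the idea of counting peaks can be illustrated with a simple rule: treat a value as a peak when its frequency is higher than the frequencies on both sides. This rule, the function name count_peaks, and the frequencies below are assumptions for illustration only; with real, noisy data such a rule would flag many small bumps that a researcher would ignore.

```python
def count_peaks(frequencies):
    """Count strict local maxima in a list of frequencies ordered by
    data value. Flat-topped plateaus are not counted by this rule."""
    padded = [0] + list(frequencies) + [0]   # treat both ends as bordered by 0
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Hypothetical frequency columns, listed from lowest to highest data value
unimodal = [1, 3, 6, 9, 6, 3, 1]
bimodal = [2, 7, 3, 1, 3, 7, 2]

print(count_peaks(unimodal), count_peaks(bimodal))
```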
Symmetry and Skewness
Another aspect of a distribution’s shape concerns symmetry. A distribution is symmetric if the distribution could be split down the middle to form two halves that are mirror images of one another. In Figure 8, distributions A through C are symmetric, while D through F are not.
Distributions of actual study data are rarely as perfectly symmetric as those shown in Figure 8. For example, the distribution of heart rate values in Figure 7 is roughly symmetric, and researchers would likely characterize the data as symmetrically distributed. Minor departures from perfect symmetry are usually ignored when describing the shapes of data distributions.
In asymmetric distributions, the peaks are off center, with the bulk of scores clustering at one end, and a tail trailing off at the other end. Such distributions are often described as skewed, and can be described in terms of the direction of the skew. When the longer tail trails off to the right, as in D and E of Figure 8, this is a positively skewed distribution. An example of an attribute that is positively skewed is annual income. In most countries, most people have low or moderate incomes and would cluster to the left, and the relatively small numbers in upper income brackets would be distributed in the tail. When a skewed distribution has a long tail pointing to the left (Figure 8, F), this is a negatively skewed distribution. For example, if we constructed a frequency polygon for the variable age at death, we would have a negatively skewed distribution: Most people would be at the far right side of the distribution, with relatively few people dying at a young age.
Skewness and modality are independent aspects of a distribution’s shape. As Figure 8 shows, a distribution can be multimodal and skewed (D), or multimodal and symmetric (B and C)—as well as unimodal and skewed (E and F), or unimodal and symmetric (A).
Statisticians have developed methods of quantifying a distribution’s degree of skewness. These skewness indexes are rarely reported in research reports, but they can be useful for evaluating whether statistical tests are appropriate. A skewness index can readily be calculated by most statistical computer programs in conjunction with frequency distributions. The index has a value of 0 for a perfectly symmetric distribution, a positive value if there is a positive skew, and a negative value if there is a negative skew. For the heart rate data (Figure 7), the skewness index is −.20, indicating a very modest negative skew.
TIP: In SPSS, if you request information about skewness within the Frequency procedure, you will get a value for both the skewness index and a standard error. As a rough guide, a skewness index that is more than twice the value of its standard error can be interpreted as a departure from symmetry. In our example, the skewness index of −.20 was smaller in absolute value than the standard error (.24), indicating that the heart rate distribution is not markedly skewed.
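The rough guide in the TIP can be sketched in code. The text does not say which formulas SPSS uses, so the two below are assumptions: the sample-adjusted (Fisher-Pearson) skewness coefficient and its usual standard error, which depends only on N. The function names are hypothetical. Under the further assumption that the heart rate sample has N = 100 (which the earlier statement that five patients made up 5% of the sample implies), this standard error formula gives about .24, matching the value reported above.

```python
import math

def skewness_index(values):
    """Sample-adjusted (Fisher-Pearson) skewness coefficient."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)     # sum of squared deviations
    sc = sum((x - mean) ** 3 for x in values)     # sum of cubed deviations
    s = math.sqrt(ss / (n - 1))                   # sample standard deviation
    return n * sc / ((n - 1) * (n - 2) * s ** 3)

def skewness_standard_error(n):
    """Standard error of the skewness index for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def departs_from_symmetry(skew, se):
    """Rough guide from the TIP: index more than twice its standard error."""
    return abs(skew) > 2 * se

se_100 = skewness_standard_error(100)             # assumes N = 100
print(round(se_100, 2))
print(departs_from_symmetry(-0.20, 0.24))         # values reported in the text
```

The comparison uses the absolute value of the index, so the same rule applies to positive and negative skew.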
Kurtosis
A third aspect of a distribution’s shape concerns how pointed or flat its peak is—that is, the distribution’s kurtosis. Two distributions with different peakedness are superimposed on one another in Figure 9. Distribution A in this figure is more peaked, and
FIGURE 10 SPSS printout of a frequency distribution with wild codes and missing data.
the sample) had a missing value (in this case, a “system missing” or blank) on the mammogram question.
4. Testing assumptions for statistical tests. Many widely used inferential statistics are based on a number of assumptions. In statistics, an assumption is a condition that is presumed to be true and, when ignored or violated, can lead to misleading or invalid results. Many inferential statistics assume, for example, that variables in the analysis (usually the dependent variables) are normally distributed. Frequency distributions and the associated indexes for skewness and kurtosis provide researchers with information on whether the key research variables conform to this assumption—although there are additional ways to examine this. When variables are not normally distributed, researchers have to choose among three options: (1) Select a statistical test that does not assume a normal distribution; (2) Ignore the violation of the assumption—an option that is attractive if the deviation from normality is modest; or (3) Transform the variable to better approximate a distribution that is normal. Various data transformations can be applied to alter the distributional qualities of a variable, and the transformed variable can be used in subsequent analyses. Some data transformation suggestions are shown in Table 4.
5. Obtaining information about sample characteristics. Frequency distributions are used to provide researchers with descriptive information about the background characteristics of their sample members. This information is often of importance in interpreting the results and drawing conclusions about the ability to generalize the findings. For example, if a frequency distribution revealed that 80% of study participants were college graduates, it would be imprudent to generalize the findings to less well-educated people.
6. Directly answering research questions. Although researchers typically use inferential statistics to address their research questions, descriptive statistics are sometimes used to summarize substantive information in a study. For example, Lauver, Worawong, and Olsen (2008) asked a sample of primary care patients what their health goals were. They presented several descriptive tables with frequency and relative frequency (percentage) information. For instance, as their primary health goal, 40% of participants (N = 24) said they wanted to get in better shape and 30% (N = 18) wanted to lose weight. Only 6.7% (N = 4) mentioned the desire to manage stress as their primary goal.
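The third option under item 4 above, transforming a variable, can be illustrated with a small sketch. Table 4 is not reproduced here, so the square root and logarithmic transformations below are offered as two common choices for positively skewed data, not as a restatement of that table. The income values are invented, and the skewness function is a simple moment-based version written for this example.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical annual incomes (in $1,000s) with a long right tail
incomes = [12, 15, 18, 20, 22, 25, 30, 40, 75, 200]

sqrt_incomes = [math.sqrt(x) for x in incomes]    # milder correction
log_incomes = [math.log10(x) for x in incomes]    # stronger correction

for label, data in [("raw", incomes), ("sqrt", sqrt_incomes), ("log10", log_incomes)]:
    print(f"{label:<6} skewness = {skewness(data):.2f}")
```

For these made-up values, each transformation pulls in the right tail and lowers the skewness, with the logarithm having the larger effect. Both transformations require positive values (the square root also accepts zero), so a constant is sometimes added first when the data include zeros or negatives.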
frequency graph or table can be quite efficient. Even though graphs require considerable space, they do have an arresting quality that captures people’s attention, and so are preferred in any type of oral presentation where space constraints are not an issue. They also can be very effective if used sparingly in reports to emphasize or clarify important pieces of information.
Tips on Preparing Tables and Graphs for Frequency Distributions
Although frequency distributions are not often presented in tables or graphs in research reports, following are a few tips for preparing them. Some of these tips also apply to displays of other statistical information.
• When percentage information is being presented, it is generally not necessary (or desirable) to report the percentages to two or more decimal places. For example, a calculated percentage of 10.092% usually would be reported as 10.1% or, sometimes, 10%.
• In reporting percentages, the level of precision should be consistent throughout a specific table or figure. Thus, if the percentages in a distribution were 10.1%, 25%, and 64.9%, they should be reported either as 10%, 25%, and 65% or 10.1%, 25.0%, and 64.9%.
• A reader should be able to interpret graphs and tables without being forced to refer to the text. Thus, there should be a good, clear title and well-labeled headings (in a table) or axes (in a graph). With frequency information, the table should include information on the total number of cases (N) on which the frequencies were based. Acronyms and abbreviations should be avoided or explained in a note.
• Occasionally there is a substantive reason for showing how much missing information there was. For example, if we were asking people about whether they used illegal drugs, it might be important to indicate what percentage of respondents refused to answer the question. In most cases, however, missing information is not presented, and only valid percentages are shown.
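The first two tips, on limiting decimal places and keeping precision consistent within a table, can be sketched with ordinary string formatting. The function name format_percentages is hypothetical; the percentages are the ones used in the tips above.

```python
def format_percentages(percentages, decimals=1):
    """Format every percentage in a table with the same number of decimals."""
    return [f"{p:.{decimals}f}%" for p in percentages]

raw = [10.1, 25, 64.9]                      # values from the second tip

one_decimal = format_percentages(raw, decimals=1)
whole = format_percentages(raw, decimals=0)
single = format_percentages([10.092])       # value from the first tip

print(one_decimal)
print(whole)
print(single)
```

Applying one decimals setting to the whole column is what guarantees the consistency the tip asks for, rather than rounding each entry by hand.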
TIP: If you are preparing figures or charts for a poster or slide presentation at a conference, charts created by programs like SPSS, Excel, or Word may suffice. However, for publication in journals, it may be necessary to hire a graphic artist to create professional images. You can also consult books such as those by Few (2004) or Wallgren, Wallgren, Persson, Jorner, and Haaland (1996) for additional guidance on how to prepare statistical graphs.
Research Example
Almost all research reports include some information on frequencies or relative frequencies. Here we describe a published study that used frequency information extensively.
Study: “Physical injuries reported in hospital visits for assault during the pregnancy-associated period” (Nannini et al., 2008).
Study Purpose: The purpose of this research was to describe patterns of physical injuries reported on hospital visits for assault among women during their pregnancy or postpartum period.
Research Design: Using hospital records (linked to natality records) in Massachusetts during the period 2001 to 2005, the researchers obtained data for a sample of 1,468 women for 1,675 hospital visits for assault. The first physical injury was noted for each visit that had a physical injury diagnostic code (N = 1,528 visits).
Key Variables: The hospital records data were used to describe the distribution of physical injuries by body region and nature of the injury. The variable body region, a nominal-level variable, had five categories: head and neck, spine and back, torso, extremities, and unclassifiable. Nature of injury, another categorical variable, had six categories: fracture, sprain, open wound, contusion, system wide, and other. The researchers also had data regarding the women’s characteristics, including race/ethnicity and marital status (nominal variables), education (ordinal), and age (ratio level, but shown in a grouped frequency distribution with five class intervals: <20, 20–24, 25–29, 30–34, and 35+).
Key Findings: The women in this sample of assaulted pregnant or postpartum women tended to be young (64.0% were under age 25) and single (82.6% were unmarried). The distribution of injuries indicated that the women’s head and neck were the most commonly injured body regions (42.2% overall). Injuries to the torso were observed for 21.5% of the pregnant women and 8.7% of the postpartum women. In terms of nature of the injury, the most prevalent type was contusions, observed for 46.5% of the women.
Summary Points
• A frequency distribution is a simple, effective way to impose order on data. A frequency distribution orders data values in a systematic sequence (e.g., from lowest to highest), with a count of the number of times each value was obtained. The sum of all the frequencies (Σf) must equal the sample size (N).
• In a frequency distribution, information can be presented as absolute frequencies (the counts), relative frequencies (that is, percentages), and cumulative relative frequencies (cumulative percentages for a given value plus all the values that preceded it).
• When there are numerous data values, it may be preferable to construct a grouped frequency distribution, which involves grouping together values into class intervals.
• Frequency distribution information can be presented in graphs as well as in tables. Graphs involve plotting the score values on a horizontal axis (the X axis) and frequencies or percentages on the vertical axis (the Y axis).
• Nominal (and some ordinal) data can be displayed graphically in bar charts or pie charts, while interval and ratio data are usually presented in histograms or frequency polygons.
• Data for a variable can be described in terms of the shape of the frequency distribution. One aspect of shape is modality: A unimodal distribution has one peak or high value, but if there are two or more peaks it is multimodal.
• Another aspect of shape concerns symmetry: A symmetric distribution is one in which the two halves are mirror images of one another.
• A skewed distribution is asymmetric, with the peak pulled off center and one tail longer than the other. A negative skew occurs when the long tail is pointing to the left, and a positive skew occurs when the long tail points to the right.
• A third aspect of a distribution’s shape is kurtosis: Distributions with sharp, thin peaks are leptokurtic, while those with smooth, flat peaks are platykurtic.
• A special distribution that is important in statistics is known as the normal distribution (bell-shaped curve), which is unimodal and symmetric.
Exercises
The following exercises cover concepts presented in this chapter. Answers to Part A exercises that are indicated with a dagger (†) are provided here. Exercises in Part B involve computer analyses using the datasets, and answers and comments are offered on the Web site.
PART A EXERCISES
A1. The following data represent the number of times that a sample of nursing home residents who were aged 80 or older fell during a 12-month period.
0 3 4 1 0 2 0 1 2 0   1 0 0 1 2 5 0 1 0 1
0 2 1 0 1 1 3 2 1 0   1 3 1 1 0 4 6 1 0 1
Construct a frequency distribution for this set of data, showing the absolute frequencies, relative frequencies, and cumulative relative frequencies.
A2. Using information from the frequency distribution for Exercise A1, answer the following:
(a) What percentage of the nursing home residents had at least one fall?
(b) What number of falls was the most frequent in this sample?
(c) What number of falls was the least frequent in this sample?
(d) What percentage of residents had two or fewer falls?
(e) What is the total size of the sample?
(f) Are there any outliers in this dataset?
A3. Draw a frequency histogram for the data shown in Exercise A1. Now superimpose a frequency polygon on the histogram. Using a ruler, measure the height and width of your graphs: Is the height about two thirds of the width?
A4. Describe the shape of the frequency distribution drawn in Exercise A3 in terms of modality and skewness. Is the number of falls normally distributed?
A5. If you wanted to display information on patients’ age using the data in Table 5, would you construct a histogram, bar graph, frequency polygon, or pie chart? Defend your selection, and then construct such a graph.
PART B EXERCISES
B1. Using the SPSS dataset Polit2SetA, create a frequency distribution for the variable racethn. You can do this by clicking on Analyze (on the top toolbar menu), then select Descriptive Statistics from the pull-down menu, then Frequencies.

This will bring up a dialog box (this is true in almost all SPSS menu options) in which you can designate the variables of interest and specify certain statistical or output options. For this exercise, click on the variable racethn (the fourth variable in the list) and then click on the arrow in the middle of the dialog box to move this variable into the list for analysis. Then click OK. Based on the output you have created, answer these questions:
(a) What percentage of women in this study were “White, not Hispanic”?
(b) Does the column for “Cumulative Percent” yield meaningful information for this variable?
B2. Re-run the frequency distribution for racethn. This time, use the toolbar with icons that is second from the top. Put the mouse pointer over the icons, from left to right. Find the icon (likely to be the fourth one) that has a “Tool Tip” that reads “Recall recently used dialogs” when you use the mouse pointer.
Click on this icon—it will bring up a list of recently used analytic commands. The “Frequencies” command should be at the top of the list because it is the one most recently used, so using this “dialog recall” feature is a useful shortcut when running multiple analyses with different variables. For this run, when the Frequencies dialog box appears, click on the “Charts” pushbutton, and then select “Bar Chart” and “Percentages.” Compare the tabled versus graphic results from Exercises B1 and B2.
B3. Now execute the SPSS Frequency command once again for the variable higrade, highest grade of education for participants (Variable 6). (If you do this analysis right after the previous one, you will need to remove the variable racethn from the variable list with the arrow push button, and then move higrade into the list for analysis.) Examine the frequency distribution information and answer these questions:
(a) What percentage of women completed 16 years of education?
(b) What percentage of women had 10 years or less of education?
(c) How many women had exactly 12 years of education?
B4. Now focus on missing data for the variable higrade, using the same frequency distribution output as in Exercise B3. Answer these questions:
(a) How many cases altogether had valid information, and what percentage of the overall sample did these cases represent?
(b) How many different types of missing values were there?
(c) What were the missing value codes (available by looking at the Variable View screen of the Data Editor, or in the Codebook)?
(d) What do these missing value codes mean?
B5. Re-run the frequency distribution for higrade. This time, when the dialog box comes up, click the pushbutton for “Statistics.” When a new dialog box appears that asks which statistics you would like, click the “Skewness” and “Kurtosis” options that appear in the lower right section under the heading “Distribution.” Then return to the main dialog box (Click Continue) and click OK. Examine the resulting output and then answer these questions:
(a) What are the values for the skewness and kurtosis indexes?
(b) Based on the information shown on the output, would you conclude that this variable is normally distributed?
(c) How would you describe the distribution of scores?