All Projects Should Be Submitted In Excel Or Word Format If


All projects should be submitted in Excel or Word format. If you don't have access to either of these, use Google's applications, which are free once you sign up. From there you can do all of the same work as you can in an Excel spreadsheet or a Word document. You can also use OpenOffice or LibreOffice, but I do ask that you save your file in .xls or .rtf rather than the native format.

1) Which of the original questions in the above project data file have discrete data and which contain continuous data?
2) If we select 8 people from the dataset and then record whether they are a smoker or not, is this experiment a binomial one? Why or why not?
3) Suppose we choose three persons at random from our group of data. What is the probability that exactly two of them will be smokers?
4) Look at each of the columns of data in our project data file and, using any methods in the text so far, determine which of these columns appear to be normally distributed. Consider the histogram, the QQ plot, and the measures of central tendency to justify your results. Report on each of these five and make sure to include justification for your answers in the form of the graphs and/or calculations.
5) Look at the smoker column, and assume that the requirements for a binomial experiment are met. Now, construct a 95% confidence interval for the proportion of this column.
6) Construct a 90% confidence interval for the mean of one of the columns. We want to draw inference about the population mean from a random sample taken with replacement from the population. To see the effect of sample size on the inference, repeat the procedure three times using random samples of sizes 25, 40 and 64 respectively. Since for the purpose of this project one can also find the population mean, you can see how well the sample mean estimates the population mean and whether the confidence intervals contain the actual population mean.
7) Construct a 90% confidence interval for the variance of one of the columns. We want to draw inference about the variance from a random sample taken with replacement from the population.
8) Remember to comment on every one of these items in your report. It's important that you are able to tell what each of these tests means: the better I understand your work, the more I am convinced that YOU understand the material, and I can grade appropriately.

Paper for the Above Instructions

Introduction

The analysis of data in research is fundamental to understanding various characteristics within a dataset. This project involves classifying variables as discrete or continuous, assessing the nature of a binomial experiment, calculating probabilities, analyzing the distribution of data columns, and constructing confidence intervals for proportions, means, and variances. These statistical processes help in making inferences about the population from a sample and in understanding data distribution and variability.

Discrete and Continuous Data

Understanding whether data are discrete or continuous is essential for selecting appropriate statistical tests. Discrete data consist of countable, individual values that are separate and distinct. For example, the number of children in a family or the number of visits to a doctor are discrete. Continuous data, however, can take any value within a range, often resulting from measurements. For instance, height, weight, or blood pressure are continuous variables as they can assume infinitely many values within a range.

In the dataset, variables such as age or weight are typically continuous, while variables like the number of siblings or the presence or absence of a characteristic (e.g., smoker status) are discrete. These distinctions influence the choice of statistical methods, such as using t-tests for continuous data and chi-square tests for categorical data.

Binomial Experiment Analysis

Selecting 8 individuals and recording whether each is a smoker constitutes a binomial experiment if certain conditions are met: fixed number of trials (n=8), each trial is independent, each trial results in one of two outcomes (smoker or not smoker), and the probability of success (being a smoker) remains constant across trials. If these conditions are satisfied, then recording the number of smokers among these 8 individuals follows a binomial distribution, enabling probability calculations and inference.

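The binomial model applies to scenarios with binary outcomes. Verifying it here means checking that the selections are independent and that the probability of picking a smoker stays constant from one selection to the next. Both conditions hold exactly when the 8 people are drawn with replacement. When they are drawn without replacement, each selection slightly changes the proportion of smokers among the people who remain, so the binomial model is only an approximation, usually considered acceptable when the sample is no more than about 5% of the dataset.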
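To get a feel for how much the sampling scheme matters, the sketch below compares the binomial probabilities with the exact probabilities for drawing without replacement (the hypergeometric distribution). The dataset size of 100 and the count of 30 smokers are assumptions chosen for illustration; they are not taken from the project data file.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, n, population, successes):
    """P(X = k) when n people are drawn WITHOUT replacement from the population."""
    return (comb(successes, k) * comb(population - successes, n - k)
            / comb(population, n))

# Hypothetical dataset: 100 people, 30 of them smokers (assumed values).
N, K, n = 100, 30, 8

for k in range(n + 1):
    with_repl = binomial_pmf(k, n, K / N)
    without_repl = hypergeometric_pmf(k, n, N, K)
    print(f"k={k}: binomial={with_repl:.4f}  without replacement={without_repl:.4f}")
```

If the two columns agree closely, treating the experiment as binomial is a reasonable approximation for that dataset size.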

Probability of Exactly Two Smokers

Calculating the probability that exactly two out of three randomly selected individuals are smokers can be achieved using the binomial probability formula:

\[ P(X=k) = \binom{n}{k} p^{k} (1-p)^{n-k} \] where \( p \) is the proportion of smokers in the entire population (estimated from the data), \( n=3 \), and \( k=2 \).

Assuming, for illustration, an estimated smoking proportion of \( p = 0.3 \), the probability becomes:

\[ P(X=2) = \binom{3}{2} p^{2} (1-p) = 3 \times (0.3)^{2} \times 0.7 = 0.189 \]

so under this assumed proportion there is roughly a 19% chance of observing exactly two smokers in a random sample of three.
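As a quick check of the formula, the following sketch lists every smoker/non-smoker outcome for three people and adds up the probabilities of the outcomes with exactly two smokers. It uses the illustrative value \( p = 0.3 \) from the text, which is an assumption rather than a figure computed from the data file.

```python
from itertools import product
from math import comb

p = 0.3  # illustrative smoker proportion (assumed, not computed from the data)

# Enumerate all 2^3 outcomes: 1 = smoker, 0 = non-smoker.
total = 0.0
for outcome in product([0, 1], repeat=3):
    if sum(outcome) == 2:
        prob = 1.0
        for person in outcome:
            prob *= p if person == 1 else (1 - p)
        total += prob

# The binomial formula counts the same three outcomes via the coefficient C(3, 2).
formula = comb(3, 2) * p**2 * (1 - p)

print(round(total, 3), round(formula, 3))
```

The enumeration shows where the coefficient comes from: there are three orderings with exactly two smokers, each with probability \( p^{2}(1-p) \).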

Distribution Assessment of Data Columns

To determine whether data columns are normally distributed, graphical methods such as histograms and QQ (quantile-quantile) plots are used alongside measures of central tendency (mean and median). A roughly bell-shaped, symmetric histogram suggests normality. A QQ plot graphs the quantiles of the data against the theoretical quantiles of a normal distribution; points that fall close to a straight line support normality.

Calculations of skewness and kurtosis further quantify the distribution’s symmetry and tail weight. For approximately normal data, the mean and median are close, and skewness is near zero.

Applying these methods across data columns in the dataset helps identify variables that approximate a normal distribution, informing the selection of parametric tests.
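The numerical side of these checks can be sketched with the Python standard library. The two columns below are simulated stand-ins (one bell-shaped, one right-skewed), not the project data, and the helper names `skewness` and `qq_correlation` are hypothetical. The QQ correlation summarizes how straight the QQ plot would look; it supplements the plot rather than replacing it.

```python
import random
from statistics import NormalDist, mean, median

def skewness(values):
    """Moment-based sample skewness; close to 0 for symmetric data."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def qq_correlation(values):
    """Correlation between the sorted data and standard normal quantiles."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(theoretical), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
column = [random.gauss(170, 10) for _ in range(200)]    # simulated bell-shaped column
skewed = [random.expovariate(1.0) for _ in range(200)]  # simulated right-skewed column

for name, data in (("bell-shaped", column), ("right-skewed", skewed)):
    print(f"{name}: mean={mean(data):.2f} median={median(data):.2f} "
          f"skewness={skewness(data):.2f} QQ correlation={qq_correlation(data):.3f}")
```

A column whose mean and median nearly coincide, whose skewness is near zero, and whose QQ correlation is close to 1 is a plausible candidate for normality.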

Constructing Confidence Intervals

A. **Proportion Confidence Interval:**

Assuming the binomial conditions are met in the smoker column, a 95% confidence interval for the true proportion of smokers can be constructed using the formula:

\[ \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where \( \hat{p} \) is the sample proportion, \( n \) is the sample size, and \( Z_{\alpha/2} \) is the critical value from the standard normal distribution, approximately 1.96 for 95% confidence.
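A minimal sketch of this interval follows. The counts used (18 smokers among 60 people) are made up for illustration, and `proportion_ci` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical counts: 18 smokers among 60 people (not from the data file).
low, high = proportion_ci(18, 60)
print(f"95% CI for the proportion of smokers: ({low:.3f}, {high:.3f})")
```

The normal approximation behind this formula is generally trusted only when both the number of smokers and the number of non-smokers in the sample are reasonably large.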

B. **Mean Confidence Interval:**

For a chosen column, the form of the 90% confidence interval for the population mean depends on whether the population variance is known. In the usual case, where it is unknown and estimated from the sample, the interval uses the t distribution:

\[ \bar{x} \pm t_{\alpha/2, df} \frac{s}{\sqrt{n}} \]

where \( \bar{x} \) is the sample mean, \( s \) the sample standard deviation, \( n \) the sample size, and \( t_{\alpha/2, df} \) the critical t-value with \( df = n - 1 \) degrees of freedom and \( \alpha = 0.10 \).

Repeating this process with samples of sizes 25, 40, and 64 illustrates the influence of sample size on the accuracy of the estimate and whether the confidence intervals contain the actual population mean.
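The repeated-sampling procedure might look like the sketch below. The population is simulated (500 values with an assumed mean near 70), and the critical t-values are rounded entries from a standard t table for 24, 39, and 63 degrees of freedom; both are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

# Upper 5% critical values of Student's t for df = n - 1 (rounded table values).
T_CRIT_90 = {25: 1.711, 40: 1.685, 64: 1.669}

def mean_ci_90(sample):
    """90% t-interval for the mean; supports only the sample sizes in T_CRIT_90."""
    n = len(sample)
    x_bar = mean(sample)
    margin = T_CRIT_90[n] * stdev(sample) / n ** 0.5
    return x_bar - margin, x_bar + margin

random.seed(1)
population = [random.gauss(70, 12) for _ in range(500)]  # simulated column
mu = mean(population)  # known here only because the whole population is available

for n in (25, 40, 64):
    sample = random.choices(population, k=n)  # random sample WITH replacement
    low, high = mean_ci_90(sample)
    print(f"n={n}: ({low:.2f}, {high:.2f})  width={high - low:.2f}  "
          f"contains mu: {low <= mu <= high}")
```

Because each interval is built from a different random sample, an individual interval can miss the population mean; over many repetitions about 90% of them would be expected to contain it.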

C. **Variance Confidence Interval:**

A 90% confidence interval for the variance uses the chi-square distribution:

\[ \left( \frac{(n-1)s^{2}}{\chi^{2}_{\alpha/2, n-1}}, \frac{(n-1)s^{2}}{\chi^{2}_{1-\alpha/2, n-1}} \right) \]

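where \( s^{2} \) is the sample variance, \( \alpha = 0.10 \), and \( \chi^{2}_{\alpha/2, n-1} \) denotes the chi-square value with area \( \alpha/2 \) to its right, so the larger critical value produces the lower limit. This interval assumes the underlying population is approximately normal.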
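A sketch of the calculation for a sample of size 25 is shown below. The chi-square critical values are entries for 24 degrees of freedom from a standard table, the sample is simulated, and `variance_ci_90_n25` is a hypothetical helper that handles only this one sample size.

```python
import random
from statistics import variance

# Chi-square critical values for df = 24 (standard table values):
CHI2_RIGHT_05 = 36.415  # area 0.05 to the right
CHI2_RIGHT_95 = 13.848  # area 0.95 to the right

def variance_ci_90_n25(sample):
    """90% confidence interval for the variance from a sample of exactly 25 values."""
    if len(sample) != 25:
        raise ValueError("critical values above are valid only for n = 25")
    df = len(sample) - 1
    s2 = variance(sample)  # sample variance with the n - 1 divisor
    return df * s2 / CHI2_RIGHT_05, df * s2 / CHI2_RIGHT_95

random.seed(2)
population = [random.gauss(70, 12) for _ in range(500)]  # simulated column
sample = random.choices(population, k=25)                # sampling with replacement

low, high = variance_ci_90_n25(sample)
print(f"sample variance = {variance(sample):.1f}")
print(f"90% CI for the variance: ({low:.1f}, {high:.1f})")
```

The interval is not symmetric around the sample variance because the chi-square distribution is right-skewed.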

Discussion and Conclusion

These statistical analyses enable interpreting the data comprehensively. Classifying variables for appropriate tests ensures valid conclusions. The binomial experiment assessment confirms whether the conditions apply to the smoker data, allowing probability calculations relevant to health studies. Distribution assessments underpin the assumptions of normality for parametric testing, essential in many inferential procedures.

Calculating confidence intervals provides a range within which the true population parameters likely lie, with specified confidence levels. Sample size governs the precision of these estimates: because the standard error shrinks in proportion to \( 1/\sqrt{n} \), intervals from larger samples tend to be narrower, which emphasizes the importance of adequate sampling.

Overall, these techniques exemplify the application of statistical inference in real datasets, reinforcing understanding of variability, distribution, and probability, all of which are crucial for sound research and data-driven decision-making.

All Projects Should Be Submitted In Excel Or Word Format If by Dr Jack Online - Issuu