Stat 130 chapter 1


Stat 130 – Intro to Math Stat for CS
Chapter 1 Reviewer

Basic Concepts
• Statistics - branch of science that deals with the collection, organization, analysis, interpretation, and presentation of data (COAIP)
• Population - collection of all elements under consideration in a statistical inquiry
• Sample - subset of the population
• Variable - attribute of the elements in a collection that can assume different values for the different elements
• Observation - realized value of a variable
• Data - collection of observations
• Constant - characteristic whose outcome can be predicted with certainty
• Classification of variables:
  1. Qualitative variable - yields categorical responses
  2. Quantitative variable - takes on numerical values (values that can be used in operations) representing an amount or quantity
     a) Discrete variable - usually measured by counting or enumeration
     b) Continuous variable - can assume the infinitely many values corresponding to a line interval

Levels of Measurement
• Measurement - process of determining the value or label of the variable based on what has been observed [DISTINCT, NON-OVERLAPPING, EXHAUSTIVE CATEGORIES]
  1. Nominal Level (Classificatory scale)
     - Weakest level of measurement
     - Used simply for categorizing subjects into different groups
     - Attributes are only named
  2. Ordinal Level (Ranking scale)
     - The system arranges the categories according to magnitude
     - Attributes are ordered
  3. Interval Level
     - Has the properties of the nominal and ordinal levels
     - The distances between any two numbers on the scale are of known sizes
     - There is no “true zero” point (relative zero)
     - 3 examples: IQ, military hours, temperature in Celsius
  4. Ratio Level
     - Has the properties of all preceding levels
     - Strongest level of measurement
     - Has a “true zero” point (absolute zero)
• QUALITATIVE: nominal, ordinal; QUANTITATIVE: interval, ratio


• Parameter - summary measure describing a specific characteristic of the population
• Statistic - summary measure describing a specific characteristic of the sample

Fields of Statistics
1. Applied Statistics - concerned with the PROCEDURES and TECHNIQUES used in the COAIP of data
   a) Descriptive Statistics
      - deals with the techniques used in the COAIP of the data on hand
      - "from them, for them" information
      - data from population > describe the population
      - data from sample > describe the sample only
      - will not allow us to generalize about the population using sample data
   b) Inferential Statistics
      - deals with the techniques used in analyzing the sample data that will lead to generalizations about the population from which the sample came
      - sample data is used to form conclusions about the population
      - always subject to some error
2. Theoretical or Mathematical Statistics - concerned with the development of the mathematical foundations of the methods used in applied statistics

Sampling Designs
• Data Collection Methods
  - use of documented data
  - surveys > method of collecting data on the variable of interest by asking people questions
  - experiments
  - observation method
• General Classification of Collecting Data
  1. Census (complete enumeration) - process of gathering info from every unit in the population
  2. Survey sampling - process of obtaining info from the units in the selected sample from a well-defined population
     a) Probability sampling - gives every element of the population a known nonzero chance of being selected in the sample
     b) Non-probability sampling
• Target Population - population from which information is desired
• Sampled Population - collection of elements from which the sample is actually taken
• Population / Sampling frame - listing of all the individual units in the population
• Probability Sampling Designs
  - Simple Random Sampling (SRS) - method of selecting n units out of N units in the population in such a way that every distinct sample of size n has an equal chance of being drawn
    a) SRSWR - with replacement; an element may be chosen more than once
    b) SRSWOR - without replacement
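The difference between SRSWR and SRSWOR can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the sampling frame of ten unit labels, the function names, and the seed are all hypothetical.

```python
import random

def srswr(frame, n, seed=None):
    """Simple random sampling WITH replacement: a unit may be drawn more than once."""
    rng = random.Random(seed)
    return [rng.choice(frame) for _ in range(n)]

def srswor(frame, n, seed=None):
    """Simple random sampling WITHOUT replacement: all drawn units are distinct."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: labels of the N = 10 units in the population.
frame = list(range(1, 11))

with_repl = srswr(frame, 4, seed=1)      # may contain repeated labels
without_repl = srswor(frame, 4, seed=1)  # never contains repeated labels

print("SRSWR :", with_repl)
print("SRSWOR:", without_repl)
```

Under SRSWOR, n cannot exceed N; under SRSWR it can, since units are returned to the population after each draw.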



Measures of Central Tendency - any single value that is used to identify the "center" of the data or the typical value

• Properties of Summation Notation
  1. $\sum_{i=1}^{n} (X_i + Y_i + Z_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$
  2. $\sum_{i=1}^{n} C X_i = C \sum_{i=1}^{n} X_i$
  3. $\sum_{i=1}^{n} C = nC$
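A quick numerical check of the three summation properties, using small made-up lists (the values of X, Y, Z, and C are arbitrary and chosen only for illustration):

```python
# Arbitrary illustrative data; any lists of equal length would do.
X = [2, 4, 6]
Y = [1, 3, 5]
Z = [10, 20, 30]
C = 7
n = len(X)

# Property 1: the sum of a sum is the sum of the individual sums.
lhs1 = sum(x + y + z for x, y, z in zip(X, Y, Z))
rhs1 = sum(X) + sum(Y) + sum(Z)

# Property 2: a constant factor can be moved outside the summation.
lhs2 = sum(C * x for x in X)
rhs2 = C * sum(X)

# Property 3: summing a constant n times gives n times the constant.
lhs3 = sum(C for _ in range(n))
rhs3 = n * C

print(lhs1 == rhs1, lhs2 == rhs2, lhs3 == rhs3)
```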

• Arithmetic Mean - sum of all values in the collection divided by the total number of elements
  - $\mu$ (population mean)
  - $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (sample mean)
  - Characteristics of the Mean:
    1) Employs all available information
    2) Affected by the value of every observation; strongly influenced by extreme values
    3) May not be an actual number in the data set
    4) Always exists and is unique

ASIDE:
- Raw data - data in their original form; not yet organized or processed
- Array - ordered arrangement of data according to magnitude (ordered data / sorted data)

• Median - value that divides the array into two equal parts
  - Odd n: $Md = X_{\left(\frac{n+1}{2}\right)}$
  - Even n: $Md = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$
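The mean and median formulas above can be sketched as follows. This is a minimal illustration with made-up data; the function names are hypothetical. For even n the median is taken as the average of the two middle values of the array.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)

def median(values):
    """Median of the array (the data sorted by magnitude)."""
    arr = sorted(values)
    n = len(arr)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value, which is index mid in 0-based terms.
        return arr[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (arr[mid - 1] + arr[mid]) / 2

odd_data = [7, 3, 9, 1, 5]          # made-up data, n = 5
even_data = [7, 3, 9, 1, 5, 100]    # same data plus one extreme value, n = 6

print(mean(odd_data), median(odd_data))
print(mean(even_data), median(even_data))
```

Adding the extreme value 100 pulls the mean up sharply while the median barely moves, which matches characteristic 2 of the mean.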

- Characteristics of the Median:
  1) Positional measure
  2) Affected by the position of the observations, NOT by their values
• Mode - value that occurs with the greatest frequency
  - does not exist if all frequencies are equal
  - Characteristics of the Mode:
    1) Does not always exist; if it does, it may not be unique
    2) Not affected by extreme values
    3) Can be used for both qualitative and quantitative data

Measures of Location
• Values below which a specified fraction or percentage of the observations in a given set must fall
• Percentiles - values that divide a set of observations in an array into 100 equal parts
  - $P_k$ is a value such that at least k% of the ordered data are less than or equal to it and at least (100 - k)% are greater than or equal to it, where k = 1, 2, 3, ..., 99

Measures of Dispersion
• Indicate the extent to which individual items in a series are scattered about an average


• Range = max - min -> simplest way
• Variance:
  $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ (population), $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ (sample)
• Standard Deviation:
  $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$ (population), $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$ (sample)
  -> Always nonnegative (zero only when all values are equal)
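As a small worked sketch of the range, variance, and standard deviation formulas, the following treats the same made-up data set first as a whole population (divide by N) and then as a sample (divide by n - 1). The data and function names are illustrative only.

```python
import math

def population_variance(values):
    """Sigma squared: mean squared deviation from the population mean (divide by N)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """s squared: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

data_range = max(data) - min(data)
sigma = math.sqrt(population_variance(data))
s = math.sqrt(sample_variance(data))

print("range:", data_range)
print("population variance, std. dev.:", population_variance(data), sigma)
print("sample variance, std. dev.:", sample_variance(data), s)
```

For the same data the sample variance is slightly larger than the population variance, because the sum of squared deviations is divided by n - 1 instead of N.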

✓ Can't compare the std. dev. of 2 data sets unless they have the same mean
✓ Coefficient of variation: $CV = \frac{\text{std. dev.}}{\text{mean}} \times 100\%$ -> using this, comparison can take place
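To illustrate why the coefficient of variation helps, here is a sketch with two hypothetical data sets that have the same standard deviation but different means. It assumes the sample standard deviation (`statistics.stdev`) is the one intended; the notes do not say which version to use.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: standard deviation relative to the mean.

    Assumption: uses the sample standard deviation (n - 1 denominator).
    """
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements: same spread, different means and units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Both sets have the same standard deviation, yet the weights are more variable relative to their mean, which is what the CV captures.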

Measures of Skewness and Kurtosis
1. Positively Skewed (Skewed to the right)
   - Longer tail to the right
   - More concentration of values below than above the mean
2. Negatively Skewed (Skewed to the left)
   - Longer tail to the left
   - More concentration of values above than below the mean

• Pearson's First and Second Coefficients of Skewness
  1. $Sk = \frac{\bar{X} - Mo}{S}$
  2. $Sk = \frac{3(\bar{X} - Md)}{S}$
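Both coefficients can be computed directly from the mean, median, mode, and sample standard deviation. The sketch below uses made-up data with a long right tail and assumes the data have a single mode; the function names are hypothetical.

```python
import statistics

def pearson_first(values):
    """First coefficient: (mean - mode) / S. Assumes a unique mode."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)

def pearson_second(values):
    """Second coefficient: 3 * (mean - median) / S."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

# Made-up data with a long tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

sk1 = pearson_first(data)
sk2 = pearson_second(data)

print(f"first coefficient:  {sk1:.3f}")
print(f"second coefficient: {sk2:.3f}")
```

For this data the mean (4.3) exceeds the median (3), which exceeds the mode (2), so both coefficients come out positive.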

• Sk = 0 -> symmetric, Mean = Md = Mo
• Sk > 0 -> positively skewed, Mean > Md > Mo
• Sk < 0 -> negatively skewed, Mean < Md < Mo
• Measure of Kurtosis - used to describe the shape of the hump of a relative frequency distribution as compared to the normal distribution
• Types:
  1. Mesokurtic -> normal hump; k = 3
  2. Leptokurtic -> curve is more peaked; hump is sharper; k > 3
  3. Platykurtic -> curve is less peaked; hump is flatter; k < 3; more variable

Graphical Displays of Data
1. Frequency Histogram - bar graph (horizontal axis: intervals; vertical axis: frequencies)
2. Stem-and-leaf display
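Both displays can be mimicked in plain text. The sketch below is illustrative only: the scores, the class width of 10, and the function names are assumptions, and it expects non-negative integer data.

```python
from collections import Counter, defaultdict

def frequency_table(values, start, width, bins):
    """Count the observations in each class interval [lo, lo + width)."""
    counts = Counter()
    for v in values:
        k = (v - start) // width
        if 0 <= k < bins:
            counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(bins)]

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and list its units digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

scores = [12, 15, 21, 24, 24, 28, 33, 35, 36, 41]   # made-up integer scores

table = frequency_table(scores, start=10, width=10, bins=4)
display = stem_and_leaf(scores)

# Text histogram: one row per interval, one star per observation.
for (lo, hi), freq in table:
    print(f"{lo}-{hi - 1}: {'*' * freq}")

# Stem-and-leaf display: stem | leaves
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Unlike the histogram, the stem-and-leaf display keeps the individual data values visible while still showing the shape of the distribution.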

CREDITS: Notes by Camille Salazar Encoded by Gerald Roy Campañano


