Bio Statistics Part 1 INDIAN DENTAL ACADEMY Leader in continuing dental education www.indiandentalacademy.com

Contents
• Introduction
• Common Statistical Terms
• Source of data
• Types of data
• Data presentation
• Measures of statistical averages or central tendency
• Types of variability
• Measures of variation or dispersion
• Normal distribution or normal curve
• Sampling
• Determination of sample size
• Probability or p value

Introduction

• Any science needs precision for its development.
• For precision, facts, observations or measurements have to be expressed in figures.
• “It has been said when you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot express it in numbers your knowledge is of a meagre and unsatisfactory kind.” - Lord Kelvin

• Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement.
• E.g. you have to count the number of missing teeth, or measure the vertical dimension, and express it as a number so that it makes sense.

• Statistic or datum means a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby.
• Statistics or data is the plural of the same.
• Statistics is the science of figures.
• Biostatistics is the term used when the tools of statistics are applied to data derived from biological sciences such as medicine.

Applications and uses of biostatistics as a science
• In physiology and anatomy
  – To define the limits of normality for variables such as height, weight or blood pressure in a population.
  – Variation beyond natural limits may be pathological, i.e. abnormal due to the play of certain external factors.
  – To find the correlation between two variables such as height and weight.

• In pharmacology
  – To find the action of drugs
  – To compare the action of two drugs or two successive dosages of the same drug
  – To find the relative potency of a new drug with respect to a standard drug

• In medicine
  – To compare the efficacy of a particular drug, operation or line of treatment
  – To find the association between two attributes such as cancer and smoking
  – To identify signs and symptoms of disease

• In community medicine and public health
  – To test the usefulness of sera or vaccines in the field
  – In epidemiologic studies, the role of causative factors is statistically tested

• In research
  – It helps in compiling data, drawing conclusions and making recommendations.

• For students
  – By learning the methods of biostatistics, a student learns to evaluate articles published in medical and dental journals and papers read at medical and dental conferences.
  – The student also comes to understand the basic methods of observation in clinical practice and research.

Common Statistical Terms

• Constant
  – Quantities that do not vary, e.g. in biostatistics the mean and standard deviation are considered constant for a population

• Variable
  – A characteristic that takes different values for different persons, places or things, such as height, weight or blood pressure

• Population
  – Includes all persons, events and objects under study. It may be finite or infinite.

• Sample
  – A part of a population, generally selected so as to be representative of the population whose variables are under study

• Parameter
  – A constant that describes a population, e.g. 40% of the students in a college are girls. This describes the population, hence it is a parameter.

• Statistic
  – A quantity that describes a sample, e.g. in a sample of 200 students from the same college, 45% are girls. This 45% is a statistic, as it describes the sample.

• Attribute
  – A characteristic by which the population can be divided into categories or classes, e.g. gender, caste, religion.

Source of data

• The main sources for collection of data are
  – Experiments
  – Surveys
  – Records

• Experiments
  – Performed by one or more workers to collect data for investigations and research.

• Surveys
  – Carried out by trained teams for epidemiological studies in the field, to find the incidence or prevalence of health or disease in a community.

• Records
  – Maintained as a routine in registers and books over a long period of time; they provide ready-made data.

Types of data

• Data is of two types
  – Qualitative or discrete data
  – Quantitative or continuous data

• Qualitative or discrete data
  – In such data there is no notion of magnitude or size of an attribute, as it cannot be measured.
  – The number of persons having the same attribute varies and is counted.
  – E.g. out of 100 people, 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion.
  – Class I, II and III are attributes that cannot be measured in figures; only the number of people having each can be determined.

• Quantitative or continuous data
  – Here the attribute has a magnitude. Both the attribute and the number of persons having the attribute vary.
  – E.g. freeway space. It varies for every patient. It is a quantity with a different value for each individual and is measurable. It is continuous, as it can take any value between 2 and 4 mm, such as 2.10, 2.55 or 3.07.

Data presentation

• Statistical data, once collected, should be systematically arranged and presented
  – To arouse the interest of readers
  – For data reduction
  – To bring out important points clearly and strikingly
  – For easy grasp and meaningful conclusions
  – To facilitate further analysis
  – To facilitate communication

• The two main types of data presentation are
  – Tabulation
  – Graphic representation with charts and diagrams

Tabulation
• It is the most common method
• Data are presented in columns and rows
• It can be of the following types
  – Simple tables
  – Frequency distribution tables

Simple table: Number of patients at KIDS, Bgm

Month       Number of patients
Jan 06      2,800
Feb 06      1,900
March 06    1,750

Frequency distribution table
• In a frequency distribution table, the data are first split into convenient groups (class intervals), and the number of items (frequency) occurring in each group is shown in the adjacent column.

Frequency distribution table

Number of cavities    Number of patients
0 to 3                78
3 to 6                67
6 to 9                32
9 and above           16
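As a rough illustration of how a frequency distribution table is built, the sketch below groups raw observations into class intervals and counts the frequency in each. The cavity counts are hypothetical, the helper name `class_interval` is invented for this example, and it assumes that a value on a boundary (such as 3) belongs to the higher interval, which the slides do not specify.

```python
from collections import Counter

# Hypothetical cavity counts for 12 patients (illustrative data, not from the slides).
cavities = [0, 2, 5, 1, 7, 3, 9, 4, 2, 11, 6, 0]

def class_interval(value, width=3, open_from=9):
    """Return the class-interval label for one observation.

    Assumption: the lower bound is inclusive, so 3 falls in "3 to 6".
    """
    if value >= open_from:
        return f"{open_from} and above"
    lower = (value // width) * width
    return f"{lower} to {lower + width}"

# Frequency = number of observations falling in each class interval.
frequency = Counter(class_interval(v) for v in cavities)

for label in ("0 to 3", "3 to 6", "6 to 9", "9 and above"):
    print(f"{label:<12} {frequency[label]}")
```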

Charts and diagrams
• A useful method of presenting statistical data
• They have a powerful impact on the imagination of people

• The types of charts and diagrams are
  – Bar chart
  – Histogram
  – Frequency polygon
  – Frequency curve
  – Line diagram
  – Cumulative frequency diagram or ogive
  – Scatter diagram
  – Pie chart
  – Pictogram
  – Spot map or map diagram

Bar chart
• The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable
• A suitable scale is chosen
• Bars are usually equally spaced

• Bar charts are of three types
  – Simple bar chart
  – Multiple bar chart: two or more variables are grouped together
  – Component bar chart: bars are divided into two parts, each part representing a certain item and proportional to the magnitude of that item

[Figure: simple bar chart of the number of CD patients per quarter, 1st Qtr to 4th Qtr]

[Figure: multiple bar chart of CD, RPD and FPD patients per quarter, 1st Qtr to 4th Qtr]

[Figure: component bar chart of patients to prostho and patients to other departments per quarter, 1st Qtr to 4th Qtr]

Histogram
• Pictorial presentation of a frequency distribution
• Consists of a series of rectangles
• Class intervals are given on the horizontal axis and frequencies on the vertical axis
• The area of each rectangle is proportional to the frequency

[Figure: histogram of the number of carious lesions, class intervals 0 to 3 through 24 to 27]

Frequency polygon
• Obtained by joining the midpoints of the tops of the histogram blocks (at the height of each frequency) with straight lines, usually forming a polygon

[Figure: frequency polygon]

Frequency curve
• When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve

[Figure: frequency curve]

Line diagram
• Line diagrams are used to show the trend of events with the passage of time

[Figure: line diagram of patients with periodontitis, horizontal axis 0 to 5]

Cumulative Frequency Diagram
• A graphical representation of cumulative frequency
• The cumulative frequency of a class is obtained by adding its frequency to the total of all previous classes

[Figure: cumulative frequency diagram of the prevalence of dental caries (in percent) by age group, 0 to 10 yrs through 60 to 70 yrs]
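The running total behind a cumulative frequency diagram can be sketched in a few lines. The class frequencies below are the four values from the cavities frequency distribution table (78, 67, 32, 16); using them here is only an illustration, not something the slides do.

```python
from itertools import accumulate

# Class frequencies from the cavities frequency distribution table.
class_intervals = ["0 to 3", "3 to 6", "6 to 9", "9 and above"]
frequencies = [78, 67, 32, 16]

# Each cumulative value is the class frequency plus all previous frequencies.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(class_intervals, frequencies, cumulative):
    print(f"{label:<12} {freq:>4} {cum:>5}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the arithmetic.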

Scatter or Dot diagram
• Shows the relationship between two variables
• If the dots cluster along a straight line, the relationship is linear in nature

[Figure: scatter diagram of sugar exposure against carious lesions]

Pie chart
• The frequencies of the groups are shown as segments of a circle
• The angle of each segment denotes its frequency
• The angle is calculated by $\text{Angle} = \frac{\text{class frequency}}{\text{total observations}} \times 360^\circ$

[Figure: pie chart with segments of 30 (5%), 70 (11%), 200 (31%), 180 (29%) and 150 (24%); legend: PROSTHO, CONSO, PERIO, ORTHO, PEDO]
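The angle formula for a pie chart can be sketched as follows. The five frequencies are the segment values shown on the pie chart slide; the function name `pie_angles` is made up for this example.

```python
def pie_angles(frequencies):
    """Return the segment angle in degrees for each class frequency."""
    total = sum(frequencies)
    return [f * 360 / total for f in frequencies]

# Segment values as they appear on the pie chart slide.
segments = [30, 70, 200, 180, 150]
angles = pie_angles(segments)

for value, angle in zip(segments, angles):
    print(f"{value:>4} -> {angle:6.2f} degrees")
```

Because every angle is a share of 360 degrees, the angles should always add up to a full circle.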

Pictogram
• A popular method of presenting data to the common man

[Figure: pictogram by city]

City         Value
Delhi        9000
Bombay       11000
Chennai      8000
Kolkatta     5000
Hyderabad    6000
Bangalore    12000
Pune         4000
Lucknow      5000

Spot map or map diagram
• These maps are prepared to show the geographic distribution of the frequencies of characteristics

[Figure: spot map or map diagram]

Measures of statistical averages or central tendency

• The average value in a distribution is the one central value around which all the other observations are concentrated
• The average value helps
  – to find the most characteristic value of a set of measurements
  – to find which group is better off, by comparing the average of one group with that of the other

• The most commonly used averages are
  – Mean
  – Median
  – Mode

Mean
• Refers to the arithmetic mean
• It is the sum of all the observations divided by the total number of observations (n)
• Denoted by $\bar{x}$ for a sample and $\mu$ for a population
• $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
• Advantages
  – It is easy to calculate
• Disadvantages
  – It is influenced by extreme values

Median
• When all the observations are arranged in either ascending or descending order, the middle observation is known as the median
• With an even number of observations, the average of the two middle values is taken
• The median is a better indicator of the central value when the data contain extreme values, as it is not affected by them

Mode
• The most frequently occurring observation in a data set is called the mode
• Not often used in medical statistics

Example
• Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
• Mean = 35 / 10 = 3.5
• Median: ordered values are 0, 1, 2, 2, 2, 3, 3, 4, 8, 10, so median = (2 + 3) / 2 = 2.5
• Mode = 2 (occurs 3 times)
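The three averages for the decayed-teeth example can be checked with Python's standard `statistics` module. This is only a verification sketch; note that the ten values sum to 35, so the mean works out to 3.5.

```python
import statistics

# Number of decayed teeth in 10 children, from the example.
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed)      # sum of observations / number of observations
median = statistics.median(decayed)  # average of the two middle values (n is even)
mode = statistics.mode(decayed)      # most frequently occurring observation

print(f"sum = {sum(decayed)}, n = {len(decayed)}")
print(f"mean = {mean}, median = {median}, mode = {mode}")
```

The two large values (8 and 10) pull the mean above the median, which illustrates why the mean is said to be influenced by extreme values.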

Types of variability

• There are three types of variability
  – Biological variability
  – Real variability
  – Experimental variability

• Experimental variability is of three subtypes
  – Observer error
  – Instrumental error
  – Sampling error

Biological variability
• It is the natural difference that occurs between individuals due to age, gender and other inherent attributes
• This difference is small, occurs by chance and lies within certain accepted biological limits
• E.g. vertical dimension may vary from patient to patient

Real Variability
• Such variability exceeds the normal biological limits
• The cause of the difference is not inherent or natural but is due to some external factor
• E.g. the difference in incidence of cancer between smokers and non-smokers may be due to excessive smoking and not due to chance alone

Experimental Variability
• It arises from the experimental study itself
• It is of three types
  – Observer error: the investigator may alter some information or record a measurement incorrectly
  – Instrumental error: this is due to defects in the measuring instrument. Observer and instrumental errors together are called non-sampling errors
  – Sampling error or error of bias: this occurs when the sample is not chosen at random from the population, so the sample does not truly represent the population

Measures of variation or dispersion

• Biological data collected by measurement show variation
• E.g. the BP of an individual can vary even when taken by a standardized method and measured by the same person
• Thus one should know what the normal variation is and how to measure it

• The various measures of variation or dispersion are
  – Range
  – Mean or average deviation
  – Standard deviation
  – Coefficient of variation

Range
• It is the simplest measure of dispersion
• Defined as the difference between the highest and the lowest figures in a sample
• Defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2 and 4 mm
• Not satisfactory, as it is based on the two extreme values only

Mean deviation
• It is the average of the deviations from the mean in any distribution, ignoring the + or – sign
• Denoted by MD
• $MD = \frac{\sum |x - \bar{x}|}{n}$
• where $x$ = observation, $\bar{x}$ = mean, $n$ = number of observations

Standard deviation
• Also called root mean square deviation
• It is an improvement over the mean deviation and is the measure most commonly used in statistical analysis
• Denoted by SD or s for a sample and $\sigma$ for a population
• Given by the formula $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$, with $n$ or $n - 1$ in the denominator ($n - 1$ is typically used for a sample)
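The mean deviation and standard deviation formulas can be tried out on the decayed-teeth data from the earlier example. This is a minimal sketch: `mean_deviation` is a helper written for this illustration, while `pstdev` (divide by n) and `stdev` (divide by n - 1) are the standard library's population and sample versions.

```python
import statistics

# Number of decayed teeth in 10 children, from the earlier example.
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

def mean_deviation(values):
    """Average of the absolute deviations from the mean (sign ignored)."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)

md = mean_deviation(decayed)
sd_population = statistics.pstdev(decayed)  # denominator n
sd_sample = statistics.stdev(decayed)       # denominator n - 1

print(f"MD = {md:.2f}")
print(f"SD (n) = {sd_population:.2f}, SD (n - 1) = {sd_sample:.2f}")
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as the sample grows.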

• The greater the standard deviation, the greater the dispersion from the mean
• A small standard deviation means a high degree of uniformity of the observations
• Measurements beyond the range of ± 2 SD are usually considered rare or unusual in any distribution

• Uses of Standard Deviation
  – It summarizes the deviation of a large distribution from its mean.
  – It helps in finding a suitable sample size, e.g. a greater deviation indicates the need for a larger sample to draw meaningful conclusions.
  – It helps in calculating the standard error, which helps us determine whether the difference between two samples is due to chance or is real.

Coefficient of variation
• It is used to compare attributes having two different units of measurement, e.g. height and weight
• Denoted by CV
• $CV = \frac{SD}{\text{Mean}} \times 100$
• It is expressed as a percentage
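Since the coefficient of variation is unit-free, it lets height in centimetres be compared with weight in kilograms. The sketch below uses hypothetical heights and weights for five people and assumes the sample SD (n - 1 denominator); neither the data nor that choice comes from the slides.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean x 100, using the sample SD (an assumption)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements for five people (illustrative only).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of height = {cv_height:.2f}%")
print(f"CV of weight = {cv_weight:.2f}%")
```

In this made-up data set weight is relatively more variable than height, even though the two SDs are in different units and cannot be compared directly.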

Normal distribution or normal curve

• So much physiologic variation occurs in any observation that it is necessary to
  – define normal limits
  – determine the chances of an observation being normal
  – determine the proportion of observations that lie within a given range

• The normal distribution or normal curve, the one most commonly used in statistics, helps us to find these
• A large number of observations with a narrow class interval gives a frequency curve called the normal curve

• It has the following characteristics
  – Bell shaped
  – Bilaterally symmetrical
  – Frequency increases from one side, reaches its highest point and decreases exactly the way it had increased
  – The highest point denotes the mean, median and mode, which coincide
• Mean ± 1 SD includes 68.27% of all observations. Such observations are fairly common.
• Mean ± 2 SD includes 95.45% of all observations; by convention, values beyond this range are uncommon or rare. The chance of an observation falling outside this range is 100 – 95.45%, i.e. only 4.55%.
• Mean ± 3 SD includes 99.73% of all observations. Values beyond this range are very rare; the chance of one occurring is only 0.27%.
• These limits on either side of the mean are called confidence limits.
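The three percentages can be reproduced from the normal curve itself. As a small sketch, the share of observations within k SD of the mean equals erf(k / sqrt(2)), which Python's `math` module can evaluate; the function name `coverage_within` is invented here.

```python
import math

def coverage_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = coverage_within(k) * 100
    print(f"mean +/- {k} SD: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```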

• The look of the frequency distribution curve varies with the mean and SD, so it becomes necessary to standardize it.
• E.g. if one study has an SD of 3 and another an SD of 2, it becomes difficult to compare them.
• The normal curve is therefore standardized by using the standard deviation as the unit to place any measurement with reference to the mean.
• The curve that emerges from this procedure is called the standard normal curve.

Properties of standard normal curve
• Smooth and bell shaped
• Perfectly symmetrical
• Based on an infinite number of observations, so the curve does not touch the X axis
• Mean is zero
• SD is always 1
• Total area under the curve is 1
• Mean, median and mode coincide

• The unit of SD here is the relative or standard normal deviate, denoted by Z
• $Z = \frac{x - \bar{x}}{SD}$, i.e. Z = (Observation – Mean) / SD
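A short sketch of the standard normal deviate follows. The mean of 70 and SD of 8 borrow the pulse-rate figures used in the sample size example; the observation of 86 beats per minute is hypothetical.

```python
def z_score(observation, mean, sd):
    """Standard normal deviate: distance from the mean in SD units."""
    return (observation - mean) / sd

# Hypothetical observation of 86 beats/min against mean 70 and SD 8.
z = z_score(86, mean=70, sd=8)
print(f"Z = {z}")
```

A Z of 2 places the observation two standard deviations above the mean, at the edge of the mean ± 2 SD range.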

• With the help of the Z value we can find the area under the curve from a table
• This area helps to give the P value

Sampling

• It is not possible to include each and every member of a population, as that would be time consuming, costly and laborious.
• Therefore sampling is done.
• Sampling is a process by which some units of a population or universe are selected for study; by subjecting them to statistical computation, conclusions are drawn about the population from which the units were drawn.

• The sample will be representative of the entire population only if
  – it is sufficiently large
  – it is unbiased
• Such a sample will have statistics almost equal to the parameters of the entire population
• The two main characteristics of a representative sample are
  – Precision
  – Unbiased character

Precision
• Precision depends on the sample size
• Ordinarily the sample size should not be less than 30
• $\text{Precision} = \frac{\sqrt{n}}{s}$, where $n$ = sample size and $s$ = standard deviation
• Precision is directly proportional to the square root of the sample size: the greater the sample size, the greater the precision
• Also, the greater the SD, the lower the precision
• Thus in such cases the sample size needs to be increased to obtain precision
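The precision relationship can be illustrated numerically. In this sketch the sample sizes and the SD of 5 are hypothetical values chosen so the square roots come out evenly.

```python
import math

def precision(n, sd):
    """Precision = sqrt(n) / s, as defined on the slide."""
    return math.sqrt(n) / sd

# Hypothetical SD of 5; quadrupling the sample size doubles the precision.
for n in (100, 400, 1600):
    print(f"n = {n:>4}: precision = {precision(n, sd=5)}")
```

Because precision grows only with the square root of n, each doubling of precision costs four times as many observations.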

Unbiased character
• The sample should be unbiased, i.e. every individual should have an equal chance of being selected in the sample
• Thus a standard random sampling method should be used
• Non-sampling errors can be taken care of by
  – Using standardized instruments and criteria
  – Single, double or triple blind trials
  – Use of a control group

Determination of sample size

For Quantitative Data
• The investigator needs to decide how large an error due to sampling defect is allowable, i.e. the allowable error L
• The investigator should either start with an assumed SD or do a pilot study to estimate the SD
• Sample size: $n = \frac{4\,SD^2}{L^2}$

• Example: the mean pulse rate of a population is 70 beats per min with a standard deviation of 8 beats. What will be the sample size if the allowable error is ± 1?
• $n = \frac{4 \times 8 \times 8}{1^2} = 256$
• If L is smaller, n will be larger, i.e. the larger the sample size, the smaller the error.
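The pulse-rate calculation can be reproduced with a one-line function. This is a sketch of the slide's formula n = 4 SD² / L²; the function name and the second call with a tighter error of 0.5 are additions for illustration.

```python
def sample_size_quantitative(sd, allowable_error):
    """Sample size for quantitative data: n = 4 * SD^2 / L^2."""
    return 4 * sd ** 2 / allowable_error ** 2

# Slide example: SD = 8 beats, allowable error L = 1 beat.
n = sample_size_quantitative(sd=8, allowable_error=1)
print(f"n = {n:.0f}")

# Halving the allowable error (hypothetical L = 0.5) quadruples the sample size.
n_tighter = sample_size_quantitative(sd=8, allowable_error=0.5)
print(f"n with L = 0.5: {n_tighter:.0f}")
```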

For qualitative data
• In such data we deal with proportions
• Sample size: $n = \frac{4pq}{L^2}$
• p = proportion of positive character
• q = proportion of negative character
• q = 1 – p (or 100 – p if expressed in percent)
• L = allowable error, usually 10% of p

• Example: the incidence rate in the last influenza epidemic was found to be 5% of the population exposed. What should the sample size be to find the incidence rate in the current epidemic if the allowable error is 10%?
• p = 5%, q = 95%
• L = 10% of p = 0.5%
• $n = \frac{4 \times 5 \times 95}{(0.5)^2} = 7600$
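The influenza example can be checked the same way. The sketch below follows the slide's formula n = 4pq / L² with p in percent and L taken as a fraction of p; the function and parameter names are hypothetical.

```python
def sample_size_qualitative(p_percent, relative_error=0.10):
    """Sample size for qualitative data: n = 4 * p * q / L^2.

    p and q are in percent; L is relative_error * p (10% of p by default).
    """
    q_percent = 100 - p_percent
    allowable_error = relative_error * p_percent
    return 4 * p_percent * q_percent / allowable_error ** 2

# Slide example: incidence p = 5%, so q = 95% and L = 0.5%.
n = sample_size_qualitative(5)
print(f"n = {n:.0f}")
```

Rare characteristics need large samples under this rule: with L tied to p, a smaller p shrinks the denominator faster than the numerator.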

Probability or p value

• The concept of probability is very important in statistics
• Probability is the chance of occurrence of any event or combination of events
• It is denoted by p for a sample and P for a population
• In tests of significance we often want to know whether the observed difference between two samples is real or due to chance (sampling variation)
• This is where the probability or p value is used

• P ranges from 0 to 1
• 0 = there is no chance that the observed difference is due to sampling variation
• 1 = it is absolutely certain that the observed difference between the two samples is due to sampling variation
• However, such extreme values are rare

• P = 0.4 means the chances that the difference is due to sampling variation are 4 in 10
• The chances that it is not due to sampling variation are therefore 6 in 10
• The essence of any test of significance is to find the p value and draw an inference

• If the p value is 0.05 or more
  – it is customary to accept that the difference is due to chance (sampling variation)
  – the observed difference is said to be statistically not significant

• If the p value is less than 0.05
  – the observed difference is taken to be due not to chance but to the role of some external factor
  – the observed difference is said to be statistically significant

Determination of p value
• From the shape of the normal curve
  – About 95% of observations lie within mean ± 2 SD, so the probability of a value outside this range is about 5%
• From probability tables
  – The p value is also determined from probability tables in the case of Student's t test or the chi-square test

• By area under the normal curve
  – The standard normal deviate z is calculated
  – The area under the curve (A) corresponding to the z value, measured from the mean to z, is determined
  – The probability is given by 2(0.5 – A)
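The area-under-the-curve route to a p value can be sketched with `statistics.NormalDist` standing in for the printed table. It assumes that A is the area between the mean and z, which is the reading under which 2(0.5 - A) gives a valid probability; the function name `two_tailed_p` is invented here.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """p value from a standard normal deviate, using p = 2 * (0.5 - A)."""
    # A = area under the standard normal curve between the mean (0) and |z|.
    area = NormalDist().cdf(abs(z)) - 0.5
    return 2 * (0.5 - area)

for z in (1.0, 1.96, 2.0, 3.0):
    p = two_tailed_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:<4} p = {p:.4f} ({verdict})")
```

A z of 2 gives p of about 0.0455, matching the 4.55% of observations that fall outside mean ± 2 SD.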

Thank you. For more details please visit www.indiandentalacademy.com