
Bio Statistics Part 1 INDIAN DENTAL ACADEMY Leader in continuing dental education

Contents • Introduction • Common Statistical Terms • Source of data • Types of data • Data presentation • Measures of statistical averages or central tendency • Types of variability • Measures of variation or dispersion • Normal distribution or normal curve • Sampling • Determination of sample size • Probability or p value


Introduction

• Any science needs precision for its development. • For precision, facts, observations or measurements have to be expressed in figures. • “It has been said when you can measure what you are speaking about and express it in numbers, you know something about it, but when you cannot express it in numbers your knowledge is of a meagre and unsatisfactory kind.” - Lord Kelvin

• Similarly in medicine, be it diagnosis, treatment or research, everything depends on measurement. • E.g. the number of missing teeth has to be counted, or the vertical dimension measured, and the result expressed as a number for it to be meaningful.

• Statistic or datum means a measured or counted fact or piece of information stated as a figure, such as the height of one person or the birth weight of a baby. • Statistics or data is the plural of the same. • Statistics is also the science of figures. • Biostatistics is the term used when the tools of statistics are applied to data derived from biological sciences such as medicine.

Applications and uses of bio statistics as a science • In physiology and anatomy – To define the limits of normality for variables such as height, weight or blood pressure in a population. – Variation beyond the natural limits may be pathological, i.e. abnormal owing to the action of certain external factors. – To find the correlation between two variables such as height and weight.

Applications and uses of bio statistics as a science • In pharmacology – To find the action of drugs – To compare the action of two drugs or two successive dosages of same drug – To find the relative potency of a new drug with respect to a standard drug

Applications and uses of bio statistics as a science • In medicine – To compare the efficacy of a particular drug, operation or line of treatment – To find an association between two attributes such as cancer and smoking – To identify signs and symptoms of disease

Applications and uses of bio statistics as a science • In community medicine and public health – To test usefulness of sera or vaccine in the field – In epidemiologic studies the role of causative factors is statistically tested

Applications and uses of bio statistics as a science • In research – It helps in compilation of data , drawing conclusions and making recommendations.

Applications and uses of bio statistics as a science • For students – By learning the methods of biostatistics, a student learns to evaluate articles published in medical and dental journals or papers read at medical and dental conferences. – The student also understands the basic methods of observation in clinical practice and research.

Common Statistical Terms

Common Statistical Terms • Constant – A quantity that does not vary, e.g. in biostatistics the mean and standard deviation are considered constant for a population

• Variable – A characteristic that takes different values for different persons, places or things, such as height, weight or blood pressure

• Population – The population includes all persons, events and objects under study. It may be finite or infinite.

Common Statistical Terms • Sample – A part of a population, generally selected so as to be representative of the population whose variables are under study

• Parameter – A constant that describes a population, e.g. in a college 40% of the students are girls. This describes the population, hence it is a parameter.

Common Statistical Terms • Statistic – A value that describes a sample, e.g. in a sample of 200 students from the same college, 45% are girls. This 45% is a statistic, as it describes the sample

• Attribute – A characteristic on the basis of which the population can be divided into categories or classes, e.g. gender, caste, religion.

Source of data

Source of data • The main sources for collection of data – Experiments – Surveys – Records

• Experiments – Experiments are performed to collect data for investigations and research by one or more workers.

Source of data • Surveys – Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.

• Records – Records are maintained as a routine in registers and books over a long period of time and provide ready-made data.

Types of data

Types of data • Data is of two types – Qualitative or discrete data – Quantitative or continuous data

Types of data • Qualitative or discrete data – In such data there is no notion of magnitude or size of an attribute, as the attribute itself cannot be measured. – The number of persons having the same attribute varies and is counted – E.g. out of 100 people, 75 have class I occlusion, 15 have class II occlusion and 10 have class III occlusion. – Class I, II and III are attributes, which cannot be measured in figures; only the number of people having each can be determined

Types of data • Quantitative or continuous data – In this the attribute has a magnitude. Both the attribute and the number of persons having the attribute vary – E.g. freeway space. It varies for every patient. It is a quantity with a different value for each individual and is measurable. It is continuous, as it can take any value between 2 and 4, such as 2.10, 2.55 or 3.07.

Data presentation

Data presentation • Statistical data once collected should be systematically arranged and presented – To arouse interest of readers – For data reduction – To bring out important points clearly and strikingly – For easy grasp and meaningful conclusions – To facilitate further analysis – To facilitate communication

Data presentation • Two main types of data presentation are – Tabulation – Graphic representation (charts and diagrams)




Data presentation Tabulation • It is the most common method • Data presentation is in the form of columns and rows • It can be of the following types – Simple tables – Frequency distribution tables

[Table: Simple table. Number of patients at KIDS, Bgm, by month (Jan 06, Feb 06, March 06)]


Frequency distribution table • In a frequency distribution table, the data is first split into convenient groups (class intervals) and the number of items (frequency) that occur in each group is shown in the adjacent column.

[Table: Frequency distribution table. Number of cavities (class intervals 0 to 3, 3 to 6, 6 to 9, 9 and above) against number of patients]
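As a minimal sketch of how such a frequency distribution table can be built, the Python snippet below groups cavity counts into the class intervals named on the slide. The patient data are hypothetical, and so is the rule that each interval includes its lower limit and excludes its upper limit (the slide does not say where a value of exactly 3 belongs).

```python
from collections import Counter

# Hypothetical cavity counts for 12 patients (illustrative, not from the slides).
cavities = [0, 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 12]

def class_interval(value, width=3, open_from=9):
    """Return the class-interval label for one observation.

    Assumption: an interval includes its lower limit and excludes its
    upper limit, so a value of 3 falls in "3 to 6", not "0 to 3".
    """
    if value >= open_from:
        return f"{open_from} and above"
    lower = (value // width) * width
    return f"{lower} to {lower + width}"

# Count how many observations fall in each class interval.
frequency = Counter(class_interval(v) for v in cavities)

for label in ["0 to 3", "3 to 6", "6 to 9", "9 and above"]:
    print(f"{label:<12} {frequency[label]}")
```

Whatever boundary rule is chosen, it should be stated with the table so that every observation falls in exactly one class.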


Data presentation Charts and diagrams • Useful method of presenting statistical data • Powerful impact on imagination of the people

Charts and diagrams • They are – Bar chart – Histogram – Frequency polygon – Frequency curve – Line diagram – Cumulative frequency diagram or ogive – Scatter diagram – Pie chart – Pictogram – Spot map or map diagram

Bar chart • The length of the bars, drawn vertically or horizontally, is proportional to the frequency of the variable • A suitable scale is chosen • Bars are usually equally spaced

Bar chart • They are of three types – Simple bar chart – Multiple bar chart • two or more variables are grouped together – Component bar chart • each bar is divided into parts • each part is proportional to the magnitude of the item it represents



[Figure: Simple bar chart. Number of CD patients by quarter (1st Qtr to 4th Qtr)]

[Figure: Multiple bar chart. CD, RPD and FPD patients by quarter (1st Qtr to 4th Qtr)]

[Figure: Component bar chart. Patients to prostho and patients to other departments, by quarter (1st Qtr to 4th Qtr)]

Histogram • Pictorial presentation of a frequency distribution • Consists of a series of rectangles • Class intervals are given on the horizontal axis and frequencies on the vertical axis • The area of each rectangle is proportional to the frequency

[Figure: Histogram. Frequency against number of carious lesions, in class intervals of 3 from 0 to 27]

Frequency polygon • Obtained by joining the midpoints of the tops of the histogram blocks with straight lines, which usually form a polygon

[Figure: Frequency polygon]

Frequency curve • When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as a frequency curve

[Figure: Frequency curve]

Line diagram • Line diagrams are used to show the trend of events with the passage of time

[Figure: Line diagram. Patients with periodontitis over time]






Cumulative Frequency Diagram • Graphical representation of cumulative frequency • The cumulative frequency of a class is obtained by adding its frequency to the frequencies of all previous classes

[Figure: Cumulative frequency diagram. Prevalence of dental caries (in percent) by age group, from 0 to 10 yrs up to 60 to 70 yrs]
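The running total behind a cumulative frequency diagram can be sketched in a few lines of Python. The age groups and class frequencies below are hypothetical and chosen only to show the arithmetic.

```python
from itertools import accumulate

# Hypothetical class frequencies (number of patients in each age group).
age_groups = ["0 to 10", "10 to 20", "20 to 30", "30 to 40"]
frequencies = [5, 12, 8, 10]

# Each cumulative frequency is the class frequency plus all earlier ones.
cumulative = list(accumulate(frequencies))

for group, freq, cum in zip(age_groups, frequencies, cumulative):
    print(f"{group:<10} frequency={freq:<3} cumulative={cum}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the table.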

Scatter or Dot diagram • Shows the relationship between two variables • If the dots cluster along a straight line, the relationship is linear in nature

[Figure: Scatter diagram. Sugar exposure and carious lesions]


Pie chart • In this, the frequencies of the groups are shown as segments of a circle • The angle of each segment denotes the frequency • The angle is calculated as $\text{angle} = \dfrac{\text{class frequency}}{\text{total observations}} \times 360^\circ$

[Figure: Pie chart with five segments of frequency 30 (5%), 70 (11%), 200 (31%), 180 (29%) and 150 (24%)]
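As an illustration of the angle formula, the sketch below applies it to the five frequencies that appear as labels on the pie chart slide (30, 70, 200, 180 and 150). Treating those labels as class frequencies is an assumption, since the slide does not say what the groups are.

```python
# Class frequencies as read from the pie chart labels (assumed to be counts).
frequencies = [30, 70, 200, 180, 150]
total = sum(frequencies)

# angle = class frequency / total observations * 360 degrees
angles = [f * 360 / total for f in frequencies]

for f, angle in zip(frequencies, angles):
    print(f"frequency {f:>3} -> {angle:.1f} degrees")
```

The angles must add up to 360 degrees, which serves as a check on the calculation.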


Pictogram • Popular method of presenting data to the common man

[Figure: Pictogram (Delhi)]
















Spot map or map diagram • These maps are prepared to show geographic distribution of frequencies of characteristics

[Figure: Spot map or map diagram]

Measures of statistical averages or central tendency

• Average value in a distribution is the one central value around which all the other observations are concentrated • Average value helps – to find most characteristic value of a set of measurements – to find which group is better off by comparing the average of one group with that of the other

• the most commonly used averages are – mean – median – mode

Mean • Refers to the arithmetic mean • It is the sum of all the observations divided by the total number of observations (n) • Denoted by $\bar{x}$ for a sample and $\mu$ for a population • $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$ • Advantages – it is easy to calculate • Disadvantages – it is influenced by extreme values

Median • When all the observations are arranged in either ascending or descending order, the middle observation is known as the median • With an even number of observations, the average of the two middle values is taken • When extreme values are present, the median is a better indicator of the central value, as it is not affected by them

Mode • Most frequently occurring observation in a data is called mode • Not often used in medical statistics.

Example • Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8 • Mean = 35 / 10 = 3.5 • Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5 • Mode = 2 (occurs 3 times)
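The three averages for the decayed-teeth example can be checked with Python's standard `statistics` module. This is only a sketch for verifying the arithmetic; the ten observations add up to 35.

```python
import statistics

# Number of decayed teeth in 10 children (data from the example slide).
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of observations / n = 35 / 10
median = statistics.median(decayed_teeth)  # average of the two middle values
mode = statistics.mode(decayed_teeth)      # most frequently occurring value

print(f"mean={mean}, median={median}, mode={mode}")
```

The two extreme values (8 and 10) pull the mean above the median, which illustrates why the mean is said to be influenced by extreme values.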

Types of variability

• There are three types of variability – Biological variability – Real variability – Experimental variability

• Experimental variability has three subtypes – Observer error – Instrumental error – Sampling error




Biological variability • It is the natural difference which occurs in individuals due to age, gender and other attributes which are inherent • This difference is small and occurs by chance and is within certain accepted biological limits • e.g. vertical dimension may vary from patient to patient

Real Variability • such variability is more than the normal biological limits • the cause of difference is not inherent or natural and is due to some external factors • e.g. difference in incidence of cancer among smokers and non smokers may be due to excessive smoking and not due to chance only

Experimental Variability • It arises from the way the experimental study is conducted • It is of three types – Observer error • the investigator may alter some information or not record the measurement correctly

– Instrumental error • this is due to defects in the measuring instrument • observer error and instrumental error are together called non-sampling errors

– Sampling error or error of bias • this is the error which occurs when the samples are not chosen at random from the population • Thus the sample does not truly represent the population

Measures of variation or dispersion

• Biological data collected by measurement shows variation • e.g. BP of an individual can show variation even if taken by standardized method and measured by the same person. • Thus one should know what is the normal variation and how to measure it.

• The various measures of variation or dispersion are – Range – Mean or average deviation – Standard deviation – Co efficient of variation

Range • It is the simplest measure of dispersion • Defined as the difference between the highest and the lowest figures in a sample • Defines the normal limits of a biological characteristic, e.g. freeway space ranges between 2 and 4 mm • Not satisfactory, as it is based on the two extreme values only

Mean deviation • It is the average of the deviations from the mean in any distribution, ignoring the + or – sign • Denoted by MD • $\mathrm{MD} = \dfrac{\sum |x - \bar{x}|}{n}$, where $x$ = observation, $\bar{x}$ = mean, $n$ = number of observations

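Standard deviation • Also called root mean square deviation • It is an improvement over the mean deviation and is the measure used most commonly in statistical analysis • Denoted by SD or s for a sample and σ for a population • Given by the formula $\mathrm{SD} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, where the denominator is $n$, or $n - 1$ when the SD is estimated from a sample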

• The greater the standard deviation, the greater the dispersion from the mean • A small standard deviation means a high degree of uniformity of the observations • Usually, measurements beyond mean ± 2 SD are considered rare or unusual in any distribution

• Uses of Standard Deviation – It summarizes the deviation of a large distribution from its mean. – It helps in finding a suitable sample size, e.g. greater deviation indicates the need for a larger sample to draw meaningful conclusions – It helps in calculating the standard error, which helps us to determine whether the difference between two samples is by chance or real
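To make the mean deviation and standard deviation concrete, the following sketch computes both for the ten decayed-teeth counts used in the averages example. Reusing that data set here is a choice made for illustration; the slides give no worked example for these measures.

```python
import math
import statistics

# Decayed-teeth counts from the averages example (reused for illustration).
data = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(data)
mean = sum(data) / n

# Mean deviation: average of absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation: root of the mean squared deviation.
sum_sq = sum((x - mean) ** 2 for x in data)
sd_population = math.sqrt(sum_sq / n)    # denominator n
sd_sample = math.sqrt(sum_sq / (n - 1))  # denominator n - 1

print(f"MD={mean_deviation:.2f}, SD(n)={sd_population:.2f}, SD(n-1)={sd_sample:.2f}")

# Cross-check against the standard library implementations.
assert math.isclose(sd_population, statistics.pstdev(data))
assert math.isclose(sd_sample, statistics.stdev(data))
```

With only ten observations the two denominators give visibly different results; the difference shrinks as the sample grows.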

Coefficient of variation • It is used to compare the variability of characteristics measured in different units, e.g. height and weight • Denoted by CV • $\mathrm{CV} = \dfrac{\mathrm{SD}}{\text{Mean}} \times 100$ • It is expressed as a percentage
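A short sketch of the comparison the coefficient of variation makes possible is given below. The height and weight values are hypothetical, and the sample standard deviation (denominator n − 1) is an assumed choice.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean * 100, using the sample SD (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical measurements on the same five people.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of height = {cv_height:.1f}%")
print(f"CV of weight = {cv_weight:.1f}%")
```

Because the CV has no unit, the two percentages can be compared directly even though one SD is in centimetres and the other in kilograms.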

Normal distribution or normal curve

• Considerable physiological variation occurs in any observation • It is therefore necessary to – Define normal limits – Determine the chances of an observation being normal – Determine the proportion of observations that lie within a given range

• Normal distribution or normal curve used most commonly in statistics helps us to find these • Large number of observations with a narrow class interval gives a frequency curve called the normal curve

• It has the following characteristics • Bell shaped • Bilaterally symmetrical • Frequency increases from one side, reaches its highest point and decreases exactly the way it had increased • The highest point denotes the mean, median and mode, which coincide

• Mean ± 1 SD includes 68.27% of all observations. Such observations are fairly common • Mean ± 2 SD includes 95.45% of all observations, i.e. by convention values beyond this range are uncommon or rare. The chance of a normal observation falling outside this range is 100 – 95.45 = 4.55% only • Mean ± 3 SD includes 99.73%. Values beyond this range are very rare; the chance of a normal observation falling outside it is only 0.27% • These limits on either side of the mean are called confidence limits
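The percentages quoted for mean ± 1, 2 and 3 SD can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of observations lying within mean +/- k SD, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, inside in coverage.items():
    outside = 1 - inside
    print(f"mean +/- {k} SD: {inside * 100:.2f}% inside, {outside * 100:.2f}% outside")
```

The same proportions hold for any normal distribution, whatever its mean and SD, because the limits are expressed in units of SD.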


• The shape of a frequency distribution curve varies with the mean and SD, so it becomes necessary to standardize it. • E.g. if one study has an SD of 3 and another an SD of 2, it is difficult to compare them • The normal curve is therefore standardized by using the standard deviation as the unit in which any measurement is placed with reference to the mean. • The curve that emerges through this procedure is called the standard normal curve

Properties of standard normal curve • smooth bell shaped • perfectly symmetrical • based on infinite number of observations thus curve does not touch X axis • mean is zero • SD is always 1 • total area under the curve is 1 • mean median mode coincide

• The unit of SD here is the relative or standard normal deviate and is denoted by Z • $Z = \dfrac{x - \bar{x}}{\mathrm{SD}}$, i.e. $Z = \dfrac{\text{Observation} - \text{Mean}}{\mathrm{SD}}$

• With the help of Z value we can find the area under the curve from a table • This area helps to give the P value
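A minimal sketch of the Z calculation and the table lookup follows. The mean, SD and observation are hypothetical values, and `statistics.NormalDist` stands in for the printed table of areas under the standard normal curve.

```python
from statistics import NormalDist

# Hypothetical values chosen for illustration.
mean = 70          # mean of the distribution
sd = 8             # standard deviation
observation = 86   # the measurement to be placed on the curve

# Standard normal deviate: how many SDs the observation lies from the mean.
z = (observation - mean) / sd

# Area under the standard normal curve between the mean (z = 0) and z,
# which is what a table of areas would give.
area_mean_to_z = NormalDist().cdf(z) - 0.5

print(f"z = {z:.2f}, area between mean and z = {area_mean_to_z:.4f}")
```

Here the observation lies 2 SD above the mean, so a little under half of the total area falls between the mean and the observation.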


Sampling

• It is not possible to include each and every member of a population, as doing so would be time consuming, costly and laborious. • Therefore sampling is done • Sampling is a process by which some units of a population or universe are selected for study and, by subjecting them to statistical computation, conclusions are drawn about the population from which these units are drawn

• The sample will be representative of the entire population only if – It is sufficiently large – It is unbiased • Such a sample will have statistics almost equal to the parameters of the entire population • The two main characteristics of a representative sample are – Precision – Unbiased character

Precision • Precision depends on sample size • Ordinarily, the sample size should not be less than 30 • $\text{Precision} = \dfrac{\sqrt{n}}{s}$, where $n$ = sample size and $s$ = standard deviation • Precision is directly proportional to the square root of the sample size: the greater the sample size, the greater the precision • Also, the greater the SD, the lower the precision • Thus, when the SD is large, the sample size needs to be increased to obtain precision
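The square-root relationship between sample size and precision can be illustrated with a small sketch. The function name and the values of n and s are hypothetical.

```python
import math

def precision(n, sd):
    """Precision = sqrt(n) / s, as defined on the slide."""
    return math.sqrt(n) / sd

# Hypothetical case: SD of 8 with samples of 30 and 120 subjects.
small = precision(30, 8)
large = precision(120, 8)

print(f"n=30:  precision={small:.3f}")
print(f"n=120: precision={large:.3f}")
print(f"ratio = {large / small:.1f}")
```

Quadrupling the sample size only doubles the precision, which is why gains in precision become progressively more expensive.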

Unbiased character • The sample should be unbiased i.e. every individual should have an equal chance to be selected in the sample. • Thus a standard random sampling method should be used • Non sampling errors can be taken care of by – Using standardized instruments and criteria – By single , double , triple blind trials – Use of a control group

Determination of sample size

For Quantitative Data • The investigator needs to decide how large an error due to sampling is allowable, i.e. the allowable error L • The investigator should either start with an assumed SD or do a pilot study to estimate the SD • $\text{Sample size } n = \dfrac{4\,\mathrm{SD}^2}{L^2}$

For Quantitative Data • The mean pulse rate of a population is 70 beats per min with a standard deviation of 8 beats. What will be the sample size if the allowable error is ± 1? • $n = \dfrac{4 \times 8 \times 8}{1^2} = 256$ • If L is smaller, n will be larger, i.e. the larger the sample size, the smaller the error.
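The pulse-rate example can be reproduced with a few lines of Python. The function name is hypothetical, and rounding up to a whole number of subjects is an added assumption.

```python
import math

def sample_size_quantitative(sd, allowable_error):
    """n = 4 * SD^2 / L^2, rounded up to a whole number of subjects."""
    return math.ceil(4 * sd ** 2 / allowable_error ** 2)

# Pulse-rate example: SD = 8 beats, allowable error L = 1 beat.
n = sample_size_quantitative(sd=8, allowable_error=1)
print(n)

# Halving the allowable error quadruples the required sample size.
n_tighter = sample_size_quantitative(sd=8, allowable_error=0.5)
print(n_tighter)
```

Note that the population mean of 70 does not enter the formula; only the SD and the allowable error determine the sample size.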

For qualitative data • In such data we deal with proportions • $\text{Sample size } n = \dfrac{4pq}{L^2}$ • p = proportion of positive character • q = proportion of negative character • q = 1 – p (or 100 – p if expressed in percent) • L = allowable error, usually 10% of p

For qualitative data • E.g. the incidence rate in the last influenza epidemic was found to be 5% of the population exposed • What should be the size of the sample to find the incidence rate in the current epidemic if the allowable error is 10%? • p = 5%, q = 95% • L = 10% of p = 0.5% • $n = \dfrac{4 \times 5 \times 95}{(0.5)^2} = 7600$
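The influenza example works out as follows in Python. The function name is hypothetical; p, q and L are all expressed in percent, as on the slide.

```python
import math

def sample_size_qualitative(p_percent, allowable_error_percent):
    """n = 4 * p * q / L^2 with p, q and L all in percent."""
    q_percent = 100 - p_percent
    n = 4 * p_percent * q_percent / allowable_error_percent ** 2
    return math.ceil(n)

p = 5                          # incidence rate in the last epidemic, in percent
allowable_error = p * 10 / 100 # L = 10% of p = 0.5 percent

n = sample_size_qualitative(p, allowable_error)
print(n)
```

A rare characteristic with a tight allowable error needs a very large sample, which is why the answer here runs into thousands.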

Probability or p value

• The concept of probability is very important in statistics • Probability is the chance of occurrence of any event or permutation combination. • It is denoted by p for a sample and P for a population • In tests of significance we often want to know whether the observed difference between 2 samples is due to chance (sampling variation) or is real. • The probability or p value is used for this purpose

• P ranges from 0 to 1 • 0 = there is no chance that the observed difference could be due to sampling variation • 1 = it is absolutely certain that the observed difference between 2 samples is due to sampling variation • However, such extreme values are rare.

• P = 0.4 i.e. chances that the difference is due to sampling variation is 4 in 10 • Obviously the chances that it is not due to sampling variation will be 6 in 10 • The essence of any test of significance is to find out p value and draw inference

• If p value is 0.05 or more – it is customary to accept that difference is due to chance (sampling variation) . – The observed difference is said to be statistically not significant.

• If p value is less than 0.05 – the observed difference is considered unlikely to be due to chance alone and is attributed to the role of some external factor. – The observed difference here is said to be statistically significant.

Determination of p value • From the shape of the normal curve • We know that about 95% of observations lie within mean ± 2 SD. Thus the probability of a value falling outside this range is about 5% • From probability tables • The p value is also determined from probability tables in the case of the Student t test or chi square test

Determination of p value • By area under the normal curve • Here z, the standard normal deviate, is calculated • Corresponding to the z value, the area under the curve between the mean and z is determined (A) • The probability is given by 2(0.5 – A)
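The rule p = 2(0.5 – A) can be sketched as below, taking A to be the area under the standard normal curve between the mean and z. The function name is hypothetical and `statistics.NormalDist` replaces the printed table.

```python
from statistics import NormalDist

def two_sided_p(z):
    """p = 2 * (0.5 - A), where A is the area between the mean and |z|."""
    area = NormalDist().cdf(abs(z)) - 0.5
    return 2 * (0.5 - area)

for z in (1.0, 1.96, 2.0, 3.0):
    p = two_sided_p(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:<4} p = {p:.4f}  ({verdict} at the 0.05 level)")
```

The factor of 2 counts both tails of the curve, so the p value covers observations that are either much larger or much smaller than the mean.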
