What is Biostatistics? A very gentle, intuitive introduction to probabilities and risk
Georgeta D. Vaidean, MD, MPH, PhD
Lecture 5, PHRM 510, TCOP, Spring 2010

Objectives
• To differentiate between study design and statistics
• To identify where, in the grand scheme of a study, we use statistics
• To differentiate between internal and external validity
• To enumerate and recognize the basic types of sampling
• To define and apply, in examples, the concepts of target, sampled, and study populations
• To define the major categories of “statistics”
• To calculate simple probabilities as they apply in medical science

• Why do we need biostatistics?

Basic designs

time

• How do we do a study? • Where do we start?

The anatomy and physiology of research
[Diagram: ‘Virtual’ Target Population (VTP); Real Sampled Population (RSP); Sample / Study population; Results]

The anatomy and physiology of research
[Flowchart: Population; Sample / Study population; Results; Internal Validity: Valid? (Yes / No: Disregard); Statistically significant? (Yes / No: Discuss, Reconsider); External Validity?]

What is Statistics?
• Latin status = state; statisticus = of politics
• German Statistik = study of political facts and figures
• Early: passive display of numbers, charts, data
  – E.g. census, labor statistics
• Now: a discipline of data-based reasoning
  – Opinion polls, weather forecasts
  – Science
    • Social sciences, medical sciences, all sciences
  – Sports
  – Games
  – Art

Biostatistics

Statistics in everyday life: examples
– Employment rates, by the Bureau of Labor Statistics
– Cost of living
– Gallup Poll
  • http://www.gallup.com/poll/125666/Flu-Cold-Cases-Remain-Lower-One-Year-Ago.aspx

What kind of statistics are these ?

Statistics in everyday life: examples
– Casinos, horse races, and the like
– Weather forecast:
  • How is the weather going to be tomorrow?
  • Will it snow tomorrow?
    – Statistically speaking...
    – How is the forecast made?
    – Is it possible that it will snow in April?
      » How likely is such an event?

What kind of statistics are these ?

Statistics
• 1. Singular: a mathematical science
• 2. Plural of statistic: refers to a quantity calculated from data
  • Numbers, means, ranges
  • E.g.
    – http://www.city-data.com/
    – CDC http://www.cdc.gov/DataStatistics/
      » http://www.cdc.gov/asthma/brfss/07/lifetime/tableL1.htm
    – AHA: Heart Disease and Stroke Statistics - 2010 Update
      » PMID: 20019324
      » http://circ.ahajournals.org/cgi/reprint/CIRCULATIONAHA.109.192667

How many types of Statistics do we have ?

• No, it is not that infamous quote

Statistics • 1. Descriptive Statistics – Statistical methods used to summarize or describe data

• 2. Inferential statistics ( Predictive Statistics) – Statistical methods used to account for randomness and uncertainty in the observations, and then used to draw inferences and to make predictions.

• Nature of Descriptive Statistics: How do we summarize complex results of clinical studies in a meaningful way?
• Nature of Statistical Inference: We have data from a sample, and we get some results/findings. They are 1) valid and 2) not just chance findings. Can we extrapolate our results to the greater population which gave rise to our sample? And/or to populations/patients in the future in similar circumstances?

Statistics in Medical Sciences
• Stems from the uncertainty of clinical events and decisions. Medicine is an inexact science
• We can seldom predict something with 100% certainty
• Individuals vary in their measurements and in their response to treatment
• Basically deals with “random variability” or “noise”
• Does not deal with errors in collecting data or with errors in design (bias)
• We rely on probability theory for quantifying uncertainty

Statistics in Medical Science: e.g.
– Disease “rates”, by CDC
– Cost of medication in the ICU
– Polls, i.e. surveys
– Are the TCOP students’ grades higher than the TCOM students’?

What kind of statistics is this?

– Forecast, i.e. prognosis
  • How is the patient going to be tomorrow / next year / in 5 years?
  • What is the chance that diabetic patients with uncontrolled HTN will develop an MI?
  • What is the chance that a trauma patient with 75% burns will survive?
  • Will we have a flu pandemic?
    – Statistically speaking...
    – How is the forecast/prediction made?
      » How likely is such an event?

What kind of statistics is this?

Statistics in Medical Research Can my valid results be extrapolated to the population from which the sample was drawn? And to the VTP? Inferential Statistics

VTP

Valid Results

RSP

Is this a “representative” sample? Use rigorous “Sampling Strategies” Is this a large enough sample? Use Sample size/Power calculations

How can I describe this population? Descriptive Statistics

Are there statistically significant differences between groups?

Statistics in Medical Research: Very Important Notes
1. If the results are NOT VALID, STATS do NOT MATTER; you can safely throw the paper away.
2. Validity is not a statistical issue. We have to use our judgment.
3. If something is statistically significant, it does NOT

A good discussion and examples of sampling can be found in Access Pharmacy: http://www.accesspharmacy.com/content.aspx?aID=2046616#2046616

Source: Access Pharmacy

Sampling

The Population (puppy-lation)

A Sample Source: Shiva Gautam, Ph. D.

Generally the population is not homogeneous

Giving rise to variability in samples

Sample 1 Source: Shiva Gautam, Ph. D.

Sample 2

All people with HTN People in our study

Population: A class of patients about whom we would like to draw conclusions. Sample: A group of patients from the population whose outcome is known.

What is a representative sample?

RSP

Sampled Population

Sample Study population

What is a representative sample? Statistics plays one important role here, by using an appropriate sampling method

Our study population

• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Randomization schemes in RCT
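The four sampling methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patient IDs, the strata ("men", "women"), and the clinics used as clusters are hypothetical, and the function names are made up for this example rather than taken from the course reading.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th unit from an ordered list."""
    k = len(population) // n           # sampling interval
    start = rng.randrange(k)           # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(510)                 # fixed seed so the sketch is repeatable
patients = list(range(1, 101))           # hypothetical patient IDs 1..100
strata = {"men": patients[:50], "women": patients[50:]}            # hypothetical strata
clusters = {f"clinic_{c}": patients[c * 10:(c + 1) * 10] for c in range(10)}  # hypothetical clinics

srs = simple_random_sample(patients, 10, rng)
systematic = systematic_sample(patients, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
print(srs, systematic, stratified, clustered, sep="\n")
```

Note the practical difference: the stratified sample is guaranteed to contain 5 units from each stratum, while the cluster sample contains every patient from only 2 of the 10 clinics.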

Sampling: Please read on your own in Access Pharmacy
• “Probability & Related Topics for Making Inferences About Data”, Beth Dawson, Robert G. Trapp: Basic & Clinical Biostatistics, 4e, Chapter 4. The section entitled “Populations & Samples” http://www.accesspharmacy.com/content.aspx?aID=2046680&referURL=http%3A//www.accesspharmacy.com/content.aspx%3FaID%3D2046680
• Stop before “Population parameters”. Time needed: ~15 minutes
• You should be able to:
  – Enumerate 6 reasons for sampling
    • You need to understand how they work. No need to memorize the simple random sampling using a table, as described (just browse and understand the general idea)
  – Enumerate and differentiate between the 4 major methods of sampling, and understand random assignment and block randomization in RCTs
  – Quizzes and exams may contain questions about these topics
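As a rough illustration of the block randomization mentioned above, the sketch below assigns participants to two arms in shuffled blocks, so the arms stay balanced after every complete block. The arm labels "A" and "B", the block size of 4, and the function name are assumptions made for this example; the assigned reading may present the procedure differently.

```python
import random

def block_randomize(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Assign participants to arms in randomly shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms) * per_arm   # e.g. ['A', 'B', 'A', 'B']
        rng.shuffle(block)             # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]

allocation = block_randomize(12, block_size=4, seed=2010)
print(allocation)
```

With 12 participants and blocks of 4, each block holds exactly two "A" and two "B" assignments, which simple coin-flip assignment would not guarantee.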

Probability
Pharmacology is a rather exact science. Medicine is an inexact science. In medicine, we can never predict an outcome with absolute certainty. For example, Interferon may cause hepatic injury. How do we diagnose hepatic injury in patients? Even good diagnostic tests are not perfect, and there is always a probability of being “wrong”. In dealing with population-based Public Health issues, the level of uncertainty is even greater than in clinical epidemiology. Understanding probability is important for decision making. We rely on probability theory to quantify the uncertainty that is inherent in any decision-making process. We always deal with uncertainty. Our only hope is to reduce that uncertainty to a level that will allow us to make decisions (diagnostic, therapeutic, public health decisions).

What is Probability?
• In terms of relative frequency, the probability of an event, E:

P(E) = (# times E occurs) / (# times E can occur)

Probability deals with the relative likelihood that a certain event will or will not occur, relative to some other events. We can derive probabilities in one of two ways: empirically or theoretically.
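The relative-frequency definition translates directly into a small calculation. The counts below (30 events among 120 patients) are hypothetical numbers chosen only to illustrate the arithmetic, and the function name is made up for this sketch.

```python
from fractions import Fraction

def relative_frequency(times_occurred, times_possible):
    """P(E) = (# times E occurs) / (# times E can occur)."""
    if times_possible <= 0 or not 0 <= times_occurred <= times_possible:
        raise ValueError("need 0 <= times_occurred <= times_possible and times_possible > 0")
    return Fraction(times_occurred, times_possible)

# Hypothetical illustration: a side effect seen in 30 of 120 treated patients
p = relative_frequency(30, 120)
print(p, float(p))  # 1/4 0.25
```

Using `Fraction` keeps the result exact, which makes it easy to see that the probability always lands between 0 and 1.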

Empirically derived probabilities • Example 1: give an example of when you thought about probabilities • Example 2 : horse races: odds (a special type of probability) – How are these derived?

• Example 3: probability of survival for a cancer patient – How are these derived? • Based on known survival rates of similar patients who have the same stage of disease and have undergone the same Tx regimen

• Note: The assumption is: we can predict the future only if it will happen under the same circumstances as the past !

Empirically derived probabilities • "If you hear hoof beats, think horses—not zebras" • What does this mean? • This has 2 amendments: – 1. Leave the door open for surprises – 2. What do you say if you are in Kenya?

Theoretically derived probabilities Atlantic City

– The payoffs are marked
– The odds given for rolling a 7 or 11 on the 1st throw, or hitting a certain number, are not based on the experience of the croupier; they are figured out based on probability theory.

Theoretically derived probabilities

• Each of the 6 sides has the same p of ending up on top; which one appears is a random event
• What is the p of rolling a 3 in one toss of the die?
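A theoretically derived probability can be obtained by counting equally likely outcomes. The sketch below answers the single-die question and, assuming the "7 or 11 on the 1st throw" example refers to the sum of two fair six-sided dice, enumerates all 36 possible pairs.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die (assumption: all faces equally likely)

# One toss: only one of the six faces is a 3
p_three = Fraction(sum(1 for f in faces if f == 3), len(faces))

# Two dice: enumerate all 36 equally likely (first, second) pairs
rolls = list(product(faces, repeat=2))
favorable = [r for r in rolls if sum(r) in (7, 11)]
p_7_or_11 = Fraction(len(favorable), len(rolls))

print(p_three)      # 1/6
print(p_7_or_11)    # 2/9  (8 favorable pairs out of 36)
```

No rolls had to be observed to get these numbers; that is the difference from an empirically derived probability.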

Disclaimer
• All these calculations are meant to make you better PharmD practitioners and researchers, not to lead you down the road of dangers by making you better gamblers!

Probability properties
• 0 ≤ P ≤ 1
  – P = 1
  – P = 0. What does it mean?
  – P = 0.5
  – Can we say 50%?
• The sum of the probabilities of all possible (mutually exclusive) events = 1

Frequency distributions
• Example. A sample of 1047 nondiseased men 40-59 yrs old. The event of interest is the specific interval of cholesterol levels into which a man’s cholesterol falls

Problems
• 1. What is the probability that the serum cholesterol level of a man randomly selected from the sample is between 160 and 179 mg/dl?
• 2. What is the probability that a nondiseased, 50-yr-old man randomly selected from the population from which the sample was drawn will have a serum cholesterol level less than 200 mg/dl?

The 2 x 2 Table

• Is a simple table to summarize data
• Marginal totals
• Grand total
• E.g. determining the accuracy of a diagnostic test
• Suppose we invented a new test to diagnose hepatitis (we have many patients taking Interferon, remember?)
• Gold standard-defined Disease
• Our new experimental test, T

          D+    D-    Totals
T+         7     4      11
T-         3    86      89
Totals    10    90     100

Problem 1. What is the probability that a person selected at random from the 100 study participants has the disease (as determined by the gold standard)?
Problem 2. What is the probability that a person selected at random has a positive result using the experimental test?
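Both problems are answered from the marginal totals of the 2 x 2 table divided by the grand total. A minimal sketch, using the four cell counts from the table (7, 4, 3, 86); the dictionary layout is just one convenient way to hold them:

```python
from fractions import Fraction

# Cell counts from the 2 x 2 table: (test result, disease status) -> count
table = {("T+", "D+"): 7, ("T+", "D-"): 4,
         ("T-", "D+"): 3, ("T-", "D-"): 86}

grand_total = sum(table.values())                                    # 100
n_diseased = sum(n for (t, d), n in table.items() if d == "D+")      # column total D+
n_test_pos = sum(n for (t, d), n in table.items() if t == "T+")      # row total T+

p_disease = Fraction(n_diseased, grand_total)    # Problem 1: P(D+)
p_test_pos = Fraction(n_test_pos, grand_total)   # Problem 2: P(T+)

print(float(p_disease), float(p_test_pos))  # 0.1 0.11
```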

• Joint probability
  – P(A and B)
  – P((D-) and (T-))
• Conditional probability
  – P(B | A)
  – P(Chol between 120-139 | Chol < 240), i.e. IF we already know that Chol is < 240
  – P((D+) | (T+))
• More later, when we will discuss diagnostic tests
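Using the cell counts from the 2 x 2 table (7, 4, 3, 86), the joint and conditional probabilities named above can be worked out as follows. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
from fractions import Fraction

# Cell counts from the 2 x 2 table: (test result, disease status) -> count
table = {("T+", "D+"): 7, ("T+", "D-"): 4,
         ("T-", "D+"): 3, ("T-", "D-"): 86}
grand_total = sum(table.values())  # 100

# Joint probability: both conditions hold, out of everyone in the study
p_dneg_and_tneg = Fraction(table[("T-", "D-")], grand_total)        # P(D- and T-)

# Conditional probability: restrict the denominator to those who tested positive
n_test_pos = table[("T+", "D+")] + table[("T+", "D-")]               # 11
p_dpos_given_tpos = Fraction(table[("T+", "D+")], n_test_pos)       # P(D+ | T+)

# Same result through the definition P(A | B) = P(A and B) / P(B)
p_dpos_and_tpos = Fraction(table[("T+", "D+")], grand_total)
p_tpos = Fraction(n_test_pos, grand_total)
assert p_dpos_and_tpos / p_tpos == p_dpos_given_tpos

print(p_dneg_and_tneg, p_dpos_given_tpos)  # 43/50 7/11
```

The key point is the denominator: the joint probability divides by all 100 participants, while the conditional probability divides only by the 11 who tested positive.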

Optional: The Meaning of the Term “Probability” in Basic & Clinical Biostatistics, 4e, Beth Dawson, Robert G. Trapp, in Access Pharmacy. http://www.accesspharmacy.com/content.aspx?aID=2046616#2046616 However, all exams/quizzes will make reference only to the class slides and class discussion on this topic.

Hypothetical example
• CityA has a population of 200,000 adults
• The health department tells us there are 5000 cases of a disease called Bigbrain disease. We select a random sample of 1000 people that is representative of the city’s adult population (how do we do this? Well, we need a good statistician to do that for us). The fact is, we get the 1000-person sample and, let’s say, we put them all in one big waiting room.
• We go into that room and pick 1 person at random. What is the probability that that person has Bigbrain disease?
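One way to reason through this example, assuming the 5000 cases are all among the 200,000 adults and that a representative sample mirrors the city's proportion of cases:

```python
from fractions import Fraction

city_adults = 200_000
city_cases = 5_000     # assumption: all 5000 cases are among the adults
sample_size = 1_000

# Proportion of adults with the disease in the city
p_disease = Fraction(city_cases, city_adults)

# If the sample is representative, it should contain about this many cases
expected_cases_in_sample = p_disease * sample_size

# Picking 1 person at random from the room: cases in room / people in room
p_random_person = expected_cases_in_sample / sample_size

print(float(p_disease))            # 0.025
print(expected_cases_in_sample)    # 25
print(float(p_random_person))      # 0.025
```

Under these assumptions the answer is 2.5%, the same as the proportion in the city. An actual random sample would contain roughly, not exactly, 25 cases.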

How do we express the concept of sampling in terms of .. probability, frequency?

Our study population

Statistics plays an important role here, by using an appropriate sampling method

What is next? 4. Extrapolations/ Inferences

3. Prediction

Are they really different?

1. Sampling

2. Statistical Significance tests

In this introductory course we will explore and perform a few basic statistical significance tests, and we will look at and understand the idea behind predictions. The rest will be the subject of more advanced courses.

Dealing with math IS difficult. Innumeracy is quite... human
• We have made a lot of progress in quantifying estimates for various risks, with some probabilities. Yet we are running up against an eternal problem: many people cannot understand figures; or if they can, they refuse to act accordingly [PMID: 15310796]
• Innumeracy is pretty “natural” for humans:
  – Arithmetical operations that are more complicated than simple addition or subtraction just do not correspond with the operations that we apply in reasoning and deliberation.
  – It is hard for most people to imagine very big and very small numbers.
  – The calculation and interpretation of risks: despite 200 years of probability calculation, very little of this has penetrated the everyday thinking of the population at large.
    • E.g. smokers are still afraid of flying

• Innumeracy research: people's trouble with “numbers” has been studied
• Nobel prize for economics in 2002: Daniel Kahneman (Princeton, NJ), Vernon L. Smith (George Mason, VA)
  – People find it difficult to deal with numbers and have trouble estimating “risks”; people find small risks and big numbers very difficult
  – People are extremely inconsistent when it comes to weighing advantages against disadvantages: the probability of a loss is weighed much more heavily than an equally large probability of an equally large benefit, which makes behavioral choices regarding health and disease appear irrational [Judgment under uncertainty: heuristics and biases. Science 1974;185:1124–31]

Disclaimer
• All the examples in this course are adapted, modified, and sometimes old examples. They were selected for their ability to illustrate a concept, not because they would reflect the “newest medical discovery”.

Prerequisites for the Biostat section of this class:
• Keep the mind open
• Question your assumptions
• Do not allow yourself to be buried by an “avalanche of mathematical trivia and neologisms”
• Accept that innumeracy is a given of our human nature and keep up your optimism. Use your innate logic, and do the homework.
• “Ingest” the material in small daily bites; it makes it more digestible.
• From time to time, stand back and observe the... big picture
• Ask

DON’T

DO
• Use your logic
• Use your common sense
• Figure out what exactly you are looking to prove or disprove with a “statistic”.
• Keep it simple
• When in doubt, ask
• Remember: Stats do not make data meaningful. No statistics will save a faulty design!
