Lecturer: James Betts (UC) Additional Reading (for next week): • Greenhalgh T. (2006) How to Read a Paper: the basics of evidence-based medicine, 3rd edition. Oxford: Blackwell ◦Chapters 3 & 4 at least Objectives of the course: • Develop IT skills • Introduce scientific method • Expand understanding of research design • Learn to describe and analyse data appropriately • Gain the ability to individually assess the value of scientific evidence • Improve ability to write scientifically Lectures: Wednesday (9.15) - 4 East 3.38 (weeks 1-9) On week 7 there will be a test (incorporating what is on the slides, what is said in lectures and any specified additional reading) Computer classes: Thursday (10.15) - 2 East 1.14 (weeks 2-10) The unit web page has information on the content of the classes Reminder: get SPSS on laptop! Seminars and tutorials take place in tutor groups, check emails for further information www.bath.ac.uk/~jb335/ is the site for electronic copies of the handouts The coursework component is 60% of the final grade and will involve reviewing, analysing and evaluating research design, data etc. of scientific reports Research Question: "Is it safer to give birth at home or in hospital?"

Possible considerations: • Study population ◦Who, how many and why? • Experimental design ◦What to compare and how • Context ◦Where, when and why? • Intervention ◦Define and justify • Outcomes ◦Define and justify • Sources of bias • Ethics

Lecturer: James Betts Topics: • The research process • The research design continuum • Experimental designs • Sampling methods • Scientific reasoning • Quantitative and qualitative research strategies What is research? Research is about systematically organising facts and data Research is... • Systematic - research process • Logical - induction/deduction • Empirical - evidence based • Reductive - generalisation (potentially beyond sample population) • Replicable - methodology The research process: Cyclical Review available literature (Find what isn't known) Publish your findings Formulate a question Interpret your findings Select APPROPRIATE research design Collect RELEVANT data Reductionism: Simplifying to the most basic form (Reduced) (Complex) Basic Science Applied Science (Maths) (Physiology) Basic science is: • Theoretical?

• More invasive? • Lab based? • Tight control? • Lacks external validity? • Focus on mechanisms • More reductionist Whereas applied science is: • Quick answers? • Less invasive? • Field based? • Loosely controlled? • Externally valid? • Less reductionist External validity - does our research apply to the "real world"? Internal validity - are we actually testing what we say we are? Analytical Research: • Critical account of present understanding • Meta-analysis is a quantitative review

Historical Research:
• Accessing primary (witnesses) and secondary (literature) sources to document past events
Philosophical Research:
• Organising existing evidence into a comprehensive theoretical model
Descriptive Research:
• Case study - accrual of detailed information from an individual, group or team etc.
• Is it refutable? Often cannot be proved or disproved
• Survey
  ◦ Cross-sectional - status of various groups at a certain point in time
  ◦ Longitudinal - status of various groups at multiple points in time
  ◦ Correlational - relationships between variables
Correlational evidence:
As X increases, Y increases
Does X cause Y or vice versa? Does Z increase both X and Y?
CORRELATION DOES NOT IMPLY CAUSALITY! (And vice versa: causality does not guarantee a correlation - sex & pregnancy)
Experimental Research:
• Establishment of causality: one variable's influence on another
• All other variables (extraneous) must be controlled to isolate the effect of the independent variable (IV) on the dependent variable (DV)
• Independent variable - cause, allowed to vary/manipulated, a.k.a. predictor variable
• Dependent variable - effect, should only vary due to the influence of the IV, a.k.a. criterion variable
Law of single variable: there are always uncontrollable influences
• Confounding variable - an extraneous variable that has co-varied with the IV
Experimental Designs:
• Pre-experimental
• Quasi-experimental
• True-experimental

Key:
• R = random assignment to equivalent groups (via: repeated measures design, matched pairs design or matched groups design)
• O1,2,3... = observation of group X
• Oa,b,c... = observation of group Y
• T = treatment
• P = placebo
Pre-experimental designs:
One shot study
    T  O1
One group pre-test post-test study
    O1  T  O2
Static group comparison (Daniel 1:8)
    T  O1
    P  Oa
Quasi-experimental designs:
Time series
    O1  O2  O3  T  O4  O5  O6
True-experimental designs:
Randomised group comparison (equivalent groups, assume they are the same)
    R   T  O1
        P  O2
Pre-test post-test randomised group comparisons
    R   O1  T  O2
        O3  P  O4
Solomon four group design
    R   O1  T  O2
        O3  P  O4
            T  O5
            P  O6
Sampling: 12 smarties total

2 red smarties
The more you sample the better your results; small sample sizes have lots of noise
Target population = N
Sample population = n
Effective sampling means n is representative of N
Statistics are drawn from n so that the DV can be generalised from n to N
Sampling Methods:
• Random - all members of N have equal chance of selection
• Stage - randomly select a group, then take a sample
• Cluster - selecting a natural group to sample from
• Stratified - identify strata and sample accordingly
  ◦ E.g. Men vs Women (49 % & 51 % globally, therefore if n = 100 there should be 49 males and 51 females)
• Systematic - start at a random point and select at regular intervals
• Opportunity - sample a convenient group
Scientific reasoning:
• Deductive (General Theory to Specific Observation) - quantitative, confirm theory from observation
• Inductive (Specific Observation to General Theory) - qualitative, devise theory from observation
Qualitative:
• Create a novel theory
• Ethnography - researcher is inherent part of study
• Infers complex statements or opinions
• Data collected permits open responses
Quantitative:
• Assess pre-stated theory
• Often involves hypothesis testing
• Minimises researcher influence
• Infers statistics
• Data collected involves closed responses
Choice of research strategy:
• Based on:
  ◦ Epistemology (how should we be attempting to assess knowledge?)
    ‣ Positivism - explain phenomena
    ‣ Interpretivism - understand phenomena
  ◦ Ontology (does the data exist in a tangible or intangible form?)

    ‣ Objectivism - explain independent external outcomes
    ‣ Constructionism - understand how social factors interact
• Study in the natural sciences often requires a positivist epistemology and an objectivist ontology
• Study in the social sciences often requires an interpretivist epistemology and a constructionist ontology
• However, it is possible to combine the two if you code qualitative data quantitatively (e.g. athlete = 1 and non-athlete = 2)
Additional reading: see handout
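The sampling methods listed in this lecture can be illustrated with a short sketch. It assumes a hypothetical target population of 1000 people (490 men, 510 women), and the function names `stratified_sample` and `systematic_sample` are made up for illustration; they do not come from the handout.

```python
import random

def stratified_sample(population, stratum_of, n, seed=0):
    """Draw about n members so each stratum keeps its share of the population.

    Quotas are rounded, so for awkward proportions the total may differ
    slightly from n (an accepted simplification in this sketch).
    """
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, quota))  # random within each stratum
    return sample

def systematic_sample(population, n, seed=0):
    """Start at a random point and select at regular intervals."""
    rng = random.Random(seed)
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

# Hypothetical target population (N = 1000): 49 % men, 51 % women
population = [("M", i) for i in range(490)] + [("F", i) for i in range(510)]

sample = stratified_sample(population, lambda person: person[0], n=100)
men = sum(1 for sex, _ in sample if sex == "M")
women = len(sample) - men
print(men, women)
```

With these proportions the stratified sample of n = 100 contains 49 men and 51 women, matching the example in the notes.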

Lecturer: James Betts
Statistics:
• Descriptive
  ◦ Organising, summarising and describing data
• Correlational
  ◦ Relationships
Significance testing
• Inferential
  ◦ Generalising
Measured variables:
Measurement is the assignment of numbers to objects, events or abstract concepts according to a known set of rules
Data can be categorised, quantified and analysed to draw meaningful conclusions
The data that are gathered we refer to as variables
• Organismic data
  ◦ Physiological, psychological or performance based characteristics of an organism
• Environmental
  ◦ Characteristics relating to the organism's environment (what we subject the test subject to)
Levels of measurement:
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
As you move down the scale, each step possesses all the characteristics of the previous step and adds something more (N.O.I.R.)
Nominal data:
• Measure of identity or category
• Quantifying qualitative data
• No information on order or magnitude
• Categories must be mutually exclusive (can't fall into two categories) and comprehensive (you must be able to select one)
• E.g. 20 like Coke, 10 like Pepsi (cannot be ranked)
Ordinal data:

• Measure of order or rank
• Data can be arranged into series
• Provides no information regarding magnitude
• E.g. (Small, medium or large)
Interval data:
• Measure of order and quantity
• Difference between values can be calculated
• Cannot establish an 'X-fold increase' (i.e. cannot say something is twice as big as something else)
• The 0 point is arbitrary rather than absolute, and the units can be changed
• E.g. Foot size
Ratio data:
• Interval scale with an absolute 0
• Subsumes all lower levels
• E.g. Race times
Nominal data: 38.08 s Ordinal data: Silver Interval: +0.1 s Ratio: +x %
SI Units:
Seven 'constant' base units using the metric system

Variable              Unit      Symbol  Accepted derivations
Distance              Metre     m       ha (area), ° (angle), L (volume)
Mass                  Kilogram  kg      t
Time                  Second    s       min, h, d
Temp.                 Kelvin    K
Amount of substance   Mole      mol     L (volume)
Current               Ampere    A
Luminous intensity    Candela   cd

Unit symbols are always lower case (except K, A, L), neither italicised nor pluralised, and written with a space between the value and the unit (including %, excluding °)
Discrete and continuous variables:
• Discrete
  ◦ Described by a specific, distinct point on a scale

◦Cannot be subdivided further ◦E.g. Gender • Continuous ◦Theoretically can take any value between two points on a continuum ◦Dependent on accuracy of measuring tools ◦E.g. Time Indicators of central tendency: • Mode ◦Most frequently occurring score • Median ◦Middle score • Mean ◦Arithmetic average Mode: Advantages - quick and easy to compute, unaffected by extreme scores, used at any level of measurement Disadvantages - terminal statistic (can't do anything with it), certain sub groups could make this statistic unrepresentative (tramps on a bus) Median: Advantages - unaffected by extreme scores, can be used at all levels above nominal Disadvantages - only considers order, value ignored Mean: There are many types of mean: arithmetic, harmonic, truncated, geometric...etc. Advantages - very sensitive measure, takes into account ALL data, can be combined with other group means to give overall mean Disadvantages - very sensitive measure, only interval/ratio data, only when scores are symmetrical above and below Distribution: • Often displayed graphically • x = measured variable • y = frequency Normal distribution (bell-shaped) curve:

• Naturally occurring
• Asymptotic (theoretically never hits zero)
• Symmetrical
• Mean, median and mode are all in the same place
The points of inflection (where the line changes from concave to convex) occur 1 standard deviation (SD) either side of the middle point; the area between them takes in 68.26 % of the sample. A further 13.59 % lies between 1 and 2 SDs on each side, and 2.14 % between 2 and 3 SDs on each side
E.g. Average = 3500, SD = 1000, Raw Score = 4500: Z Score = (4500 - 3500) / 1000 = +1
Kurtosis describes how peaked or flat the distribution is relative to a normal curve
Non-normal distribution:
Negative skew - 'tail' to the left, mode (peak), median (middle), mean (towards outlier)
Distribution:
• Determines which measure of central tendency to use
• Determines which measure of variability to use
• Provides a Z score for standard comparison
• Determines further statistical tests
  ◦ Parametric (assumes normal distribution, interval/ratio data, random, more powerful)
  ◦ Non-parametric (simply calculated, distribution free, less powerful)
Variability:
• Standard deviation
• Standard error of the mean
• Range
• Inter-quartile range
• Normalised confidence intervals
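The Z score example and the proportions under the normal curve can be checked with a minimal sketch. The helper names `z_score` and `normal_cdf` are illustrative only; the cumulative probability is computed from the standard library error function rather than read from a table.

```python
import math

def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd

def normal_cdf(z):
    """Proportion of a normal distribution lying below a given Z score."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Worked example from the notes: average 3500, SD 1000, raw score 4500
z = z_score(4500, 3500, 1000)

# Proportion of the sample in each band of the bell curve
within_1sd = normal_cdf(1) - normal_cdf(-1)   # between the points of inflection
band_1_to_2 = normal_cdf(2) - normal_cdf(1)   # one side only
band_2_to_3 = normal_cdf(3) - normal_cdf(2)   # one side only

print(z)
print(round(within_1sd * 100, 2), round(band_1_to_2 * 100, 2), round(band_2_to_3 * 100, 2))
```

Because a Z score is unit-free, it allows scores from differently scaled tests to be compared on the same footing.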

Lecturer: James Betts
Continuation of descriptive statistics:
• SD represents the spread of data/variability of results around the mean
• SEM (standard error of the mean) represents how confident we can be in our mean result: is it representative?
• SEM = SD / √n, i.e. an estimate (from the sample) of how much sample means vary around the target population mean
• Small SD and large n means a small SEM
• We can use the SEM to say that we can be approximately 68 % certain that the mean of the target population lies within this distance of our mean
• No methods section should include SEM
Median and range or inter-quartile range:
• Mean +/- SD or SEM cannot be used for non-normal distributions
• Therefore we have to use the median and range/IQR
• E.g. median (range) = 37.5 (8-179)
• IQR removes extreme scores as it uses the 25th and 75th percentile rather than the full range of data
Accuracy and precision of reporting:
• Use maximum number of decimal places available throughout data analysis
• Don't report means, medians, SD etc. to more decimal places than the raw data
RELIABILITY AND VALIDITY:
Validity - "the soundness or appropriateness of a test or instrument in measuring what it is designed to measure"
Reliability - "the degree to which a test or measure produces the same scores when applied in the same circumstances"
Objectivity - "the degree to which different observers agree on measurements"
Internal validity - "is the experimenter measuring the effect of the independent variable on the dependent variable?"
External validity - "can the results be generalised to the wider population?"
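The descriptive statistics at the start of this lecture can be sketched as follows. The sketch assumes the usual formula SEM = SD / √n, and both data sets are hypothetical (the skewed one was chosen so that its median and range match the 37.5 (8-179) example). Note that `statistics.quantiles` uses one of several accepted percentile conventions, so other software may report slightly different quartiles.

```python
import math
import statistics

def sem(scores):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

# Hypothetical, roughly symmetrical data: report mean +/- SD (or SEM)
symmetrical = [10, 12, 14, 16]
mean = statistics.mean(symmetrical)
sd = statistics.stdev(symmetrical)
standard_error = sem(symmetrical)

# Hypothetical skewed data: report median with range or inter-quartile range
skewed = [8, 21, 30, 35, 40, 52, 90, 179]
median = statistics.median(skewed)
data_range = (min(skewed), max(skewed))
q1, _, q3 = statistics.quantiles(skewed, n=4)  # 25th, 50th and 75th percentiles

print(f"mean +/- SD = {mean} +/- {sd:.2f}, SEM = {standard_error:.2f}")
print(f"median (range) = {median} ({data_range[0]}-{data_range[1]}), IQR = {q1}-{q3}")
```

The single extreme score of 179 stretches the range but leaves the inter-quartile range largely unaffected, which is why the IQR is preferred when outliers are present.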

[Figure: classification of validity and reliability. Validity: Logical (Face, Content), Statistical (Concurrent, Predictive), Construct. Reliability: Consistency, Objectivity.]

Face
• Implies test is valid by definition
• Clear that test measures what it is supposed to
• E.g. Ruler drop test has face validity
• Externally valid?
• Subjective
Content
• Implies that the test measures all of the aspects that contribute to the variable of interest
• Subjective
Logical
• Simply appears to measure the right variable in its entirety
Concurrent
• Implies that the test produces similar results to a previously validated test
Predictive
• Test provides a valid reflection of future performance using a similar test
• Can performance during test A be used to predict performance in test B?
Statistical
• Produces results that agree with other similar tests
Construct:
• Implies not only that the test measures what it is supposed to, but also that it is capable of detecting what should exist, theoretically
• Relates to hypothetical, intangible constructs
• E.g. Team rivalry and sportsmanship
• Assessment is difficult
• If what should exist cannot be detected this could mean:

◦The test is invalid ◦The theory is incorrect ◦There are sensitivity/specificity issues

Sensitivity - "the test is sensitive enough that approximately x % of positive people will receive a positive result"
Specificity - "the test is specific enough that approximately x % of negative people will receive a negative result"
Threats to validity:
• Internal
  ◦ Maturation
    ‣ Changes in DV over time irrespective of IV
    ‣ Could occur in a one group pre-test post-test study
    ‣ Solve with time series or pre-test post-test randomised group comparison
    ‣ Repeated measures design can occasionally be an inappropriate solution even when randomised and counterbalanced
      • E.g. muscle damage (repeated bout effect)
      • Vitamin supplementation (washout period)
      • An independent measures design would have to be used
  ◦ History
    ‣ Unplanned events between tests e.g. exercise
    ‣ Solve by controlling extraneous variables
  ◦ Pre-test
    ‣ Learning, sensitisation can occur (interactive effects due to the pre-test)
    ‣ Also influences external validity
    ‣ Solve with Solomon four group design
  ◦ Statistical regression (a.k.a. regression to the mean)
    ‣ Initial extreme score is likely to be succeeded by less extreme scores
    ‣ E.g. Training has the greatest effect on the untrained
    ‣ Solve with effective sampling
  ◦ Instrumentation
    ‣ A difference in the way 2 comparable variables were measured
    ‣ E.g. Uncalibrated equipment
    ‣ Solve by calibration
  ◦ Selection bias
    ‣ Groups for comparison are not equivalent
    ‣ Not randomly assigned

    ‣ Solve by random assignment, pre-test post-test difference, use repeated measures
  ◦ Experimental mortality
    ‣ Missing data due to subject drop out
    ‣ Also affects external validity
    ‣ Reduced n and statistical power
    ‣ Challenges quality of data (internal) but also our ability to generalise
    ‣ Solve by recruiting sufficient participants (30 % more in invasive trials)
• External
  ◦ Inadequate description
    ‣ Should be replicable
    ‣ If nobody can replicate the experiment then it is irrefutable and lacks external validity
    ‣ Solve with a comprehensive methodology
  ◦ Biased sampling
    ‣ Linked to statistical regression
    ‣ Sample does not reflect target population
    ‣ n is not representative of N (n ≠ N)
    ‣ E.g. Results are generalised across gender
    ‣ Solve by random sampling of target population
  ◦ Hawthorne effect
    ‣ Dependent variable is influenced by the fact it is being recorded
    ‣ "White coat effect"
    ‣ Solve by controlling environment
  ◦ Demand characteristics
    ‣ Participants detect the purpose of the experiment and behave accordingly
    ‣ Solve by using blind trials
  ◦ Operationalisation
    ‣ a.k.a. Ecological validity
    ‣ Dependent variable must have relevance in the real world
    ‣ Solve by choosing the dependent variable carefully
Reliability is a pre-requisite of validity
Relative reliability - individuals maintain position within group
Absolute reliability - scores are the same each time

Rater reliability: • Intrarater reliability ◦Consistency of a given observer or measurement tool on more than one occasion ◦E.g. Boxing referee • Interrater reliability ◦Consistency of a given measurement from more than one observer or measurement tool ◦E.g. Panel scoring
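Sensitivity and specificity, as defined earlier in this lecture, reduce to two simple proportions. The sketch below uses hypothetical screening counts (100 truly positive and 200 truly negative people); none of the numbers come from the handout.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of truly positive people who receive a positive result."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of truly negative people who receive a negative result."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical test results
# 100 positive people: 90 test positive, 10 test negative
# 200 negative people: 160 test negative, 40 test positive
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=160, false_positives=40)

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A test with low sensitivity could fail to detect a construct that really exists, which is one of the explanations listed under construct validity.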

Lecturer: James Betts
• Virtually all measurements have errors
  ◦ I.e. measured score = "true score" +/- error (linked to SD)
• Reliability and measurement error are not the same; rather, reliability implies an acceptable degree of measurement error
Systematic error:
Any variable causing a consistent shift in the mean in a given direction
E.g. Omitting snacks in between meals
Random error:
Fluctuation of scores due to chance
E.g. Inaccurate descriptions of food consumed
Randomly above/below - random error
Consistently above/below - systematic error
Assessment of error:
Systematic - evidence of bias between means (i.e. one lower/higher than the other)
Random - scatter plot
• r = 0 - implies high error
• r = 1 - implies no error
• In general good agreement r > 0.7 (SD^2)
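The correlation coefficient used above to judge random error can be computed by hand. This is a minimal sketch of Pearson's r with hypothetical test-retest scores; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired sets of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test-retest scores for five participants
test_1 = [10, 12, 14, 16, 18]
test_2 = [15, 17, 19, 21, 23]  # every retest score is exactly 5 higher

r = pearson_r(test_1, test_2)
mean_shift = sum(test_2) / len(test_2) - sum(test_1) / len(test_1)
print(r, mean_shift)
```

Here r is 1 even though every retest score is 5 units higher, which shows why r only reflects random error: systematic error has to be assessed separately by comparing the means.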

Retrospective recall: remembering afterwards, subjects prone to forgetting (systematic error) and description changes (random error)

Direct record - record as you go

[Figure: example data illustrating systematic error (systematic variance) and random error (error variance). Values shown: 14, 18, 10, 9, 10, 12, 8, 11, 11, 15, 21, 17, 17, 22, 14, 12]

Systematic and Random - calculate mean and difference scores for each pair of measurements, then draw a scatter plot of difference against mean (Bland-Altman Plot)
[Figure: four example Bland-Altman plots - some systematic error with little random error; little systematic error with little random error; some systematic error with some random error; little systematic error with some random error]
3 points visual assessment:
• Are points evenly distributed about the 0 line?
• Do points deviate greatly from the mean line?
• Is the error consistent from left to right? (Funnelling effect)
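The mean and difference scores behind a Bland-Altman plot can be sketched as follows. The data are hypothetical, and the 1.96 x SD "limits of agreement" are the conventional Bland-Altman summary, added here as an assumption since the notes do not mention them.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return per-pair means and differences, the bias and limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]  # x axis
    diffs = [a - b for a, b in zip(method_a, method_b)]        # y axis
    bias = statistics.mean(diffs)        # systematic error: shift from the 0 line
    sd_diff = statistics.stdev(diffs)    # random error: scatter about the bias
    limits = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)    # assumed convention
    return means, diffs, bias, limits

# Hypothetical paired measurements, e.g. direct record vs retrospective recall
direct_record = [10, 12, 14, 16, 18]
recall = [9, 10, 13, 14, 17]

means, diffs, bias, limits = bland_altman(direct_record, recall)
print(f"bias = {bias:.2f}, limits of agreement = {limits[0]:.2f} to {limits[1]:.2f}")
```

Plotting `diffs` against `means` gives the scatter used for the visual assessment: a bias away from 0 suggests systematic error, wide limits suggest random error, and a spread that changes from left to right is the funnelling effect.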

Why is error important?
• Measurement error is clearly of importance when evaluating agreement between two tools
• Consideration of error is also relevant when attempting to establish intervention effects/treatment differences
  ◦ I.e. where some of the variance between trials is due to the IV
We strive to increase the proportion of variance due to the IV (yellow section)
We try to minimise systematic and error variance by controlling conditions and counter-balancing
Smallest Worthwhile Effect:
Even a small amount of primary variance from an ergogenic aid would guarantee victory for either competitor (Sprint example)
However, error variance is such that a re-run could produce entirely different results
For an effect to be considered worthwhile it would need to improve a competitor's time by more than his error variance, e.g. take his mean speed higher than his PB/90 % of his results
[Figure: partition of total variance - primary variance (maximise the effect of the IV, increase the size of the IV effect); systematic variance and error variance (minimise)]

P-value: the probability of obtaining a difference at least as large as the one observed if the total variance were completely down to chance (not due to primary variance)
P = 0.01 means that chance alone would produce such a result only 1 % of the time; this is often loosely described as being 99 % confident that the IV is responsible for variance in the DV
In exercise science we must be at least 95 % confident in this sense before concluding a significant difference, i.e. P ≤ 0.05
Quantitative analysis of nominal data
• Requires dichotomous variables
• Can be coded for more objective analysis
• Does not require any consideration of normality
• Uses χ² test

[Figure: two annotated χ² test outputs, each marking the test of interest and its P value. First output: 96.6 % confident. Second output: no significant difference in proportion of users according to group.]

(Comparison of observed frequency counts against what would be expected according to null hypothesis)
(Comparison of two observed frequency counts)
Assumptions for χ²:
• Normal distribution not required
• Cells in table should be independent
• 80 % of cells must have an expected frequency greater than 5
• Every category must be selected at least once - more categories = more subjects
• Can't use percentages (use raw frequency counts)
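A χ² comparison of two observed frequency counts can be sketched for the simplest case, a 2 x 2 table. The counts are hypothetical, no continuity correction is applied, and the P value formula used here is only valid for 1 degree of freedom; statistical software handles larger tables.

```python
import math

def chi_square_2x2(table):
    """Chi-square test for a 2 x 2 table of raw frequency counts (not percentages)."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    # Expected count in each cell if the null hypothesis (no association) holds
    expected = [[row * col / total for col in col_totals] for row in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected

# Hypothetical counts: rows are groups, columns are users / non-users
observed = [[20, 10],
            [10, 20]]

chi2, p_value, expected = chi_square_2x2(observed)
cells = [e for row in expected for e in row]
share_above_5 = sum(e > 5 for e in cells) / len(cells)  # assumption check

print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}, cells with expected > 5: {share_above_5:.0%}")
```

With these counts P falls below 0.05, so the proportion of users would be judged significantly different between the two groups.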