The Leeds Teaching Hospitals NHS Trust Website
    

Research & Development Website

Statistics Support              

 

Sensitivity and Specificity

Descriptive Statistics

Statistical Analysis

Research and Development usually involves some statistics at some stage. Generally the sooner you think about this the better!

Advice can be obtained from Mr Phil McShane.

If you want to talk to him, please complete the pro forma and return it to the R & D office or directly to him.

Sample Size Calculation

When planning a research project it is important to ensure that it has a reasonable chance of finding an effect which is likely to be important. In the past many studies failed to do this and were therefore of little benefit: patients were exposed to risk, and resources wasted, to no effect. Today a project is unlikely to obtain funding or ethical approval unless these issues are dealt with at the start.

 

What do you need to know?

  How big is the effect you are looking for?

  What power and significance do you want?

  What variation exists in the population?

  How will you analyze the results?
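As a rough illustration of how the answers to these questions combine, the sketch below uses the standard normal-approximation formula for comparing the means of two equal-sized groups with a two-sided test. The function name and the example figures are hypothetical, and a real study may need a different formula depending on the planned analysis.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate number per group needed to compare two means.

    effect: smallest difference in means worth detecting (assumed by the user)
    sd:     standard deviation of the measurement in the population
    alpha:  two-sided significance level
    power:  desired probability of detecting the effect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # value corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return ceil(n)  # round up to a whole number of subjects

# Hypothetical example: detect a 5 mmHg difference when the sd is 10 mmHg
print(sample_size_per_group(effect=5, sd=10))              # 80% power
print(sample_size_per_group(effect=5, sd=10, power=0.90))  # 90% power
```

Halving the effect size roughly quadruples the number needed, which is why the first question in the list matters so much.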

 


Sensitivity, Specificity and related topics

If we carry out a diagnostic test there are 4 possible outcomes:

              True +ve    True -ve    Total
Test +ve         a           b         a+b
Test -ve         c           d         c+d
Total           a+c         b+d         n

The sensitivity of the test is the percentage of true positives that test positive: 100a/(a+c).

The specificity is the percentage of true negatives that test negative: 100d/(b+d).

Other terms are also used: the ‘positive predictive value' means the percentage of true positives among those who test positive, 100a/(a+b). It is important to realise that it depends on the frequency of the condition as well as on sensitivity and specificity: with a rare disease a test can have good sensitivity and specificity but a low ‘positive predictive value'.

All of these are proportions and have standard errors associated with them.
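The definitions above can be sketched in a few lines. The counts a, b, c and d follow the table, the function names are hypothetical, and the standard error uses the usual binomial formula for a proportion, which is only an approximation for small counts. Results are given as proportions; multiply by 100 for percentages.

```python
from math import sqrt

def proportion_with_se(successes, total):
    """Return a proportion and its approximate (binomial) standard error."""
    p = successes / total
    return p, sqrt(p * (1 - p) / total)

def diagnostic_summary(a, b, c, d):
    """a, b, c, d are the four cells of the table of test result by true status."""
    return {
        "sensitivity": proportion_with_se(a, a + c),  # true positives who test positive
        "specificity": proportion_with_se(d, b + d),  # true negatives who test negative
        "ppv": proportion_with_se(a, a + b),          # test positives who are true positives
    }

def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by a given frequency of the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, for illustration only
for name, (estimate, se) in diagnostic_summary(a=90, b=30, c=10, d=870).items():
    print(f"{name}: {estimate:.3f} (se {se:.3f})")

# A good test applied to a rare condition (1% prevalence) still has a low PPV
print(round(ppv_from_prevalence(0.90, 0.95, 0.01), 3))
```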

If the test is based on a continuous variable (for example blood pressure or plasma glucose), different cut-off values will have different sensitivities and specificities. A plot of sensitivity against 1 − specificity across the possible cut-offs (the exact form varies) is called an ‘ROC' (receiver operating characteristic) curve.
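A minimal sketch of how the points of such a curve might be computed is given below. It assumes that a subject tests positive when the measured value is at or above the cut-off, and it returns the commonly used pair (1 − specificity, sensitivity) for each cut-off. The data and the function name are made up for illustration.

```python
def roc_points(values, diseased, cutoffs):
    """Return one (1 - specificity, sensitivity) pair per cut-off.

    A subject is counted as test positive when value >= cutoff (an assumption).
    """
    points = []
    for cutoff in cutoffs:
        pairs = list(zip(values, diseased))
        a = sum(1 for v, ill in pairs if v >= cutoff and ill)       # test +ve, true +ve
        b = sum(1 for v, ill in pairs if v >= cutoff and not ill)   # test +ve, true -ve
        c = sum(1 for v, ill in pairs if v < cutoff and ill)        # test -ve, true +ve
        d = sum(1 for v, ill in pairs if v < cutoff and not ill)    # test -ve, true -ve
        sensitivity = a / (a + c)
        specificity = d / (b + d)
        points.append((1 - specificity, sensitivity))
    return points

# Hypothetical plasma glucose values (mmol/L) and true disease status
glucose = [4.8, 5.2, 5.9, 6.4, 6.8, 7.1, 7.9, 8.5]
diabetic = [False, False, False, True, False, True, True, True]
cutoffs = [5.5, 6.5, 7.5]

for cutoff, (one_minus_spec, sens) in zip(cutoffs, roc_points(glucose, diabetic, cutoffs)):
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, 1 - specificity {one_minus_spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is the trade-off the curve displays.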


 

Descriptive Statistics

Statistics involves description of data as well as analysis. A variety of measures are used to describe data.

The (arithmetic) mean of a population or sample is the ‘average' value.

The median is the value which splits a sample or population in 2, with equal numbers above and below. There are also ‘quartiles' (which divide the population into 4 equal groups), ‘centiles' and other groupings.

The mode is the most common value. Data can have more than one mode. In reality modes are not often used.

If the distribution is symmetrical, the mean and median are the same.

The variance is a measure of the spread of data about the mean: it is the mean squared distance of values from the mean (in a sample the sum of squares is divided by n-1, not n, to allow for the fact that the mean is estimated from the data). Its square root is called the ‘standard deviation' (sd, often written ‘σ') and is the commonest measure of spread. For a ‘normal' distribution about 95% of cases lie within 2 sd of the mean.

The ‘standard error of the mean' (sem) is a measure of the precision of an estimate of the mean. It is equal to sd/√n, where n is the sample size.

For data which are not normally distributed the ‘interquartile range' is sometimes used as a measure of spread.
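The measures described so far can be computed with Python's standard `statistics` module, as in the sketch below. The data are invented, and quartiles can be defined in several slightly different ways; `statistics.quantiles` (Python 3.8 or later) is used here with its default method, so other software may give marginally different values.

```python
import statistics
from math import sqrt

data = [4.1, 4.8, 5.0, 5.3, 5.6, 6.0, 6.2, 7.4, 9.9]  # hypothetical measurements

mean = statistics.mean(data)
median = statistics.median(data)
variance = statistics.variance(data)  # sample variance: sum of squares divided by n - 1
sd = statistics.stdev(data)           # square root of the sample variance
sem = sd / sqrt(len(data))            # standard error of the mean

q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                                 # interquartile range

print(f"mean {mean:.2f}, median {median:.2f}")
print(f"variance {variance:.2f}, sd {sd:.2f}, sem {sem:.2f}")
print(f"quartiles {q1:.2f}, {q2:.2f}, {q3:.2f}; IQR {iqr:.2f}")
```

In this example the single large value pulls the mean above the median, which is one reason the median and interquartile range are often preferred for skewed data.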

However some datasets are quite hard to describe.

Skewness is a measure of the asymmetry of a distribution: for a symmetrical distribution it is 0. Positive values indicate ‘skewing' to the right (a longer right-hand tail).

Kurtosis is a function of the fourth power of the observations; its main use is in testing departures from normality.
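One common way to compute these two measures is from the central moments, as sketched below. This is only one of several conventions: statistical packages often apply small-sample corrections, and some report ‘excess' kurtosis (kurtosis minus 3, so that a normal distribution scores 0). The function names are hypothetical.

```python
def central_moment(xs, k):
    """Mean of (x - mean)**k over the data."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third central moment scaled by the 3/2 power of the second."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Fourth central moment scaled by the square of the second (3 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetrical data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```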

The ‘Normal' Distribution

The ‘normal' or ‘Gaussian' distribution plays an important role in statistics.

It is unimodal and symmetrical: it has 2 ‘parameters' corresponding to mean and variance (or standard deviation).

Its importance comes from the fact that a lot of distributions approximate to it: in particular, means of samples do so as sample size increases, pretty much whatever the original distribution. This is called a ‘central limit theorem' and justifies the widespread use of tests (such as the ‘t-test') based on the normal distribution.
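The behaviour of sample means can be illustrated with a small simulation. The sketch below draws repeated samples from a markedly skewed distribution (an exponential distribution with mean 1 and sd 1, chosen purely for illustration) and compares the spread of the sample means with the value sd/√n that the theory predicts.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples from an exponential distribution (mean 1, sd 1)."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means

for n in (2, 30):
    means = sample_means(n)
    print(
        f"n = {n}: mean of sample means {statistics.mean(means):.3f}, "
        f"sd of sample means {statistics.stdev(means):.3f}, "
        f"predicted sd/sqrt(n) {1 / sqrt(n):.3f}"
    )
```

A histogram of the means for n = 30 would look much closer to the normal shape than the original exponential distribution does.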

The ‘density function' is:

f(x) = (1/√(2πσ²)) exp{−(x − μ)²/(2σ²)}

where μ = mean, σ² = variance, exp means ‘e to the power of' and π has its usual meaning.

Tables and software are available for calculating values, especially of tail areas, which are used for testing.
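As one example of such software, Python's standard library provides `statistics.NormalDist`. The sketch below writes out the density function directly, with the normalising constant 1/√(2πσ²), checks it against the library, and computes two tail areas of the kind used in testing. The function name `normal_density` and the example values are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

# The hand-written formula agrees with the library implementation
print(normal_density(1.0, mu=0.0, sigma=2.0), NormalDist(0.0, 2.0).pdf(1.0))

standard = NormalDist()  # mean 0, sd 1

# Two-sided tail area beyond 1.96 sd, the familiar 5% significance level
two_sided_tail = 2 * (1 - standard.cdf(1.96))

# Proportion of cases lying within 2 sd of the mean
within_2sd = standard.cdf(2) - standard.cdf(-2)

print(round(two_sided_tail, 4), round(within_2sd, 4))
```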

Despite the central limit theorem, we sometimes come across data for which the normal distribution does not work well. In some cases a transformation (e.g. taking logs) will solve the problem, but if not we have to use other methods. These are often called ‘non-parametric'.

If we need to know whether data are in fact normally distributed, there are graphical methods and several tests available in software.

Statistical Analysis

There is a wide variety of statistical methods. We do not always rely on the traditional t-test and chi-square, but have a variety of methods for different problems. If you want the most efficient design for your research you should seek advice early on. Like any other area, statistics is developing all the time.

One area, for example, concerns measurements which may be correlated: the traditional methods assume that observations are independent, but that is unreasonable if you are taking repeated measurements on the same person. Apart from anything else, your papers may get sent back if you ignore this.
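The simplest case of correlated measurements is two measurements on each person. The sketch below, using invented data, compares a paired t statistic (based on within-person differences) with a two-sample t statistic that wrongly treats the two sets of measurements as independent. More complex designs need other methods, and the function names are hypothetical.

```python
import statistics
from math import sqrt

def paired_t(before, after):
    """t statistic from within-person differences (n - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(x, y):
    """Two-sample t statistic with pooled variance; assumes independent groups."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical blood pressures for the same five people before and after treatment
before = [120, 135, 150, 165, 180]
after = [115, 131, 144, 160, 176]

print(f"paired t   = {paired_t(before, after):.2f}")
print(f"unpaired t = {unpaired_t(before, after):.2f}")
```

Here every person improves by a similar small amount, so the paired analysis shows a clear effect, while the analysis that ignores the pairing is swamped by the variation between people.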

‘Logistic regression' is now fairly familiar but there is a variety of more general methods.
