The Leeds Teaching Hospitals NHS Trust Website
    

Research & Development Website

Statistics Support              

 

Statistical Graphs

Graphs are widely used to display statistical data and to help with interpretation. Most software will produce a variety of graphs.

For looking at simple ‘univariate' data there are several methods. A ‘histogram' is a type of bar chart in which the frequency of observations is plotted against ranges of values. Here is a histogram of haemoglobin levels in a group of patients.
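As a rough sketch of what a histogram summarises, the snippet below counts how many values fall into each range and prints a text version of the bars. The haemoglobin values, the bin width and the function name `histogram_counts` are all invented for illustration; they are not the patient data shown on this page.

```python
from collections import Counter

# Hypothetical haemoglobin values in g/dL, invented purely for illustration.
haemoglobin = [10.2, 11.5, 12.1, 12.4, 12.8, 13.0, 13.3, 13.9, 14.6, 15.2]


def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] += 1
    # Map each bin's lower edge to its frequency, in ascending order.
    return {start + i * bin_width: counts[i] for i in sorted(counts)}


counts = histogram_counts(haemoglobin, bin_width=1.0, start=10.0)
for lower, freq in counts.items():
    # One '#' per observation gives a crude text histogram.
    print(f"{lower:4.1f} to {lower + 1.0:4.1f}: {'#' * freq}")
```

Statistical software normally chooses the ranges automatically; a different bin width can change the apparent shape of the histogram.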

‘Boxplots' are also commonly used, especially for ‘skewed' data.
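A boxplot is usually drawn from a five-number summary: minimum, lower quartile, median, upper quartile and maximum. The sketch below computes these for a small set of hypothetical, right-skewed values (invented for illustration). Software packages differ slightly in how they define quartiles, so the results may not match another program exactly.

```python
import statistics

# Hypothetical right-skewed measurements, invented for illustration.
values = [1, 2, 2, 3, 3, 4, 5, 7, 12, 30]


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


low, q1, median, q3, high = five_number_summary(values)
print(f"min {low}, Q1 {q1}, median {median}, Q3 {q3}, max {high}")
# In skewed data the mean is pulled towards the long tail.
print(f"mean {statistics.mean(values)}")
```

The median and quartiles are barely affected by the one very large value, which is why boxplots suit skewed data.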

For displaying relations between 2 variables, a ‘scatterplot' is often used.

Here is a scatterplot of haemoglobin vs age for the same patients.


Power and Significance 1

Hypothesis testing

‘Power' and ‘Significance' are commonly used concepts in statistics.

To understand them we need to understand the idea of ‘hypothesis testing'. We set up 2 hypotheses, usually called ‘null' and ‘alternative'.

In a clinical trial our ‘null hypothesis' is typically that the new treatment is no better than placebo or control. In another sort of trial the null hypothesis is that the defendant is innocent.

The ‘alternative hypothesis' in the first case is that the treatment is better than control; in the second that the defendant is guilty.

In both cases we ask ‘is the evidence strong enough to cause us to reject the null hypothesis and choose the alternative?' In other words, in both cases there is a presumption in favour of the ‘null hypothesis'.

In statistics we express this evidence quantitatively, and power and significance help us to do so.


Confidence Intervals

When we estimate some value, for example a mean or a proportion, we usually know that although it is (we hope) the best estimate it is not necessarily the true value for the population we are interested in.

For example, if we want to know the mean height of 10-year-old children in Leeds, we could take a random sample, measure their heights and calculate the mean of that sample. If we took another sample we would get a different sample mean. So the sample mean is not a precise measure of the population mean.
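The effect of taking a second sample can be sketched with a small simulation. The population below is entirely hypothetical: the mean of 138 cm and standard deviation of 6.5 cm are assumptions chosen for illustration, not survey figures for Leeds.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
# The mean of 138 and SD of 6.5 are assumptions, not survey figures.
population = [rng.gauss(138.0, 6.5) for _ in range(10_000)]


def sample_mean(size):
    """Mean height of one simple random sample drawn without replacement."""
    return statistics.mean(rng.sample(population, size))


first = sample_mean(50)
second = sample_mean(50)
print(f"population mean:    {statistics.mean(population):.1f} cm")
print(f"first sample mean:  {first:.1f} cm")
print(f"second sample mean: {second:.1f} cm")
```

The two sample means differ from each other and from the population mean, although both should be reasonably close to it.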

A confidence interval (c.i.) is a way of expressing this imprecision.

Instead of saying that the population mean is the sample mean, we can think of a range of values around the sample mean and say that the population mean is likely to lie within that range. Loosely speaking, the range in which we say there is a 90% chance that the true value lies is called a ‘90% confidence interval'; strictly, the 90% refers to the method: if we repeated the sampling many times, about 90% of the intervals calculated this way would contain the true value. Typically the interval is symmetric about the estimate. Often an approximate 95% interval is calculated as ‘estimate +/- 2 x standard error'.
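A minimal sketch of the ‘estimate +/- 2 x standard error' rule for a mean is given below. The heights are hypothetical, and the function name `approx_95_ci` is made up for this example. For a sample as small as this one the rule is only rough; statistical software would normally use a slightly larger multiplier.

```python
import math
import statistics

# Hypothetical heights in cm for a sample of ten children.
heights = [131.0, 134.5, 136.0, 137.5, 138.0, 139.0, 140.5, 142.0, 144.0, 147.5]


def approx_95_ci(values):
    """Approximate 95% c.i. for the mean: estimate +/- 2 x standard error."""
    estimate = statistics.mean(values)
    # Standard error of the mean = sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return estimate - 2 * standard_error, estimate + 2 * standard_error


lower, upper = approx_95_ci(heights)
print(f"mean {statistics.mean(heights):.1f} cm, "
      f"approx 95% c.i. {lower:.1f} to {upper:.1f} cm")
```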

However, this formula may not always be accurate. If a c.i. includes impossible values (for example, proportions greater than 100%) it is time to use a better method!
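To illustrate the problem, suppose 19 out of 20 patients respond to a treatment (hypothetical figures). The simple rule gives an upper limit above 100%. The page does not say which better method to use; the Wilson score interval shown here is one commonly used alternative, chosen as an assumption for this sketch.

```python
import math
from statistics import NormalDist


def simple_interval(successes, n, multiplier=2.0):
    """'Estimate +/- 2 x standard error' applied to a proportion."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - multiplier * standard_error, p + multiplier * standard_error


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval, which always stays between 0 and 1."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical data: 19 responders out of 20 patients.
simple_low, simple_high = simple_interval(19, 20)
wilson_low, wilson_high = wilson_interval(19, 20)
print(f"simple rule: {simple_low:.1%} to {simple_high:.1%}")
print(f"Wilson:      {wilson_low:.1%} to {wilson_high:.1%}")
```

The Wilson interval is not symmetric about the estimate, which is what allows it to stay within the possible range.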

A ‘95% c.i.' will include the ‘90% c.i.' calculated from the same data by the same method.
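The nesting of intervals can be checked with a short sketch that uses the normal approximation. The estimate and standard error are hypothetical values, and `normal_interval` is a name made up for this example.

```python
from statistics import NormalDist


def normal_interval(estimate, standard_error, level):
    """Symmetric interval based on the normal approximation."""
    # Multiplier for the chosen level, e.g. about 1.645 for 90%, 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical estimate and standard error, for illustration only.
low90, high90 = normal_interval(139.0, 1.5, level=0.90)
low95, high95 = normal_interval(139.0, 1.5, level=0.95)
print(f"90% c.i.: {low90:.2f} to {high90:.2f}")
print(f"95% c.i.: {low95:.2f} to {high95:.2f}")
```

Asking for more confidence requires a wider interval, so the 95% limits lie outside the 90% limits.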

There is a relation between confidence intervals and significance tests: for example, if we are interested in knowing what difference exists between 2 groups, to say the difference is not significant at the 5% level is equivalent to saying that the 95% confidence interval for the difference includes zero (provided the test and the interval are based on the same method). The significance test, however, tells us nothing about other possible values for the difference and so is less informative than the c.i.
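This equivalence can be sketched for a test based on the normal approximation. The differences and standard error below are hypothetical, and `compare_groups` is a name invented for this example. The sketch assumes the same standard error is used for both the test and the interval.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def compare_groups(difference, standard_error, level=0.95):
    """Return the two-sided p-value and c.i. for a difference between groups."""
    z = difference / standard_error
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    multiplier = STANDARD_NORMAL.inv_cdf(0.5 + level / 2)
    interval = (difference - multiplier * standard_error,
                difference + multiplier * standard_error)
    return p_value, interval


# Two hypothetical results with the same standard error.
for difference in (1.0, 0.5):
    p_value, (low, high) = compare_groups(difference, standard_error=0.4)
    includes_zero = low <= 0 <= high
    print(f"difference {difference}: p = {p_value:.3f}, "
          f"95% c.i. {low:.2f} to {high:.2f}, includes zero: {includes_zero}")
```

In both cases the p-value is below 0.05 exactly when the 95% interval excludes zero, but only the interval shows which other differences are compatible with the data.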


Statistical Software

Almost all statistical analysis is performed on computers, using suitable software.

If you are going to be doing such analysis, you need to decide what software you will use.

There is a very wide range: Genstat, Minitab, Stata and other programs all have their fans. Some of these date back to before PCs, but all are available to run on them and some run on other computers as well. Most of them now run through GUIs (with menus and mice). All of them have fairly regular upgrades.

Further comments will be confined to some main ones: SPSS, SAS, free software and Excel with add-ins.

This is not because the others are no good but because it is not possible to deal with them all.

If you have connections to the University, you can get software very cheaply. The Trust provides SPSS. Otherwise you pay up, or download free software.

If you want to know more about software, reviews are sometimes published in statistical journals, particularly ‘The American Statistician'.

Whatever you use you will have to spend some time learning to do so effectively.

It is a good idea to cite your software in papers etc.
