Saturday, February 11, 2012

Uncertainties in Data

Showing uncertainties in raw data:

Students should be using one or more measuring tools to collect their raw data. The most common way to present this raw data is in a data table. An acceptable way to give both the unit and the instrument precision of each measuring tool is to name the variable being measured in a column heading and to include the unit and the tool's precision as part of that same heading. For example:



Temperature

( +/- 1 °C ) {for a thermometer with 1 °C markings}

or

Distance Travelled

( +/- 0.1 cm ) or ( +/- 1 mm ) {for a ruler with smallest increments of 1 mm}



Students should be trained not to report raw data beyond the precision of the measuring tool being used. They should also be consistent in the number of decimal places used throughout their data set. If a student is using the metric ruler described above, with a precision of +/- 0.1 cm, they should not record some measurements as 6.1 cm, others as 6.25 cm and still others as 6 cm. The precision of the instrument should dictate a consistent choice of decimal places, so the data set above should read: 6.1 cm, 6.3 cm, and 6.0 cm.
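As a rough illustration of the consistency rule above, here is a minimal Python sketch that rewrites readings to the same number of decimal places as the instrument precision. It assumes readings are stored as text and that the ruler precision is 0.1 cm; the function name `to_precision` is hypothetical. It uses round-half-up so that 6.25 becomes 6.3 as in the example (Python's built-in `round()` rounds exact halves to even instead). Tidying numbers afterwards is no substitute for reading the instrument correctly in the first place: writing "6" as "6.0" is only honest if the reading really was 6.0 cm.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_precision(reading: str, precision: str = "0.1") -> str:
    """Hypothetical helper: express a reading with the same number of
    decimal places as the instrument precision (e.g. "0.1" -> one place)."""
    # quantize() fixes the number of decimal places; ROUND_HALF_UP sends 6.25 to 6.3
    value = Decimal(reading).quantize(Decimal(precision), rounding=ROUND_HALF_UP)
    return str(value)

# Assumed example: the inconsistently recorded ruler readings from the text
readings_cm = ["6.1", "6.25", "6"]
print([to_precision(r) + " cm" for r in readings_cm])
```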



Other sources of uncertainty or error can be listed as bullet points beneath a data table. For example, if a student takes a reading ‘late’, this should be noted; whether or not the instrument was calibrated before use should also be noted. Note: outlier points should still be reported in the raw data, even if the student later excludes those points from processing and analysis.



Showing uncertainties in presentation of processed data:

There are many ways to show that data which has undergone some type of processing should not be considered ‘exact’. One of the best ways to represent uncertainty in processed quantitative data is to use error bars on graphs. Recall that the minimum number of replicates in data collection is five. This means that students should collect at least five ‘trials’ or ‘repeats’ for each data point. One advantage of these repeats is that a mean can now be calculated from the five (or more) readings taken at each level of the independent variable. The mean is more trustworthy than any one of the individual readings.
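A minimal Python sketch of this step, assuming the repeats are stored in a dictionary keyed by the level of the independent variable. The readings and the name `MIN_REPLICATES` are hypothetical; only the minimum of five repeats comes from the text.

```python
from statistics import mean

# Hypothetical data: five repeat readings (cm) at each level of the independent variable
replicates = {
    10: [6.1, 6.3, 6.0, 6.2, 6.4],
    20: [8.8, 9.6, 9.0, 9.1, 8.5],
}

MIN_REPLICATES = 5  # the lower limit of repeats mentioned in the text

means = {}
for level, values in replicates.items():
    # Flag any level that has fewer repeats than the minimum
    if len(values) < MIN_REPLICATES:
        raise ValueError(f"level {level}: only {len(values)} repeats")
    means[level] = mean(values)

for level, m in means.items():
    print(f"level {level}: mean = {m:.1f} cm")
```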

Another advantage is that the student can now calculate the standard deviation of each set of repeats. There is currently no requirement that students use any form of statistical testing, but calculating the standard deviation is, in itself, a way of representing uncertainty, as long as the student understands that the standard deviation only shows how closely the data set is clustered around the mean. It does not, by itself, show broader things such as whether the data are or are not valid.
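The post does not say which form of standard deviation to use; the sketch below assumes the sample standard deviation (dividing by n - 1), which is the usual choice for a small set of repeats. It computes the value once by the formula and once with Python's `statistics.stdev` to show they agree. The readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

values = [6.1, 6.3, 6.0, 6.2, 6.4]  # hypothetical five repeats, cm

m = mean(values)
# Sample standard deviation by the formula: sqrt( sum((x - mean)^2) / (n - 1) )
s_manual = sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))
# The same quantity from the standard library
s = stdev(values)

# s describes how tightly the repeats cluster around the mean, nothing more
print(f"mean = {m:.2f} cm, SD = {s:.2f} cm")
```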

Here is how a student could use their five (or more) replicates to draw error bars on a graph. The student should plot the independent variable on the “x” axis and the dependent variable on the “y” axis. Each plotted point should be one of the means calculated earlier, not an individual reading. Two common forms of error bars are:

1) plot one standard deviation above and below the mean point (mean +/- SD)

2) plot the range of the data (from the lowest to the highest replicate value used to calculate the mean)
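A minimal Python sketch that computes both kinds of error bar for each plotted mean. The data and the helper name `error_bar_values` are hypothetical. Note that range bars are usually asymmetric: the distance from the mean down to the lowest repeat need not equal the distance up to the highest.

```python
from statistics import mean, stdev

# Hypothetical repeats (cm) at each level of the independent variable
replicates = {
    10: [6.1, 6.3, 6.0, 6.2, 6.4],
    20: [8.8, 9.6, 9.0, 9.1, 8.5],
}

def error_bar_values(values):
    """Hypothetical helper: return (mean, SD, (below, above)) for one point."""
    m = mean(values)
    sd = stdev(values)            # option 1: bar of +/- one sample SD
    below = m - min(values)       # option 2: mean down to lowest repeat
    above = max(values) - m       #           mean up to highest repeat
    return m, sd, (below, above)

x = sorted(replicates)                              # independent variable
summary = [error_bar_values(replicates[level]) for level in x]
y = [row[0] for row in summary]                     # plot the means only
sd_err = [row[1] for row in summary]                # symmetric SD bars
range_err = [[row[2][0] for row in summary],        # lower extents
             [row[2][1] for row in summary]]        # upper extents

for level, m, sd, (lo, hi) in zip(x, y, sd_err, zip(*range_err)):
    print(f"x={level}: mean {m:.2f}, SD {sd:.2f}, range -{lo:.2f}/+{hi:.2f}")
```

If matplotlib is available, `plt.errorbar(x, y, yerr=sd_err, fmt="o", capsize=4)` would draw option 1, and passing `yerr=range_err` (a pair of lists: lower extents, then upper extents) would draw option 2.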

Either method gives a visual display of how closely the data is clustered around the mean. A data point with a relatively small error bar comes from data that was fairly consistent; a data point with a relatively large error bar comes from data that showed little consistency during collection and is therefore perhaps less ‘trustworthy’. Error bars make it much easier both to identify an outlier and to justify excluding it: an error bar that becomes much smaller when a single reading is excluded is a case in point.
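The "case in point" above can be checked numerically. This Python sketch is an illustrative leave-one-out comparison, not a formal outlier test: it recomputes the standard deviation with each reading removed in turn and reports which single exclusion shrinks the error bar the most. The readings are hypothetical, and any excluded reading still belongs in the raw data table, as noted earlier.

```python
from statistics import stdev

# Hypothetical repeats (cm) where one reading looks suspect
values = [6.1, 6.3, 6.0, 8.9, 6.2]

full_sd = stdev(values)

# Leave-one-out: SD of the remaining readings when each one is dropped
loo = {}
for i in range(len(values)):
    rest = values[:i] + values[i + 1:]
    loo[i] = stdev(rest)

# Index whose removal gives the smallest remaining SD
suspect = min(loo, key=loo.get)
print(f"SD with all readings: {full_sd:.2f} cm")
print(f"excluding {values[suspect]} cm gives SD {loo[suspect]:.2f} cm")
```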

Error bars also give students a chance to discuss one source of ‘weaknesses and limitations’ in the Conclusion and Evaluation section of an IA lab report. Students should attempt to dissect the data rather than just describe an overall pattern. There are many other things they should consider in this section as well.

If students are going to use one or more statistical tests in their data processing, they should be taught the limits of what each test indicates about the data. For example, chi-square analysis can only show how closely observed data match predicted data, and standard deviation can only show how closely data is clustered around a mean. Students often carry out a statistical test and then do not know what to do with the results.
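To show how narrow the chi-square result is, here is a minimal Python sketch of the Pearson chi-square statistic, the sum of (O - E)^2 / E over all categories. The 3:1 cross and its counts are a hypothetical example. The statistic only measures how far the observed counts sit from the expected counts; it says nothing about whether the hypothesis that produced the expected counts was sensible, or about the precision of the measurements.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 100 offspring from a cross predicted to give a 3:1 ratio
observed = [70, 30]
expected = [75, 25]

stat = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of categories - 1
# Compare with a chi-square table: the critical value for 1 df at p = 0.05 is 3.84,
# so a value below it means the deviation from 3:1 is not significant at that level.
print(f"chi-square = {stat:.2f} with {df} degree(s) of freedom")
```

Where SciPy is installed, `scipy.stats.chisquare(observed, expected)` returns the same statistic together with a p-value.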
