Physics is an experimental science. And every experiment requires measurement. And every measurement is imperfect.

The scientific method always starts with a question. In light of current events, a natural question that comes to mind is: how big are viruses?

That, of course, is a bit too vague for us to really do anything and we will need to refine our question. Do we want to know the average size of viruses across many families or do we want to characterize the size of a specific virus? And what, exactly, do we mean by size? Volume? Surface area? Length? These are all valid questions, but let's settle on this: "What is the average length and width of the Enterobacteria T4 phage?"

Why did we choose that virus? Simply because there happen to be some excellent electron microscope images already available. The shape of the virus might seem weird, but it is well designed for its purpose.

The Cell Image Library hosts a catalog of interesting images. We are going to use eight images of T4 phage, which are available at the links below.

- http://cellimagelibrary.org/images/41128
- http://cellimagelibrary.org/images/41125
- http://cellimagelibrary.org/images/41130
- http://cellimagelibrary.org/images/41127
- http://cellimagelibrary.org/images/41124
- http://cellimagelibrary.org/images/41126
- http://cellimagelibrary.org/images/41129
- http://cellimagelibrary.org/images/41131

You should download each image, preferably at the highest resolution available. You will note that each image has an embedded scale.

We will measure the phages the old-school way: with a ruler, using the embedded scale to convert each reading into a physical size.
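Turning a ruler reading into a physical size is a simple ratio against the embedded scale bar. The sketch below illustrates the idea; the function name `to_nanometres` and all of the numbers are hypothetical and are not taken from the actual images.

```python
def to_nanometres(ruler_mm, scale_bar_mm, scale_bar_nm):
    """Convert a ruler reading on the image (in mm) into nanometres.

    scale_bar_mm is the length of the embedded scale bar as measured with
    the same ruler, and scale_bar_nm is the length printed next to the bar.
    """
    return ruler_mm * scale_bar_nm / scale_bar_mm


# Hypothetical readings: the scale bar is labelled 100 nm and spans 25 mm
# on the displayed image, while a phage spans 50 mm on the same image.
length_nm = to_nanometres(50.0, 25.0, 100.0)
print(f"phage length: {length_nm} nm")
```

Because the ruler and the scale bar are read on the same image, the zoom level drops out of the ratio, provided the image is not rescaled between the two readings.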

At the end of this process, we will have over 200 measurements of the length and width of the T4 phage. The question is: how should we compile all this data to report the average length and width?

You can imagine that out in the wild there will be some variation in the size of a T4 phage. Experience also tells us that if ten people measure the same quantity there will be some variation in the reported values, even if they use the same instruments. In light of these two sources of *noise*, how do we come to any conclusion regarding the size of the T4 phage?

This is where we turn to statistics. In an ideal world, we would be able to perfectly measure the size of every single T4 phage in existence. If we were to plot the result of every single perfect measurement as a histogram, we would very likely end up with a distribution that looks something like this:

If we could produce this figure, we would report the size of a T4 phage as the population *mean* $\mu$. We might also report on the natural variability in size by stating how large the spread is through the *standard deviation* $\sigma$.

These two parameters tell us something very important: if we picked any phage at random, there would be about a 68% chance that its size would be within one standard deviation of the mean, about a 95% chance that it would be within two, and about a 99.7% chance that it would be within three. A plot of the data along with a normal distribution using the same parameters as the population helps to illustrate.
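These percentages can be checked numerically. For a normally distributed quantity, the fraction of the population within $k$ standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$. The short sketch below evaluates that expression; the function name `fraction_within` is only illustrative.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))


# Print the coverage for one, two and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.1f}%")
```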

\[f(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

Alas, we live in an imperfect world and, sadly, we can't measure every single T4 phage. Remember, we can't even perfectly measure **one**! But we can do our best to accurately measure the size of each T4 phage in a *sample*. Our best estimate for the population mean $\mu$ (which is what we really want to know) is the sample mean $\bar{x}$. As you might expect, our best estimate for the population standard deviation $\sigma$ is the sample standard deviation $\sigma_x$.
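As a minimal sketch, the sample mean and sample standard deviation can be computed with Python's standard `statistics` module. The measurements below are made-up numbers, not real T4 data, and the use of `statistics.stdev` (which divides by $N-1$) is an assumption, since the text does not say which convention is intended.

```python
import statistics

# Hypothetical length measurements of phages in a sample, in nm.
lengths_nm = [198.0, 202.0, 200.0, 196.0, 204.0]

# Sample mean: our best estimate of the population mean.
x_bar = statistics.mean(lengths_nm)

# Sample standard deviation (N - 1 in the denominator): our best
# estimate of the population standard deviation.
sigma_x = statistics.stdev(lengths_nm)

print(f"sample mean = {x_bar:.1f} nm")
print(f"sample standard deviation = {sigma_x:.1f} nm")
```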

How good these estimates are depends largely on how we select our sample. If the sample consists of only two surprisingly large phages measured by the same person (with the same bias), the sample statistics are almost certainly poor estimators of the true mean and standard deviation of the population. However, if the sample is large, is drawn basically at random, and is measured multiple times by different people using a variety of equipment, it is much more likely that the sample statistics are good estimates of the population parameters.

While the sample standard deviation tells us about the variability of the population, it actually grossly overestimates the uncertainty on $\bar{x}$. As the number of data points increases, the sample standard deviation will tend to a constant. However, the uncertainty on the mean should **decrease**. A new quantity labelled the *standard error*, or *standard deviation of the mean*, is defined for a sample of $N$ measurements as:

\[\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}\]

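In the end, the uncertainty on $\bar{x}$ due to the statistical variation in $N$ measurements (assuming a 95% *confidence interval*, which corresponds to roughly two standard deviations of the mean) is:

\[\delta \bar{x}_{stat} = 2\,\frac{\sigma_x}{\sqrt{N}}\]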
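The sketch below shows how the uncertainty on the mean shrinks as the number of measurements $N$ grows while the sample standard deviation stays fixed. It assumes the standard error is $\sigma_x/\sqrt{N}$ and that a 95% confidence interval corresponds to roughly two standard errors; the value of `sigma_x` and the function names are hypothetical.

```python
from math import sqrt


def standard_error(sigma_x, n):
    """Standard deviation of the mean for n measurements."""
    return sigma_x / sqrt(n)


def statistical_uncertainty(sigma_x, n, coverage_factor=2.0):
    """Uncertainty on the mean; a factor of 2 approximates a 95% interval."""
    return coverage_factor * standard_error(sigma_x, n)


sigma_x = 6.0  # hypothetical sample standard deviation, in nm

# More measurements leave sigma_x unchanged but reduce the uncertainty.
for n in (9, 100, 900):
    print(f"N = {n:3d}: uncertainty on the mean = "
          f"{statistical_uncertainty(sigma_x, n):.2f} nm")
```

Note that going from 9 to 900 measurements only improves the uncertainty by a factor of 10, since it scales as $1/\sqrt{N}$.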

You might imagine a situation where there is **no variability** in a set of measurements, which suggests that the uncertainty is zero and thus we exactly know the value of interest. This is a tempting idea but, sadly, this is a fallacy.

Statistical uncertainty is, in some sense, a measure of *precision* and quantifies the repeatability of measurements. But very precise measurements aren't necessarily *accurate*. An example might help clarify: imagine we measure the length of a certain phage 100 times and each time we find it to be 105 nm. This is a very precise measurement. But if the actual length is only 87 nm, our result is definitely inaccurate.

How could such a situation arise? One possibility is the measuring equipment itself, possibly due to its calibration, its finite precision, or to the technique used. If you were to measure your height with a tape measure with 1 mm gradations, no matter how precise your measurements are you will never know your height to within less than $\pm 0.5$ mm. To state that the uncertainty on your height is zero, even if the statistical uncertainty is, would be incorrect. We must account for possible *systematic* uncertainties. Unfortunately, unlike statistical uncertainties, systematic uncertainties can only be estimated.

The overall uncertainty is determined as a combination of the statistical and systematic contributions. Happily, these sources of uncertainty are usually independent of each other. In other words, they are rarely both at their worst at the same time. (Another way to interpret this is to recognize that there will be a partial cancellation of the two types of uncertainty on average: the error in one will sometimes compensate for the error in the other.) Due to this, the uncertainties are generally added in quadrature:

\begin{equation} \delta \bar{x} = \sqrt{\delta x_{syst}^2 + \delta \bar{x}_{stat}^2} \end{equation}

Note that $\delta \bar{x} \leq \delta x_{syst} + \delta \bar{x}_{stat} = \delta \bar{x}_{max}$ where the sum of the uncertainties corresponds to the worst scenario possible, $\delta \bar{x}_{max}$, but not to the most likely case.
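A short numerical illustration of adding in quadrature, using made-up values for the two contributions (the function name `combined_uncertainty` is hypothetical):

```python
from math import hypot


def combined_uncertainty(syst, stat):
    """Combine independent systematic and statistical uncertainties in quadrature."""
    return hypot(syst, stat)  # sqrt(syst**2 + stat**2)


# Hypothetical contributions, both in mm.
syst = 0.5
stat = 1.2

total = combined_uncertainty(syst, stat)
worst_case = syst + stat  # straight sum: the worst possible scenario

print(f"quadrature sum: {total:.2f} mm")
print(f"worst case:     {worst_case:.2f} mm")
```

The quadrature sum is dominated by the larger contribution and never exceeds the straight sum.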

There are two ways to express the uncertainty of a result: the *absolute* uncertainty and the *relative* uncertainty.

The **absolute** uncertainty is the uncertainty expressed with units. For instance, $\delta x_{syst} = 0.5$ mm for a ruler. It is usually written with a single non-zero digit, as that corresponds to the first uncertain digit of the best value. However, when the first non-zero digit is 1, a second significant figure is given, *e.g.* $\delta x = 0.18$ cm.

The **relative** uncertainty is the ratio between the absolute uncertainty and the value itself. It expresses the precision of a measurement by letting us know what fraction of the result is uncertain. It is **unitless** and will usually be given as a percentage with 2 significant figures. For instance, if $z=10.0 \pm 0.6$ cm then the relative uncertainty is $\delta z/z = 0.6 \text{ cm} / 10.0 \text{ cm} \times 100\% = 6.0\%$.
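The same calculation as a minimal sketch, reusing the $z = 10.0 \pm 0.6$ cm example (the function name `relative_uncertainty_percent` is only illustrative):

```python
def relative_uncertainty_percent(value, absolute_uncertainty):
    """Relative uncertainty as a percentage; the units cancel in the ratio."""
    return abs(absolute_uncertainty / value) * 100.0


z = 10.0        # cm
delta_z = 0.6   # cm

rel = relative_uncertainty_percent(z, delta_z)
print(f"relative uncertainty: {rel:.1f}%")
```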

Finally, the main value should be quoted to the same decimal place as the absolute uncertainty. To avoid meaningless zeros to the right of a number, use scientific notation. For example:

- $L = 123.23 \pm 0.05$ cm
- $K = 23.582 \pm 0.017$ J
- $F =(8.39 \pm 0.04) \times 10^4$ N
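These rounding rules can be automated. The sketch below is one possible implementation, not a standard routine: the function name `format_result`, the ASCII output style, and the unrounded input values are all assumptions made for illustration. It keeps one significant figure on the uncertainty (two if the leading digit is 1), rounds the value to the same decimal place, and switches to scientific notation when the uncertainty is 10 or larger.

```python
def format_result(value, uncertainty):
    """Format value +/- uncertainty following the rounding rules above."""
    # Read the leading digit and power of ten of the uncertainty from its
    # scientific-notation representation, e.g. "1.740000e-02".
    mantissa, exponent = f"{uncertainty:e}".split("e")
    exponent = int(exponent)
    sig_figs = 2 if mantissa[0] == "1" else 1

    # Number of decimal places needed to show the last kept digit.
    decimals = sig_figs - 1 - exponent
    if decimals >= 0:
        return f"{value:.{decimals}f} +/- {uncertainty:.{decimals}f}"

    # Uncertainty of 10 or more: factor out a power of ten from the value
    # so that no meaningless trailing zeros are displayed.
    power = int(f"{abs(value):e}".split("e")[1])
    scale = 10.0 ** power
    d = max(decimals + power, 0)
    return (f"({value / scale:.{d}f} +/- {uncertainty / scale:.{d}f})"
            f" x 10^{power}")


# Hypothetical unrounded results that reduce to the three examples above.
print(format_result(123.2347, 0.0523), "cm")
print(format_result(23.58213, 0.0174), "J")
print(format_result(83912.0, 412.0), "N")
```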