Bruker AXS index previous next

Previous: dark images
Next: licenses and license files


Data redundancy

Average Redundancy

Some people define redundancy as "the total number of reflections scanned divided by the total number of unique reflections". I don't think this is a correct definition: on a serial diffractometer, measuring the 4 0 0 reflection 500 times in a dataset of 200 unique reflections would result in an average redundancy of 2.5! Admittedly, the completeness is only 0.5%, but I think this argument proves that the "average redundancy" is meaningless on its own.
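The arithmetic of this counter-example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration and are not part of "collect".

```python
def average_redundancy(measured, n_unique):
    """Total number of measurements divided by the number of unique reflections."""
    return len(measured) / n_unique


def completeness(measured, n_unique):
    """Fraction of the unique reflections that was measured at least once."""
    return len(set(measured)) / n_unique


# The 4 0 0 reflection measured 500 times; the data set has 200 unique reflections.
measured = [(4, 0, 0)] * 500
n_unique = 200

print(average_redundancy(measured, n_unique))  # 2.5
print(completeness(measured, n_unique))        # 0.005, i.e. 1 reflection in 200
```

The average looks respectable although 199 of the 200 unique reflections were never measured.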

A better quantification

Instead of average redundancy, we will be using the 90th percentile redundancy, defined in the same way as a median value but at the 90% mark instead of the 50% mark. Imagine a data set of 21 unique reflections, measured redundantly. 50 reflections were collected in total. Count the number of times each reflection is measured, and sort the numbers:
0  1  1  1  2  2  2  2  2  2  2  2  2  2  3  3  3  4  4  5  5
So: 1 reflection was missed, 3 reflections were measured once, 10 reflections were measured twice, etc. The (nonsense number) "average redundancy" here is 50/21=2.4. Now let's look at how we calculate redundancy instead:
  0  1  1  1  2  2  2  2  2  2  2  2  2  2  3  3  3  4  4  5  5
100%   90%   80%   70%   60%   50%   40%   30%   20%   10%    0%
Here we can see: 90% of all reflections were measured 1 or more times, so the "90% redundancy" is 1.
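Reading the value off the percentage scale can be sketched in Python as follows. The function name and the rounding rule for picking the position in the sorted list are assumptions made for this illustration; the actual "collect" implementation may differ in detail.

```python
def percentile_redundancy(counts, percent=90):
    """Integer redundancy reached or exceeded by `percent` of the reflections.

    The sorted counts are laid out on a scale that runs from 100% at the
    first (lowest) count to 0% at the last (highest) count, as in the
    table above. The rounding to the nearest position is an assumption.
    """
    ordered = sorted(counts)
    index = round((100 - percent) * (len(ordered) - 1) / 100)
    return ordered[index]


# Number of times each of the 21 unique reflections was measured.
counts = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

print(sum(counts))                        # 50 measurements in total
print(percentile_redundancy(counts, 90))  # 1
print(percentile_redundancy(counts, 50))  # 2, the median
```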

There is one more factor that comes into play. If we're calculating the redundancy this way, there is no way of seeing the difference between the data set above, and another one like:

  0  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  3  3  3  4
100%   90%   80%   70%   60%   50%   40%   30%   20%   10%    0%
This one also has a "90% redundancy" of 1. To be able to distinguish between this data set and the previous one, we need to change the redundancy from an integer to a floating point number.

How?

Well, in the first data set, the "1" at 90% is the central "1" out of 3. If we imagine the ones as being rounded numbers they would have been the result of numbers between 0.5 and 1.5. Since our selected "1" is the central one, "originally" it most probably was a "1.0", from the center of the 0.5 to 1.5 range.

In the second data set, the "1" at 90% is the second out of 8, so it is at a quarter of the range. That would make the "90% redundancy" around 0.75.

Obviously the "floating point" trick is a trick, because the redundancy numbers never were floating point numbers to begin with. It allows us, however, to quantify redundancy in a more fine-grained way.
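The trick can be sketched in Python. The exact interpolation rule is not spelled out above, so the sketch assumes one: each value in a run of n equal counts is placed at the midpoint of its 1/n share of the range from (count - 0.5) to (count + 0.5). The function name is hypothetical. With this assumption the first data set gives exactly 1.0 and the second gives about 0.69, close to the rough estimate of 0.75 made above.

```python
def fractional_percentile_redundancy(counts, percent=90):
    """Percentile redundancy as a floating point number.

    Assumption for this sketch: a run of n equal counts c is spread
    evenly over the range c - 0.5 to c + 0.5, and the k-th member of
    the run (counted from 1) sits at the midpoint of the k-th slice.
    """
    ordered = sorted(counts)
    index = round((100 - percent) * (len(ordered) - 1) / 100)
    value = ordered[index]
    first = ordered.index(value)        # position where the run of equal counts starts
    run_length = ordered.count(value)   # number of reflections with this count
    k = index - first + 1               # 1-based rank inside the run
    return value - 0.5 + (k - 0.5) / run_length


first_set = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
second_set = [0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4]

print(fractional_percentile_redundancy(first_set))   # 1.0
print(fractional_percentile_redundancy(second_set))  # 0.6875
```

Whatever the precise rule, the two data sets no longer give the same number, which is the point of the exercise.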

Using the "average redundancy" number would make some people happier, because the number is higher. This gives, however, a false sense of security: the average may be biased by a few low-theta reflections that are measured very frequently because they are hard to avoid. These reflections do not give sufficient information for a good empirical absorption correction.

With the "collect" definition of "90% redundancy", a more even distribution of redundancy will be favored over uneven distributions, setting a proper target for getting more accurate final data.

Please note that the actual percentile used by "collect" strategy calculations can be changed in the configuration file.

What now is a good data collection strategy?

Let's look back at one of the data sets we measured.
  0  1  1  1  2  2  2  2  2  2  2  2  2  2  3  3  3  4  4  5  5
100%   90%   80%   70%   60%   50%   40%   30%   20%   10%    0%
If we want to increase the redundancy, which reflections should we be looking for? A strategy that targets the reflections measured least often will effectively raise the 90% redundancy, but it will be less effective in raising the 50% redundancy or the average redundancy. Such a data collection might appear less efficient than one that targets the average redundancy, but in fact the quality of the data set will be better.
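A toy comparison may help here. The sketch below spends ten extra measurements on the data set above in two ways: always on the reflection measured least often, or always on the reflection measured most often. It assumes that individual reflections can be picked freely, which is not true for an area detector, and all names are invented for the illustration.

```python
def percentile_redundancy(counts, percent=90):
    """Integer redundancy reached or exceeded by `percent` of the reflections."""
    ordered = sorted(counts)
    index = round((100 - percent) * (len(ordered) - 1) / 100)
    return ordered[index]


def add_measurements(counts, extra, target):
    """Add `extra` measurements one by one to the lowest or highest count."""
    counts = list(counts)
    pick = min if target == "lowest" else max
    for _ in range(extra):
        i = counts.index(pick(counts))  # first reflection with the extreme count
        counts[i] += 1
    return counts


counts = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
fill_low = add_measurements(counts, 10, "lowest")
fill_high = add_measurements(counts, 10, "highest")

for name, result in (("lowest", fill_low), ("highest", fill_high)):
    average = sum(result) / len(result)
    print(name, round(average, 2), percentile_redundancy(result, 90))
# lowest 2.86 2
# highest 2.86 1
```

In this toy both choices give the same average, because the number of extra measurements is the same, but only filling in the least-measured reflections moves the 90% redundancy.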

Note

Please note that collecting exactly a "half sphere" (assuming that would be possible with an area detector; only in low- to intermediate-resolution protein work can we get close) will not give you the same redundancy for each reflection. Take an orthorhombic data set. In a half sphere, you might have all four equivalent reflections -2,1,3 and 2,1,3 and 2,-1,3 and -2,-1,3: a redundancy of 4, as expected. But the -2,0,3 and 2,0,3 reflections give a redundancy of only 2, and the 0,0,3 reflection is scanned only once. The fraction of "symmetric" reflections like this in a highly symmetric data set is surprisingly high!
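The counts in this note can be reproduced with a short Python sketch. It assumes Laue symmetry mmm for the orthorhombic case, an ideal half sphere that contains exactly one member of every Friedel pair (chosen here as l > 0, with ties broken on k and then h), and no anomalous differences. The function names are invented for the illustration.

```python
from itertools import product


def in_half_sphere(hkl):
    """True for exactly one member of each Friedel pair (h,k,l) / (-h,-k,-l)."""
    h, k, l = hkl
    return (l, k, h) > (0, 0, 0)


def equivalents_mmm(hkl):
    """All distinct symmetry equivalents under Laue group mmm: (+-h, +-k, +-l)."""
    h, k, l = hkl
    return {(sh * h, sk * k, sl * l) for sh, sk, sl in product((1, -1), repeat=3)}


def redundancy_in_half_sphere(hkl):
    """Number of distinct equivalents of hkl that fall inside the half sphere."""
    return sum(1 for e in equivalents_mmm(hkl) if in_half_sphere(e))


print(redundancy_in_half_sphere((2, 1, 3)))  # 4
print(redundancy_in_half_sphere((2, 0, 3)))  # 2
print(redundancy_in_half_sphere((0, 0, 3)))  # 1
```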



(C) 1997-2008, Bruker AXS BV, R.W.W. Hooft