Statsar Statistics Library

v1.0.1 for .NET

Product Guide





4. Descriptive Statistics

4.1 Working with Calculator Functions

Once you have specified how to load your data, you can use the StatsCalculator class to perform descriptive statistics. Over 30 functions are provided, covering summary statistics, spread and deviation, percentiles and ranks, and central moments.

There are two ways to use each StatsCalculator function. The first is to pass the values in directly, either as an array or as a comma-separated list of arguments. For example, to compute the mean of 1, 2, 10 and 8 we can use the following line of code:

[C#]
double mean = calculator.Mean(1, 2, 10, 8);

The above code does not use a data sheet; it calculates the mean directly from the values passed in. If your data has been loaded into a data sheet consisting of rows and columns, you can instead pass a column name to each descriptive statistic function:

[C#]
DataSheet sheet = calculator.Load("MyFile.csv");
double mean = calculator.Mean("MyColumn");

In the above example it is not necessary to pass the data sheet to the Mean method, as the calculator remembers the last data sheet that was loaded. You only need to specify the name of the column for which you wish to calculate a descriptive statistic.

It is also possible to pass a filter expression to each StatsCalculator descriptive statistic function. For example, suppose we wish to calculate the mean of a column named "TestScore", excluding NaN and negative values. The expression we need is:

"VALUE IS NOT NAN AND VALUE >= 0"

The following line of code will compute a descriptive statistic with a filter:

[C#]
double filteredMean = calculator.Mean(
    "TestScore", "VALUE IS NOT NAN AND VALUE >= 0");

The above code uses the VALUE keyword to filter out NaN and negative values. The mean will be computed considering only rows that contain non-negative real numbers.
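Conceptually, a filter expression acts as a predicate that decides which values take part in the calculation. The following plain C# sketch does not call Statsar; it models the expression above as a LINQ predicate under that assumption, and the `FilterSketch` class and `FilteredMean` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical illustration: a filter such as "VALUE IS NOT NAN AND VALUE >= 0"
// modelled as a LINQ predicate. This is not how Statsar parses filter strings.
public static class FilterSketch
{
    public static double FilteredMean(double[] values, Func<double, bool> keep)
    {
        // Keep only the values accepted by the predicate, then average them.
        double[] kept = values.Where(keep).ToArray();
        if (kept.Length == 0)
            throw new InvalidOperationException("No values pass the filter.");
        return kept.Average();
    }
}
```

For example, `FilterSketch.FilteredMean(new[] { 80.0, double.NaN, -1.0, 90.0 }, v => !double.IsNaN(v) && v >= 0)` averages only 80 and 90. In C# an ordered comparison such as `NaN >= 0` is already false, so the explicit `IsNaN` test simply mirrors the wording of the filter.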

4.2 Summary Statistics

4.2.1 The Count Function

The StatsCalculator count function returns the number of elements in a list of values. When not using a data sheet, it is possible to count the number of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {5, 6, 12, 45};
double count = calculator.Count(values);

The second line of the code above will return 4, since there are 4 elements in the array. To illustrate the count function when using a data sheet, consider the data sheet below which lists house prices by area code:

Area Price
A456 32
A325 45
A345 61
A212 66
A342 67
A332 82

Table 4.1 - House Prices by Area Code.

Suppose we wish to count how many prices we have in the data sheet. We can use the count function passing in the column name as a parameter:

[C#]
double count = calculator.Count("Price");

This line of code will return 6 since we have 6 prices in the data sheet. It is also possible to use the count function with a filter. Suppose we wish to count house prices which are greater than or equal to 65. We can use the VALUE keyword with the following expression:

"VALUE >= 65"

The following C# code illustrates how the filter is passed to the count function after specifying a column name:

[C#]
double count = calculator.Count("Price", "VALUE >= 65");

The line of code above will count the number of elements in the price column with a value of at least 65. This will return 3, since there are 3 such elements: 66, 67 and 82.

4.2.2 The Sum Function

The StatsCalculator sum function returns the sum of the elements in a list of values. When not using a data sheet, it is possible to sum the elements of an array. The following C# code illustrates this:

[C#]
double[] values = {7, 2, 3, -4};
double sum = calculator.Sum(values);

The second line of the code above will return 8, since

7 + 2 + 3 – 4 = 8

To illustrate the sum function when using a data sheet, consider the data sheet below which lists the number of items in stock by product:

Item Stock
Carrots 62
Beans 33
NULL -1
Peas 82
NULL -1
Onions 51

Table 4.2 - Number of items in stock by product.

Suppose we wish to find out how many items we have in total. We could use the sum function passing in the column name as a parameter:

[C#]
double sum = calculator.Sum("Stock");

This line of code will return 226 since

62 + 33 – 1 + 82 – 1 + 51 = 226

As can be seen above, the data sheet contains two invalid rows, with NULL for the item and -1 for the stock. We can use the sum function with a filter to exclude these invalid values from our calculation. We can use the VALUE keyword with the following expression:

"VALUE > 0"

The following C# code illustrates how the filter is passed to the sum function after specifying a column name:

[C#]
double sum = calculator.Sum("Stock", "VALUE > 0");

The line of code above will sum all positive elements. This will return 228 since

62 + 33 + 82 + 51 = 228

4.2.3 The Mean Function

The StatsCalculator mean function computes the arithmetic mean of elements in a list. These elements may come from an array or from a data sheet column. The arithmetic mean is computed according to the formula:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where $x_1, \dots, x_n$ are the $n$ values in the list.

To see how to compute the mean of an array of doubles, consider the following C# code:

[C#]
double[] values = {12, 14, 11, 16};
double mean = calculator.Mean(values);

The above code computes the mean of the values 12, 14, 11 and 16. This returns 13.25. To see how to compute the mean of a data sheet column, consider the following data sheet, which represents heights of a group of people:

Person Height
Jack 180
Paul 172
Mary 162
Susan 154
James 175
William 192

Table 4.3 - Heights of a group of people.

The mean of the height column is computed using the following C# code:

[C#]
double mean = calculator.Mean("Height");

This C# code will return 172.5 since

(180 + 172 + 162 + 154 + 175 + 192) / 6 = 172.5

When computing the mean of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the mean of only the first three people in the list, i.e. of Jack, Paul and Mary. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the mean of the first three people:

[C#]
double mean = calculator.Mean("Height", "INDEX < 3");

The above line of code will return 171.33 since

(180 + 172 + 162) / 3 = 171.33
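The INDEX keyword refers to the zero-based position of a row in the data sheet. As an illustration only, an "INDEX < 3" filter can be mimicked with the indexed overload of LINQ's `Where`. The sketch below is plain C# rather than Statsar code, and `IndexFilterSketch` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: emulates an "INDEX < n" filter by keeping only the
// values whose zero-based position is less than rowCount.
public static class IndexFilterSketch
{
    public static double MeanOfFirstRows(double[] column, int rowCount)
    {
        // The second lambda parameter is the zero-based index of each element.
        double[] kept = column.Where((value, index) => index < rowCount).ToArray();
        if (kept.Length == 0)
            throw new ArgumentOutOfRangeException(nameof(rowCount), "No rows selected.");
        return kept.Average();
    }
}
```

Calling `IndexFilterSketch.MeanOfFirstRows(new[] { 180.0, 172, 162, 154, 175, 192 }, 3)` averages 180, 172 and 162, which matches the 171.33 result above.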

4.2.4 The Trim Mean Function

The StatsCalculator trim mean function computes the arithmetic mean of elements in a list, after trimming a certain proportion of the elements. The trimmed mean may be computed by specifying an array, or by specifying a data sheet column. To illustrate using an array, consider the following C# code:

[C#]
double[] values = {8, 16, 22, 25, 28, 32, 36, 42};
double trimMean = calculator.TrimMean(values, 0.5);

This will compute the arithmetic mean after trimming 50% of the values. That is, 25% of the values will be removed from either end of the list of elements after sorting in ascending order. Since we have 8 elements, 2 elements will be removed from either end of the list. Thus, the trimmed mean will be 26.75 since

(22 + 25 + 28 + 32) / 4 = 26.75
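The trimming rule can be sketched in plain C#. The guide does not state how fractional counts are rounded, so the sketch assumes, as an unverified guess, that floor(n × proportion) values are excluded, rounded down to an even number and split equally between the two ends. This reproduces both worked examples in this section. `TrimMeanSketch` is a hypothetical name, not part of Statsar.

```csharp
using System;
using System.Linq;

// Sketch of a trimmed mean. Assumption: the number of excluded values is
// floor(n * proportion), rounded down to an even number and split equally
// between both ends. This matches the worked examples but is not taken from
// the Statsar source.
public static class TrimMeanSketch
{
    public static double TrimMean(double[] values, double proportion)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        if (proportion < 0 || proportion >= 1)
            throw new ArgumentOutOfRangeException(nameof(proportion));

        int excluded = (int)Math.Floor(values.Length * proportion);
        int perEnd = excluded / 2; // integer division rounds the total down to an even number

        // Sort ascending, drop perEnd values from each end, and average the rest.
        return values.OrderBy(v => v)
                     .Skip(perEnd)
                     .Take(values.Length - 2 * perEnd)
                     .Average();
    }
}
```

Under this rule, a proportion of 0.5 on eight values removes two values from each end, and 0.3334 on six values removes one from each end, since floor(6 × 0.3334) = 2.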

To illustrate using a data sheet consider the following exam results:

Subject Result
English 32
French 48
Drama 55
Art 64
Maths 74
Business 82

Table 4.4 - Exam Results.

To compute the mean after trimming 33.34% of the values, we can use the following C# code:

[C#]
double trimMean = calculator.TrimMean("Result", 0.3334);

This will exclude 33.34% of the values. Since we have 6 values, 2 values will be excluded, one from the top of the list and one from the bottom. The result will be 60.25 since

(48 + 55 + 64 + 74) / 4 = 60.25

It is also possible to pass in a filter when computing the trimmed mean. The filter is evaluated first, before the trimmed mean is computed. To do this, specify the filter expression after the amount to trim by.

4.2.5 The Geometric Mean Function

The StatsCalculator geometric mean function computes the geometric mean of elements in a list. These elements may come from an array or from a data sheet column. The geometric mean is computed according to the formula:

$$G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

To see how to compute the geometric mean of an array of doubles, consider the following C# code:

[C#]
double[] values = {8, 12, 14, 14, 20, 22};
double geometricMean = calculator.GeometricMean(values);

The above code computes the geometric mean of the values 8, 12, 14, 14, 20 and 22. This returns 14.22. To see how to compute the geometric mean of a data sheet column, consider the following data sheet, which represents prices of takeaway meals:

Meal Price
Pizza 6.99
Indian 7.50
Chinese 12.50
Italian 11.75
Chicken 4.99
Fish 10.50

Table 4.5 - Prices of takeaway meals.
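Before applying the function to this table, it may help to see the formula in code. The sketch below is plain C# (not Statsar) and computes the geometric mean as the exponential of the mean logarithm. For positive values this equals the nth root of the product, but it avoids overflow for long lists. Rejecting non-positive values is an assumption of the sketch, since the guide does not say how Statsar handles them, and `GeometricMeanSketch` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the geometric mean computed through logarithms:
// exp(mean(ln x_i)) equals (x_1 * ... * x_n)^(1/n) for positive x_i.
public static class GeometricMeanSketch
{
    public static double GeometricMean(double[] values)
    {
        // Assumption: zero or negative values are rejected rather than handled.
        if (values.Length == 0 || values.Any(v => v <= 0))
            throw new ArgumentException("Values must be non-empty and strictly positive.", nameof(values));
        return Math.Exp(values.Average(v => Math.Log(v)));
    }
}
```

Passing the six values of the Price column above gives approximately 8.60.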

The geometric mean of the price column is computed using the following C# code:

[C#]
double geometricMean = calculator.GeometricMean("Price");

This C# code will return 8.60 since

(6.99 × 7.50 × 12.50 × 11.75 × 4.99 × 10.50)^(1/6) = 8.60

When computing the geometric mean of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the geometric mean of only the last three meals in the list, i.e. of Italian, Chicken and Fish. We can use the following expression as a filter:

"INDEX >= 3"

Since the first row has index 0, the following C# code will compute the geometric mean of the last three meals:

[C#]
double geometricMean = calculator.GeometricMean("Price", "INDEX >= 3");

The above line of code will return 8.51 since

(11.75 × 4.99 × 10.50)^(1/3) = 8.51

4.2.6 The Harmonic Mean Function

The StatsCalculator harmonic mean function returns the harmonic mean of elements in a list of values. This is calculated according to

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

When not using a data sheet, it is possible to compute the harmonic mean of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {12, 14, 11, 13};
double harmonicMean = calculator.HarmonicMean(values);

The second line of the code above will return 12.4, since

4 / (1/12 + 1/14 + 1/11 + 1/13) = 12.4

To illustrate the harmonic mean function when using a data sheet, consider the data sheet below which lists the number of students per class by year:

Year Students
1 25
2 28
NULL -1
3 22
NULL -1
4 26

Table 4.6 - Number of class students by year.
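The harmonic mean formula translates directly into code. In this plain C# sketch (hypothetical `HarmonicMeanSketch` class, not Statsar code), only zero values are rejected, because their reciprocals are undefined. Negative values are allowed, which shows how the two -1 entries in the table above distort the result.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the harmonic mean: n divided by the sum of reciprocals.
public static class HarmonicMeanSketch
{
    public static double HarmonicMean(double[] values)
    {
        // Zero has no reciprocal; negative values are deliberately not rejected.
        if (values.Length == 0 || values.Any(v => v == 0))
            throw new ArgumentException("Values must be non-empty and non-zero.", nameof(values));
        return values.Length / values.Sum(v => 1.0 / v);
    }
}
```

For example, the full Students column { 25, 28, -1, 22, -1, 26 } yields a negative result of about -3.26, while the four positive values alone yield about 25.06.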

Suppose we wish to compute the harmonic mean of students per class. We could use the harmonic mean function passing in the column name as a parameter:

[C#]
double harmonicMean = calculator.HarmonicMean("Students");

This line of code will return -3.26, because each of the two -1 entries adds a reciprocal of -1 to the sum of reciprocals. These are invalid rows, with NULL for the year and -1 for the number of students. We can use a filter to exclude them from our calculation, using the VALUE keyword with the following expression:

"VALUE > 0"

The following C# code illustrates how the filter is passed to the harmonic mean function after specifying a column name:

[C#]
double harmonicMean = calculator.HarmonicMean("Students", "VALUE > 0");

The line of code above will compute the harmonic mean of all positive elements. This will return 25.06 since

4 / (1/25 + 1/28 + 1/22 + 1/26) = 25.06

4.2.7 The Mode Function

The StatsCalculator mode function returns the most common element in a list of values. When not using a data sheet, it is possible to compute the mode of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {1, 3, 2, 3, 2, 2, 1};
double mode = calculator.Mode(values);

The second line of the code above will return 2, since this is the most common element in the array. To illustrate the mode function when using a data sheet, consider the data sheet below which lists the number of windows in street houses:

Address Windows
44 4
45 6
46 6
47 4
48 2
49 4

Table 4.7 - Number of windows in street houses.

Suppose we wish to compute the mode of windows in the data sheet. We can use the mode function passing in the column name as a parameter:

[C#]
double mode = calculator.Mode("Windows");

This line of code will return 4, since 4 is the most common element in the windows column. It is also possible to use the mode function with a filter. Suppose we wish to compute the mode only for houses with more than 5 windows. We can use the VALUE keyword with the following expression:

"VALUE > 5"

The following C# code illustrates how the filter is passed to the mode function after specifying a column name:

[C#]
double mode = calculator.Mode("Windows", "VALUE > 5");

The line of code above will compute the mode of the elements in the windows column with a value greater than 5. This will return 6, since 6 is the only such value (it occurs twice).
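A mode can be found by counting how often each value occurs. The plain C# sketch below (hypothetical `ModeSketch` class, not Statsar code) groups equal values with `GroupBy`. The guide does not say how Statsar breaks ties between equally common values, so returning the smallest such value is purely an assumption of the sketch.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the mode: the value that occurs most often.
public static class ModeSketch
{
    public static double Mode(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        return values.GroupBy(v => v)                   // one group per distinct value
                     .OrderByDescending(g => g.Count()) // most frequent first
                     .ThenBy(g => g.Key)                // assumption: smallest value wins ties
                     .First()
                     .Key;
    }
}
```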

4.2.8 The Max Function

The StatsCalculator max function computes the maximum value of elements in a list. These elements may come from an array or from a data sheet column. To see how to compute the max of an array of doubles, consider the following C# code:

[C#]
double[] values = {-3, 3, 5, 4, -8, 2};
double max = calculator.Max(values);

The above code computes the max of the values -3, 3, 5, 4, -8 and 2. This returns 5. To see how to compute the max of a data sheet column, consider the following data sheet, which represents weights of a group of people:

Person Weight
Paul 78
Sue 55
James 82
Lilly 58
John 75
Bob 85

Table 4.8 - Weights of a group of people.

The max of the weight column is computed using the following C# code:

[C#]
double max = calculator.Max("Weight");

This C# code will return 85 since this is the largest value in the weight column. When computing the max of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the max of only the first three people in the list, i.e. of Paul, Sue and James. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the maximum weight of the first three people:

[C#]
double max = calculator.Max("Weight", "INDEX < 3");

The above line of code will return 82 since this is the maximum value of the first three elements in the weight column.

4.2.9 The Max Abs Function

The StatsCalculator max abs function computes the maximum absolute value of elements in a list. These elements may come from an array or from a data sheet column. To see how to compute the maximum absolute value of an array of doubles, consider the following C# code:

[C#]
double[] values = {-3, 5, -7, 6, 4};
double maxAbs = calculator.MaxAbs(values);

The above code computes the maximum absolute value of the numbers -3, 5, -7, 6 and 4. This returns 7. To see how to compute the maximum absolute value of a data sheet column, consider the following data sheet, which represents price changes of fruits over the last month:

Fruit Change
Apples 3.2
Oranges -5.3
Bananas -2.8
Pears 4.4
Kiwis -1.3
Grapes 2.7

Table 4.9 - Price changes of fruits over the last month.

The maximum absolute value of the change column is computed using the following C# code:

[C#]
double maxAbs = calculator.MaxAbs("Change");

This C# code will return 5.3 since this is the largest absolute value in the change column. When computing the maximum absolute value of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the maximum absolute value of only the last three fruits in the list, i.e. of Pears, Kiwis and Grapes. We can use the following expression as a filter:

"INDEX >= 3"

Since the first row has index 0, the following C# code will compute the maximum absolute value of the last three fruits:

[C#]
double maxAbs = calculator.MaxAbs("Change", "INDEX >= 3");

The above line of code will return 4.4 since this is the largest absolute value among the last three elements of the change column.

4.2.10 The Min Function

The StatsCalculator min function computes the minimum value of elements in a list. When not using a data sheet, it is possible to compute the minimum of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {4, -3, 5, -6, 8, 7, -4};
double min = calculator.Min(values);

The second line of the code above will return -6, since this is the smallest element in the array. To illustrate the min function when using a data sheet, consider the data sheet below which lists the speeds of different cars:

Car Speed
Toyota 190
Honda 220
Ford 240
Audi 120
Fiat 180
BMW 320

Table 4.10 - Car speeds.

Suppose we wish to compute the minimum of the speed column in the data sheet. We can use the min function passing in the column name as a parameter:

[C#]
double min = calculator.Min("Speed");

This line of code will return 120 since 120 is the smallest element in the speed column. It is also possible to use the min function with a filter. Suppose we wish to compute the minimum of the speed column for values greater than 200. We can use the VALUE keyword with the following expression:

"VALUE > 200"

The following C# code illustrates how the filter is passed to the min function after specifying a column name:

[C#]
double min = calculator.Min("Speed", "VALUE > 200");

The line of code above will compute the minimum of the elements in the speed column with a value greater than 200. This will return 220 since this is the minimum of the values greater than 200 (i.e. of 220, 240 and 320).

4.2.11 The Min Abs Function

The StatsCalculator min abs function computes the minimum absolute value of elements in a list. These elements may come from an array or from a data sheet column. To see how to compute the minimum absolute value of an array of doubles, consider the following C# code:

[C#]
double[] values = {2, -3, 5, -6, 4};
double minAbs = calculator.MinAbs(values);

The above code computes the minimum absolute value of the numbers 2, -3, 5, -6 and 4. This returns 2. To see how to compute the minimum absolute value of a data sheet column, consider the following data sheet, which represents price changes of stocks over the last week:

Stock Change
IBM -10.2
MSFT -1.5
DCX 6.5
GM 7.22
GMAC -8.43
BRTEL -2.12

Table 4.11 - Price changes of stocks over the last week.

The minimum absolute value of the change column is computed using the following C# code:

[C#]
double minAbs = calculator.MinAbs("Change");

This C# code will return 1.5 since this is the smallest absolute value in the change column. When computing the minimum absolute value of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the minimum absolute value of only the last three stocks in the list, i.e. of GM, GMAC and BRTEL. We can use the following expression as a filter:

"INDEX >= 3"

Since the first row has index 0, the following C# code will compute the minimum absolute value of the last three stocks:

[C#]
double minAbs = calculator.MinAbs("Change", "INDEX >= 3");

The above line of code will return 2.12 since this is the smallest absolute value among the last three elements of the change column.

4.2.12 The Midrange Function

The StatsCalculator midrange function returns the midrange of elements in a list of values. This is the arithmetic mean of the minimum and maximum values. When not using a data sheet, it is possible to compute the midrange of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {4, 7, 13, 15};
double midrange = calculator.Midrange(values);

The second line of the code above will return 9.5, since

(4 + 15) / 2 = 9.5

To illustrate the midrange function when using a data sheet, consider the data sheet below which lists the number of flights by airport:

Airport Flights
London 480
NULL -1
New York 420
Tokyo 320
NULL -1
Paris 345

Table 4.12 - Number of flights by airport.

Suppose we wish to compute the midrange of the flights column. We could use the midrange function passing in the column name as a parameter:

[C#]
double midrange = calculator.Midrange("Flights");

This line of code will return 239.5, since the minimum is the invalid value -1 and (-1 + 480) / 2 = 239.5. The data sheet contains two invalid rows, with NULL for the airport and -1 for the number of flights. We can use a filter to exclude these invalid values from our calculation, using the VALUE keyword with the following expression:

"VALUE > 0"

The following C# code illustrates how the filter is passed to the midrange function after specifying a column name:

[C#]
double midrange = calculator.Midrange("Flights", "VALUE > 0");

The line of code above will compute the midrange of all positive elements. This will return 400 since

(320 + 480) / 2 = 400
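The midrange only needs the smallest and largest values. The following sketch is plain C# with a hypothetical `MidrangeSketch` class, not the Statsar implementation. It applies an optional predicate before finding the extremes, mirroring the way a filter restricts the values used by the statistic.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: midrange = (min + max) / 2, optionally after filtering.
public static class MidrangeSketch
{
    public static double Midrange(double[] values, Func<double, bool> keep = null)
    {
        // With no predicate supplied, every value is kept.
        double[] kept = keep == null ? values : values.Where(keep).ToArray();
        if (kept.Length == 0)
            throw new ArgumentException("No values to evaluate.", nameof(values));
        return (kept.Min() + kept.Max()) / 2.0;
    }
}
```

With the Flights column, `MidrangeSketch.Midrange(flights)` includes the -1 entries, while `MidrangeSketch.Midrange(flights, v => v > 0)` reproduces the filtered result of 400.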

4.2.13 The Range Function

The StatsCalculator range function returns the difference between the maximum and minimum values of elements in a list. The range is the length of the smallest interval that contains all the data. When not using a data sheet, it is possible to compute the range of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {4, 7, 12, -3, 8, 5};
double range = calculator.Range(values);

The second line of the code above will return 15, since

12 – (-3) = 15

To illustrate the range function when using a data sheet, consider the data sheet below which lists children's favourite colors:

Color Children
Red 4
Orange 6
Yellow 7
Green 14
Blue 8
Violet 3

Table 4.13 - Children's favourite colors.

Suppose we wish to compute the range of the children column in the data sheet. We can use the range function passing in the column name as a parameter:

[C#]
double range = calculator.Range("Children");

This line of code will return 11 since

14 – 3 = 11

It is also possible to use the range function with a filter. Suppose we wish to compute the range of the children column for values of at least 7. We can use the VALUE keyword with the following expression:

"VALUE >= 7"

The following C# code illustrates how the filter is passed to the range function after specifying a column name:

[C#]
double range = calculator.Range("Children", "VALUE >= 7");

The line of code above will compute the range of the elements in the children column with a value of at least 7 (that is, 7, 14 and 8). This will return 7 since

14 – 7 = 7

4.2.14 The Median Function

The StatsCalculator median function returns the middle value of a list of values after sorting them in ascending order. If the number of elements is even, the arithmetic mean of the two middle values is returned. When not using a data sheet, it is possible to compute the median of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {1, 5, 7, 11, 14, 15, 17};
double median = calculator.Median(values);

The second line of code above will return 11, since this is the middle number in the array, after sorting in ascending order. To illustrate the median function when using a data sheet, consider the data sheet below which lists the number of people by desk:

Desk People
A1 5
A2 7
A3 8
A4 6
A5 3
A6 5

Table 4.14 - Number of people by desk.

Suppose we wish to compute the median of the people column in the data sheet. We can use the median function passing in the column name as a parameter:

[C#]
double median = calculator.Median("People");

This line of code will return 5.5 since

(5 + 6) / 2 = 5.5

It is also possible to use the median function with a filter. Suppose we wish to compute the median only for desks with more than 5 people. We can use the VALUE keyword with the following expression:

"VALUE > 5"

The following C# code illustrates how the filter is passed to the median function after specifying a column name:

[C#]
double median = calculator.Median("People", "VALUE > 5");

The line of code above will compute the median of the elements in the people column with a value greater than 5. This will return 7, since 7 is the middle number of 6, 7 and 8.
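The median rule described at the start of this section (the middle value, or the mean of the two middle values for an even count) can be sketched in plain C#. `MedianSketch` is a hypothetical name, and this is not the Statsar implementation.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the median: sort, then take the middle value,
// or the mean of the two middle values when the count is even.
public static class MedianSketch
{
    public static double Median(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```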

4.3 Spread and Deviation

Some spread and deviation functions are overloaded to accept a member of the Estimator enumeration, which selects either the unbiased or the biased estimator. The default estimator is specified by the StatsConfiguration.Estimator property. For example, the following C# code sets the default estimator to biased:

[C#]
calculator.Configuration.Estimator = Estimator.Biased;

4.3.1 The Coefficient of Variation Function

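The StatsCalculator coefficient of variation function computes the coefficient of variation of a list of values, with the specified estimator. This is the ratio of the standard deviation to the mean:

$$c_v = \frac{s}{\bar{x}}$$

where $s$ is the standard deviation computed with the chosen estimator and $\bar{x}$ is the arithmetic mean.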

The coefficient of variation function is overloaded to accept either a data sheet column or an array of values. When using a data sheet column, a filter may also be specified. The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double CoefficientOfVariation(
        string columnName);

    public double CoefficientOfVariation(
        string columnName, Estimator estimator);

    public double CoefficientOfVariation(
        string columnName, string filter);

    public double CoefficientOfVariation(
        params double[] values);

    public double CoefficientOfVariation(
        double[] values, Estimator estimator);

    // ...
}

In the above methods, a member of the Estimator enumeration may be specified to choose between the unbiased and the biased estimator.

As an example, consider the values 12, 14, 11, 9 and 15. We can compute the unbiased coefficient of variation using the following C# code:

[C#]
double[] values = { 12, 14, 11, 9, 15 };
double c = calculator.CoefficientOfVariation(values);

The above values have a mean of 12.2 and an unbiased standard deviation of 2.39. Hence, the coefficient of variation is

2.39 / 12.2 ≈ 0.196
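The worked example can be checked with a few lines of plain C#. This sketch (hypothetical `CoefficientOfVariationSketch` class, not Statsar code) always uses the unbiased standard deviation, matching the example above, and does not model the Estimator overloads.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: coefficient of variation = unbiased standard deviation / mean.
public static class CoefficientOfVariationSketch
{
    public static double CoefficientOfVariation(double[] values)
    {
        if (values.Length < 2)
            throw new ArgumentException("At least two values are required.", nameof(values));
        double mean = values.Average();
        // Unbiased variance divides the sum of squared deviations by n - 1.
        double variance = values.Sum(v => (v - mean) * (v - mean)) / (values.Length - 1);
        return Math.Sqrt(variance) / mean;
    }
}
```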

4.3.2 The Mean Deviation Function

The StatsCalculator mean deviation function computes the arithmetic mean of the absolute deviations of a list of elements from their mean. These elements may come from an array or from a data sheet column. The mean deviation is computed according to the formula:

$$\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$

To see how to compute the mean deviation of an array of doubles, consider the following C# code:

[C#]
double[] values = {22, 24, 21, 23, 19};
double meanDeviation = calculator.MeanDeviation(values);

The above code computes the mean deviation of the values 22, 24, 21, 23 and 19. This returns 1.44. To see how to compute the mean deviation of a data sheet column, consider the following data sheet, which represents exam scores of a group of people:

Person Score
John 55
Adam 65
Betty 75
Catherine 62
Mary 84
Howard 91

Table 4.15 - Exam scores of a group of people.

The mean deviation of the score column is computed using the following C# code:

[C#]
double meanDeviation = calculator.MeanDeviation("Score");

This C# code will return 11.33 since the mean is 72 and

(17 + 7 + 3 + 10 + 12 + 19) / 6 = 11.33

When computing the mean deviation of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the mean deviation of only the first three people in the list, i.e. of John, Adam and Betty. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the mean deviation of the first three people:

[C#]
double meanDeviation = calculator.MeanDeviation("Score", "INDEX < 3");

The above line of code will return a mean deviation of 6.67, since the mean of these three scores is 65 and

(10 + 0 + 10) / 3 = 6.67
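The mean deviation formula can be checked with a short plain C# sketch. `MeanDeviationSketch` is a hypothetical name, and the code is not taken from Statsar.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: the average absolute distance of each value from the mean.
public static class MeanDeviationSketch
{
    public static double MeanDeviation(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        double mean = values.Average();
        return values.Average(v => Math.Abs(v - mean));
    }
}
```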

4.3.3 The Median Deviation Function

The StatsCalculator median deviation function computes the median of the absolute deviations of the elements in a list from their mean. These elements may come from an array or from a data sheet column.

To see how to compute the median deviation of an array of doubles, consider the following C# code:

[C#]
double[] values = {4, 8, 12, 16, 16, 19};
double medianDeviation = calculator.MedianDeviation(values);

The above code computes the median deviation of the values 4, 8, 12, 16, 16 and 19. This returns 4. To see how to compute the median deviation of a data sheet column, consider the following data sheet, which represents prices of venues:

Venue Price
Cinema 8.50
Bowling 4.50
Zoo 12.50
Museum 2.50
Concert 52.00
Opera 70.00

Table 4.16 - Prices of venues.

The median deviation of the price column is computed using the following C# code:

[C#]
double medianDeviation = calculator.MedianDeviation("Price");

This will return 21.5 since the mean is 25 and the absolute deviations from the mean are

16.5, 20.5, 12.5, 22.5, 27, 45

After sorting, the two middle deviations are 20.5 and 22.5, and (20.5 + 22.5) / 2 = 21.5.

When computing the median deviation of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the median deviation of only the last three venues in the list, i.e. of Museum, Concert and Opera. We can use the following expression as a filter:

"INDEX >= 3"

Since the first row has index 0, the following C# code will compute the median deviation of the last three venues:

[C#]
double medianDeviation
    = calculator.MedianDeviation("Price", "INDEX >= 3");

This will return 28.5 since the mean is 41.5 and the absolute deviations from the mean are

39, 10.5, 28.5
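Following this guide's description, the sketch below measures deviations from the mean, whereas the more common median absolute deviation (MAD) measures them from the median. It is plain C# with a hypothetical `MedianDeviationSketch` class, not Statsar code, and it reproduces the worked examples in this section.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: median of the absolute deviations from the MEAN.
public static class MedianDeviationSketch
{
    public static double MedianDeviation(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        double mean = values.Average();
        double[] deviations = values.Select(v => Math.Abs(v - mean))
                                    .OrderBy(d => d)
                                    .ToArray();
        int mid = deviations.Length / 2;
        // Middle deviation, or the mean of the two middle ones for an even count.
        return deviations.Length % 2 == 1
            ? deviations[mid]
            : (deviations[mid - 1] + deviations[mid]) / 2.0;
    }
}
```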

4.3.4 The Root Mean Square Function

The StatsCalculator root mean square function returns the RMS (root mean square) of elements in a list of values. The RMS is calculated according to

$$x_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$$

When not using a data sheet, it is possible to compute the RMS of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {5, 8, 16, 22};
double rootMeanSquare = calculator.RootMeanSquare(values);

The second line of the code above will return 14.4, since

√((5² + 8² + 16² + 22²) / 4) = √(829 / 4) = 14.4

To illustrate the RMS function when using a data sheet, consider the data sheet below which lists scores by team:

Team Score
Alpha 55
NULL -1
Beta 62
NULL -1
Gamma 48
Delta 75

Table 4.17 - Scores by team.
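The RMS formula translates into a short plain C# sketch. `RootMeanSquareSketch` is a hypothetical name, and the code is an illustration that does not call Statsar. Because every value is squared, negative entries such as the -1 rows in the table above still add to the total and also increase the count, which is why filtering them out changes the result.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the root mean square.
public static class RootMeanSquareSketch
{
    public static double RootMeanSquare(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        // Square each value, average the squares, then take the square root.
        return Math.Sqrt(values.Average(v => v * v));
    }
}
```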

Suppose we wish to calculate the RMS of the score column. We could use the RMS function passing in the column name as a parameter:

[C#]
double rootMeanSquare = calculator.RootMeanSquare("Score");

This line of code will return 49.7 since

√((55² + (-1)² + 62² + (-1)² + 48² + 75²) / 6) = √(14800 / 6) = 49.7

As can be seen above, the data sheet contains two invalid rows, with NULL for the team and -1 for the score. We can use the RMS function with a filter to exclude these invalid values from our calculation. We can use the VALUE keyword with the following expression:

"VALUE > 0"

The following C# code illustrates how the filter is passed to the RMS function after specifying a column name:

[C#]
double rootMeanSquare
    = calculator.RootMeanSquare("Score", "VALUE > 0");

The line of code above will compute the RMS of all positive elements. This will return 60.8 since

√((55² + 62² + 48² + 75²) / 4) = √(14798 / 4) = 60.8

4.3.5 The Variance Function

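The StatsCalculator variance function computes the variance of a list of values, with the specified estimator. The unbiased variance is calculated according to

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

Similarly, the biased variance is calculated according to:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$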

The variance function is overloaded to accept either a data sheet column or an array of values. When using a data sheet column, a filter may also be specified. The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double Variance(string columnName);

    public double Variance(string columnName, Estimator estimator);

    public double Variance(string columnName, string filter);

    public double Variance(params double[] values);

    public double Variance(double[] values, Estimator estimator);

    // ...
}

In the above methods, a member of the Estimator enumeration may be specified to choose between the unbiased and the biased estimator.
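The only difference between the two estimators is the divisor applied to the sum of squared deviations: n - 1 for the unbiased estimator and n for the biased one. The plain C# sketch below illustrates this; `SketchEstimator` is a hypothetical enumeration defined only for the sketch and is distinct from Statsar's Estimator, and the guide does not state whether Statsar computes the sum of squares in exactly this way.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an estimator choice; this is NOT Statsar's Estimator enum.
public enum SketchEstimator { Unbiased, Biased }

public static class VarianceSketch
{
    public static double Variance(double[] values, SketchEstimator estimator = SketchEstimator.Unbiased)
    {
        int n = values.Length;
        // Unbiased divides by n - 1, biased divides by n.
        int divisor = estimator == SketchEstimator.Unbiased ? n - 1 : n;
        if (divisor <= 0)
            throw new ArgumentException("Not enough values for this estimator.", nameof(values));
        double mean = values.Average();
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));
        return sumOfSquares / divisor;
    }

    // The standard deviation is the square root of the variance with the same estimator.
    public static double StandardDeviation(double[] values, SketchEstimator estimator = SketchEstimator.Unbiased)
        => Math.Sqrt(Variance(values, estimator));
}
```

For the values 4, 7, 3, 5 and 9, the sum of squared deviations is 23.2, so the unbiased variance is 23.2 / 4 = 5.8 and the biased variance is 23.2 / 5 = 4.64.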

As an example, consider the values 4, 7, 3, 5 and 9. We can compute the unbiased variance using the following C# code:

[C#]
double[] values = { 4, 7, 3, 5, 9 };
double variance = calculator.Variance(values);

The above values have a mean of 5.6. The C# code above returns a variance of 5.8 since

((4 - 5.6)² + (7 - 5.6)² + (3 - 5.6)² + (5 - 5.6)² + (9 - 5.6)²) / (5 - 1) = 23.2 / 4 = 5.8

4.3.6 The Standard Deviation Function

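The StatsCalculator standard deviation function computes the standard deviation of a list of values, with the specified estimator. The unbiased standard deviation is calculated according to

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Similarly, the biased standard deviation is calculated according to:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$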

The standard deviation function is overloaded to accept either a data sheet column or an array of values. When using a data sheet column, a filter may also be specified. The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double StandardDeviation(
        string columnName);

    public double StandardDeviation(
        string columnName, Estimator estimator);

    public double StandardDeviation(
        string columnName, string filter);

    public double StandardDeviation(
        params double[] values);

    public double StandardDeviation(
        double[] values, Estimator estimator);

    // ...
}

In the above methods, a member of the Estimator enumeration may be specified to choose between the unbiased and the biased estimator.

As an example, consider the values 6, 8, 12, 11 and 16. We can compute the unbiased standard deviation using the following C# code:

[C#]
double[] values = { 6, 8, 12, 11, 16 };
double standardDeviation = calculator.StandardDeviation(values);

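The above values have a mean of 10.6. The C# code above returns a standard deviation of approximately 3.85 since

$$s = \sqrt{\frac{(6-10.6)^2 + (8-10.6)^2 + (12-10.6)^2 + (11-10.6)^2 + (16-10.6)^2}{5-1}} = \sqrt{\frac{59.2}{4}} = \sqrt{14.8} \approx 3.85$$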
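The standard deviation is simply the square root of the corresponding variance. The sketch below shows this in plain C#, independent of Statsar; StandardDeviationSketch and the boolean unbiased flag are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class StandardDeviationSketch
{
    // Square root of the unbiased (n - 1) or biased (n) variance.
    public static double StandardDeviation(double[] values, bool unbiased = true)
    {
        int n = values.Length;
        if (n < (unbiased ? 2 : 1))
            throw new ArgumentException("Not enough values for the chosen estimator.");

        double mean = values.Average();
        double sumSquares = values.Sum(x => (x - mean) * (x - mean));
        return Math.Sqrt(sumSquares / (unbiased ? n - 1 : n));
    }
}
```

For the values 6, 8, 12, 11 and 16 this gives sqrt(14.8) ≈ 3.847 with the unbiased estimator and sqrt(59.2 / 5) = sqrt(11.84) ≈ 3.441 with the biased one.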

4.3.7 The Standard Error Function

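The StatsCalculator standard error function computes the standard error of the mean (SEM) of elements in a list. These elements may come from an array or from a data sheet column. The standard error is calculated according to the equation:

$$SE = \frac{s}{\sqrt{n}} = \frac{1}{\sqrt{n}}\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $s$ is the unbiased standard deviation of the $n$ values and $\bar{x}$ is their mean.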

To see how to compute the standard error of an array of doubles, consider the following C# code:

[C#]
double[] values = {32, 35, 37, 29, 36};
double standardError = calculator.StandardError(values);

The above code computes the standard error of the values 32, 35, 37, 29 and 36. This returns 1.46. To see how to compute the standard error of a data sheet column, consider the following data sheet, which represents the number of shareholders by company:

Company            Shareholders
Pan American       144
Middle Utilities   133
Island Lighting    143
RCA                246
Bank America       175
Dow Chemical       137

Table 4.18 - Number of shareholders by company.

The standard error of the shareholders column is computed using the following C# code:

[C#]
double standardError = calculator.StandardError("Shareholders");

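This will return 17.67 since the mean is 163, the squared deviations from the mean sum to 361 + 900 + 400 + 6889 + 144 + 676 = 9370, and

$$SE = \frac{1}{\sqrt{6}}\sqrt{\frac{9370}{6-1}} = \frac{\sqrt{1874}}{\sqrt{6}} \approx \frac{43.29}{2.449} \approx 17.67$$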

When computing the standard error of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the standard error of only companies with more than 140 shareholders, i.e. of Pan American, Island Lighting, RCA and Bank America. We can use the following expression as a filter:

"VALUE > 140"

The following C# code will compute the standard error of companies with more than 140 shareholders:

[C#]
double standardError
    = calculator.StandardError("Shareholders", "VALUE > 140");

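This will return 24.17 since the four filtered values 144, 143, 246 and 175 have a mean of 177, their squared deviations sum to 1089 + 1156 + 4761 + 4 = 7010, and

$$SE = \frac{1}{\sqrt{4}}\sqrt{\frac{7010}{4-1}} = \frac{\sqrt{2336.67}}{2} \approx \frac{48.34}{2} \approx 24.17$$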
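The following plain C# sketch, independent of Statsar, applies the standard error formula to the examples above. The class name StandardErrorSketch is hypothetical, and a LINQ Where call is used only to imitate the "VALUE > 140" filter; it is not how Statsar evaluates filter expressions.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class StandardErrorSketch
{
    // SEM = s / sqrt(n), where s is the unbiased (n - 1) standard deviation.
    public static double StandardError(double[] values)
    {
        int n = values.Length;
        if (n < 2)
            throw new ArgumentException("At least two values are required.");

        double mean = values.Average();
        double s = Math.Sqrt(values.Sum(x => (x - mean) * (x - mean)) / (n - 1));
        return s / Math.Sqrt(n);
    }
}
```

Passing the Shareholders values { 144, 133, 143, 246, 175, 137 } gives approximately 17.67, and passing the same array filtered with Where(v => v > 140) gives approximately 24.17.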

4.3.8 The Sum Square Deviation Function

The StatsCalculator sum square deviation function computes the sum of the squared deviations from the mean of a list of elements. These elements may come from an array or from a data sheet column. The sum square deviation is computed according to the formula:

$$SS = \sum_{i=1}^{n}(x_i - \bar{x})^2$$

where $\bar{x}$ is the mean of the $n$ values.

To see how to compute the sum square deviation of an array of doubles, consider the following C# code:

[C#]
double[] values = {10, 11, 10, 8, 14};
double sumSquareDeviation = calculator.SumSquareDeviation(values);

The above code computes the sum square deviation of the values 10, 11, 10, 8 and 14. This returns 19.2. To see how to compute the sum square deviation of a data sheet column, consider the following data sheet, which represents ages of a group of people:

Person Age
John 17
Jack 25
Mary 34
William 44
Andrew 28
Nancy 32

Table 4.19 - Ages of a group of people.

The sum square deviation of the age column is computed using the following C# code:

[C#]
double sumSquareDeviation = calculator.SumSquareDeviation("Age");

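This C# code will return 414 since the mean is 30 and the squared deviations from the mean sum to

$$(17-30)^2 + (25-30)^2 + (34-30)^2 + (44-30)^2 + (28-30)^2 + (32-30)^2 = 169 + 25 + 16 + 196 + 4 + 4 = 414$$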

When computing the sum square deviation of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the sum square deviation of only the first three people in the list, i.e. of John, Jack and Mary. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the sum square deviation of the first three people:

[C#]
double sumSquareDeviation
    = calculator.SumSquareDeviation("Age", "INDEX < 3");

The above line of code will return a sum square deviation of 144.67.
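As a hand check, here is a minimal plain C# sketch of the sum square deviation, written without Statsar. SumSquareDeviationSketch is a hypothetical name, and LINQ's Take(3) stands in for the "INDEX < 3" filter, since rows 0, 1 and 2 are the first three.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class SumSquareDeviationSketch
{
    // Sum of the squared deviations of each value from the arithmetic mean.
    public static double SumSquareDeviation(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");

        double mean = values.Average();
        return values.Sum(x => (x - mean) * (x - mean));
    }
}
```

For the ages { 17, 25, 34, 44, 28, 32 } this returns 414. For ages.Take(3).ToArray() it returns 1302 / 9 ≈ 144.67, since the mean of 17, 25 and 34 is 76 / 3.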

4.4 Percentiles and Ranks

There are several different algorithms that may be used to calculate quantiles (such as deciles, percentiles and quartiles), and there is no clear standard in general usage. To support as many of these algorithms as possible, the quantile functions are overloaded to accept a member of the QuantileType enumeration, which specifies how quantiles are calculated.

The default quantile type is specified by the StatsConfiguration.QuantileType property. For example, the following C# code sets the default quantile type to closest observation:

[C#]
calculator.Configuration.QuantileType = QuantileType.ClosestObservation;

The following table lists the different quantile calculation types that are supported by the Statsar library:

Quantile Type Description
Default Default quantile type (interpolated).
Interpolated Interpolated quantile type.
WeightedAverageLeft Left weighted average quantile type. The quantile will be calculated according to a weighted average, aimed at the left point that contains the percentile interval.
ClosestObservation Closest observation quantile type. The quantile will be calculated to match the closest value in the observed data.
EmpiricalDistribution Empirical distribution quantile type.
WeightedAverageRight Right weighted average quantile type. The quantile will be calculated according to a weighted average, aimed at the right point that contains the percentile interval.
EmpiricalDistributionAveraged Empirical distribution averaged quantile type. This quantile calculation type is the same as the empirical distribution quantile type, except that observation values are averaged when the percentile lies exactly on a value.

Table 4.20 - Quantile calculation types.

4.4.1 The Interquartile Range Function

The StatsCalculator interquartile range function computes the difference between the 75th and 25th percentiles of a list of elements. These elements may come from an array or from a data sheet column.

To see how to compute the interquartile range of an array of doubles, consider the following C# code:

[C#]
double[] values = {38, 42, 40, 45, 35};
double interquartileRange = calculator.InterquartileRange(values);

The above code computes the interquartile range of the values 38, 42, 40, 45 and 35. This returns 4. To see how to compute the interquartile range of a data sheet column, consider the following data sheet, which represents annual income of a group of people:

Person Income
Jack 22,000
Jason 25,000
Tara 48,500
Sarah 70,000
Lucy 12,400
Bob 16,750

Table 4.21 - Annual income of a group of people.

The interquartile range of the income column is computed using the following C# code:

[C#]
double interquartileRange = calculator.InterquartileRange("Income");

This C# code will return 24562.5 since the 25th percentile is 18062.5 and the 75th percentile is 42625.

When computing the interquartile range of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the interquartile range of only the first three people in the list, i.e. of Jack, Jason and Tara. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the interquartile range of the first three people:

[C#]
double interquartileRange
    = calculator.InterquartileRange("Income", "INDEX < 3");

The above line of code will return an interquartile range of 13250.
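All of the interquartile range figures above are consistent with linear interpolation between sorted values at the zero-based position p(n - 1), the same rule as R's default quantile type 7. The plain C# sketch below assumes that rule. It is not Statsar's implementation, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class InterquartileRangeSketch
{
    // Linear interpolation between order statistics at zero-based position p * (n - 1).
    public static double Percentile(double[] values, double p)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (p < 0 || p > 1)
            throw new ArgumentOutOfRangeException(nameof(p));

        double[] sorted = values.OrderBy(x => x).ToArray();
        double position = p * (sorted.Length - 1);
        int lower = (int)Math.Floor(position);
        int upper = Math.Min(lower + 1, sorted.Length - 1);
        return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
    }

    // Difference between the 75th and 25th percentiles.
    public static double InterquartileRange(double[] values) =>
        Percentile(values, 0.75) - Percentile(values, 0.25);
}
```

For the six incomes, the 25th percentile falls at position 1.25, so it lies a quarter of the way from 16,750 to 22,000 (18,062.5). The 75th falls at position 3.75, three quarters of the way from 25,000 to 48,500 (42,625).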

4.4.2 The Decile Function

The StatsCalculator decile function computes a decile of elements in a list. These elements may come from an array or from a data sheet column. Deciles divide a sorted data set into tenths, so the kth decile is the (10k)th percentile.

To see how to compute a decile of an array of doubles, consider the following C# code:

[C#]
double[] values = {3, 4, 8, 9, 10};
double decile = calculator.Decile(values, 3);

The above code computes the 3rd decile of the values 3, 4, 8, 9 and 10. This returns 4.8 since this is the 30th percentile. To see how to compute a decile of a data sheet column, consider the following data sheet, which represents prices of soda drinks:

Drink          Price
Fanta          1.40
Sprite         1.35
Cola           0.95
Cola Diet      1.25
Pepsi          1.05
Pepsi Light    1.30

Table 4.22 - Prices of soda drinks.

The 7th decile of the price column is computed using the following C# code:

[C#]
double decile = calculator.Decile("Price", 7);

This C# code will return 1.325 since this is the 70th percentile.

When computing a decile of a data sheet column it is also possible to specify a filter. Suppose we wish to compute the 9th decile of only the last three drinks in the list, i.e. of Cola Diet, Pepsi and Pepsi Light. We can use the following expression as a filter:

"INDEX >= 3"

Since the first row has index 0, the following C# code will compute the 9th decile of the last three drinks:

[C#]
double decile = calculator.Decile("Price", 9, "INDEX >= 3");

The above line of code will return 1.29 since this is the 90th percentile.

The decile function is overloaded to accept a member of the QuantileType enumeration, which may be used to specify which quantile algorithm to use when calculating deciles. The following C# code calculates the 3rd decile of the price column, using the left weighted average quantile algorithm:

[C#]
double decile = calculator.Decile(
    "Price", 3, QuantileType.WeightedAverageLeft);

The C# code above returns 1.03 since this is the 30th percentile when using left weighted average quantiles.
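The left weighted average result of 1.03 is reproduced by interpolating at the one-based position n·p (R's quantile type 4). Here n·p = 6 × 0.3 = 1.8, which lies between 0.95 and 1.05. The plain C# sketch below assumes that rule for WeightedAverageLeft and uses p(n - 1) interpolation for the interpolated type. Statsar's internal definitions are not documented here, and every name in the sketch is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; not part of the Statsar API.
public static class DecileSketch
{
    // Interpolated: linear interpolation at zero-based position p * (n - 1).
    public static double Interpolated(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        double h = p * (x.Length - 1);
        int j = (int)Math.Floor(h);
        int k = Math.Min(j + 1, x.Length - 1);
        return x[j] + (h - j) * (x[k] - x[j]);
    }

    // Assumed left weighted average rule: interpolation at one-based position n * p,
    // clamped to the smallest and largest observations.
    public static double WeightedAverageLeft(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        int n = x.Length;
        double h = n * p;
        int j = (int)Math.Floor(h);
        if (j < 1) return x[0];
        if (j >= n) return x[n - 1];
        // x[j - 1] is the j-th smallest value (one-based).
        return x[j - 1] + (h - j) * (x[j] - x[j - 1]);
    }

    // The k-th decile is the (10k)-th percentile under the chosen quantile rule.
    public static double Decile(double[] values, int k, Func<double[], double, double> quantile)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (k < 0 || k > 10)
            throw new ArgumentOutOfRangeException(nameof(k));
        return quantile(values, k / 10.0);
    }
}
```

For example, DecileSketch.Decile(prices, 7, DecileSketch.Interpolated) gives 1.325, and DecileSketch.Decile(prices, 3, DecileSketch.WeightedAverageLeft) gives 1.03.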

4.4.3 The Percentile Function

The StatsCalculator percentile function returns a percentile of elements in a list of values. When not using a data sheet, it is possible to compute the percentile of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {4, 8, 12, 16, 25, 29};
double percentile = calculator.Percentile(values, 0.35);

The second line of the code above will return 11, since this is the 35th percentile of elements in the array. To illustrate the percentile function when using a data sheet, consider the data sheet column below:

Data
9
10
14
15
15
25

Table 4.23 - Percentile data.

Suppose we wish to compute the 85th percentile of the data column. We can use the percentile function passing in the column name as a parameter:

[C#]
double percentile = calculator.Percentile("Data", 0.85);

This line of code will return 17.5 since this is the 85th percentile. It is also possible to use the percentile function with a filter. Suppose we wish to compute a percentile of values of at least 15. We can use the VALUE keyword with the following expression:

"VALUE >= 15"

The following C# code illustrates how the filter is passed to the percentile function after specifying the percentile amount:

[C#]
double percentile = calculator.Percentile("Data", 0.65, "VALUE >= 15");

The line of code above will compute the 65th percentile of the elements in the data column with a value of at least 15. This will return 18 since this is the 65th percentile of the values 15, 15 and 25.

The percentile function is overloaded to accept a member of the QuantileType enumeration, which may be used to specify which quantile algorithm to use when calculating percentiles. The following C# code calculates the 45th percentile of the above data, using the right weighted average quantile algorithm:

[C#]
double percentile = calculator.Percentile(
    "Data", 0.45, QuantileType.WeightedAverageRight);

This C# code will return 14.15 since this is the 45th percentile when using right weighted average quantiles.
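The right weighted average result of 14.15 is reproduced by interpolating at the one-based position (n + 1)p (R's quantile type 6). Here (n + 1)p = 7 × 0.45 = 3.15, which lies between 14 and 15. The plain C# sketch below assumes that rule alongside the interpolated rule used for the other percentiles in this section. It is an illustration only, not Statsar's code, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; not part of the Statsar API.
public static class PercentileSketch
{
    // Interpolated rule: linear interpolation at zero-based position p * (n - 1).
    public static double Interpolated(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        double h = p * (x.Length - 1);
        int j = (int)Math.Floor(h);
        int k = Math.Min(j + 1, x.Length - 1);
        return x[j] + (h - j) * (x[k] - x[j]);
    }

    // Assumed right weighted average rule: interpolation at one-based position (n + 1) * p,
    // clamped to the smallest and largest observations.
    public static double WeightedAverageRight(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        int n = x.Length;
        double h = (n + 1) * p;
        int j = (int)Math.Floor(h);
        if (j < 1) return x[0];
        if (j >= n) return x[n - 1];
        // x[j - 1] is the j-th smallest value (one-based).
        return x[j - 1] + (h - j) * (x[j] - x[j - 1]);
    }
}
```

With the data column { 9, 10, 14, 15, 15, 25 }, Interpolated(data, 0.85) gives 17.5 and WeightedAverageRight(data, 0.45) gives 14.15. Applying Interpolated to the filtered values 15, 15 and 25 with p = 0.65 gives 18.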

4.4.4 The Percent Rank Function

The StatsCalculator percent rank function returns the rank of a value in a data set as a fraction of the data set, expressed as a number between 0 and 1. This data set may be an array or a data sheet column. The following overloads are provided as part of the StatsCalculator class:

[C#]
public class StatsCalculator
{
    // ...

    public double PercentRank(
        string columnName, double value);

    public double PercentRank(
        string columnName, double value, string filter);

    public double PercentRank(
        double[] values, double value);

    // ...
}

To see how the percent rank function works with an array, consider the following C# code:

[C#]
double[] values = {1, 1, 1, 2, 3, 4, 8, 11, 12, 13};
double percentRank = calculator.PercentRank(values, 2);

The above code will rank the number 2 in the above list and return the result as a fraction. Since 3 values in the list are smaller than 2 and 6 are larger than 2, the result is 3 / (3 + 6) = 0.333.

To see how to use the percent rank function with a data sheet column, consider the data sheet below which consists of a single column named Values:

Values
4
8
12
14
22
29

Table 4.24 - Percent rank values.

The following C# code computes a percent rank for this column:

[C#]
double percentRank = calculator.PercentRank("Values", 10);

The above code will return 0.3. It is also possible to filter out data sheet column values by specifying a filter expression.
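The first example follows the rule stated above: count the values below, divided by the count of values below plus the values above. The value 10 does not occur in the Values column. The result of 0.3 is consistent with interpolating linearly between the percent ranks of its neighbours, 8 (1 / 5 = 0.2) and 12 (2 / 5 = 0.4). The plain C# sketch below assumes exactly that behaviour. It is not taken from Statsar, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class PercentRankSketch
{
    // For a value present in the data: (values below) / (values below + values above).
    private static double RankOfMember(double[] data, double value)
    {
        int below = data.Count(x => x < value);
        int above = data.Count(x => x > value);
        if (below + above == 0)
            throw new ArgumentException("At least two distinct values are required.");
        return (double)below / (below + above);
    }

    public static double PercentRank(double[] values, double value)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (value < values.Min() || value > values.Max())
            throw new ArgumentOutOfRangeException(nameof(value));
        if (values.Contains(value))
            return RankOfMember(values, value);

        // Assumed: interpolate between the nearest observations on either side.
        double lo = values.Where(x => x < value).Max();
        double hi = values.Where(x => x > value).Min();
        double rankLo = RankOfMember(values, lo);
        double rankHi = RankOfMember(values, hi);
        return rankLo + (value - lo) / (hi - lo) * (rankHi - rankLo);
    }
}
```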

4.4.5 The Quartile Function

The StatsCalculator quartile function computes a quartile of elements in a list. These elements may come from an array or from a data sheet column. Quartiles divide a sorted data set into quarters, so the kth quartile is the (25k)th percentile.

To see how to compute a quartile of an array of doubles, consider the following C# code:

[C#]
double[] values = {5, 8, 6, 2, 12, 7};
double quartile = calculator.Quartile(values, 3);

The above code computes the 3rd quartile of the values 5, 8, 6, 2, 12 and 7. This returns 7.75 since this is the 75th percentile. To see how to compute a quartile of a data sheet column, consider the following data sheet, which represents monthly bills of a group of people:

Person Bills
Angie 350
Bart 275
Tom 485
Homer 265
William 320
Lisa 440

Table 4.25 - Monthly bills of a group of people.

The 1st quartile of the bills column is computed using the following C# code:

[C#]
double quartile = calculator.Quartile("Bills", 1);

This C# code will return 286.25 since this is the 25th percentile.

When computing a quartile of a data sheet column it is also possible to specify a filter. Suppose we wish to compute a quartile of only the first three people in the list, i.e. of Angie, Bart and Tom. We can use the following expression as a filter:

"INDEX < 3"

Since the first row has index 0, the following C# code will compute the 2nd quartile of the first three people:

[C#]
double quartile = calculator.Quartile("Bills", 2, "INDEX < 3");

The above line of code will return 350 since this is the 50th percentile.

The quartile function is overloaded to accept a member of the QuantileType enumeration, which may be used to specify which quantile algorithm to use when calculating quartiles. The C# code below calculates the 3rd quartile of the bills column using the closest observation quantile algorithm:

[C#]
double quartile = calculator.Quartile(
    "Bills", 3, QuantileType.ClosestObservation);

This C# code returns 350 as this is the 75th percentile when using closest observation quantiles.
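The closest observation result can be reproduced by rounding n·p to the nearest one-based index, with exact halves going to the even index. This matches R's quantile type 3 and SAS definition 2. Here n·p = 6 × 0.75 = 4.5, which rounds to 4, so the result is the 4th smallest bill, 350. The plain C# sketch below assumes that rule. The interpolated rule covers the other quartiles in this section. All names are hypothetical, and Statsar's own definition may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; not part of the Statsar API.
public static class QuartileSketch
{
    // Interpolated rule: linear interpolation at zero-based position p * (n - 1).
    public static double Interpolated(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        double h = p * (x.Length - 1);
        int j = (int)Math.Floor(h);
        int k = Math.Min(j + 1, x.Length - 1);
        return x[j] + (h - j) * (x[k] - x[j]);
    }

    // Assumed closest observation rule: one-based index n * p rounded to the nearest
    // integer, with exact halves rounded to the even index, clamped to [1, n].
    public static double ClosestObservation(double[] values, double p)
    {
        double[] x = values.OrderBy(v => v).ToArray();
        int n = x.Length;
        int index = (int)Math.Round(n * p, MidpointRounding.ToEven);
        index = Math.Max(1, Math.Min(n, index));
        return x[index - 1];
    }

    // The k-th quartile is the (25k)-th percentile under the chosen quantile rule.
    public static double Quartile(double[] values, int k, Func<double[], double, double> quantile) =>
        quantile(values, k / 4.0);
}
```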

4.4.6 The Weighted Percentile Function

The StatsCalculator weighted percentile function returns a weighted percentile of elements in a list of values, using a corresponding list of weights. In a non-weighted percentile calculation, each element of the list gives an equal contribution to the distribution. In a weighted percentile calculation, each element contributes according to the corresponding weight.

When not using a data sheet, it is possible to compute the weighted percentile of elements in an array. The following C# code illustrates this:

[C#]
double[] values = {5, 6.5, 3.2, 7.6, 12.4, 12.2};
double[] weights = {0.1, 0.5, 2.5, 3.2, 1.5, 0.5};
double percentile = calculator.WeightedPercentile(values, weights, 0.45);

The only restriction on the weights is that each of them must be positive. The above C# code will return 7.6, since this is the 45th weighted percentile of the elements in the values array. To illustrate the weighted percentile function when using a data sheet, consider the sample data below, which corresponds to book sales by category:

Category Sales Rating
Fiction 1250 1.5
Biography 950 1.5
Travel 775 0.5
Cookery 675 0.5
Fashion 600 0.5
Business 550 5.0
Educational 475 5.0

Table 4.26 - Book sales by category.

Suppose we wish to calculate the 65th weighted percentile of the sales column, with the rating column as weights. We can use the weighted percentile function, specifying column names as parameters:

[C#]
double percentile = calculator.WeightedPercentile("Sales", "Rating", 0.65);

The above C# code will return 550 since this is the 65th weighted percentile. It is also possible to use the weighted percentile function with a filter. Suppose we wish to calculate a weighted percentile for sales under 1000 books. We can use the VALUE keyword with the following expression:

"VALUE < 1000"

The following C# code illustrates how the filter is passed to the weighted percentile function after specifying the percentile amount:

[C#]
double percentile = calculator.WeightedPercentile(
    "Sales", "Rating", 0.85, "VALUE < 1000");

This C# code computes the 85th weighted percentile of the elements in the sales column whose value is under 1000. This will return 775 since this is the 85th filtered weighted percentile.

Unlike other quantile functions (deciles, non-weighted percentiles and quartiles), the weighted percentile function does not accept a member of the QuantileType enumeration. However, if equal weights are provided to the weighted percentile function, then the calculated percentiles are equivalent to using the non-weighted percentile function with the empirical distribution averaged quantile type (QuantileType.EmpiricalDistributionAveraged).
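One definition reproduces all three weighted results above. Sort the values, accumulate their weights, and return the first value whose cumulative weight reaches p times the total weight. When the cumulative weight lands exactly on that target, average that value with the next one. With equal weights, this averaging step is what makes the result match the empirical distribution averaged type. The plain C# sketch below implements this assumed rule. It is not Statsar's implementation, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class WeightedPercentileSketch
{
    public static double WeightedPercentile(double[] values, double[] weights, double p)
    {
        if (values.Length == 0 || values.Length != weights.Length)
            throw new ArgumentException("Values and weights must be non-empty and of equal length.");
        if (weights.Any(w => w <= 0))
            throw new ArgumentException("Weights must be positive.");
        if (p < 0 || p > 1)
            throw new ArgumentOutOfRangeException(nameof(p));

        // Sort the (value, weight) pairs by value.
        var pairs = values.Zip(weights, (v, w) => (Value: v, Weight: w))
                          .OrderBy(t => t.Value)
                          .ToArray();
        double total = pairs.Sum(t => t.Weight);
        double target = p * total;
        double cumulative = 0;

        for (int i = 0; i < pairs.Length; i++)
        {
            cumulative += pairs[i].Weight;
            // Exactly on a boundary (within rounding): average with the next value.
            if (Math.Abs(cumulative - target) <= 1e-12 * total && i + 1 < pairs.Length)
                return (pairs[i].Value + pairs[i + 1].Value) / 2;
            if (cumulative >= target)
                return pairs[i].Value;
        }
        return pairs[pairs.Length - 1].Value;
    }
}
```

For the book sales, the total rating weight is 14.5 and 65% of it is 9.425. The cumulative weight first reaches this at 550 (5 + 5 = 10), which is why 550 is returned.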

4.4.7 The Rank Function

The StatsCalculator rank function returns the rank of a number in a list of values. The rank of a number is its size relative to the other values in the list. If you were to sort the list, the rank of the number would be its position, with 1 being the first position.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public int Rank(string columnName, double value);

    public int Rank(string columnName, double value, SortOrder order);

    public int Rank(string columnName, double value, string filter);

    public int Rank(double[] values, double value);

    public int Rank(double[] values, double value, SortOrder order);

    // ...
}

The rank function accepts a data sheet column name or an array, followed by the value to rank. A member of the SortOrder enumeration may also be specified, which allows the ranking to be ascending or descending. The default rank sort order is controlled by the SortOrder property of the StatsConfiguration class. Unless otherwise set, sorting will be in ascending order.

To see how to rank a number against an array, consider the following C# code:

[C#]
double[] values = {8, 12, 14, 20, 35};
int rank = calculator.Rank(values, 13);

This will return 3 since the value 13 would be in the third position if added to the array and sorted in ascending order.
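Put another way, in ascending order the rank is one plus the number of smaller values. The plain C# sketch below, independent of Statsar, shows this. It uses a boolean flag in place of the SortOrder enumeration, whose members are not listed here. The treatment of values equal to the one being ranked is an assumption, and RankSketch is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class RankSketch
{
    // The position the value would take if inserted into the sorted list (1 = first).
    // Ascending order counts smaller values; descending order counts larger ones.
    // Equal values do not push the value back, which is an assumption about ties.
    public static int Rank(double[] values, double value, bool ascending = true) =>
        1 + (ascending ? values.Count(x => x < value) : values.Count(x => x > value));
}
```

With { 8, 12, 14, 20, 35 }, ranking 13 in ascending order gives 3. In descending order it gives 4, because 14, 20 and 35 come before it.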

4.4.8 The Ranks Function

The StatsCalculator ranks function returns the ranks of a list of numbers. Each rank reflects the size of a number relative to the other values in the list: if you were to sort the list, the rank of each number would be its position, with 1 being the first position.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public int[] Ranks(string columnName);

    public int[] Ranks(string columnName, SortOrder order);

    public int[] Ranks(string columnName, string filter);

    public int[] Ranks(double[] values);

    public int[] Ranks(double[] values, SortOrder order);

    // ...
}

The ranks function accepts a data sheet column name or an array. A member of the SortOrder enumeration may also be specified, which allows the ranking to be ascending or descending. The default rank sort order is controlled by the SortOrder property of the StatsConfiguration class. Unless otherwise set, sorting will be in ascending order.

To see how to rank an array, consider the following C# code:

[C#]
double[] values = {22, 14, 17, 20, 15};
int[] ranks = calculator.Ranks(values);

This will return an array containing the values 5, 1, 3, 4, 2 since these would be the positions of the elements in the input array if the input array was sorted in ascending order.
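The same counting idea gives the rank of every element at once. The plain C# sketch below does this. It is quadratic in the number of values, which is fine for a hand check. Tied values share the lowest position here, which is an assumption, since this guide does not say how Statsar ranks ties. RanksSketch and the boolean flag standing in for SortOrder are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class RanksSketch
{
    // Rank of each element: its one-based position in the sorted list.
    // Tied values share the lowest position here, which is an assumption.
    public static int[] Ranks(double[] values, bool ascending = true) =>
        values.Select(v => 1 + (ascending ? values.Count(x => x < v)
                                          : values.Count(x => x > v)))
              .ToArray();
}
```

For { 22, 14, 17, 20, 15 } this yields 5, 1, 3, 4, 2 in ascending order and 1, 5, 3, 2, 4 in descending order.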

4.5 Central Moments

4.5.1 The Central Moment Function

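The StatsCalculator central moment function returns the central moment of a list of values. Central moments of different orders may be computed. The central moment of order k is calculated according to the formula:

$$m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$$

where $\bar{x}$ is the mean of the $n$ values.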

Hence the 0th central moment is 1, the 1st central moment is zero, and the 2nd central moment is the biased variance.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double CentralMoment(
        string columnName, int order);

    public double CentralMoment(
        string columnName, int order, string filter);

    public double CentralMoment(
        double[] values, int order);

    // ...
}

To see how to compute a central moment of an array, consider the following C# code:

[C#]
double[] values = {41, 42, 45, 38, 46};
double centralMoment = calculator.CentralMoment(values, 3);

The above code will compute the 3rd central moment of the values 41, 42, 45, 38 and 46. Since the mean is 42.4, the cubed deviations from the mean sum to -23.76, so this will return a 3rd central moment of -23.76 / 5 = -4.752.
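The central moment formula translates directly into a few lines of plain C#. The sketch below is independent of Statsar, and CentralMomentSketch is a hypothetical name. It also confirms the special cases stated above: order 0 gives 1, order 1 gives zero, and order 2 gives the biased variance.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class CentralMomentSketch
{
    // k-th central moment: (1/n) * sum over i of (x_i - mean)^k.
    public static double CentralMoment(double[] values, int order)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        if (order < 0)
            throw new ArgumentOutOfRangeException(nameof(order));

        double mean = values.Average();
        return values.Sum(x => Math.Pow(x - mean, order)) / values.Length;
    }
}
```

For { 41, 42, 45, 38, 46 }, order 3 gives -4.752 and order 2 gives the biased variance 41.2 / 5 = 8.24.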

4.5.2 The Standard Moment Function

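The StatsCalculator standard moment function returns the standard moment of a list of values. Standard moments of different orders may be computed. The standard moment of order k is calculated according to the formula:

$$\tilde{m}_k = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{k/2}}$$

that is, the kth central moment divided by the biased variance raised to the power k/2.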

Hence the 0th standard moment is 1, the 1st standard moment is zero, the 2nd standard moment is 1 and the 3rd standard moment is the biased skew.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double StandardMoment(
        string columnName, int order);

    public double StandardMoment(
        string columnName, int order, string filter);

    public double StandardMoment(
        double[] values, int order);

    // ...
}

To see how to compute a standard moment of an array, consider the following C# code:

[C#]
double[] values = {55, 56, 60, 53, 59};
double standardMoment = calculator.StandardMoment(values, 4);

The above code will compute the 4th standard moment of the values 55, 56, 60, 53 and 59. Since the mean is 56.6, the 2nd and 4th central moments are 6.64 and 68.2912, so this will return a 4th standard moment of 68.2912 / 6.64^2 ≈ 1.55.
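A standard moment is a central moment rescaled by a power of the biased variance. The following self-contained plain C# sketch checks the 1.55 figure. It does not use Statsar, and StandardMomentSketch is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class StandardMomentSketch
{
    // k-th central moment with divide-by-n averaging.
    private static double CentralMoment(double[] values, int order)
    {
        double mean = values.Average();
        return values.Sum(x => Math.Pow(x - mean, order)) / values.Length;
    }

    // k-th standard moment: m_k / m_2^(k/2).
    public static double StandardMoment(double[] values, int order)
    {
        if (values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        double m2 = CentralMoment(values, 2);
        if (m2 == 0)
            throw new ArgumentException("Standard moments are undefined when all values are equal.");
        return CentralMoment(values, order) / Math.Pow(m2, order / 2.0);
    }
}
```

For { 55, 56, 60, 53, 59 }, order 4 gives approximately 1.549, and order 2 gives 1 as expected.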

4.5.3 The Kurtosis Function

The StatsCalculator kurtosis function computes the kurtosis of a list of values, with the specified estimator. The kurtosis function is overloaded to accept either a data sheet column or an array of values. When using a data sheet column, a filter may also be specified.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double Kurtosis(
        string columnName);

    public double Kurtosis(
        string columnName, Estimator estimator);

    public double Kurtosis(
        string columnName, string filter);

    public double Kurtosis(
        params double[] values);

    public double Kurtosis(
        double[] values, Estimator estimator);

    // ...
}

In the above methods, a member of the Estimator enumeration may be specified to choose between the unbiased and the biased estimator.

As an example, consider the values 31, 28, 29, 35 and 26. We can compute the unbiased kurtosis using the following C# code:

[C#]
double[] values = {31, 28, 29, 35, 26};
double kurtosis = calculator.Kurtosis(values);

The above values have a mean of 29.8. The C# code above returns a kurtosis of 0.7.
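The figure of 0.7 matches the bias-corrected sample excess kurtosis, the formula used, for example, by the Excel KURT function:

$$G_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

Here s is the unbiased standard deviation. The plain C# sketch below assumes this is what the unbiased estimator computes. The biased estimator is not reproduced, because its exact definition is not given in this guide. The class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Statsar API.
public static class KurtosisSketch
{
    // Bias-corrected sample excess kurtosis (G2), using the unbiased variance s^2.
    public static double UnbiasedKurtosis(double[] values)
    {
        int n = values.Length;
        if (n < 4)
            throw new ArgumentException("At least four values are required.");

        double mean = values.Average();
        double s2 = values.Sum(x => (x - mean) * (x - mean)) / (n - 1);
        if (s2 == 0)
            throw new ArgumentException("Kurtosis is undefined when all values are equal.");

        // Sum of ((x - mean) / s)^4.
        double sum4 = values.Sum(x => Math.Pow(x - mean, 4)) / (s2 * s2);
        double scale = (double)n * (n + 1) / ((double)(n - 1) * (n - 2) * (n - 3));
        double correction = 3.0 * (n - 1) * (n - 1) / ((double)(n - 2) * (n - 3));
        return scale * sum4 - correction;
    }
}
```

For { 31, 28, 29, 35, 26 } this gives 1.25 × 6.959 - 8 ≈ 0.699, which rounds to the 0.7 quoted above.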

4.5.4 The Skew Function

The StatsCalculator skew function computes the skew of a list of values, with the specified estimator. The skew function is overloaded to accept either a data sheet column or an array of values. When using a data sheet column, a filter may also be specified.

The following overloads are provided:

[C#]
public class StatsCalculator
{
    // ...

    public double Skew(
        string columnName);

    public double Skew(
        string columnName, Estimator estimator);

    public double Skew(
        string columnName, string filter);

    public double Skew(
        params double[] values);

    public double Skew(
        double[] values, Estimator estimator);

    // ...
}

In the above methods, a member of the Estimator enumeration may be specified to choose between the unbiased and the biased estimator.

As an example, consider the values 45, 46, 49, 52 and 48. We can compute the unbiased skew using the following C# code:

[C#]
double[] values = {45, 46, 49, 52, 48};
double skew = calculator.Skew(values);

The above values have a mean of 48. The C# code above returns a skew of 0.6.
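The section on standard moments states that the biased skew is the 3rd standard moment, g1 = m3 / m2^(3/2). The 0.6 above matches the usual bias-corrected sample skewness, G1 = g1 · sqrt(n(n - 1)) / (n - 2), which is the formula behind, for example, the Excel SKEW function. The plain C# sketch below assumes that correction for the unbiased estimator. It is not Statsar's code, and the class name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; not part of the Statsar API.
public static class SkewSketch
{
    // Biased skew: the 3rd standard moment m3 / m2^(3/2), with divide-by-n moments.
    public static double BiasedSkew(double[] values)
    {
        int n = values.Length;
        if (n == 0)
            throw new ArgumentException("At least one value is required.");

        double mean = values.Average();
        double m2 = values.Sum(x => Math.Pow(x - mean, 2)) / n;
        if (m2 == 0)
            throw new ArgumentException("Skew is undefined when all values are equal.");
        double m3 = values.Sum(x => Math.Pow(x - mean, 3)) / n;
        return m3 / Math.Pow(m2, 1.5);
    }

    // Assumed unbiased skew: bias-corrected sample skewness G1.
    public static double UnbiasedSkew(double[] values)
    {
        int n = values.Length;
        if (n < 3)
            throw new ArgumentException("At least three values are required.");
        return BiasedSkew(values) * Math.Sqrt((double)n * (n - 1)) / (n - 2);
    }
}
```

For { 45, 46, 49, 52, 48 } both m2 and m3 equal 6, so the biased skew is 1 / sqrt(6) ≈ 0.408 and the corrected value is approximately 0.609.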