Statsar Statistics Library

v1.0.1 for .NET

Product Guide




1. Getting Started

1.1 Product Overview

Statsar is a high-performance .NET platform statistics library. The powerful numerical classes are easy to use, and the download includes a user guide, reference manual and over 25 examples in C# and VB.NET.

The library calculates descriptive statistics (mean, standard deviation and variance), discrete and continuous probability distributions, hypothesis tests (including the chi-square test and Student's T-test), linear regression and correlation, and analysis of variance (ANOVA).

The Statsar statistics library allows you to add high-performance statistics calculations to your .NET platform applications. The object-oriented library was designed and implemented by numerical experts with proven expertise in the financial industry.

Providing a simple and intuitive object model, the library allows you to rapidly analyze your data by importing familiar data objects such as ADO.NET data tables. A powerful and robust CSV reader is also included with the component, allowing you to work with existing data files.

1.2 Product Features

The Statsar library has been designed to be as intuitive as possible, and is organized around the following product features:

1.2.1 Library Design

  • Rapid integration: start developing with as little as three lines of code.
  • An intuitive and easy to use object model reflecting the underlying numerical algorithms.
  • Organized according to Microsoft .NET platform library design, following best practices.
  • Efficiency at all levels by implementing the lowest-order numerical algorithms, and employing proprietary optimization in .NET.
  • Professionally designed software, including a complete user guide documenting all library features, a reference manual and over 25 examples in C# and VB.NET.

1.2.2 Data Analysis

  • Data manipulation classes including data sheet objects, used to efficiently store and access numerical data for statistics calculations.
  • Import data from virtually any data source including standard ADO.NET data tables, arrays and lists. Binds to your own custom objects via reflection.
  • Includes a robust and efficient CSV reader, ideal for importing data from Microsoft Excel or for quickly processing high volumes of legacy data.
  • Multiple and extensible data filters are included, providing a powerful way to remove unwanted or missing values.
  • Efficiently sort, reorder, permute, insert or remove data according to complex criteria.

1.2.3 Descriptive Statistics

  • Compute the count, sum, min or max of an entire data set, or of filtered data.
  • Measures of central tendency including mean, median, harmonic mean and geometric mean.
  • Variance, standard deviation, and absolute deviation from the mean and median.
  • Ranks, percentiles and interquartile range.
  • Shape measures based on central moments: skew and kurtosis.
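
As a rough illustration of what a few of the measures listed above compute, here is a minimal C# sketch. The DescriptiveSketch class and its members are hypothetical helpers written for this guide, not part of the Statsar API, and the sample/population switch is an assumption about how such a helper might be parameterized.

```csharp
using System;

// Illustrative helper only; not part of the Statsar API.
public static class DescriptiveSketch
{
    // Arithmetic mean: the sum of the values divided by their count.
    public static double Mean(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        double sum = 0.0;
        foreach (double v in values) sum += v;
        return sum / values.Length;
    }

    // Median: the middle value of the sorted data, or the mean of the
    // two middle values when the count is even.
    public static double Median(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.");
        double[] sorted = (double[])values.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Variance: the average squared deviation from the mean.
    // With sample = true the sum is divided by n - 1 instead of n.
    public static double Variance(double[] values, bool sample)
    {
        double mean = Mean(values);
        int divisor = sample ? values.Length - 1 : values.Length;
        if (divisor <= 0)
            throw new ArgumentException("A sample variance needs at least two values.");
        double sumSquares = 0.0;
        foreach (double v in values) sumSquares += (v - mean) * (v - mean);
        return sumSquares / divisor;
    }

    // Standard deviation: the square root of the variance.
    public static double StandardDeviation(double[] values, bool sample)
    {
        return Math.Sqrt(Variance(values, sample));
    }
}
```

For the data set 2, 4, 4, 4, 5, 5, 7, 9 this sketch gives a mean of 5, a median of 4.5 and a population standard deviation of 2.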

1.2.4 Random Number Generators

  • High-quality random generators using standard .NET methods.
  • Mersenne twister pseudorandom numbers.
  • Linear congruential generator (LCG).
  • Linear feedback shift register.

1.2.5 Probability Distributions

  • Consistent interface across all distributions by deriving from a common base class.
  • Probability density function (PDF), cumulative distribution function (CDF) and inverse cumulative distribution function (inverse CDF).
  • Methods to compute statistics including the mean, variance, skew and kurtosis.
  • Random number generation according to multiple probability distributions.

1.2.6 Discrete Distributions

  • Bernoulli distribution.
  • Binomial distribution.
  • Discrete uniform distribution.
  • Geometric distribution.
  • Hypergeometric distribution.
  • Negative binomial distribution.
  • Poisson distribution.

1.2.7 Continuous Distributions

  • Beta distribution.
  • Cauchy distribution.
  • Chi-squared distribution.
  • Continuous uniform distribution.
  • Erlang distribution.
  • Exponential distribution.
  • F distribution.
  • Gamma distribution.
  • Gumbel distribution.
  • Laplace distribution.
  • Logistic distribution.
  • Lognormal distribution.
  • Normal distribution.
  • Pareto distribution.
  • Rayleigh distribution.
  • Student's T distribution.
  • Triangular distribution.
  • Weibull distribution.

1.2.8 Linear Regression

  • Multiple linear regression and regression analysis.
  • Least squares minimization, optimized using efficient matrix techniques.
  • Polynomial linear regression.
  • Confidence measures and intervals.

1.2.9 Hypothesis Tests

  • Critical values and confidence measures.
  • One-sample and two-sample testing.
  • Z-test and T-test hypothesis testing including p-values.
  • Two-variance F-test.
  • Chi-square test.
  • Kolmogorov-Smirnov test.
  • Anderson-Darling test.
  • Bartlett's test.
  • Levene's test.

1.2.10 Analysis of Variance

  • Analysis of variance (ANOVA).
  • Analysis of variance with repeated measures (RANOVA).
  • One-way and two-way testing.

1.3 Requirements

1.3.1 Platform Requirements

The following versions of .NET are supported:

  • .NET framework v2.0
  • .NET framework Server Edition 2.0

1.3.2 Operating System Requirements

The following operating systems are supported:

  • Windows 2000
  • Windows XP
  • Windows Server 2003

1.4 Namespaces (overview)

The Statsar statistics library contains many classes; in order to make the library as easy to use as possible the classes have been categorized into several .NET namespaces. These are detailed below:

Namespace Description

Simplexar.Statsar

This is the root namespace and contains classes which are common throughout the library. The four key classes are:

StatsCalculator class: This is the most important class in the library; it is used to load and save data and to compute descriptive statistics.

DataSheet class: Organizes data into a spreadsheet-like structure consisting of a table with headers, columns and rows.

LicenseManager class: Used to load a license key and manage licensing throughout the library.

Exception classes: These are the exception classes (SheetException, LicenseException, etc.) which are thrown when a feature-specific error occurs. As these all derive from a common base class (StatsExceptionBase), a single top-level catch block can handle all Statsar library exceptions.
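
The catch pattern described above can be sketched without the library installed. The classes below (DemoStatsExceptionBase, DemoSheetException, DemoLicenseException and ExceptionSketch) are hypothetical stand-ins that only mirror the shape of the hierarchy; they are not the Statsar classes, and their constructors are an assumption.

```csharp
using System;

// Hypothetical stand-ins for a library exception hierarchy;
// the real Statsar classes are not reproduced here.
public class DemoStatsExceptionBase : Exception
{
    public DemoStatsExceptionBase(string message) : base(message) { }
}

public class DemoSheetException : DemoStatsExceptionBase
{
    public DemoSheetException(string message) : base(message) { }
}

public class DemoLicenseException : DemoStatsExceptionBase
{
    public DemoLicenseException(string message) : base(message) { }
}

public static class ExceptionSketch
{
    // Throws the given exception and reports which handler caught it.
    public static string Handle(Exception toThrow)
    {
        try
        {
            throw toThrow;
        }
        catch (DemoLicenseException ex)
        {
            // A specific handler must come before the base-class handler.
            return "license: " + ex.Message;
        }
        catch (DemoStatsExceptionBase ex)
        {
            // One block handles every other exception in the hierarchy.
            return "library: " + ex.Message;
        }
    }
}
```

Here ExceptionSketch.Handle(new DemoSheetException("bad column")) returns "library: bad column", while an exception from outside the hierarchy is not caught and propagates to the caller.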

Simplexar.Statsar.Filters

Multiple and extensible data filters are included, providing a powerful way to remove unwanted or missing values. The key classes are the SheetFilter and the TrimFilter which are used in conjunction with a powerful scripting engine to filter data.

Simplexar.Statsar.RandomNumbers

The random number classes (MersenneTwister, LaggedFibonnaci and the linear congruential generators) each follow the design of the standard .NET random number generator class. The generators all derive from the base class RandomBase and provide methods for generating random integers, real numbers between 0 and 1, and random sequences of bytes.
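
To show what a linear congruential generator does, here is a minimal sketch built on the standard System.Random class rather than on the Statsar RandomBase class, whose members are not documented in this section. The LcgSketch name is hypothetical, and the multiplier and increment are the widely published Numerical Recipes constants, chosen here as an assumption; the generators shipped with the library may use different parameters.

```csharp
using System;

// Illustrative generator only; not the Statsar implementation.
public class LcgSketch : Random
{
    private uint state;

    public LcgSketch(uint seed)
    {
        state = seed;
    }

    // One LCG step: state = (a * state + c) mod 2^32.
    // The unsigned 32-bit overflow performs the modulo implicitly.
    // The result is scaled to the interval [0, 1).
    protected override double Sample()
    {
        unchecked
        {
            state = state * 1664525u + 1013904223u;
        }
        return state / 4294967296.0;
    }

    // Routed through Sample() explicitly, so the sketch does not depend
    // on how the base class implements NextDouble().
    public override double NextDouble()
    {
        return Sample();
    }
}
```

Two instances created with the same seed produce the same sequence from NextDouble(). Other inherited methods, such as Next(), are not guaranteed to call Sample() on every .NET version, so a complete generator would override those as well.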

Simplexar.Statsar.Distributions

Two types of probability distributions are supported, discrete and continuous. Each discrete distribution class derives from the DiscreteDistributionBase class, and each continuous distribution derives from the ContinuousDistributionBase class. This design ensures that each distribution follows a common interface for computing the PDF, CDF and inverse CDF.
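
The benefit of a common base class can be illustrated with a small sketch. The ContinuousDistributionSketch and ExponentialSketch classes and the Pdf, Cdf and InverseCdf member names are assumptions made for this example; the actual members of ContinuousDistributionBase are listed in the reference manual and may differ.

```csharp
using System;

// Hypothetical base class for illustration; member names are assumptions.
public abstract class ContinuousDistributionSketch
{
    public abstract double Pdf(double x);         // probability density at x
    public abstract double Cdf(double x);         // P(X <= x)
    public abstract double InverseCdf(double p);  // x such that Cdf(x) = p
}

// Exponential distribution with the given rate (lambda > 0).
public class ExponentialSketch : ContinuousDistributionSketch
{
    private readonly double rate;

    public ExponentialSketch(double rate)
    {
        if (rate <= 0.0) throw new ArgumentOutOfRangeException("rate");
        this.rate = rate;
    }

    // f(x) = lambda * exp(-lambda * x) for x >= 0, otherwise 0.
    public override double Pdf(double x)
    {
        return x < 0.0 ? 0.0 : rate * Math.Exp(-rate * x);
    }

    // F(x) = 1 - exp(-lambda * x) for x >= 0, otherwise 0.
    public override double Cdf(double x)
    {
        return x < 0.0 ? 0.0 : 1.0 - Math.Exp(-rate * x);
    }

    // Solving F(x) = p for x gives x = -ln(1 - p) / lambda.
    public override double InverseCdf(double p)
    {
        if (p < 0.0 || p >= 1.0) throw new ArgumentOutOfRangeException("p");
        return -Math.Log(1.0 - p) / rate;
    }
}
```

Code written against the base type can then work with any distribution; for instance, Cdf(InverseCdf(p)) should return p (up to rounding) for every class that derives from it.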

Simplexar.Statsar.LinearAlgebra

The Simplexar.Statsar.LinearAlgebra namespace contains classes which may be used to solve matrix and linear regression problems. These classes are:

Vector class: This represents a mathematical vector of N elements, where each element is a double (a real number).

Matrix class: This represents a mathematical matrix of N by M elements, where each element is a double (a real number).

LinearRegression class: This class models the relationship between two or more variables by fitting a linear equation to observed data.
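
For the simplest case of one explanatory variable, fitting a linear equation by least squares reduces to two closed-form expressions. The sketch below is a plain C# illustration under that assumption; LeastSquaresSketch is a hypothetical helper and does not show how the LinearRegression class is called or implemented.

```csharp
using System;

// Illustrative helper only; not part of the Statsar API.
public static class LeastSquaresSketch
{
    // Fits y = intercept + slope * x by ordinary least squares.
    // Returns a two-element array: { intercept, slope }.
    public static double[] Fit(double[] x, double[] y)
    {
        if (x == null || y == null || x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("x and y must have the same length (at least 2).");

        int n = x.Length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++)
        {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        if (sxx == 0.0)
            throw new ArgumentException("The x values must not all be equal.");

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] { intercept, slope };
    }
}
```

For x = 1, 2, 3, 4 and y = 3, 5, 7, 9 the fit returns an intercept of 1 and a slope of 2, since the points lie exactly on y = 1 + 2x.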

Simplexar.Statsar.HypothesisTests

The hypothesis test classes (e.g. T-test, Z-test and chi-square test) each follow a standard interface. They all derive from the HypothesisTestBase class and provide properties for left critical values, right critical values, left probability values, right probability values and p-values.

Simplexar.Statsar.Anova

This namespace contains classes for analysis of variance. This is a test of the statistical significance of the differences among the mean scores of two or more groups on one or more variables.

Each of the ANOVA classes (e.g. OneWayAnova) is derived from the AnovaBase class, and its output goes to a corresponding table class (e.g. OneWayAnovaTable).
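
As background, the F statistic at the heart of a one-way analysis of variance can be computed directly from the group data. The following is a minimal sketch in plain C#; OneWayAnovaSketch is a hypothetical helper and does not reflect the members of the OneWayAnova or OneWayAnovaTable classes.

```csharp
using System;

// Illustrative helper only; not part of the Statsar API.
public static class OneWayAnovaSketch
{
    // F = (between-group mean square) / (within-group mean square).
    public static double FStatistic(double[][] groups)
    {
        if (groups == null || groups.Length < 2)
            throw new ArgumentException("At least two groups are required.");

        int total = 0;
        double grandSum = 0.0;
        foreach (double[] group in groups)
        {
            if (group == null || group.Length == 0)
                throw new ArgumentException("Every group needs at least one value.");
            total += group.Length;
            foreach (double v in group) grandSum += v;
        }
        if (total <= groups.Length)
            throw new ArgumentException("More observations than groups are required.");
        double grandMean = grandSum / total;

        double ssBetween = 0.0;  // variation of the group means around the grand mean
        double ssWithin = 0.0;   // variation of the values around their own group mean
        foreach (double[] group in groups)
        {
            double sum = 0.0;
            foreach (double v in group) sum += v;
            double mean = sum / group.Length;

            ssBetween += group.Length * (mean - grandMean) * (mean - grandMean);
            foreach (double v in group) ssWithin += (v - mean) * (v - mean);
        }

        double msBetween = ssBetween / (groups.Length - 1);   // k - 1 degrees of freedom
        double msWithin = ssWithin / (total - groups.Length); // N - k degrees of freedom
        return msBetween / msWithin;
    }
}
```

For the three groups (1, 2, 3), (2, 3, 4) and (5, 6, 7) the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = (26 / 2) / (6 / 6) = 13.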


1.5 Licensing

1.5.1 License Types

Developer License:

A developer license covers development with the Statsar statistics library by one individual, and deployment to a single non-server machine. This license includes:

  • Feature complete software
  • User guide, reference manual and examples
  • 60 days of free support and product updates

Server License:

A server license covers deployment of the Statsar statistics library to a single server machine. This license includes:

  • 2 developer licenses
  • Deployment to a single server machine
  • 60 days of free support and product updates

Enterprise License:

  • Development by up to 8 individuals
  • Multiple server deployment
  • Unlimited royalty-free redistribution
  • 90 days of free support and product updates

For further information on licensing please see our online license agreement.

1.5.2 How to Use a License Key

The Statsar library must always be used together with a valid license key. Attempting to use any Statsar class with an invalid or expired key results in a LicenseException being thrown. A valid license will either be an evaluation license if the software is being tested, or a professional or enterprise license when using Statsar in a production environment. A license key consists of 16 letters or digits, usually formatted with hyphens and braces:

{C5R3-EZ64-6FTD-4PU5}
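
Before handing a key to the library, an application may want to check that it has the expected shape. The sketch below ignores braces and hyphens and counts the remaining characters; LicenseKeyFormat is a hypothetical helper, the restriction to ASCII letters and digits is an assumption, and passing this check says nothing about whether the key is actually valid.

```csharp
using System;

// Illustrative helper only; not part of the Statsar API.
public static class LicenseKeyFormat
{
    // Returns true when the text contains exactly 16 ASCII letters or
    // digits once braces and hyphens are ignored.
    public static bool IsWellFormed(string key)
    {
        if (key == null) return false;

        int count = 0;
        foreach (char c in key)
        {
            // Formatting characters are skipped.
            if (c == '{' || c == '}' || c == '-') continue;

            bool isLetterOrDigit =
                (c >= '0' && c <= '9') ||
                (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z');
            if (!isLetterOrDigit) return false;

            count++;
        }
        return count == 16;
    }
}
```

With this sketch, the sample key shown above passes the check, while a truncated key such as "C5R3-EZ64" does not.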

License keys may be stored in a text file on disk, in the application configuration file or in the registry. When first instantiated, the library will invoke the LicenseManager class to look for a license key in the following order:

  1. The LicenseManager.LicenseKey property is checked to see if a key has been loaded programmatically.
  2. The application App.config file is checked for the key "StatsarLicense".
  3. The registry value "StatsarLicense" is checked under the key "HKEY_LOCAL_MACHINE\Software\Simplexar\Statsar".
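
The lookup order described above amounts to a "first match wins" rule. The sketch below models only that rule: the three candidate values are passed in by the caller, LicenseLookupSketch is a hypothetical helper, and treating an empty string as "not set" is an assumption. How the LicenseManager class actually reads the configuration file and the registry is not shown.

```csharp
using System;

// Illustrative helper only; not part of the Statsar API.
public static class LicenseLookupSketch
{
    // Returns the first candidate that is neither null nor empty, in the
    // order: programmatic key, App.config value, registry value.
    // Returns null when no candidate is set.
    public static string Resolve(string programmaticKey, string appConfigKey, string registryKey)
    {
        string[] candidates = { programmaticKey, appConfigKey, registryKey };
        foreach (string candidate in candidates)
        {
            if (!string.IsNullOrEmpty(candidate)) return candidate;
        }
        return null;
    }
}
```

Under this rule a key set through code takes precedence over one in the configuration file, which in turn takes precedence over one in the registry.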

The LicenseManager class also provides two static members for setting a license key through code:

[C#]
// Load a license key from a text file.
LicenseManager.LoadKey(@"C:\License.txt");

[C#]
// Load a license key from a string.
string licenseKey = "{C5R3-EZ64-6FTD-4PU5}";
LicenseManager.LicenseKey = licenseKey;

The LoadKey() method provides overloads for loading a key from a text file or a stream object. The LicenseKey property may be used to get or set the license key in use by the library.

1.6 QuickStart Application

Calculating statistics with the Statsar library is easy and you can get started in a few minutes:

  • Download and reference the Statsar statistics library.
  • Instantiate a StatsCalculator object.
  • Import virtually any data source, including ADO.NET data tables.
  • Analyze your data using the powerful class library.

You can start developing with as little as three lines of code:

[C#]
StatsCalculator calculator = new StatsCalculator();
calculator.DataSource = dataTable;                       // any data source
double s = calculator.StandardDeviation("price");        // calculate statistic

[VB.NET]
Dim calculator As New StatsCalculator()
calculator.DataSource = dataTable                        ' any data source
Dim s As Double = calculator.StandardDeviation("price")  ' calculate statistic

1.6.1 Step by step guide for C#

In this section we will demonstrate how to use the Statsar statistics library by following a detailed step by step approach, illustrated with screenshots. We assume the following prerequisites:

  • A working knowledge of C# and the .NET framework
  • Visual Studio 2002, 2003 or 2005
  • Windows 2000, XP or Windows Server Edition
  • The .NET framework version 1.0 or 2.0

The aim of the following example is to load up a simple CSV file from Excel (already included as part of the distribution) and compute the average of one of the columns.

Step 1. Create a new Visual Studio project.

Figure 1.1 - Creating a new Visual Studio project.

Once the Statsar library is installed you are ready to begin:

  1. Open Visual Studio and from the File menu select New > Project.
  2. Select the Visual C# Console Application as the Visual Studio template we will be using.
  3. In the Name text field enter a suitable name for your project such as "Average".
  4. Click the Browse button and choose a suitable place for the project to reside, for example browse to "C:\Development" then click Open.
  5. Ensure that the Create directory for solution checkbox is selected, and then click the OK button.

The project should now be created and you will be presented with the following screen:

Figure 1.2 - A newly created Visual Studio project.

Step 2. Referencing the Statsar statistics library.

The next step is to add the Simplexar.Framework and Simplexar.Statsar DLL files as references. The Framework DLL is required as it holds the Simplexar framework, which is a code base shared among all Simplexar products. This should not be confused with the .NET framework.

  1. From the Project menu select Add Reference.
  2. Select the Browse tab.
  3. Navigate to the "C:\Program Files\Simplexar Software\Statsar\Bin" folder.
  4. Select the Simplexar.Framework and Simplexar.Statsar DLL files.
  5. Click the OK button.

Figure 1.3 - Add Reference window.

The references should now be added to the project and should appear within the References folder within Visual Studio.

Figure 1.4 - Visual Studio references folder.

Step 3. Instantiating a StatsCalculator object and importing a CSV data file.

  1. The following code should be inserted within the using declarations section:

[C#]
using Simplexar.Statsar;

  2. The following code should be inserted within the Main method in the Program.cs file of the project:

[C#]
// Create a calculator.
StatsCalculator calculator = new StatsCalculator();
           
// Use the calculator to load the sheet.
DataSheet sheet = calculator.Load(
    @"C:\Program Files\Simplexar Software\Statsar\Data\ExamResults.csv");

Step 4. Print the data to the screen and calculate the average of one of the columns.

  1. The following code should be inserted within the Main method:

[C#]
// Convert the sheet to a string for display.
Console.WriteLine(sheet.ToString());

  2. Press Ctrl+F5 to build and run the project. A console window should be displayed, showing the DataSheet loaded using the StatsCalculator:

Figure 1.5 - The Datasheet on the console.

  3. The average of the EnglishResult column can be calculated by inserting the following code into the Main method:

[C#]
// Calculate the mean of the 'EnglishResult' column.
Console.WriteLine(calculator.Mean("EnglishResult"));

Figure 1.6 - The average computed.

1.6.2 Step by step guide for VB.NET

In this section we will demonstrate how to use the Statsar statistics library by following a detailed step by step approach, illustrated with screenshots. We assume the following prerequisites:

  • A working knowledge of VB.NET and the .NET framework
  • Visual Studio 2002, 2003 or 2005
  • Windows 2000, XP or Windows Server Edition
  • The .NET framework version 1.0 or 2.0

The aim of the following example is to load up a simple CSV file from Excel (already included as part of the distribution) and compute the average of one of the columns.

Step 1. Create a new Visual Studio project.

Figure 1.7 - Creating a new Visual Studio project.

Once the Statsar library is installed you are ready to begin:

  1. Open Visual Studio and from the File menu select New > Project.
  2. Select the Visual Basic Console Application as the Visual Studio template we will be using.
  3. In the Name text field enter a suitable name for your project such as "Average".
  4. Click the Browse button and choose a suitable place for the project to reside, for example browse to "C:\Development" then click Open.
  5. Ensure that the Create directory for solution checkbox is selected, and then click the OK button.

The project should now be created and you will be presented with the following screen:

Figure 1.8 - A newly created Visual Studio project.

Step 2. Referencing the Statsar statistics library.

The next step is to add the Simplexar.Framework and Simplexar.Statsar DLL files as references. The Framework DLL is required as it holds the Simplexar framework, which is a code base shared among all Simplexar products. This should not be confused with the .NET framework.

  1. From the Project menu select Add Reference.
  2. Select the Browse tab.
  3. Navigate to the "C:\Program Files\Simplexar Software\Statsar\Bin" folder.
  4. Select the Simplexar.Framework and Simplexar.Statsar DLL files.
  5. Click the OK button.

Figure 1.9 - Add Reference window.

Step 3. Instantiating a StatsCalculator object and importing a CSV data file.

  1. The following code should be inserted within the imports section:

[VB.NET]
Imports Simplexar.Statsar

  2. The following code should be inserted within the Main method of the Module1.vb file of the project:

[VB.NET]
' Create a calculator.
Dim calculator As New StatsCalculator()
 
' Use the calculator to load the sheet.
Dim sheet As DataSheet = calculator.Load( _
    "C:\Program Files\Simplexar Software\Statsar\Data\ExamResults.csv")

Step 4. Print the data to the screen and calculate the average of one of the columns.

  1. The following code should be inserted within the Main method:

[VB.NET]
' Convert the sheet to a string for display.
Console.WriteLine(sheet.ToString())

  2. Press Ctrl+F5 to build and run the project. A console window should be displayed, showing the DataSheet loaded using the StatsCalculator:

Figure 1.10 - The Datasheet on the console.

  3. The average of the EnglishResult column can be calculated by inserting the following code into the Main method:

[VB.NET]
' Calculate the mean of the 'EnglishResult' column.
Console.WriteLine(calculator.Mean("EnglishResult"))

Figure 1.11 - The average computed.