`qats.stats`

Sub-package for statistics/distributions.

`qats.stats.empirical`

Basic functions for statistical inference.

Functions overview

empirical_cdf(n[, kind])

Empirical cumulative distribution function given a sample size.

API

empirical_cdf(n, kind='mean')

Empirical cumulative distribution function given a sample size.

Parameters:

n (int) – sample size
kind (str, optional) – Plotting position formula, with i = 1, ..., n the rank of the observation:
- ‘mean’: i/(n+1) (a.k.a. Weibull method)
- ‘median’: (i-0.3)/(n+0.4)
- ‘symmetrical’: (i-0.5)/n
- ‘beard’: (i-0.31)/(n+0.38) (Jenkinson’s/Beard’s method)
- ‘gringorten’: (i-0.44)/(n+0.12) (Gringorten’s method)

Returns:

Empirical cumulative distribution function

Return type:

array

Notes

Gumbel recommended the quantile formulation P_i = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions result from the data regardless of whether they are arranged in ascending or descending order.

Jenkinson’s/Beard’s method is based on the “idea that a natural estimate for the plotting position is the median of its probability density distribution”.

A more sophisticated formulation, P_i = (i-0.3)/(n+0.4), approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43).

For the Type 1 extreme value distribution, the probability corresponding to the unbiased plotting position can be approximated by the Gringorten formula.
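To make the five `kind` options concrete, the sketch below evaluates the plotting positions for ranks i = 1, ..., n in pure Python. The function name `empirical_cdf_sketch` and the (alpha, beta) table are illustrative assumptions, not the qats implementation. Only the formulas come from the parameter list above.

```python
def empirical_cdf_sketch(n, kind="mean"):
    """Plotting positions P_i for ranks i = 1, ..., n.

    Hypothetical stand-in for qats.stats.empirical.empirical_cdf. Every
    option is written as P_i = (i - alpha) / (n + beta).
    """
    coefficients = {
        "mean": (0.0, 1.0),          # i/(n+1), Weibull method
        "median": (0.3, 0.4),        # (i-0.3)/(n+0.4)
        "symmetrical": (0.5, 0.0),   # (i-0.5)/n
        "beard": (0.31, 0.38),       # Jenkinson's/Beard's method
        "gringorten": (0.44, 0.12),  # Gringorten's method
    }
    if kind not in coefficients:
        raise ValueError(f"unknown kind: {kind!r}")
    alpha, beta = coefficients[kind]
    return [(i - alpha) / (n + beta) for i in range(1, n + 1)]


# Reversing the ordering maps P_i to 1 - P_i, which is the symmetry
# property described for i/(n+1) in the notes above.
p = empirical_cdf_sketch(9, kind="mean")
print(all(abs(p[i] + p[-1 - i] - 1.0) < 1e-12 for i in range(len(p))))  # True
```

In every option listed, beta = 1 - 2*alpha. As a result, the ascending/descending symmetry holds for all five kinds, not only for i/(n+1).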

References

Plotting positions, About plotting positions

`qats.stats.gumbel`

Gumbel class and functions related to Gumbel distribution.

Classes and functions overview

`Gumbel`(loc, scale[, data])	The Gumbel maxima distribution.
`bootstrap`(loc, scale, size, repetitions[, ...])	Quantify mean and coefficient of variation of Gumbel distribution parameters using parametric bootstrapping
`lse`(x)	Fit Gumbel distribution parameters to sample by method of least square fit to empirical cdf
`mle`(x)	Fit distribution parameters to sample by maximum likelihood estimation
`msm`(x)	Fit Gumbel distribution parameters to sample by method of sample moments
`plot_fits`(data[, filename, methods])	Plot data sample versus empirical and fitted cumulative distribution function on linearized Gumbel scales
`pwm`(x)	Fit Gumbel distribution parameters to sample by method of probability weighted moments [7].

Class API

class Gumbel(loc, scale, data=None)

The Gumbel maxima distribution.

The cumulative distribution function is defined as:

F(x) = exp{-exp[-(x-a)/b]}

where a is the location parameter and b is the scale parameter.
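Because the CDF has a closed form, the density and the quantile function also follow analytically. The density is f(x) = (1/b) exp(-z - e^(-z)) with z = (x-a)/b, and the quantile function is x = a - b ln(-ln p). Below is a minimal stdlib sketch. The helper names `gumbel_cdf`, `gumbel_pdf` and `gumbel_invcdf` are hypothetical and are not qats functions.

```python
import math


def gumbel_cdf(x, a, b):
    """F(x) = exp{-exp[-(x-a)/b]} for the Gumbel maxima distribution."""
    return math.exp(-math.exp(-(x - a) / b))


def gumbel_pdf(x, a, b):
    """f(x) = dF/dx = (1/b) * exp(-z - exp(-z)), with z = (x-a)/b."""
    z = (x - a) / b
    return math.exp(-z - math.exp(-z)) / b


def gumbel_invcdf(p, a, b):
    """Solve F(x) = p for x: x = a - b * ln(-ln p), valid for 0 < p < 1."""
    return a - b * math.log(-math.log(p))


a, b = 10.0, 2.5
print(gumbel_cdf(a, a, b))        # exp(-1): the location is the 36.8 % quantile
print(gumbel_invcdf(0.99, a, b))  # value exceeded with 1 % probability
```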

Parameters:

loc (float) – Gumbel location parameter.
scale (float) – Gumbel scale parameter.
data (array_like, optional) – Sample data, used to establish the empirical cdf and included in plots. To fit the Gumbel distribution to the sample data, use Gumbel.fit().

Attributes:

loc (float) – Gumbel location parameter.
scale (float) – Gumbel scale parameter.
data (array_like) – Sample data.

Examples

To create an instance based on parameters, use:

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel(loc, scale)

If you need to establish a Gumbel instance based on a sample data set, use:

>>> gumb = Gumbel.fit(data, method='msm')

References

Statistical models in applied science, Bury, K.V. (1975), Wiley, New York
Bruk av asymptotiske ekstremverdifordelinger, Haver, S. (2007)
Plotting positions, About plotting positions
Usable estimators for parameters in Gumbel distribution
Bootstrapping statistics
Probability weighted moments, Greenwood, J. A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research. 15(5): 1049-1054.
Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles, Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research. 15(5): 1055-1064.

Properties

`cov`	Distribution coefficient of variation (C.O.V.)
`ecdf`	Median rank empirical cumulative distribution function associated with the sample
`kurt`	Distribution kurtosis
`mean`	Distribution mean value
`median`	Distribution median value
`mode`	Distribution mode value
`mse`	Mean squared error of fitted cumulative distribution and empirical distribution
`params`	Distribution parameters.
`skew`	Distribution skewness
`std`	Distribution standard deviation

Methods

`cdf`([x])	Cumulative distribution function (cumulative probability) for specified values x
`fit`(data[, method, verbose])	Determine distribution parameters by fit to sample.
`fit_from_weibull_parameters`(wa, wb, wc, n[, ...])	Calculate Gumbel distribution parameters from n independent Weibull distributed variables.
`invcdf`([p])	Inverse cumulative distribution function for specified probabilities
`pdf`([x])	Probability density function for specified values x
`plot`([filename])	Plot cumulative distribution function
`plot_linear`([filename])	Plot cumulative distribution function on linearized Gumbel scales
`rnd`([size, seed])	Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:: Distribution c.o.v.
Return type:: float

property ecdf

Median rank empirical cumulative distribution function associated with the sample

Returns:: Empirical cumulative distribution function
Return type:: array

Notes

Requires data/sample to be specified.

Gumbel recommended the mean rank quantile formulation P_i = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions result from the data regardless of whether they are arranged in ascending or descending order.

A more sophisticated median rank formulation, P_i = (i-0.3)/(n+0.4), approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43). This median rank formulation is the one used here [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis

Returns:: Distribution kurtosis
Return type:: float

property mean

Distribution mean value

Returns:: Distribution mean value
Return type:: float

property median

Distribution median value

Returns:: Distribution median value
Return type:: float

property mode

Distribution mode value

Returns:: Distribution mode value
Return type:: float

property mse

Mean squared error of fitted cumulative distribution and empirical distribution

Returns:: mean squared error
Return type:: float

Notes

Requires data/sample to be specified.

property params

Distribution parameters.

Returns:: Distribution parameters: (loc, scale).
Return type:: tuple

property std

Distribution standard deviation

Returns:: Distribution standard deviation
Return type:: float

property skew

Distribution skewness

Returns:: Distribution skewness
Return type:: float

Notes

zetac is the complementary Riemann zeta function (zeta function minus 1). See http://docs.scipy.org/doc/scipy/reference/generated/scipy.special.zetac.html
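For reference, the summary statistics listed under Properties have closed forms in the location a and scale b:

- mean: a + γb (γ ≈ 0.5772)
- median: a - b ln(ln 2)
- mode: a
- standard deviation: πb/√6
- skewness: the constant 12√6 ζ(3)/π³ ≈ 1.14

The sketch below evaluates them with the standard library only. ζ(3) is summed directly as a stand-in for `scipy.special.zetac(3) + 1`. Kurtosis is omitted because the text does not say whether qats reports excess kurtosis.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def zeta3(terms=10_000):
    """Riemann zeta(3), i.e. scipy.special.zetac(3) + 1, by direct summation.

    The neglected tail sum_{k > N} k**-3 is approximated by 1 / (2 N**2).
    """
    partial = sum(1.0 / k**3 for k in range(1, terms + 1))
    return partial + 1.0 / (2.0 * terms**2)


def gumbel_moments(a, b):
    """Closed-form summary statistics of the Gumbel maxima distribution."""
    mean = a + EULER_GAMMA * b
    std = math.pi * b / math.sqrt(6.0)
    return {
        "mean": mean,
        "median": a - b * math.log(math.log(2.0)),
        "mode": a,
        "std": std,
        "cov": std / mean,
        # Skewness does not depend on a or b: 12*sqrt(6)*zeta(3)/pi**3
        "skew": 12.0 * math.sqrt(6.0) * zeta3() / math.pi**3,
    }
```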

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:: x (array_like, optional) – Calculate cumulative probability for these values
Returns:: Cumulative probabilities for specified values x
Return type:: array

Notes

A range of x values [loc, loc+3*std] is applied if x is not specified.

classmethod fit(data, method='msm', verbose=False)

Determine distribution parameters by fit to sample.

Parameters:

data (array_like) – Sample
method (str, optional) –
Method of fit. Options:
- msm = method of sample moments (default)
- lse = least-square estimation
- mle = maximum likelihood estimation
- pwm = probability weighted moments
verbose (bool, optional) – If True, fitted parameters are written to screen.

Examples

Assuming data is a sample array/list:

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel.fit(data, method="msm")

classmethod fit_from_weibull_parameters(wa, wb, wc, n, verbose=False)

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

Parameters:

wa (float) – Weibull loc parameter
wb (float) – Weibull scale parameter
wc (float) – Weibull shape parameter
n (int) – Number of independently distributed variables
verbose (bool) – Print fitted parameters

Notes

A warning is issued if the Weibull shape parameter is less than 1. In this case, the convergence towards the asymptotic extreme value distribution is slow, and the asymptotic distribution will be non-conservative relative to the exact distribution. The asymptotic distribution is correct with Weibull shape equal to 1 and conservative with Weibull shape larger than 1. These deviations diminish with larger samples. See [1, p. 380].

References

Bury, Karl V., 1975, “Statistical Models in Applied Science”, University of British Columbia, John Wiley & Sons

Examples

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel.fit_from_weibull_parameters(wa, wb, wc, n)

invcdf(p=None)

Inverse cumulative distribution function for specified probabilities

Parameters:: p (array_like, optional) – Calculate the inverse cumulative distribution function for these probabilities
Returns:: Values corresponding to the specified quantiles
Return type:: array

Notes

A range of quantiles from 0.001 to 0.999 is applied if quantiles are not specified.

pdf(x=None)

Probability density function for specified values x

Parameters:: x (array_like, optional) – Calculate probability density for these values
Returns:: Probability densities for specified values x
Return type:: array

Notes

A range of x values [loc, loc+3*std] is applied if x is not specified.

plot(filename=None)

Plot cumulative distribution function

Parameters:: filename (str, optional) – Save plot as filename, default is to show plot on screen

plot_linear(filename=None)

Plot cumulative distribution function on linearized Gumbel scales

Parameters:: filename (str, optional) – Save plot as filename, default is to show plot on screen

rnd(size=None, seed=None)

Draw random samples from probability distribution

Parameters:

size (int|numpy shape, optional) – Sample size (by default, a single random value is returned)
seed (int, optional) – Seed for random number generator (default seed is random)

Returns:

Random sample

Return type:

array

Examples

Pick 1000 values randomly from a Gumbel distribution

>>> from qats.stats.gumbel import Gumbel
>>> g = Gumbel(loc, scale)
>>> sample = g.rnd(size=1000)

If you want to preset the seed for the random sampling (to be able to repeat the sampling)

>>> from qats.stats.gumbel import Gumbel
>>> g = Gumbel(loc, scale)
>>> sample = g.rnd(size=1000, seed=3)

Functions API

bootstrap(loc, scale, size, repetitions, method='pwm')

Quantify mean and coefficient of variation of Gumbel distribution parameters using parametric bootstrapping

Parameters:

loc (float) – Source distribution location parameter
scale (float) – Source distribution scale parameter
size (int) – Size of bootstrapped sample
repetitions (int) – Number of bootstrap samples.
method (str, optional) –
Method of fit. Options:
- msm = method of sample moments
- lse = least-square estimation
- mle = maximum likelihood estimation
- pwm = probability weighted moments (default)

Returns:

array – Mean distribution parameters
array – Coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. It allows the sampling distribution of almost any statistic to be estimated using only very simple methods, and it belongs to the broader class of resampling methods. In this case, a parametric model is fitted to the data, and samples of random numbers with the same size as the original data are drawn from this fitted model. The quantity, or estimate, of interest is then calculated from these data. As with other bootstrap methods, this sampling process is repeated many times. If the results really matter, use as many samples as available computing power and time reasonably allow. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors that arise from the bootstrap procedure itself. See [5] about bootstrapping.
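The procedure described above can be sketched for the Gumbel case in three steps:

1. Draw `repetitions` samples of `size` values from the source distribution by inverse-transform sampling.
2. Refit each sample.
3. Summarize the refitted parameters by their mean and coefficient of variation.

This is a stdlib sketch, not the qats code. It always refits with the method of sample moments, whereas qats defaults to `method='pwm'`. The names `bootstrap_sketch`, `_fit_msm` and `_open_uniform` are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def _open_uniform(rng):
    """Uniform draw on the open interval (0, 1); rng.random() may return 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def _fit_msm(x):
    """Method-of-moments Gumbel fit used for the refits in this sketch."""
    scale = statistics.stdev(x) * math.sqrt(6.0) / math.pi
    return statistics.fmean(x) - EULER_GAMMA * scale, scale


def bootstrap_sketch(loc, scale, size, repetitions=100, seed=None):
    """Mean and c.o.v. of refitted (loc, scale) over parametric bootstrap samples."""
    rng = random.Random(seed)
    locs, scales = [], []
    for _ in range(repetitions):
        # Inverse-transform sampling from the source Gumbel distribution
        sample = [loc - scale * math.log(-math.log(_open_uniform(rng)))
                  for _ in range(size)]
        fitted_loc, fitted_scale = _fit_msm(sample)
        locs.append(fitted_loc)
        scales.append(fitted_scale)
    mean = (statistics.fmean(locs), statistics.fmean(scales))
    cov = (statistics.stdev(locs) / mean[0], statistics.stdev(scales) / mean[1])
    return mean, cov


m, cv = bootstrap_sketch(10.0, 2.5, size=5, repetitions=100, seed=1)
```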

Examples

To quantify the uncertainty (coefficient of variation) of a Gumbel distribution fitted to a sample with 5 values (using 100 repetitions):

>>> from qats.stats.gumbel import bootstrap
>>> m, cv = bootstrap(10, 2.5, 5, 100)

lse(x)

Fit Gumbel distribution parameters to sample by method of least square fit to empirical cdf

Parameters:: x (array_like) – data sample
Returns:: distribution loc and scale parameters
Return type:: floats

Notes

Uses an approximate median rank estimate for the empirical cdf.
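On linearized Gumbel scales, F(x) = exp{-exp[-(x-a)/b]} becomes x = a + b*y, where y = -ln(-ln F) is the reduced variate. A least-square fit can therefore be sketched as an ordinary linear regression of the sorted sample on y_i, evaluated at the median rank positions P_i = (i-0.3)/(n+0.4). Two details are assumptions and may differ from qats: the setup (regressing x on y rather than y on x) and the helper name `fit_lse_sketch`.

```python
import math
import statistics


def fit_lse_sketch(x):
    """Least-square Gumbel fit on linearized scales (hypothetical helper).

    Regresses the ordered sample on the reduced variate y = -ln(-ln P_i),
    with median rank plotting positions P_i = (i - 0.3) / (n + 0.4).
    """
    xs = sorted(x)
    n = len(xs)
    y = [-math.log(-math.log((i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    # Ordinary least squares for xs = a + b * y
    y_mean, x_mean = statistics.fmean(y), statistics.fmean(xs)
    sxy = sum((yi - y_mean) * (xi - x_mean) for yi, xi in zip(y, xs))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    b = sxy / syy
    a = x_mean - b * y_mean
    return a, b
```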

mle(x)

Fit distribution parameters to sample by maximum likelihood estimation

Parameters:: x (array_like) – data sample
Returns:: distribution loc and scale parameters
Return type:: floats

Notes

The MLE equation set is given in ‘Statistical Distributions’ by Forbes et al. (2010) and referred to in [4].
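Setting the derivatives of the Gumbel maxima log-likelihood to zero leaves a single equation in the scale b:

b = mean(x) - Σ x_i exp(-x_i/b) / Σ exp(-x_i/b)

The location then follows as a = -b ln[(1/n) Σ exp(-x_i/b)]. The sketch below solves the scale equation by bisection using only the standard library. It illustrates these equations and is not the solver qats uses; `fit_mle_sketch` is a hypothetical name.

```python
import math
import statistics


def fit_mle_sketch(x, rtol=1e-12):
    """Gumbel maxima MLE (hypothetical helper); x must not be constant.

    Solves b = mean(x) - sum(x*w)/sum(w), with w = exp(-x/b), by bisection,
    then sets a = -b * ln(mean(w)).
    """
    x0 = min(x)
    d = [xi - x0 for xi in x]  # shifted data: exponents -d/b are <= 0, no overflow
    d_mean = statistics.fmean(d)

    def g(b):
        # The weighted mean is shift invariant, so the shifted data can be used
        w = [math.exp(-di / b) for di in d]
        return b - d_mean + sum(wi * di for wi, di in zip(w, d)) / sum(w)

    lo = hi = statistics.stdev(x)
    while g(lo) > 0:  # g(b) -> -mean(d) < 0 as b -> 0
        lo /= 2.0
    while g(hi) < 0:  # g(b) -> +infinity as b -> infinity
        hi *= 2.0
    while hi - lo > rtol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = x0 - b * math.log(statistics.fmean([math.exp(-di / b) for di in d]))
    return a, b
```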

msm(x)

Fit Gumbel distribution parameters to sample by method of sample moments

Parameters:: x (array_like) – data sample
Returns:: distribution loc and scale parameters
Return type:: floats

Notes

See description in [1] and [2].
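The Gumbel maxima distribution has mean a + γb (γ ≈ 0.5772, the Euler-Mascheroni constant) and standard deviation πb/√6. Matching the first two sample moments therefore gives b = s√6/π and a = x̄ - γb. Below is a minimal sketch. It assumes the sample standard deviation with n - 1 in the denominator (qats may use a different convention), and `fit_msm_sketch` is a hypothetical name.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_msm_sketch(x):
    """Method of sample moments for the Gumbel maxima distribution (hypothetical helper)."""
    b = statistics.stdev(x) * math.sqrt(6.0) / math.pi  # scale from the standard deviation
    a = statistics.fmean(x) - EULER_GAMMA * b             # location from the mean
    return a, b
```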

plot_fits(data, filename=None, methods=None)

Plot data sample versus empirical and fitted cumulative distribution function on linearized Gumbel scales

Parameters:

data (array_like) – Data sample
filename (str, optional) – Save plot as filename, default is to show plot on screen
methods (tuple, optional) –
Methods of fit. Options (default all):
- msm = method of sample moments
- lse = least-square estimation
- mle = maximum likelihood estimation
- pwm = probability weighted moments

pwm(x)

Fit Gumbel distribution parameters to sample by method of probability weighted moments [7].

Parameters:: x (array_like) – data sample
Returns:: distribution parameters location and scale
Return type:: tuple
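Following Landwehr et al. [7], the Gumbel parameters follow from the first two probability weighted moments β_r = E[X F(X)^r]. The unbiased sample estimators are b0 = x̄ and b1 = (1/n) Σ (i-1)/(n-1) x_(i), taken over the ascending order statistics x_(i). From these, the scale is (2b1 - b0)/ln 2 and the location is b0 - γ·scale. The sketch below implements these estimators. It is not claimed to match qats line for line, and `fit_pwm_sketch` is a hypothetical name.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_pwm_sketch(x):
    """Gumbel fit by probability weighted moments (hypothetical helper).

    b0 and b1 are the unbiased sample estimators of beta_0 = E[X] and
    beta_1 = E[X F(X)]; for the Gumbel distribution 2*beta_1 - beta_0 = b*ln(2)
    and beta_0 = a + gamma*b.
    """
    xs = sorted(x)  # ascending order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    b0 = sum(xs) / n
    # enumerate() gives k = i - 1 for rank i, so the weight is (i - 1)/(n - 1)
    b1 = sum(k / (n - 1) * xk for k, xk in enumerate(xs)) / n
    scale = (2.0 * b1 - b0) / math.log(2.0)
    loc = b0 - EULER_GAMMA * scale
    return loc, scale
```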

`qats.stats.gumbelmin`

GumbelMin class and functions related to Gumbel (minima) distribution.

Classes and functions overview

`GumbelMin`([loc, scale, data])	The Gumbel minima distribution.
`lse`(x)	Fit distribution parameters to sample by method of least square fit to empirical cdf
`mle`(x)	Fit distribution parameters to sample by maximum likelihood estimation
`msm`(x)	Fit distribution parameters to sample by method of sample moments

Class API

class GumbelMin(loc=None, scale=None, data=None)

The Gumbel minima distribution.

The cumulative distribution function is defined as:

F(x) = 1 - exp{-exp[(x-a)/b]}

where a is the location parameter and b is the scale parameter.
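The minima CDF inverts to x = a + b ln[-ln(1 - p)]. It is also the mirror image of the maxima distribution: if X is Gumbel maxima distributed with location -a and scale b, then -X is Gumbel minima distributed with location a and scale b. Below is a stdlib sketch with hypothetical helper names; these are not GumbelMin methods.

```python
import math


def gumbelmin_cdf(x, a, b):
    """F(x) = 1 - exp{-exp[(x-a)/b]} for the Gumbel minima distribution."""
    return 1.0 - math.exp(-math.exp((x - a) / b))


def gumbelmin_invcdf(p, a, b):
    """Solve F(x) = p: x = a + b * ln[-ln(1 - p)], valid for 0 < p < 1."""
    return a + b * math.log(-math.log(1.0 - p))


def gumbelmax_cdf(x, a, b):
    """Gumbel maxima CDF, used only to show the mirror relation below."""
    return math.exp(-math.exp(-(x - a) / b))


# If X follows the maxima distribution with location -a, then -X follows the
# minima distribution with location a (same scale b).
x, a, b = 1.7, 2.0, 0.8
print(abs(gumbelmin_cdf(x, a, b) - (1.0 - gumbelmax_cdf(-x, -a, b))) < 1e-12)  # True
```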

Parameters:

loc (float) – Gumbel location parameter.
scale (float) – Gumbel scale parameter.
data (array_like, optional) – Sample data, used to establish the empirical cdf and included in plots. To fit the Gumbel distribution to the sample data, use GumbelMin.fit().

Attributes:

loc (float) – Gumbel location parameter.
scale (float) – Gumbel scale parameter.
data (array_like) – Sample data.

Examples

To create an instance based on parameters, use:

>>> from qats.stats.gumbelmin import GumbelMin
>>> gumb = GumbelMin(loc, scale)

If you need to establish a GumbelMin instance based on a sample data set, use:

>>> gumb = GumbelMin.fit(data, method='msm')

References

Statistical models in applied science, Bury, K.V. (1975), Wiley, New York
Bruk av asymptotiske ekstremverdifordelinger, Haver, S. (2007)
Plotting positions, About plotting positions
Usable estimators for parameters in Gumbel distribution
Bootstrapping statistics

Properties

`cov`	Distribution coefficient of variation (C.O.V.)
`ecdf`	Median rank empirical cumulative distribution function associated with the sample
`kurt`	Distribution kurtosis
`mean`	Distribution mean value
`median`	Distribution median value
`mode`	Distribution mode value
`skew`	Distribution skewness
`std`	Distribution standard deviation

Methods

`bootstrap`([size, method, N])	Parametric bootstrapping of source distribution
`cdf`([x])	Cumulative distribution function (cumulative probability) for specified values x
`fit`([data, method, verbose])	Determine distribution parameters by fit to sample.
`fit_from_weibull_parameters`(wa, wb, wc, n[, ...])	Calculate Gumbel distribution parameters from n independent Weibull distributed variables.
`gp_plot`([showfig, save])	Plot data on Gumbel paper (linearized scales)
`invcdf`([p])	Inverse cumulative distribution function for specified quantiles p
`pdf`([x])	Probability density function for specified values x
`plot`([showfig, save])	Plot data on regular scales
`rnd`([size, seed])	Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:: c – distribution c.o.v.
Return type:: float

property ecdf

Median rank empirical cumulative distribution function associated with the sample

Notes

Gumbel recommended the mean rank quantile formulation P_i = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions result from the data regardless of whether they are arranged in ascending or descending order.

A more sophisticated median rank formulation, P_i = (i-0.3)/(n+0.4), approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43). This median rank formulation is the one used here [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis

Returns:: k – distribution kurtosis
Return type:: float

property mean

Distribution mean value

Returns:: m – distribution mean value
Return type:: float

property median

Distribution median value

Returns:: m – distribution median value
Return type:: float

property mode

Distribution mode value

Returns:: m – distribution mode value
Return type:: float

property std

Distribution standard deviation

Returns:: s – distribution standard deviation
Return type:: float

property skew

Distribution skewness

Returns:: s – distribution skewness
Return type:: float

bootstrap(size=None, method='msm', N=100)

Parametric bootstrapping of source distribution

Parameters:

size (int, optional) – Bootstrap sample size. Default equal to source sample size.
method ({'msm', 'lse', 'mle'}, optional) –
Method of fit. Options:
- msm = method of sample moments (default)
- lse = least-square estimation
- mle = maximum likelihood estimation
N (int, optional) – Number of bootstrap samples. Default equal to 100.

Returns:

array_like – m, mean distribution parameters
array_like – cv, coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. It allows the sampling distribution of almost any statistic to be estimated using only very simple methods, and it belongs to the broader class of resampling methods. In this case, a parametric model is fitted to the data, and samples of random numbers with the same size as the original data are drawn from this fitted model. The quantity, or estimate, of interest is then calculated from these data. As with other bootstrap methods, this sampling process is repeated many times. If the results really matter, use as many samples as available computing power and time reasonably allow. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors that arise from the bootstrap procedure itself. See [5] about bootstrapping.

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:: x (array_like) – values
Returns:: cdf – cumulative probabilities for specified values x
Return type:: array

Notes

A range of x values [location, location+3*std] is applied if x is not specified.

fit(data=None, method='msm', verbose=False)

Determine distribution parameters by fit to sample.

Parameters:

data (array_like, optional) – Sample.
method ({'msm', 'lse', 'mle'}, optional) –
Method of fit. Options:
- msm = method of sample moments (default)
- lse = least-square estimation
- mle = maximum likelihood estimation
verbose (bool) – turn on output of fitted parameters

Notes

If data is not given, the data stored in the object (self.data) is used.

fit_from_weibull_parameters(wa, wb, wc, n, verbose=False)

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

Parameters:

wa (float) – Weibull location parameter
wb (float) – Weibull scale parameter
wc (float) – Weibull shape parameter
n (int) – Number of independently distributed variables
verbose (bool) – print fitted parameters

gp_plot(showfig=True, save=None)

Plot data on Gumbel paper (linearized scales)

Parameters:

showfig (bool) – show figure immediately on screen, default True
save (filename) – save figure to file, default None

invcdf(p=None)

Inverse cumulative distribution function for specified quantiles p

Parameters:: p (array_like) – quantiles (i.e. cumulative probabilities)
Returns:: x – values corresponding to the specified quantiles
Return type:: array

Notes

A range of quantiles from 0.001 to 0.999 is applied if quantiles are not specified.

pdf(x=None)

Probability density function for specified values x

Parameters:: x (array_like) – values
Returns:: pdf – probability density function for specified values x
Return type:: array

Notes

A range of x values [location, location+3*std] is applied if x is not specified.

plot(showfig=True, save=None)

Plot data on regular scales

Parameters:

showfig (bool) – show figure immediately on screen, default True
save (filename including suffix) – save figure to file, default None

rnd(size=None, seed=None)

Draw random samples from probability distribution

Parameters:

size (int|numpy shape, optional) – sample size (by default, a single random value is returned)
seed (int, optional) – seed for random number generator (default seed is random)

Returns:

x – random sample

Return type:

array

Functions API

lse(x)

Fit distribution parameters to sample by method of least square fit to empirical cdf

Parameters:: x (array_like) – sample

Notes

Uses an approximate median rank estimate for the empirical cdf.

mle(x)

Fit distribution parameters to sample by maximum likelihood estimation

Parameters:: x (array_like) – sample

Notes

The MLE equation set is given in ‘Statistical Distributions’ by Forbes et al. (2010) and referred to in [4].

msm(x)

Fit distribution parameters to sample by method of sample moments

Parameters:: x (array_like) – sample

Notes

See description in [1] and [2].
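The minima distribution has mean a - γb and standard deviation πb/√6. The method of sample moments therefore gives b = s√6/π and a = x̄ + γb. The scale estimate is the same as in the maxima case; only the sign of the location correction is opposite. Below is a minimal sketch. It assumes the sample standard deviation with n - 1 in the denominator, and `fit_msm_min_sketch` is a hypothetical name.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_msm_min_sketch(x):
    """Method of sample moments for the Gumbel minima distribution.

    The minima distribution has mean a - gamma*b and standard deviation
    pi*b/sqrt(6), so only the sign of the location correction changes
    compared with the maxima case.
    """
    b = statistics.stdev(x) * math.sqrt(6.0) / math.pi
    a = statistics.fmean(x) + EULER_GAMMA * b
    return a, b
```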

`qats.stats.weibull`

Weibull class and functions related to Weibull distribution.

Classes and functions overview

`Weibull`(loc, scale, shape[, data])	The Weibull class offers miscellaneous functions for working with the Weibull distribution.
`bootstrap`(loc, scale, shape, size, repetitions)	Quantify mean and coefficient of variation of Weibull distribution parameters using parametric bootstrapping
`lse`(x[, threshold])	Fit Weibull distribution parameters to sample by method of least square fit to empirical cdf.
`mle`(x)	Fit Weibull distribution parameters to sample by maximum likelihood estimation
`mlj`(sample, l, j)	Probability weighted moment Mljk of observation order l, order of cdf j, with emphasis on the right/upper tail (k=0).
`msm`(x)	Fit Weibull distribution parameters to sample by method of sample moments
`plot_fit`(x, params[, path])	Plot data sample versus empirical and fitted cumulative distribution function on linearized Weibull scales
`pwm`(x)	Fit distribution parameters to sample by method of probability weighted moments
`pwm2`(x)	Fit distribution parameters to sample by method of probability weighted moments assuming the location parameter is zero.
`weibull2gumbel`(loc, scale, shape, n)	Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.

Class API

class Weibull(loc, scale, shape, data=None)

The Weibull class offers miscellaneous functions for working with the Weibull distribution, defined as (cumulative distribution function):

F(x) = 1 - exp{-[(x-a)/b]^c}

where a is the location parameter, b is the scale parameter and c is the shape parameter.
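The quantile function and the first two moments follow directly from this CDF:

- quantile: x = a + b[-ln(1 - p)]^(1/c)
- mean: a + bΓ(1 + 1/c)
- standard deviation: b[Γ(1 + 2/c) - Γ(1 + 1/c)²]^(1/2)

The helpers below are hypothetical, stdlib-only illustrations of these formulas, not the Weibull class methods.

```python
import math


def weibull_cdf(x, a, b, c):
    """F(x) = 1 - exp{-[(x-a)/b]^c} for x > a; zero at or below the location a."""
    if x <= a:
        return 0.0
    return 1.0 - math.exp(-(((x - a) / b) ** c))


def weibull_invcdf(p, a, b, c):
    """Solve F(x) = p: x = a + b * [-ln(1 - p)]^(1/c), valid for 0 < p < 1."""
    return a + b * (-math.log(1.0 - p)) ** (1.0 / c)


def weibull_mean_std(a, b, c):
    """Mean a + b*G1 and std b*sqrt(G2 - G1^2), with Gk = Gamma(1 + k/c)."""
    g1 = math.gamma(1.0 + 1.0 / c)
    g2 = math.gamma(1.0 + 2.0 / c)
    return a + b * g1, b * math.sqrt(g2 - g1 * g1)


# Same parameters as in the plot() examples for this class: loc=100, scale=15, shape=2.5
mean, std = weibull_mean_std(100.0, 15.0, 2.5)
x99 = weibull_invcdf(0.99, 100.0, 15.0, 2.5)  # 99 % quantile
```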

Parameters:

loc (float) – Weibull location parameter.
scale (float) – Weibull scale parameter.
shape (float) – Weibull shape parameter.
data (array_like, optional) – Sample data, used to establish the empirical cdf and included in plots. To fit the Weibull distribution to the sample data, use Weibull.fit().

Attributes:

loc (float) – Weibull location parameter.
scale (float) – Weibull scale parameter.
shape (float) – Weibull shape parameter.
data (array_like) – Sample data. Exists only if distribution parameters are estimated from a sample.

Notes

For a Weibull 2-parameter distribution, specify location parameter 0 (zero).

Examples

To create an instance based on parameters, use:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull(loc, scale, shape)

If you need to establish a Weibull instance based on a sample data set, use:

>>> weib = Weibull.fit(data, method='pwm')

References

Moment estimators for Weibull parameters and their asymptotic efficiencies, Waloddi Weibull, April 1969, Lausanne Switzerland, Technical report AFML-TR-69-135
Continuous univariate distributions, Volume 1, N.L. Johnson, S. Kotz and N. Balakrishnan, 1994, John Wiley & Sons
weibull.com, About location parameter
Plotting positions, About plotting positions
Bootstrapping, Bootstrapping statistics
Estimation of the generalized extreme value distribution by the method of probability weighted moments, Hosking, J. R. M., Wallis, J. R. and Wood, E. F., 1985, Technometrics, 27, pp. 251-261
Estimating the three-parameter Weibull distribution by the method of probability weighted moments with application to medical survival data, Bortolucci, A. A. et al.
Theory and derivation for Weibull parameter probability weighted moment estimators, Grender, J.M., Dell, T.R., Reich, R.M., 1991, United States Department of Agriculture
Probability weighted moments, Greenwood, J. A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research. 15(5): 1049-1054.
Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles, Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research. 15(5): 1055-1064.

Properties

`cov`	Distribution coefficient of variation (C.O.V.)
`ecdf`	Empirical cumulative distribution function associated with the sample.
`kurt`	Distribution kurtosis.
`mean`	Distribution mean value
`mse`	Mean squared error of fitted cumulative distribution (a,b,c) and empirical distribution
`params`	Distribution parameters.
`skew`	Distribution skewness
`std`	Distribution standard deviation

Methods

`cdf`([x])	Cumulative distribution function (cumulative probability) for specified values x
`fit`(data[, method, verbose])	Establish Weibull class instance by fit to sample.
`fromsignal`(x[, method, verbose])	Establish Weibull class instance by fit to global maxima from time series signal.
`gumbel_parameters`([n])	Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.
`invcdf`([p])	Inverse cumulative distribution function for specified quantiles p
`pdf`([x])	Probability density function for specified values x
`plot`([filename])	Plot data on regular scales
`plot_linear`([filename])	Plot data on Weibull paper (linearized scales)
`rnd`([size, seed])	Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:: distribution c.o.v.
Return type:: float

property ecdf

Empirical cumulative distribution function associated with the sample.

Returns:: Empirical cumulative distribution function.
Return type:: array

Notes

A mean rank method is chosen to approximate the mean of the distribution [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis.

Returns:: distribution kurtosis
Return type:: float

property mean

Distribution mean value

Returns:: distribution mean value
Return type:: float

property mse

Mean squared error of fitted cumulative distribution (a,b,c) and empirical distribution

Returns:: mean squared error
Return type:: float

property params

Distribution parameters.

Returns:: Distribution parameters: (loc, scale, shape).
Return type:: tuple

property skew

Distribution skewness

Returns:: distribution skewness
Return type:: float

property std

Distribution standard deviation

Returns:: distribution standard deviation
Return type:: float

gumbel_parameters(n=None)

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.

Parameters:: n (int) – Number of independent Weibull distributed variables, default equal to number of peaks (self.data.size)
Returns:: Gumbel location and scale parameters
Return type:: tuple

See also

qats.stats.weibull.weibull2gumbel

Notes

If the sample x is based on, say, a 30-hour simulation, but you seek an estimate of, e.g., the 3-hour extreme value, n should be calculated as the nearest integer to:

n = (3 / 30) * nx

where nx is the total number of maxima during the 30 hours.
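A common way to obtain the asymptotic parameters is through the characteristic largest value u_n, defined by F(u_n) = 1 - 1/n. The location is taken as u_n and the scale as 1/(n f(u_n)). For the Weibull CDF, this gives location a + b(ln n)^(1/c) and scale (b/c)(ln n)^(1/c - 1). The sketch assumes this formulation. qats' `weibull2gumbel` is not shown here and may differ in detail, and `weibull2gumbel_sketch` is a hypothetical name. The sketch also applies the duration scaling of n described above to a made-up maxima count.

```python
import math


def weibull2gumbel_sketch(a, b, c, n):
    """Asymptotic Gumbel (loc, scale) for the largest of n Weibull variables.

    loc is the characteristic largest value u_n with F(u_n) = 1 - 1/n, and
    scale = 1 / (n * f(u_n)), where F and f are the Weibull cdf and pdf.
    """
    ln_n = math.log(n)
    loc = a + b * ln_n ** (1.0 / c)
    scale = (b / c) * ln_n ** (1.0 / c - 1.0)
    return loc, scale


# Duration scaling from the note above, with a hypothetical maxima count:
nx = 10800                    # maxima counted in the 30-hour record (made up)
n = round(3 * nx / 30)        # nearest integer to (3 / 30) * nx -> 1080
loc, scale = weibull2gumbel_sketch(0.0, 1.0, 2.0, n)
```

For shape c = 1 (exponential tail), the scale reduces to b and the location to a + b ln n.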

References

Bury, K.V. (1975), “Statistical models in applied science”

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:: x (array_like) – values
Returns:: cumulative probabilities for specified values x
Return type:: array

Notes

A range of x values is applied if x is not specified.

classmethod fit(data, method='msm', verbose=False)

Establish Weibull class instance by fit to sample.

Parameters:

data (array_like) – Sample.
method (str, optional) –
Method of fit. Available options:
- msm = method of sample moments (default)
- lse = least-square estimation
- mle = maximum likelihood estimation
- pwm = probability weighted moments
- pwm2 = probability weighted moments, 2-parameter distribution
verbose (bool) – If True, fitted parameters are printed to screen.

Returns:

Weibull class instance

Return type:

Weibull

See also

Weibull.fromsignal

Examples

Assuming data is a sample array/list:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull.fit(data, method="msm")

classmethod fromsignal(x, method='msm', verbose=False)

Establish Weibull class instance by fit to global maxima from time series signal.

Parameters:

x (array_like) – Time series signal.
method (str, optional) – Method of fit. See Weibull.fit() for description of options.
verbose (bool, optional) – If True, fitted parameters are printed to screen.

Returns:

Class instance.

Return type:

Weibull

See also

Weibull.fit, qats.signal.find_maxima

Examples

Assuming x is a time series signal:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull.fromsignal(x, method='msm')

Note that the example above is equivalent to:

>>> from qats.signal import find_maxima
>>> sample, _ = find_maxima(x, local=False, threshold=None, up=True)
>>> weib = Weibull.fit(sample, method='msm')

invcdf(p=None)#

Inverse cumulative distribution function for specified quantiles p

Parameters:: p (array_like) – quantiles (i.e. cumulative probabilities)
Returns:: values corresponding to the specified quantiles
Return type:: array

Notes

A range of quantiles from 0 to 1 is applied if quantiles are not specified.
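Inverting the three-parameter form F(x) = 1 - exp(-((x - loc)/scale)^shape) gives x = loc + scale * (-ln(1 - p))^(1/shape). A minimal NumPy sketch, assuming that parameterization; `weibull_invcdf` is a hypothetical name, not the qats method.

```python
import numpy as np


def weibull_invcdf(p, loc: float, scale: float, shape: float) -> np.ndarray:
    """Quantile function x = loc + scale * (-ln(1 - p)) ** (1 / shape) of the three-parameter Weibull."""
    p = np.asarray(p, dtype=float)
    if np.any((p < 0.0) | (p > 1.0)):
        raise ValueError("quantiles must lie in [0, 1]")
    # -log1p(-p) equals -ln(1 - p) but is more accurate for small p
    return loc + scale * (-np.log1p(-p)) ** (1.0 / shape)


print(weibull_invcdf([0.1, 0.5, 0.9], 100.0, 15.0, 2.5))
```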

pdf(x=None)#

Probability density function for specified values x

Parameters:: x (array_like) – values
Returns:: probability density function for specified values x
Return type:: array

Notes

A range of x values is applied if x is not specified.
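Differentiating the same assumed CDF gives the density f(x) = (shape/scale) * z^(shape-1) * exp(-z^shape) with z = (x - loc)/scale, and zero below loc. The sketch `weibull_pdf` is illustrative, not the qats implementation.

```python
import numpy as np


def weibull_pdf(x, loc: float, scale: float, shape: float) -> np.ndarray:
    """Three-parameter Weibull density, zero for x < loc."""
    x = np.asarray(x, dtype=float)
    z = (x - loc) / scale
    zc = np.clip(z, 0.0, None)  # avoid fractional powers of negative numbers
    # note: for shape < 1 the density is unbounded at x == loc (returns inf there)
    dens = (shape / scale) * zc ** (shape - 1.0) * np.exp(-zc ** shape)
    return np.where(z >= 0.0, dens, 0.0)


print(weibull_pdf([105.0, 115.0, 130.0], 100.0, 15.0, 2.5))
```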

plot(filename=None)#

Plot data on regular scales

Parameters:: filename (str, optional) – Save plot as filename, default is to show plot on screen

Examples

Plot distribution and show the figure

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot()

Plot distribution and save the figure as png

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot(filename="plot.png")

plot_linear(filename=None)#

Plot data on Weibull paper (linearized scales)

Parameters:: filename (str, optional) – Save plot as filename, default is to show plot on screen

Examples

Plot distribution and show the figure

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot_linear()

Plot distribution and save the figure as png

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot_linear(filename="plot.png")

References

Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994), “Continuous univariate distributions, Volume 1”, John Wiley & Sons Inc.

rnd(size=None, seed=None)#

Draw random samples from probability distribution

Parameters:

size (int|numpy shape, optional) – sample size (by default a single random value is returned)
seed (int) – seed for random number generator (default seed is random)

Returns:

random sample

Return type:

array
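A minimal sketch of sampling with NumPy, assuming the same three-parameter form: `numpy.random.Generator.weibull` draws from the one-parameter Weibull (scale 1, location 0), which is then scaled and shifted. The function name `weibull_rnd` is hypothetical, and qats may seed and draw differently.

```python
import numpy as np


def weibull_rnd(loc, scale, shape, size=None, seed=None):
    """Random samples from a three-parameter Weibull distribution (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # standard Weibull variates (scale 1, location 0), then scaled and shifted
    return loc + scale * rng.weibull(shape, size=size)


sample = weibull_rnd(100.0, 15.0, 2.5, size=1000, seed=42)
```

Passing the same seed reproduces the same sample; omitting it gives a fresh random stream on each call.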

Functions API

bootstrap(loc, scale, shape, size, repetitions, method='pwm')#

Quantify mean and coefficient of variation of Weibull distribution parameters using parametric bootstrapping

Parameters:

loc (float) – Source distribution location parameter
scale (float) – Source distribution scale parameter
shape (float) – Source distribution shape parameter
size (int) – Size of bootstrapped sample
repetitions (int) – Number of bootstrap samples, e.g. 100
method (str, optional) –
Method of fit. Available options:
- msm = method of sample moments
- lse = least-square estimation
- mle = maximum likelihood estimation
- pwm = probability weighted moments (default)

Returns:

array – Mean distribution parameters
array – Coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. It allows estimation of the sampling distribution of almost any statistic using only very simple methods, and belongs to the broader class of resampling methods. In parametric bootstrapping, a parametric model is fitted to the data, and samples of random numbers with the same size as the original data are drawn from this fitted model. The quantity, or estimate, of interest is then calculated from these samples. This sampling process is repeated many times, as for other bootstrap methods. If the results really matter, as many samples as is reasonable given available computing power and time should be used. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors arising from the bootstrap procedure itself. See [5] about bootstrapping.

Examples

To quantify the uncertainty (coefficient of variation) of a Weibull distribution fitted to a sample with 5 values (using 100 repetitions):

>>> from qats.stats.weibull import bootstrap
>>> m, cv = bootstrap(10., 5., 2.5, 5, 100)
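The procedure in the notes can be sketched with NumPy as below. This is a simplified illustration, not the qats implementation: it treats the location parameter as known, estimates only scale and shape with a two-parameter probability weighted moment fit (the zero-location case described under `pwm2`), and has no `method` option. The names `bootstrap_sketch` and `_pwm2_fit` are hypothetical.

```python
import math

import numpy as np


def _pwm2_fit(x):
    """Scale and shape of a zero-location Weibull from the first two PWMs (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    m0 = x.mean()                               # estimate of M_{1,0,0}
    m1 = np.sum(x * (i - 1) / (n - 1)) / n      # unbiased estimate of M_{1,1,0}
    shape = 1.0 / (math.log2(m0 / (m0 - m1)) - 1.0)
    scale = m0 / math.gamma(1.0 + 1.0 / shape)
    return scale, shape


def bootstrap_sketch(loc, scale, shape, size, repetitions, seed=None):
    """Mean and coefficient of variation of fitted (scale, shape) over parametric bootstrap samples."""
    if size < 2 or repetitions < 2:
        raise ValueError("size and repetitions must both be at least 2")
    rng = np.random.default_rng(seed)
    estimates = np.empty((repetitions, 2))
    for r in range(repetitions):
        # draw a sample of the given size from the source distribution
        sample = loc + scale * rng.weibull(shape, size=size)
        # fit with the location treated as known (a simplification)
        estimates[r] = _pwm2_fit(sample - loc)
    mean = estimates.mean(axis=0)
    cv = estimates.std(axis=0, ddof=1) / mean
    return mean, cv


m, cv = bootstrap_sketch(10.0, 5.0, 2.5, 5, 100, seed=1)
```

Each row of `estimates` holds one refit, so the spread across rows directly reflects how much a fit to `size` values can vary.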

lse(x, threshold: float | None = None)#

Fit Weibull distribution parameters to sample by method of least square fit to empirical cdf.

Parameters:

x (array_like) – sample data
threshold (float, optional) – Fit distribution to data points above this threshold. The threshold is given as a value in the open interval (0, 1) of the empirical CDF, so with threshold=0.87 the distribution is fitted to the values exceeding the 0.87-quantile of the empirical cumulative distribution function.

Returns:

Distribution parameters (loc, scale, shape).

Return type:

tuple (floats)

Notes

Uses what are known as (approximate) mean rank estimates for the empirical cdf.
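A sketch of the least-squares idea on linearized Weibull scales, using the mean rank plotting positions i/(n+1) and the threshold convention described above. It is simplified relative to qats' `lse`: the location parameter is passed in as known (default zero) rather than estimated, so only scale and shape are fitted. The name `lse_sketch` is hypothetical.

```python
import numpy as np


def lse_sketch(x, loc=0.0, threshold=None):
    """Least-squares fit of scale and shape on Weibull paper, location assumed known."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    f = np.arange(1, n + 1) / (n + 1)       # mean rank empirical CDF, i/(n+1)
    if threshold is not None:
        keep = f > threshold                # only points above the empirical-CDF threshold
        x, f = x[keep], f[keep]
    # Weibull paper: ln(-ln(1 - F)) = shape * ln(x - loc) - shape * ln(scale)
    u = np.log(x - loc)
    v = np.log(-np.log1p(-f))
    slope, intercept = np.polyfit(u, v, 1)  # coefficients, highest degree first
    shape = slope
    scale = np.exp(-intercept / slope)
    return scale, shape
```

On Weibull paper a Weibull sample falls on a straight line, so the slope gives the shape and the intercept gives the scale.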

mle(x)#

Fit Weibull distribution parameters to sample by maximum likelihood estimation

Parameters:: x (array_like) – sample data
Returns:: Distribution parameters (loc, scale, shape).
Return type:: tuple (floats)

mlj(sample, l, j)#

Probability weighted moment Mljk of observation order l, order of cdf j, with emphasis on the right/upper tail (k=0).

Parameters:

sample (array) – Sample.
l (int) – Order of observation (sample).
j (int) – Order of cumulative distribution function.

Returns:

Probability weighted moment

Return type:

float

Notes

The probability weighted moment Mljk is defined by Greenwood and others (1979) as:

M_{l,j,k} = E[X^l * F^j * (1-F)^k]

where X(F) is the inverse form of the distribution and F is the cumulative distribution function. When j = k = 0 and l is a non-negative integer, M_{l,0,0} represents the conventional moment of order l about the origin.

PWMs can be applied either when the small observations are more important than the large observations (j=0), as in strength properties of materials, or when the large observations should have more influence than the small observations (k=0), as in tree diameter distribution modelling. Here we have chosen the latter and derived unbiased estimators for the moments M_{l,j,0} (k=0); see eq. 32 in [8]:

M_{l,j,0} = (1 / n) * sum(x[i]^l * binom(i-1, j) / binom(n-1, j))

where x[i] is the i-th smallest value of the sample (sorted in ascending order), i runs from j+1 to n, and binom() is the binomial coefficient.
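The unbiased estimator above translates almost directly into Python. A minimal sketch, assuming the sample is sorted in ascending order so that x[i] is the i-th smallest value. `mlj_sketch` is a hypothetical name, and qats' `mlj` may be organized differently.

```python
from math import comb

import numpy as np


def mlj_sketch(sample, l: int, j: int) -> float:
    """Unbiased estimate of M_{l,j,0} = E[X^l F^j] from a sample."""
    x = np.sort(np.asarray(sample, dtype=float))  # ascending: x[i - 1] is the i-th smallest value
    n = x.size
    if j >= n:
        raise ValueError("j must be smaller than the sample size")
    total = 0.0
    # terms with i <= j vanish because binom(i - 1, j) = 0, so the sum starts at i = j + 1
    for i in range(j + 1, n + 1):
        total += x[i - 1] ** l * comb(i - 1, j) / comb(n - 1, j)
    return float(total / n)
```

With j = 0 the weights are all 1, so M_{l,0,0} reduces to the ordinary sample moment of order l.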

msm(x)#

Fit Weibull distribution parameters to sample by method of sample moments

Parameters:: x (array_like) – sample data
Returns:: The loc, scale and shape distribution parameters
Return type:: floats

Notes

See description in [1].

plot_fit(x: ndarray, params: tuple, path: str | None = None)#

Plot data sample versus empirical and fitted cumulative distribution function on linearized Weibull scales

Parameters:

x (array_like) – Data sample
params (tuple) – location, scale and shape parameter of the Weibull distribution
path (str, optional) – Save figure to file instead of displaying it.

pwm(x)#

Fit distribution parameters to sample by method of probability weighted moments

Parameters:: x (array_like) – sample data
Returns:: The loc, scale and shape distribution parameters
Return type:: floats

Notes

Details on probability weighted moments are provided in [8].

See also

mlj

pwm2(x)#

Fit distribution parameters to sample by method of probability weighted moments assuming the location parameter is zero.

Parameters:: x (array_like) – sample data
Returns:: The scale and shape distribution parameters
Return type:: floats

Notes

Details on probability weighted moments are provided on p.14-15 in [8]. Note that only the scale and shape parameters are estimated; the location parameter is assumed to be zero.

See also

mlj, pwm
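With the location fixed at zero, the first two probability weighted moments determine scale and shape. For F(x) = 1 - exp(-(x/scale)^shape), M_{1,0,0} = scale * Γ(1 + 1/shape) and M_{1,1,0} = M_{1,0,0} * (1 - 2^-(1 + 1/shape)), so M_{1,0,0} / (M_{1,0,0} - M_{1,1,0}) = 2^(1 + 1/shape). The sketch below solves these relations using the unbiased estimator from `mlj`. It is derived from these standard relations and is not necessarily the exact formulation in [8] or in qats. `pwm2_sketch` is a hypothetical name.

```python
import math

import numpy as np


def pwm2_sketch(x):
    """Scale and shape of a zero-location Weibull distribution from probability weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    m0 = x.mean()                              # estimate of M_{1,0,0} = scale * G, G = gamma(1 + 1/shape)
    m1 = np.sum(x * (i - 1) / (n - 1)) / n     # estimate of M_{1,1,0} = scale * G * (1 - 2 ** -(1 + 1/shape))
    # m0 / (m0 - m1) estimates 2 ** (1 + 1/shape)
    shape = 1.0 / (math.log2(m0 / (m0 - m1)) - 1.0)
    scale = m0 / math.gamma(1.0 + 1.0 / shape)
    return scale, shape
```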

weibull2gumbel(loc, scale, shape, n)#

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull-distributed variables.

Parameters:

loc (float) – Weibull distribution location parameter
scale (float) – Weibull distribution scale parameter
shape (float) – Weibull distribution shape parameter
n (int) – Number of independent Weibull-distributed variables

Returns:

Gumbel location and scale parameters

Return type:

tuple

Notes

If the sample x is based on, say, a 30-hour simulation but you seek an estimate of, e.g., the 3-hour extreme value, then n should be calculated as the nearest integer to:

n = (3 / 30) * nx

where nx is the total number of maxima during the 30 hours.

References

Bury, K.V. (1975), “Statistical models in applied science”
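For reference, the classical asymptotic result for the largest of n independent three-parameter Weibull variables can be sketched as follows. The Gumbel location is the characteristic largest value u_n, which solves 1 - F(u_n) = 1/n, and the Gumbel scale is 1/(n f(u_n)). This is the textbook relation, stated here as an assumption about what `weibull2gumbel` computes. `weibull2gumbel_sketch` is a hypothetical name.

```python
import math


def weibull2gumbel_sketch(loc, scale, shape, n):
    """Asymptotic Gumbel (loc, scale) for the largest of n iid three-parameter Weibull variables."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln(n) > 0")
    ln_n = math.log(n)
    # characteristic largest value: 1 - F(u_n) = exp(-((u_n - loc) / scale) ** shape) = 1 / n
    gumbel_loc = loc + scale * ln_n ** (1.0 / shape)
    # 1 / (n * f(u_n)) with the Weibull density f evaluated at u_n
    gumbel_scale = scale / shape * ln_n ** (1.0 / shape - 1.0)
    return gumbel_loc, gumbel_scale


print(weibull2gumbel_sketch(100.0, 15.0, 2.5, 1000))
```

For shape = 1 (exponential tail) this reduces to a Gumbel location of loc + scale * ln(n) and a Gumbel scale equal to the Weibull scale.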
