qats.stats

Sub-package for statistics/distributions.

qats.stats.empirical

Basic functions for statistical inference.

Functions overview

empirical_cdf(n[, kind])

Empirical cumulative distribution function given a sample size.

API

empirical_cdf(n, kind='mean')

Empirical cumulative distribution function given a sample size.

Parameters:
  • n (int) – sample size

  • kind (str, optional) – Plotting-position formula, where i is the rank (1 to n) of an observation in the ordered sample:

    • ‘mean’: i/(n+1) (also known as the Weibull method; default)

    • ‘median’: (i-0.3)/(n+0.4)

    • ‘symmetrical’: (i-0.5)/n

    • ‘beard’: (i-0.31)/(n+0.38) (Jenkinson’s/Beard’s method)

    • ‘gringorten’: (i-0.44)/(n+0.12) (Gringorten’s method)

Returns:

Empirical cumulative distribution function

Return type:

array

Notes

Gumbel recommended the following quantile formulation Pi = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions will result from the data regardless of whether they are assembled in ascending or descending order.

Jenkinson’s/Beard’s method is based on the “idea that a natural estimate for the plotting position is the median of its probability density distribution”.

A more sophisticated formulation Pi = (i-0.3)/(n+0.4) approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43).

The probability corresponding to the unbiased plotting position can be approximated by the Gringorten formula in the case of the type 1 extreme value (Gumbel) distribution.
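All five formulas above have the form Pi = (i-α)/(n+1-2α), which is why each of them yields the same plotting positions whether the sample is sorted in ascending or descending order. The minimal NumPy sketch below reproduces the formulas for checking values by hand; `plotting_positions` and `OFFSETS` are hypothetical names, not part of qats, and the sketch is not the qats implementation of empirical_cdf().

```python
import numpy as np

# Hypothetical helper, not part of qats.
# Offset alpha in P_i = (i - alpha) / (n + 1 - 2*alpha) for each kind listed above
OFFSETS = {
    "mean": 0.0,         # i/(n+1), Weibull
    "median": 0.3,       # (i-0.3)/(n+0.4)
    "symmetrical": 0.5,  # (i-0.5)/n
    "beard": 0.31,       # (i-0.31)/(n+0.38)
    "gringorten": 0.44,  # (i-0.44)/(n+0.12)
}


def plotting_positions(n, kind="mean"):
    """Return P_1..P_n for ranks i = 1..n of an ordered sample of size n."""
    try:
        alpha = OFFSETS[kind]
    except KeyError:
        raise ValueError(f"unknown kind {kind!r}") from None
    i = np.arange(1, n + 1)
    return (i - alpha) / (n + 1 - 2 * alpha)


print(plotting_positions(4, "mean"))  # [0.2 0.4 0.6 0.8]
```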

References

  1. Plotting positions, About plotting positions

qats.stats.gumbel

Gumbel class and functions related to Gumbel distribution.

Classes and functions overview

Gumbel(loc, scale[, data])

The Gumbel maxima distribution.

bootstrap(loc, scale, size, repetitions[, ...])

Quantify mean and coefficient of variation of Gumbel distribution parameters using parametric bootstrapping

lse(x)

Fit Gumbel distribution parameters to sample by method of least-squares fit to empirical cdf

mle(x)

Fit Gumbel distribution parameters to sample by maximum likelihood estimation

msm(x)

Fit Gumbel distribution parameters to sample by method of sample moments

plot_fits(data[, filename, methods])

Plot data sample versus empirical and fitted cumulative distribution function on linearized Gumbel scales

pwm(x)

Fit Gumbel distribution parameters to sample by method of probability weighted moments [7].

Class API

class Gumbel(loc, scale, data=None)

The Gumbel maxima distribution.

The cumulative distribution function is defined as:

F(x) = exp{-exp[-(x-a)/b]}

where a is the location parameter and b is the scale parameter.

Parameters:
  • loc (float) – Gumbel location parameter.

  • scale (float) – Gumbel scale parameter.

  • data (array_like, optional) – Sample data, used to establish empirical cdf and is included in plots. To fit the Gumbel distribution to the sample data, use Gumbel.fit().

Attributes:
  • loc (float) – Gumbel location parameter.

  • scale (float) – Gumbel scale parameter.

  • data (array_like) – Sample data.

Examples

To create an instance based on parameters, use:

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel(loc, scale)

If you need to establish a Gumbel instance based on a sample data set, use:

>>> gumb = Gumbel.fit(data, method='msm')

References

  1. Statistical models in applied science, Bury, K.V., 1975, Wiley, New York

  2. Bruk av asymptotiske ekstremverdifordelinger, Haver, S., 2007

  3. Plotting positions, About plotting positions

  4. Usable estimators for parameters in Gumbel distribution

  5. Bootstrapping statistics

  6. Probability weighted moments, Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research, 15(5): 1049-1054.

  7. Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles, Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research, 15(5): 1063-1064.

Properties

cov

Distribution coefficient of variation (C.O.V.)

ecdf

Median rank empirical cumulative distribution function associated with the sample

kurt

Distribution kurtosis

mean

Distribution mean value

median

Distribution median value

mode

Distribution mode value

mse

Mean squared error between the fitted and the empirical cumulative distribution

params

Distribution parameters.

skew

Distribution skewness

std

Distribution standard deviation

Methods

cdf([x])

Cumulative distribution function (cumulative probability) for specified values x

fit(data[, method, verbose])

Determine distribution parameters by fit to sample.

fit_from_weibull_parameters(wa, wb, wc, n[, ...])

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

invcdf([p])

Inverse cumulative distribution function for specified probabilities

pdf([x])

Probability density function for specified values x

plot([filename])

Plot cumulative distribution function

plot_linear([filename])

Plot cumulative distribution function on linearized Gumbel scales

rnd([size, seed])

Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:

Distribution c.o.v.

Return type:

float

property ecdf

Median rank empirical cumulative distribution function associated with the sample

Returns:

Empirical cumulative distribution function

Return type:

array

Notes

Requires data/sample to be specified.

Gumbel recommended the following mean rank quantile formulation Pi = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions will result from the data regardless of whether they are assembled in ascending or descending order.

A more sophisticated median rank formulation Pi = (i-0.3)/(n+0.4) approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43). This median rank formulation is the one used for the empirical cdf, following [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis

Returns:

Distribution kurtosis

Return type:

float

property mean

Distribution mean value

Returns:

Distribution mean value

Return type:

float

property median

Distribution median value

Returns:

Distribution median value

Return type:

float

property mode

Distribution mode value

Returns:

Distribution mode value

Return type:

float

property mse

Mean squared error between the fitted and the empirical cumulative distribution

Returns:

mean squared error

Return type:

float

Notes

Requires data/sample to be specified.

property params

Distribution parameters.

Returns:

Distribution parameters: (loc, scale).

Return type:

tuple

property std

Distribution standard deviation

Returns:

Distribution standard deviation

Return type:

float

property skew

Distribution skewness

Returns:

Distribution skewness

Return type:

float

Notes

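The skewness is evaluated with zetac, the complementary Riemann zeta function (zeta function minus 1): the Gumbel skewness equals 12*sqrt(6)*zeta(3)/pi^3 ≈ 1.1395, independent of loc and scale. See http://docs.scipy.org/doc/scipy/reference/generated/scipy.special.zetac.html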
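For reference, the properties above have closed forms in terms of loc (a) and scale (b). The sketch below is a standalone illustration assuming the textbook formulas (Euler-Mascheroni constant γ for the mean, zeta(3) for the skewness); `gumbel_summary` is a hypothetical helper, not part of qats. Kurtosis is left out because this page does not state whether kurt returns the kurtosis (5.4) or the excess kurtosis (2.4).

```python
import math

from scipy.special import zetac

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_summary(loc, scale):
    """Hypothetical helper (not qats): closed-form statistics of
    F(x) = exp{-exp[-(x-loc)/scale]}."""
    mean = loc + EULER_GAMMA * scale
    std = math.pi * scale / math.sqrt(6.0)
    return {
        "mean": mean,
        "median": loc - scale * math.log(math.log(2.0)),
        "mode": loc,
        "std": std,
        "cov": std / mean,
        # zeta(3) = 1 + zetac(3); the skewness does not depend on loc or scale
        "skew": 12.0 * math.sqrt(6.0) * (1.0 + zetac(3.0)) / math.pi ** 3,
    }


print(round(gumbel_summary(10.0, 2.5)["skew"], 4))  # 1.1395
```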

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:

x (array_like, optional) – Calculate cumulative probability for these values

Returns:

Cumulative probabilities for specified values x

Return type:

array

Notes

A range of x values [loc, loc+3*std] is used if x is not specified.

classmethod fit(data, method='msm', verbose=False)

Determine distribution parameters by fit to sample.

Parameters:
  • data (array_like) – Sample

  • method (str, optional) –

    Method of fit. Options:

    • msm = method of sample moments (default)

    • lse = least-square estimation

    • mle = maximum likelihood estimation

    • pwm = probability weighted moments

  • verbose (bool, optional) – If True, fitted parameters are written to screen.

Examples

Assuming data is a sample array/list:

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel.fit(data, method="msm")

classmethod fit_from_weibull_parameters(wa, wb, wc, n, verbose=False)

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

Parameters:
  • wa (float) – Weibull loc parameter

  • wb (float) – Weibull scale parameter

  • wc (float) – Weibull shape parameter

  • n (int) – Number of independently distributed variables

  • verbose (bool) – Print fitted parameters

Notes

A warning is issued if the Weibull shape parameter is less than 1. In this case, convergence towards the asymptotic extreme value distribution is slow, and the asymptotic distribution is non-conservative relative to the exact distribution. The asymptotic distribution is correct for a Weibull shape equal to 1 and conservative for a Weibull shape larger than 1. These deviations diminish with larger samples. See [1, p. 380].

References

  1. Bury, Karl V., 1975, “Statistical Models in Applied Science”, University of British Columbia, John Wiley & Sons

Examples

>>> from qats.stats.gumbel import Gumbel
>>> gumb = Gumbel.fit_from_weibull_parameters(wa, wb, wc, n)

invcdf(p=None)

Inverse cumulative distribution function for specified probabilities

Parameters:

p (array_like, optional) – Calculate the inverse cumulative distribution function for these probabilities

Returns:

Values corresponding to the specified quantiles

Return type:

array

Notes

A range of quantiles from 0.001 to 0.999 is used if quantiles are not specified.
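The closed forms behind cdf(), pdf() and invcdf() follow from F(x) = exp{-exp[-(x-a)/b]}, and inverting F gives an inverse-transform sampler similar in spirit to rnd(). A minimal NumPy sketch; the function names are hypothetical, and since qats may use a different random generator, the same seed is not expected to reproduce qats samples.

```python
import numpy as np

# Hypothetical helpers, not part of qats.


def gumbel_cdf(x, loc, scale):
    """F(x) = exp{-exp[-(x-loc)/scale]}"""
    z = (np.asarray(x, dtype=float) - loc) / scale
    return np.exp(-np.exp(-z))


def gumbel_pdf(x, loc, scale):
    """f(x) = dF/dx = exp(-z - exp(-z)) / scale, with z = (x-loc)/scale"""
    z = (np.asarray(x, dtype=float) - loc) / scale
    return np.exp(-z - np.exp(-z)) / scale


def gumbel_invcdf(p, loc, scale):
    """x(p) = loc - scale*ln(-ln p), for 0 < p < 1"""
    return loc - scale * np.log(-np.log(np.asarray(p, dtype=float)))


def gumbel_rnd(loc, scale, size=None, seed=None):
    """Inverse-transform sampling: x = invcdf(u) with u uniform on [0, 1)."""
    rng = np.random.default_rng(seed)
    return gumbel_invcdf(rng.uniform(size=size), loc, scale)


p = np.linspace(0.001, 0.999, 5)  # mirrors the default quantile range above
x = gumbel_invcdf(p, 10.0, 2.5)
sample = gumbel_rnd(10.0, 2.5, size=1000, seed=3)  # repeatable for a fixed seed
```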

pdf(x=None)

Probability density function for specified values x

Parameters:

x (array_like, optional) – Calculate probability density for these values

Returns:

Probability densities for specified values x

Return type:

array

Notes

A range of x values [loc, loc+3*std] is used if x is not specified.

plot(filename=None)

Plot cumulative distribution function

Parameters:

filename (str, optional) – Save plot as filename, default is to show plot on screen

plot_linear(filename=None)

Plot cumulative distribution function on linearized Gumbel scales

Parameters:

filename (str, optional) – Save plot as filename, default is to show plot on screen

rnd(size=None, seed=None)

Draw random samples from probability distribution

Parameters:
  • size (int|numpy shape, optional) – Sample size (by default, a single random value is returned)

  • seed (int, optional) – Seed for random number generator (default seed is random)

Returns:

Random sample

Return type:

array

Examples

Pick 1000 values randomly from a Gumbel distribution

>>> from qats.stats.gumbel import Gumbel
>>> g = Gumbel(loc, scale)
>>> sample = g.rnd(size=1000)

If you want to preset the seed for the random sampling (to be able to repeat the sampling)

>>> from qats.stats.gumbel import Gumbel
>>> g = Gumbel(loc, scale)
>>> sample = g.rnd(size=1000, seed=3)

Functions API

bootstrap(loc, scale, size, repetitions, method='pwm')

Quantify mean and coefficient of variation of Gumbel distribution parameters using parametric bootstrapping

Parameters:
  • loc (float) – Source distribution location parameter

  • scale (float) – Source distribution scale parameter

  • size (int) – Size of bootstrapped sample

  • repetitions (int) – Number of bootstrap samples

  • method (str, optional) –

    Method of fit. Options:

    • msm = method of sample moments

    • lse = least-square estimation

    • mle = maximum likelihood estimation

    • pwm = probability weighted moments (default)

Returns:

  • array – Mean distribution parameters

  • array – Coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. It allows estimation of the sampling distribution of almost any statistic using only very simple methods, and it belongs to the broader class of resampling methods. In parametric bootstrapping, samples of random numbers are drawn from a parametric model, here the Gumbel distribution defined by loc and scale, with each sample having the specified size. The quantity, or estimate, of interest is then calculated from each sample, and this process is repeated many times. If the results really matter, use as many samples as is reasonable given the available computing power and time. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors arising from the bootstrap procedure itself. See [5] about bootstrapping.
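A parametric bootstrap of this kind can be sketched in a few lines. The sketch assumes NumPy's `Generator.gumbel` for sampling and, for brevity, fits each sample by the method of sample moments, whereas qats defaults to pwm; `msm_fit` and `bootstrap_sketch` are hypothetical names, and the c.o.v. is taken as the standard deviation of the estimates divided by their mean.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def msm_fit(x):
    """Hypothetical helper: Gumbel (loc, scale) by the method of sample moments."""
    scale = np.std(x, ddof=1) * np.sqrt(6.0) / np.pi
    return np.mean(x) - EULER_GAMMA * scale, scale


def bootstrap_sketch(loc, scale, size, repetitions, seed=None):
    """Hypothetical sketch (not qats): mean and c.o.v. of (loc, scale) fitted
    to `repetitions` samples of `size` values drawn from Gumbel(loc, scale)."""
    rng = np.random.default_rng(seed)
    estimates = np.array(
        [msm_fit(rng.gumbel(loc, scale, size)) for _ in range(repetitions)]
    )  # shape (repetitions, 2): columns are loc and scale
    mean = estimates.mean(axis=0)
    cv = estimates.std(axis=0, ddof=1) / mean
    return mean, cv


# Same setting as the example below: 5 values per sample, 100 repetitions
m, cv = bootstrap_sketch(10, 2.5, 5, 100, seed=1)
```

With only 5 values per sample, the c.o.v. of the fitted scale parameter is large; repeating the call with a larger size shows how the scatter shrinks.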

Examples

To quantify the uncertainty (coefficient of variation) of a Gumbel distribution fitted to a sample with 5 values (using 100 repetitions):

>>> from qats.stats.gumbel import bootstrap
>>> m, cv = bootstrap(10, 2.5, 5, 100)

lse(x)

Fit Gumbel distribution parameters to sample by method of least-squares fit to empirical cdf

Parameters:

x (array_like) – data sample

Returns:

distribution loc and scale parameters

Return type:

floats

Notes

Uses an approximate median rank estimate for the empirical cdf.
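A least-squares fit on linearized Gumbel scales can be sketched by regressing the sorted sample on the reduced variate y = -ln(-ln Pi), since F(x) = Pi gives x = a + b*y. This is a hypothetical illustration: regressing x on y (rather than y on x) and the unweighted fit are assumptions, and qats may differ in both; `lse_sketch` is not a qats function.

```python
import numpy as np


def lse_sketch(x):
    """Hypothetical sketch (not qats): least-squares fit of x = loc + scale*y,
    with y = -ln(-ln P_i) the Gumbel reduced variate at median-rank positions."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - 0.3) / (n + 0.4)  # approximate median rank
    y = -np.log(-np.log(p))
    scale, loc = np.polyfit(y, x, 1)  # polyfit returns [slope, intercept]
    return loc, scale
```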

mle(x)

Fit Gumbel distribution parameters to sample by maximum likelihood estimation

Parameters:

x (array_like) – data sample

Returns:

distribution loc and scale parameters

Return type:

floats

Notes

The MLE equations are given in ‘Statistical Distributions’ by Forbes et al. (2010) and referred to in [4].
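For reference, setting the derivatives of the Gumbel log-likelihood to zero gives one equation in the scale b alone, after which the location a follows explicitly. The sketch below solves that equation with `scipy.optimize.brentq`; `mle_sketch` is a hypothetical name, it is not the qats implementation, and it needs a sample with at least two distinct values.

```python
import numpy as np
from scipy.optimize import brentq


def mle_sketch(x):
    """Hypothetical sketch (not qats): Gumbel (loc, scale) by maximum likelihood.

    The likelihood equations reduce to
        scale = mean(x) - sum(x*w) / sum(w),  w = exp(-x/scale)
        loc   = -scale * ln(mean(w))
    Shifting x by its minimum keeps every weight in (0, 1] and leaves
    the solution unchanged.
    """
    x = np.asarray(x, dtype=float)
    x0 = x.min()
    d = x - x0  # d >= 0, with d == 0 at the sample minimum
    mean_d = d.mean()

    def score(b):
        w = np.exp(-d / b)
        return b - mean_d + np.sum(d * w) / np.sum(w)

    # score(b) < 0 for very small b and > 0 at b = 2*max(d): bracket the root
    scale = brentq(score, 1e-6 * mean_d, 2.0 * d.max())
    loc = x0 - scale * np.log(np.mean(np.exp(-d / scale)))
    return loc, scale


rng = np.random.default_rng(0)
x = rng.gumbel(10.0, 2.5, 1000)
loc, scale = mle_sketch(x)
```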

msm(x)

Fit Gumbel distribution parameters to sample by method of sample moments

Parameters:

x (array_like) – data sample

Returns:

distribution loc and scale parameters

Return type:

floats

Notes

See description in [1] and [2].
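Matching the sample mean and standard deviation to mean = a + γ*b and std = π*b/sqrt(6) gives the estimates directly. A minimal sketch; `msm_sketch` is a hypothetical name, and the use of the unbiased standard deviation (ddof=1) is an assumption about what qats does.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def msm_sketch(x, ddof=1):
    """Hypothetical sketch (not qats): Gumbel (loc, scale) from the sample
    mean and standard deviation, inverting mean = loc + gamma*scale and
    std = pi*scale/sqrt(6)."""
    x = np.asarray(x, dtype=float)
    scale = np.sqrt(6.0) * x.std(ddof=ddof) / np.pi
    loc = x.mean() - EULER_GAMMA * scale
    return loc, scale
```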

plot_fits(data, filename=None, methods=None)

Plot data sample versus empirical and fitted cumulative distribution function on linearized Gumbel scales

Parameters:
  • data (array_like) – Data sample

  • filename (str, optional) – Save plot as filename, default is to show plot on screen

  • methods (tuple, optional) –

    Methods of fit. Options (default all):

    • msm = method of sample moments

    • lse = least-square estimation

    • mle = maximum likelihood estimation

    • pwm = probability weighted moments

pwm(x)

Fit Gumbel distribution parameters to sample by method of probability weighted moments [7].

Parameters:

x (array_like) – data sample

Returns:

distribution parameters location and scale

Return type:

tuple
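The probability weighted moment estimators of [7] can be sketched as follows, using the unbiased sample estimators of β0 = E[X] and β1 = E[X F(X)]; for the Gumbel distribution, 2*β1 - β0 = b*ln 2 and β0 = a + γ*b. `pwm_sketch` is a hypothetical name and this is not the qats code.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def pwm_sketch(x):
    """Hypothetical sketch (not qats): Gumbel (loc, scale) by probability
    weighted moments.

    b0 = mean(x) and b1 = (1/n) * sum_i (i-1)/(n-1) * x_(i), with x_(i) the
    i-th smallest value, estimate beta0 = E[X] and beta1 = E[X F(X)].
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    b0 = x.mean()
    b1 = np.sum(np.arange(n) / (n - 1) * x) / n  # weights (i-1)/(n-1), i = 1..n
    scale = (2.0 * b1 - b0) / np.log(2.0)
    return b0 - EULER_GAMMA * scale, scale
```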

qats.stats.gumbelmin

GumbelMin class and functions related to Gumbel (minima) distribution.

Classes and functions overview

GumbelMin([loc, scale, data])

The Gumbel minima distribution.

lse(x)

Fit distribution parameters to sample by method of least-squares fit to empirical cdf

mle(x)

Fit distribution parameters to sample by maximum likelihood estimation

msm(x)

Fit distribution parameters to sample by method of sample moments

Class API

class GumbelMin(loc=None, scale=None, data=None)

The Gumbel minima distribution.

The cumulative distribution function is defined as:

F(x) = 1 - exp{-exp[(x-a)/b]}

where a is the location parameter and b is the scale parameter.

Parameters:
  • loc (float) – Gumbel location parameter.

  • scale (float) – Gumbel scale parameter.

  • data (array_like, optional) – Sample data, used to establish empirical cdf and is included in plots. To fit the Gumbel distribution to the sample data, use GumbelMin.fit().

Attributes:
  • loc (float) – Gumbel location parameter.

  • scale (float) – Gumbel scale parameter.

  • data (array_like) – Sample data.

Examples

To create an instance based on parameters, use:

>>> from qats.stats.gumbelmin import GumbelMin
>>> gumb = GumbelMin(loc, scale)

If you need to establish a GumbelMin instance based on a sample data set, use:

>>> gumb = GumbelMin.fit(data, method='msm')

References

  1. Statistical models in applied science, Bury, K.V., 1975, Wiley, New York

  2. Bruk av asymptotiske ekstremverdifordelinger, Haver, S., 2007

  3. Plotting positions, About plotting positions

  4. Usable estimators for parameters in Gumbel distribution

  5. Bootstrapping statistics

Properties

cov

Distribution coefficient of variation (C.O.V.)

ecdf

Median rank empirical cumulative distribution function associated with the sample

kurt

Distribution kurtosis

mean

Distribution mean value

median

Distribution median value

mode

Distribution mode value

skew

Distribution skewness

std

Distribution standard deviation

Methods

bootstrap([size, method, N])

Parametric bootstrapping of source distribution

cdf([x])

Cumulative distribution function (cumulative probability) for specified values x

fit([data, method, verbose])

Determine distribution parameters by fit to sample.

fit_from_weibull_parameters(wa, wb, wc, n[, ...])

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

gp_plot([showfig, save])

Plot data on Gumbel paper (linearized scales)

invcdf([p])

Inverse cumulative distribution function for specified quantiles p

pdf([x])

Probability density function for specified values x

plot([showfig, save])

Plot data on regular scales

rnd([size, seed])

Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:

c – distribution c.o.v.

Return type:

float

property ecdf

Median rank empirical cumulative distribution function associated with the sample

Notes

Gumbel recommended the following mean rank quantile formulation Pi = i/(n+1). This formulation produces a symmetrical CDF in the sense that the same plotting positions will result from the data regardless of whether they are assembled in ascending or descending order.

A more sophisticated median rank formulation Pi = (i-0.3)/(n+0.4) approximates the median of the distribution-free estimate of the sample variate to about 0.1% and, even for small values of n, produces parameter estimates comparable to those obtained by maximum likelihood estimation (Bury, 1999, p. 43). This median rank formulation is the one used for the empirical cdf, following [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis

Returns:

k – distribution kurtosis

Return type:

float

property mean

Distribution mean value

Returns:

m – distribution mean value

Return type:

float

property median

Distribution median value

Returns:

m – distribution median value

Return type:

float

property mode

Distribution mode value

Returns:

m – distribution mode value

Return type:

float

property std

Distribution standard deviation

Returns:

s – distribution standard deviation

Return type:

float

property skew

Distribution skewness

Returns:

s – distribution skewness

Return type:

float

bootstrap(size=None, method='msm', N=100)

Parametric bootstrapping of source distribution

Parameters:
  • size (int) – Bootstrap sample size. Default is the source sample size.

  • method ({'msm','lse','mle'}) –

    Method of fit, optional. Options:

    • msm = method of sample moments (default)

    • lse = least-square estimation

    • mle = maximum likelihood estimation

  • N (int) – Number of bootstrap samples. Default is 100.

Returns:

  • array_like – m, mean distribution parameters

  • array_like – cv, coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. It allows estimation of the sampling distribution of almost any statistic using only very simple methods, and it belongs to the broader class of resampling methods. In parametric bootstrapping, a parametric model is fitted to the data, and samples of random numbers with the same size as the original data are drawn from this fitted model. The quantity, or estimate, of interest is then calculated from each sample, and this process is repeated many times. If the results really matter, use as many samples as is reasonable given the available computing power and time. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors arising from the bootstrap procedure itself. See [5] about bootstrapping.

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:

x (array_like) – values

Returns:

cdf – cumulative probabilities for specified values x

Return type:

array

Notes

A range of x values [location, location+3*std] is used if x is not specified.

fit(data=None, method='msm', verbose=False)

Determine distribution parameters by fit to sample.

Parameters:
  • data (array_like) – sample, optional

  • method ({'msm','lse','mle'}) –

    Method of fit, optional. Options:

    • msm = method of sample moments (default)

    • lse = least-square estimation

    • mle = maximum likelihood estimation

  • verbose (bool) – turn on output of fitted parameters

Notes

If data is not given, the data stored in the object (self.data) is used.

fit_from_weibull_parameters(wa, wb, wc, n, verbose=False)

Calculate Gumbel distribution parameters from n independent Weibull distributed variables.

Parameters:
  • wa (float) – Weibull location parameter

  • wb (float) – Weibull scale parameter

  • wc (float) – Weibull shape parameter

  • n (int) – Number of independently distributed variables

  • verbose (bool) – print fitted parameters

Notes

A warning is issued if the Weibull shape parameter is less than 1. In this case, convergence towards the asymptotic extreme value distribution is slow, and the asymptotic distribution is non-conservative relative to the exact distribution. The asymptotic distribution is correct for a Weibull shape equal to 1 and conservative for a Weibull shape larger than 1. These deviations diminish with larger samples. See [1, p. 380].

gp_plot(showfig=True, save=None)

Plot data on Gumbel paper (linearized scales)

Parameters:
  • showfig (bool) – show figure immediately on screen, default True

  • save (filename) – save figure to file, default None

invcdf(p=None)

Inverse cumulative distribution function for specified quantiles p

Parameters:

p (array_like) – quantiles (i.e. cumulative probabilities)

Returns:

x – values corresponding to the specified quantiles

Return type:

array

Notes

A range of quantiles from 0.001 to 0.999 is used if quantiles are not specified.

pdf(x=None)

Probability density function for specified values x

Parameters:

x (array_like) – values

Returns:

pdf – probability density function for specified values x

Return type:

array

Notes

A range of x values [location, location+3*std] is used if x is not specified.

plot(showfig=True, save=None)

Plot data on regular scales

Parameters:
  • showfig (bool) – show figure immediately on screen, default True

  • save (filename including suffix) – save figure to file, default None

rnd(size=None, seed=None)

Draw random samples from probability distribution

Parameters:
  • size (int|numpy shape, optional) – sample size (by default, a single random value is returned)

  • seed (int) – seed for random number generator (default seed is random)

Returns:

x – random sample

Return type:

array

Functions API

lse(x)

Fit distribution parameters to sample by method of least-squares fit to empirical cdf

Parameters:

x (array_like) – sample

Notes

Uses an approximate median rank estimate for the empirical cdf.

mle(x)

Fit distribution parameters to sample by maximum likelihood estimation

Parameters:

x (array_like) – sample

Notes

The MLE equations are given in ‘Statistical Distributions’ by Forbes et al. (2010) and referred to in [4].

msm(x)

Fit distribution parameters to sample by method of sample moments

Parameters:

x (array_like) – sample

Notes

See description in [1] and [2].
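Because F(x) = 1 - exp{-exp[(x-a)/b]} is the mirror image of the Gumbel maxima distribution, the moment equations change only in the sign of the γ term: mean = a - γ*b and std = π*b/sqrt(6). A minimal sketch; `msm_min_sketch` is a hypothetical name, ddof=1 is an assumption, and the example sample is generated by negating NumPy Gumbel (maxima) draws.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def msm_min_sketch(x, ddof=1):
    """Hypothetical sketch (not qats): GumbelMin (loc, scale) for
    F(x) = 1 - exp{-exp[(x-loc)/scale]}, using mean = loc - gamma*scale
    (note the minus sign) and std = pi*scale/sqrt(6)."""
    x = np.asarray(x, dtype=float)
    scale = np.sqrt(6.0) * x.std(ddof=ddof) / np.pi
    return x.mean() + EULER_GAMMA * scale, scale


# If X follows the Gumbel maxima distribution with (loc=-a, scale=b),
# then -X follows the Gumbel minima distribution with (loc=a, scale=b).
rng = np.random.default_rng(0)
minima = -rng.gumbel(-5.0, 2.0, 20000)
loc, scale = msm_min_sketch(minima)
```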

qats.stats.weibull

Weibull class and functions related to Weibull distribution.

Classes and functions overview

Weibull(loc, scale, shape[, data])

The Weibull class offers miscellaneous functions for working with the Weibull distribution.

bootstrap(loc, scale, shape, size, repetitions)

Quantify mean and coefficient of variation of Weibull distribution parameters using parametric bootstrapping

lse(x[, threshold])

Fit Weibull distribution parameters to sample by method of least-squares fit to empirical cdf.

mle(x)

Fit Weibull distribution parameters to sample by maximum likelihood estimation

mlj(sample, l, j)

Probability weighted moment Mljk of observation order l and cdf order j, with emphasis on the right/upper tail (k=0).

msm(x)

Fit Weibull distribution parameters to sample by method of sample moments

plot_fit(x, params[, path])

Plot data sample versus empirical and fitted cumulative distribution function on linearized Weibull scales

pwm(x)

Fit distribution parameters to sample by method of probability weighted moments

pwm2(x)

Fit distribution parameters to sample by method of probability weighted moments assuming the location parameter is zero.

weibull2gumbel(loc, scale, shape, n)

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.

Class API

class Weibull(loc, scale, shape, data=None)

The Weibull class offers miscellaneous functions for working with the Weibull distribution, whose cumulative distribution function is defined as:

F(x) = 1 - exp{-[(x-a)/b]^c}

where a is location parameter, b is scale parameter and c is shape parameter.

Parameters:
  • loc (float) – Weibull location parameter.

  • scale (float) – Weibull scale parameter.

  • shape (float) – Weibull shape parameter.

  • data (array_like, optional) – Sample data, used to establish empirical cdf and is included in plots. To fit the Weibull distribution to the sample data, use Weibull.fit().

Attributes:
  • loc (float) – Weibull location parameter.

  • scale (float) – Weibull scale parameter.

  • shape (float) – Weibull shape parameter.

  • data (array_like) – Sample data. Exists only if distribution parameters are estimated from a sample.

Notes

For a 2-parameter Weibull distribution, set the location parameter to 0 (zero).

Examples

To create an instance based on parameters, use:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull(loc, scale, shape)

If you need to establish a Weibull instance based on a sample data set, use:

>>> weib = Weibull.fit(data, method='pwm')

References

  1. Moment estimators for Weibull parameters and their asymptotic efficiencies, Waloddi Weibull, April 1969, Lausanne Switzerland, Technical report AFML-TR-69-135

  2. Continuous univariate distributions, Volume 1, N.L. Johnson, S. Kotz and N. Balakrishnan, 1994, John Wiley and Sons Inc.

  3. weibull.com, About location parameter

  4. Plotting positions, About plotting positions

  5. Bootstrapping, Bootstrapping statistics

  6. Estimation of the generalized extreme value distribution by the method of probability weighted moments, Hosking, J. R. M., Wallis, J. R. and Wood, E. F., 1985, Technometrics, 27, pp. 251-261

  7. Estimating the three-parameter Weibull distribution by the method of probability weighted moments with application to medical survival data, Bortolucci, A.A. et al.

  8. Theory and derivation for Weibull parameter probability weighted moment estimators, Grender, J.M., Dell, T.R., Reich, R.M., 1991, United States Department of Agriculture

  9. Probability weighted moments, Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research, 15(5): 1049-1054.

  10. Probability weighted moments compared with some traditional techniques in estimating Gumbel parameters and quantiles, Landwehr, J.M.; Matalas, N.C.; Wallis, J.R., 1979, Water Resources Research, 15(5): 1063-1064.

Properties

cov

Distribution coefficient of variation (C.O.V.)

ecdf

Empirical cumulative distribution function associated with the sample.

kurt

Distribution kurtosis.

mean

Distribution mean value

mse

Mean squared error of fitted cumulative distribution (a,b,c) and empirical distribution

params

Distribution parameters.

skew

Distribution skewness

std

Distribution standard deviation

Methods

cdf([x])

Cumulative distribution function (cumulative probability) for specified values x

fit(data[, method, verbose])

Establish Weibull class instance by fit to sample.

fromsignal(x[, method, verbose])

Establish Weibull class instance by fit to global maxima from time series signal.

gumbel_parameters([n])

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.

invcdf([p])

Inverse cumulative distribution function for specified quantiles p

pdf([x])

Probability density function for specified values x

plot([filename])

Plot data on regular scales

plot_linear([filename])

Plot data on Weibull paper (linearized scales)

rnd([size, seed])

Draw random samples from probability distribution

property cov

Distribution coefficient of variation (C.O.V.)

Returns:

distribution c.o.v.

Return type:

float

property ecdf

Empirical cumulative distribution function associated with the sample.

Returns:

Empirical cumulative distribution function.

Return type:

array

Notes

A mean rank method is chosen to approximate the mean of the distribution [2].

The empirical cdf is also used as plotting positions when plotting the sample on probability paper.

property kurt

Distribution kurtosis.

Returns:

distribution kurtosis

Return type:

float

property mean

Distribution mean value

Returns:

distribution mean value

Return type:

float

property mse

Mean squared error of fitted cumulative distribution (a,b,c) and empirical distribution

Returns:

mean squared error

Return type:

float

property params

Distribution parameters.

Returns:

Distribution parameters: (loc, scale, shape).

Return type:

tuple

property skew

Distribution skewness

Returns:

distribution skewness

Return type:

float

property std

Distribution standard deviation

Returns:

distribution standard deviation

Return type:

float
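The properties above have closed forms in terms of the gamma function Γ. A standalone sketch assuming the textbook formulas for F(x) = 1 - exp{-[(x-a)/b]^c}; `weibull_summary` is a hypothetical helper, not part of qats, and kurtosis is omitted because the convention used by kurt (kurtosis or excess kurtosis) is not stated here.

```python
import math


def weibull_summary(loc, scale, shape):
    """Hypothetical helper (not qats): closed-form statistics of
    F(x) = 1 - exp{-[(x-loc)/scale]^shape}."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    g3 = math.gamma(1.0 + 3.0 / shape)
    var_unit = g2 - g1 ** 2  # variance for loc=0, scale=1
    mean = loc + scale * g1
    std = scale * math.sqrt(var_unit)
    return {
        "mean": mean,
        "std": std,
        "cov": std / mean,
        # the skewness depends on the shape parameter only
        "skew": (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / var_unit ** 1.5,
    }


# Same parameters as the plot() example further down
s = weibull_summary(100.0, 15.0, 2.5)
```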

gumbel_parameters(n=None)

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of N independent, Weibull distributed variables.

Parameters:

n (int) – number of independent Weibull distributed variables, default equal to number of peaks (self.data.size)

Returns:

Gumbel location and scale parameters

Return type:

tuple

Notes

If the sample is based on, say, a 30-hour simulation but you seek an estimate of the 3-hour extreme value, n should be calculated as the nearest integer to:

n = (3 / 30) * nx

where nx is the total number of maxima during the 30 hours.
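The asymptotic parameters can be sketched from the standard result for the largest of n independent Weibull variables: the Gumbel location is the characteristic largest value u with F(u) = 1 - 1/n, and the Gumbel scale is 1/(n f(u)), where f is the Weibull density. This is an illustration, not necessarily how qats computes gumbel_parameters() or Gumbel.fit_from_weibull_parameters(); `weibull_to_gumbel` is a hypothetical name, and the number of maxima in the usage lines is a made-up number.

```python
import math


def weibull_to_gumbel(loc, scale, shape, n):
    """Hypothetical sketch (not qats): asymptotic Gumbel (maxima) parameters
    for the largest of n independent Weibull variables with
    F(x) = 1 - exp{-[(x-loc)/scale]^shape}.

    Gumbel location: u with F(u) = 1 - 1/n, i.e. [(u-loc)/scale]^shape = ln n
    Gumbel scale:    1/(n f(u)) = (scale/shape) * (ln n)^(1/shape - 1)
    """
    ln_n = math.log(n)
    gumbel_loc = loc + scale * ln_n ** (1.0 / shape)
    gumbel_scale = scale / shape * ln_n ** (1.0 / shape - 1.0)
    return gumbel_loc, gumbel_scale


# Made-up numbers: 2400 maxima in a 30-hour record, 3-hour extreme wanted
n = round(3 / 30 * 2400)  # nearest integer, as in the note above
loc3h, scale3h = weibull_to_gumbel(0.0, 1.0, 2.0, n)
```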

References

  1. Bury, K.V. (1975), “Statistical models in applied science”

cdf(x=None)

Cumulative distribution function (cumulative probability) for specified values x

Parameters:

x (array_like) – values

Returns:

cumulative probabilities for specified values x

Return type:

array

Notes

A range of x values is used if x is not specified.

classmethod fit(data, method='msm', verbose=False)

Establish Weibull class instance by fit to sample.

Parameters:
  • data (array_like) – Sample.

  • method (str, optional) –

    Method of fit. Available options:

    • msm = method of sample moments (default)

    • lse = least-square estimation

    • mle = maximum likelihood estimation

    • pwm = probability weighted moments

    • pwm2 = probability weighted moments, 2-parameter distribution

  • verbose (bool, optional) – If True, fitted parameters are printed to screen.

Returns:

Weibull class instance

Return type:

Weibull

Examples

Assuming data is a sample array/list:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull.fit(data, method="msm")

classmethod fromsignal(x, method='msm', verbose=False)

Create a Weibull class instance by fitting the distribution to the global maxima of a time series signal.

Parameters:
  • x (array_like) – Time series signal.

  • method (str, optional) – Method of fit. See Weibull.fit() for description of options.

  • verbose (bool, optional) – If True, fitted parameters are printed to screen.

Returns:

Class instance.

Return type:

Weibull

See also

Weibull.fit, qats.signal.find_maxima

Examples

Assuming x is a time series signal:

>>> from qats.stats.weibull import Weibull
>>> weib = Weibull.fromsignal(x, method='msm')

Note that the example above is equivalent to:

>>> from qats.signal import find_maxima
>>> sample, _ = find_maxima(x, local=False, threshold=None, up=True)
>>> weib = Weibull.fit(sample, method='msm')

invcdf(p=None)

Inverse cumulative distribution function for specified quantiles p

Parameters:

p (array_like) – Quantiles (i.e. cumulative probabilities)

Returns:

values corresponding to the specified quantiles

Return type:

array

Notes

If p is not specified, a range of quantiles from 0 to 1 is used.
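Inverting the three-parameter Weibull CDF gives x = loc + scale * (-ln(1 - p))^(1/shape). A sketch under the same parameterization assumption, with a hypothetical helper name:

```python
import numpy as np


def weibull_invcdf(p, loc, scale, shape):
    """Values x such that F(x) = p for a three-parameter Weibull distribution."""
    p = np.asarray(p, dtype=float)
    # log1p(-p) evaluates ln(1 - p) accurately for small p; p = 1 gives +inf
    return loc + scale * (-np.log1p(-p)) ** (1.0 / shape)
```

The quantile p = 0 maps to the location parameter, and p = 1 - exp(-1) maps to loc + scale.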

pdf(x=None)

Probability density function for specified values x

Parameters:

x (array_like) – values

Returns:

probability density function for specified values x

Return type:

array

Notes

If x is not specified, a range of x values is used.
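Differentiating the three-parameter Weibull CDF gives the density f(x) = (shape/scale) * ((x - loc)/scale)^(shape - 1) * exp(-((x - loc)/scale)^shape) for x > loc. A NumPy sketch, assuming this parameterization (hypothetical helper name):

```python
import numpy as np


def weibull_pdf(x, loc, scale, shape):
    """Three-parameter Weibull probability density evaluated at x (array_like)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    pdf = np.zeros_like(x)
    # Evaluate the formula only above loc; elsewhere the density is set to zero
    # (at x == loc this is a convention that is exact only for shape > 1)
    above = x > loc
    z = (x[above] - loc) / scale
    pdf[above] = shape / scale * z ** (shape - 1.0) * np.exp(-z ** shape)
    return pdf
```

At x = loc + scale the density equals (shape/scale) * exp(-1).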

plot(filename=None)

Plot data on regular scales

Parameters:

filename (str, optional) – Save the plot to this file. By default the plot is shown on screen.

Examples

Plot distribution and show the figure

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot()

Plot distribution and save the figure as png

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot(filename="plot.png")

plot_linear(filename=None)

Plot data on Weibull paper (linearized scales)

Parameters:

filename (str, optional) – Save the plot to this file. By default the plot is shown on screen.

Examples

Plot distribution and show the figure

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot_linear()

Plot distribution and save the figure as png

>>> from qats.stats.weibull import Weibull
>>> distribution = Weibull(100., 15., 2.5)
>>> distribution.plot_linear(filename="plot.png")

References

  1. Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994), “Continuous univariate distributions, Volume 1”, John Wiley & Sons.
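The linearized (Weibull paper) scales follow from taking the logarithm of the three-parameter Weibull CDF twice. This assumes the parameterization F(x) = 1 - exp(-((x - loc)/scale)^shape):

```latex
F(x) = 1 - \exp\left[-\left(\frac{x - \mathrm{loc}}{\mathrm{scale}}\right)^{\mathrm{shape}}\right]
\quad\Longrightarrow\quad
\ln\left[-\ln\bigl(1 - F(x)\bigr)\right]
  = \mathrm{shape}\,\ln(x - \mathrm{loc}) - \mathrm{shape}\,\ln(\mathrm{scale})
```

Plotting ln[-ln(1 - F)] against ln(x - loc) therefore gives a straight line with slope equal to the shape parameter and intercept -shape * ln(scale).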

rnd(size=None, seed=None)

Draw random samples from the probability distribution

Parameters:
  • size (int|numpy shape, optional) – Sample size or shape. By default a single random value is returned.

  • seed (int, optional) – Seed for the random number generator. By default the seed is random.

Returns:

random sample

Return type:

array
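Random samples can be drawn by inverse transform sampling: if U is uniform on [0, 1), then loc + scale * (-ln(1 - U))^(1/shape) is Weibull distributed. The sketch below uses NumPy's Generator API. Whether qats uses this exact scheme or generator is an assumption:

```python
import numpy as np


def weibull_rnd(loc, scale, shape, size=None, seed=None):
    """Draw Weibull distributed random values by inverse transform sampling."""
    rng = np.random.default_rng(seed)
    # rng.random draws from [0, 1), so 1 - u lies in (0, 1] and the log stays finite
    u = rng.random(size)
    return loc + scale * (-np.log1p(-u)) ** (1.0 / shape)
```

Passing the same seed reproduces the same sample, which is convenient when comparing fitting methods on identical data.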

Functions API

bootstrap(loc, scale, shape, size, repetitions, method='pwm')

Quantify mean and coefficient of variation of Weibull distribution parameters using parametric bootstrapping

Parameters:
  • loc (float) – Source distribution location parameter

  • scale (float) – Source distribution scale parameter

  • shape (float) – Source distribution shape parameter

  • size (int) – Size of bootstrapped sample

  • repetitions (int) – Number of bootstrap samples

  • method (str, optional) –

    Method of fit. Available options:

    • msm = method of sample moments

    • lse = least-square estimation

    • mle = maximum likelihood estimation

    • pwm = probability weighted moments (default)

Returns:

  • array – Mean distribution parameters

  • array – Coefficient of variation of distribution parameters

Notes

In statistics, bootstrapping is a method for assigning measures of accuracy (variance, quantiles) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using only very simple methods. Generally, it falls in the broader class of resampling methods. In parametric bootstrapping, a parametric model is fitted to the data, and samples of random numbers with the same size as the original data are drawn from this fitted model. The quantity, or estimate, of interest is then calculated from these samples. Here the source distribution is specified directly by loc, scale and shape. As for other bootstrap methods, this sampling process is repeated many times. If the results really matter, as many samples as is reasonable, given available computing power and time, should be used. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors that arise from the bootstrap procedure itself. See [5] about bootstrapping.
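The parametric bootstrap described above can be sketched as follows. Here `fit` is a hypothetical callable that returns a parameter tuple for a sample, for instance one of the estimators documented on this page. Inverse transform sampling, NumPy's Generator and the sample standard deviation (ddof=1) are choices made for this sketch, not necessarily what qats does internally:

```python
import numpy as np


def parametric_bootstrap(loc, scale, shape, size, repetitions, fit, seed=None):
    """Mean and coefficient of variation of parameters re-estimated from Weibull samples."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(repetitions):
        # Draw a synthetic sample of the requested size from the source distribution
        u = rng.random(size)
        sample = loc + scale * (-np.log1p(-u)) ** (1.0 / shape)
        # Re-estimate the parameters from the synthetic sample
        estimates.append(fit(sample))
    estimates = np.asarray(estimates, dtype=float)
    mean = estimates.mean(axis=0)
    cv = estimates.std(axis=0, ddof=1) / mean  # coefficient of variation
    return mean, cv
```

Any estimator with the signature fit(sample) -> tuple can be plugged in, which keeps the resampling loop independent of the fitting method.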

Examples

To quantify the uncertainty (coefficient of variation) of a Weibull distribution fitted to a sample with 5 values (using 100 repetitions):

>>> from qats.stats.weibull import bootstrap
>>> m, cv = bootstrap(10., 5., 2.5, 5, 100)

lse(x, threshold: float | None = None)

Fit Weibull distribution parameters to sample by method of least square fit to empirical cdf.

Parameters:
  • x (array_like) – sample data

  • threshold (float, optional) – Fit the distribution to data points above this threshold. The threshold is given as a value in the interval (0, 1) of the empirical CDF, so with threshold=0.87 the distribution is fitted to the values exceeding the 0.87-quantile of the empirical cumulative distribution function.

Returns:

Distribution parameters (loc, scale, shape).

Return type:

tuple (floats)

Notes

Uses what are known as (approximate) mean rank estimates for the empirical cdf.
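A simplified sketch of a least-squares fit on the linearized Weibull scales follows. Unlike the three-parameter fit documented above, it holds the location parameter fixed (a hypothetical `loc` argument, default 0). It solves a straight-line regression of ln(-ln(1 - F)) on ln(x - loc), using mean-rank plotting positions i/(n+1) and the same threshold convention:

```python
import numpy as np


def lse_fixed_loc(x, loc=0.0, threshold=None):
    """Least-squares fit of (scale, shape) on Weibull paper with a fixed location (sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Mean-rank estimates of the empirical CDF
    p = np.arange(1, n + 1) / (n + 1.0)
    keep = x > loc
    if threshold is not None:
        # Only use points above the given quantile of the empirical CDF
        keep &= p > threshold
    xx = np.log(x[keep] - loc)
    yy = np.log(-np.log1p(-p[keep]))
    # Straight line: yy = shape * xx - shape * ln(scale)
    shape, intercept = np.polyfit(xx, yy, 1)
    scale = np.exp(-intercept / shape)
    return loc, scale, shape
```

For data lying exactly on a Weibull distribution with the given location, the regression recovers the scale and shape parameters exactly, with or without a threshold.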

mle(x)

Fit Weibull distribution parameters to sample by maximum likelihood estimation

Parameters:

x (array_like) – sample data

Returns:

Distribution parameters (loc, scale, shape).

Return type:

tuple (floats)
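The sketch below simplifies the three-parameter estimation documented above by holding the location parameter fixed (an assumed `loc` argument). With y = x - loc, the maximum likelihood estimates then follow from a single equation in the shape parameter k: 1/k + mean(ln y) - sum(y^k ln y)/sum(y^k) = 0, after which scale = (mean(y^k))^(1/k). The left-hand side decreases monotonically in k, so bisection finds the unique root:

```python
import numpy as np


def mle_fixed_loc(x, loc=0.0, lo=1e-2, hi=1e2, iterations=200):
    """Maximum likelihood (scale, shape) of a Weibull distribution with fixed location (sketch)."""
    y = np.asarray(x, dtype=float) - loc
    if np.any(y <= 0.0):
        raise ValueError("all observations must exceed loc")
    # Rescale to a maximum of 1 to avoid overflow in y**k; the shape equation is unaffected
    ymax = y.max()
    ly = np.log(y / ymax)
    mean_ly = ly.mean()

    def g(k):
        w = np.exp(k * ly)  # (y / ymax)**k
        return 1.0 / k + mean_ly - np.sum(w * ly) / np.sum(w)

    # g decreases monotonically in k, so bisection converges to the unique root
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = ymax * np.mean(np.exp(shape * ly)) ** (1.0 / shape)
    return loc, scale, shape
```

If the root lies outside [lo, hi], the result is clamped to the nearest bracket end.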

mlj(sample, l, j)

Probability weighted moment Mljk of observation order l and cdf order j, with emphasis on the right/upper tail (k=0).

Parameters:
  • sample (array) – Sample.

  • l (int) – Order of observation (sample).

  • j (int) – Order of cumulative distribution function.

Returns:

Probability weighted moment

Return type:

float

Notes

The probability weighted moment Mljk is defined by Greenwood and others (1979) as:

M_{l,j,k} = E[X^l * F^j * (1-F)^k]

where X(F) is the inverse form of the distribution (the quantile function) and F is the cumulative distribution function. When j=k=0 and l is a non-negative integer, M_{l,0,0} is the conventional moment of order l about the origin.

PWMs can be applied either when the small observations are more important than the large observations (j=0), as for strength properties of materials, or when the large observations should have more influence than the small observations (k=0), as in tree diameter distribution modelling. Here the latter is chosen, and unbiased estimators are derived for the moments M_{l,j,0} (k=0), see eq. 32 in [8]:

M_{l,j,0} = (1 / n) * sum(x[i]^l * binom(i-1, j) / binom(n-1, j))

where i is a counter from j+1 to n, x[i] is the i-th smallest observation (the sample sorted in ascending order) and binom() is the binomial coefficient.
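A direct transcription of the estimator above is sketched below, with the sample sorted in ascending order and ranks counted from 1. The function name `pwm_moment` is hypothetical, and this is not the qats implementation:

```python
from math import comb

import numpy as np


def pwm_moment(sample, l, j):
    """Unbiased estimate of the probability weighted moment M_{l,j,0} (sketch)."""
    x = np.sort(np.asarray(sample, dtype=float))  # ascending order statistics
    n = x.size
    total = 0.0
    for i in range(j + 1, n + 1):  # 1-based rank i runs from j+1 to n
        total += x[i - 1] ** l * comb(i - 1, j) / comb(n - 1, j)
    return total / n
```

With j = 0 the estimator reduces to the ordinary sample moment of order l; for the sample [3, 1, 2], M_{1,1,0} evaluates to (2*1 + 3*2) / 2 / 3 = 4/3.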

msm(x)

Fit Weibull distribution parameters to sample by method of sample moments

Parameters:

x (array_like) – sample data

Returns:

The loc, scale and shape distribution parameters

Return type:

floats

Notes

See description in [1].
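The sketch below shows one common method-of-moments formulation. The shape parameter is chosen so that the Weibull skewness matches the sample skewness, then the scale follows from the standard deviation and the location from the mean. It assumes the Weibull skewness decreases monotonically with shape over the search bracket. The formulation in [1] may differ, and the function names are hypothetical:

```python
import math

import numpy as np


def _weibull_skew(k):
    g1, g2, g3 = (math.gamma(1.0 + i / k) for i in (1, 2, 3))
    return (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5


def msm_sketch(x, lo=0.1, hi=20.0, iterations=100):
    """Method-of-moments (loc, scale, shape) from sample mean, std and skewness (sketch)."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    skew = np.mean((x - mean) ** 3) / std ** 3
    # Weibull skewness decreases with shape, so bisect on shape to match the sample skewness
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if _weibull_skew(mid) > skew:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    scale = std / math.sqrt(g2 - g1 ** 2)
    loc = mean - scale * g1
    return loc, scale, shape
```

Sample skewness is noisy for small samples, which is one reason method-of-moments estimates of the shape parameter can scatter considerably.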

plot_fit(x: ndarray, params: tuple, path: str | None = None)

Plot data sample versus empirical and fitted cumulative distribution function on linearized Weibull scales

Parameters:
  • x (array_like) – Data sample

  • params (tuple) – location, scale and shape parameter of the Weibull distribution

  • path (str, optional) – Save figure to file instead of displaying it.

pwm(x)

Fit Weibull distribution parameters to sample by method of probability weighted moments

Parameters:

x (array_like) – sample data

Returns:

The loc, scale and shape distribution parameters

Return type:

floats

Notes

Details on probability weighted moments are provided in [8].

See also

mlj

pwm2(x)

Fit Weibull distribution parameters to sample by method of probability weighted moments, assuming the location parameter is zero.

Parameters:

x (array_like) – sample data

Returns:

The scale and shape distribution parameters

Return type:

floats

Notes

Details on probability weighted moments are provided on p.14-15 in [8]. Note that only the scale and shape parameters are estimated; the location parameter is assumed to be zero.
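The equations in [8] are not reproduced here, so the following is an independent sketch that may differ from the qats implementation. For a two-parameter Weibull distribution, E[X (1 - F)^k] = scale * Gamma(1 + 1/shape) / (k + 1)^(1 + 1/shape). Combined with M_{1,0,1} = M_{1,0,0} - M_{1,1,0}, this yields closed-form estimates from the unbiased moments b0 = M_{1,0,0} and b1 = M_{1,1,0}. The function name is hypothetical:

```python
import math

import numpy as np


def pwm2_sketch(x):
    """Two-parameter Weibull (scale, shape) from probability weighted moments (sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    b0 = x.mean()                               # M_{1,0,0}
    b1 = np.sum((i - 1) / (n - 1) * x) / n      # unbiased M_{1,1,0}
    m101 = b0 - b1                              # M_{1,0,1} = E[X (1 - F)]
    # For a two-parameter Weibull, M_{1,0,0} / M_{1,0,1} = 2 ** (1 + 1/shape)
    shape = math.log(2.0) / (math.log(b0 / m101) - math.log(2.0))
    scale = b0 / math.gamma(1.0 + 1.0 / shape)
    return scale, shape
```

The estimator for b1 is the j = 1 case of the unbiased formula given under mlj.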

See also

mlj, pwm

weibull2gumbel(loc, scale, shape, n)

Calculate parameters of the asymptotic Gumbel extreme value distribution (Type 1) for the extreme value of n independent, Weibull-distributed variables.

Parameters:
  • loc (float) – Weibull distribution location parameter

  • scale (float) – Weibull distribution scale parameter

  • shape (float) – Weibull distribution shape parameter

  • n (int) – Number of independent Weibull-distributed variables

Returns:

Gumbel location and scale parameters

Return type:

tuple

Notes

If the sample is based on, say, a 30-hour simulation but you seek an estimate of the 3-hour extreme value, n should be calculated as the nearest integer to:

n = (3 / 30) * nx

where nx is the total number of maxima during the 30 hours.
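Consider a Weibull distribution with CDF F(x) = 1 - exp(-((x - loc)/scale)^shape). The standard asymptotic (Type 1) result takes the Gumbel location as the characteristic largest value u_n, exceeded with probability 1/n: u_n = loc + scale * (ln n)^(1/shape). The Gumbel scale is 1/(n f(u_n)) = scale/shape * (ln n)^(1/shape - 1). A sketch of these expressions follows. That qats uses exactly this form is an assumption, and the guard requiring n >= 2 is added here:

```python
import math


def weibull_to_gumbel(loc, scale, shape, n):
    """Asymptotic Gumbel (location, scale) for the maximum of n Weibull variables."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln(n) > 0")
    ln_n = math.log(n)
    # Characteristic largest value: exceeded with probability 1/n in the parent distribution
    gumbel_loc = loc + scale * ln_n ** (1.0 / shape)
    # Inverse of the intensity n * f(gumbel_loc)
    gumbel_scale = scale / shape * ln_n ** (1.0 / shape - 1.0)
    return gumbel_loc, gumbel_scale
```

With shape = 1 (exponential parent) this reduces to the familiar result: Gumbel location loc + scale * ln n and Gumbel scale equal to the parent scale.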

References

  1. Bury, K.V. (1975), “Statistical models in applied science”