public class EmpiricalDistribution extends AbstractRealDistribution
Represents an empirical probability distribution -- a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.
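An EmpiricalDistribution maintains data structures, called distribution digests, that describe empirical distributions and support the following operations: loading the distribution from observed data (load), reporting summary statistics for the full dataset and for each bin (getSampleStats, getBinStats), and generating random values from the distribution (getNextValue).
Applications can use EmpiricalDistribution to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file, i.e., the values generated will follow the distribution of the values in the file.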
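The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing.
Digesting the input file: the range from the minimum to the maximum observed value is divided into binCount "bins," and summary statistics are computed for the values falling in each bin.
EmpiricalDistribution implements the RealDistribution interface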
 as follows.  Given x within the range of values in the dataset, let B
 be the bin containing x and let K be the within-bin kernel for B.  Let P(B-)
 be the sum of the probabilities of the bins below B and let K(B) be the
 mass of B under K (i.e., the integral of the kernel density over B).  Then
 set P(X < x) = P(B-) + P(B) * K(x) / K(B) where K(x) is the kernel distribution
 evaluated at x. This results in a cdf that matches the grouped frequency
 distribution at the bin endpoints and interpolates within bins using
 within-bin kernels.
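The interpolation rule above can be sketched in plain Java. This is a minimal illustration, not the library's code. The class and method names are hypothetical. A logistic kernel stands in for the Gaussian kernel because java.lang.Math has no error function. K(x) is interpreted as the kernel mass between the bin's lower endpoint and x, an assumption that makes the cdf equal the grouped frequency distribution at the bin endpoints.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative sketch only; names are hypothetical and not part of Commons Math. */
final class BinnedCdfSketch {

    /** CDF of a logistic kernel, used here as a closed-form stand-in for a Gaussian. */
    static DoubleUnaryOperator logisticCdf(double mean, double scale) {
        return x -> 1.0 / (1.0 + Math.exp(-(x - mean) / scale));
    }

    /**
     * Evaluates P(B-) + P(B) * K(x) / K(B).
     *
     * @param min         smallest observed value (lower endpoint of bin 0)
     * @param upperBounds upper bound of each bin, the last one being the maximum
     * @param binMass     probability P(B) of each bin, summing to 1
     * @param kernels     within-bin kernel CDF for each bin
     */
    static double cdf(double x, double min, double[] upperBounds,
                      double[] binMass, DoubleUnaryOperator[] kernels) {
        int last = upperBounds.length - 1;
        if (x < min) {
            return 0.0;
        }
        if (x >= upperBounds[last]) {
            return 1.0;
        }
        // Locate the bin B containing x while accumulating P(B-).
        int b = 0;
        double pBelow = 0.0;
        while (x > upperBounds[b]) {
            pBelow += binMass[b];
            b++;
        }
        double lower = (b == 0) ? min : upperBounds[b - 1];
        DoubleUnaryOperator k = kernels[b];
        double kLower = k.applyAsDouble(lower);
        double kB = k.applyAsDouble(upperBounds[b]) - kLower; // K(B): kernel mass of the bin
        double kX = k.applyAsDouble(x) - kLower;              // kernel mass from lower endpoint to x
        return pBelow + binMass[b] * kX / kB;
    }

    public static void main(String[] args) {
        double[] upper = {1.0, 2.0};
        double[] mass = {0.25, 0.75};
        DoubleUnaryOperator[] kernels = {logisticCdf(0.5, 0.2), logisticCdf(1.6, 0.2)};
        for (double x : new double[] {0.0, 0.5, 1.0, 1.5, 2.0}) {
            System.out.println(x + " -> " + cdf(x, 0.0, upper, mass, kernels));
        }
    }
}
```

With two bins of mass 0.25 and 0.75, the sketch returns 0.25 at the first bin's upper endpoint regardless of the kernel chosen. That is the endpoint-matching property described above.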
binCount is set by default to 1000. A good rule of thumb is to set the bin count to approximately the length of the input file divided by 10.

| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_BIN_COUNT: Default bin count |
| protected RandomDataGenerator | randomData: RandomDataGenerator instance to use in repeated calls to getNext() |

Fields inherited from class AbstractRealDistribution: random, SOLVER_DEFAULT_ABSOLUTE_ACCURACY

| Constructor and Description |
|---|
| EmpiricalDistribution(): Creates a new EmpiricalDistribution with the default bin count. |
| EmpiricalDistribution(int binCount): Creates a new EmpiricalDistribution with the specified bin count. |
| EmpiricalDistribution(int binCount, RandomDataImpl randomData): Deprecated. As of 3.1. Please use EmpiricalDistribution(int,RandomGenerator) instead. |
| EmpiricalDistribution(int binCount, RandomGenerator generator): Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data. |
| EmpiricalDistribution(RandomDataImpl randomData): Deprecated. As of 3.1. Please use EmpiricalDistribution(RandomGenerator) instead. |
| EmpiricalDistribution(RandomGenerator generator): Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data. |
| Modifier and Type | Method and Description | 
|---|---|
| double | cumulativeProbability(double x): For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). |
| double | density(double x): Returns the probability density function (PDF) of this distribution evaluated at the specified point x. |
| int | getBinCount(): Returns the number of bins. |
| List<SummaryStatistics> | getBinStats(): Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. |
| double[] | getGeneratorUpperBounds(): Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. |
| protected RealDistribution | getKernel(SummaryStatistics bStats): The within-bin smoothing kernel. |
| double | getNextValue(): Generates a random value from this distribution. |
| double | getNumericalMean(): Use this method to get the numerical value of the mean of this distribution. |
| double | getNumericalVariance(): Use this method to get the numerical value of the variance of this distribution. |
| StatisticalSummary | getSampleStats(): Returns a StatisticalSummary describing this distribution. |
| double | getSupportLowerBound(): Access the lower bound of the support. |
| double | getSupportUpperBound(): Access the upper bound of the support. |
| double[] | getUpperBounds(): Returns a fresh copy of the array of upper bounds for the bins. |
| double | inverseCumulativeProbability(double p): Computes the quantile function of this distribution. |
| boolean | isLoaded(): Property indicating whether or not the distribution has been loaded. |
| boolean | isSupportConnected(): Use this method to get information about whether the support is connected. |
| boolean | isSupportLowerBoundInclusive(): Whether or not the lower bound of support is in the domain of the density function. |
| boolean | isSupportUpperBoundInclusive(): Whether or not the upper bound of support is in the domain of the density function. |
| void | load(double[] in): Computes the empirical distribution from the provided array of numbers. |
| void | load(File file): Computes the empirical distribution from the input file. |
| void | load(URL url): Computes the empirical distribution using data read from a URL. |
| double | probability(double x): For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). |
| void | reSeed(long seed): Reseeds the random number generator used by getNextValue(). |
| void | reseedRandomGenerator(long seed): Reseed the random generator used to generate samples. |
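
Methods inherited from class AbstractRealDistribution: cumulativeProbability, getSolverAbsoluteAccuracy, logDensity, probability, sample, sample
public static final int DEFAULT_BIN_COUNT
Default bin count
protected final RandomDataGenerator randomData
RandomDataGenerator instance to use in repeated calls to getNext()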
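public EmpiricalDistribution()
Creates a new EmpiricalDistribution with the default bin count.
public EmpiricalDistribution(int binCount)
Creates a new EmpiricalDistribution with the specified bin count.
Parameters:
binCount - number of bins. Must be strictly positive.
Throws:
NotStrictlyPositiveException - if binCount <= 0.
public EmpiricalDistribution(int binCount, RandomGenerator generator)
Creates a new EmpiricalDistribution with the specified bin count using the provided RandomGenerator as the source of random data.
Parameters:
binCount - number of bins. Must be strictly positive.
generator - random data generator (may be null, resulting in default JDK generator)
Throws:
NotStrictlyPositiveException - if binCount <= 0.
public EmpiricalDistribution(RandomGenerator generator)
Creates a new EmpiricalDistribution with default bin count using the provided RandomGenerator as the source of random data.
Parameters:
generator - random data generator (may be null, resulting in default JDK generator)
@Deprecated public EmpiricalDistribution(int binCount, RandomDataImpl randomData)
Deprecated. As of 3.1. Please use EmpiricalDistribution(int,RandomGenerator) instead.
Creates a new EmpiricalDistribution with the specified bin count using the provided RandomDataImpl instance as the source of random data.
Parameters:
binCount - number of bins
randomData - random data generator (may be null, resulting in default JDK generator)
@Deprecated public EmpiricalDistribution(RandomDataImpl randomData)
Deprecated. As of 3.1. Please use EmpiricalDistribution(RandomGenerator) instead.
Creates a new EmpiricalDistribution with default bin count using the provided RandomDataImpl as the source of random data.
Parameters:
randomData - random data generator (may be null, resulting in default JDK generator)
public void load(double[] in)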
          throws NullArgumentException
Computes the empirical distribution from the provided array of numbers.
Parameters:
in - the input data array
Throws:
NullArgumentException - if in is null
public void load(URL url) throws IOException, NullArgumentException, ZeroException
Computes the empirical distribution using data read from a URL.
The input file must be an ASCII text file containing one valid numeric entry per line.
Parameters:
url - url of the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if url is null
ZeroException - if URL contains no data
public void load(File file) throws IOException, NullArgumentException
Computes the empirical distribution from the input file.
The input file must be an ASCII text file containing one valid numeric entry per line.
Parameters:
file - the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if file is null
public double getNextValue() throws MathIllegalStateException
Generates a random value from this distribution.
Throws:
MathIllegalStateException - if the distribution has not been loaded
public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution.
Preconditions: the distribution must be loaded before invoking this method.
Throws:
IllegalStateException - if the distribution has not been loaded
public int getBinCount()
Returns the number of bins.
public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.
public double[] getUpperBounds()
Returns a fresh copy of the array of upper bounds for the bins.
Bins are:
[min, upperBounds[0]], (upperBounds[0], upperBounds[1]], ..., (upperBounds[binCount-2], upperBounds[binCount-1] = max].
Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by getGeneratorUpperBounds().
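The bin layout returned by getUpperBounds() can be illustrated with a small standalone sketch. It assumes equal-width bins between min and max, which the text above does not state explicitly. The class and method names are hypothetical, and values outside [min, max] are simply assigned to the first or last bin.

```java
/** Illustrative sketch only; names are hypothetical and not part of Commons Math. */
final class BinBoundsSketch {

    /** Upper bounds of binCount equal-width bins covering [min, max] (equal width is an assumption). */
    static double[] upperBounds(double min, double max, int binCount) {
        if (binCount <= 0) {
            throw new IllegalArgumentException("binCount must be strictly positive");
        }
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        bounds[binCount - 1] = max; // the last upper bound is exactly max
        return bounds;
    }

    /**
     * Index of the bin containing x, using the convention
     * [min, ub[0]], (ub[0], ub[1]], ..., (ub[n-2], ub[n-1]].
     */
    static int findBin(double x, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (x <= bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1; // x above max: fall back to the last bin
    }

    public static void main(String[] args) {
        double[] bounds = upperBounds(0.0, 10.0, 5);
        System.out.println(java.util.Arrays.toString(bounds));
        System.out.println(findBin(2.0, bounds));
        System.out.println(findBin(2.5, bounds));
    }
}
```

Because each bin is closed on the right, a value equal to an upper bound belongs to the bin that bound closes, not to the next one.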
public double[] getGeneratorUpperBounds()
Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.
Preconditions: the distribution must be loaded before invoking this method.
In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by getUpperBounds().
Throws:
NullPointerException - unless a load method has been called beforehand.
public boolean isLoaded()
Property indicating whether or not the distribution has been loaded.
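The subintervals of [0,1] described for getGeneratorUpperBounds() can be sketched as cumulative bin frequencies. The example below is a rough illustration under stated assumptions, not the library's implementation. All names are hypothetical. Drawing the value from a Gaussian with the selected bin's mean and standard deviation is inferred from the "Gaussian smoothing" wording in the class description.

```java
import java.util.Random;

/** Illustrative sketch only; names are hypothetical and not part of Commons Math. */
final class GeneratorBoundsSketch {

    /** Cumulative relative frequencies: subinterval i has length binCounts[i] / total. */
    static double[] generatorUpperBounds(long[] binCounts) {
        long total = 0;
        for (long c : binCounts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no data");
        }
        double[] bounds = new double[binCounts.length];
        long running = 0;
        for (int i = 0; i < binCounts.length; i++) {
            running += binCounts[i];
            bounds[i] = (double) running / total;
        }
        return bounds;
    }

    /** Selects the bin whose subinterval contains u, for u in [0, 1). Empty bins are never chosen. */
    static int selectBin(double u, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (u < bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1;
    }

    /** Draws a bin in proportion to its count, then a Gaussian value with that bin's mean and std dev. */
    static double nextValue(Random rng, double[] bounds, double[] binMean, double[] binStd) {
        int bin = selectBin(rng.nextDouble(), bounds);
        return binMean[bin] + binStd[bin] * rng.nextGaussian();
    }

    public static void main(String[] args) {
        double[] bounds = generatorUpperBounds(new long[] {1, 3, 0, 4});
        System.out.println(java.util.Arrays.toString(bounds));
        Random rng = new Random(42L);
        double[] mean = {1.0, 2.0, 3.0, 4.0};
        double[] std = {0.1, 0.1, 0.0, 0.1};
        System.out.println(nextValue(rng, bounds, mean, std));
    }
}
```

A bin with no observations gets a zero-length subinterval, so it contributes nothing to the generated values. A bin with zero standard deviation always yields its mean, which resembles the constant kernel that getKernel is documented to return for single-observation bins.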
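public void reSeed(long seed)
Reseeds the random number generator used by getNextValue().
Parameters:
seed - random generator seed
public double probability(double x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
Specified by:
probability in interface RealDistribution
Overrides:
probability in class AbstractRealDistribution
Parameters:
x - the point at which the PMF is evaluated
public double density(double x)
Returns the probability density function (PDF) of this distribution evaluated at the specified point x. In general, the PDF is the derivative of the CDF. If the derivative does not exist at x, then an appropriate replacement should be returned, e.g. Double.POSITIVE_INFINITY, Double.NaN, or the limit inferior or limit superior of the difference quotient.
Returns the kernel density normalized so that its integral over each bin equals the bin mass.
Algorithm description: find the bin B containing x, then scale the within-bin kernel density at x by P(B) / K(B), where P(B) is the mass of B and K(B) is the mass of B under the kernel.
Parameters:
x - the point at which the PDF is evaluated
Returns:
the value of the probability density function at point x
public double cumulativeProbability(double x)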
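For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
Algorithm description: find the bin B containing x and return P(B-) + P(B) * K(x) / K(B), with the terms as defined in the class description.
Parameters:
x - the point at which the CDF is evaluated
Returns:
the probability that a random variable with this distribution takes a value less than or equal to x
public double inverseCumulativeProbability(double p) throws OutOfRangeException
Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is
inf{x in R | P(X<=x) >= p} for 0 < p <= 1,
inf{x in R | P(X<=x) > 0} for p = 0.
Boundary cases:
RealDistribution.getSupportLowerBound() for p = 0,
RealDistribution.getSupportUpperBound() for p = 1.
Algorithm description: find the first bin whose cumulative mass reaches p, then invert the within-bin kernel over that bin (the inverse of the computation in cumulativeProbability).
Specified by:
inverseCumulativeProbability in interface RealDistribution
Overrides:
inverseCumulativeProbability in class AbstractRealDistribution
Parameters:
p - the cumulative probability
Returns:
the smallest p-quantile of this distribution (largest 0-quantile for p = 0)
Throws:
OutOfRangeException - if p < 0 or p > 1
public double getNumericalMean()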
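Use this method to get the numerical value of the mean of this distribution.
Returns:
the mean or Double.NaN if it is not defined
public double getNumericalVariance()
Use this method to get the numerical value of the variance of this distribution.
Returns:
the variance (possibly Double.POSITIVE_INFINITY as for certain cases in TDistribution) or Double.NaN if it is not defined
public double getSupportLowerBound()
Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return
inf {x in R | P(X <= x) > 0}.
Returns:
lower bound of the support (might be Double.NEGATIVE_INFINITY)
public double getSupportUpperBound()
Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return
inf {x in R | P(X <= x) = 1}.
Returns:
upper bound of the support (might be Double.POSITIVE_INFINITY)
public boolean isSupportLowerBoundInclusive()
Whether or not the lower bound of support is in the domain of the density function. Returns true iff getSupportLowerBound() is finite and density(getSupportLowerBound()) returns a non-NaN, non-infinite value.
public boolean isSupportUpperBoundInclusive()
Whether or not the upper bound of support is in the domain of the density function. Returns true iff getSupportUpperBound() is finite and density(getSupportUpperBound()) returns a non-NaN, non-infinite value.
public boolean isSupportConnected()
Use this method to get information about whether the support is connected.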
public void reseedRandomGenerator(long seed)
Reseed the random generator used to generate samples.
Specified by:
reseedRandomGenerator in interface RealDistribution
Overrides:
reseedRandomGenerator in class AbstractRealDistribution
Parameters:
seed - the new seed
protected RealDistribution getKernel(SummaryStatistics bStats)
The within-bin smoothing kernel. Returns a Gaussian distribution parameterized by bStats, unless the bin contains only one observation, in which case a constant distribution is returned.
Parameters:
bStats - summary statistics for the bin
Copyright © 2003–2016 The Apache Software Foundation. All rights reserved.