Class Distribution
Holds a distribution of the values f(x) associated with a variable x.
A Distribution is a histogram-like object that is a dictionary of
samples. Each sample is an x:f(x) pair, where x is called the bin
and f(x) is called the value(). Each bin's value is typically
maintained as the sum of all the values that have been placed into
it.
The bin axis is continuous, and can represent a continuous
quantity without discretization. Alternatively, this class can be
used as a traditional histogram by either discretizing the bin
number before adding each sample, or by binning the values in the
final Distribution.
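As a rough illustration of the first option, a sample's bin can be snapped to a fixed grid before it is added. The helper name `discretize` and its `bin_width` and `lower` parameters are hypothetical and not part of this class; a plain dictionary stands in for the Distribution.

```python
def discretize(bin_number, bin_width, lower=0.0):
    """Snap a continuous bin number to the nearest grid point.

    Hypothetical helper: grid points are lower, lower + bin_width, ...
    """
    steps = round((bin_number - lower) / bin_width)
    return lower + steps * bin_width


# Accumulate samples into a plain dict keyed by the discretized bin.
histogram = {}
for x, fx in [(0.12, 1.0), (0.18, 2.0), (0.61, 0.5)]:
    key = discretize(x, bin_width=0.25)
    histogram[key] = histogram.get(key, 0.0) + fx
```

With this grid the three samples land in bins 0.0, 0.25 and 0.5 respectively.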
Distributions are bounded by the specified axis_bounds, and can
either be cyclic (like directions or hues) or non-cyclic. For
cyclic distributions, samples provided outside the axis_bounds
will be wrapped back into the bound range, as is appropriate for
quantities like directions. For non-cyclic distributions,
providing samples outside the axis_bounds will result in a
ValueError.
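The two behaviours can be sketched as follows. This is not the class's own code: the function name `place_in_bounds` is hypothetical, and the exact wrapping rule (modulo the width of the range, so that the upper bound maps back onto the lower bound) is an assumption.

```python
def place_in_bounds(bin_number, axis_bounds, cyclic):
    """Return the bin to store, wrapping or rejecting out-of-range input."""
    lower, upper = axis_bounds
    if cyclic:
        # Assumed wrapping rule: fold into the half-open range [lower, upper).
        return (bin_number - lower) % (upper - lower) + lower
    if not (lower <= bin_number <= upper):
        raise ValueError("bin %r is outside axis_bounds %r"
                         % (bin_number, axis_bounds))
    return bin_number
```

For example, with axis_bounds=(0.0, 360.0) and cyclic=True, a bin of 370.0 would be stored at 10.0 and a bin of -90.0 at 270.0.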
In addition to the values, the class can also return the counts,
i.e., the number of times that a sample has been added with the
given bin.
Not all instances of this class will be a true distribution in the
mathematical sense; e.g. the values will have to be normalized
before they can be considered a probability distribution.
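A minimal sketch of such a normalization, operating on a plain dictionary of bin:value pairs. The helper name `normalize` is hypothetical, and the result is only a probability distribution if all values are non-negative.

```python
def normalize(values_by_bin):
    """Scale the values so that they sum to 1.0 (hypothetical helper)."""
    total = sum(values_by_bin.values())
    if total == 0:
        raise ValueError("cannot normalize: the values sum to zero")
    return {b: v / float(total) for b, v in values_by_bin.items()}


probabilities = normalize({0.0: 1.0, 0.5: 3.0})
```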
If keep_peak=True, the value stored in each bin will be the
maximum of all values ever added, instead of the sum. The
distribution will thus be a record of the maximum value
seen at each bin, also known as an envelope.
__init__(self,
axis_bounds=(0.0, 6.28318530718),
cyclic=False,
keep_peak=False)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
__add__(self,
a)
Allows add() method to be used via the '+' operator; i.e.,
Distribution + dictionary does Distribution.add(dictionary).
bins(self)
Return a list of bins that have been populated.
add(self,
new_data)
Add a set of new data in the form of a dictionary of (bin,
value) pairs.
max_value_bin(self)
Return the bin with the largest value.
weighted_sum(self)
Return the sum of each value times its bin.
_weighted_average(self)
Return the weighted_sum divided by the sum of the values.
value_mag(self,
bin)
Return the value of a single bin as a proportion of total_value.
count_mag(self,
bin)
Return the count of a single bin as a proportion of total_count.
_safe_divide(self,
numerator,
denominator)
Division routine that avoids division-by-zero errors
(returning zero in such cases) but keeps track of them
for undefined_values().
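The behaviour described for _safe_divide can be sketched with a small stand-in class. The class name `SafeDivider` and the attribute name `undefined_count` are hypothetical; only the return-zero-and-keep-a-tally behaviour is taken from the description.

```python
class SafeDivider(object):
    """Hypothetical stand-in showing the counting behaviour."""

    def __init__(self):
        self.undefined_count = 0  # assumed name for the running tally

    def safe_divide(self, numerator, denominator):
        if denominator == 0:
            # Record the undefined result instead of raising an error.
            self.undefined_count += 1
            return 0
        return float(numerator) / denominator


divider = SafeDivider()
half = divider.safe_divide(1, 2)
undefined = divider.safe_divide(1, 0)
```

Comparing the tally before and after a series of calls shows whether any of the results were undefined.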
Inherited from object:
__delattr__,
__format__,
__getattribute__,
__hash__,
__new__,
__reduce__,
__reduce_ex__,
__repr__,
__setattr__,
__sizeof__,
__str__,
__subclasshook__
Inherited from object:
__class__
__init__(self,
axis_bounds=(0.0, 6.28318530718),
cyclic=False,
keep_peak=False)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Overrides:
object.__init__
- (inherited documentation)
Return the value of the specified bin.
(Return None if there is no such bin.)
Return the count from the specified bin.
(Return None if there is no such bin.)
Return a list of values.
Various statistics can then be calculated if desired:
sum(vals) (total of all values)
max(vals) (highest value in any bin)
Note that the bin-order of values returned does not necessarily
match that returned by counts().
Return a list of counts.
Various statistics can then be calculated if desired:
sum(counts) (total of all counts)
max(counts) (highest count in any bin)
Note that the bin-order of counts returned does not necessarily
match that returned by values().
Add a set of new data in the form of a dictionary of (bin,
value) pairs. If the bin already exists, the value is added
to the current value. If the bin doesn't exist, one is created
with that value.
Bin numbers outside axis_bounds are allowed for cyclic=True,
but otherwise a ValueError is raised.
If keep_peak=True, the value of the bin is the maximum of the
current value and the supplied value. That is, the bin stores
the peak value seen so far. Note that each call will increase
the total_value and total_count (and thus decrease the
value_mag() and count_mag()) even if the value doesn't happen
to be the maximum seen so far, since each data point still
helps improve the sampling and thus the confidence.
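The bookkeeping described above can be modelled in a few lines. This is a stripped-down sketch, not the class's own code: the class name `TinyDistribution` and its attribute names are hypothetical, and the axis_bounds checking is omitted.

```python
class TinyDistribution(object):
    """Hypothetical, stripped-down model of the add() bookkeeping."""

    def __init__(self, keep_peak=False):
        self.keep_peak = keep_peak
        self.data = {}        # bin -> stored value (sum or peak)
        self.counts = {}      # bin -> number of samples added
        self.total_value = 0.0
        self.total_count = 0

    def add(self, new_data):
        for bin_number, value in new_data.items():
            # Totals grow on every sample, even when the peak is unchanged.
            self.total_value += value
            self.total_count += 1
            self.counts[bin_number] = self.counts.get(bin_number, 0) + 1
            if bin_number not in self.data:
                self.data[bin_number] = value
            elif self.keep_peak:
                self.data[bin_number] = max(self.data[bin_number], value)
            else:
                self.data[bin_number] += value


summed = TinyDistribution()
summed.add({1.0: 2.0})
summed.add({1.0: 3.0})

peak = TinyDistribution(keep_peak=True)
peak.add({1.0: 3.0})
peak.add({1.0: 2.0})   # smaller than the stored peak, but still counted
```

In the second case the bin keeps the value 3.0, while the totals reflect both samples.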
Return a continuous, interpolated equivalent of the max_value_bin().
For a cyclic distribution, this is the direction of the vector
sum (see vector_sum()).
For a non-cyclic distribution, this is the arithmetic average
of the data on the bin_axis, where each bin is weighted by its
value.
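For the non-cyclic case, the value-weighted average of the bins can be sketched on a plain dictionary of bin:value pairs. The function name and the zero returned for an empty or all-zero input are assumptions made for this example.

```python
def weighted_average(values_by_bin):
    """Average of the bins, each weighted by its value (non-cyclic case)."""
    total = sum(values_by_bin.values())
    if total == 0:
        return 0.0  # assumed fallback; the average is undefined here
    weighted_sum = sum(b * v for b, v in values_by_bin.items())
    return weighted_sum / float(total)


average = weighted_average({1.0: 1.0, 3.0: 3.0})   # (1*1 + 3*3) / 4
```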
Return the vector sum of the data as a tuple (magnitude, avgbinnum).
Each bin contributes a vector of length equal to its value, at
a direction corresponding to the bin number. Specifically,
the total bin number range is mapped into a direction range
[0,2pi].
For a cyclic distribution, the avgbinnum will be a continuous
measure analogous to the max_value_bin() of the distribution.
But this quantity has more precision than max_value_bin()
because it is computed from the entire distribution instead of
just the peak bin. However, it is likely to be useful only
for uniform or very dense sampling; with sparse, non-uniform
sampling the estimates will be biased significantly by the
particular samples chosen.
The avgbinnum is not meaningful when the magnitude is 0,
because a zero-length vector has no direction. To find out
whether such cases occurred, you can compare the value of
undefined_vals before and after a series of calls to this
function.
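A minimal sketch of this computation on a plain dictionary of bin:value pairs. It assumes a linear mapping of axis_bounds onto [0, 2pi]; the standalone function signature is hypothetical, and the bookkeeping of undefined cases is left out.

```python
import math


def vector_sum(values_by_bin, axis_bounds):
    """Return (magnitude, avgbinnum) for the summed bin vectors."""
    lower, upper = axis_bounds
    width = upper - lower
    x = y = 0.0
    for bin_number, value in values_by_bin.items():
        # Each bin contributes a vector of length `value` at its direction.
        theta = 2.0 * math.pi * (bin_number - lower) / width
        x += value * math.cos(theta)
        y += value * math.sin(theta)
    magnitude = math.hypot(x, y)
    # Direction of the summed vector, folded into [0, 2pi).
    direction = math.atan2(y, x) % (2.0 * math.pi)
    avgbinnum = lower + width * direction / (2.0 * math.pi)
    return magnitude, avgbinnum
```

With axis_bounds=(0.0, 360.0), equal values at bins 0.0 and 90.0 give an avgbinnum of about 45.0, whereas equal values at bins 0.0 and 180.0 cancel to a magnitude of about zero, in which case the avgbinnum carries no meaning.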
Return a measure of the peakedness of the distribution. The
calculation differs depending on whether this is a cyclic
variable. For a cyclic variable, returns the magnitude of the
vector_sum() divided by the sum_value() (see
_vector_selectivity for more details). For a non-cyclic
variable, returns the max_value_bin() as a proportion of the
sum_value() (see _relative_selectivity for more details).
Return max_value_bin() as a proportion of the sum_value().
This quantity is a measure of how strongly the distribution is
biased towards the max_value_bin(). For a smooth,
single-lobed distribution with an inclusive, non-cyclic range,
this quantity is an analog to vector_selectivity. To be a
precise analog for arbitrary distributions, it would need to
compute some measure of the selectivity that works like the
weighted_average() instead of the max_value_bin(). The result
is scaled such that if all bins are identical, the selectivity
is 0.0, and if all bins but one are zero, the selectivity is
1.0.
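The exact scaling is not spelled out here. One linear rescaling that is consistent with the two stated endpoints is sketched below; the formula, the function name, and the zero returned for fewer than two bins or an all-zero input are all assumptions.

```python
def relative_selectivity(values):
    """Peak value as a proportion of the total, rescaled to [0, 1]."""
    n = len(values)
    total = sum(values)
    if n < 2 or total == 0:
        return 0.0  # assumed fallback for degenerate input
    proportion = max(values) / float(total)
    offset = 1.0 / n  # proportion obtained when all bins are identical
    return (proportion - offset) / (1.0 - offset)
```

Four identical bins give 0.0 under this scaling, and a single non-zero bin among four gives 1.0.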
Return the magnitude of the vector_sum() divided by the sum_value().
This quantity is a vector-based measure of the peakedness of
the distribution. If only a single bin has a non-zero value(),
the selectivity will be 1.0, and if all bins have the same
value() then the selectivity will be 0.0. Other distributions
will result in intermediate values.
For a distribution with a sum_value() of zero (i.e. all bins
empty), the selectivity is undefined. Assuming that one will
usually be looking for high selectivity, we return zero in such
a case so that high selectivity will not mistakenly be claimed.
To find out whether such cases occurred, you can compare the
value of undefined_values() before and after a series of
calls to this function.
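A rough sketch of this measure on a plain dictionary of bin:value pairs, assuming a linear mapping of axis_bounds onto [0, 2pi]. The standalone signature is hypothetical, and the tally of undefined cases is omitted; the zero-sum case simply returns zero.

```python
import math


def vector_selectivity(values_by_bin, axis_bounds):
    """Magnitude of the vector sum divided by the sum of the values."""
    lower, upper = axis_bounds
    total = sum(values_by_bin.values())
    if total == 0:
        return 0.0  # undefined case: report zero rather than raise
    x = y = 0.0
    for bin_number, value in values_by_bin.items():
        theta = 2.0 * math.pi * (bin_number - lower) / (upper - lower)
        x += value * math.cos(theta)
        y += value * math.sin(theta)
    return math.hypot(x, y) / float(total)
```

Note that the 0.0 result for equal values relies on the bins being evenly spaced around the full range, e.g. bins 0, 90, 180 and 270 with axis_bounds=(0.0, 360.0).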
Convert a bin number to a direction in radians.
Works for NumPy arrays of bin numbers, returning
an array of directions.
|
|
Convert a direction in radians into a bin number.
Works for NumPy arrays of direction, returning
an array of bin numbers.
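The two conversions can be sketched as a pair of inverse linear mappings. The function names, the explicit axis_bounds argument, and the folding of directions into [0, 2pi) are assumptions made for this example. Because only arithmetic operators are used, the same expressions should also apply elementwise to NumPy arrays.

```python
import math


def bin2rad(bin_number, axis_bounds):
    """Map a bin number linearly onto a direction in radians."""
    lower, upper = axis_bounds
    return 2.0 * math.pi * (bin_number - lower) / (upper - lower)


def rad2bin(direction, axis_bounds):
    """Map a direction in radians back onto the bin axis."""
    lower, upper = axis_bounds
    folded = direction % (2.0 * math.pi)  # assumed: fold into [0, 2pi)
    return lower + (upper - lower) * folded / (2.0 * math.pi)
```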
undefined_vals