Package topo :: Package misc :: Module distribution :: Class Distribution

Class Distribution



Holds a distribution of the values f(x) associated with a variable x.

A Distribution is a histogram-like object that is a dictionary of samples. Each sample is an x:f(x) pair, where x is called the bin and f(x) is called the value(). Each bin's value is typically maintained as the sum of all the values that have been placed into it.

The bin axis is continuous, and can represent a continuous quantity without discretization. Alternatively, this class can be used as a traditional histogram by either discretizing the bin number before adding each sample, or by binning the values in the final Distribution.

Distributions are bounded by the specified axis_bounds, and can either be cyclic (like directions or hues) or non-cyclic. For cyclic distributions, samples provided outside the axis_bounds will be wrapped back into the bound range, as is appropriate for quantities like directions. For non-cyclic distributions, providing samples outside the axis_bounds will result in a ValueError.

In addition to the values, a Distribution can also return the counts, i.e., the number of times that a sample has been added to a given bin.

Not all instances of this class will be a true distribution in the mathematical sense; e.g. the values will have to be normalized before they can be considered a probability distribution.

If keep_peak=True, the value stored in each bin will be the maximum of all values ever added, instead of the sum. The distribution will thus be a record of the maximum value seen at each bin, also known as an envelope.
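
The bounds handling and per-bin bookkeeping described above can be illustrated with a minimal sketch. MiniDistribution, its attribute names, and the exact wrapping rule are hypothetical and chosen only for illustration; this is not the topo.misc.distribution source, and it covers only the default summing mode.

```python
class MiniDistribution:
    """Hypothetical stand-in for illustration; not the topo.misc code."""

    def __init__(self, axis_bounds=(0.0, 360.0), cyclic=False):
        self.axis_bounds = axis_bounds
        self.cyclic = cyclic
        self._values = {}  # bin -> sum of the values added at that bin
        self._counts = {}  # bin -> number of samples added at that bin

    def _check_bin(self, bin_):
        lower, upper = self.axis_bounds
        if self.cyclic:
            # Assumed wrapping rule: fold the bin back into [lower, upper).
            return lower + (bin_ - lower) % (upper - lower)
        if not lower <= bin_ <= upper:
            raise ValueError("bin %r is outside axis_bounds %r"
                             % (bin_, self.axis_bounds))
        return bin_

    def add(self, new_data):
        for bin_, value in new_data.items():
            bin_ = self._check_bin(bin_)
            self._values[bin_] = self._values.get(bin_, 0.0) + value
            self._counts[bin_] = self._counts.get(bin_, 0) + 1


directions = MiniDistribution(axis_bounds=(0.0, 360.0), cyclic=True)
directions.add({10.0: 2.0, 370.0: 1.0})  # 370 wraps around to bin 10
```

With cyclic=False the same call would raise ValueError for the sample at 370, because that bin lies outside the axis bounds.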

Instance Methods

__init__(self, axis_bounds=(0.0, 6.28318530718), cyclic=False, keep_peak=False)
Create a Distribution with the given axis_bounds, cyclic flag, and keep_peak flag.

__add__(self, a)
Allows the add() method to be used via the '+' operator; i.e., Distribution + dictionary does Distribution.add(dictionary).

get_value(self, bin)
Return the value of the specified bin.

get_count(self, bin)
Return the count from the specified bin.

values(self)
Return a list of values.

counts(self)
Return a list of counts.

bins(self)
Return a list of bins that have been populated.

add(self, new_data)
Add a set of new data in the form of a dictionary of (bin, value) pairs.

max_value_bin(self)
Return the bin with the largest value.

weighted_average(self)
Return a continuous, interpolated equivalent of the max_value_bin().

vector_sum(self)
Return the vector sum of the data as a tuple (magnitude, avgbinnum).

weighted_sum(self)
Return the sum of each value times its bin.

_weighted_average(self)
Return the weighted_sum divided by the sum of the values.

selectivity(self)
Return a measure of the peakedness of the distribution.

_relative_selectivity(self)
Return max_value_bin() as a proportion of the sum_value().

_vector_selectivity(self)
Return the magnitude of the vector_sum() divided by the sum_value().

value_mag(self, bin)
Return the value of a single bin as a proportion of total_value.

count_mag(self, bin)
Return the count of a single bin as a proportion of total_count.

_bins_to_radians(self, bin)
Convert a bin number to a direction in radians.

_radians_to_bins(self, direction)
Convert a direction in radians into a bin number.

_safe_divide(self, numerator, denominator)
Division routine that avoids division-by-zero errors (returning zero in such cases) but keeps track of them in undefined_vals.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables
  undefined_vals = 0

Properties

Inherited from object: __class__

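Method Details

__init__(self, axis_bounds=(0.0, 6.28318530718), cyclic=False, keep_peak=False)
(Constructor)

Create a Distribution bounded by axis_bounds. The cyclic flag selects whether out-of-range samples are wrapped back into the bounds or rejected with a ValueError, and keep_peak selects whether each bin stores the maximum value seen instead of the sum.
Overrides: object.__init__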

get_value(self, bin)

Return the value of the specified bin.

(Return None if there is no such bin.)

get_count(self, bin)

Return the count from the specified bin.

(Return None if there is no such bin.)

values(self)

Return a list of values.

Various statistics can then be calculated if desired:

sum(vals) (total of all values)
max(vals) (highest value in any bin)

Note that the bin-order of values returned does not necessarily match that returned by counts().

counts(self)

Return a list of counts.

Various statistics can then be calculated if desired:

sum(counts) (total of all counts)
max(counts) (highest count in any bin)

Note that the bin-order of counts returned does not necessarily match that returned by values().

add(self, new_data)

Add a set of new data in the form of a dictionary of (bin, value) pairs. If the bin already exists, the value is added to the current value. If the bin doesn't exist, one is created with that value.

Bin numbers outside axis_bounds are allowed for cyclic=True, but otherwise a ValueError is raised.

If keep_peak=True, the value of the bin is the maximum of the current value and the supplied value. That is, the bin stores the peak value seen so far. Note that each call will increase the total_value and total_count (and thus decrease the value_mag() and count_mag()) even if the value doesn't happen to be the maximum seen so far, since each data point still helps improve the sampling and thus the confidence.
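
The keep_peak bookkeeping described above can be sketched as follows. The function add_keep_peak and the dictionary layout are hypothetical, used only to show how the stored peak and the running totals diverge; they are not taken from the class source.

```python
def add_keep_peak(peaks, totals, new_data):
    """Hypothetical helper: store the peak per bin, but total every sample."""
    for bin_, value in new_data.items():
        # The bin keeps the largest value seen so far.
        peaks[bin_] = max(peaks.get(bin_, value), value)
        # The totals grow on every call, even when the peak is unchanged.
        totals["total_value"] += value
        totals["total_count"] += 1


peaks = {}
totals = {"total_value": 0.0, "total_count": 0}
add_keep_peak(peaks, totals, {5: 0.5})
add_keep_peak(peaks, totals, {5: 0.25})  # smaller than the stored peak

# The value as a proportion of total_value, in the sense of value_mag().
value_mag = peaks[5] / totals["total_value"]
```

After the second call the stored peak is still 0.5, but the proportion has dropped from 1.0 to 0.5 / 0.75, which is the decrease in value_mag() that the description refers to.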

weighted_average(self)

Return a continuous, interpolated equivalent of the max_value_bin().

For a cyclic distribution, this is the direction of the vector sum (see vector_sum()).

For a non-cyclic distribution, this is the arithmetic average of the data on the bin_axis, where each bin is weighted by its value.
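
For the non-cyclic case, the value-weighted average can be written out as a short sketch. The function name and the choice to return 0.0 for an empty distribution are assumptions made for this example, not a statement about the class source.

```python
def weighted_average_noncyclic(data):
    """Average of the bins in `data`, each weighted by its value.

    `data` maps bin -> value. Returning 0.0 when the values sum to zero
    is an assumption of this sketch.
    """
    total = sum(data.values())
    if total == 0:
        return 0.0
    weighted_sum = sum(bin_ * value for bin_, value in data.items())
    return weighted_sum / total


# Bin 3.0 carries three times the weight of bin 1.0,
# so the average is pulled toward 3.0.
average = weighted_average_noncyclic({1.0: 1.0, 3.0: 3.0})
```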

vector_sum(self)

Return the vector sum of the data as a tuple (magnitude, avgbinnum).

Each bin contributes a vector of length equal to its value, at a direction corresponding to the bin number. Specifically, the total bin number range is mapped into a direction range [0,2pi].

For a cyclic distribution, the avgbinnum will be a continuous measure analogous to the max_value_bin() of the distribution. But this quantity has more precision than max_value_bin() because it is computed from the entire distribution instead of just the peak bin. However, it is likely to be useful only for uniform or very dense sampling; with sparse, non-uniform sampling the estimates will be biased significantly by the particular samples chosen.

The avgbinnum is not meaningful when the magnitude is 0, because a zero-length vector has no direction. To find out whether such cases occurred, you can compare the value of undefined_vals before and after a series of calls to this function.
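
One way to compute such a vector sum is sketched below, assuming a linear mapping of the axis bounds onto [0, 2pi]. The function name, argument layout, and the use of atan2 are choices made for this illustration and may differ from the actual method.

```python
import math


def vector_sum_sketch(data, axis_bounds):
    """Return (magnitude, avgbinnum) for `data`, a dict of bin -> value."""
    lower, upper = axis_bounds
    span = upper - lower
    x = y = 0.0
    for bin_, value in data.items():
        # Each bin contributes a vector of length `value`.
        theta = 2.0 * math.pi * (bin_ - lower) / span
        x += value * math.cos(theta)
        y += value * math.sin(theta)
    magnitude = math.hypot(x, y)
    # Fold the direction into [0, 2pi) and map it back to a bin number.
    # The result is not meaningful when the magnitude is zero.
    direction = math.atan2(y, x) % (2.0 * math.pi)
    avgbinnum = lower + direction * span / (2.0 * math.pi)
    return magnitude, avgbinnum


magnitude, avgbinnum = vector_sum_sketch({0.0: 1.0, 90.0: 1.0}, (0.0, 360.0))
```

Two equal values at bins 0 and 90 on a 0 to 360 axis give an average bin near 45, a location that max_value_bin() could not report because no sample was placed there.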

selectivity(self)

Return a measure of the peakedness of the distribution. The calculation differs depending on whether this is a cyclic variable.

For a cyclic variable, returns the magnitude of the vector_sum() divided by the sum_value() (see _vector_selectivity for more details).

For a non-cyclic variable, returns the max_value_bin() as a proportion of the sum_value() (see _relative_selectivity for more details).

_relative_selectivity(self)

Return max_value_bin() as a proportion of the sum_value().

This quantity is a measure of how strongly the distribution is biased towards the max_value_bin(). For a smooth, single-lobed distribution with an inclusive, non-cyclic range, this quantity is an analog to vector_selectivity. To be a precise analog for arbitrary distributions, it would need to compute some measure of the selectivity that works like the weighted_average() instead of the max_value_bin(). The result is scaled such that if all bins are identical, the selectivity is 0.0, and if all bins but one are zero, the selectivity is 1.0.
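
One scaling that satisfies the two endpoints stated above is sketched here: the proportion held by the largest bin is shifted by 1/n and stretched so that it spans 0.0 to 1.0. This formula and the handling of fewer than two bins are assumptions for illustration; the text above does not spell out the exact computation.

```python
def relative_selectivity_sketch(values):
    """Scaled share of the total held by the largest of the bin `values`."""
    n = len(values)
    total = sum(values)
    if n < 2 or total == 0:
        # Assumed convention: report no selectivity when it is undefined.
        return 0.0
    proportion = max(values) / total
    offset = 1.0 / n  # the proportion when all bins are identical
    return (proportion - offset) / (1.0 - offset)


flat = relative_selectivity_sketch([1.0, 1.0, 1.0, 1.0])
peaked = relative_selectivity_sketch([0.0, 0.0, 5.0, 0.0])
```

Here flat evaluates to 0.0 and peaked to 1.0, matching the two limiting cases in the description.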

_vector_selectivity(self)

Return the magnitude of the vector_sum() divided by the sum_value().

This quantity is a vector-based measure of the peakedness of the distribution. If only a single bin has a non-zero value(), the selectivity will be 1.0, and if all bins have the same value() then the selectivity will be 0.0. Other distributions will result in intermediate values.

For a distribution with a sum_value() of zero (i.e. all bins empty), the selectivity is undefined. Assuming that one will usually be looking for high selectivity, we return zero in such a case so that high selectivity will not mistakenly be claimed. To find out whether such cases occurred, you can compare the value of undefined_vals before and after a series of calls to this function.
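
The combination of a vector-based selectivity and a division routine that counts undefined results can be sketched as below. The class SelectivitySketch and its method signatures are hypothetical; only the name undefined_vals and the return-zero behaviour come from the description above.

```python
import math


class SelectivitySketch:
    """Hypothetical illustration; not the Distribution implementation."""

    undefined_vals = 0  # number of divisions by zero seen so far

    def _safe_divide(self, numerator, denominator):
        if denominator == 0:
            self.undefined_vals += 1
            return 0.0
        return numerator / denominator

    def vector_selectivity(self, data, axis_bounds):
        lower, upper = axis_bounds
        x = y = 0.0
        for bin_, value in data.items():
            theta = 2.0 * math.pi * (bin_ - lower) / (upper - lower)
            x += value * math.cos(theta)
            y += value * math.sin(theta)
        return self._safe_divide(math.hypot(x, y), sum(data.values()))


sketch = SelectivitySketch()
bounds = (0.0, 360.0)
single = sketch.vector_selectivity({90.0: 2.0}, bounds)
uniform = sketch.vector_selectivity(
    {0.0: 1.0, 90.0: 1.0, 180.0: 1.0, 270.0: 1.0}, bounds)

before = sketch.undefined_vals
empty = sketch.vector_selectivity({}, bounds)
undefined_seen = sketch.undefined_vals - before
```

A single populated bin gives a selectivity near 1.0, four equal bins spaced evenly around the cycle give a value near 0.0, and the empty case returns 0.0 while incrementing the counter, so comparing the counter before and after reveals that an undefined result occurred.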

_bins_to_radians(self, bin)

Convert a bin number to a direction in radians.

Works for NumPy arrays of bin numbers, returning an array of directions.

_radians_to_bins(self, direction)

Convert a direction in radians into a bin number.

Works for NumPy arrays of direction, returning an array of bin numbers.
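
A linear mapping between bin numbers and radians, consistent with the [0, 2pi] direction range used by vector_sum(), might look like the sketch below. The function names and the explicit axis_bounds argument are assumptions for this example. Because only arithmetic operators are used, the same functions should also accept NumPy arrays, although the example uses plain floats.

```python
import math


def bins_to_radians(bin_, axis_bounds):
    """Map a bin number within axis_bounds onto a direction in [0, 2pi]."""
    lower, upper = axis_bounds
    return 2.0 * math.pi * (bin_ - lower) / (upper - lower)


def radians_to_bins(direction, axis_bounds):
    """Inverse mapping: a direction in radians back to a bin number."""
    lower, upper = axis_bounds
    return lower + direction * (upper - lower) / (2.0 * math.pi)


bounds = (0.0, 360.0)
quarter_turn = bins_to_radians(90.0, bounds)
round_trip = radians_to_bins(quarter_turn, bounds)
```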


Class Variable Details

undefined_vals

Number of division-by-zero cases recorded by _safe_divide().

Value:
0