`VectorDiffeomixture`

Inherits From: `RandomVariable`

VectorDiffeomixture distribution.

A vector diffeomixture (VDM) is a distribution parameterized by a convex combination of `K` component `loc` vectors, `loc[k], k = 0,...,K-1`, and `K` `scale` matrices `scale[k], k = 0,..., K-1`. It approximates the following [compound distribution](https://en.wikipedia.org/wiki/Compound_probability_distribution):

```
p(x) = int p(x | z) p(z) dz,
where z is in the K-simplex, and
p(x | z) := p(x | loc=sum_k z[k] loc[k], scale=sum_k z[k] scale[k])
```

The integral `int p(x | z) p(z) dz` is approximated with a quadrature scheme adapted to the mixture density `p(z)`. The `N` quadrature points `z_{N, n}` and weights `w_{N, n}` (which are non-negative and sum to 1) are chosen such that

`q_N(x) := sum_{n=1}^N w_{N, n} p(x | z_{N, n}) --> p(x)`

as `N --> infinity`.

Since `q_N(x)` is in fact a mixture (of `N` points), we may sample from `q_N` exactly. It is important to note that the VDM is *defined* as `q_N` above, and *not* `p(x)`. Therefore, sampling and pdf may be implemented as exact (up to floating point error) methods.
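To make the "exact mixture" point concrete, here is a minimal pure-Python sketch of a one-dimensional, `K=2` mixture `q_N`. The helper names (`q_pdf`, `q_sample`), the scalar `loc`/`scale` endpoints, and the hard-coded grid and weights are assumptions made for illustration; this is not the library's implementation.

```python
import math
import random

def normal_pdf(x, loc, scale):
    """Density of a univariate Normal(loc, scale) at x."""
    u = (x - loc) / scale
    return math.exp(-0.5 * u * u) / (scale * math.sqrt(2.0 * math.pi))

def interpolate(z, endpoints):
    """Convex combination sum_k z[k] * endpoints[k]."""
    return sum(zk * ek for zk, ek in zip(z, endpoints))

def q_pdf(x, grid, weights, locs, scales):
    """q_N(x) = sum_n w_n * p(x | z_n), with p(x | z) a Normal."""
    return sum(
        w * normal_pdf(x, interpolate(z, locs), interpolate(z, scales))
        for z, w in zip(grid, weights))

def q_sample(rng, grid, weights, locs, scales):
    """Exact sample from q_N: pick a grid point, then sample p(x | z_n)."""
    z = rng.choices(grid, weights=weights, k=1)[0]
    return rng.gauss(interpolate(z, locs), interpolate(z, scales))

# Hypothetical K=2 setup: three points on the simplex with equal weights.
grid = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
weights = [1.0 / 3.0] * 3
locs = [0.0, 2.0]
scales = [1.0, 3.0]

rng = random.Random(0)
draws = [q_sample(rng, grid, weights, locs, scales) for _ in range(5)]

# Riemann-sum check that q_N integrates to (approximately) one.
step = 0.01
total_mass = sum(q_pdf(-30.0 + i * step, grid, weights, locs, scales) * step
                 for i in range(6000))
```

Because `q_N` is a finite mixture, both the density and the sampler involve only finite sums and draws from the conditional, with no numerical integration at evaluation time.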

A common choice for the conditional `p(x | z)` is a multivariate Normal.

The implemented marginal `p(z)` is the `SoftmaxNormal`, which is a `K-1` dimensional Normal transformed by a `SoftmaxCentered` bijector, making it a density on the `K`-simplex. That is,

```
Z = SoftmaxCentered(X),
X = Normal(mix_loc / temperature, 1 / temperature)
```
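The transformation above can be sketched in plain Python. This assumes that `SoftmaxCentered` appends a zero to its input and applies a softmax, and that the second argument of `Normal` is a standard deviation; the function names are hypothetical.

```python
import math
import random

def softmax_centered(x):
    """Map a length-(K-1) vector to the K-simplex: softmax of [x, 0]."""
    padded = list(x) + [0.0]
    shift = max(padded)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in padded]
    total = sum(exps)
    return [e / total for e in exps]

def sample_softmax_normal(rng, mix_loc, temperature):
    """Draw Z = SoftmaxCentered(X), X ~ Normal(mix_loc / T, 1 / T)."""
    x = [rng.gauss(m / temperature, 1.0 / temperature) for m in mix_loc]
    return softmax_centered(x)

rng = random.Random(0)
z = sample_softmax_normal(rng, mix_loc=[0.0, 1.0], temperature=1.0)  # K = 3
```

The result `z` has one more entry than `mix_loc`, and its entries are positive and sum to one.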

The default quadrature scheme chooses `z_{N, n}` as `N` midpoints of the quantiles of `p(z)` (generalized quantiles if `K > 2`).
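One possible reading of "midpoints of the quantiles" for `K=2` is sketched below. The choice of edge probabilities, the equal weights, and the function name `quantile_midpoint_grid` are assumptions for illustration and may differ from what `quadrature_scheme_softmaxnormal_quantiles` actually does.

```python
import math
from statistics import NormalDist

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def quantile_midpoint_grid(mix_loc, temperature, n):
    """Return (grid, probs) for K=2: n midpoints between n+1 quantiles of Z."""
    base = NormalDist(mu=mix_loc / temperature, sigma=1.0 / temperature)
    # Assumed edge probabilities: n+1 equally spaced interior points of [0, 1].
    edge_probs = [(i + 1) / (n + 2) for i in range(n + 1)]
    # For K=2, Z[0] = sigmoid(X) is monotone in X, so quantiles of Z are
    # the sigmoid of the quantiles of X.
    edges = [sigmoid(base.inv_cdf(p)) for p in edge_probs]
    mids = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    grid = [(m, 1.0 - m) for m in mids]
    probs = [1.0 / n] * n
    return grid, probs

grid, probs = quantile_midpoint_grid(mix_loc=0.0, temperature=1.0, n=4)
```

Each grid point lies on the 2-simplex, and the weights are non-negative and sum to one, as required of a quadrature scheme for `q_N`.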

See [1] for more details.

[1]. “Quadrature Compound: An approximating family of distributions”. Joshua Dillon, Ian Langmore. arXiv preprint, https://arxiv.org/abs/1801.03080

`Vector` distributions in TensorFlow.

The `VectorDiffeomixture` is a non-standard distribution that has properties particularly useful in variational Bayesian methods.

Conditioned on a draw from the SoftmaxNormal, `X|z` is a vector whose components are linear combinations of affine transformations, thus is itself an affine transformation.

Note: The marginals `X_1|v, ..., X_d|v` are *not* generally identical to some parameterization of `distribution`. This is because a sum of draws from `distribution` does not generally follow the same `distribution`.

`Diffeomixture`s and reparameterization.

The `VectorDiffeomixture` is designed to be reparameterized, i.e., its parameters are only used to transform samples from a distribution which has no trainable parameters. This property is important because backprop stops at sources of stochasticity. That is, as long as the parameters are used *after* the underlying source of stochasticity, the computed gradient is accurate.

Reparameterization means that we can use gradient descent (via backprop) to optimize Monte Carlo objectives. Such objectives are a finite-sample approximation of an expectation and arise throughout scientific computing.
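As a small illustration of the idea, the sketch below estimates the gradient of a Monte Carlo objective by hand for a scalar Normal. The objective `E[x**2]`, the function names, and the parameter values are assumptions chosen so the exact answer is known; they are not part of this class.

```python
import random

def sample_reparameterized(eps, loc, scale):
    """Transform parameter-free noise eps ~ Normal(0, 1) into Normal(loc, scale)."""
    return loc + scale * eps

def grad_wrt_loc(loc, scale, num_samples, seed=0):
    """Monte Carlo estimate of d/d(loc) E[x**2] with x = loc + scale * eps.

    Per sample, d(x**2)/d(loc) = 2 * x * dx/d(loc) = 2 * x.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # source of stochasticity, no parameters
        x = sample_reparameterized(eps, loc, scale)
        total += 2.0 * x
    return total / num_samples

estimate = grad_wrt_loc(loc=1.5, scale=0.5, num_samples=20000)
exact = 2.0 * 1.5  # E[x**2] = loc**2 + scale**2, so the gradient is 2 * loc
```

The parameters enter only after `eps` is drawn, so differentiating through the sample gives an unbiased estimate of the true gradient.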

WARNING: If you backprop through a VectorDiffeomixture sample and the “base” distribution is both not `FULLY_REPARAMETERIZED` and a function of trainable variables, then the gradient is not guaranteed correct!

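```python
import numpy as np
import tensorflow as tf

tfd = tf.contrib.distributions

# `K=2` and the affine transformations use the `loc` and `scale` lists below.
# NOTE: `dims`, `mix_loc`, `temperature`, and `distribution` did not survive in
# this copy of the example; the values used here are illustrative assumptions.
dims = 5
vdm = tfd.VectorDiffeomixture(
    mix_loc=[[0.], [1.]],
    temperature=[1.],
    distribution=tfd.Normal(loc=0., scale=1.),
    loc=[
        None,  # Equivalent to `np.zeros(dims, dtype=np.float32)`.
        np.float32([2.]*dims),
    ],
    scale=[
        tf.linalg.LinearOperatorScaledIdentity(
            num_rows=dims,
            multiplier=np.float32(1.1),
            is_positive_definite=True),
        tf.linalg.LinearOperatorDiag(
            diag=np.linspace(2.5, 3.5, dims, dtype=np.float32),
            is_positive_definite=True),
    ],
    validate_args=True)
```

`allow_nan_stats`

Python `bool` describing behavior when a stat is undefined.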

Stats return +/- infinity when it makes sense. E.g., the variance of a Cauchy distribution is infinity. However, sometimes the statistic is undefined, e.g., if a distribution’s pdf does not achieve a maximum within the support of the distribution, the mode is undefined. If the mean is undefined, then by definition the variance is undefined. E.g. the mean for Student’s T for df = 1 is undefined (no clear way to say it is either + or - infinity), so the variance = E[(X - mean)**2] is also undefined.

Returns:

- `allow_nan_stats`: Python `bool`.

`batch_shape`

Shape of a single sample from a single event index as a `TensorShape`.

May be partially defined or unknown.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Returns:

- `batch_shape`: `TensorShape`, possibly unknown.

`distribution`

Base scalar-event, scalar-batch distribution.

`dtype`

The `DType` of `Tensor`s handled by this `Distribution`.

`endpoint_affine`

Affine transformation for each of `K` components.

`event_shape`

Shape of a single sample from a single batch as a `TensorShape`.

May be partially defined or unknown.

Returns:

- `event_shape`: `TensorShape`, possibly unknown.

`grid`

Grid of mixing probabilities, one for each grid point.

`interpolated_affine`

Affine transformation for each convex combination of `K` components.

`mixture_distribution`

Distribution used to select a convex combination of affine transforms.

`name`

Name prepended to all ops created by this `Distribution`.

`parameters`

Dictionary of parameters used to instantiate this `Distribution`.

`reparameterization_type`

Describes how samples from the distribution are reparameterized.

Currently this is one of the static instances `distributions.FULLY_REPARAMETERIZED` or `distributions.NOT_REPARAMETERIZED`.

Returns:

An instance of `ReparameterizationType`.

`sample_shape`

Sample shape of random variable.

`shape`

Shape of random variable.

`validate_args`

Python `bool` indicating possibly expensive checks are enabled.

`__init__`

```
__init__(
*args,
**kwargs
)
```

Constructs the VectorDiffeomixture on `R^d`.

The vector diffeomixture (VDM) approximates the compound distribution

```
p(x) = int p(x | z) p(z) dz,
where z is in the K-simplex, and
p(x | z) := p(x | loc=sum_k z[k] loc[k], scale=sum_k z[k] scale[k])
```

Args:

- `mix_loc`: `float`-like `Tensor` with shape `[b1, ..., bB, K-1]`. In terms of samples, larger `mix_loc[..., k]` ==> `Z` is more likely to put more weight on its `kth` component.
- `temperature`: `float`-like `Tensor`. Broadcastable with `mix_loc`. In terms of samples, smaller `temperature` means one component is more likely to dominate. I.e., smaller `temperature` makes the VDM look more like a standard mixture of `K` components.
- `distribution`: `tf.Distribution`-like instance. Distribution from which `d` iid samples are used as input to the selected affine transformation. Must be a scalar-batch, scalar-event distribution. Typically `distribution.reparameterization_type = FULLY_REPARAMETERIZED` or it is a function of non-trainable parameters. WARNING: If you backprop through a VectorDiffeomixture sample and the `distribution` is not `FULLY_REPARAMETERIZED` yet is a function of trainable variables, then the gradient will be incorrect!
- `loc`: Length-`K` list of `float`-type `Tensor`s. The `k`-th element represents the `shift` used for the `k`-th affine transformation. If the `k`-th item is `None`, `loc` is implicitly `0`. When specified, must have shape `[B1, ..., Bb, d]` where `b >= 0` and `d` is the event size.
- `scale`: Length-`K` list of `LinearOperator`s. Each should be positive-definite and operate on a `d`-dimensional vector space. The `k`-th element represents the `scale` used for the `k`-th affine transformation. `LinearOperator`s must have shape `[B1, ..., Bb, d, d]`, `b >= 0`, i.e., characterizes `b`-batches of `d x d` matrices.
- `quadrature_size`: Python `int` scalar representing number of quadrature points. Larger `quadrature_size` means `q_N(x)` better approximates `p(x)`.
- `quadrature_fn`: Python callable taking `normal_loc`, `normal_scale`, `quadrature_size`, `validate_args` and returning `tuple(grid, probs)` representing the SoftmaxNormal grid and corresponding normalized weight. Default value: `quadrature_scheme_softmaxnormal_quantiles`.
- `validate_args`: Python `bool`, default `False`. When `True` distribution parameters are checked for validity despite possibly degrading runtime performance. When `False` invalid inputs may silently render incorrect outputs.
- `allow_nan_stats`: Python `bool`, default `True`. When `True`, statistics (e.g., mean, mode, variance) use the value “`NaN`” to indicate the result is undefined. When `False`, an exception is raised if one or more of the statistic’s batch members are undefined.
- `name`: Python `str` name prefixed to Ops created by this class.

Raises:

- `ValueError`: if `not scale or len(scale) < 2`.
- `ValueError`: if `len(loc) != len(scale)`.
- `ValueError`: if `quadrature_grid_and_probs is not None` and `len(quadrature_grid_and_probs[0]) != len(quadrature_grid_and_probs[1])`.
- `ValueError`: if `validate_args` and any not scale.is_positive_definite.
- `TypeError`: if any scale.dtype != scale[0].dtype.
- `TypeError`: if any loc.dtype != scale[0].dtype.
- `NotImplementedError`: if `len(scale) != 2`.
- `ValueError`: if `not distribution.is_scalar_batch`.
- `ValueError`: if `not distribution.is_scalar_event`.
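The effect of `temperature` on the mixing weights can be checked with a small simulation for `K=2`. This sketch assumes the relation `X = Normal(mix_loc / temperature, 1 / temperature)` with a standard-deviation second argument; `mean_dominance` is a hypothetical helper, not part of the API.

```python
import math
import random

def mean_dominance(mix_loc, temperature, num_samples=5000, seed=0):
    """Average of max(z) over K=2 draws Z = SoftmaxCentered(X)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mix_loc / temperature, 1.0 / temperature)
        z0 = 1.0 / (1.0 + math.exp(-x))  # softmax([x, 0])[0]
        total += max(z0, 1.0 - z0)
    return total / num_samples

hot = mean_dominance(mix_loc=0.0, temperature=10.0)
cold = mean_dominance(mix_loc=0.0, temperature=0.1)
```

With a small temperature the larger weight is close to one on average, so a single component tends to dominate; with a large temperature the weights stay near `0.5`.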

`__abs__`

```
__abs__(
a,
*args
)
```

Computes the absolute value of a tensor.

Given a tensor `x` of complex numbers, this operation returns a tensor of type `float32` or `float64` that is the absolute value of each element in `x`. All elements in `x` must be complex numbers of the form \(a + bj\). The absolute value is computed as \( \sqrt{a^2 + b^2} \). For example:

```
x = tf.constant([[-2.25 + 4.75j], [-3.25 + 5.75j]])
tf.abs(x) # [5.25594902, 6.60492229]
```

Args:

- `x`: A `Tensor` or `SparseTensor` of type `float32`, `float64`, `int32`, `int64`, `complex64` or `complex128`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` or `SparseTensor` the same size and type as `x` with absolute values. Note, for `complex64` or `complex128` input, the returned `Tensor` will be of type `float32` or `float64`, respectively.
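The complex magnitude used by this operation can be reproduced without TensorFlow. The following sketch uses only the standard library and the same two input values as the example; `complex_abs` is a hypothetical helper name.

```python
import math

def complex_abs(values):
    """Element-wise sqrt(a**2 + b**2) for complex numbers a + bj."""
    return [math.hypot(v.real, v.imag) for v in values]

x = [-2.25 + 4.75j, -3.25 + 5.75j]
magnitudes = complex_abs(x)  # approximately [5.25594902, 6.60492229]
```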

`__add__`

```
__add__(
a,
*args
)
```

Returns x + y element-wise.

*NOTE*: `Add` supports broadcasting. `AddN` does not. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`, `uint8`, `int8`, `int16`, `int32`, `int64`, `complex64`, `complex128`, `string`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__and__`

```
__and__(
a,
*args
)
```

Returns the truth value of x AND y element-wise.

*NOTE*: `LogicalAnd` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor` of type `bool`.
- `y`: A `Tensor` of type `bool`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__bool__`

`__bool__()`

`__div__`

```
__div__(
a,
*args
)
```

Divide two values using Python 2 semantics. Used for Tensor.__div__.

Args:

- `x`: `Tensor` numerator of real numeric type.
- `y`: `Tensor` denominator of real numeric type.
- `name`: A name for the operation (optional).

Returns:

`x / y` returns the quotient of x and y.

`__eq__`

`__eq__(other)`

`__floordiv__`

```
__floordiv__(
a,
*args
)
```

Divides `x / y` elementwise, rounding toward the most negative integer.

The same as `tf.div(x,y)` for integers, but uses `tf.floor(tf.div(x,y))` for floating point arguments so that the result is always an integer (though possibly an integer represented as floating point). This op is generated by `x // y` floor division in Python 3 and in Python 2.7 with `from __future__ import division`.

Note that for efficiency, `floordiv` uses C semantics for negative numbers (unlike Python and Numpy).

`x` and `y` must have the same type, and the result will have the same type as well.

Args:

- `x`: `Tensor` numerator of real numeric type.
- `y`: `Tensor` denominator of real numeric type.
- `name`: A name for the operation (optional).

Returns:

`x / y` rounded down (except possibly towards zero for negative integers).

Raises:

- `TypeError`: If the inputs are complex.
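The two rounding conventions mentioned above differ only when the quotient is negative. The plain-Python sketch below contrasts them; the helper names are hypothetical, and the code does not reproduce TensorFlow's kernels.

```python
import math

def floor_divide(x, y):
    """Round the quotient toward negative infinity (Python's `//`)."""
    return math.floor(x / y)

def truncating_divide(x, y):
    """Round the quotient toward zero (C-style integer division)."""
    return math.trunc(x / y)

floored = floor_divide(-7, 2)         # -4
truncated = truncating_divide(-7, 2)  # -3
```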

`__ge__`

```
__ge__(
a,
*args
)
```

Returns the truth value of (x >= y) element-wise.

*NOTE*: `GreaterEqual` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__getitem__`

```
__getitem__(
a,
*args
)
```

Overload for Tensor.__getitem__.

This operation extracts the specified region from the tensor. The notation is similar to NumPy, with the restriction that currently only basic indexing is supported. That means that using a non-scalar tensor as input is not currently allowed.

Some useful examples:

```
# strip leading and trailing 2 elements
foo = tf.constant([1,2,3,4,5,6])
print(foo[2:-2].eval()) # => [3,4]
# skip every other row and reverse every column
foo = tf.constant([[1,2,3], [4,5,6], [7,8,9]])
print(foo[::2,::-1].eval()) # => [[3,2,1], [9,8,7]]
# Use scalar tensors as indices on both dimensions
print(foo[tf.constant(0), tf.constant(2)].eval()) # => 3
# Insert another dimension
foo = tf.constant([[1,2,3], [4,5,6], [7,8,9]])
print(foo[tf.newaxis, :, :].eval()) # => [[[1,2,3], [4,5,6], [7,8,9]]]
print(foo[:, tf.newaxis, :].eval()) # => [[[1,2,3]], [[4,5,6]], [[7,8,9]]]
print(foo[:, :, tf.newaxis].eval()) # => [[[1],[2],[3]], [[4],[5],[6]], [[7],[8],[9]]]
# Ellipses (3 equivalent operations)
foo = tf.constant([[1,2,3], [4,5,6], [7,8,9]])
print(foo[tf.newaxis, :, :].eval()) # => [[[1,2,3], [4,5,6], [7,8,9]]]
print(foo[tf.newaxis, ...].eval()) # => [[[1,2,3], [4,5,6], [7,8,9]]]
print(foo[tf.newaxis].eval()) # => [[[1,2,3], [4,5,6], [7,8,9]]]
```

Notes:

- `tf.newaxis` is `None` as in NumPy.
- An implicit ellipsis is placed at the end of the `slice_spec`.
- NumPy advanced indexing is currently not supported.

Args:

- `tensor`: An ops.Tensor object.
- `slice_spec`: The arguments to `Tensor.__getitem__`.
- `var`: In the case of variable slice assignment, the Variable object to slice (i.e. tensor is the read-only view of this variable).

Returns:

The appropriate slice of “tensor”, based on “slice_spec”.

Raises:

- `ValueError`: If a slice range is negative size.
- `TypeError`: If the slice indices aren’t int, slice, or Ellipsis.
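For readers without a TensorFlow session at hand, the first two slicing examples can be mirrored with plain Python lists. This is only an analogy: nested lists have no multi-axis indexing, so the second case is written as a comprehension.

```python
foo = [1, 2, 3, 4, 5, 6]
middle = foo[2:-2]  # strip leading and trailing 2 elements

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Keep every other row, then reverse the columns of each kept row.
skipped_and_reversed = [row[::-1] for row in grid[::2]]

# Adding a leading axis corresponds to wrapping in one more list.
with_new_axis = [grid]
```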

`__gt__`

```
__gt__(
a,
*args
)
```

Returns the truth value of (x > y) element-wise.

*NOTE*: `Greater` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__invert__`

```
__invert__(
a,
*args
)
```

Returns the truth value of NOT x element-wise.

Args:

- `x`: A `Tensor` of type `bool`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__iter__`

`__iter__()`

`__le__`

```
__le__(
a,
*args
)
```

Returns the truth value of (x <= y) element-wise.

*NOTE*: `LessEqual` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__lt__`

```
__lt__(
a,
*args
)
```

Returns the truth value of (x < y) element-wise.

*NOTE*: `Less` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `uint8`, `int16`, `int8`, `int64`, `bfloat16`, `uint16`, `half`, `uint32`, `uint64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__matmul__`

```
__matmul__(
a,
*args
)
```

Multiplies matrix `a` by matrix `b`, producing `a` * `b`.

The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication arguments, and any further outer dimensions match.

Both matrices must be of the same type. The supported types are: `float16`, `float32`, `float64`, `int32`, `complex64`, `complex128`.

Either matrix can be transposed or adjointed (conjugated and transposed) on the fly by setting one of the corresponding flags to `True`. These are `False` by default.

If one or both of the matrices contain a lot of zeros, a more efficient multiplication algorithm can be used by setting the corresponding `a_is_sparse` or `b_is_sparse` flag to `True`. These are `False` by default. This optimization is only available for plain matrices (rank-2 tensors) with datatypes `bfloat16` or `float32`.

For example:

```
# 2-D tensor `a`
# [[1, 2, 3],
# [4, 5, 6]]
a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
# 2-D tensor `b`
# [[ 7, 8],
# [ 9, 10],
# [11, 12]]
b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2])
# `a` * `b`
# [[ 58, 64],
# [139, 154]]
c = tf.matmul(a, b)
# 3-D tensor `a`
# [[[ 1, 2, 3],
# [ 4, 5, 6]],
# [[ 7, 8, 9],
# [10, 11, 12]]]
a = tf.constant(np.arange(1, 13, dtype=np.int32),
shape=[2, 2, 3])
# 3-D tensor `b`
# [[[13, 14],
# [15, 16],
# [17, 18]],
# [[19, 20],
# [21, 22],
# [23, 24]]]
b = tf.constant(np.arange(13, 25, dtype=np.int32),
shape=[2, 3, 2])
# `a` * `b`
# [[[ 94, 100],
# [229, 244]],
# [[508, 532],
# [697, 730]]]
c = tf.matmul(a, b)
# Since python >= 3.5 the @ operator is supported (see PEP 465).
# In TensorFlow, it simply calls the `tf.matmul()` function, so the
# following lines are equivalent:
d = a @ b @ [[10.], [11.]]
d = tf.matmul(tf.matmul(a, b), [[10.], [11.]])
```

Args:

- `a`: `Tensor` of type `float16`, `float32`, `float64`, `int32`, `complex64`, `complex128` and rank > 1.
- `b`: `Tensor` with same type and rank as `a`.
- `transpose_a`: If `True`, `a` is transposed before multiplication.
- `transpose_b`: If `True`, `b` is transposed before multiplication.
- `adjoint_a`: If `True`, `a` is conjugated and transposed before multiplication.
- `adjoint_b`: If `True`, `b` is conjugated and transposed before multiplication.
- `a_is_sparse`: If `True`, `a` is treated as a sparse matrix.
- `b_is_sparse`: If `True`, `b` is treated as a sparse matrix.
- `name`: Name for the operation (optional).

Returns:

A `Tensor` of the same type as `a` and `b` where each inner-most matrix is the product of the corresponding matrices in `a` and `b`, e.g. if all transpose or adjoint attributes are `False`:

`output[..., i, j] = sum_k (a[..., i, k] * b[..., k, j])`, for all indices i, j.

Note: This is matrix product, not element-wise product.

Raises:

- `ValueError`: If transpose_a and adjoint_a, or transpose_b and adjoint_b are both set to True.
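The index formula for the output can be written directly as nested sums. The sketch below is a plain-Python version for rank-2 inputs only, using the 2-D values from the example; it illustrates the formula and is not how `tf.matmul` is implemented.

```python
def matmul(a, b):
    """output[i][j] = sum_k a[i][k] * b[k][j] for 2-D lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
c = matmul(a, b)  # [[58, 64], [139, 154]]
```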

`__mod__`

```
__mod__(
a,
*args
)
```

Returns element-wise remainder of division. When `x < 0` xor `y < 0` is true, this follows Python semantics in that the result here is consistent with a flooring divide. E.g. `floor(x / y) * y + mod(x, y) = x`.

*NOTE*: `FloorMod` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `int32`, `int64`, `bfloat16`, `float32`, `float64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.
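The identity `floor(x / y) * y + mod(x, y) = x` can be checked in plain Python, whose `%` operator follows the same flooring convention. The sketch also shows the truncating remainder (`math.fmod`) for contrast; `floor_mod` is a hypothetical helper name.

```python
import math

def floor_mod(x, y):
    """Remainder consistent with flooring division: x - floor(x / y) * y."""
    return x - math.floor(x / y) * y

cases = [(7, 3), (-7, 3), (7, -3), (-7, -3)]
results = [floor_mod(x, y) for x, y in cases]         # [1, 2, -2, -1]
truncated = [int(math.fmod(x, y)) for x, y in cases]  # [1, -1, 1, -1]
```

The two conventions agree when `x` and `y` have the same sign and differ when exactly one of them is negative.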

`__mul__`

```
__mul__(
a,
*args
)
```

Dispatches cwise mul for "Dense*Dense" and "Dense*Sparse".

`__neg__`

```
__neg__(
a,
*args
)
```

Computes numerical negative value element-wise.

I.e., \(y = -x\).

Args:

- `x`: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`, `int32`, `int64`, `complex64`, `complex128`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__nonzero__`

`__nonzero__()`

`__or__`

```
__or__(
a,
*args
)
```

Returns the truth value of x OR y element-wise.

*NOTE*: `LogicalOr` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor` of type `bool`.
- `y`: A `Tensor` of type `bool`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__pow__`

```
__pow__(
a,
*args
)
```

Computes the power of one value to another.

Given a tensor `x` and a tensor `y`, this operation computes \(x^y\) for corresponding elements in `x` and `y`. For example:

```
x = tf.constant([[2, 2], [3, 3]])
y = tf.constant([[8, 16], [2, 3]])
tf.pow(x, y) # [[256, 65536], [9, 27]]
```

Args:

- `x`: A `Tensor` of type `float32`, `float64`, `int32`, `int64`, `complex64`, or `complex128`.
- `y`: A `Tensor` of type `float32`, `float64`, `int32`, `int64`, `complex64`, or `complex128`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`.
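The element-wise pairing of bases and exponents can be reproduced with nested lists. This sketch assumes equally shaped 2-D inputs and does not model broadcasting; `elementwise_pow` is a hypothetical helper name.

```python
def elementwise_pow(x, y):
    """Compute x[i][j] ** y[i][j] for equally shaped 2-D lists."""
    return [[base ** exp for base, exp in zip(row_x, row_y)]
            for row_x, row_y in zip(x, y)]

x = [[2, 2], [3, 3]]
y = [[8, 16], [2, 3]]
result = elementwise_pow(x, y)  # [[256, 65536], [9, 27]]
```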

`__radd__`

```
__radd__(
a,
*args
)
```

Returns x + y element-wise.

*NOTE*: `Add` supports broadcasting. `AddN` does not. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`, `uint8`, `int8`, `int16`, `int32`, `int64`, `complex64`, `complex128`, `string`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__rand__`

```
__rand__(
a,
*args
)
```

Returns the truth value of x AND y element-wise.

*NOTE*: `LogicalAnd` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor` of type `bool`.
- `y`: A `Tensor` of type `bool`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__rdiv__`

```
__rdiv__(
a,
*args
)
```

Divide two values using Python 2 semantics. Used for Tensor.__div__.

Args:

- `x`: `Tensor` numerator of real numeric type.
- `y`: `Tensor` denominator of real numeric type.
- `name`: A name for the operation (optional).

Returns:

`x / y` returns the quotient of x and y.

`__rfloordiv__`

```
__rfloordiv__(
a,
*args
)
```

Divides `x / y` elementwise, rounding toward the most negative integer.

The same as `tf.div(x,y)` for integers, but uses `tf.floor(tf.div(x,y))` for floating point arguments so that the result is always an integer (though possibly an integer represented as floating point). This op is generated by `x // y` floor division in Python 3 and in Python 2.7 with `from __future__ import division`.

Note that for efficiency, `floordiv` uses C semantics for negative numbers (unlike Python and Numpy).

`x` and `y` must have the same type, and the result will have the same type as well.

Args:

- `x`: `Tensor` numerator of real numeric type.
- `y`: `Tensor` denominator of real numeric type.
- `name`: A name for the operation (optional).

Returns:

`x / y` rounded down (except possibly towards zero for negative integers).

Raises:

- `TypeError`: If the inputs are complex.

`__rmatmul__`

```
__rmatmul__(
a,
*args
)
```

Multiplies matrix `a` by matrix `b`, producing `a` * `b`.

The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication arguments, and any further outer dimensions match.

Both matrices must be of the same type. The supported types are: `float16`, `float32`, `float64`, `int32`, `complex64`, `complex128`.

Either matrix can be transposed or adjointed (conjugated and transposed) on the fly by setting one of the corresponding flags to `True`. These are `False` by default.

If one or both of the matrices contain a lot of zeros, a more efficient multiplication algorithm can be used by setting the corresponding `a_is_sparse` or `b_is_sparse` flag to `True`. These are `False` by default. This optimization is only available for plain matrices (rank-2 tensors) with datatypes `bfloat16` or `float32`.

For example:

```
# 2-D tensor `a`
# [[1, 2, 3],
# [4, 5, 6]]
a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3])
# 2-D tensor `b`
# [[ 7, 8],
# [ 9, 10],
# [11, 12]]
b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2])
# `a` * `b`
# [[ 58, 64],
# [139, 154]]
c = tf.matmul(a, b)
# 3-D tensor `a`
# [[[ 1, 2, 3],
# [ 4, 5, 6]],
# [[ 7, 8, 9],
# [10, 11, 12]]]
a = tf.constant(np.arange(1, 13, dtype=np.int32),
shape=[2, 2, 3])
# 3-D tensor `b`
# [[[13, 14],
# [15, 16],
# [17, 18]],
# [[19, 20],
# [21, 22],
# [23, 24]]]
b = tf.constant(np.arange(13, 25, dtype=np.int32),
shape=[2, 3, 2])
# `a` * `b`
# [[[ 94, 100],
# [229, 244]],
# [[508, 532],
# [697, 730]]]
c = tf.matmul(a, b)
# Since python >= 3.5 the @ operator is supported (see PEP 465).
# In TensorFlow, it simply calls the `tf.matmul()` function, so the
# following lines are equivalent:
d = a @ b @ [[10.], [11.]]
d = tf.matmul(tf.matmul(a, b), [[10.], [11.]])
```

Args:

- `a`: `Tensor` of type `float16`, `float32`, `float64`, `int32`, `complex64`, `complex128` and rank > 1.
- `b`: `Tensor` with same type and rank as `a`.
- `transpose_a`: If `True`, `a` is transposed before multiplication.
- `transpose_b`: If `True`, `b` is transposed before multiplication.
- `adjoint_a`: If `True`, `a` is conjugated and transposed before multiplication.
- `adjoint_b`: If `True`, `b` is conjugated and transposed before multiplication.
- `a_is_sparse`: If `True`, `a` is treated as a sparse matrix.
- `b_is_sparse`: If `True`, `b` is treated as a sparse matrix.
- `name`: Name for the operation (optional).

Returns:

A `Tensor` of the same type as `a` and `b` where each inner-most matrix is the product of the corresponding matrices in `a` and `b`, e.g. if all transpose or adjoint attributes are `False`:

`output[..., i, j] = sum_k (a[..., i, k] * b[..., k, j])`, for all indices i, j.

Note: This is matrix product, not element-wise product.

Raises:

- `ValueError`: If transpose_a and adjoint_a, or transpose_b and adjoint_b are both set to True.

`__rmod__`

```
__rmod__(
a,
*args
)
```

Returns element-wise remainder of division. When `x < 0` xor `y < 0` is true, this follows Python semantics in that the result here is consistent with a flooring divide. E.g. `floor(x / y) * y + mod(x, y) = x`.

*NOTE*: `FloorMod` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `int32`, `int64`, `bfloat16`, `float32`, `float64`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__rmul__`

```
__rmul__(
a,
*args
)
```

Dispatches cwise mul for "Dense*Dense" and "Dense*Sparse".

`__ror__`

```
__ror__(
a,
*args
)
```

Returns the truth value of x OR y element-wise.

*NOTE*: `LogicalOr` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor` of type `bool`.
- `y`: A `Tensor` of type `bool`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor` of type `bool`.

`__rpow__`

```
__rpow__(
a,
*args
)
```

Computes the power of one value to another.

Given a tensor `x` and a tensor `y`, this operation computes \(x^y\) for corresponding elements in `x` and `y`. For example:

```
x = tf.constant([[2, 2], [3, 3]])
y = tf.constant([[8, 16], [2, 3]])
tf.pow(x, y) # [[256, 65536], [9, 27]]
```

Args:

- `x`: A `Tensor` of type `float32`, `float64`, `int32`, `int64`, `complex64`, or `complex128`.
- `y`: A `Tensor` of type `float32`, `float64`, `int32`, `int64`, `complex64`, or `complex128`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`.

`__rsub__`

```
__rsub__(
a,
*args
)
```

Returns x - y element-wise.

*NOTE*: `Subtract` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`, `uint8`, `int8`, `uint16`, `int16`, `int32`, `int64`, `complex64`, `complex128`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__rtruediv__`

```
__rtruediv__(
a,
*args
)
```

`__rxor__`

```
__rxor__(
a,
*args
)
```

x ^ y = (x | y) & ~(x & y).

`__sub__`

```
__sub__(
a,
*args
)
```

Returns x - y element-wise.

*NOTE*: `Subtract` supports broadcasting. More about broadcasting here

Args:

- `x`: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`, `uint8`, `int8`, `uint16`, `int16`, `int32`, `int64`, `complex64`, `complex128`.
- `y`: A `Tensor`. Must have the same type as `x`.
- `name`: A name for the operation (optional).

Returns:

A `Tensor`. Has the same type as `x`.

`__truediv__`

```
__truediv__(
a,
*args
)
```

`__xor__`

```
__xor__(
a,
*args
)
```

x ^ y = (x | y) & ~(x & y).
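The identity above can be verified over the full truth table with Python booleans. This sketch uses `or`, `and`, and `not` in place of the element-wise tensor operators; `xor_via_identity` is a hypothetical helper name.

```python
def xor_via_identity(x, y):
    """Evaluate (x | y) & ~(x & y) for Python bools."""
    return (x or y) and not (x and y)

truth_table = {(x, y): xor_via_identity(x, y)
               for x in (False, True) for y in (False, True)}
```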

`batch_shape_tensor`

`batch_shape_tensor(name='batch_shape_tensor')`

Shape of a single sample from a single event index as a 1-D `Tensor`.

The batch dimensions are indexes into independent, non-identical parameterizations of this distribution.

Args:

- `name`: name to give to the op

Returns:

- `batch_shape`: `Tensor`.

`cdf`

```
cdf(
value,
name='cdf'
)
```

Cumulative distribution function.

Given random variable `X`, the cumulative distribution function `cdf` is:

`cdf(x) := P[X <= x]`

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `cdf`: a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.
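The definition `cdf(x) := P[X <= x]` can be illustrated with a standard Normal, comparing the closed-form value with the fraction of samples at or below `x`. This uses only the Python standard library and is unrelated to how this class computes its `cdf`.

```python
import random
from statistics import NormalDist

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x, an estimate of P[X <= x]."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

exact = NormalDist(mu=0.0, sigma=1.0).cdf(1.0)
estimate = empirical_cdf(samples, 1.0)
```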

`copy`

`copy(**override_parameters_kwargs)`

Creates a deep copy of the distribution.

Note: the copy distribution may continue to depend on the original initialization arguments.

Args:

- `**override_parameters_kwargs`: String/value dictionary of initialization arguments to override with new values.

Returns:

- `distribution`: A new instance of `type(self)` initialized from the union of self.parameters and override_parameters_kwargs, i.e., `dict(self.parameters, **override_parameters_kwargs)`.
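The `dict(self.parameters, **override_parameters_kwargs)` expression merges two mappings, with the overrides taking precedence. A minimal standalone sketch, using made-up parameter values and a hypothetical helper name:

```python
def merged_parameters(parameters, **override_parameters_kwargs):
    """Return a new dict; overrides win, the original dict is untouched."""
    return dict(parameters, **override_parameters_kwargs)

# Hypothetical parameter values, for illustration only.
original = {"quadrature_size": 8, "validate_args": False}
updated = merged_parameters(original, validate_args=True)
```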

`covariance`

`covariance(name='covariance')`

Covariance.

Covariance is (possibly) defined only for non-scalar-event distributions.

For example, for a length-`k`, vector-valued distribution, it is calculated as,

`Cov[i, j] = Covariance(X_i, X_j) = E[(X_i - E[X_i]) (X_j - E[X_j])]`

where `Cov` is a (batch of) `k x k` matrix, `0 <= (i, j) < k`, and `E` denotes expectation.

Alternatively, for non-vector, multivariate distributions (e.g., matrix-valued, Wishart), `Covariance` shall return a (batch of) matrices under some vectorization of the events, i.e.,

`Cov[i, j] = Covariance(Vec(X)_i, Vec(X)_j) = [as above]`

where `Cov` is a (batch of) `k' x k'` matrices, `0 <= (i, j) < k' = reduce_prod(event_shape)`, and `Vec` is some function mapping indices of this distribution’s event dimensions to indices of a length-`k'` vector.

Args:

- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `covariance`: Floating-point `Tensor` with shape `[B1, ..., Bn, k', k']` where the first `n` dimensions are batch coordinates and `k' = reduce_prod(self.event_shape)`.
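The covariance formula can be evaluated by hand for a small discrete distribution. The sketch below treats four made-up length-2 events as equally likely, so expectations become plain averages; `covariance_matrix` is a hypothetical helper name.

```python
def covariance_matrix(samples):
    """Cov[i][j] = E[(X_i - E[X_i]) (X_j - E[X_j])] over equally likely samples."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(k)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / n
             for j in range(k)]
            for i in range(k)]

# Four equally likely length-2 events (made-up data).
samples = [(0.0, 0.0), (2.0, 1.0), (0.0, 1.0), (2.0, 2.0)]
cov = covariance_matrix(samples)  # [[1.0, 0.5], [0.5, 0.5]]
```

The result is a symmetric `k x k` matrix whose diagonal holds the per-coordinate variances.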

`cross_entropy`

```
cross_entropy(
other,
name='cross_entropy'
)
```

Computes the (Shannon) cross entropy.

Denote this distribution (`self`) by `P` and the `other` distribution by `Q`. Assuming `P, Q` are absolutely continuous with respect to one another and permit densities `p(x) dr(x)` and `q(x) dr(x)`, (Shannon) cross entropy is defined as:

`H[P, Q] = E_p[-log q(X)] = -int_F p(x) log q(x) dr(x)`

where `F` denotes the support of the random variable `X ~ P`.

Args:

- `other`: `tf.distributions.Distribution` instance.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `cross_entropy`: `self.dtype` `Tensor` with shape `[B1, ..., Bn]` representing `n` different calculations of (Shannon) cross entropy.
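For intuition, the discrete analogue of this definition replaces the integral with a sum. The sketch below uses two made-up probability vectors and natural logarithms; it also shows the standard relation `KL[P || Q] = H[P, Q] - H[P]`.

```python
import math

def cross_entropy(p, q):
    """H[P, Q] = -sum_x p(x) log q(x), in nats, for discrete p and q."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0.0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

entropy_p = cross_entropy(p, p)  # H[P, P] is the Shannon entropy of P
cross_pq = cross_entropy(p, q)
kl_pq = cross_pq - entropy_p     # KL[P || Q] = H[P, Q] - H[P]
```

The cross entropy is never smaller than the entropy of `P`, with equality only when `Q` matches `P`.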

`entropy`

`entropy(name='entropy')`

Shannon entropy in nats.

`eval`

```
eval(
session=None,
feed_dict=None
)
```

In a session, computes and returns the value of this random variable.

This is not a graph construction method, it does not add ops to the graph.

This convenience method requires a session where the graph containing this variable has been launched. If no session is passed, the default session is used.

Args:

- `session`: tf.BaseSession. The `tf.Session` to use to evaluate this random variable. If none, the default session is used.
- `feed_dict`: dict. A dictionary that maps `tf.Tensor` objects to feed values. See `tf.Session.run()` for a description of the valid feed values.

```
import tensorflow as tf
from edward.models import Normal

x = Normal(0.0, 1.0)
with tf.Session() as sess:
  # Usage passing the session explicitly.
  print(x.eval(sess))
  # Usage with the default session. The 'with' block
  # above makes 'sess' the default session.
  print(x.eval())
```

`event_shape_tensor`

`event_shape_tensor(name='event_shape_tensor')`

Shape of a single sample from a single batch as a 1-D int32 `Tensor`.

Args:

- `name`: name to give to the op

Returns:

- `event_shape`: `Tensor`.

`get_ancestors`

`get_ancestors(collection=None)`

Get ancestor random variables.

`get_blanket`

`get_blanket(collection=None)`

Get the random variable’s Markov blanket.

`get_children`

`get_children(collection=None)`

Get child random variables.

`get_descendants`

`get_descendants(collection=None)`

Get descendant random variables.

`get_parents`

`get_parents(collection=None)`

Get parent random variables.

`get_shape`

`get_shape()`

Get shape of random variable.

`get_siblings`

`get_siblings(collection=None)`

Get sibling random variables.
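The graph relations named by these methods can be illustrated on a plain dictionary of parent lists. The sketch below uses the standard graph-theoretic definitions (for example, the Markov blanket is a node's parents, its children, and its children's other parents); the graph and the standalone functions are hypothetical and are not claimed to mirror the library's implementation.

```python
# Hypothetical directed graph: each key maps to the list of its parents.
parents = {
    "a": [],
    "b": [],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["c"],
}

def get_parents(node):
    return set(parents[node])

def get_children(node):
    return {n for n, ps in parents.items() if node in ps}

def get_ancestors(node):
    found, stack = set(), list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found

def get_descendants(node):
    return {n for n in parents if node in get_ancestors(n)}

def get_siblings(node):
    # Other children of this node's parents.
    return {c for p in get_parents(node) for c in get_children(p)} - {node}

def get_blanket(node):
    # Markov blanket: parents, children, and the children's other parents.
    children = get_children(node)
    co_parents = {p for c in children for p in get_parents(c)}
    return (get_parents(node) | children | co_parents) - {node}

print(sorted(get_blanket("a")))  # ['b', 'c', 'd']
```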

`get_variables`

`get_variables(collection=None)`

Get TensorFlow variables that the random variable depends on.

`is_scalar_batch`

`is_scalar_batch(name='is_scalar_batch')`

Indicates that `batch_shape == []`.

Args:

- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `is_scalar_batch`: `bool` scalar `Tensor`.

`is_scalar_event`

`is_scalar_event(name='is_scalar_event')`

Indicates that `event_shape == []`.

Args:

- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `is_scalar_event`: `bool` scalar `Tensor`.

`kl_divergence`

```
kl_divergence(
    other,
    name='kl_divergence'
)
```

Computes the Kullback–Leibler divergence.

Denote this distribution (`self`) by `p` and the `other` distribution by `q`. Assuming `p, q` are absolutely continuous with respect to reference measure `r`, the KL divergence is defined as:

```
KL[p, q] = E_p[log(p(X)/q(X))]
         = -int_F p(x) log q(x) dr(x) + int_F p(x) log p(x) dr(x)
         = H[p, q] - H[p]
```

where `F` denotes the support of the random variable `X ~ p`, `H[., .]` denotes (Shannon) cross entropy, and `H[.]` denotes (Shannon) entropy.

Args:

- `other`: `tf.distributions.Distribution` instance.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `kl_divergence`: `self.dtype` `Tensor` with shape `[B1, ..., Bn]` representing `n` different calculations of the Kullback-Leibler divergence.
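The identity `KL[p, q] = H[p, q] - H[p]` can be checked numerically. The sketch below does so for two univariate Normals, integrating with a simple midpoint rule and comparing against the well-known closed form; the parameter values, integration bounds, and helper names are assumptions chosen for illustration.

```python
import math

def normal_pdf(x, loc, scale):
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

p = (0.0, 1.0)   # (loc, scale) of p
q = (1.0, 2.0)   # (loc, scale) of q

# H[p, q] = -int p(x) log q(x) dx and H[p] = -int p(x) log p(x) dx.
cross = integrate(
    lambda x: -normal_pdf(x, *p) * math.log(normal_pdf(x, *q)), -12.0, 12.0)
entropy = integrate(
    lambda x: -normal_pdf(x, *p) * math.log(normal_pdf(x, *p)), -12.0, 12.0)
kl_numeric = cross - entropy

# Closed form for the KL divergence between two univariate Normals.
kl_exact = (math.log(q[1] / p[1])
            + (p[1] ** 2 + (p[0] - q[0]) ** 2) / (2.0 * q[1] ** 2) - 0.5)
print(kl_numeric, kl_exact)
```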

`log_cdf`

```
log_cdf(
    value,
    name='log_cdf'
)
```

Log cumulative distribution function.

Given random variable `X`, the log cumulative distribution function `log_cdf` is:

`log_cdf(x) := Log[ P[X <= x] ]`

Often, a numerical approximation can be used for `log_cdf(x)` that yields a more accurate answer than simply taking the logarithm of the `cdf` when `x << -1`.

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `logcdf`: a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.
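The accuracy remark can be illustrated with a standard Normal, chosen here only as an example. Far in the left tail the `cdf` underflows to zero, so its logarithm cannot be taken, whereas the leading-order tail expansion `log Phi(x) ~ -x**2/2 - log(-x) - log(2*pi)/2` stays finite. The function names below are hypothetical, and this expansion is just one possible approximation, not necessarily the one any particular library uses.

```python
import math

def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def naive_log_cdf(x):
    # Raises ValueError once cdf(x) has underflowed to 0.0.
    return math.log(normal_cdf(x))

def asymptotic_log_cdf(x):
    """Leading-order tail expansion of log Phi(x), intended only for x << -1."""
    return -0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)

# Where the cdf is still representable, the two roughly agree.
print(naive_log_cdf(-10.0), asymptotic_log_cdf(-10.0))
# Deeper in the tail only the expansion is usable.
print(normal_cdf(-40.0))  # 0.0
print(asymptotic_log_cdf(-40.0))
```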

`log_prob`

```
log_prob(
    value,
    name='log_prob'
)
```

Log probability density/mass function.

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `log_prob`: a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.
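Because the VDM is defined as the finite mixture `q_N(x) = sum_n w_n p(x | z_n)`, its log density can in principle be evaluated with a log-sum-exp over the components. The sketch below shows that idea for scalar Normal components; the weights, locations, and scales are made-up values, and this is not claimed to be how the library implements `log_prob`.

```python
import math

def normal_log_prob(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def mixture_log_prob(x, weights, locs, scales):
    """log sum_n w_n p(x | n), computed stably with the log-sum-exp trick."""
    terms = [math.log(w) + normal_log_prob(x, m, s)
             for w, m, s in zip(weights, locs, scales)]
    biggest = max(terms)
    return biggest + math.log(sum(math.exp(t - biggest) for t in terms))

weights = [0.2, 0.5, 0.3]   # non-negative and summing to 1
locs = [-1.0, 0.0, 2.0]
scales = [0.5, 1.0, 1.5]

print(mixture_log_prob(0.3, weights, locs, scales))
# Still finite far in the tail, where every component density underflows to 0.
print(mixture_log_prob(100.0, weights, locs, scales))
```

Subtracting the largest term before exponentiating keeps at least one summand equal to 1, so the logarithm never receives zero.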

`log_survival_function`

```
log_survival_function(
    value,
    name='log_survival_function'
)
```

Log survival function.

Given random variable `X`, the log survival function is defined:

```
log_survival_function(x) = Log[ P[X > x] ]
                         = Log[ 1 - P[X <= x] ]
                         = Log[ 1 - cdf(x) ]
```

Typically, different numerical approximations can be used for the log survival function, which are more accurate than computing `Log[ 1 - cdf(x) ]` directly when `x >> 1`.

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

`Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.
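As an illustration of why a dedicated approximation helps, consider a standard Normal (an assumption for this example). For large `x` the `cdf` rounds to exactly 1, so `1 - cdf(x)` loses the entire tail mass, while evaluating the tail directly through the complementary error function does not. The helper names are hypothetical.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_log_survival(x):
    """log P[X > x] via the complementary error function, avoiding 1 - cdf(x)."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

x = 10.0
print(1.0 - normal_cdf(x))      # 0.0: the tail mass is lost to rounding
print(normal_log_survival(x))   # finite log of the tail probability
```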

`mean`

`mean(name='mean')`

Mean.

`mode`

`mode(name='mode')`

Mode.

`param_shapes`

```
param_shapes(
    cls,
    sample_shape,
    name='DistributionParamShapes'
)
```

Shapes of parameters given the desired shape of a call to `sample()`.

This is a class method that describes what key/value arguments are required to instantiate the given `Distribution` so that a particular shape is returned for that instance’s call to `sample()`.

Subclasses should override class method `_param_shapes`.

Args:

- `sample_shape`: `Tensor` or python list/tuple. Desired shape of a call to `sample()`.
- `name`: name to prepend ops with.

Returns:

`dict` of parameter name to `Tensor` shapes.

`param_static_shapes`

```
param_static_shapes(
    cls,
    sample_shape
)
```

param_shapes with static (i.e. `TensorShape`) shapes.

This is a class method that describes what key/value arguments are required to instantiate the given `Distribution` so that a particular shape is returned for that instance’s call to `sample()`. Assumes that the sample’s shape is known statically.

Subclasses should override class method `_param_shapes` to return constant-valued tensors when constant values are fed.

Args:

- `sample_shape`: `TensorShape` or python list/tuple. Desired shape of a call to `sample()`.

Returns:

`dict` of parameter name to `TensorShape`.

Raises:

- `ValueError`: if `sample_shape` is a `TensorShape` and is not fully defined.

`prob`

```
prob(
    value,
    name='prob'
)
```

Probability density/mass function.

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `prob`: a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.

`quantile`

```
quantile(
    value,
    name='quantile'
)
```

Quantile function. Aka “inverse cdf” or “percent point function”.

Given random variable `X` and `p in [0, 1]`, the `quantile` is:

`quantile(p) := x such that P[X <= x] == p`

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `quantile`: a `Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.
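The defining relation `P[X <= quantile(p)] == p` can be demonstrated by inverting a monotone `cdf` with bisection. The sketch below does this for a standard Normal (an assumption for the example) and compares the result with `statistics.NormalDist.inv_cdf` from the Python standard library; `quantile_by_bisection` is a hypothetical helper, not a method of this class.

```python
import math
from statistics import NormalDist

def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def quantile_by_bisection(p, lo=-40.0, hi=40.0, iterations=200):
    """Find x with cdf(x) == p (for 0 < p < 1) by bisecting the monotone cdf."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.025, 0.5, 0.975):
    print(p, quantile_by_bisection(p), NormalDist().inv_cdf(p))
```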

`sample`

```
sample(
    sample_shape=(),
    seed=None,
    name='sample'
)
```

Generate samples of the specified shape.

Note that a call to `sample()` without arguments will generate a single sample.

Args:

- `sample_shape`: 0D or 1D `int32` `Tensor`. Shape of the generated samples.
- `seed`: Python integer seed for RNG
- `name`: name to give to the op.

Returns:

- `samples`: a `Tensor` with prepended dimensions `sample_shape`.
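The phrase "prepended dimensions" means the returned shape is `sample_shape` followed by the distribution's batch and event shapes. The small sketch below computes that shape with plain tuples; `sampled_shape` is a hypothetical helper, and the batch shape `(3,)` and event shape `(2,)` are example values.

```python
def sampled_shape(sample_shape, batch_shape, event_shape):
    """Shape of the returned samples: sample_shape + batch_shape + event_shape."""
    if isinstance(sample_shape, int):  # 0-D case: a single integer count
        sample_shape = (sample_shape,)
    return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)

print(sampled_shape((), (3,), (2,)))      # (3, 2)
print(sampled_shape(5, (3,), (2,)))       # (5, 3, 2)
print(sampled_shape((4, 5), (3,), (2,)))  # (4, 5, 3, 2)
```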

`stddev`

`stddev(name='stddev')`

Standard deviation.

Standard deviation is defined as,

`stddev = E[(X - E[X])**2]**0.5`

where `X` is the random variable associated with this distribution, `E` denotes expectation, and `stddev.shape = batch_shape + event_shape`.

Args:

- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `stddev`: Floating-point `Tensor` with shape identical to `batch_shape + event_shape`, i.e., the same shape as `self.mean()`.

`survival_function`

```
survival_function(
    value,
    name='survival_function'
)
```

Survival function.

Given random variable `X`, the survival function is defined:

```
survival_function(x) = P[X > x]
                     = 1 - P[X <= x]
                     = 1 - cdf(x).
```

Args:

- `value`: `float` or `double` `Tensor`.
- `name`: Python `str` prepended to names of ops created by this function.

Returns:

`Tensor` of shape `sample_shape(x) + self.batch_shape` with values of type `self.dtype`.

`value`

`value()`

Get tensor that the random variable corresponds to.

`variance`

`variance(name='variance')`

Variance.

Variance is defined as,

`Var = E[(X - E[X])**2]`

where `X` is the random variable associated with this distribution, `E` denotes expectation, and `Var.shape = batch_shape + event_shape`.

Args:

- `name`: Python `str` prepended to names of ops created by this function.

Returns:

- `variance`: Floating-point `Tensor` with shape identical to `batch_shape + event_shape`, i.e., the same shape as `self.mean()`.
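Both `variance` and `stddev` are elementwise quantities with the same shape as the mean. The sketch below estimates them from samples of a two-dimensional event using only the standard library; the sampling distribution and the helper `elementwise_mean` are assumptions for illustration.

```python
import math
import random

rng = random.Random(0)
# Hypothetical draws with event_shape = [2]: coordinates with scales 1 and 3.
samples = [(rng.gauss(0.0, 1.0), rng.gauss(5.0, 3.0)) for _ in range(20000)]

def elementwise_mean(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

mean = elementwise_mean(samples)
# Var = E[(X - E[X])**2], taken separately for each coordinate.
variance = elementwise_mean(
    [[(x - m) ** 2 for x, m in zip(row, mean)] for row in samples])
# stddev = Var**0.5, again per coordinate.
stddev = [math.sqrt(v) for v in variance]
print(mean, variance, stddev)
```

Unlike `covariance`, these results carry no cross terms between coordinates, which is why their shape matches `self.mean()`.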

`__array_priority__`