making Bayesian modelling easy(er)¶
uravu is about the relationship between some data and a function that may be used to describe the data.
The aim of uravu is to make using the amazing Bayesian inference libraries that are available in Python as easy as scipy.optimize.curve_fit(), thereby enabling many more people to make use of these exciting and powerful tools. Plus, we have some nice plotting functionality available in the plotting module, capable of generating publication-quality figures.

In an effort to make the uravu API friendly to those new to Bayesian inference, uravu is opinionated, making assumptions about priors among other things. However, we have endeavoured to make it straightforward to ignore these opinions.
In addition to the library and API, we also have some basic tutorials discussing how Bayesian inference methods can be used in the analysis of data.
uravu is under active development, more details of which can be found on Github.
Bayesian inference in Python¶
There are a couple of fantastic Bayesian inference libraries available in Python that uravu makes use of:
- emcee: enables the use of Goodman & Weare’s Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler to evaluate the structure of the model parameter posterior distributions,
- dynesty: implements the nested sampling and dynamic nested sampling algorithms for evidence estimation.
Where function is some user-defined function, abscissa is x-data, ordinate is y-data, and ordinate_error is y-uncertainty, getting an uravu.relationship.Relationship running is as simple as:
from uravu.relationship import Relationship
modeller = Relationship(function, abscissa, ordinate, ordinate_error=ordinate_error)
modeller.max_likelihood('mini')
modeller.mcmc()
modeller.nested_sampling()
Installation¶
uravu can be installed from the PyPI package manager with pip:
pip install uravu
Alternatively, if you would like to download and install the latest development build, it can be found on Github along with specific installation instructions.
Getting started¶
uravu enables the use of powerful Bayesian modelling in Python, building on the capabilities of emcee and dynesty, for the understanding of some data. To show how this can be used, we will create some synthetic data using numpy.
[1]:
import numpy as np
import matplotlib.pyplot as plt
from uravu.relationship import Relationship
from uravu import plotting
[2]:
np.random.seed(1)
[3]:
x = np.linspace(0, 9, 10)
y = np.linspace(0, 18, 10) + 2
y += y * np.random.randn((10)) * 0.025
y_err = np.ones((10))
Plotting this data shows what appears to be a straight-line relationship, with some gradient and intercept.
[4]:
plt.errorbar(x, y, y_err, marker='o', ls='')
plt.show()

Let’s use uravu to model a straight-line function for the data. For more information about the requirements of the input function, check out the Input functions tutorial.
[5]:
def straight_line(x, m, c):
    """
    A straight line.

    Args:
        x (array_like): Abscissa data.
        m (float): Gradient.
        c (float): Intercept.

    Returns:
        (array_like): Ordinate data.
    """
    return m * x + c
The function, x data, y data, and y error can then be brought together in the Relationship class.
[6]:
modeller = Relationship(straight_line, x, y, ordinate_error=y_err)
The Relationship class enables substantial functionality and is at the core of uravu; you can learn about this functionality in the tutorials. Here, we will quickly show the maximisation of the likelihood (there is more about this in the first tutorial).
[7]:
modeller.max_likelihood('mini')
The string 'mini' indicates that the standard scipy.optimize.minimize() function should be used.
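As a hedged sketch (based on the max_likelihood() API documentation, not run in this notebook), an initial guess can be supplied with the x0 keyword, and any additional keyword arguments are forwarded to scipy.optimize.minimize():

# x0 is an initial guess for (m, c); tol is passed through to scipy.optimize.minimize()
modeller.max_likelihood('mini', x0=[2., 2.], tol=1e-8)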
The built-in plotting module can be used to make a publication-quality plot of the Relationship.
[8]:
fig, ax = plt.subplots(figsize=(10, 6))
ax = plotting.plot_relationship(modeller, axes=ax)
plt.show()

We hope you enjoy using uravu; feel free to contribute on Github, and tell your friends and colleagues!
Tutorials¶
Here are some simple tutorials, increasing in complexity, of how Bayesian inference may be used in the analysis of some data.
Some of these were heavily inspired by the tutorial paper of Hogg, Bovy, and Lang, so for more information we recommend that paper. If you want to go even deeper, check out Information Theory, Inference, and Learning Algorithms by David MacKay.
- Maximum likelihood
- Input Functions
- Markov chain Monte Carlo
- Using distributions
- Nested Sampling
- Custom priors
Maximum likelihood¶
In Bayesian modelling, the likelihood, $L$, is the name given to the measure of the goodness of fit between the model, with some given variables, and the data. When the likelihood is maximised, $\hat{L}$, the most probable statistical model has been found for the given data.
In this tutorial we will see how uravu can be used to maximize the likelihood of a model for some dataset.
In uravu, when the sample is normally distributed the (natural log) likelihood is calculated as follows,

$$\ln{L} = -\frac{1}{2}\sum_{i=1}^{n}\left\{\frac{\left(y_i - m_i\right)^2}{\sigma_{y_i}^2} + \ln\left(2\pi\sigma_{y_i}^2\right)\right\},$$

where $y_i$ is the data ordinate, $m_i$ is the model ordinate, and $\sigma_{y_i}$ is the uncertainty in $y_i$.
uravu is able to maximize this function using the scipy.optimize.minimize() function (we minimize the negative of the likelihood).
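To make the expression above concrete, here is a minimal, uravu-independent sketch of the calculation (the function name is ours; uravu's own implementation lives in uravu.optimize.ln_likelihood):

import numpy as np

def gaussian_ln_likelihood(model_y, data_y, data_dy):
    # the normal ln-likelihood written out above
    return -0.5 * np.sum(
        ((data_y - model_y) / data_dy) ** 2 + np.log(2 * np.pi * data_dy ** 2))

# scipy.optimize.minimize() finds minima, so the quantity actually minimised
# is -gaussian_ln_likelihood(model_y, data_y, data_dy)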
Before we maximise the likelihood, it is necessary to create some synthetic data to analyse.
[1]:
import numpy as np
import matplotlib.pyplot as plt
from uravu import plotting
from uravu.relationship import Relationship
[2]:
np.random.seed(1)
[3]:
x = np.linspace(0, 10, 20)
y = np.exp(0.5 * x) * 4
y += y * np.random.randn(20) * 0.1
dy = y * 0.2
[4]:
plt.errorbar(x, y, dy, marker='o', ls='')
plt.yscale('log')
plt.show()

The data plotted above (note the logarithmic $y$-axis) may be modelled with the following relationship,

$$y = a\exp(bx),$$

where $a$ and $b$ are the variables of interest in the modelling process. We want to find the values of these variables that maximise the likelihood.
First, we must write a function to describe the model (more about the function specification can be found in the Input functions tutorial).
[5]:
def my_model(x, a, b):
    """
    A function to describe the model under investigation.

    Args:
        x (array_like): Abscissa data.
        a (float): The pre-exponential factor.
        b (float): The x-multiplicative factor.

    Returns:
        y (array_like): Ordinate data.
    """
    return a * np.exp(b * x)
With our model defined, we can construct a Relationship object.
[6]:
modeller = Relationship(my_model, x, y, ordinate_error=dy)
The Relationship object gives us access to a few powerful Bayesian modelling methods. However, this tutorial is focused on maximising the likelihood, which is achieved with the max_likelihood() class method, where the string 'mini' indicates that the standard scipy.optimize.minimize() function should be used.
[7]:
modeller.max_likelihood('mini')
[8]:
print(modeller.variable_modes)
[4.55082086 0.46090669]
We can see that the variables are close to the values used in the data synthesis. Note that, here, the variable_modes are in fact the variable values that maximise the likelihood.
Let’s inspect the model visually. This can be achieved easily with the plotting module in uravu.
[9]:
ax = plotting.plot_relationship(modeller)
plt.yscale('log')
plt.show()

Above, we can see that the orange line of maximum likelihood agrees well with the data.
Input functions¶
One of the main features of uravu is making Bayesian data modelling for a user-defined function as straightforward as scipy.optimize.curve_fit().
The input functions in uravu are currently designed to be as straightforward as possible: the input should be a function where the first argument is the abscissa and the other arguments are the variables.
[1]:
def my_straight_line(x, m, c):
    """
    A straight line function.

    Args:
        x (array_like): Abscissa data.
        m (float): Gradient.
        c (float): Intercept.

    Returns:
        (array_like): Model ordinate.
    """
    return x * m + c
Note that the function variables must each be a single float or int value. This is due to the underlying mechanics of uravu, which limit the use of array or variable-length arguments.
Additionally, the function should return only the resulting model data; unfortunately, no other blob of information is accepted.
In future, we will work on enabling more flexible function definitions. If you have any ideas, please feel free to contribute.
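In the meantime, one hypothetical workaround is to wrap a model that naturally takes an array of coefficients so that it exposes the scalar arguments uravu expects (polynomial and wrapped_polynomial are illustrative names):

import numpy as np

def polynomial(x, coefficients):
    # a model that naturally takes an array of coefficients
    return np.polyval(coefficients, x)

def wrapped_polynomial(x, a, b, c):
    # exposes the coefficients as the scalar arguments uravu expects
    return polynomial(x, [c, b, a])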
Markov chain Monte Carlo¶
In the maximum likelihood tutorial, you were introduced to the idea of the likelihood, $L$. This parameter is just one of the components of Bayes theorem, the foundation of Bayesian inference,

$$p(M|D) = \frac{p(D|M)p(M)}{p(D)},$$

where $p(M|D)$ is the posterior probability, $p(M)$ is the prior probability, and $p(D)$ is the evidence (more about these latter two factors in the custom priors and nested sampling tutorials). This tutorial will show how Markov chain Monte Carlo may be used to evaluate the shape of the posterior probability, and show what this means for our understanding of a model’s inverse uncertainties.
Let’s start where we left the maximum likelihood tutorial, with a set of variables that maximise the likelihood of some model.
[1]:
import numpy as np
import matplotlib.pyplot as plt
from uravu.relationship import Relationship
from uravu import plotting
[2]:
np.random.seed(1)
[3]:
x = np.linspace(0, 10, 20)
y = np.exp(0.5 * x) * 4
y += y * np.random.randn(20) * 0.1
dy = y * 0.2
[4]:
def my_model(x, a, b):
    """
    A function to describe the model under investigation.

    Args:
        x (array_like): Abscissa data.
        a (float): The pre-exponential factor.
        b (float): The x-multiplicative factor.

    Returns:
        y (array_like): Ordinate data.
    """
    return a * np.exp(b * x)
This time, we have added bounds to our Relationship object. These bounds are associated with the variables, so in this example a is constrained between 0 and 10, while b is constrained between 0 and 1.
This allows the use of the differential evolution (with the string 'diff_evo') minimisation algorithm, which is a global minimisation method.
[5]:
modeller = Relationship(my_model, x, y, ordinate_error=dy, bounds=((0, 10), (0, 1)))
modeller.max_likelihood('diff_evo')
[6]:
ax = plotting.plot_relationship(modeller)
plt.yscale('log')
plt.show()

The maximum likelihood values for the variables, a and b, in my_model() are,
[7]:
print(modeller.variable_modes)
[4.55082025 0.46090671]
It is clear from the plot above that there is some uncertainty in the experimental data (shown by the error bars). However, there is no uncertainty in the variables currently, because these are the single values which maximize the likelihood.
This is where the posterior distributions are useful. If we know the structure of these distributions, we can attribute confidence intervals (and when the distributions are Gaussian, standard deviations) to these variables.
MCMC is a tool that we can use to sample the posterior distribution. In uravu, performing an MCMC sampling for some relationship is as simple as,
[8]:
modeller.mcmc()
100%|██████████| 1000/1000 [01:14<00:00, 13.40it/s]
This will run 25000 samples of the posterior distribution (500 samples for each of the 50 default walkers).
Having performed the MCMC sampling, let’s visualise the results.
[9]:
plotting.plot_corner(modeller)
plt.show()

Above is a corner plot, showing the posterior distribution for each variable and a correlation plot showing the correlation between them.
We can also print the median and mode of each variable (these should be slightly different, as the distributions are not normal),
[10]:
print(modeller.variable_medians)
print(modeller.variable_modes)
[4.46522622 0.46464549]
[4.56938421 0.45901488]
Each variable is stored in a list in the variables object.
[11]:
print(modeller.variables[0].samples)
[3.55193264 4.5166827 3.57466299 ... 4.87739729 4.15375473 3.9105342 ]
These objects are of type uravu.distribution.Distribution, a special class for storing information about a distribution.
[12]:
modeller.variables[0].normal, modeller.variables[0].n, modeller.variables[0].con_int
[12]:
(False, 4.465226223112712, array([3.40517564, 5.06201216]))
Above we can see that the 0th variable (a in the function my_model) is not normally distributed and can be reported as 4.47, with a 95 % confidence interval from 3.41 to 5.06.
The uravu plotting will take advantage of these Distribution class objects. When the relationship is plotted, a subset of the models that are probable for the given data will be plotted, giving a probability distribution of lines on the plot on top of the data.
[13]:
plotting.plot_relationship(modeller)
plt.yscale('log')
plt.show()

The orange line is now spread across most of the measured uncertainties, indicating the range of probable models that are available given the data.
Using distributions¶
uravu isn’t limited to using normally distributed ordinate values. Any distribution of ordinate values can be used, as uravu will perform a Gaussian kernel density estimation on the samples to determine a description for the distribution.
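To illustrate that estimation step in isolation, here is a minimal sketch using scipy directly (uravu performs an equivalent step internally):

import numpy as np
from scipy.stats import gaussian_kde

samples = np.random.randn(5000) * 0.1  # any set of samples
kde = gaussian_kde(samples)            # kernel density estimate of the samples
print(kde.pdf(0.0), kde.logpdf(0.0))   # (log) probability density at a point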
This is most easily shown in action: imagine we have some experimental data that is distributed with a skew normal distribution, rather than the typical normal distribution, so that the values for a particular $x$-value take the (peculiar) shape,
[1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
from uravu.distribution import Distribution
from uravu.relationship import Relationship
from uravu.utils import straight_line
from uravu import plotting
[2]:
np.random.seed(2)
[3]:
y = skewnorm(10, 0, 0.1)
plt.hist(y.rvs(size=5000), bins=25)
plt.show()

Let’s build some synthetic data, collected by sampling a skew normal distribution across a series of values of $x$.
[4]:
x = np.linspace(1, 100, 10)
Y = []
for i in x:
    Y.append(Distribution(skewnorm.rvs(10, i*3.14, i*0.5, size=5000)+(1*np.random.randn())))
Note that the samples, in this case a series of random values from the distribution, are passed to the uravu.distribution.Distribution object, and the ordinate is stored as a list of Distribution objects.
This list is passed to the Relationship class as shown below (note that the ordinate_error keyword argument is no longer used, as we are describing the distribution directly in the Y variable),
[5]:
r = Relationship(straight_line, x, Y, bounds=((-10, 10), (-10, 10)))
r.max_likelihood('diff_evo')
r.variable_medians
[5]:
array([ 3.37733187, -0.02284246])
[6]:
plotting.plot_relationship(r)
plt.show()

It is then possible to use the standard sampling methods to investigate the distribution of the model parameters; here, the model is a simple straight line relationship.
[7]:
r.mcmc()
100%|██████████| 1000/1000 [01:04<00:00, 15.45it/s]
Above, Markov chain Monte Carlo is used to sample the distribution of the gradient and intercept of the straight line, given the uncertainties in the ordinate values (from the distributions).
These distributions can be visualised with plot_distribution from the uravu.plotting library.
[8]:
plotting.plot_distribution(r.variables[0])
plt.show()

[9]:
plotting.plot_distribution(r.variables[1])
plt.show()

We can also see how these distributions affect the agreement with the data by plotting the relationship.
[10]:
plotting.plot_relationship(r)
plt.show()

Nested sampling¶
In the MCMC tutorial, Bayes theorem was shown,

$$p(M|D) = \frac{p(D|M)p(M)}{p(D)},$$

where $p(M|D)$ is the posterior probability, $p(M)$ is the prior probability, $p(D|M)$ is the likelihood, and $p(D)$ is the evidence. Normally, in the evaluation of the posterior probability, the evidence is removed as a constant of proportionality. This is due to the fact that it is a single value for a given model and dataset pair.
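To make the role of the evidence explicit, it is the likelihood integrated over the prior; in our notation (consistent with the equation above, with the model variables written as $\theta$),

$$p(D) = \int p(D|\theta)p(\theta)\,\mathrm{d}\theta.$$

Since this integral runs over all variable values, it is indeed a single number for a given model and dataset pair.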
However, sometimes it is desirable to find the evidence for a model given a dataset; in particular, when we want to compare the evidence for a series of models to determine which best describes the dataset. In order to achieve this, uravu is able to perform nested sampling to estimate the Bayesian evidence for a model given a particular dataset.
This tutorial will show how this is achieved, and show the utility of Bayesian model selection.
However, as always, we must first create some synthetic data.
[1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
from uravu.distribution import Distribution
from uravu.relationship import Relationship
from uravu import plotting, utils
[2]:
np.random.seed(2)
This synthetic data is log-normally distributed in the ordinate. Below, we use a scipy.stats.rv_continuous object to define our ordinate data.
[6]:
x = np.linspace(10, 50, 20)
y = .3 * x ** 2 - 1.4 * x + .2
Y = []
for i in y:
    Y.append(lognorm(s=2, loc=i, scale=1))
From looking at the code used to synthesize this data, it is clear that the functional form of the model is a second-order polynomial. However, if this was data collected from some measurement, and the physical theory suggested it should be analysed with a polynomial of unknown degree, then Bayesian model selection would be the ideal tool to find the best model.
Let’s quickly write a few models to perform the n-degree polynomial analysis with.
[7]:
def one_degree(x, a, b):
    return b * x + a

def two_degree(x, a, b, c):
    return c * x ** 2 + b * x + a

def three_degree(x, a, b, c, d):
    return d * x ** 3 + c * x ** 2 + b * x + a

def four_degree(x, a, b, c, d, e):
    return e * x ** 4 + d * x ** 3 + c * x ** 2 + b * x + a
With these functions defined, we can now build the Relationship objects for each of the functions, using the 'diff_evo' string to perform the global optimisation by differential evolution.
[8]:
one_modeller = Relationship(one_degree, x, Y,
bounds=((-300, 0), (-2, 20)))
one_modeller.max_likelihood('diff_evo')
[9]:
two_modeller = Relationship(two_degree, x, Y,
bounds=((-2, 2), (-2, 2), (-1, 1)))
two_modeller.max_likelihood('diff_evo')
[10]:
three_modeller = Relationship(three_degree, x, Y,
bounds=((-2, 2), (-2, 2), (-1, 1), (-0.2, 0.2)))
three_modeller.max_likelihood('diff_evo')
[11]:
four_modeller = Relationship(four_degree, x, Y,
bounds=((-2, 2), (-2, 2), (-1, 1), (-0.2, 0.2), (-0.02, 0.02)))
four_modeller.max_likelihood('diff_evo')
Having built these, let’s see what the maximum likelihood variables are for each.
[12]:
print(one_modeller.variable_modes)
[-233.02181382 16.91419436]
[13]:
print(two_modeller.variable_modes)
[ 0.50360562 -1.22086729 0.29722928]
[14]:
print(three_modeller.variable_modes)
[-3.41783502e-01 -1.19673254e+00 2.97185414e-01 -8.76900188e-07]
[15]:
print(four_modeller.variable_modes)
[ 1.20603075e+00 4.35409450e-01 1.48054997e-01 4.14102794e-03
-3.65616682e-05]
It is possible to quickly visualise the relationship with the plotting function, which shows the log-normal distribution of the ordinate data.
[16]:
plotting.plot_relationship(one_modeller)
plt.show()

Above, in the variable_modes, we can see that the highest-order terms are quite small for the larger degrees of polynomial. Let’s see what effect this has on the evidence.
Note that uravu and dynesty calculate the natural log of the evidence, and while this tutorial shows how nested sampling may be used, uravu and dynesty also enable the use of dynamic nested sampling (by setting the keyword argument dynamic to True).
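As a usage sketch (not executed in this tutorial), dynamic nested sampling would be requested as:

# dynamic nested sampling; the dynamic keyword is documented in the API
one_modeller.nested_sampling(dynamic=True)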
[17]:
one_modeller.nested_sampling()
3281it [00:20, 158.05it/s, +500 | bound: 5 | nc: 1 | ncall: 20604 | eff(%): 18.351 | loglstar: -inf < -110.062 < inf | logz: -115.772 +/- 0.137 | dlogz: 0.001 > 0.509]
[18]:
two_modeller.nested_sampling()
3495it [00:25, 138.14it/s, +500 | bound: 6 | nc: 1 | ncall: 21404 | eff(%): 18.665 | loglstar: -inf < -82.411 < inf | logz: -88.545 +/- 0.142 | dlogz: 0.001 > 0.509]
[19]:
three_modeller.nested_sampling()
5843it [00:37, 154.48it/s, +500 | bound: 15 | nc: 1 | ncall: 28341 | eff(%): 22.381 | loglstar: -inf < -84.331 < inf | logz: -95.193 +/- 0.200 | dlogz: 0.001 > 0.509]
[20]:
four_modeller.nested_sampling()
8725it [01:01, 140.73it/s, +500 | bound: 29 | nc: 1 | ncall: 39087 | eff(%): 23.601 | loglstar: -inf < -83.449 < inf | logz: -100.113 +/- 0.254 | dlogz: 0.001 > 0.509]
Having estimated $\ln{Z}$ for each Relationship, let’s plot them as a function of the number of variables in the model.
[21]:
variables = [len(one_modeller.variables), len(two_modeller.variables),
len(three_modeller.variables), len(four_modeller.variables)]
ln_evidence = [one_modeller.ln_evidence.n, two_modeller.ln_evidence.n,
three_modeller.ln_evidence.n, four_modeller.ln_evidence.n]
ln_evidence_err = [one_modeller.ln_evidence.s, two_modeller.ln_evidence.s,
three_modeller.ln_evidence.s, four_modeller.ln_evidence.s]
[22]:
plt.errorbar(variables, ln_evidence, ln_evidence_err, marker='o', ls='')
plt.xlabel('Number of variables')
plt.ylabel(r'$\ln{Z}$')
plt.show()

We can see that the evidence reaches a maximum at three variables (the two_degree model); this indicates that it is the most probable model for the analysis of this dataset.
Finally, we can use some built-in functionality of uravu to compare different evidence values, via a value known as the Bayes factor. Let’s compare the one_degree and two_degree models.
[23]:
print(utils.bayes_factor(two_modeller.ln_evidence, one_modeller.ln_evidence))
54.5+/-0.4
The Bayes factor between the two- and one-degree models, $2\ln{B_{21}}$, has a value of $54.5\pm0.4$. The table in Kass and Raftery suggests that this shows “Very Strong” evidence for the model with more variables (the two_degree model).
This means that it is sensible to go ahead and use the two_degree model in the further analysis of our data. Below, the plot of the second-order polynomial is shown over the data.
[24]:
plotting.plot_relationship(two_modeller)
plt.show()

Custom priors¶
The prior probability is a critical element of Bayes theorem. However, to keep uravu straightforward to use, a broad uniform prior probability is assigned to the Relationship object by default, or, if bounds are present, these are used as the limits.
Of course, these defaults may be ignored and custom priors may be used (and sometimes it may be necessary to do so). This tutorial will show how custom priors may be used with uravu.
Let’s start, as always, by producing some synthetic data.
[1]:
import numpy as np
import matplotlib.pyplot as plt
[2]:
np.random.seed(2)
[3]:
x = np.linspace(10, 50, 20)
y = .3 * x ** 2 - 1.4 * x + .2
y += y * np.random.randn(20) * 0.05
dy = 3 * x
[4]:
plt.errorbar(x, y, dy, marker='o', ls='')
plt.show()

The model for this data is a second-order polynomial; below is a function that defines it. The Relationship object is also created.
[5]:
def two_degree(x, a, b, c):
    return c * x ** 2 + b * x + a
[6]:
from uravu.relationship import Relationship
[7]:
modeller = Relationship(two_degree, x, y, ordinate_error=dy)
modeller.max_likelihood('mini')
The maximum likelihood values (which make no consideration of the prior) are found,
[9]:
print(modeller.variable_modes)
[79.94767915 -8.90884397 0.46614102]
The default prior probabilities for these variables in uravu are uniform in the range [x - x * 10, x + x * 10), where x is the current value of the variable.
However, if you wanted the prior probability to be a normal distribution, centred on the current value of the variable with a width of 1, it would be necessary to create a custom prior function. This function is shown below.
[10]:
from scipy.stats import norm
def custom_prior():
    priors = []
    for var in modeller.variable_medians:
        priors.append(norm(loc=var, scale=1))
    return priors
Note that the function returns a list of ‘frozen’ scipy RV objects that describe the shape of the priors.
To make use of these priors, they must be passed to the mcmc or nested_sampling functions as the prior_function keyword argument.
[11]:
modeller.mcmc(prior_function=custom_prior)
100%|██████████| 1000/1000 [01:12<00:00, 13.75it/s]
[12]:
modeller.nested_sampling(prior_function=custom_prior)
2280it [00:34, 65.70it/s, +500 | bound: 2 | nc: 1 | ncall: 18751 | eff(%): 14.826 | loglstar: -inf < -110.472 < inf | logz: -114.157 +/- 0.094 | dlogz: 0.001 > 0.509]
[13]:
print(modeller.ln_evidence)
-114.16+/-0.09
Any scipy statistical function that has a logpdf class method may be used in the definition of priors.
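For example, here is a hedged sketch of a prior function that mixes distribution shapes; the locations and scales are illustrative, and mixed_prior is our name:

from scipy.stats import norm, uniform

def mixed_prior():
    # one prior per variable of the two_degree model: two normals and a
    # uniform spanning 0 to 1 (uniform(loc, scale) covers [loc, loc + scale])
    return [norm(loc=80, scale=10), norm(loc=-9, scale=2), uniform(loc=0, scale=1)]

modeller.mcmc(prior_function=mixed_prior)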
FAQ¶
How can I cite uravu in my publication?
The uravu package has been published in the Journal of Open Source Software, so the following reference should be included where uravu is used in a publication: “A. R. McCluskey & T. Snow, (2020). uravu: Making Bayesian modelling easy(er). Journal of Open Source Software, 5(50), 2214, DOI: 10.21105/joss.02214”.

How do I use the uravu.utils.bayes_factor() function to compare different models?
The uravu.utils.bayes_factor() function uses the second table on page 777 of Kass and Raftery’s paper on Bayesian model comparison. The function will return a value for $2\ln(B_{10})$, where model 1 is the first argument in the function and model 0 is the second. The table mentioned above is reproduced below.

$2\ln(B_{10})$ | $B_{10}$ | Interpretation
0 to 2 | 1 to 3 | Not worth more than a bare mention
2 to 6 | 3 to 20 | Positive
6 to 10 | 20 to 150 | Strong
> 10 | > 150 | Very Strong

So if uravu.utils.bayes_factor(model1, model2) returns 4.3, there is “Positive” evidence for model1 over model2.
API¶
uravu.axis¶
The Axis class controls the organisation of axes in the uravu code, including the evaluation of an axis-level multidimensional kernel density estimate.
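A minimal usage sketch, based on the signatures documented below (the sample values are illustrative):

import numpy as np
from uravu.axis import Axis
from uravu.distribution import Distribution

# an Axis built from two Distribution objects of synthetic samples
axis = Axis([Distribution(np.random.randn(1000)),
             Distribution(np.random.randn(1000) + 5)])
print(axis.n)     # medians for the axis
print(axis.ci())  # uncertainties from the 95 % confidence intervals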
class uravu.axis.Axis(values: Union[list, uravu.distribution.Distribution])[source]
Bases: object
The Axis class is a flexible storage option for both numerical (numpy.ndarray) and distribution (uravu.distribution.Distribution) arrays.
Parameters: values – Array of values.

ci(ci_points: List[float] = [2.5, 97.5]) → numpy.ndarray[source]
Returns: Uncertainties from 95 % confidence intervals for the axis.

classmethod from_dict(my_dict: dict) → uravu.axis.Axis[source]
Class method to produce an Axis from a dictionary.
Parameters: my_dict – Input dictionary.
Returns: Axis from the dictionary.

kde
Returns: Multi-dimensional kernel density estimation for the axis.

logpdf(x: numpy.ndarray) → numpy.ndarray[source]
Get the natural log probability density function for all of the distributions in the axes.
Parameters: x – Values to return natural log probability of.
Returns: Natural log probability.

mode
Returns: Values that maximise the probability for the axis.

n
Returns: Medians for the axis.

pdf(x: numpy.ndarray) → numpy.ndarray[source]
Get the probability density function for all of the distributions in the axes.
Parameters: x – Values to return probability of.
Returns: Probability.

s
Returns: Standard deviation of the axis values.

shape
Returns: Shape of the axis.

size
Returns: Size of the axis.

values
Returns: Array of values.
uravu.distribution¶
The storage and manipulation of probability distributions is fundamental to the operation of uravu and Bayesian inference. The Distribution class oversees these operations.
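A brief sketch of constructing and interrogating a Distribution (the samples here are synthetic):

import numpy as np
from uravu.distribution import Distribution

d = Distribution(np.random.randn(5000), name='example')
print(d.normal)     # True if the samples pass a normality test
print(d.n, d.s)     # median and (for normal samples) standard deviation
print(d.con_int())  # extrema of the 95 % confidence interval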
class uravu.distribution.Distribution(samples: Union[List[Union[float, int]], numpy.ndarray], name: str = 'Distribution', random_state: numpy.random._generator.Generator = None)[source]
Bases: object
In addition to storage of the probability distribution, this class allows for some basic analysis, such as determination of normality.

Attributes:
- samples (array_like): Samples in the distribution.
- name (str): Distribution name.
- normal (bool): Are the samples normally distributed?
- kde (scipy.stats.kde.gaussian_kde): Kernel density approximation for the distribution.

Parameters:
- samples (array_like) – Samples for the distribution.
- name (str, optional) – A name to identify the distribution. Default is 'Distribution'.
- ci_points (array_like, optional) – The two percentiles at which confidence intervals should be found. Default is [2.5, 97.5] (a 95 % confidence interval).

add_samples(samples: Union[List[Union[float, int]], numpy.ndarray])[source]
Add samples to the distribution.
Parameters: samples (array_like) – Samples to be added to the distribution.

check_normality() → bool[source]
Uses scipy.stats.normaltest() to evaluate if the samples are normally distributed and updates the normal attribute.

ci(ci_points: List[float] = [2.5, 97.5]) → numpy.ndarray[source]
Get the extrema of the confidence intervals of the distribution.
Parameters: ci_points – The confidence interval points to return.
Returns: Distribution values at the confidence interval.

con_int(ci_points: List[float] = [2.5, 97.5]) → numpy.ndarray[source]
Get the extrema of the confidence intervals of the distribution.
Parameters: ci_points – The confidence interval points to return.
Returns: Distribution values at the confidence interval.

dist_max
Get the value that maximises the distribution. If no kde has been created (for example, if the distribution has fewer than 8 values) the median is returned.
Returns: Most likely value.

classmethod from_dict(my_dict: dict) → uravu.distribution.Distribution[source]
Parameters: my_dict – Dictionary description of the distribution.
Returns: Distribution object from the dictionary.

logpdf(x: Union[float, List[Union[float, int]], numpy.ndarray]) → Union[float, numpy.ndarray][source]
Get the natural log probability density function for the distribution.
Parameters: x – Value to return natural log probability of.
Returns: Natural log probability.

max
Returns: Sample maximum.

min
Returns: Sample minimum.

n
Returns: Median value.

negative_pdf(x: Union[float, List[Union[float, int]], numpy.ndarray]) → Union[float, numpy.ndarray][source]
Get the negative of the probability density function for the distribution.
Parameters: x – Value to return negative probability of.
Returns: Negative probability.

pdf(x: Union[float, List[Union[float, int]], numpy.ndarray]) → Union[float, numpy.ndarray][source]
Get the probability density function for the distribution.
Parameters: x – Value to return probability of.
Returns: Probability.

s
Returns: Standard deviation of the distribution. For a non-normal distribution, this will return None.

size
Returns: Number of samples in the distribution.

v
Returns: Standard deviation of the distribution. For a non-normal distribution, this will return None.
uravu.optimize¶
The optimize module includes the functionality necessary for maximum likelihood determination. Furthermore, the natural log likelihood function used in the mcmc() and nested_sampling() methods may be found here.
uravu.optimize.ln_likelihood(variables: numpy.ndarray, function: Callable, abscissa: numpy.ndarray, ordinate: numpy.ndarray) → float[source]
Calculate the natural logarithm of the likelihood given a set of variables, when there is no uncertainty in the abscissa.
Parameters:
- variables – Variables for the function evaluation.
- function – The function to be evaluated.
- abscissa – The abscissa values.
- ordinate – The ordinate values.
Returns: Natural log-likelihood between model and data.

uravu.optimize.max_ln_likelihood(relationship: uravu.relationship.Relationship, method: str, x0: numpy.ndarray = None, **kwargs) → numpy.ndarray[source]
Determine the variable values which maximize the likelihood for the given relationship. For keyword arguments see the scipy.optimize.minimize() documentation.
Parameters:
- relationship – The relationship for which variables should be found.
- method – Method for optimisation to be used.
- x0 – Initial guesses for the variable values. Defaults to the current variables values, which are initialized as all 1.
Returns: Optimized variables for the relationship.

uravu.optimize.negative_lnl(variables: numpy.ndarray, function: Callable, abscissa: numpy.ndarray, ordinate: numpy.ndarray) → float[source]
Calculate the negative natural logarithm of the likelihood given a set of variables, when there is no uncertainty in the abscissa.
Parameters:
- variables – Variables for the function evaluation.
- function – The function to be evaluated.
- abscissa – The abscissa values.
- ordinate – The ordinate values.
Returns: Negative natural log-likelihood between model and data.
uravu.plotting¶
These are plotting functions that take either Relationship or Distribution class objects.
The aim is to produce publication-quality plots. However, we recognise that taste exists, and ours may be different from yours. The colorscheme in this work was chosen to be colorblind friendly.
uravu.plotting.plot_corner(relationship, figsize=(8, 8))[source]
Plot the corner (named for the Python package) plot between the relationship’s variables.
Parameters:
- relationship (uravu.relationship.Relationship) – The relationship containing the distributions to be plotted.
- figsize (tuple, optional) – Horizontal and vertical size for the figure (in inches). Default is (8, 8).
Returns: Containing:
- matplotlib.figure.Figure: The figure with new plots.
- matplotlib.axes.Axes: The axes with new plots.
Return type: tuple

uravu.plotting.plot_distribution(distro, axes=None, figsize=(5, 3))[source]
Plot the probability density function for a distribution.
Parameters:
- distro (uravu.distribution.Distribution) – The distribution to be plotted.
- axes (matplotlib.axes.Axes, optional) – Axes to which the plot should be added. If None given, new axes will be created. Default is None.
- figsize (tuple) – Horizontal and vertical size for the figure (in inches). Default is (5, 3).
Returns: The axes with new plots.
Return type: (matplotlib.axes.Axes)

uravu.plotting.plot_relationship(relationship, axes=None, figsize=(10, 6))[source]
Plot the relationship. Additional plots will be included on this if posterior sampling has been used to find distributions.
Parameters:
- relationship (uravu.relationship.Relationship) – The relationship to be plotted.
- axes (matplotlib.axes.Axes, optional) – Axes to which the plot should be added. If None given, new axes will be created. Default is None.
- figsize (tuple, optional) – Horizontal and vertical size for the figure (in inches). Default is (10, 6).
Returns: The axes with new plots.
Return type: (matplotlib.axes.Axes)
uravu.relationship¶
The Relationship class is a foundational component of the uravu package, and acts as the main API for use of the package.
This class enables the storage of the relationship between the model and the data.
Objects of this class offer easy methods to perform maximum likelihood evaluation, Markov chain Monte Carlo (MCMC) for posterior probability determination, and Bayesian evidence estimation by nested sampling.
See the tutorials online for more guidance on how to use this package.
class uravu.relationship.Relationship(function: Callable, abscissa: Union[List[float], numpy.ndarray], ordinate: Union[List[Union[uravu.distribution.Distribution, scipy.stats._distn_infrastructure.rv_frozen, float]], numpy.ndarray], bounds: Tuple[Tuple[float, float]] = None, ordinate_error: Union[List[float], numpy.ndarray] = None)[source]
Bases: object
The Relationship class is the base of the uravu package, enabling the use of Bayesian inference for the assessment of a model’s ability to describe some data.

Attributes:
- function (callable): The function that is modelled.
- abscissa (array_like): The abscissa data that the modelling should be performed on.
- ordinate (list of uravu.distribution.Distribution or array_like): The ordinate data against which the model should be compared. This should be a list of uravu.distribution.Distribution unless an ordinate_error is given.
- variables (list of uravu.distribution.Distribution): Variables in the function.
- bounds (tuple): The minimum and maximum values for each parameter.
- ln_evidence (uncertainties.core.Variable): The natural log of the Bayesian evidence for the model to the given data.
- mcmc_results (dict): The results from emcee.EnsembleSampler.run_mcmc() sampling.
- nested_sampling_results (dict): The results from dynesty.NestedSampler.run_nested() nested sampling.

Parameters:
- function – The functional relationship to be modelled.
- abscissa – The abscissa data. If multi-dimensional, the array is expected to have the shape (N, d), where N is the number of data points and d is the dimensionality.
- ordinate – The ordinate data. This should have a shape (N,).
- bounds – The minimum and maximum values for each parameter. Defaults to None.
- ordinate_error – The uncertainty in the ordinate; this should be the standard error in the measurement. Only used if ordinate is not a list of uravu.distribution.Distribution. Defaults to None.
bayesian_information_criteria()[source]
Calculate the Bayesian information criteria for the relationship.
Returns: Bayesian information criteria.
Return type: float

flatchain
Returns: Sampling flatchain.

get_sample(i)[source]
Return the variable values for a given sample.
Parameters: i (int) – The sample index.
Returns: Variable values at the given index.
Return type: list of float

len_parameters
Determine the number of variables in the assessment function.
Returns: Number of variables.
Return type: int

max_likelihood(method, x0=None, **kwargs)[source]
Determine values for the variables which maximise the likelihood for the Relationship. For keyword arguments see the scipy.optimize.minimize() documentation.
Parameters: x0 (array_like) – Initial guess values for the parameters.

mcmc(prior_function=None, walkers=50, n_samples=500, n_burn=500, progress=True)[source]
Perform MCMC to get the posterior probability distributions for the variables of the relationship. Note, running this method will populate the variables attribute with Distribution objects. Once run, a results dictionary containing the distributions, chain, and samples from emcee is piped into the class variable mcmc_results.
Parameters:
- prior_function (callable, optional) – The function to populate some prior distributions. Default is the broad uniform priors in prior().
- walkers (int, optional) – Number of MCMC walkers. Default is 50.
- n_samples (int, optional) – Number of sample points. Default is 500.
- n_burn (int, optional) – Number of burn-in samples. Default is 500.
- progress (bool, optional) – Show tqdm progress for sampling. Default is True.

mcmc_done
Has MCMC been performed? Determined based on the type of the variables.
Returns: Has MCMC been performed?
Return type: bool

nested_sampling(prior_function=None, progress=True, dynamic=False, **kwargs)[source]
Perform nested sampling, or dynamic nested sampling, to determine the Bayesian natural log evidence. For keyword arguments see the dynesty.NestedSampler.run_nested() documentation. Once run, the results dictionary produced by dynesty.NestedSampler.run_nested() is piped into the class variable nested_sampling_results.
Parameters:
- prior_function (callable, optional) – The function to populate some prior distributions. Default is the broad uniform priors in prior().
- progress (bool, optional) – Show tqdm progress for sampling. Default is True.

nested_sampling_done
Has nested sampling been performed? Determined based on whether the ln_evidence has a value.
Returns: Has nested sampling been performed?
Return type: bool

prior()[source]
Standard priors for the relationship. These priors are broad and uninformative, for normal variables covering the range [x - x * 10, x + x * 10) (where x is the variable value).
Returns: scipy.stats functions describing the priors.
Return type: list of scipy.stats.rv_continuous

variable_medians
The median values for each of the variables.
Returns: Variable medians.
Return type: array_like

variable_modes
The mode values for each of the variables.
Returns: Variable modes.
Return type: array_like

x
Abscissa values.
Returns: Abscissa values.
Return type: array_like

y
Ordinate values.
Returns: Ordinate values.
Return type: array_like
uravu.sampling¶
The sampling module implements the use of a generalised MCMC (using emcee) and nested sampling (using dynesty) for the Relationship objects.
uravu.sampling.ln_probability(variables, function, abscissa, ordinate, priors)[source]
Determine the natural log probability for a given set of variables, by summing the prior and likelihood.
Parameters:
- variables (array_like) – Variables for the function evaluation.
- function (callable) – The function to be evaluated.
- abscissa (array_like) – The abscissa values.
- ordinate (array_like) – The ordinate values.
- priors (list) – The prior distributions for the variables.
Returns: Natural log-probability between model and data, considering priors.
Return type: float
uravu.sampling.mcmc(relationship, prior_function=None, walkers=50, n_samples=500, n_burn=500, progress=True)[source]
Perform MCMC to get the probability distributions for the variables of the relationship.
Parameters:
- relationship (uravu.relationship.Relationship) – The relationship to determine the posteriors of.
- prior_function (callable, optional) – The function to populate some prior distributions. Default is uravu.relationship.Relationship.prior().
- walkers (int, optional) – Number of MCMC walkers. Default is 50.
- n_samples (int, optional) – Number of sample points. Default is 500.
- n_burn (int, optional) – Number of burn-in samples. Default is 500.
- progress (bool, optional) – Show tqdm progress for sampling. Default is True.
Returns: Dictionary with the distributions as a list ('distributions'), the chain ('chain'), and the samples as an array_like ('samples').
Return type: dict
uravu.sampling.nested_prior(array, priors)[source]
Convert from the prior style used within uravu to the dynesty prior style.
Parameters:
- array (array_like) – An array of random uniform numbers (0, 1]. The shape of which is M x N, where M is the number of parameters and N is the number of walkers.
- priors (list) – The prior distributions for the variables.
Returns: An array of random uniform numbers distributed in agreement with the priors.
Return type: array_like
uravu.sampling.nested_sampling(relationship, prior_function=None, progress=True, dynamic=False, **kwargs)[source]
Perform nested sampling, or dynamic nested sampling, in order to determine the Bayesian natural log evidence. See the dynesty.NestedSampler.run_nested() documentation.
Parameters:
- relationship (Relationship) – The relationship to estimate the evidence for.
- prior_function (callable, optional) – The function to populate some prior distributions. Default is the broad uniform priors in prior().
- progress (bool, optional) – Show tqdm progress for sampling. Default is True.
- dynamic (bool, optional) – Should dynamic nested sampling be used? Default is False.
Returns: The results from dynesty.NestedSampler.run_nested().
Return type: dict
uravu.utils¶
A few additional utility functions to improve the usability of uravu.
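As a quick sketch, the 2 ln(B) value reported in the nested sampling tutorial can be reproduced from plain floats (the ln-evidence values below are rounded from that tutorial):

from uravu import utils

# ln evidence for the two- and one-degree models from the tutorial
print(utils.bayes_factor(-88.5, -115.8))  # 2 * (-88.5 - (-115.8)) = 54.6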
uravu.utils.bayes_factor(model_1, model_2)[source]
Use the Bayes factor to compare two models, using the table from Kass and Raftery for interpretation.
Parameters:
- model_1 (uncertainties.core.Variable or float) – ln evidence for model 1.
- model_2 (uncertainties.core.Variable or float) – ln evidence for model 2.
Returns: 2ln(B), where B is the Bayes factor between the two models.
Return type: uncertainties.core.Variable or float
uravu.utils.correlation_matrix(relationship)[source]
Evaluate the Pearson correlation coefficient matrix for the variables in a given relationship.
Parameters: relationship (uravu.relationship.Relationship) – The relationship to determine the correlation matrix for.
Returns: The correlation matrix for the relationship’s variables.
Return type: array_like
uravu.utils.latex(distribution)[source]
Get some LaTeX math-type code that describes the mean and confidence intervals of the distribution.
Parameters: distribution (uravu.distribution.Distribution) – The distribution to return LaTeX for.
Returns: A LaTeX-formatted string for the mean and confidence intervals of the distribution.
Return type: (str)
uravu.utils.straight_line(abscissa, gradient, intercept)[source]
A one-dimensional straight line function.
Parameters:
- abscissa (array_like) – The abscissa data.
- gradient (float) – The slope of the line.
- intercept (float) – The y-intercept of the line.
Returns: The resulting ordinate.
Return type: array_like