Simulation and Fitting

PyPWA defines both a Monte Carlo simulation method and several likelihoods. To use either, the cost function or amplitude must first be defined in a supported object.

  • Defining an Amplitude describes how to define a function for use with the simulation and likelihoods.

  • Simulating describes the Monte Carlo Simulation methods.

  • Likelihoods describes the built-in likelihoods. These likelihoods also automatically distribute the fitting function across several processors.

  • Fitting describes the built-in iminuit wrapper, as well as how to use the Likelihood objects with other optimizers.

Defining an Amplitude

Amplitudes or cost functions can be defined using either an object-oriented approach or a functional programming approach. If using pure functions, wrap the calculation function and the optional setup function in PyPWA.FunctionAmplitude. If using the object-oriented approach, extend the PyPWA.NestedFunction abstract class when defining the amplitude.

Both the likelihoods and the Monte Carlo simulation assume that the calculate function, in either approach, returns a standard NumPy array of final values (or a Tensor when USE_TORCH is set).

class PyPWA.NestedFunction

Interface for Amplitudes

These objects are used to calculate the user's amplitude. They are expected to be initialized by the time they are sent to the kernel, and are deep-copied for each process. The setup function is called first to load the data and perform any other initialization, and then the calculate function is called on every call to the likelihood.

Set USE_MP to False to execute on the main thread only. This is best when using packages like numexpr that handle multithreading themselves.

Set USE_TORCH to calculate the likelihood using PyTorch. This assumes that all data returned from the NestedFunction is a Tensor.

Set USE_THREADS to calculate the likelihood using threads. This is best if the likelihood depends on waiting for responses from hardware or network devices, or if you are working with data that cannot be forked.

Set USE_GPU to calculate the likelihood on the GPU. If this is set to True, USE_MP is set to False, and USE_THREADS and USE_TORCH are set to True internally. A RuntimeError is raised if no GPU is available.

Set DEBUG to True to disable all multiprocessing and threading; this prevents errors from being buried in tracebacks.

Warning

If both USE_MP and USE_THREADS are enabled, a RuntimeError is raised, since multiprocessing and threads are not compatible.
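The flag interactions described above can be summarized as a small resolution function. This is a hypothetical sketch, not PyPWA's code: ExecutionFlags and resolve_flags are illustrative names, the default values are assumptions, the gpu_available argument stands in for a real hardware check, and the order in which the rules are applied (GPU override, then the conflict check, then DEBUG) is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class ExecutionFlags:
    """Hypothetical container mirroring the NestedFunction class flags."""
    # These defaults are assumptions for the sketch, not PyPWA's defaults.
    USE_MP: bool = True
    USE_THREADS: bool = False
    USE_TORCH: bool = False
    USE_GPU: bool = False
    DEBUG: bool = False


def resolve_flags(flags, gpu_available):
    """Apply the documented flag rules to a copy; the rule order is assumed."""
    resolved = replace(flags)
    if resolved.USE_GPU:
        if not gpu_available:
            raise RuntimeError("USE_GPU is set but no GPU is available")
        # GPU mode turns multiprocessing off and forces threads plus PyTorch.
        resolved.USE_MP = False
        resolved.USE_THREADS = True
        resolved.USE_TORCH = True
    if resolved.USE_MP and resolved.USE_THREADS:
        raise RuntimeError("USE_MP and USE_THREADS are not compatible")
    if resolved.DEBUG:
        # DEBUG keeps everything on the main thread so tracebacks stay readable.
        resolved.USE_MP = False
        resolved.USE_THREADS = False
    return resolved
```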

See also

FunctionAmplitude

For using the old amplitudes with PyPWA 3

abstract calculate(parameters)

Calculates the amplitude

Parameters

parameters (Dict[str, float]) – The parameters sent to the process by the optimizer

Returns

The array of results for the amplitude; these will be summed by the likelihood. A Tensor is expected when USE_TORCH is True.

Return type

npy.ndarray, Series, or Tensor

abstract setup(data)

Sets up the amplitude for use.

This is where the data used by this specific process is passed in.

Parameters

data (DataFrame or npy.ndarray) – The data that will be used for calculation
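A minimal sketch of an object-oriented amplitude follows, assuming the data has a column named "x" and the optimizer supplies parameters named "mu" and "sigma" (all hypothetical). To keep the example runnable without PyPWA installed, the class does not inherit from anything; in real use it would extend PyPWA.NestedFunction, as noted in the comment.

```python
import numpy as np


class GaussianAmplitude:  # in PyPWA: class GaussianAmplitude(PyPWA.NestedFunction)
    """Hypothetical amplitude: an unnormalized Gaussian in the "x" column."""

    def setup(self, data):
        # Called once per process with that process's share of the data.
        self._x = np.asarray(data["x"], dtype=float)

    def calculate(self, parameters):
        # Called on every likelihood evaluation; returns one value per event.
        mu = parameters["mu"]
        sigma = parameters["sigma"]
        return np.exp(-0.5 * ((self._x - mu) / sigma) ** 2)
```

A class-level flag such as USE_MP = False could be added to the class body to keep the calculation on the main thread, as described for the NestedFunction flags.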

class PyPWA.FunctionAmplitude(setup, processing)

Wrapper for Legacy PyPWA 2.X amplitudes

The old amplitudes were two simple functions passed to the kernels: a setup function and a calculate function. Amplitudes are now objects, so this class wraps the two functions and presents them as a new amplitude object.

Parameters
  • setup (Callable[[], None]) – The old setup function, which takes no arguments and returns nothing

  • processing (Callable[[pd.DataFrame, Dict[str, float]], float]) – The old processing function

See also

NestedFunction

For defining new functions

calculate(parameters)

Calculates the amplitude

Parameters

parameters (Dict[str, float]) – The parameters sent to the process by the optimizer

Returns

The array of results for the amplitude; these will be summed by the likelihood. A Tensor is expected when USE_TORCH is True.

Return type

npy.ndarray, Series, or Tensor

setup(data)

Sets up the amplitude for use.

This is where the data used by this specific process is passed in.

Parameters

data (DataFrame or npy.ndarray) – The data that will be used for calculation
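For comparison, a legacy-style pair of functions might look like the sketch below. The Gaussian shape, the "x" column, the parameter names, and the module-level _constants dictionary are all hypothetical; the processing function returns one value per event, matching what calculate is documented to return. The wrapping call is shown as a comment so the block runs without PyPWA.

```python
import numpy as np

# Hypothetical module-level state, filled in by the legacy setup function.
_constants = {}


def setup():
    # Legacy setup: takes no arguments and returns nothing.
    _constants["norm"] = 1.0 / np.sqrt(2.0 * np.pi)


def processing(data, parameters):
    # Legacy processing: receives the data and the optimizer's parameters.
    x = np.asarray(data["x"], dtype=float)
    mu, sigma = parameters["mu"], parameters["sigma"]
    return _constants["norm"] / sigma * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


# With PyPWA installed, the pair would be wrapped with:
#     amplitude = PyPWA.FunctionAmplitude(setup, processing)
```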

Simulating

There are two ways to use the Monte Carlo simulation method defined in PyPWA: simulation in one pass, producing the rejection list directly, or simulation in two passes, first producing the intensities and then the rejection list. Both methods take advantage of SMP where available.

  • For a single pass, use the PyPWA.monte_carlo_simulation function. It takes the amplitude defined as described in Defining an Amplitude, along with the data, and returns a single rejection list.

  • For two passes, which give more control over when the intensities and the rejection list are computed, use PyPWA.simulate.process_user_function to calculate the intensities and the local max value, then PyPWA.simulate.make_rejection_list to combine the global max value with the local intensities and produce the local rejection list.
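Both approaches rest on accept-reject sampling: each event is kept with probability equal to its intensity divided by the maximum intensity. The sketch below shows that idea with NumPy for the single-pass case; rejection_mask is a hypothetical helper, and it is an assumption that PyPWA's simulation uses exactly this comparison.

```python
import numpy as np


def rejection_mask(intensities, rng=None):
    """Accept-reject sketch: keep each event with probability intensity / max."""
    rng = np.random.default_rng() if rng is None else rng
    intensities = np.asarray(intensities, dtype=float)
    max_value = intensities.max()
    # One uniform draw in [0, 1) per event; keep the event if it falls below
    # the event's normalized intensity.
    return rng.uniform(0.0, 1.0, intensities.shape) < intensities / max_value
```

The resulting boolean array can index the original data, so data[mask] keeps only the accepted events.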

PyPWA.monte_carlo_simulation(amplitude, data, params=None, processes=2)

Produces the rejection list. This takes a user-defined intensity object along with its associated data, and generates a pass/fail array that can be used to mask any dataset of the same length as data.

Parameters
  • amplitude (Amplitude derived from AbstractAmplitude) – A user defined amplitude or pre-made PyPWA amplitude that you wish to carve your data with.

  • data (Structured Array, DataFrame, or BaseFolder from Project) – This is the data you want to be passed to the setup function of your amplitude. If you provide a Structured Array or DataFrame, the entire calculation will occur in memory with the selected number of processes. If you provide a Project BaseFolder, the calculation will rely entirely on the Amplitude.

  • params (Dict[str, float], optional) – An optional dictionary of parameters that will be passed to the AbstractAmplitude’s calculate function.

  • processes (int, optional) – Selects the number of processes to run with, defaults to the number of processes detected through multiprocessing

Returns

A masking array that can be used with any DataFrame or Structured Array to cut the events to the generated shape

Return type

boolean npy.ndarray

Raises

ValueError – If the data is not understood. If you receive this, check your data to ensure it's a supported type

Examples

How to cut your data with results from monte_carlo_simulation

>>> from PyPWA import monte_carlo_simulation
>>> rejection = monte_carlo_simulation(Amplitude(), data)  # Amplitude is your own class
>>> carved = data[rejection]

PyPWA.simulate.process_user_function(amplitude, data, params=None, processes=2)

Produces an array of values for the calculated function.

Parameters
  • amplitude (Amplitude derived from AbstractAmplitude) – A user defined amplitude or pre-made PyPWA amplitude that you wish to carve your data with.

  • data (Structured Array, DataFrame, or BaseFolder from Project) – This is the data you want to be passed to the setup function of your amplitude. If you provide a Structured Array or DataFrame, the entire calculation will occur in memory with the selected number of processes. If you provide a Project BaseFolder, the calculation will rely entirely on the Amplitude.

  • params (Dict[str, float], optional) – An optional dictionary of parameters that will be passed to the AbstractAmplitude’s calculate function.

  • processes (int, optional) – Selects the number of processes to run with, defaults to the number of processes detected through multiprocessing

Returns

The final values computed from the user’s function and the max value computed for that dataset.

Return type

(float npy.ndarray, float)

Raises

ValueError – If the data is not understood. If you receive this, check your data to ensure it's a supported type

PyPWA.simulate.make_rejection_list(intensities, max_value)

Produces the rejection list from pre-calculated function values. Uses the values returned by process_user_function.

Parameters
  • intensities (Numpy array or Pandas Series) – This is a single dimensional array containing the final values for the user’s function.

  • max_value (List, Tuple, Set, npy.ndarray, or float) – The max value for the entire dataset, or a list of the max values from each dataset. Only the largest value will be used.

Returns

A masking array that can be used with any DataFrame or Structured Array to cut the events to the generated shape

Return type

boolean npy.ndarray
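The two-pass workflow can be sketched with plain NumPy by splitting the data into chunks, computing each chunk's intensities and local max, and then reducing the local maxima to a global max before building each chunk's mask. local_pass and reject are hypothetical stand-ins for process_user_function and make_rejection_list, and the intensity function x ** 2 is made up for illustration.

```python
import numpy as np


def local_pass(intensity_fn, chunk):
    """Pass 1 (hypothetical stand-in for process_user_function)."""
    intensities = np.asarray(intensity_fn(chunk), dtype=float)
    return intensities, float(intensities.max())


def reject(intensities, max_value, rng):
    """Pass 2 (hypothetical stand-in for make_rejection_list)."""
    # A list of per-chunk maxima is reduced to the single largest value.
    global_max = float(np.max(max_value))
    return rng.uniform(0.0, 1.0, len(intensities)) < intensities / global_max


rng = np.random.default_rng(1)
chunks = [np.linspace(0.0, 1.0, 5), np.linspace(1.0, 3.0, 5)]
results = [local_pass(lambda x: x ** 2, chunk) for chunk in chunks]
local_maxima = [local_max for _, local_max in results]
masks = [reject(intensities, local_maxima, rng) for intensities, _ in results]
```

Because every chunk is compared against the same global max, the combined masks follow the same shape as a single pass over all of the data.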

Likelihoods

PyPWA supports three likelihood types for use with either the Minuit wrapper or any optimizer that expects a function. All likelihoods have built-in SMP support when they are called, and must be closed when no longer needed.

  • PyPWA.LogLikelihood defines the log likelihood, and supports the standard log likelihood, the binned log likelihood, and the extended log likelihood.

  • PyPWA.ChiSquared defines the ChiSquared method, supporting both the binned and the standard ChiSquared.

  • PyPWA.EmptyLikelihood performs no post-processing on the final values except summing the array and returning the sum. This allows for defining likelihoods that are not already provided, fitting functions that do not require a likelihood, or using the built-in multiprocessing without the overhead of a standard likelihood.

class PyPWA.LogLikelihood(amplitude, data, monte_carlo=None, binned=None, quality_factor=None, generated_length=1, is_minimizer=True, num_of_processes=2)

Computes the log likelihood with a given amplitude.

To use the standard log likelihood, you only need to provide data; if binned and quality_factor are not provided, they default to 1. To use the extended log likelihood, you must provide monte_carlo data. The generated length is set to the length of monte_carlo unless generated_length is provided.

Parameters
  • amplitude (AbstractAmplitude) – Either a user-defined amplitude, or an amplitude from PyPWA

  • data (DataFrame or npy.ndarray) – Data that will be passed directly to the amplitude

  • monte_carlo (DataFrame or npy.ndarray, optional) – Monte Carlo data that will be passed to the amplitude; providing it selects the extended log likelihood

  • binned (Series or npy.ndarray, optional) – Array with bin values. This won’t be used if monte_carlo is provided.

  • quality_factor (Series or npy.ndarray, optional) – Array with quality factor values

  • generated_length (int, optional) – The generated length of values for use with the monte_carlo, this value will default to the length of monte_carlo

  • is_minimizer (bool, optional) – Specify if the final value of the likelihood should be multiplied by -1. Defaults to True.

  • num_of_processes (int, optional) – How many processes to use to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to False or this is set to zero, no extra processes will be spawned

Notes

Standard Log-Likelihood. If not provided, \(Q_f\) and binned will be set to 1:

\[L = \sum{Q_f \cdot binned \cdot \log(Amp(data))}\]

Extended Log-Likelihood. If not provided, \(Q_f\) will be set to 1, and generated_length will be set to len(monte_carlo):

\[L = \sum{Q_f \cdot \log(Amp(data))} - \frac{1}{generated\_length} \cdot \sum{Amp(monte\_carlo)}\]

close()

Closes the likelihood. This needs to be called once you are done with the likelihood, unless you created the likelihood using a with statement.
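The two formulas above can be checked numerically with the sketch below, which works on precomputed amplitude values rather than calling an amplitude. It is a single-process illustration of the documented equations, not PyPWA's implementation; the function names are hypothetical, and is_minimizer flips the sign as described in the parameter list.

```python
import numpy as np


def log_likelihood(amp_data, quality_factor=1.0, binned=1.0, is_minimizer=True):
    """Standard (optionally binned) log likelihood from per-event amplitudes."""
    value = np.sum(quality_factor * binned * np.log(amp_data))
    return -value if is_minimizer else value


def extended_log_likelihood(amp_data, amp_mc, quality_factor=1.0,
                            generated_length=None, is_minimizer=True):
    """Extended log likelihood; generated_length defaults to len(amp_mc)."""
    if generated_length is None:
        generated_length = len(amp_mc)
    value = (np.sum(quality_factor * np.log(amp_data))
             - np.sum(amp_mc) / generated_length)
    return -value if is_minimizer else value
```

For example, with amp_data = [1, e] and amp_mc = [2, 4], the extended form is (0 + 1) - 6 / 2 = -2, which is returned as 2 when is_minimizer is True.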

class PyPWA.ChiSquared(amplitude, data, binned=None, event_errors=None, expected_values=None, is_minimizer=True, num_of_processes=2)

Computes the Chi-Squared Likelihood with a given amplitude.

This likelihood supports two forms of the ChiSquared: one using binned values and one using expected values.

To use the binned ChiSquared, provide data and binned. To use expected values, provide data, event_errors, and expected_values.

Parameters
  • amplitude (AbstractAmplitude) – Either a user-defined amplitude, or an amplitude from PyPWA

  • data (DataFrame or npy.ndarray) – The data that will be passed directly to the amplitude

  • binned (Series or npy.ndarray, optional) – The array of bin values, should be the same length as data

  • event_errors (Series or npy.ndarray, optional) – The array of errors, should be the same length as data

  • expected_values (Series or npy.ndarray, optional) – The array of expected values, should be the same length as data

  • is_minimizer (bool, optional) – Specify if the final value of the likelihood should be multiplied by -1. Defaults to True.

  • num_of_processes (int, optional) – How many processes to use to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to False or this is set to zero, no extra processes will be spawned

Raises

ValueError – If binned values or expected/errors are not provided

Notes

Binned ChiSquare:

\[\chi^{2} = \sum{\frac{(Amp(data) - binned)^{2}}{binned}}\]

Expected values:

\[\chi^{2} = \sum{\frac{(Amp(data) - expected)^{2}}{errors}}\]

close()

Closes the likelihood. This needs to be called once you are done with the likelihood, unless you created the likelihood using a with statement.
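Both ChiSquared forms can be sketched the same way, again on precomputed amplitude values. The sketch sums over events to produce a single scalar, raises ValueError when neither form's inputs are supplied (mirroring the documented Raises entry), and leaves out the is_minimizer sign flip; chi_squared is a hypothetical name.

```python
import numpy as np


def chi_squared(amp_data, binned=None, event_errors=None, expected_values=None):
    """Sketch of the two documented ChiSquared forms, summed over events."""
    if binned is not None:
        # Binned form: (Amp(data) - binned)^2 / binned
        return np.sum((amp_data - binned) ** 2 / binned)
    if event_errors is not None and expected_values is not None:
        # Expected-values form: (Amp(data) - expected)^2 / errors
        return np.sum((amp_data - expected_values) ** 2 / event_errors)
    raise ValueError("Provide binned, or both event_errors and expected_values")
```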

class PyPWA.EmptyLikelihood(amplitude, data, num_of_processes=2)

Provides the multiprocessing benefits of a standard likelihood without a defined likelihood.

This allows you to build the likelihood into your amplitude, or to run your amplitude without a likelihood entirely.

Parameters
  • amplitude (AbstractAmplitude) – Either a user-defined amplitude, or an amplitude from PyPWA

  • data (DataFrame or npy.ndarray) – The data that will be passed directly to the amplitude

  • num_of_processes (int, optional) – How many processes to use to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to False or this is set to zero, no extra processes will be spawned

close()

Closes the likelihood. This needs to be called once you are done with the likelihood, unless you created the likelihood using a with statement.
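As an example of building the likelihood into the amplitude, the sketch below returns the per-event negative log of an exponential density, so the plain sum that EmptyLikelihood performs yields the negative log likelihood. The class, the "x" column, and the "rate" parameter are hypothetical, and the base class is omitted so the block runs without PyPWA.

```python
import numpy as np


class NegativeLogAmplitude:  # in PyPWA this would extend PyPWA.NestedFunction
    """Hypothetical amplitude returning per-event -log(rate * exp(-rate * x))."""

    def setup(self, data):
        self._x = np.asarray(data["x"], dtype=float)

    def calculate(self, parameters):
        rate = parameters["rate"]
        # -log(rate * exp(-rate * x)) simplifies to rate * x - log(rate).
        return rate * self._x - np.log(rate)


amplitude = NegativeLogAmplitude()
amplitude.setup({"x": np.array([0.5, 1.0, 1.5])})
# EmptyLikelihood only sums the returned array, so this sum is the NLL.
nll = float(np.sum(amplitude.calculate({"rate": 1.0})))
```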

Fitting

PyPWA supplies a single wrapper around iminuit. This is a convenience function that makes working with Minuit's parameters easier. If you want to use a different optimizer, such as those from Scikit or SciPy, the likelihoods should work with them natively.

Most optimizers written in Python assume the data is some sort of global variable, and that the function passed to them only accepts the parameters to fit. The likelihoods take advantage of this by wrapping the data and the user-defined functions in an object that distributes the calculation across several processors, while behaving like a function through Python's built-in __call__ magic method.

This should allow the likelihoods to work with any optimizer, as long as it expects a function or callable object and the parameters it passes are pickle-able.
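The callable-object pattern can be sketched without multiprocessing: the data is bound once at construction, and __call__ accepts only the parameters, so any optimizer that takes a function can drive it. CallableLikelihood and ExponentialAmplitude are hypothetical, calling with a parameter dictionary mirrors calculate's documented argument rather than any particular optimizer's convention, and the grid search merely stands in for a real optimizer.

```python
import numpy as np


class ExponentialAmplitude:
    """Hypothetical amplitude: exponential density rate * exp(-rate * x)."""

    def setup(self, data):
        self._x = np.asarray(data["x"], dtype=float)

    def calculate(self, parameters):
        rate = parameters["rate"]
        return rate * np.exp(-rate * self._x)


class CallableLikelihood:
    """Hypothetical single-process stand-in for a PyPWA likelihood object."""

    def __init__(self, amplitude, data):
        self._amplitude = amplitude
        self._amplitude.setup(data)  # the data is bound once, not per call

    def __call__(self, parameters):
        # The optimizer passes only parameters; returns the negative log likelihood.
        return -float(np.sum(np.log(self._amplitude.calculate(parameters))))

    def close(self):
        pass  # a multiprocessing version would shut down its workers here

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False


x = np.array([0.5, 1.0, 1.5])
with CallableLikelihood(ExponentialAmplitude(), {"x": x}) as nll:
    # A crude grid search stands in for an optimizer that only sees a callable.
    grid = np.linspace(0.1, 3.0, 291)
    best_rate = min(grid, key=lambda rate: nll({"rate": rate}))
```

For this data the analytic maximum-likelihood rate is 1 / mean(x) = 1.0, which the grid search recovers to within its step size.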

PyPWA.minuit(settings, likelihood)

Optimization using iminuit

Parameters
  • settings (Dict[str, Any]) – The settings to be passed to iminuit. See the iminuit documentation for specifics

  • likelihood (Likelihood object from likelihoods or single function) – The likelihood or function to be minimized

Returns

The minuit object after the fit has been completed.

Return type

iminuit.Minuit

Note

See iminuit's documentation for more information; it explains the various options that can be passed to iminuit, and how to use the resulting object after a fit has been completed.