Simulation and Fitting¶
PyPWA defines both the monte carlo simulation method as well as the several likelihoods. To use these, the cost function or amplitude needs to be defined in a support object.
Defining an Amplitude describes how to define a function for use with the simulation and likelihoods.
Simulating describes the Monte Carlo Simulation methods.
Likelihoods describes the built in likelihoods. These likelihoods also automatically distribute the fitting function across several processors.
Fitting describes the built in minuit wrapper, as well as how to use the Likelihood objects with other optimizers.
Defining an Amplitude¶
Amplitudes or cost functions can be defined for using either an Object Oriented approach, or a Functional programming approach. If using pure functions for the function, wrap the calculation function and optional setup function in PyPWA.FunctionalAmplitude, if using the OOP approach, extend the PyPWA.NestedFunction abstract class when defining the amplitude.
It is assumed by both the Likelihoods and Monte Carlo that the calculate functions of either methods will return a standard numpy array of final values.
- class PyPWA.NestedFunction¶
Interface for Amplitudes
These objects are used for calculating the users’ amplitude. They’re expected to be initialized by the time they are sent to the kernel, and will be deep-copied for each process. The setup will be called first to initialize data and anything else that might need to be done, and then the calculate function will be called for each call to the likelihood.
Set USE_MP to false to execute on the main thread only, this is best for when using packages like numexpr that handle multi-threading themselves.
Set USE_TORCH to calculate the likelihood using PyTorch. Assumes that all data returned from the NestedFunction will be in a Tensor.
Set USE_THREADS to calculate the likelihood using threads. This is best if the likelihood is dependent on waiting for responses from hardware or network devices; or if you are working with data that can not be forked.
Set USE_GPU to calculate the likelihood using GPU. If this is set to true, then USE_MP will be set to false, and USE_THREADS and USE_TORCH will be set to True internally. This will raise a RuntimeError if the GPU is not available.
Set DEBUG to True to disable all multiprocessing and threads, this will prevent errors from being buried in tracebacks.
Warning
If you enable USE_MP and USE_THREADS, then a RuntimeError will be raised, since Multiprocessing and threads are not compatible.
See also
FunctionAmplitude
For using the old amplitudes with PyPWA 3
- abstract calculate(parameters)¶
Calculates the amplitude
- Parameters
parameters (Dict[str, float]) – The parameters sent to the process by the optimizer
- Returns
The array of results for the amplitude, these will be summed by the likelihood. A tensor is expected when USE_TORCH is true
- Return type
npy.ndarray, Series, or Tensor
- abstract setup(data)¶
Sets up the amplitude for use.
This is where the data that will be used for this specific process will be passed to.
- Parameters
data (DataFrame or npy.ndarray) – The data that will be used for calculation
- class PyPWA.FunctionAmplitude(setup, processing)¶
Wrapper for Legacy PyPWA 2.X amplitudes
The old amplitudes were two simple functions that would be passed to the kernels, a single setup function and a calculate function. Now the amplitudes are objects. This wraps the functions and presents them as the new Amplitude object
- Parameters
setup (Callable[[], ] function with no arguments or returns) – The old setup function that would be used
processing (Callable[[pd.DataFrame, Dict[str, float]], float]) – The old processing function
See also
NestedFunction
For defining new functions
- calculate(parameters)¶
Calculates the amplitude
- Parameters
parameters (Dict[str, float]) – The parameters sent to the process by the optimizer
- Returns
The array of results for the amplitude, these will be summed by the likelihood. A tensor is expected when USE_TORCH is true
- Return type
npy.ndarray, Series, or Tensor
- setup(data)¶
Sets up the amplitude for use.
This is where the data that will be used for this specific process will be passed to.
- Parameters
data (DataFrame or npy.ndarray) – The data that will be used for calculation
Simulating¶
There are two choices when using the Monte Carlo Simulation method defined in PyPWA: Simulation in one pass producing the rejection list, or simulation in two passes to produce the intensities and finally the rejection list. Both methods will take advantage of SMP where available.
If doing a single pass, just use the PyPWA.monte_carlo_simulation function. This will take the fitting function defined from Defining an Amplitude along with the data, and return a single rejection list.
If doing two passes for more control over when the intensities and rejection list, use both PyPWA.simulate.process_user_function to calculate the intensity and local max value, and PyPWA.simulate.make_rejection_list to take the global max value and local intensity to produce the local rejection list.
- PyPWA.monte_carlo_simulation(amplitude, data, params=None, processes=2)¶
Produces the rejection list This takes a user defined intensity object along with it’s associated data, and generates a pass/fail array to be used to mask any dataset of the same length as data.
- Parameters
amplitude (Amplitude derived from AbstractAmplitude) – A user defined amplitude or pre-made PyPWA amplitude that you wish to carve your data with.
data (Structured Array, DataFrame, or BaseFolder from Project) – This is the data you want to be passed to the setup function of your amplitude. If you provide a Structured Array or DataFrame the entire calculation will occur in memory with the selected number of processes. If you provide a Project BaseFolder the calculation will rely entirely on the Amplitude.
params (Dict[str, float], optional) – An optional dictionary of parameters that will be passed to the AbstractAmplitude’s calculate function.
processes (int, optional) – Selects the number of processes to run with, defaults to the number of processes detected through multiprocessing
- Returns
A masking array that can be used with any DataFrame or Structured Array to cut the events to the generated shape
- Return type
boolean npy.ndarray
- Raises
ValueError – If the data is not understood. If you received this, check your data to ensure its a supported type
Examples
How to cut your data with results from monte_carlo_simulation
>>> rejection = monte_carlo_simulation(Amplitude(), data) >>> carved = data[rejection]
- PyPWA.simulate.process_user_function(amplitude, data, params=None, processes=2)¶
Produces an array of values for the calculated function.
- Parameters
amplitude (Amplitude derived from AbstractAmplitude) – A user defined amplitude or pre-made PyPWA amplitude that you wish to carve your data with.
data (Structured Array, DataFrame, or BaseFolder from Project) – This is the data you want to be passed to the setup function of your amplitude. If you provide a Structured Array or DataFrame the entire calculation will occur in memory with the selected number of processes. If you provide a Project BaseFolder the calculation will rely entirely on the Amplitude.
params (Dict[str, float], optional) – An optional dictionary of parameters that will be passed to the AbstractAmplitude’s calculate function.
processes (int, optional) – Selects the number of processes to run with, defaults to the number of processes detected through multiprocessing
- Returns
The final values computed from the user’s function and the max value computed for that dataset.
- Return type
(float npy.ndarray, float)
- Raises
ValueError – If the data is not understood. If you received this, check your data to ensure its a supported type
- PyPWA.simulate.make_rejection_list(intensities, max_value)¶
Produces the rejection list from pre-calculated function values. Uses the values returned by process_user_function.
- Parameters
intensities (Numpy array or Pandas Series) – This is a single dimensional array containing the final values for the user’s function.
max_value (List, Tuple, Set, nd.ndarray, or float) – The max value for the entire dataset, or list of all the max values from each dataset. Only the largest value from the list will be used.
- Returns
A masking array that can be used with any DataFrame or Structured Array to cut the events to the generated shape
- Return type
boolean npy.ndarray
Likelihoods¶
PyPWA supports 3 unique likelihood types for use with either the Minuit wrapper or any optimizer that expects a function. All likelihoods have built in support for SMP when they’re called, and require to be closed when no longer needed.
PyPWA.LogLikelihood defines the likelihood, and works with either the standard log likelihood, the binned log likelihood, or the extended log likelihood.
PyPWA.ChiSquared defines the ChiSquared method, supporting both the binned and standard ChiSquare.
PyPWA.EmptyLikelihood does no post operation on the final values except sum the array and return the final sum. This allows for defining unique likelihoods that have not already been defined, fitting functions that do not require a likelihood, or using the builtin multi processing without the weight of a standard likelihood.
- class PyPWA.LogLikelihood(amplitude, data, monte_carlo=None, binned=None, quality_factor=None, generated_length=1, is_minimizer=True, num_of_processes=2)¶
Computes the log likelihood with a given amplitude.
To use the standard log likelihood, you only need to provide data, If binned and quality factor are not provided, they will default to 1. If you wish to use the Extended Log Likelihood, you must provide monte_carlo data. The generated length will be set to the length of the monte_carlo, unless a generated length is provided.
- Parameters
amplitude (AbstractAmplitude) – Either an user defined amplitude, or an amplitude from PyPWA
data (DataFrame or npy.ndarray) – Data that will be passed directly to the amplitude
monte_carlo (DataFrame or npy.ndarray, optional) – Data that will be passed to the monte_carlo
binned (Series or npy.ndarray, optional) – Array with bin values. This won’t be used if monte_carlo is provided.
quality_factor (Series or npy.ndarray, optional) – Array with quality factor values
generated_length (int, optional) – The generated length of values for use with the monte_carlo, this value will default to the length of monte_carlo
is_minimizer (bool, optional) – Specify if the final value of the likelihood should be multiplied by -1. Defaults to True.
num_of_processes (int, optional) – How many processes to be used to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to false or this is set to zero, no extra processes will be spawned
Notes
Standard Log-Likelihood. If not provided, \(Q_f\) and binned will be set to 1:
\[L = \sum{Q_f \cdot binned \cdot log (Amp(data))}\]Extended Log-Likelihood. If not provided, the Q_f will be set to 1, and generated_length will be set to len(monte_carlo)
\[L = \sum{Q_f \cdot log (Amp(data))} - \ \frac{1}{generated\_length} \cdot \sum{Amp(monte\_carlo)}\]- close()¶
Closes the likelihood This needs to be called after you’re done with the likelihood, UNLESS, you created the likelihood using the with statement
- class PyPWA.ChiSquared(amplitude, data, binned=None, event_errors=None, expected_values=None, is_minimizer=True, num_of_processes=2)¶
Computes the Chi-Squared Likelihood with a given amplitude.
This likelihood supports two different types of the ChiSquared, one with binned or one with expected values.
To use the binned ChiSquared, you need to provide data and binned values, to use the expected values, you need to provide data, event_errors, and expected_values.
- Parameters
amplitude (AbstractAmplitude) – Either an user defined amplitude, or an amplitude from PyPWA
data (DataFrame or npy.ndarray) – The data that will be passed directly to the amplitude
binned (Series or npy.ndarray, optional) – The array of bin values, should be the same length as data
event_errors (Series or npy.ndarray, optional) – The array of errors, should be the same length as data
expected_values (Series or npy.ndarray, optional) – The array of expected values, should be the same length as data
is_minimizer (bool, optional) – Specify if the final value of the likelihood should be multiplied by -1. Defaults to True.
num_of_processes (int, optional) – How many processes to be used to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to false or this is set to zero, no extra processes will be spawned
- Raises
ValueError – If binned values or expected/errors are not provided
Notes
Binned ChiSquare:
\[\chi^{2} = \frac{(Amp(data) - binned)^{2}}{binned}\]Expected values:
\[\chi^{2} = \frac{(Amp(data) - expected)^{2}}{errors}\]- close()¶
Closes the likelihood This needs to be called after you’re done with the likelihood, _Unless_, you created the likelihood using the with statement
- class PyPWA.EmptyLikelihood(amplitude, data, num_of_processes=2)¶
Provides the multiprocessing benefits of a standard likelihood without a defined likelihood.
This allows you to include a likelihood into your amplitude or to run your amplitude without a likelihood entirely.
- amplitude¶
Either an user defined amplitude, or an amplitude from PyPWA
- Type
AbstractAmplitude
- data¶
The data that will be passed directly to the amplitude
- Type
DataFrame or npy.ndarray
- num_of_processes¶
How many processes to be used to calculate the amplitude. Defaults to the number of threads available on the machine. If USE_MP is set to false or this is set to zero, no extra processes will be spawned
- Type
int, optional
- close()¶
Closes the likelihood This needs to be called after you’re done with the likelihood, UNLESS, you created the likelihood using the with statement
Fitting¶
PyPWA supplies a single wrapper around iMinuit’s module. This is a convenience function to make working with Minuit’s parameters easier. However, if wanting to use a different fitting function, like Scikit or Scipy, the likelihoods should work natively with them.
Most optimizers built in Python assume the data is some sort of global variable, and the function passed to them is just accepting parameters to fit against. The Likelihoods take advantage of this by wrapping the data and the defined functions a wrapper that attempts to scale the function to several processors, while providing function-like capabilities by taking advantage of Python’s builtin __call__ magic function.
This should allow the likelihoods to work with any optimizer, as long as they’re expecting a function or callable object, and as long as the parameters they pass are pickle-able.
- PyPWA.minuit(settings, likelihood)¶
Optimization using iminuit
- Parameters
settings (Dict[str, Any]) – The settings to be passed to iminuit. Look into the documentation for iminuit for specifics
likelihood (Likelihood object from likelihoods or single function) –
- Returns
The minuit object after the fit has been completed.
- Return type
iminuit.Minuit
Note
See Iminuit’s documentation for more imformation, as it should explain the various options that can be passed to iminuit, and how to use the resulting object after a fit has been completed.