Intro to PyPWA with the 2D Gauss

The goal with this little tutorial is to walk through how those PyPWA and it’s collective features.

Note: Multiproccessing is done automatically when it’s selected. However if you have some direct C/C++ code dependency in your Function on called in your class’s __init__, you will encounter issues.

import numpy as np  # Vectorization and arrays
import pandas as pd  # A powerful data science toolkit
import numexpr as ne  # A threaded accelerator for numpy

import PyPWA as pwa
from IPython.display import display

Warning: If you try to import cupy on a system that does not have a supported NVIDIA GPU, it will fail with an ImportError.

There are 3 different supported ways to define your kernel/amplitude in PyPWA. - Using multiprocessing: Write your kernel using Numpy and include any externally compiled code the setup method. This is the default kernel, and will result in your kernel being deployed in parallel across all threads on the host system. - Using Numexpr to use hardware threads and low leve vectorization to futher accelerate Numpy. - Using CuPy to accelerate the kernel and likelihood. This is by far the fastest way to process data, but it requires a NVIDIA GPU with proper CUDA libraries.

class Gauss2dAmplitude(pwa.NestedFunction):
    This is a simple 2D Gauss, but built to use PyPWA's builtin
    multiprocessing module. For you, you don't need to worry about thread or
    process management, how to pass data between threads, or any of the other
    hassels that come with multithreading.

    Instead you just define your class while extending the NestedFunction,
    and when you pass it to the fitter or the simulator, it'll mirror your
    class, split your data, and deploy to every processing thread your
    machine has.

    def setup(self, array):
        This function is where your data is passes too. Here you can also
        load any C or Fortran external libraries that typically would not
        support being packaged in Python's object pickles.
        self.__x = array["x"]
        self.__y = array["y"]

    def calculate(self, params):
        This function recieves the parameters from the minimizer, and
        returns the values from there. All the values are returned from
        here should the just the final array. The likelihood will be
        calculated elsewhere.
        scaling = 1 / (params["A2"] * params["A4"])
        left = ((self.__x - params["A1"])**2)/(params["A2"]**2)
        right = ((self.__y - params["A3"])**2)/(params["A4"]**2)
        return scaling * np.exp(-(left + right))

        return ne.evaluate(
            "(1/(a2*a4)) * exp(-((((x-a1)**2)/(a2**2))+(((y-a3)**2)/(a4**2))))",
                "a1": params["A1"], "a2": params["A2"],
                "a3": params["A3"], "a4": params["A4"],
                "x": self.__data["x"], "y": self.__data["y"]
class NeGauss2dAmplitude(pwa.NestedFunction):
    This is the same Gauss as above, but instead of using raw numpy, it
    uses numexpr, a hypervectorized, multithreading, numerical package that
    should accelerate the calculation of your data.

    USE_MP defaults to True, but you should set it to false when using
    numexpr, or if you're using some method that takes advantage of multiple
    threads in some other manner. Setting this to false will prevent PyPWA
    from launching multiple processes.

    However, if you are chasing acceleration gains for your amplitude,
    consider running this with 2-4 threads. While numexpr does threading of
    it's own, there are still some gains from using a small number of threads
    to acclerate inefficiencies in the Python intrepretor.

    USE_MP = False

    def setup(self, array):
        self.__data = array

    def calculate(self, params):
        return ne.evaluate(
            "(1/(a2*a4)) * exp(-((((x-a1)**2)/(a2**2))+(((y-a3)**2)/(a4**2))))",
                "a1": params["A1"], "a2": params["A2"],
                "a3": params["A3"], "a4": params["A4"],
                "x": self.__data["x"], "y": self.__data["y"]
import cupy as cp  # Cupy Library, mirrors numpy
# If this causes an ImportError, then your system isn't CuPy ready,
# this might be because of missing CUDA libraries, or an
# incompatibility in the hardware.

class GPUGauss2dAmplitude(pwa.NestedFunction):
    Finally, this is the GPU version of the Gauss2D.

    Here, the individual arrays when they are passed to the setup method
    need to be passed through CUDA to have the data uploaded to the GPU.

    USE_GPU needs to be set to avoid having the multiprocessing module
    cause issues with the GPU, it also ways notifies the likelihood that
    the results are expected in CuPy array format.

    USE_GPU = True

    def setup(self, array):
        self.__x = cp.asarray(array["x"])
        self.__y = cp.asarray(array["y"])

    def calculate(self, params):
        scaling = 1 / (params["A2"] * params["A4"])
        left = ((self.__x - params["A1"])**2)/(params["A2"]**2)
        right = ((self.__y - params["A3"])**2)/(params["A4"]**2)
        return scaling * cp.exp(-(left + right))

Using caching for intermediate steps

PyPWA’s caching module supports caching intermediate steps. The advantage of using the caching module is that saving and loading values is fast. Much faster than almost any other solution, and supports almost anything you can store in a variable.

PyPWA.cache has two important functions in it, read and write. Both functions support an intermediate boolean value. When set to true, it tells the module that it’s caching a value that doesn’t have an associated text file. The benefit of this is that you can save almost anything in the cache: lists, DataFrames, dictionarys, etc. There is a chance that it won’t save the value if the data isn’t serializable into a pickle, and it may not be compatible between different versions of python, so I don’t recommend using this for data that you can’t reproduce. However, if you need to do some feature engineering, or data sanitizing, before you can use the data in whatever way you need and want to keep that data around to speed up future executions, this module will make your life a touch easier.

Below, I created a large flat DataFrame, and then binned that DataFrame into 10 bins, each with 1,000,000 events in them. Then, I saved those results into an intermediate cache that will appear in the current working directory with the name “flat_data.pickle” with Intermediate set to true.

  • returns two values, the first is a boolean that is True only if it was able to read the pickle, and the second is the parsed data, which will be None if it was unable to parse anything from the file, or the file doesn’t exist.

  • pwa.write has no returns, but does write the data out in Pickle format to the provided file.

valid_cache, binned_flat ="flat_data", True)
if not valid_cache:
    flat_data = pd.DataFrame()
    flat_data["x"] = np.random.rand(10_000_000) * 20
    flat_data["y"] = np.random.rand(10_000_000) * 20
    flat_data["binning"] = np.random.rand(10_000_000) * 20
    binned_flat = pwa.bin_with_fixed_widths(flat_data, "binning", 1_000_000)
    pwa.cache.write("flat_data", binned_flat, True)

Simulation with bins

Simulation can be ran as a whole system, you simply provide the function and data, and it’ll return the masked values, or you can run the two steps independently, with the first step returning the intensities, and the second returning the masks. When your working with a single dataset, running it as a single step make sense, however if you bin your data, then running it as two steps is better so that all bins are masked against the same max value of the intensity.

  • pwa.simulate.process_user_function takes all the same arguments as pwa.monte_carlo_simulation so it can be a drop in replacement. The difference is that this function will return the final values for the user’s function and the max value.

  • pwa.simulate.make_rejection_list takes the final values and either a single max value, or a list or array of max values, and it’ll use the largest max value. This function will return the same value as pwa.monte_carlo_simulation

Below, I iterate over the bins and produce the final values and max value for each bin and store them in their own lists.

simulation_params = {
    "A1": 10, "A2": 3,
    "A3": 10, "A4": 3

final_values = []
max_values = []
for fixed_bin in binned_flat:
    final, m = pwa.simulate.process_user_function(
        GPUGauss2dAmplitude(), fixed_bin, simulation_params

pwa.cache.write("final_values", max_values, True)

After the final values have been produced, I use pwa.simulate.make_rejection_list to reject events from each bin, and then store the new carved results in a fresh list.

rejected_bins = []
masked_final_values = []
for final_value, bin_data in zip(final_values, binned_flat):
    rejection = pwa.simulate.make_rejection_list(final_value, max_values)

pwa.cache.write("fitting_bins", rejected_bins, True)
pwa.cache.write("kept_final_values", masked_final_values, True)
for index, simulated_bin in enumerate(rejected_bins):
        f"Bin {index+1}'s length is {len(simulated_bin)}, "
        f"{(len(simulated_bin) / 1_000_000) * 100:.2f}% events were kept"
Bin 1's length is 70275, 7.03% events were kept
Bin 2's length is 70542, 7.05% events were kept
Bin 3's length is 70411, 7.04% events were kept
Bin 4's length is 70384, 7.04% events were kept
Bin 5's length is 70616, 7.06% events were kept
Bin 6's length is 70917, 7.09% events were kept
Bin 7's length is 70654, 7.07% events were kept
Bin 8's length is 70978, 7.10% events were kept
Bin 9's length is 70620, 7.06% events were kept
Bin 10's length is 70682, 7.07% events were kept

But first, why Caching has that intermediate step option.

What PyPWA’s caching module is primarily used for is caching values from a file to allow for quick parsing later. Here, you don’t set intermediate to True since you are saving the value stored inside another file. You must use the same filename for both the cache and the file that you’re writing so that it can place the cache in the caching directory, and be set to invalid when the file changes.

The benefit of this is that even when you’re using another data format that isn’t directly supported by PyPWA’s data library, you can still take advantage of PyPWA’s caching module.

is_valid, first_bin ="first_bin.csv")
if not is_valid:
    first_bin = binned_flat[0]
    first_bin.to_csv("first_bin.csv", index=False)
    # if the file exists but there is no cache, you would load the file as normal
    # but then write the cache afterwards.
    # first_bin = pandas.from_csv("fist_bin.csv")
    pwa.cache.write("first_bin.csv", first_bin)


While you can use PyPWA’s likelihoods with any minimizer, PyPWA supports Iminuit 1.X out of the box. The first thing that is done is we setup the parameters to fit against, as well as the individual names of each parameter.

Traditionally, iminuit works by reading the values from the provided function to guess what the parameters are and what to pass to the function, however since we wrap the minimized function to take advantage of GPU acceleration and multiprocessing, you must also tell iMinuit what the values are directly.

fitting_settings = {
    "errordef": .1, "pedantic" : False,
    "A1": 1, "limit_A1": [.1, None],
    "A2": 1, "limit_A2": [1, None],
    "A3": 1, "limit_A3": [.1, None],
    "A4": 1, "limit_A4": [1, None],

param_names = ["A1", "A2", "A3", "A4"]

Then below, we can simply fit those values. Even though we defined the multiprocessing module

# Even though we're using Numexpr, I do want to take advantage of both
# multiprocessing and Numexpr's low level optimizations. So by selecting
# a small number of processes with Numexpr, you still get an overall
# speedup over either Numexpr or regular multiprocessing
NeGauss2dAmplitude.USE_MP = True

cpu_final_values= []
for simulated_bin in rejected_bins:
    with pwa.LogLikelihood(
        NeGauss2dAmplitude(), simulated_bin, num_of_processes=4
    ) as likelihood:
            pwa.minuit(param_names, fitting_settings, likelihood, 1)
gpu_final_values = []
for simulated_bin in rejected_bins:
    with pwa.LogLikelihood(GPUGauss2dAmplitude(), simulated_bin) as likelihood:
            pwa.minuit(param_names, fitting_settings, likelihood, 1)

Issues with PyPWA.cache

There are some values that can not be saved in a PyPWA’s cache. Typically it’s an object from a package that takes advantage of Cython or Fortran to accelerate it’s execution. Either because the values are stored as pointers to arrays, or uses C types too deep for Python’s interpretor to analyze. A good example of this case is results from Iminuit.

As you can see below, a Runtime Warning is thrown from PyPWA’s caching module about how the data can’t be saved. However, the real error is that the tuple has values that can not be converted to a pure Python object for pickling.

pwa.cache.write("fitting_results", cpu_final_values, True)
TypeError                                 Traceback (most recent call last)
~/Projects/pypwa/src/pypwa/PyPWA/libs/file/ in write(path, data, intermediate)
    144         with"wb") as stream:
--> 145             pickle.dump(data_package, stream)
    146     except Exception:

~/.anaconda3/envs/pypwa/lib/python3.8/site-packages/iminuit/ in iminuit._libiminuit.Minuit.__reduce_cython__()

TypeError: self.cfmin,self.initial_upst,self.last_upst,self.pyfcn cannot be converted to a Python object for pickling

During handling of the above exception, another exception occurred:

RuntimeWarning                            Traceback (most recent call last)
<ipython-input-13-0d41921eeb3b> in <module>
----> 1 pwa.cache.write("fitting_results", cpu_final_values, True)

~/Projects/pypwa/src/pypwa/PyPWA/libs/file/ in write(path, data, intermediate)
    147         if cache_path.exists():
    148             cache_path.unlink()
--> 149         raise RuntimeWarning("Your data can not be saved in cache!")

RuntimeWarning: Your data can not be saved in cache!

Viewing the results

Finally, we can see what the results of the fitting. Iminuit has three useful properties that can be called to see information about the fitting.

  • obj.fmin provieds the minimal value that Iminuit found while fitting function.

  • obj.params provides basic information about the params. Validity information, final values, measured errors, and the final parameters

  • obj.matrix() Produces the covariance matrix, which can be useful for some fits that need the matrix to calculate overall error.

print(f"CPU bin 1")
print(f"GPU bin 1")
CPU bin 1
FCN = 2.246e+05 Nfcn = 231 (231 total)
EDM = 5.99e-07 (Goal: 0.0002)
Valid Minimum Valid Parameters SOME Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Hesse ok Has Covariance Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 A1 10.000 0.011 0.1
1 A2 2.987 0.011 1
2 A3 9.997 0.011 0.1
3 A4 3.008 0.011 1
A1 A2 A3 A4
A1 0 0 0 0
A2 0 0 0 -0
A3 0 0 0 0
A4 0 -0 0 0


GPU bin 1
FCN = 2.246e+05 Nfcn = 231 (231 total)
EDM = 5.99e-07 (Goal: 0.0002)
Valid Minimum Valid Parameters SOME Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Hesse ok Has Covariance Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 A1 10.000 0.011 0.1
1 A2 2.987 0.011 1
2 A3 9.997 0.011 0.1
3 A4 3.008 0.011 1
A1 A2 A3 A4
A1 0 0 -0 -0
A2 0 0 -0 -0
A3 -0 -0 0 0
A4 -0 -0 0 0