Intro to PyPWA with the 2D Gauss

The goal of this little tutorial is to walk through PyPWA and its collection of features.

Note: Multiprocessing is done automatically when it’s selected. However, if your function has a direct C/C++ code dependency that is loaded in your class’s __init__, you will encounter issues. This is why each object has a setup function: initialize Fortran and C++ dependencies there.
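
For example, a kernel with a compiled dependency would defer loading it until setup, as in this minimal sketch (libmyamp.so and CompiledAmplitude are hypothetical names, not part of PyPWA):

import ctypes
import PyPWA as pwa

class CompiledAmplitude(pwa.NestedFunction):

    def setup(self, array):
        # setup runs inside each worker process after the object is
        # cloned, so the library handle is created fresh per process
        # and never has to survive pickling.
        self.__lib = ctypes.CDLL("libmyamp.so")  # hypothetical library
        self.__x = array["x"]

    def calculate(self, params):
        ...  # call into self.__lib here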

[1]:
import numpy as np  # Vectorization and arrays
import pandas as pd  # A powerful data science toolkit
import numexpr as ne  # A threaded accelerator for numpy

import PyPWA as pwa
from IPython.display import display

There are 3 different supported ways to define your kernel/amplitude in PyPWA:

  • Using multiprocessing: Write your kernel using Numpy, and load any externally compiled code in the setup method. This is the default kernel, and it will result in your kernel being deployed in parallel across all threads on the host system.

  • Using Numexpr: Use hardware threads and low-level vectorization to further accelerate Numpy. There is some benefit to running Numexpr on fewer cores than traditional Numpy, but largely you can treat Numexpr the same as the above.

  • Using Torch: Compute the kernel with Torch to take advantage of Metal on Apple machines, or CUDA GPUs on Linux machines. However, to utilize CUDA, you must disable multiprocessing; at this time, CUDA does not support the main process being forked.

[2]:
class Gauss2dAmplitude(pwa.NestedFunction):
    """
    This is a simple 2D Gauss, but built to use PyPWA's builtin
    multiprocessing module. With this, you don't need to worry about
    thread or process management, how to pass data between threads, or
    any of the other hassles that come with multithreading.

    Instead, you just define your class while extending the NestedFunction,
    and when you pass it to the fitter or the simulator, it'll clone your
    class, split your data, and deploy to every processing thread your
    machine has.
    """

    def __init__(self):
        """
        You can override the init function if you need to set parameters
        before the amplitude is passed to the likelihood or simulation
        functions. You can see an example of this with the JPAC amplitude
        included in the other tutorials. However, you must remember
        to always call the `super` function if you do this.
        """
        super(Gauss2dAmplitude, self).__init__()

    def setup(self, array):
        """
        This function is where your data is passed to. Here you can also
        load any C or Fortran external libraries that typically would not
        support being packaged in Python's object pickles.
        """
        self.__x = array["x"]
        self.__y = array["y"]

    def calculate(self, params):
        """
        This function receives the parameters from the minimizer, and
        returns the calculated values. Only the amplitude values should
        be calculated here. The likelihood will be calculated elsewhere.
        """
        scaling = 1 / (params["A2"] * params["A4"])
        left = ((self.__x - params["A1"])**2)/(params["A2"]**2)
        right = ((self.__y - params["A3"])**2)/(params["A4"]**2)
        return scaling * np.exp(-(left + right))
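
As a quick sanity check, the amplitude can also be evaluated by hand, outside of any likelihood or fitter; the toy data and parameter values here are purely illustrative:

amp = Gauss2dAmplitude()
toy_data = pd.DataFrame({
    "x": np.random.rand(5) * 20,
    "y": np.random.rand(5) * 20,
})
amp.setup(toy_data)  # normally the likelihood calls this for you
print(amp.calculate({"A1": 10, "A2": 3, "A3": 10, "A4": 3}))
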
[3]:
class NeGauss2dAmplitude(pwa.NestedFunction):
    """
    This is the same Gauss as above, but instead of using raw Numpy, it
    uses Numexpr, a hyper-vectorized, multithreaded numerical package
    that should accelerate the calculation of your data.

    USE_MP defaults to True, but you should consider setting it to False.
    Numexpr does some partial multithreading on its own for its
    calculations; however, any part of your algorithm defined outside
    Numexpr will not benefit from it. Because of this, the optimum number
    of threads for amplitudes with Numexpr ranges from 2 threads to
    around 80% of the system threads. A good starting point is around
    50% of the CPU threads available.
    """

    USE_MP = False

    def setup(self, array):
        self.__data = array

    def calculate(self, params):
        return ne.evaluate(
            "(1/(a2*a4)) * exp(-((((x-a1)**2)/(a2**2))+(((y-a3)**2)/(a4**2))))",
            local_dict={
                "a1": params["A1"], "a2": params["A2"],
                "a3": params["A3"], "a4": params["A4"],
                "x": self.__data["x"], "y": self.__data["y"]
            }
        )
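
If you want to experiment with Numexpr's internal thread pool (per the docstring above), Numexpr exposes set_num_threads; this sketch applies the 50% rule of thumb:

import multiprocessing as mp

# Start Numexpr at roughly half the hardware threads and tune from there.
ne.set_num_threads(max(1, mp.cpu_count() // 2))
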
[4]:
import torch as tc

class TorchGauss2dAmplitude(pwa.NestedFunction):
    """
    Finally, this is the Torch version of the Gauss2D.

    To utilize Torch, the USE_TORCH flag must be set to True, or the
    likelihood will assume that the results will be in standard Numpy
    arrays, and not Torch Tensors.

    Torch affords us some features that Numpy does not. Specifically,
    support for both Apple's Metal Acceleration, and CUDA Acceleration.
    For Apple's Metal, no work is required beyond defining your
    amplitude in Torch, thanks to the shared memory on Apple systems.
    To utilize CUDA, however, you must move the data to the GPU before
    the GPU can accelerate the operations. Because of the nature of
    CUDA, CUDA and Multiprocessing are not compatible, so you must
    disable multiprocessing when using CUDA.

    **WARNING** You **MUST** set USE_MP to False if you are using
    CUDA as a Torch Device!
    """

    USE_TORCH = True
    USE_MP = False

    # device is not a flag for Amplitude, but we use it to track the
    # current device that the amplitude should run on.
    device = ...  # type: tc.device

    def setup(self, array):
        # We want to always set the current device. It also helps to be
        # able to toggle the GPU on and off for the entire amplitude
        # using the USE_MP flag, since the flag can be set after
        # initialization.
        if self.USE_MP:
            self.device = tc.device("cpu")
        else:
            self.device = tc.device("cuda:0")

        # Since the data is in Pandas, we need to map it to Numpy first
        narray = pwa.pandas_to_numpy(array)

        self.__x = tc.from_numpy(narray["x"]).to(self.device)
        self.__y = tc.from_numpy(narray["y"]).to(self.device)

    def calculate(self, params):
        scaling = 1 / (params["A2"] * params["A4"])
        left = ((self.__x - params["A1"])**2)/(params["A2"]**2)
        right = ((self.__y - params["A3"])**2)/(params["A4"]**2)
        return scaling * tc.exp(-(left + right))
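
The setup above assumes a CUDA device is present whenever USE_MP is False. If the same amplitude should also run on machines without CUDA, a device guard along these lines is one option (a sketch, not part of the tutorial's amplitude):

if tc.cuda.is_available():
    device = tc.device("cuda:0")   # NVIDIA GPU
elif tc.backends.mps.is_available():
    device = tc.device("mps")      # Apple Metal
else:
    device = tc.device("cpu")      # CPU fallback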

Using caching for intermediate steps

PyPWA’s caching module supports caching intermediate steps. The advantage of using the caching module is that saving and loading values is fast, much faster than almost any other solution, and it supports almost anything you can store in a variable.

PyPWA.cache has two functions in it: read and write. You can save almost anything in the cache: lists, DataFrames, dictionaries, etc. There is a chance that it won’t save the value if the data isn’t serializable into a pickle, and it may not be compatible between different versions of Python, so I don’t recommend using this for data that you can’t reproduce. However, if you need to do some feature engineering or data sanitizing before you can use the data, and you want to keep those results around to speed up future executions, this module will make your life a touch easier.

Below, I created a large flat DataFrame, and then binned that DataFrame into 10 bins, each with 1,000,000 events in them. Then, I saved those results into a cache object that will appear in the current working directory with the name “flat_data.intermediate”.

  • pwa.cache.read returns two values: the first is a boolean that is True only if it was able to read the pickle, and the second is the parsed data, which will be None if it was unable to parse anything from the file or if the file doesn’t exist.

  • pwa.cache.write has no return value, but writes the data out in pickle format to the provided filename plus the “.intermediate” extension.

[5]:
valid_cache, binned_flat = pwa.cache.read("flat_data")
if not valid_cache:
    flat_data = pd.DataFrame()
    flat_data["x"] = np.random.rand(10_000_000) * 20
    flat_data["y"] = np.random.rand(10_000_000) * 20
    flat_data["binning"] = np.random.rand(10_000_000) * 20
    binned_flat = pwa.bin_with_fixed_widths(flat_data, "binning", 1_000_000)
    pwa.cache.write("flat_data", binned_flat)

Simulation with bins

Simulation can be run as a whole system: you simply provide the function and data, and it’ll return the masked values. Or you can run the two steps independently, with the first step returning the intensities and the second returning the masks. When you’re working with a single dataset, running it as a single step makes sense; however, if you bin your data, then running it as two steps is better, so that all bins are masked against the same max value of the intensity.
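
For the single-dataset case, the whole-system form is a single call. A minimal sketch, reusing the amplitude and parameter values from this notebook (flat_data stands in for any single, unbinned dataset):

single_params = {"A1": 10, "A2": 3, "A3": 10, "A4": 3}
# Returns the boolean rejection mask directly, ready to carve the data.
mask = pwa.monte_carlo_simulation(Gauss2dAmplitude(), flat_data, single_params)
carved_data = flat_data[mask]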

  • pwa.simulate.process_user_function takes all the same arguments as pwa.monte_carlo_simulation, so it can be a drop-in replacement. The difference is that this function returns the final values for the user’s function and the max value.

  • pwa.simulate.make_rejection_list takes the final values and either a single max value or a list or array of max values, in which case it’ll use the largest. This function returns the same value as pwa.monte_carlo_simulation.

Below, I iterate over the bins and produce the final values and max value for each bin and store them in their own lists.

[6]:
simulation_params = {
    "A1": 10, "A2": 3,
    "A3": 10, "A4": 3
}

final_values = []
max_values = []
for fixed_bin in binned_flat:
    final, m = pwa.simulate.process_user_function(
        TorchGauss2dAmplitude(), fixed_bin, simulation_params
    )
    final_values.append(final)
    max_values.append(m)

pwa.cache.write("final_values", max_values)

After the final values have been produced, I use pwa.simulate.make_rejection_list to reject events from each bin, and then store the new carved results in a fresh list.

[7]:
rejected_bins = []
masked_final_values = []
for final_value, bin_data in zip(final_values, binned_flat):
    rejection = pwa.simulate.make_rejection_list(final_value, max_values)
    rejected_bins.append(bin_data[rejection])
    masked_final_values.append(final_value[rejection])

pwa.cache.write("fitting_bins", rejected_bins, True)
pwa.cache.write("kept_final_values", masked_final_values, True)
[8]:
for index, simulated_bin in enumerate(rejected_bins):
    print(
        f"Bin {index+1}'s length is {len(simulated_bin)}, "
        f"{(len(simulated_bin) / 1_000_000) * 100:.2f}% events were kept"
    )
Bin 1's length is 70589, 7.06% events were kept
Bin 2's length is 70544, 7.05% events were kept
Bin 3's length is 70621, 7.06% events were kept
Bin 4's length is 70608, 7.06% events were kept
Bin 5's length is 70532, 7.05% events were kept
Bin 6's length is 70369, 7.04% events were kept
Bin 7's length is 71019, 7.10% events were kept
Bin 8's length is 70542, 7.05% events were kept
Bin 9's length is 70533, 7.05% events were kept
Bin 10's length is 70633, 7.06% events were kept

How Caching is used by the Read and Write functions

If you want your data to be parsable by standard libraries, but still want to leverage the speed of caching, you can use both, by default even! When the Cache module is used by the Data module, it utilizes an additional feature that is tucked away when used by itself: file hashing. The Cache module can be told when it’s caching a specific file, so before the cache is created, it will parse the source file to determine its SHA512 sum, and then store that inside the cache. When the file is loaded next, the saved SHA512 sum is compared to the file’s current sum; if they match, the cache is returned, otherwise the file is parsed again and the cache is recreated.
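
Conceptually, the hash check behaves like the sketch below; this is just an illustration of the idea, not PyPWA’s actual implementation:

import hashlib

def file_sha512(path):
    # Hash the file in chunks so large files don't exhaust memory.
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Reuse the cache only while the source file is unchanged:
#   if file_sha512("first_bin.csv") == sum_saved_in_cache: return the cache
#   else: re-parse the CSV and rebuild the cache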

After the below cell runs, you’ll see two new files created: first_bin.csv and first_bin.cache. These two files will contain the same data, but if the CSV file is changed, even if just by a single character, the file will be parsed again on the next call of pwa.read

[9]:
try:
    first_bin = pwa.read("first_bin.csv")
except Exception:
    first_bin = binned_flat[0]
    pwa.write("first_bin.csv", binned_flat[0])

Fitting

While you can use PyPWA’s likelihoods with any minimizer, PyPWA supports iminuit 2.X out of the box. The first step is to set up the starting values of the parameters to fit against, along with the individual names of each parameter.

Traditionally, iminuit works by inspecting the provided function to guess what the parameters are and what to pass to the function. However, since we wrap the minimized function to take advantage of GPU acceleration and multiprocessing, you must also tell iminuit what the parameters are directly.

[10]:
fitting_settings = {
    "A1": 1, "A2": 1,
    "A3": 1, "A4": 1,
}

Then below, we can simply fit those values.

[11]:
import multiprocessing as mp
# Even though we're using Numexpr, we still want to take advantage of
# both multiprocessing and Numexpr's low-level optimizations. By
# selecting a small number of processes alongside Numexpr, you get an
# overall speedup over either Numexpr or regular multiprocessing alone.
NeGauss2dAmplitude.USE_MP = True

cpu_final_values = []
for simulated_bin in rejected_bins:
    with pwa.LogLikelihood(
            NeGauss2dAmplitude(), simulated_bin,
            num_of_processes=int(mp.cpu_count() / 2)
    ) as likelihood:
        optimizer = pwa.minuit(fitting_settings, likelihood)

        for param in ["A1", "A3"]:
            optimizer.limits[param] = (.1, None)

        for param in ["A2", "A4"]:
            optimizer.limits[param] = (1, None)

        cpu_final_values.append(optimizer.migrad())
[12]:
gpu_final_values = []
for simulated_bin in rejected_bins:
    with pwa.LogLikelihood(TorchGauss2dAmplitude(), simulated_bin) as likelihood:
        optimizer = pwa.minuit(fitting_settings, likelihood)

        for param in ["A1", "A3"]:
            optimizer.limits[param] = (.1, None)

        for param in ["A2", "A4"]:
            optimizer.limits[param] = (1, None)

        gpu_final_values.append(optimizer.migrad())

A note about with

If you are new to Python, the with statement might be new to you. with manages objects that need to be closed after use. Traditionally, you will see with used with files, but we use it with likelihoods as well. In the file case, when you leave the with block, it flushes the buffers for you and closes the file’s handle. In the case of likelihoods, when you leave the with block, it shuts down any threads, processes, and pipes that are associated with the created Likelihood.

Below is an example of how the Likelihood works without using the with statement.

[13]:
for simulated_bin in rejected_bins:
    likelihood = pwa.LogLikelihood(TorchGauss2dAmplitude(), simulated_bin)
    optimizer = pwa.minuit(fitting_settings, likelihood)

    for param in ["A1", "A3"]:
        optimizer.limits[param] = (.1, None)

    for param in ["A2", "A4"]:
        optimizer.limits[param] = (1, None)

    # You must remember to close the Likelihood when not using the 'with' block!
    likelihood.close()

Issues with PyPWA.cache

There are some values that cannot be saved in PyPWA’s cache. Typically, it’s an object from a package that takes advantage of Cython or Fortran to accelerate its execution, either because the values are stored as pointers to arrays, or because it uses C types too deep for Python’s interpreter to analyze. A good example of this case is results from iminuit.

As you can see below, a RuntimeWarning is raised from PyPWA’s caching module about how the data can’t be saved. However, the real cause is that the tuple contains values that cannot be converted to a pure Python object for pickling.

[14]:
try:
    pwa.cache.write("fitting_results", cpu_final_values, True)
except RuntimeWarning as error:
    print("Caught a cache error")
    print(f"{type(error)}: {error}")
Caught a cache error
<class 'RuntimeWarning'>: Your data can not be saved in cache!

Viewing the results

Finally, we can view the results of the fitting. The result objects from iminuit are Jupyter-aware, so if you view a result from iminuit in Jupyter, the values will be displayed as rich, responsive tables.

If you want to know what methods and parameters are available on the result object returned by iminuit, you should read through their documentation: https://iminuit.readthedocs.io/
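
It can also be handy to collect the fitted values for every bin into a single table; a minimal sketch using the results gathered above:

# Summarize the fitted parameters of each bin in one DataFrame.
summary = pd.DataFrame(
    [dict(zip(result.parameters, result.values)) for result in cpu_final_values],
    index=[f"Bin {i + 1}" for i in range(len(cpu_final_values))],
)
display(summary)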

[15]:
print(f"CPU bin 1")
cpu_final_values[0]
CPU bin 1
[15]:
Migrad
FCN = 2.259e+05    Nfcn = 231
EDM = 5.29e-07 (Goal: 0.0001)
Valid Minimum    No parameters at limit
Below EDM threshold (goal x 10)    Below call limit
Covariance: Hesse ok, Accurate, Pos. def., Not forced

    Name    Value    Hesse Error    Minos Error-    Minos Error+    Limit-    Limit+    Fixed
 0  A1      10.005   0.008                                          0.1
 1  A2      3.003    0.008                                          1
 2  A3      9.995    0.008                                          0.1
 3  A4      3.006    0.008                                          1

Covariance:
        A1          A2          A3          A4
 A1     6.39e-05    5.59e-08    8.65e-14    8.61e-14
 A2     5.59e-08    6.39e-05    6.89e-14   -3.45e-13
 A3     8.65e-14    6.89e-14    6.4e-05     5.57e-08
 A4     8.61e-14   -3.45e-13    5.57e-08    6.4e-05
[16]:
print(f"GPU bin 1")
display(gpu_final_values[0])
GPU bin 1
Migrad
FCN = 2.259e+05    Nfcn = 231
EDM = 5.29e-07 (Goal: 0.0001)
Valid Minimum    No parameters at limit
Below EDM threshold (goal x 10)    Below call limit
Covariance: Hesse ok, Accurate, Pos. def., Not forced

    Name    Value    Hesse Error    Minos Error-    Minos Error+    Limit-    Limit+    Fixed
 0  A1      10.005   0.008                                          0.1
 1  A2      3.003    0.008                                          1
 2  A3      9.995    0.008                                          0.1
 3  A4      3.006    0.008                                          1

Covariance:
        A1          A2          A3          A4
 A1     6.39e-05    5.59e-08    0          -0
 A2     5.59e-08    6.39e-05   -0           0
 A3     0          -0           6.4e-05     5.57e-08
 A4    -0           0           5.57e-08    6.4e-05