nuisance is hosted by Hepforge, IPPP Durham

Comparing generators to data with nuiscomp

Written by Clarence on 22 April 2019.


A broad range of datasets have been implemented into the NUISANCE framework that support direct comparison between different generator Monte Carlo event samples and published data. The full list is updated for every git commit and available here.

nuiscomp is the simplest executable comparing data to generator predictions. The datasets and generator files are specified by providing different sample IDs and pointing to different files through a "card file".

You can also provide NUISANCE with systematic parameters and a variety of options outlined below to do closure tests, fitting systematics, and making error bands.

For the code-inclined, nuiscomp lives in nuisance/app/nuiscomp.cxx and is driven by nuisance/src/Routines/ComparisonRoutines.cxx. You can view it here.

This expands on the documentation provided by running

$ ./nuiscomp

which is equivalent to

$ ./nuiscomp -h

An example nuiscomp run

This example compares GENIE events to MiniBooNE data. The example is near identical for a different generator, you just have to Prepare appropriately and replace GENIE in the cardfile with your generator descriptor, e.g. NEUT, NUWRO, GIBUU.

Requirements for the example:

Make a plain file mbcomp.card and fill it with:

<nuisance> <sample name="MiniBooNE_CCQE_XSec_1DQ2_nu" input="GENIE:PATH_TO_PREPARED_GENIE_EVENTS.root"/> </nuisance>

sample name specifies the measurement to compare to: in this case MiniBooNE CCQE cross-section in 1 dimensional in Q2 for neutrino.
input specifies the input generator and the path to the prepared file containing the events: in this case GENIE and our file is PATH_TO_PREPARE_GENIE_EVENTS.root

You can now run nuiscomp with this card file by doing

$ ./nuiscomp -c mbcomp.card -o myoutput.root

where -c flags preceeds the card file and -o preceeds the output file nuiscomp will write to.

Specifying a measurement

Changing the measurement is as easy as replacing the sample name field in the card file. The sample names can be found by running the nuissamples script in nuisance/scripts/nuissamples, making sure you have exported the NUISANCE environment variable to be your NUISANCE directory.

The output strings can then be used in the sample name field, e.g. sample name="MiniBooNE_CC1pip_XSec_1DQ2_nu" would specify MiniBooNE's CC1π+ cross section measurement in one dimension in Q2 for neutrino.

The script essentially looks at the string comparison fields in nuisance/src/FCN/SampleList.cxx which is run by nuiscomp. Check it out if you're curious.

Options for a measurement

Samples have a normalisation option, which is applied through norm="scale", where scale is a number. It is only accepted if type="FREE" is specified, e.g.

<sample name="MiniBooNE_CCQE_XSec_1DQ2_nu" input="GENIE:PATH_TO_PREPARED_GENIE_EVENTS.root" norm="0.5" type="FREE"/>

sets the normalisation of the sample to be 0.5. N.B. the normalisation parameter actually normalises the Monte-Carlo to 1/norm.

The options given to type relate to how the sample is treated, either statistically (e.g. 𝜒2 calculation without a covariance matrix) or normalisation scaling (e.g. doing a shape-only fit). These come in full effect in nuismin which is driven by a test-statistic which isn't the case for the simpler nuiscomp.

Use type="OPTION" with OPTION set to:

  • FIX, FREE, SHAPE: Keep the sample normalisation fixed, free, or shape only (normalises the generator prediction to the data)
  • FULL, DIAG: Use the full covariance or diagonal covariance matrix to calculate the test-statistic
  • NORM: Add a normalisation penalty if one is specified in the sample, controlled by the private member fNormError, set in each sample's constructor.
    e.g. for MiniBooNE Tμcosθμ it is specified that the normalisation error was 10.1%, so is implemented here.
where options on different rows can be provided simultaneously, separated by /. For an example of this in nuismin, see here.

Sample dependent and debugging options

Some lesser used (sample dependent) types to provide to a sample are:

  • Q2CORR: Apply a Q2-dependent correction if data supports it (currently just ANL, BNL, BEBC and FNAL bubble chamber CCQE measurements)
  • MASK: Use provided masking histogram. This may be useful if you want to use a measurement that should mask out regions you don't want to use, e.g. low Q2 in a bubble chamber

Some even lesser used options are related to how the sample is scaled. These options are always set in the sample implementation file and shouldn't have to be touched by a user for other than debugging purposes. They are:

  • ENU1D: Treat the measurement as a total cross-section: that is scale toa bin-by-bin flux integrated cross-section rather than flux integrated over the entire range.
  • NOWIDTH: No bin-width scaling is applied (so is e.g. cross-section/bin rather than cross-section/MeV)
  • RAW: Measurement is raw events (not a cross-section), so does not scale MC to a differential cross-section. Instead it scales the MC integral to be the data integral for every iteration

Additionally, some samples have settings specific to each measurement. These are documented in the implementation. A good example is nuisance/src/ANL/ANL_CC1ppip_XSec_1DEnu_nu.cxx (found here), which has many data releases and corrections: W < 1.4 GeV, W < 1.6 GeV, and no W cut. Furthermore the W < 1.4 GeV and no W cut data has the ability to apply a E𝝂 dependent correction (Phys. Rev. D 90, 112017 and Eur. Phys. J. C (2016) 76: 474. ).

Specifying a generator

To change the generator you first need to have generated events with said generator and linked NUISANCE to the generator. You can find notes on this here. You also need to have prepared generator files for NUISANCE use.

Once you have NUISANCE linked and prepared files you simply change the input field to match your generator (GENIE, NEUT, NUWRO, GIBUU or NUANCE), and new file location. For example input="NUWRO:PATH_TO_PREPARED_NUWRO_FILE.root".

Changing systematics via event reweighting in nuiscomp

An overview of the systematics present in NUISANCE is given here. Here we show how to include systematics in card file for nuiscomp.

Expanding on mbcomp.card with mbcomp_fit.card:

<nuisance> <sample name="MiniBooNE_CCQE_XSec_1DQ2_nu" input="GENIE:PATH_TO_PREPARED_GENIE_EVENTS.root"/> <parameter name="MaCCQE" nominal="1.0" type="genie_parameter"/> </nuisance>

For nuiscomp we can additionally provide a state for each parameter. The valid state for parameters is just if we're specifying the nominal in terms of 1 sigma, fractional or absolute values.

  • type="ABS" specifies it as an absolute value and uses the lookup in ${NUISANCE}/parameters/dial_conversion.card
  • type="FRAC" specifies it as a fractional value
  • Not specifying uses the default generator behaviour, often meaning relative the 1 sigma uncertainty (e.g. for GENIE these are found in ${GENIE}/src/ReWeight/GSystUncertainity.cxx).

N.B. nuismin expands on these options, as seen here.

Making fake-data via event reweighting

Fake-data can be provided to nuiscomp in two ways:

  • Giving the -d parameter to nuiscomp (outlined later)
  • Specifying a fakeparameter field
The fakeparameter must be a reweighting parameter already present in the card file, and follows the exact structure, specifying the name and nominal. The nominal is the value with which the fakedata is generated. The fake-data becomes the data histogram in NUISANCE, which is used to calculate e.g. the goodness of fit.

A card file for MiniBooNE CCQE in 1D Q2 for neutrino, setting the fake-data to be GENIE with MaCCQE set to 1 sigma above the default, would be

<nuisance> < sample name="MiniBooNE_CCQE_XSec_1DQ2_nu" input="GENIE:PATH_TO_PREPARED_GENIE_EVENTS.root" /> < parameter name="MaCCQE" nominal="1.0" type="genie_parameter" /> < fakeparameter name="MaCCQE" nominal="1.0" type="genie_parameter" /> </nuisance>

Running nuiscomp

nuiscomp takes a number of command line arguments, one of which is required. It requires either

  • -c mycard.card where mycard.card is the provided card file (see above)
  • -i 'card file contents'

The -i option follows the same structure as a card file but is instead an argument. For example

$ ./nuiscomp -o output.root -i 'sample name="MiniBooNE_CCQE_XSec_1DQ2_nu" input="GENIE:PATH_TO_PREPARED_GENIE_EVENTS.root"'

would acheive the same effect as providing nuiscomp with the simple mbcomp.card earlier.

The complete list of arguments and their types is:

  • -c string: the cardfile to read
  • -o string: the output file to create (default: append ".root" to cardfile name)
  • -n int: the number of events to run on (default: all)
  • -d string: fakedata input, can be either (default: empty)
    • "MC": for setting the data to the nominal MC prediction
    • "fakedata.root": Direct path to root file containing a MC histogram called the measurement name with "_MC" at the end (the default format for NUISANCE MC histograms)
  • -i string: card file-like XML commands (default: empty)
  • -q string: config arguments, which overwrite nuisance/parameters/config.xml (default: empty)
  • e +/- error level relative that in nuisance/parameters/config.xml
  • v +/- verbosity level relative that in nuisance/parameters/config.xml

Running nuiscomp as

$ ./nuiscomp -c mycard.card -o myoutput.root

will read the card file mycard.card and produce the output ROOT file myoutput.root.

Inspecting nuiscomp logs

NUISANCE produces various output as it sets up, using different verbosity logging levels, defined in nuisance/parameters/config.xml. This section explains what lines indicate what.

In this example the home directory is /home/nuisance, built in /home/nuisance/build. The cardfile is called cardfile.xml, containing

<nuisance> <parameter name="ZExpA1CCQE" nominal="1" type="genie_parameter" state="FIX"/> <parameter name="ZExpA2CCQE" nominal="-1" type="genie_parameter" state="FIX"/> <parameter name="ZExpA3CCQE" nominal="1" type="genie_parameter" state="FIX"/> <parameter name="ZExpA4CCQE" nominal="-1" type="genie_parameter" state="FIX"/> <sample name="MINERvA_CC0pi_XSec_1DQ2QE_nu" input="GENIE:/home/nuisance/GENIE_file.root" type="FREE" norm="0.8/> </nuisance>

which loads up the sample MINERvA_CC0pi_XSec_1DQ2QE_nu with a prepared GENIE file in /home/nuisance/GENIE_file.root and a normalisation of 0.8. It specifies four GENIE ReWeight parameters for the z-expansion model: ZExpA1CCQE, ZExpA2CCQE, ZExpA3CCQE, and ZExpA4CCQE.

Printing verbosity

The first lines set up the logger with verbosity levels, defined in /home/nuisance/parameters/config.xml. Here we've set it to VERBOSITY=4 and ERROR=2.

[ NUISANCE ]: Loading DEFAULT settings from : /home/nuisance/parameters/config.xml -> DONE. [LOG Fitter]: : nuiscomp.cxx::main[l. 213] : Starting nuiscomp.exe [ NUISANCE ]: Trying to parse file : cardfile.xml -> Found XML file. [ NUISANCE ]: Loading XML settings from : cardfile.xml -> DONE. [ NUISANCE ]: Finalising run settings[INFO]: Removing node: config [INFO]: Removing node: config [INFO]: Removing node: config [INFO]: Removing node: config -> DONE. [ NUISANCE ]: Setting VERBOSITY=4 [ NUISANCE ]: Setting ERROR=2

Reading the cardfile

NUISANCE then continues to process the cardfile and register the settings internally. It recognises our four GENIE parameters, their values and types. It then proceeds to find the sample that we specified, it's type, normalisation and file location

[LOG Fitter]: Setting up nuiscomp [LOG Fitter]: Number of parameters : 4 [LOG Fitter]: Read genie_parameter : ZExpA1CCQE = 1 : FIX [LOG Fitter]: Read genie_parameter : ZExpA2CCQE = -1 : FIX [LOG Fitter]: Read genie_parameter : ZExpA3CCQE = 1 : FIX [LOG Fitter]: Read genie_parameter : ZExpA4CCQE = -1 : FIX [LOG Fitter]: Number of samples : 1 [LOG Fitter]: Read Sample 0. : MINERvA_CC0pi_XSec_1DQ2QE_nu (FREE) [Norm=0.8] -> input='GENIE:/home/nuisance/GENIE_file.root'

Loading the ReWeighter

The reweighting parameters are specified in the FitWeight class, which interfaces to GENIE and the reweighting. The GENIE parameters are each given a unique enum. The sample normalisation is also registered as a parameter and is given a unique enum.

[LOG Fitter]: Setting up FitWeight Engine [LOG Fitter]: Registered Dial Enum : ZExpA1CCQE 5 5069 [LOG Fitter]: Adding reweight engine 5 [LOG Fitter]: Setting up GENIE RW : genierw [LOG Fitter]: Setting GENIE ReWeight CCQE to kModeZExp [LOG Fitter]: Registering ZExpA1CCQE from ZExpA1CCQE [LOG Fitter]: Registered Dial Enum : ZExpA2CCQE 5 5070 [LOG Fitter]: Registering ZExpA2CCQE from ZExpA2CCQE [LOG Fitter]: Registered Dial Enum : ZExpA3CCQE 5 5071 [LOG Fitter]: Registering ZExpA3CCQE from ZExpA3CCQE [LOG Fitter]: Registered Dial Enum : ZExpA4CCQE 5 5072 [LOG Fitter]: Registering ZExpA4CCQE from ZExpA4CCQE [LOG Fitter]: Registered Dial Enum : MINERvA_CC0pi_XSec_1DQ2QE_nu_norm 9 9000 [LOG Fitter]: Adding reweight engine 9 [LOG Fitter]: Setting up Likelihood Weight RW : normrw

Building the samples

We then progress to reading the input sample, its file location and loads up the GENIE event handler. We're provided with some information on the input file: the number of entries we're reading, the converted event rate and flux, and finally the Event/Flux which is equivalent to the flux averaged single bin cross-section.

[LOG Fitter]: Building the SampleFCN [LOG Minmzr]:- Loading Samples : 1 [LOG Minmzr]:- Loading Sample : MINERvA_CC0pi_XSec_1DQ2QE_nu [LOG Fitter]: Searching for dynamic sample manifests in: /home/nuisance/build/app/ [LOG Fitter]: Loaded 0 from 0 shared object libraries. [LOG Sample]:-- Loading Sample : MINERvA_CC0pi_XSec_1DQ2QE_nu [LOG Sample]:-- Creating GENIEInputHandler : MINERvA_CC0pi_XSec_1DQ2QE_nu |-> Read Max Entries : 100000 |-> Total Entries : 100000 |-> Event Integral : 6.14102e-36 events/nucleon |-> Flux Integral : 142.944 /cm2 |-> Event/Flux : 4.29612e-38 cm2/nucleon [LOG Sample]:-- Registered MINERvA_CC0pi_XSec_1DQ2QE_nu with EventManager. [LOG Sample]:-- Reading data from root file: /home/nuisance/data/MINERvA/CC0pi_1D/FixedBinWidthPub/cov_q2qe_qelike.root;q2qe_cross_section [LOG Sample]:-- Finalising Sample Settings: MINERvA_CC0pi_XSec_1DQ2QE_nu [LOG Sample]:-- Finalising Measurement: MINERvA_CC0pi_XSec_1DQ2QE_nu [ERR FATAL ]: Data error does not match covariance error for bin 1 (0-0.00625) [ERR FATAL ]: Data error: 5.7419e-40 [ERR FATAL ]: Cov error: 5.8002e-40 [ERR FATAL ]: Data error does not match covariance error for bin 3 (0.0125-0.025) [ERR FATAL ]: Data error: 7.23557e-40 [ERR FATAL ]: Cov error: 7.33115e-40 [ERR FATAL ]: Data error does not match covariance error for bin 16 (2-4) [ERR FATAL ]: Data error: 4.65479e-42 [ERR FATAL ]: Cov error: 4.79306e-42

We're then told where the data is being read from (/home/nuisance/data/MINERvA/CC0pi_1D/FixedBinWidthPub/cov_q2qe_qelike.root) and the histogram (q2qe_cross_section) which it contains. NUISANCE does a number of checks of the data and here notices that for some bins the provided covariance matrix does not agree with the error on the data histogram. In this case the mismatch is 0.1E-40, which isn't too concerning, although definitely is a problem!. Here we contacted the analyser (Dan Ruterbories, Rochester) and this is apparently due to rounding errors in making the covariance matrix. We suggest you do the same if noticing something similar!

The event loop

The main event loop starts once everything has been loaded up successfully. Events are extracted out of the input file, converted to the common format, and event characteristics are calculated (e.g. pμ and cosθμ to construct Q2QE). Each event passes through the signal definition defined in the sample, and the generator (MC) histograms are filled up.

The output prints it's running the Compare routine, and starts the first iteration.

[LOG Fitter]: Running ComparisonRoutines : Compare [LOG Fitter]: Routine: Compare [LOG Fitter]: Generating Comparison. [LOG Reconf]:--- Starting Reconfigure iter. 0 [LOG Reconf]:--- Event Manager Reconfigure [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 0 events. [M, W] = [13, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 10000 events. [M, W] = [26, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 20000 events. [M, W] = [26, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 30000 events. [M, W] = [26, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 40000 events. [M, W] = [46, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 50000 events. [M, W] = [42, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 60000 events. [M, W] = [26, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 70000 events. [M, W] = [26, 1] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 80000 events. [M, W] = [1, 0.845794] [LOG Reconf]:--- MINERvA_CC0pi_XSec_1DQ2QE_nu : Processed 90000 events. [M, W] = [52, 1] [LOG Reconf]:--- Filled 11083 signal events. [LOG Reconf]:--- Time taken ReconfigureUsingManager() : 12

NUISANCE reports the status of the event loop with 10 prints to screen per reconfigure. So running on 100,000 events will cause a print every 10,000 event. With every print, NUISANCE reports the interaction mode (M) and the weight (W) from the main loop. The interaction modes are documented here.

The bold underscored line with [M, W] = [1, 0.845794] is a CCQE event (M=1). Since we're running with a variation of the z-expansion parameters we'd expect a CCQE event to receive a weight different to 1, which is indeed the case (W=0.845794). This is useful for debugging purposes, and to see the progress during a fit with nuismin.

Finally the number of signal events (11083 out of 100000) is reported, and the time taken for a full reconfigure (12 s). Hence the looping and reweighting through GENIE ReWeight takes about 0.1 ms/event (N.B. without reweighting the time taken was 7 s).

The summary

The summary of the final state of the program is printed at the end of the routine.

We see a summary of the parameters and their values, and a converted value in real units (which uses nuisance/parameters/dial_conversion.card if available). As expected, we have four z-expansion parameters and one sample normalisation parameter. They are all set to the parameter values provided in the card file.

We also see a likelihood for the sample of -2logL=348.248/16, which means the χ2=348.248 and the number of bins N=16.

[LOG Minmzr]:- Finished Reconfigure iter. 0 in 12s [LOG Fitter]: ------------ [LOG Fitter]: # Parameter = Value +- Error (Units) Conv. Val +- Conv. Err (Units) [LOG Fitter]: 0 . ZExpA1CCQE = 1 +- 0 (sig.) 2.622 +- 0 (x) [LOG Fitter]: 1 . ZExpA2CCQE = -1 +- 0 (sig.) -0.198 +- 0 (x) [LOG Fitter]: 2 . ZExpA3CCQE = 1 +- 0 (sig.) -7.6 +- 0 (x) [LOG Fitter]: 3 . ZExpA4CCQE = -1 +- 0 (sig.) 0.575 +- 0 (x) [LOG Fitter]: 4 . MINERvA_CC0pi_XSec_1DQ2QE_nu_norm = 0.8 +- 0 (sig.) 0.8 +- 0 (sig.) [LOG Fitter]: ------------ [LOG Minmzr]:- Getting likelihoods... : -2logL [LOG Minmzr]:- -> MINERvA_CC0pi_XSec_1DQ2QE_nu : 348.248/16 [LOG Fitter]: Likelihood for JointFCN: 348.248 [LOG Fitter]: ------------ [LOG Fitter]: Saving current full FCN predictions [LOG Minmzr]:- Writing likelihood plot.. [LOG Minmzr]:- Writing each of the data classes... [LOG Sample]:-- Written Histograms: MINERvA_CC0pi_XSec_1DQ2QE_nu [LOG Fitter]: ------------------------------------ - [LOG Fitter]: Comparison Complete. [LOG Fitter]: ------------------------------------ -

Inspecting nuiscomp output

Once the code has run nuiscomp saves the histograms to a root file (specified by the user giving the -o option).