[[PageOutline(1-3,,pullout,unnumbered)]]
= How to add a sample in NUISANCE =
This is an informal step-by-step guide on how to add samples into the NUISANCE framework. Most of this can be derived by looking at the already implemented measurements, and I'd advise potential readers to have a look around in the `src` directory in addition to reading this guide.
For this tutorial I'll be adding the T2K CC1pi+ H,,2,,O data. The data comes from the [http://t2k-experiment.org/results/nd280data-numu-cc1pi-xs-on-h2o-2015/ T2K site] and [https://arxiv.org/abs/1605.07964 arxiv]. The data is supplied both in a ROOT file and a number of .csv files: I'll be using the ROOT file here.
After this we'll be able to make data/MC comparisons to the T2K CC1pi+ H,,2,,O data-set for NEUT, GENIE, GiBUU and !NuWro; all in the same framework. We'll get MC predictions, chi2 values, interaction mode contributions, and more using a consistent signal definition, guaranteed the same across the generators.
'''NUISANCE is currently fully written in C++03 and we would appreciate future contributions to adhere to this standard.'''
{{{
#!div class="important"
'''Versions:''' `NUISANCE v1r0`, `NEUT 5.3.6`, `arxiv 1605.07964v2.pdf`
}}}
= Examining the data and choosing distributions =
== Finding the neutrino flux and generating events ==
The first issue at hand is to find the flux for the experiment. If we don't have this, we cant make a generator prediction---unless the measurement is a total cross-section without any phase space cuts (in which case you should probably cast a suspicious eye).
A quick search through the arxiv document points us to reference ![12]. It is also in our [wiki:ExperimentFlux flux list].
We then generate events in NEUT 5.3.6 with a suitable card-file, see our [wiki:HowToNeut NEUT guide] on how to do this. We need the correct target (water) and flux, and perform the model selections we want (e.g. Rein-Sehgal or Berger-Sehgal coherent model). The procedure is very similar for other generators too.
Aim to generate around 1M events with all interaction modes turned on. This way we make sure to get all interaction modes into the topologically defined cross-section. We get a small amount of CCQE events which excite a pion from the outgoing nucleon interacting in the nucleus to kick out a pion, leading to a CCQE+1pi ~ CC1pi+ final state, which is signal for this particular sample.
== Choosing a cross-section distribution ==
The T2K CC1pi+ H,,2,,O data release contains various distributions in FIG 4. In this tutorial I'll look at adding one kinematic distribution and one derived distribution: p,,mu,, and E^rec^,,nu,, shown below.
[[Image(T2K_CC1pip_H2O_pmu.png, 300px)]][[Image(T2K_CC1pip_H2O_enurec.png, 300px)]]
In NUISANCE we try to add all available distributions from a publication. '''However''', some distributions will have detector effects "folded" into them (i.e. they will be raw detector-level data). We can only use these if there is some method which bring detector-level variables (e.g. p_mu seen in the detector) to truth-level variable (e.g. p_mu seen after correcting for the detector effects).
= Preparing structure for a sample =
NUISANCE is designed to easily allow adding new samples.
[=#point_base]
Each experimental measurement is its own class. To simplify including new samples we supply a base class for 1D (`Measurement1D`) and 2D (`Measurement2D`) measurements, which in turn inherits from the virtual base class (`MeasurementBase`).
These `MeasurementBase` classes are then looped over in the executables and data/MC comparisons are the result. A class has to at least inherit from `MeasurementBase` and implement the necessary methods to be "looped over". The recommended method is to use and expand the supplied `Measurement1D` and `Measurement2D` classes; anything else may require expert knowledge and won't be covered here.
The inheritance tree is simple and goes `Specific_Measurement -> MeasurementXD -> MeasurementBase`
== Naming the sample ==
Some automatic processing is done on loading up the samples to set up generator scaling factors, chi2 calculations and so on. These are simple string comparisons done in the base class constructors, but do '''place responsibility on the user'''.
'''The structure is `Experiment_Measurement_Target_DataType_DimensionVar_Neutrino`''':
* `Experiment` is the experiment (for us T2K)
* `Measurement` is a suitable name for the cross-section (for us CC1pip)
* `Target` is the interaction target (for us H2O)
* `DataType` is the measurement type, either `Evt` for an event rate measurement or `XSec` for a cross-section measurement (for us `XSec`)
* `DimensionVar` is 1D or 2D followed by a suitable name for the dependent variable (for us `1Dpmu`)
* `Neutrino` is the neutrino type (for us `nu`)
Out of these, string comparisons are only made on `DataType` and `DimensionVar`. The other identifiers exist to adhere to some standard and keep sample naming tidy and consistent.
'''What to do:'''
Following the above convention, we end up with '''`T2K_CC1pip_H2O_XSec_1Dpmu_nu`''': there is no ambiguity what the class describes and there is no way of confusing it with other classes in NUISANCE.
== Placing the sample in a directory ==
As with many packages, NUISANCE has source code in the `src` directory. We have a directory for each experiment inside `src`, e.g. `src/T2K`, `src/MINERvA`, and so on.
The applications using the `src` files go in the `app` directory. However, including a new sample does not involve changing anything there, so let's ignore it for the purpose of this tutorial.
The new files themselves for the new samples should be identical (or very similar) to the sample name to avoid confusion.
'''What to do:'''
Make the new files '''`src/T2K/T2K_CC1pip_H2O_XSec_1Dpmu_nu.cxx`''' and '''`src/T2K/T2K_CC1pip_H2O_1Dpmu_nu.h`'''. These are the implementation and header files for the new measurement.
== Placing the data in a directory ==
The data for the measurement goes into the `data` directory.
As with the `src` directory we have sub-directories for each experiment, e.g. `data/T2K` and `data/MINERvA`.
Furthermore, we specify another directory for the measurement topology to avoid confusion, e.g. `data/T2K/CC1pip` for our sample.
In some cases we might have the same topology defining the cross-section but for different targets. In this case we add another sub-directory for the target, e.g. `data/T2K/CC1pip/H2O`.
'''What to do:'''
Put the ROOT file from the [http://t2k-experiment.org/results/nd280data-numu-cc1pi-xs-on-h2o-2015/ data release] into the `data/T2K/CC1pip/H2O` directory.
{{{
#!html
}}}
= Coding up a sample =
Now that we have the rough structure set-up, we can finally start writing some code.
== Writing the header ==
In the case of our T2K CC1pi+ H,,2,,O data, we're dealing with 1D distributions. They should therefore inherit from the `Measurement1D` base class, as mentioned [#point_base earlier]. The `Measurement1D` class is implemented in `src/FitBase/Measurement1D.cxx`.
NUISANCE requires a `constructor` and `destructor` for the class, and we'll need to overload `MeasurementBase` methods which define the dependent variable(s) (p,,mu,, in our case) and what our signal is (CC interaction with one muon and one positive pion with no other pions or mesons and any number of nucleons in our case). The `MeasurementBase` functions which we need to overload are `MeasurementBase::isSignal(FitEvent *event)` and `MeasurementBase::FillEventVariables(FitEvent *event)`.
The `FitEvent` class is an object which contains information about one single event: all the particles, their kinematics and their status after the interaction, the interaction channel which produced the final state, possible secondary interactions, the interaction target, and so on. The `FitEvent` class is implemented in `src/FitBase/FitEvent.cxx` and essentially the generator-independent common event format which NUISANCE uses for the generator events.
[=#point_signal]
We see that the `src/T2K` directory contains `T2K_SignalDef.cxx`. This extends the `SignalDef` namespace which holds many of the signal definitions for the classes. If you're implementing multiple distributions for the same measurement (e.g. dSig/dp,,mu,, and dSig/dcosmu) we recommend adding the signal definition to the namespace rather than directly in the `isSignal(FitEvent *event)` function in the class implementation. We'll talk about implementing a signal definition [#point_signaldef later].
'''What to do:'''
Our header file now looks like:
{{{
#ifndef T2K_CC1PIP_H2O_XSEC_1DPMU_NU_H_SEEN
#define T2K_CC1PIP_H2O_XSEC_1DPMU_NU_H_SEEN
#include "Measurement1D.h"
#include "T2K_SignalDef.h"
class T2K_CC1pip_H2O_XSec_1Dpmu_nu : public Measurement1D {
public:
T2K_CC1pip_H2O_XSec_1Dpmu_nu(std::string inputfile, FitWeight *rw, std::string type, std::string fakeDataFile);
virtual ~T2K_CC1pip_H2O_XSec_1Dpmu_nu() {};
void FillEventVariables(FitEvent *event);
bool isSignal(FitEvent *event);
private:
};
#endif
}}}
where we've copied the constructor arguments from some other class we're using as a template. I'll elaborate more on this [#point_const below].
[=#point_const]
== Making the constructor ==
NUISANCE loads samples through `FCN/SampleList.cxx` in the function `SampleUtils::LoadSample`, which makes a `std::list` of `MeasurementBase` pointers which it later loops over from the executables.
Most of the automated setting up of histograms, titles, and so on is done in `Measurement1D::SetupMeasurement(std::string inputfile, std::string type, FitWeight *rw, std::string fakeDataFile)`. It hence makes sense to have a class constructor which passes all of these on to `SetupMeasurement`.
The arguments to `Measurement1D::SetupMeasurement` are:
* `inputfile`: the location of the input file containing generated events
* `type`: the type of comparison we want to make with this sample
* `rw`: the reweighting engine which may or may not be used to reweight the sample
* `fakeDataFile`: the location of a fake-data input file which is to be used as data instead of the actual data for fake-data studies
which currently all have to be passed. In reality, both `rw` and `fakeDataFile` do nothing in `SetupMeasurement` but are left for legacy reasons. We will likely change this soon!
Looking at the base classes which we are inheriting our sample from (`Measurement1D` and `MeasurementBase`, both in `src/FitBase`), we minimally need to set:
* `std::string fName`: The name of the sample
* `std::string fPlotTitles`: The x and y axes of the sample
* `double EnuMin`: The minimum neutrino energy in the sample signal definition
* `double EnuMax`: The maximum neutrino energy in the sample signal definition
* `TH1D *fDataHist`: The data histogram scanned from the provided input
* `TH1D *fMCHist`: The generator histogram binning with same units as `fDataHist`
* `TH1D *fMCStat`: The generator histogram without any scaling (i.e. N,,events,,)
* `TH1D *fMCFine`: The generator histogram with same units as `fDataHist` but finer binning
* `TH1 **fMCHist_PDF`: The mode-by-mode generator histograms
* `TMatrixDSym *fFullCovar`: The covariance matrix
* `TMatrixDSym *covar`: The inverted covariance matrix (bad naming, sorry!)
* `double fScaleFactor`: The scaling factor we need to go from a event rate to a cross-section (N,,evt,,(p,,mu,,) -> dSig(p,,mu,,))
We can set most of these by calling helper functions in `Measurement1D`, such as
* `Measurement1D::SetupMeasurement`: checks the naming of the sample and sets up the protected/private members, sets up the input generator processing for the sample, sets the fit options
* `Measurement1D::SetDataValues`: sets the data histogram from a given file
* `Measurement1D::SetCovarMatrix`: sets up the covariance matrices needed to calculate likelihoods
* `Measurement1D::SetupDefaultHist`: sets all the generator histograms and their auxillary histograms (e.g. mode-by-mode generator histograms)
or similar functions, e.g. calling `Measurement1D::SetDataFromFile` instead of `Measurement1D::SetDataValues`. '''Feel free to add reasonable setter functions to the base classes if you can't find a suitable one!'''
Current NUISANCE (v1r0) behaviour requires `fName`, `fPlotTitls`, `EnuMin`, `EnuMax` and `fScaleFactor` to be set by user in the implementation file. These are the only variables that need to be supplied by the user in the constructor, granted above helper functions are used.
'''For non-expert use''', it's sufficient to just copy the `fScaleFactor` variable as `(fEventHist->Integral("width")*1E-38)/double(fNEvents)/TotalIntegratedFlux("width");`. This number is applied to the event rate histogram to take us to a cross-section histogram in `Measurement1D::ScaleEvents`.
{{{
#!div class=important
If the cross-section is not in cm^2^/nucleon but instead cm^2^/CH (or any other molecule) you will have to scale the `fScaleFactor` by the number of available target nucleons.
For CCQE, nu,,mu,, interactions only occur off neutrons, which we have 6 of in CH. But the total number of nucleons in CH is 13. So if an experiment reports their cross-section in cm^2^/CH for CCQE nu,,mu,, and our generator output is cm^2^/nucleon we need to do `fScaleFactor*=13./6.`.
If instead an experiment measures a CC1pi^+^ cross-section in `cm^2^/CH`, we scale by a factor 13 because CC1pi^+^ happens on both neutrons and protons.
}}}
It's possible that these will be automated in the future, so please look at [https://nuisance.hepforge.org/trac/browser/src recently implemented samples] for up-to-date references.
{{{
#include "T2K_CC1pip_H2O_XSec_1Dpmu_nu.h"
// The muon momentum
//********************************************************************
T2K_CC1pip_H2O_XSec_1Dpmu_nu::T2K_CC1pip_H2O_XSec_1Dpmu_nu(std::string inputfile, FitWeight *rw, std::string type, std::string fakeDataFile){
//********************************************************************
fName = "T2K_CC1pip_H2O_XSec_1Dpmu_nu";
fPlotTitles = "; p_{#mu} (GeV/c); d#sigma/dp_{#mu} (cm^{2}/(GeV/c)/nucleon)";
EnuMin = 0.;
EnuMax = 10.;
Measurement1D::SetupMeasurement(inputfile, type, rw, fakeDataFile);
// Data comes in ROOT file
// hResultTot is cross-section with all errors
// hResultStat is cross-section with stats-only errors
// hTruthNEUT is the NEUT cross-section given by experimenter
// hTruthGENIE is the GENIE cross-section given by experimenter
SetDataFromFile(GeneralUtils::GetTopLevelDir()+"/data/T2K/CC1pip/H2O/nd280data-numu-cc1pi-xs-on-h2o-2015.root","MuMom/hResultTot");
SetCovarFromDataFile(GeneralUtils::GetTopLevelDir()+"/data/T2K/CC1pip/H2O/nd280data-numu-cc1pi-xs-on-h2o-2015.root", "MuMom/TotalCovariance", true);
SetupDefaultHist();
fScaleFactor = (fEventHist->Integral("width")*1E-38)/double(fNEvents)/TotalIntegratedFlux("width");
};
}}}
== Specifying the event-level dependent variable(s) ==
[=#point_signaldef]
== Specifying a signal definition ==
= The final implementation =
`T2K_CC1pip_H2O_XSec_1Dpmu_nu.cxx`:
{{{
#!cpp
#include "T2K_CC1pip_H2O_XSec_1Dpmu_nu.h"
// The muon momentum
//********************************************************************
T2K_CC1pip_H2O_XSec_1Dpmu_nu::T2K_CC1pip_H2O_XSec_1Dpmu_nu(std::string inputfile, FitWeight *rw, std::string type, std::string fakeDataFile){
//********************************************************************
fName = "T2K_CC1pip_H2O_XSec_1Dpmu_nu";
fPlotTitles = "; p_{#mu} (GeV/c); d#sigma/dp_{#mu} (cm^{2}/(GeV/c)/nucleon)";
EnuMin = 0.;
EnuMax = 10.;
Measurement1D::SetupMeasurement(inputfile, type, rw, fakeDataFile);
// Data comes in ROOT file
// hResultTot is cross-section with all errors
// hResultStat is cross-section with stats-only errors
// hTruthNEUT is the NEUT cross-section given by experimenter
// hTruthGENIE is the GENIE cross-section given by experimenter
SetDataFromFile(GeneralUtils::GetTopLevelDir()+"/data/T2K/CC1pip/H2O/nd280data-numu-cc1pi-xs-on-h2o-2015.root","MuMom/hResultTot");
SetCovarFromDataFile(GeneralUtils::GetTopLevelDir()+"/data/T2K/CC1pip/H2O/nd280data-numu-cc1pi-xs-on-h2o-2015.root", "MuMom/TotalCovariance", true);
SetupDefaultHist();
fScaleFactor = (fEventHist->Integral("width")*1E-38)/double(fNEvents)/TotalIntegratedFlux("width");
};
//********************************************************************
// Find the momentum of the muon
void T2K_CC1pip_H2O_XSec_1Dpmu_nu::FillEventVariables(FitEvent *event) {
//********************************************************************
// Need to make sure there's a muon
if (event->NumFSParticle(13) == 0) return;
// Get the muon
TLorentzVector Pmu = event->GetHMFSParticle(13)->fP;
double p_mu = FitUtils::p(Pmu);
fXVar = p_mu;
return;
};
//********************************************************************
// Beware: The H2O analysis has different signal definition to the CH analysis!
bool T2K_CC1pip_H2O_XSec_1Dpmu_nu::isSignal(FitEvent *event) {
//********************************************************************
return SignalDef::isCC1pip_T2K_H2O(event, EnuMin, EnuMax);
}
}}}
`T2K_CC1pip_H2O_XSec_1Dpmu_nu.h`:
{{{
#ifndef T2K_CC1PIP_H2O_XSEC_1DPMU_NU_H_SEEN
#define T2K_CC1PIP_H2O_XSEC_1DPMU_NU_H_SEEN
#include "Measurement1D.h"
#include "T2K_SignalDef.h"
class T2K_CC1pip_H2O_XSec_1Dpmu_nu : public Measurement1D {
public:
T2K_CC1pip_H2O_XSec_1Dpmu_nu(std::string inputfile, FitWeight *rw, std::string type, std::string fakeDataFile);
virtual ~T2K_CC1pip_H2O_XSec_1Dpmu_nu() {};
void FillEventVariables(FitEvent *event);
bool isSignal(FitEvent *event);
private:
};
#endif
}}}
=== Make NUISANCE aware of the sample ===
FCN/SampleList
=== Add the sample to the makefile ===
CMake