Version 5 (modified by 8 years ago) (diff) | ,
---|
Page BUILD Built
GENIE NUISMIN
PAGE BEING BUILT
The NUISANCE minimizer application can be used to tune MC models by directly comparing with published scattering data and using ROOT's minimizer libraries to find a best fit parameter set.
Wiki content
- Page BUILD Built
- GENIE NUISMIN
- PAGE BEING BUILT
- Running NUISMIN
- Running a GENIE Minimization
Running NUISMIN
Author: Patrick Stowell
Date: June 2017
Versions: NUISANCE v2r0
, GENIE 2.12.6
The following example details how to run NUISANCE and tune a simple model to MiniBooNE_CCQE_XSec_1DQ2_nu data, with additional examples on how to include penalty terms and perform fake data fits. This example assumes you have at least tried to run the nuiscomp tutorial shown here HowToUseNUISCOMP and understand what the standard NUISANCE sample outputs look like. If you haven't tried the nuiscomp tutorial you should start there first as its relatively quick and an easier place to start.
Each generator requires very slightly different ways to handle NUISANCE, therefore multiple versions of this tutorial have been provided. Please use the following links to choose what generator you would like to use.
Running a GENIE Minimization
In this example we will take pregenerated GENIE events as our starting model and run a tuning to find a best fit value for some of the CCQE Reweight dials in GENIE ReWeight?.
Generating GENIE Events
If we want to see how a given GENIE model behaves, first we need to generate events. This can be done using the standard gevgen application, using the appropriate target and flux for a given data sample.
If all you want to do is check your NUSIANCE is built correctly, you can skip this step by downloading MC files from our online storage area by following the steps found here:LinkToNUISANCEMCFiles
Running gevgen
The standard gevgen application options can be ran using
$ gevgen -h Syntax: gevgen [-h] [-r run#] -n nev -e energy (or energy range) -p neutrino_pdg -t target_pdg [-f flux_description] [-o outfile_name] [-w] [--seed random_number_seed] [--cross-sections xml_file] [--event-generator-list list_name] [--message-thresholds xml_file] [--unphysical-event-mask mask] [--event-record-print-level level] [--mc-job-status-refresh-rate rate] [--cache-file root_file]
We need to provide a flux and target list to GENIE when running gevgen.
We want to run comparisons to MiniBooNE muon-neutrino scattering data on a mineral oil target, therefore to generate events we pass it the default GENIE splines, the MiniBooNE flux in root format, and the required target and beam peg settings
$ source $GENIE_DIR/environment_setup.sh $ export GXMLPATH=${GENIE_DIR}/genie_xsec/v2_12_0/NULL/DefaultPlusMECWithNC/data/ $ gevgen -n 2500000 \ -t 1000060120[0.85714],1000010010[0.14286] \ -p 14 --cross-sections $GXMLPATH/gxspl-FNALsmall.xml \ --event-generator-list Default -f MiniBooNE_numu_flux.root,numu_mb \ -e 0,10 -o gntp.2063030.ghep.root
For more information on how to generate events in GENIE please see: https://arxiv.org/abs/1510.05494
Once our events have been generated we can check that they have finished correctly by opening the output event file and checking it has a ‘gtree’ object and the checking flux spectrum file contains the correct histogram.
$ root gntp.2063030.ghep.root root [0] Attaching file gntp.2063030.ghep.root as _file0... Warning in <TClass::TClass>: no dictionary for class genie::NtpMCEventRecord is available Warning in <TClass::TClass>: no dictionary for class genie::NtpMCRecordI is available Warning in <TClass::TClass>: no dictionary for class genie::NtpMCRecHeader is available Warning in <TClass::TClass>: no dictionary for class genie::EventRecord is available Warning in <TClass::TClass>: no dictionary for class genie::GHepRecord is available Warning in <TClass::TClass>: no dictionary for class genie::Interaction is available Warning in <TClass::TClass>: no dictionary for class genie::InitialState is available Warning in <TClass::TClass>: no dictionary for class genie::Target is available Warning in <TClass::TClass>: no dictionary for class genie::ProcessInfo is available Warning in <TClass::TClass>: no dictionary for class genie::Kinematics is available Warning in <TClass::TClass>: no dictionary for class genie::XclsTag is available Warning in <TClass::TClass>: no dictionary for class genie::KPhaseSpace is available Warning in <TClass::TClass>: no dictionary for class genie::GHepParticle is available Warning in <TClass::TClass>: no dictionary for class pair<genie::EKineVar,double> is available root [1] _file0->ls(); TFile** gntp.2063030.ghep.root TFile* gntp.2063030.ghep.root KEY: genie::NtpMCTreeHeader header;1 GENIE output tree header KEY: TFolder gconfig;1 GENIE configs KEY: TFolder genv;1 GENIE user environment KEY: TTree gtree;1 GENIE MC Truth TTree, Format: [NtpMCEventRecord]
$ root input-flux.root root [0] Attaching file input-flux.root as _file0... root [1] _file0->ls(); TFile** input-flux.root TFile* input-flux.root KEY: TH1D spectrum;1 neutrino_flux
Now that the event samples have been generated correctly, we need to prepare them for use in NUISANCE.
Running PrepareGENIE
The standard gevgen application doesn’t save the total event rate predictions into the event file itself. NUISANCE needs these to correctly normalise predictions so before we can use these new events we need to prepare them.
The PrepareGENIE application is built when NUISANCE is built with GENIE support should be available after the NUISANCE environmental setup script is ran.
$ PrepareGENIE -h PrepareGENIEEvents NUISANCE app. Takes GHep Outputs and prepares events for NUISANCE. PrepareGENIEEvents [-h,-help,--h,--help] [-i inputfile1.root,inputfile2.root,inputfile3.root,...] [-f flux_root_file.root,flux_hist_name] [-t target1[frac1],target2[frac2],...] Prepare Mode [Default] : Takes a single GHep file, reconstructs the original GENIE splines, and creates a duplicate file that also contains the flux, event rate, and xsec predictions that NUISANCE needs. Following options are required for Prepare Mode: [ -i inputfile.root ] : Reads in a single GHep input file that needs the xsec calculation ran on it. [ -f flux_file.root,hist_name ] : Path to root file containing the flux histogram the GHep records were generated with. A simple method is to point this to the flux histogram genie generatrs '-f /path/to/events/input-flux.root,spectrum'. [ -t target ] : Target that GHepRecords were generated with. Comma seperated list. E.g. for CH2 target=1000060120,1000010010,1000010010
The PrepareGENIE application, when ran, loops over all the events, extracts the cross-section as a function of energy for each discrete interaction mode and uses this information to reconstruct the cross-section splines for each target that were used to generate events.
These splines are then multiplied by specified flux and added according to the target definition provided to produce total flux and event rate predictions as a function of energy for the sample and saves them into the events file.
We want to prepare our MiniBooNE events so we pass in the event files, the input flux, and the CH2 target definition.
$ PrepareGENIE -i gntp.2063030.ghep.root -f input-flux.root,spectrum -t 1000060120,1000010010,1000010010
Now when we open our event file again, we should see the flux and event rate histograms are now saved into the file ready for NUISANCE to read them.
KEY: genie::NtpMCTreeHeader header;1 GENIE output tree header KEY: TFolder gconfig;1 GENIE configs KEY: TFolder genv;1 GENIE user environment KEY: TTree gtree;1 GENIE MC Truth TTree, Format: [NtpMCEventRecord] KEY: TDirectoryFile IndividualGENIESplines;1 IndividualGENIESplines KEY: TDirectoryFile TargetGENIESplines;1 TargetGENIESplines KEY: TH1F nuisance_xsec;1 KEY: TH1F nuisance_events;1 KEY: TH1F nuisance_flux;1
Choosing our samples
Now that we have an event sample we can load them load them into NUISANCE so that it can use them to form a joint likelihood by specifying them at run time.
Writing a card file
To specify samples we need to write a NUISANCE card file that lists all comparisons that should be made and the event files that should be used for each one.
We want to produce comparisons to MiniBooNE CCQE data, so first we should search the NUISANCE sample list.
The ‘nuissamples’ script is provided for easy access of the sample list. Running it without any arguments will return a full sample list of available data comparisons. Providing an additional argument will return only samples containing the provided substring.
We can list the MIniBooNE samples using
[stowell@hepgw1 ~]$ nuissamples MiniBooNE MiniBooNE_CCQE_XSec_1DQ2_nu MiniBooNE_CCQELike_XSec_1DQ2_nu MiniBooNE_CCQE_XSec_1DQ2_antinu MiniBooNE_CCQELike_XSec_1DQ2_antinu MiniBooNE_CCQE_CTarg_XSec_1DQ2_antinu MiniBooNE_CCQE_XSec_2DTcos_nu MiniBooNE_CCQELike_XSec_2DTcos_nu MiniBooNE_CCQE_XSec_2DTcos_antinu MiniBooNE_CCQELike_XSec_2DTcos_antinu MiniBooNE_CC1pip_XSec_1DEnu_nu MiniBooNE_CC1pip_XSec_1DQ2_nu MiniBooNE_CC1pip_XSec_1DTpi_nu MiniBooNE_CC1pip_XSec_1DTu_nu MiniBooNE_CC1pip_XSec_2DQ2Enu_nu MiniBooNE_CC1pip_XSec_2DTpiCospi_nu MiniBooNE_CC1pip_XSec_2DTpiEnu_nu MiniBooNE_CC1pip_XSec_2DTuCosmu_nu MiniBooNE_CC1pip_XSec_2DTuEnu_nu MiniBooNE_CC1pi0_XSec_1DEnu_nu MiniBooNE_CC1pi0_XSec_1DQ2_nu MiniBooNE_CC1pi0_XSec_1DTu_nu MiniBooNE_CC1pi0_XSec_1Dcosmu_nu MiniBooNE_CC1pi0_XSec_1Dcospi0_nu MiniBooNE_CC1pi0_XSec_1Dppi0_nu MiniBooNE_NC1pi0_XSec_1Dcospi0_antinu MiniBooNE_NC1pi0_XSec_1Dcospi0_rhc MiniBooNE_NC1pi0_XSec_1Dcospi0_nu MiniBooNE_NC1pi0_XSec_1Dcospi0_fhc MiniBooNE_NC1pi0_XSec_1Dppi0_antinu MiniBooNE_NC1pi0_XSec_1Dppi0_rhc MiniBooNE_NC1pi0_XSec_1Dppi0_nu MiniBooNE_NC1pi0_XSec_1Dppi0_fhc MiniBooNE_NCEL_XSec_Treco_nu
We only care about CC1pip data therefore the following samples are of interest
[stowell@hepgw1 ~]$ nuissamples MiniBooNE_CCQE_ MiniBooNE_CCQE_XSec_1DQ2_nu MiniBooNE_CCQE_XSec_1DQ2_antinu MiniBooNE_CCQE_CTarg_XSec_1DQ2_antinu MiniBooNE_CCQE_XSec_2DTcos_nu MiniBooNE_CCQE_XSec_2DTcos_antinu
In this example we will compare to the CCQE 1DQ2 distributions, but we could provide a large number of the samples seen in the lists if we wanted. When we specify multiple samples nuismin will automatically calculate the likelihood for each one and add them uncorrelated to form a joint likelihood.
We write our card file with these two datasets using the following sample object format:
sample NAME_OF_SAMPLE INPUT_TYPE:FILE_INPUT
- NAME_OF_SAMPLE : Name of the sample we found using nuissamples
- INPUT_TYPE : Type of the input file we are using (e.g. GENIE)
- FILE_INPUT : Path to the input MC event we want to use for this sample.
- OPTION : (Optional Argument) Option that can be used to change sample behaviour at runtime. By default this is left as DEFAULT.
- NORM_VALUE : (Optional Argument) Start value of normalisation parameter used to change the MC normalisation for this sample. By default this is left at 1.0.
For further examples on how to include these structures in card files please see Card File Examples.
So if our GENIE file is called 'MiniBooNE_FHC_numu_2.5M.root', our card file will be :
genie_tutorial.card
sample MiniBooNE_CCQE_XSec_1DQ2_nu GENIE:MiniBooNE_FHC_numu_2.5M.root sample MiniBooNE_CCQE_XSec_1DQ2_antinu GENIE:MiniBooNE_FHC_numu_2.5M.root
If you want to check this is valid you could run this card file with 'nuiscomp' as shown in the tutorial here HowToUseNUISCOMP-GENIE and check the nominal prediction looks okay.
Reweighting GENIE Comparisons
The minimiser application allows different GENIE reweight parameters to be provided in our card file and floated freely until the likelihood is maximised.
The format for fixed dial values is:
genie_parameter NAME VALUE STATE
or for free dial variations
genie_parameter NAME VALUE LOW HIGH STEP STATE
- NAME : Specifies the name of the reweight dial. These can be found in '$GENIE/src/ReWeight/GSyst.cxx'
- VALUE: Value of the reweight dial in units of 1-sigma variations (1-sigma defined by GENIE)
- LOW : Lower limit that the dial is allowed to float to.
- HIGH : Upper limit that the dial is allowed to float to.
- STEP : Step size passed to Minuit.
- STATE: State of this parameter dial, for most cases this should be left as 'FIX'.
For further examples on how to include these structures in card files please see Card File Examples.
In our example we want to tune the following parameters
- Charged Current Quasielastic Axial Mass : Axial mass parameter used in the quasielastic form factor
- Charged Current Quasielastic Normalisation : Total normalisation of CCQE events.
First we look for the possible name in GENIE reweight:
$GENIE/src/ReWeight/GSyst.h
class GSyst { public: //...................................................................................... static string AsString(GSyst_t syst) { switch(syst) { case ( kXSecTwkDial_MaNCEL ) : return "MaNCEL"; break; case ( kXSecTwkDial_EtaNCEL ) : return "EtaNCEL"; break; case ( kXSecTwkDial_NormCCQE ) : return "NormCCQE"; break; case ( kXSecTwkDial_NormCCQEenu ) : return "NormCCQEenu"; break; case ( kXSecTwkDial_MaCCQE ) : return "MaCCQE"; break; case ( kXSecTwkDial_MaCCQEshape ) : return "MaCCQEshape"; break; case ( kXSecTwkDial_ZNormCCQE ) : return "ZNormCCQE"; break; ...
We can see the dials we are interested in this list, specified by the strings: 'MaCCQE' and 'NormCCQE' respectively.
Now we want to change include these dials in our card file. To start with we will try tuning a single parameter, keeping NormCCQE fixed at +0.1 sigma instead. Although we are keeping it fixed, we define the limits on NormCCQE anyway, in case we want to vary it later.
genie_parameter MaCCQE 0.0 -3.0 3.0 1.0 FREE genie_parameter NormCCQE 0.1 -3.0 3.0 1.0 FIX
So now our card file looks like the following:
genie_tutorial.card
genie_parameter MaCCQE 0.0 -3.0 3.0 1.0 FREE genie_parameter NormCCQE 0.1 -3.0 3.0 1.0 FIX sample MiniBooNE_CCQE_XSec_1DQ2_nu GENIE:MiniBooNE_FHC_numu_2.5M.root sample MiniBooNE_CCQE_XSec_1DQ2_antinu GENIE:MiniBooNE_FHC_numu_2.5M.root
Running the minimiser (Migrad)
Now that we have a prepared card file from the previous section, we can now run the minimiser.
Lets start it running, we will use a smaller number of events for this example so we can get some idea of the behaviour. We do this by passing the '-n' option at the command line.
WARNING this is only so you can quickly see the results of minuit and check that the minimiser and your card file is working correctly. For serious fitting you should never use the '-n' flag.
$ nuismin -c genie_tutorial.card -o genie_minuit_maccqe-free_normccqe-fixed.root -n 50000
Now that we have left that running in the background, lets quickly discuss how Migrad works and some options you have to make NUISANCE run a bit quicker.
How Migrad works
Migrad is the default minimisation routine that nuismin uses. It attempts to find a best fit parameter set that maximises a likelihood (minimises a chi-squared value). When we run nuismin, the NUISANCE routines classes tells Migrad that the parameters we have defined as FREE should be treated as registered as free variables and the samples we have listed should be used to form a likelihood.
Then when Migrad is run, it will step around the parameter space provided to it, calculate a likelihood given the parameters at the new step, and use a gradient descent algorithm to work its way to a best fit parameter set. Each of these 'steps' is referred to as an iteration, and usually a large number of iterations are required so that Migrad can find a reliable minimum.
To calculate the likelihood at each step NUISANCE has to recalculate all of the MC predictions for the given parameter set at the step, before using them to calculate a data-MC likelihood. Since we have to loop over the event sample at each iteration, this can take a while to actually converge, hence why for this example we have restricted the number of events.
Eventually when Migrad does think it has found a best fit point, it will step around the minimum checking that it is a good valid minimum, and assuming Gaussian statistics, calculate errors and correlations on each of the parameters.
One issue that a user should be aware of is that Migrad can sometimes fall into local minima, which is especially more likely if the likelihood surface is non-linear. This can result in bias fit results, and it is advisable that extra closure tests are performed by start the minimiser from different starting values (changing the NOMINAL value in the parameter card file definition) and checking that the fit results are stable.
Signal Reconfigures
The fact the Migrad has to iterate many times can be very time consuming. With large event samples it is not uncommon for tunings with multiple parameters to take days since to calculate a likelihood for a given iteration NUISANCE has to recalculate all the MC predictions for the updated parameter set before comparing the data to MC. This involves reading every single event in our samples at each iteration, recalculating event weights using our reweight engines, filling each of the histograms with the updated weight, and then finally calculating a likelihood for all samples included. The two most time consuming pieces of this iteration are the reading and event weight calculation stages, and as such a lot of event handling functionality has been added to the core classes of NUISANCE to minimise iteration time. This is referred to as the 'EventManager?'.
The EventManager?'s job is to keep a global list of all possible inputs and what samples they are assigned to, making sure that no event is read from disk twice or event weight is calculated twice during a single iteration. When the EventManager? is switched on 'config EventManager? 1' which it should be by default, during a single event loop, first the EventManager? will read an event and calculate a weight before distributing that event information to only the relevant samples saving a significant amount of time compared to performing an event loop for each sample individually.
To increase the speed of the EventManager? even further a 'SignalReconfigures?=1' option has been added which is not enabled by default, but that makes sure that only events that have been flagged as signal by the sample classes actually get read from disk and weights calculated. This mode is available but has not been extensively tested with all samples yet, so use it with caution.
Saving Nominal Prediction
A configuration option is alose provided that forces NUISANCE to run an additional reconfigure step when starting nuismin that will save the starting MC predictions into a seperate folder in the output file called 'nominal/', so that pre-fit vs post-fit comparisons can be made.
This can be turned on by using the configuration option
config savenominal 1
Analysing the output
You'll notice that some of the nuismin output file is very similar to the output of nuiscomp. The nuismin application builds on those outputs by adding additional information on the fit results.
Data/MC Comparisons
The data/MC comparisons plots (e.g. MiniBooNE_CCQE_XSec_1DQ2_nu_MC) should have the same structure as that of the nuiscomp app, with one slight difference. The MC plots should 'hopefully' look a lot more like the data now.
nuismin should save the MC prediction at the best fit point of the minimisation into the output file, so if we compare the result to the same configuration ran through nuiscomp (before tuning) we should see an improvement.
If there is not an improvement then it means the minimiser failed in some way! Comparing before and after is a very qualitative way to understand the fitter progress however. Instead nuismin also saves a bunch of handy information detailing the steps taken along the way and the best fit results, detailed in the next sections.