Using Spey with experimental data#
Show code cell content
import spey
import numpy as np
import matplotlib.pyplot as plt
In this tutorial we will go over basic functionalities of Spey package. Spey is designed to be used with different plug-ins where each plug-in represents a likelihood prescription. Spey is shipped with set of default likelihood prescriptions which are identified as "default_pdf"
. The list of available likelihood prescriptions can be found using spey.AvailableBackends()
function.
spey.AvailableBackends()
['default_pdf.correlated_background',
'default_pdf.effective_sigma',
'default_pdf.poisson',
'default_pdf.third_moment_expansion',
'default_pdf.uncorrelated_background']
Details on these likelihoods can be found in this link.
Designing custom likelihood prescriptions are also possible, details can be found in the appendix of arXiv:2307.06996 or through this link.
Each likelihood prescription is required to have certain metadata structure which will provide other users the necessary information to cite them. These metadata can be accessed via spey.get_backend_metadata("<plugin>")
function. For instance lets use it for 'default_pdf.third_moment_expansion'
:
spey.get_backend_metadata("default_pdf.third_moment_expansion")
{'name': 'default_pdf.third_moment_expansion',
'author': 'SpeysideHEP',
'version': '0.1.4',
'spey_requires': '0.1.4',
'doi': ['10.1007/JHEP04(2019)064'],
'arXiv': ['1809.05548']}
Example using Default PDFs#
Following data is provided by the CMS-SUS-20-004 analysis where this particular dataset belongs to \(pp\to\tilde{\chi}^0_3\tilde{\chi}^0_2\to HH\tilde{\chi}^0_1\tilde{\chi}^0_1\) process with \(m_{\tilde{\chi}^0_1} = 50\) GeV and \(m_{\tilde{\chi}^0_{3,2}} = 300\) GeV. Utilising this data, we can construct all three types of likelihood prescription shown above. In the following cell we will load the data. This data can be found in this link.
Show code cell content
example_data = np.load("example_data.npz")
# Observed data
observations = example_data["observations"]
# Background yields per bin provided by the experiment
background_yields = example_data["background_yields"]
# Background uncertainty per bin provided by the experiment
background_uncertainty = example_data["background_uncertainty"]
# Covariance matrix for bin-by-bin correlations provided by the experiment
covariance_matrix = example_data["covariance_matrix"]
# Third moments of the background yields per bin provided by the experiment
third_moments = example_data["third_moments"]
# Upper and lower uncertainty envelops per bin provided by the experiment
upper_unc_envelope = example_data["upper_unc_envelope"]
lower_unc_envelope = example_data["lower_unc_envelope"]
# Signal yields per bin
signal_yields = example_data["signal_yields"]
Upper limits and Exclusion level#
We can also compute upper limits on \(\mu\) and exclusion confidence level, \(1-CL_s\), as follows;
print(
f"Observed upper limit on µ at 95% CL: {corr_background_model.poi_upper_limit():.5f}"
)
expected_pval = corr_background_model.poi_upper_limit(
expected=spey.ExpectationType.aposteriori, expected_pvalue="1sigma"
)
muUL_apost = expected_pval[1]
print(
"Expected upper limit on µ with ± 1σ at 95% CL: "
+ ",".join([f"{x:.3f}" for x in expected_pval])
)
Observed upper limit on µ at 95% CL: 1.19567
Expected upper limit on µ with ± 1σ at 95% CL: 0.573,0.766,1.060
print(
f"Observed upper limit on µ at 68% CL: {corr_background_model.poi_upper_limit(confidence_level=0.68):.5f}"
)
Observed upper limit on µ at 68% CL: 0.75275
Compute it by hand:
poi = np.linspace(0, 1.5, 20)
poiUL = np.array(
[
corr_background_model.exclusion_confidence_level(
poi_test=p, expected=spey.ExpectationType.aposteriori
)
for p in poi
]
)
Show code cell source
plt.plot(poi, poiUL[:, 2], color="tab:red")
plt.fill_between(poi, poiUL[:, 1], poiUL[:, 3], alpha=0.8, color="green", lw=0)
plt.fill_between(poi, poiUL[:, 0], poiUL[:, 4], alpha=0.5, color="yellow", lw=0)
plt.plot(
[muUL_apost, muUL_apost],
[0, 1],
ls="dashed",
color="b",
label=f"upper limit on $\\mu$: {muUL_apost:.3f}",
)
plt.plot([0, 10], [0.95, 0.95], color="k", ls="dashed")
plt.xlabel(r"${\rm signal\ strength\ or\ POI }\ (\mu)$")
plt.ylabel("$1-CL_s$")
plt.xlim([0.2, 1.5])
plt.ylim([0.6, 1.01])
plt.text(0.3, 0.96, r"$95\%\ {\rm CL}$")
plt.legend(fontsize=12)
plt.show()
print(
f"Observed exclusion confidence level, 1-CLs: {corr_background_model.exclusion_confidence_level()[0]:.5f}"
)
Observed exclusion confidence level, 1-CLs: 0.87378
exp_cls = corr_background_model.exclusion_confidence_level(
expected=spey.ExpectationType.aposteriori
)
print(
f"Expected exclusion confidence level, 1-CLs ± 1σ: {exp_cls[2]:.4f} - {exp_cls[2]-exp_cls[3]:.4f} + {exp_cls[1]-exp_cls[2]:.4f}"
)
Expected exclusion confidence level, 1-CLs ± 1σ: 0.9900 - 0.0584 + 0.0089
More details on exclusion limit computation can be found in the dedicated section of the online documentation.
Third Moment Expansion#
Proposed in Buckley, Citron, Fichet, Kraml, Waltenberger, Wardle; JHEP ‘18
\(\bar{n}_b^i\) := the central value of the background
\(B_i\) := the effective sigma of the background uncertainty
\(S_i\) := asymmetry of the background uncertainty
More information can be found in this link. As before, lets prepare our statistical model:
Question:
Experimental collaboration provided asymmetric uncertainties but not third moments, can spey compute third moments?
Answer: Yes! one can use compute_third_moments
function which computes third moments using Bifurcated Gaussian
But this is an assumption, if collaboration provides exact third moments, please always use those.
pdf_wrapper = spey.get_backend("default_pdf.third_moment_expansion")
third_mom_expansion_model = pdf_wrapper(
signal_yields=signal_yields,
background_yields=background_yields,
data=observations,
covariance_matrix=covariance_matrix,
third_moment=third_moments,
analysis="cms-sus-20-004-TM",
)
print(third_mom_expansion_model)
StatisticalModel(analysis='cms-sus-20-004-TM', backend=default_pdf.third_moment_expansion, calculators=['toy', 'asymptotic', 'chi_square'])
print(f"Observed upper limit on µ: {third_mom_expansion_model.poi_upper_limit():.5f}")
print(
f"Expected upper limit on µ: {third_mom_expansion_model.poi_upper_limit(expected=spey.ExpectationType.aposteriori):.5f}"
)
print(
f"Observed exclusion confidence level, 1-CLs: {third_mom_expansion_model.exclusion_confidence_level()[0]:.5f}"
)
Observed upper limit on µ: 1.17409
Expected upper limit on µ: 0.81665
Observed exclusion confidence level, 1-CLs: 0.88504
Effective Sigma#
Proposed in Barlow; ‘04 to fit Gaussian distribution on a Poisson distribution with asymmetric uncertainties. In the paper it has been referred as Variable Gaussian technique, here we will use a modified version of this approach.
where \( \sigma_{\rm eff}^i(\theta^i) = \sqrt{\sigma^+_i\sigma^-_i + (\sigma^+_i - \sigma^-_i)(\theta^i - n_b^i)} \). See this link for the documentation of this likelihood prescription. As previous examples, let us initiate the statistical model once more. We will need to convert above covariance matrix into correlation matrix which can be done with covariance_to_correlation
.
Show code cell content
from spey.helper_functions import covariance_to_correlation
pdf_wrapper = spey.get_backend("default_pdf.effective_sigma")
effective_sigma_model = pdf_wrapper(
signal_yields=signal_yields,
background_yields=background_yields,
data=observations,
correlation_matrix=covariance_to_correlation(covariance_matrix),
absolute_uncertainty_envelops=[
(up, dn) for up, dn in zip(upper_unc_envelope, lower_unc_envelope)
],
analysis="cms-sus-20-004-ES",
)
print(effective_sigma_model)
StatisticalModel(analysis='cms-sus-20-004-ES', backend=default_pdf.effective_sigma, calculators=['toy', 'asymptotic', 'chi_square'])
print(f"Observed upper limit on µ: {effective_sigma_model.poi_upper_limit():.5f}")
print(
f"Expected upper limit on µ: {effective_sigma_model.poi_upper_limit(expected=spey.ExpectationType.aposteriori):.5f}"
)
print(
f"Observed exclusion confidence level, 1-CLs: {effective_sigma_model.exclusion_confidence_level()[0]:.5f}"
)
Observed upper limit on µ: 0.92033
Expected upper limit on µ: 0.66874
Observed exclusion confidence level, 1-CLs: 0.96993
Lets compare \(\chi^2(\mu)\) distribution for all three likelihoods
poi = np.linspace(0, 3, 30)
llhd = [corr_background_model.chi2(p) for p in poi]
llhd_tm = [third_mom_expansion_model.chi2(p) for p in poi]
llhd_es = [effective_sigma_model.chi2(p) for p in poi]
Show code cell source
plt.plot(poi, llhd, label=corr_background_model.backend_type)
plt.plot(poi, llhd_tm, label=third_mom_expansion_model.backend_type)
plt.plot(poi, llhd_es, label=effective_sigma_model.backend_type)
plt.legend()
plt.ylabel(r"$\chi^2(\mu)$")
plt.xlabel(r"$\mu$")
plt.xlim([0, 3])
plt.ylim([0, 45])
plt.show()