(Also known as: how to get the most from your data without making a fool of yourself.)
Term: September 2023
Lecturer: Scott Oser
Class coordinates: Tuesdays/Thursdays, 14:00-15:15 Pacific time, in person, in Hennings 302
Office Hours: Mondays, 12:00-13:00 Pacific time by Zoom (for connection information, see the Canvas page for the course, or contact the instructor)
TA: Aditi Pradeep
Topics covered: Interpretation of probability; basic descriptive statistics; common probability distributions; Monte Carlo methods; Bayesian analysis; methods of error propagation; systematic uncertainties; parameter estimation; hypothesis testing and statistical significance; confidence intervals; blind analyses; methods of multivariate analysis; non-parametric tests; periodicity searches; "robust" statistics; deconvolution and unfolding
Prerequisites: Officially, none. However, you will be expected to have some facility with computational techniques and programming in a high-level language, or at least a willingness to learn very quickly. Quite simply, it's not possible to do much data analysis or statistics without being able to program. Almost all homework assignments will have a large computational component, although this class will not teach programming per se. If you don't already know basic computational physics, your time might be better spent taking Physics 210 or Physics 410 instead.
Textbooks: There are two textbooks for this course:
Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, by Roger Barlow
Bayesian Logical Data Analysis for the Physical Sciences, by P.C. Gregory (available online from UBC library here)
Each has a different focus with different strengths and weaknesses, and I'll draw on material from both.
Supplemental material: You may also find these books enlightening:
Numerical Recipes, by William H. Press
Statistical Data Analysis, by Glen Cowan
Practical Statistics for Astronomers, by J.V. Wall and C.R. Jenkins
Probability and Statistics, by Morris H. DeGroot
Your grade will be determined by:
| Homework | 30% |
| Midterm | 35% |
| Final Exam | 35% |
Homework: There will be approximately five lengthy homework assignments. They will generally include analytic calculations, essay-type questions, and computational problems that require you to analyze data sets, usually by writing some computer code. You are welcome to discuss problems informally with your classmates. However, you must complete the assignment yourself, and if you hand in obviously copied homework, you should expect a mark of zero on that assignment, or worse. Assignments must be submitted by 22:00 Pacific time on their due date. See Lecture 1 for detailed instructions on how to submit assignments through Canvas.
Useful software: This course will require some computational facility on your part. The entire course can be done using free software, and you're not required to buy anything. The most important things you'll need are access to a good plotting package and a library of scientific routines (capable of random number generation, non-linear fitting, and matrix operations at a minimum). I encourage you to use whatever tools your field uses or that you already know, but if you want some recommendations, you may find the following useful:
ROOT: a combined plotting/analysis package developed by the high energy physics community (but of general utility), based on a C++ interpreter. Extremely powerful, with decent tutorials available. Free. Includes most numerical routines you might want, and since it's based on C++ it can work with other libraries or code as well.
gnuplot: a free plotting package with some basic fitting capability (although not enough to do every HW problem). This might be a good option if you're writing standalone code in C/C++/FORTRAN and just need a way of plotting the output.
Mathematica: an integrated plotting and mathematical analysis package. Quite expensive (prohibitively so if you're not a student).
GNU Scientific Library: a free library of computational routines. To some extent it is a freeware equivalent of the routines in Numerical Recipes. Available in C and C++.
Numerical Recipes: Very commonly used. Although the text of the book is available online for free, the routines are proprietary, and you're supposed to buy the book if you use any of the routines. Available in C, C++, and FORTRAN.
numpy, scipy, and matplotlib: For the python fans among you. May God have mercy on your soul.
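As a quick self-check that a toolkit meets the minimum requirements listed above (random number generation, non-linear fitting, and matrix operations), here is a minimal sketch using numpy and scipy. The exponential-decay model, the parameter values, the noise level, and the seed are all illustrative assumptions, not course material, and the same exercise can be repeated in whichever package you prefer.

```python
import numpy as np
from scipy.optimize import curve_fit


def decay(t, amplitude, lifetime):
    """Hypothetical model for illustration: a simple exponential decay."""
    return amplitude * np.exp(-t / lifetime)


def run_check(seed=12345):
    # 1. Random number generation: simulate noisy measurements.
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, 50)
    sigma = 0.2  # assumed Gaussian uncertainty on each point
    y = decay(t, 10.0, 3.0) + rng.normal(0.0, sigma, size=t.size)

    # 2. Non-linear fitting: weighted least-squares fit of the model.
    popt, pcov = curve_fit(
        decay, t, y,
        p0=[5.0, 1.0],
        sigma=np.full(t.size, sigma),
        absolute_sigma=True,
    )

    # 3. Matrix operations: turn the covariance matrix into
    #    parameter uncertainties and a correlation matrix.
    errors = np.sqrt(np.diag(pcov))
    corr = pcov / np.outer(errors, errors)
    return popt, errors, corr


if __name__ == "__main__":
    popt, errors, corr = run_check()
    print("best-fit amplitude, lifetime:", popt)
    print("uncertainties:", errors)
    print("correlation matrix:")
    print(corr)
```

If your chosen tools can reproduce all three steps, they are probably adequate for the computational side of the homework.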
Programming languages: It's up to you to choose what programming language you feel most comfortable with. These days I'd generally recommend C++ or python, as they are quickly coming to dominate many areas of the physical sciences, and most libraries of scientific routines are available in those languages. But I will confess that I am personally still much more fluent in FORTRAN and C. If you want to use something besides C++, C, or FORTRAN, please be my guest, but I won't be able to offer you much specific advice on coding issues. Not that I will anyway.
Missed exams: There will be one in-class (timed) midterm exam. If you miss the exam with a legitimate excuse (proof of illness, family emergency, etc.), see me to discuss make-up options.
Religious holidays: Students are entitled to request an alternate test date if a scheduled test date falls on one of their holy days. If you think this may apply to you, please contact me as soon as possible to make an alternate arrangement. Please don't put this off until the last minute: you must give at least two weeks' notice.
FINAL EXAM: We will have a take-home final exam. The final exam will be posted on Canvas at 12:00 PST on December 16, and will be due on December 19 at 12:00 PST. NO LATE SUBMISSIONS!
COVID safety: For our in-person meetings of this class, it is important that all of us feel as comfortable as possible engaging in class activities while sharing an indoor space. Good quality masks (e.g. N95) that cover our noses and mouths are a primary tool to make it harder for COVID-19 to find a new host. Wearing masks in indoor settings is strongly recommended by public health authorities. If you have not yet had a chance to get vaccinated against COVID-19, vaccines are available to you, free of charge. The higher the rate of vaccination in our community overall, the lower the rate of spread of this virus. You are an important part of the UBC community. Please arrange to get vaccinated if you have not already done so.
If you’re sick, it’s important that you stay home – no matter what you think you may be sick with (e.g., cold, flu, other). If you think you might have COVID symptoms, have tested positive for COVID, or are required to self-quarantine, you can do a self-assessment for COVID symptoms here: https://bc.thrive.health/covid19/en
Do not come to class if you are sick, have COVID symptoms, have recently tested positive for COVID, or are required to quarantine. This precaution will help reduce risk and keep everyone safer. I will not be taking attendance or awarding participation marks, and all lecture notes are available on this page. If you are sick on the day of the in-class midterm exam, stay home --- I will gladly arrange a make-up exam for you.
A word on UBC policies: UBC provides resources to support student learning and to maintain healthy lifestyles but recognizes that sometimes crises arise and so there are additional resources to access including those for survivors of sexual violence. UBC values respect for the person and ideas of all members of the academic community. Harassment and discrimination are not tolerated nor is suppression of academic freedom. UBC provides appropriate accommodation for students with disabilities and for religious, spiritual and cultural observances. UBC values academic honesty and students are expected to acknowledge the ideas generated by others and to uphold the highest academic standards in all of their actions. Details of the policies and how to access support are available here.
Syllabus: The lecture schedule follows. It may be adjusted as the course proceeds.
| Lecture # | Date | Topics Covered | Reading Material (Textbook Sections) | Assignment Due |
| | 9/5 | First day of class. Introduction; Interpretations of probability | B7.1; G1.1-1.4 | |
| | 9/7 | NO CLASS --- TA training day | | |
| | 9/12 | Basic descriptive statistics; random variables; Gaussian and binomial distributions | B2.1-2.6; B3.1-3.2 | |
| | 9/14 | Poisson, exponential, and chi^2 distributions; mathematics of manipulating probability distributions | B3.3-3.5 | |
| | 9/19 | Monte Carlo and basic computational methods: random number generation, minimization routines, coding hints | B10.1-10.4 | |
| | 9/21 | Intro to Bayesian analysis: general principles, basic applications, contrast with frequentist approach, nuisance parameters and systematic uncertainties | G Ch 3-4 | HW1 (due Sep 22) |
| | 9/26 | Bayesian analysis: choice of priors, maximum entropy principles | G Ch 4, 8 | |
| | 9/28 | The central limit theorem; the Chebyshev limit; covariance matrices and multidimensional Gaussian distributions | B4.1-4.4 | |
| | 10/3 | Estimators I: introduction & maximum likelihood method | B5.1-5.4 | |
| | 10/5 | Estimators II: least squares methods | B5.5-5.6, B6.1-6.7 | |
| | 10/10 | Error propagation methods: meaning and interpretation of error bars; the error propagation equation; dealing with correlations; handling asymmetric and non-Gaussian errors | B4.1-4.4 | |
| | 10/12 | NO CLASS – “Makeup Monday”. Attend your regular Monday classes instead. | | HW2 |
| | 10/17 | IN-CLASS MIDTERM | | |
| | 10/19 | Systematic Uncertainties I: distinction or lack thereof between statistical and systematic uncertainties; Monte Carlo evaluation; covariance matrix approach | B4.1-4.4 | |
| | 10/24 | Systematic Uncertainties II: the pull method/"floating systematics", how to evaluate systematics; common mistakes in systematic error propagation | B4.1-4.4 | |
| | 10/26 | Hypothesis/significance testing I: introduction, interpretation, significance and power, Neyman-Pearson lemma; trials factors | B8.1-8.2.2 | |
| | 10/31 | Hypothesis/significance testing II: likelihood ratio test, goodness of fit, Kolmogorov-Smirnov tests, the two-sample problem and the t-test | B8.2.3-8.4 | |
| | 11/2 | DISCUSSION DAY | | HW3 |
| | 11/7 | Periodicity studies | G Appendix B, G Ch 13, + this paper | |
| | 11/9 | Bayesian analysis: Numerical methods---Laplace's approximation, methods of marginalizing over nuisance parameters, numerical integration, Markov Chain Monte Carlo and the Metropolis-Hastings algorithm | G11-12 | |
| | 11/14 | NO CLASS – Fall Break | | |
| | 11/16 | Confidence regions: Bayesian and frequentist interpretations; non-physical regions; Feldman-Cousins confidence intervals | B7.2, this paper | |
| | 11/21 | Multivariate analysis: linear Fisher discriminants; likelihood ratio approximations; decision trees; machine learning | class notes | |
| | 11/23 | DISCUSSION DAY. Also, please read the attached notes and paper on blind analyses. | | HW4 |
| | 11/28 | Non-parametric tests: sign test for the median; the Mann-Whitney test; matched pairs; Spearman's correlation coefficient; run tests | B8.3.2-8.3.3, B9.1-9.3 | |
| | 11/30 | Robust methods of parameter estimation; bootstrap method | Numerical Recipes 15.7; class notes | |
| | 12/5 | Deconvolution and unfolding | class notes; see also supplemental text Cowan, Ch 11. | |
| | 12/7 | Kernel density estimation | | HW5 (due Dec 11) |
Scott Oser (email me) October 25, 2023