Statistical Methods — Mathematical Theory with Data Science Applications — SMD

  • Instructor:
    Marianna Bolla
  • Prerequisites: undergraduate calculus and basic probability
  • Text: lecture notes, handouts
  • Reference books:
      G. K. Bhattacharyya, R. A. Johnson: Statistical Concepts and Methods. Wiley, 1992.
      C. R. Rao: Statistics and Truth. World Scientific, 1997.
      Handouts: tables of notable distributions and percentile values of basic test distributions.

Course description: Statistics teaches us how to behave in the face of uncertainty, according to the famous mathematician Abraham Wald and C. R. Rao's book "Statistics and Truth". On the theoretical side, we will learn strategies for dealing with chance in everyday life, where inference is based on a randomly selected sample from a large population; hence we make intensive use of concepts from probability (laws of large numbers, Bayes' rule). Parameter estimation and hypothesis testing (parametric and nonparametric inference) are introduced on a theoretical basis, but applications are discussed in depth and illustrated on real-life data. Methods of supervised and unsupervised learning are outlined for multivariate data sets; the former include regression and discriminant analysis, while the latter include factor and cluster analysis. Students learn to solve real-world problems by choosing the most suitable method or statistical test. Outputs of the BMDP (biomedical program package) are also analyzed in class.


  • Short overview of probability theory (sample spaces, random variables, notable distributions, Bayes' rule, laws of large numbers, Central Limit Theorem).
  • Basic concepts of estimation theory; methods of point estimation: maximum likelihood (ML) and the method of moments; confidence intervals.
  • Inferences about a population, sampling statistics, sufficiency.
  • Basic concepts of hypothesis testing, concept of a uniformly most powerful test.
  • Parametric inference, comparing two treatments (z, t, F tests).
  • Nonparametric inference: Wilcoxon test and sign test.
  • Analyzing categorized data (two-way classified tables), chi-square test.
  • Introduction to linear models: regression analysis (multivariate linear regression, multiple and partial correlation) and ANOVA (analysis of variance).
  • Methods for reducing the dimension: principal component and factor analysis.
  • Methods for classification: discriminant and cluster analysis.
  • Analyzing outputs of programs for medical and econometric data.