Statistical Methods — Mathematical Theory with Data Science Applications — SMD

  • Instructor:
    Marianna Bolla
    Contact: marib@math.bme.hu
  • Prerequisites: undergraduate calculus and linear algebra, basic probability
  • Text: lecture notes, handouts
  • Reference books:
      G. K. Bhattacharyya, R. A. Johnson: Statistical Concepts and Methods. Wiley, 1992.
      C. R. Rao: Statistics and Truth. World Scientific, 1997.
      T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

Course description: According to the famous mathematician Abraham Wald, statistics teaches us how to behave in the face of uncertainties. On the theoretical side, we learn strategies for handling chance in everyday life, where inference is based on a sample drawn at random from a large population; hence we make intensive use of probability concepts (laws of large numbers, Bayes' rule). Estimation theory and hypothesis testing are introduced on a theoretical basis, but applications are also discussed. Methods of supervised and unsupervised learning are outlined: the former include regression and discriminant analysis, the latter factor and cluster analysis. Students also learn to select appropriate methods and draw inferences from real-life data, and outputs of a program package for medical data are analyzed.

Topics:

  • Short overview of probability concepts (sample spaces, random variables, notable distributions, Bayes' rule, laws of large numbers, Central Limit Theorem).
  • Basic concepts of estimation theory; methods of point estimation: maximum likelihood (ML) and the method of moments; confidence intervals.
  • Inferences about a population, sampling statistics, sufficiency.
  • Basic concepts of hypothesis testing, uniformly most powerful (UMP) tests.
  • Parametric inference, comparing two treatments (z, t, F tests).
  • Nonparametric inference: Wilcoxon and sign tests, rank statistics.
  • Analyzing categorized data (contingency tables), chi-square test.
  • Supervised learning: regression analysis (linear regression, correlation, model fitting), analysis of variance, and discriminant analysis.
  • Unsupervised learning: principal component, factor and cluster analysis.
  • Analyzing outputs of a program package for medical data.
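
Illustration: the estimation topics listed above (maximum likelihood, confidence intervals) can be previewed with a minimal Python sketch. It is not taken from the course material. The function names normal_mle and z_confidence_interval are hypothetical, the data are simulated rather than real medical data, and the interval assumes a normal population with known standard deviation, which is the simplest textbook case.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_mle(sample):
    """Return the ML estimates (mean, variance) for an i.i.d. normal sample."""
    n = len(sample)
    mu_hat = fmean(sample)
    # The ML variance estimate divides by n (not n - 1), so it is biased.
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, var_hat


def z_confidence_interval(sample, sigma, level=0.95):
    """Two-sided confidence interval for the mean, assuming sigma is known."""
    n = len(sample)
    # Standard normal quantile, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)
    center = fmean(sample)
    return center - half_width, center + half_width


# Simulated sample (assumption: true mean 10, true standard deviation 2).
random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(200)]

mu_hat, var_hat = normal_mle(data)
low, high = z_confidence_interval(data, sigma=2.0)
print(f"ML estimates: mean={mu_hat:.3f}, variance={var_hat:.3f}")
print(f"95% confidence interval for the mean: ({low:.3f}, {high:.3f})")
```

When the standard deviation is unknown, the usual replacement is a t-based interval built from the sample standard deviation; that case needs t-distribution quantiles, which the Python standard library does not provide.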