BSM

## Statistical Methods — Mathematical Theory with Data Science Applications — SMD

• Instructor: Marianna Bolla (contact: marib@math.bme.hu)
• Prerequisites: undergraduate calculus and basic probability
• Text: lecture notes, handouts
• Reference books:
G. K. Bhattacharyya, R. A. Johnson: Statistical Concepts and Methods. Wiley, 1992.
C. R. Rao: Statistics and Truth. World Scientific, 1997.
Handouts: tables of notable distributions and percentile values of basic test distributions.

Course description: Statistics teaches us how to behave in the face of uncertainties, according to the famous mathematician Abraham Wald and C. R. Rao's book "Statistics and Truth". On the theoretical side, we learn strategies for dealing with chance in everyday life, where inference is based on a randomly selected sample from a large population; hence we make intensive use of concepts from probability (laws of large numbers, Bayes rule). Parameter estimation and hypothesis testing (parametric and nonparametric inference) are introduced on a theoretical basis, but applications are discussed extensively and illustrated on real-life data. Methods of supervised and unsupervised learning are outlined for multivariate data sets: the former include regression and discriminant analysis, while the latter include factor and cluster analysis. Students learn to solve real-world problems by choosing the most suitable method or statistical test. Outputs of the BMDP (biomedical program package) are also analyzed in class.

Topics:

• Short overview of probability theory (sample spaces, random variables, notable distributions, Bayes rule, laws of large numbers, Central Limit Theorem).
• Basic concepts of estimation theory, methods of point estimation, ML (maximum likelihood) and method of moments, confidence intervals.
• Inferences about a population, sampling statistics, sufficiency.
• Basic concepts of hypothesis testing, concept of a uniformly most powerful test.
• Parametric inference, comparing two treatments (z, t, F tests).
• Nonparametric inference: Wilcoxon test and sign test.
• Analyzing categorized data (two-way classified tables), chi-square test.
• Introduction to linear models: regression analysis (multivariate linear regression, multiple and partial correlation) and ANOVA (analysis of variance).
• Methods for reducing the dimension: principal component and factor analysis.
• Methods for classification: discriminant and cluster analysis.
• Analyzing the outputs of programs for medical and econometric data.
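
To give a flavor of two of the topics above (comparing two treatments with a t test, and the sign test), here is a minimal Python sketch. It is not part of the course material: the function names and all data values are made up for illustration, and the t statistic shown is the pooled version, which assumes two independent samples with equal population variances.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic, assuming independent samples and equal variances."""
    n, m = len(x), len(y)
    # Pooled estimate of the common variance
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2  # statistic and its degrees of freedom


def sign_test_p_value(before, after):
    """Exact two-sided sign test for paired data; tied pairs are dropped."""
    diffs = [a - b for a, b in zip(before, after) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Under H0 the number of positive signs is Binomial(n, 1/2)
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical independent samples from two treatments
treatment_a = [5.1, 4.9, 6.2, 5.8, 6.0]
treatment_b = [4.8, 4.7, 5.9, 5.2]
t_stat, df = pooled_t_statistic(treatment_a, treatment_b)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")

# Hypothetical paired measurements on the same six subjects
first = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
second = [4.8, 4.7, 5.9, 5.9, 5.2, 5.0]
p_sign = sign_test_p_value(first, second)
print(f"sign test p-value = {p_sign:.4f}")
```

The t statistic would then be compared with a percentile of the t distribution with the stated degrees of freedom, for example from a printed table, while the sign test p-value is computed exactly from the binomial distribution and needs no table.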