Sergey Nikolenko

Teaching activities

Machine Learning at the Kazan Federal University, 2014

This is a semester-long machine learning course taught at the Kazan Federal University with financial support from the Dynasty Foundation; see also the course page on the CSClub website.

The course itself (all slides and lecture notes are in Russian):

1. Introduction. History of AI. Probability theory basics. Bayes' theorem and maximum a posteriori hypotheses. Slides.
2. Probability distributions. Bernoulli trials. Maximum likelihood, ML estimates for Bernoulli trials and multinomial distribution. Prior distributions, conjugate priors. Beta distribution as a conjugate prior for Bernoulli trials. Predictive distribution: Laplace's rule. Dirichlet distribution as a conjugate prior for multinomial distributions.
3. Gaussian distribution. Maximum likelihood estimates for the Gaussian; why the ML estimate for variance is biased. Multidimensional Gaussian. Conditional and marginal Gaussians. Slides for lectures 2-3.
4. Least squares regression. Least squares as a maximum likelihood estimate under Gaussian noise. Slides.
5. Overfitting. Regularization. Ridge regression and lasso regression. Predictive distribution for linear regression. Classification: 1-of-K representation, linear decision functions. Fisher's linear discriminant. Slides.
6. Bayes' theorem for classification. LDA and QDA. Logistic regression. Slides.
7. Statistical decision theory. Regression function, optimal Bayesian classifier. Nearest neighbors. Curse of dimensionality. Bias-variance-noise decomposition. Slides.
8. Reinforcement learning: multi-armed bandits. Greedy policies, exploration vs. exploitation. Confidence intervals. Minimizing regret: UCB1. Slides.
9. Reinforcement learning: Markov decision processes. On-policy and off-policy learning. TD-learning. Machine learning in games (backgammon, chess, Go). Slides.
10. Clustering. Hierarchical clustering, graph-based clustering. The EM algorithm. EM in general, minorization-maximization, why EM improves the likelihood. EM for clustering. Slides.
11. Hidden Markov models. Baum-Welch algorithm. Applications of hidden Markov models to speech recognition. Slides.
12. Probabilistic graphical models: basic idea, factorizations, d-separation. Directed and undirected models. Factor graphs. Slides.
13. Inference on factor graphs. Belief propagation with the message passing algorithm. Slides.
14. Case study: Bayesian rating systems. Bradley–Terry models. Expectation Propagation, TrueSkill, and its extensions. Slides.
15. Approximate inference in PGMs. Loopy belief propagation. Variational approximations (idea).
16. Sampling and approximate inference with sampling. Markov chain Monte Carlo methods. Slides.
17. Case study: text mining. Naive Bayes. Latent Dirichlet allocation and its extensions. Slides.
18. Support vector machines. Kernel trick for SVMs. Slides.
19. Case study: recommender systems. Nearest neighbors: user-based and item-based. Locality-sensitive hashing.
20. Case study: recommender systems. SVD extensions. Additional information in recommender systems. Course review. Slides for lectures 19-20.
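
To give a flavor of the material in lecture 2, here is a minimal Python sketch of the Beta-Bernoulli conjugate update and the resulting predictive probability, which reduces to Laplace's rule under a uniform Beta(1, 1) prior. It is an illustration only and is not taken from the course slides; the function names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch only; function names are hypothetical, not from the course.

def beta_posterior(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior."""
    return alpha + heads, beta + tails

def predictive_heads(alpha, beta):
    """Probability that the next trial is a success under Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)

def ml_estimate(heads, tails):
    """Maximum likelihood estimate of the success probability."""
    return Fraction(heads, heads + tails)

# Three successes, no failures, uniform prior Beta(1, 1).
a, b = beta_posterior(1, 1, heads=3, tails=0)
print(ml_estimate(3, 0))       # 1
print(predictive_heads(a, b))  # 4/5, i.e. (s + 1) / (n + 2) with s = 3, n = 3
```

The maximum likelihood estimate claims that a failure is impossible after three successes, while the predictive probability from the Beta posterior stays strictly below 1.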

Selected references.

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, Information Science and Statistics series, 2006.
Kevin Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.