Sergey Nikolenko

Teaching activities

Machine Learning at the Kazan Federal University, 2014

This is a semester-long machine learning course taught at the Kazan Federal University with financial support from the Dynasty Foundation; see also the course page on the CSClub website.

The course itself (all slides and lecture notes are in Russian):

1. Introduction. History of AI. Probability theory basics. Bayes' theorem and maximum a posteriori hypotheses.
Slides
2. Probability distributions. Bernoulli trials. Maximum likelihood, ML estimates for Bernoulli trials and multinomial distribution. Prior distributions, conjugate priors. Beta distribution as a conjugate prior for Bernoulli trials. Predictive distribution: Laplace's rule. Dirichlet distribution as a conjugate prior for multinomial distributions.
3. Gaussian distribution. Maximum likelihood estimates for the Gaussian; why the ML estimate for variance is biased. Multidimensional Gaussian. Conditional and marginal Gaussians.
Slides for lectures 2-3
4. Least squares regression. Least squares as an ML estimate for Gaussian noise.
Slides
5. Overfitting. Regularization. Ridge regression and lasso regression. Predictive distribution for linear regression. Classification: 1-of-K representation, linear decision functions. Fisher's linear discriminant.
Slides
6. Bayes' theorem for classification. LDA and QDA. Logistic regression.
Slides
7. Statistical decision theory. Regression function, optimal Bayesian classifier. Nearest neighbors. Curse of dimensionality. Bias-variance-noise decomposition.
Slides
8. Reinforcement learning: multi-armed bandits. Greedy policies, exploration vs. exploitation. Confidence intervals. Minimizing regret: UCB1.
Slides
9. Reinforcement learning: Markov decision processes. On-policy and off-policy learning. TD-learning. Machine learning in games (backgammon, chess, Go).
Slides
10. Clustering. Hierarchical clustering, graph-based clustering. The EM algorithm. EM in general, minorization-maximization, why EM improves the likelihood. EM for clustering.
Slides
11. Hidden Markov models. Baum-Welch algorithm. Applications of hidden Markov models to speech recognition.
Slides
12. Probabilistic graphical models: basic idea, factorizations, d-separation. Directed and undirected models. Factor graphs.
Slides
13. Inference on factor graphs. Belief propagation with the message passing algorithm.
Slides
14. Case study: Bayesian rating systems. Bradley–Terry models. Expectation Propagation, TrueSkill, and its extensions.
Slides
15. Approximate inference in PGMs. Loopy belief propagation. Variational approximations (idea).
16. Sampling and approximate inference with sampling. Markov chain Monte Carlo methods.
Slides
17. Case study: text mining. Naive Bayes. Latent Dirichlet allocation and its extensions.
Slides
18. Support vector machines. Kernel trick for SVMs.
Slides
19. Case study: recommender systems. Nearest neighbors: user-based and item-based. Locality-sensitive hashing.
20. Case study: recommender systems. SVD extensions. Additional information in recommender systems. Course review.
Slides for lectures 19-20
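
As a small worked illustration of the Beta-Bernoulli material from lecture 2, the Python sketch below compares the maximum likelihood estimate with the predictive probability given by Laplace's rule. It is not taken from the course slides; the function names and the example counts are hypothetical, and a uniform Beta(1, 1) prior is assumed.

```python
from fractions import Fraction


def beta_posterior(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus Bernoulli counts
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def predictive_success(alpha, beta):
    """Probability that the next trial succeeds under a Beta(alpha, beta)
    belief, i.e. the mean of the Beta distribution."""
    return Fraction(alpha, alpha + beta)


def ml_estimate(successes, failures):
    """Maximum likelihood estimate of the success probability."""
    return Fraction(successes, successes + failures)


# Hypothetical data: 3 successes and 0 failures, uniform Beta(1, 1) prior (assumed).
a, b = beta_posterior(1, 1, 3, 0)
laplace = predictive_success(a, b)  # Laplace's rule: (k + 1) / (n + 2)
ml = ml_estimate(3, 0)              # k / n

print("ML estimate:", ml)
print("Laplace's rule:", laplace)
```

With these counts the ML estimate assigns probability 1 to the next success, while the Bayesian predictive distribution stays strictly below 1, which is the usual motivation for a prior on small samples.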

Selected references.

  1. Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, Information Science and Statistics series, 2006.
  2. Kevin Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
  3. David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.