
Sergey Nikolenko

Teaching activities

Machine Learning at the Speech Technology Center

This is a semester-long machine learning course taught in 2012 at the chair organized by the Speech Technology Center.

The course materials are listed below (all slides and lecture notes are in Russian). Most items on the list correspond to two lectures; some correspond to one.

1. Introduction. History of AI. Probability theory basics. Bayes' theorem and maximum a posteriori hypotheses. Example: Laplace's rule.: Slides
2. Least squares and nearest neighbors. Statistical decision theory. Linear regression. Linear regression from the Bayesian standpoint. Curse of dimensionality.: Slides
3. Example: polynomial approximation, overfitting. Regularization: ridge regression. Bias-variance-noise decomposition. How ridge regression follows from Gaussian priors, different forms of regularization.: Slides (.pdf, 2154kb)
4. Classification. Least squares for classification. Fisher linear discriminant. Perceptron and proof of its convergence. Linear and quadratic discriminant analysis.: Slides (.pdf, 2404kb)
5. Support vector machines. Linear separation and max-margin classifiers. Quadratic optimization. Kernel trick and radial basis functions. SVM variants: ν-SVM, one-class SVM, SVM for regression.: Slides (.pdf, 1282kb)
6. Clustering. Hierarchical clustering. Combinatorial methods, graph algorithms for clustering. The EM algorithm, its formal justification. EM for clustering.: Slides (.pdf, 1129kb)
7. Hidden Markov models. The three problems. Dynamic programming: sum-product and max-sum. The Baum-Welch algorithm. Variations on the HMM theme.: Slides (.pdf, 609kb)
8. Priors. Conjugate priors: the beta distribution, the Dirichlet distribution. Conjugate priors for the normal distribution. The exponential family.: Slides
9. Probabilistic graphical models. Directed graphical models (BBNs), undirected graphical models, and factor graphs. Example: marginalization on a linear chain. Belief propagation by message passing. The message passing algorithm in the general case (a tree).: Slides
10. Approximate inference. Loopy belief propagation. Variational approximations. Simple examples: the univariate Gaussian, variational bound for model complexity. Case study: Latent Dirichlet Allocation.: Slides (on LDA only)
11. Expectation Propagation: approximate inference for complex factors. Examples. Case study: Bayesian rating models. The Elo rating. Bradley-Terry models and minorization-maximization learning algorithms. The TrueSkill model and its modifications.: Slides (on rating models only)
12. Reinforcement learning I. Exploration vs. exploitation. Multi-armed bandits.: Slides
13. Reinforcement learning II. Markov decision processes. Value functions, Bellman equations. Policy improvement. Monte Carlo approaches, on-policy and off-policy methods. TD-learning.: Slides
14. Reinforcement learning III. Gittins indices; Gittins theorem (with proof). Regret minimization, UCB1 and a logarithmic bound on its regret. Trend following: dynamic Gamma-Poisson.: Slides
15. Case study: recommender systems.: Slides (.pdf, 1514kb)
16. Model combination. Bayesian averaging. Bootstrapping and bagging. Boosting: AdaBoost. Weak learners: decision trees, learning decision trees. Exponential error minimization. RankBoost.: Slides (.pdf, 777kb)
17. Artificial neural networks. Two-layered networks, error functions. Backpropagation. Case study: learning to rank. RankNet, LambdaRank, MART, LambdaMART.: Slides (.pdf, 1019kb)
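
As an illustration of item 3 above (polynomial approximation, overfitting, and ridge regression), here is a minimal sketch in Python with NumPy. It is not taken from the course slides: the target function sin(2πx), the noise level, the polynomial degree, and the two regularization strengths are all assumptions chosen for the example, and the helper names `polynomial_features` and `fit_ridge` are hypothetical. For simplicity the penalty is applied to every coefficient, including the intercept.

```python
import numpy as np


def polynomial_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y.

    Assumption for this sketch: the intercept is penalized as well.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic data (an assumption): noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# A degree-9 polynomial has as many coefficients as there are data points,
# so with almost no regularization it nearly interpolates the noise.
X = polynomial_features(x_train, degree=9)
w_weak = fit_ridge(X, y_train, lam=1e-8)   # nearly unregularized fit
w_ridge = fit_ridge(X, y_train, lam=1e-3)  # noticeably regularized fit

# Stronger regularization shrinks the coefficient vector.
print("coefficient norm, lam=1e-8:", np.linalg.norm(w_weak))
print("coefficient norm, lam=1e-3:", np.linalg.norm(w_ridge))
```

The same formula is the posterior mode of a linear model with a zero-mean Gaussian prior on the weights and Gaussian noise, which is the connection between ridge regression and Gaussian priors that item 3 refers to; in that reading `lam` plays the role of the ratio of noise variance to prior variance.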

Seminar

Talks by students.

1. Naive Bayes. Multinomial vs. multivariate naive Bayes. Semi-naive Bayes. On the optimality of naive Bayes.: Alexander Sizov. Slides (.pdf, 800kb)

Selected references

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, Information Science and Statistics series, 2006.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2009.
David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.