Attention mechanisms in deep learning. The Transformer architecture.
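As a concrete reference for the first two topics, here is a minimal PyTorch sketch of scaled dot-product attention, the core operation of the Transformer; the function name, tensor shapes, and mask convention are illustrative assumptions, not code from the tutorial materials.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                    # attention distribution
    return weights @ v                                     # weighted sum of values

q = k = v = torch.randn(2, 10, 64)            # self-attention: queries = keys = values
out = scaled_dot_product_attention(q, k, v)   # shape (2, 10, 64)
```

Multi-head attention applies several such maps in parallel to learned linear projections of Q, K, and V and concatenates the results.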
The BERT and GPT families. The variational autoencoder (VAE). Discrete latent spaces in VAEs: VQ-VAE and VQ-GAN.
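To make the discrete-latent-space idea concrete, here is a minimal sketch of the VQ-VAE quantization step: nearest-neighbor codebook lookup with a straight-through gradient. The class name and hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup with a straight-through
    gradient, as in VQ-VAE; sizes here are placeholders."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta                          # commitment-loss weight

    def forward(self, z_e):                       # z_e: (batch, dim) encoder outputs
        dist = torch.cdist(z_e, self.codebook.weight)   # (batch, num_codes)
        idx = dist.argmin(dim=-1)                 # index of nearest code
        z_q = self.codebook(idx)                  # quantized latents
        # Codebook and commitment losses; .detach() implements stop-gradient.
        loss = ((z_q - z_e.detach()) ** 2).mean() \
             + self.beta * ((z_q.detach() - z_e) ** 2).mean()
        # Straight-through estimator: gradients flow from z_q back to z_e.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss
```

VQ-GAN keeps the same quantization step but trains the autoencoder with adversarial and perceptual losses, which sharpens reconstructions.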
Transformer + dVAE = DALL-E. Transformers for images: VisualBERT, ViT, Swin, Perceiver. Multimodal latent spaces: CLIP and BLIP. Our recent work: LAPCA for cross-lingual retrieval and question answering.
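As an illustration of a shared multimodal latent space, below is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired image and text embeddings; function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: each image should match its own caption
    and vice versa; embeddings have shape (batch, dim)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # cosine similarities, (batch, batch)
    targets = torch.arange(len(logits), device=logits.device)  # true pairs on the diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```

In CLIP itself the temperature is a learned parameter; a fixed value is used here for brevity.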
Transformers for medical imaging: classification, segmentation, and image synthesis.
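For the classification setting, the sketch below shows a ViT-style patch embedding adapted to single-channel scans: a strided convolution cuts the image into non-overlapping patches and projects each one to the model dimension. Image size, patch size, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """ViT-style tokenization of a single-channel image (e.g., a
    grayscale scan) into a sequence of patch embeddings."""
    def __init__(self, img_size=224, patch=16, in_ch=1, d_model=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))   # [CLS] token
        self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, d_model))  # learned positions (zero-init for brevity)

    def forward(self, x):                             # x: (batch, 1, H, W)
        x = self.proj(x).flatten(2).transpose(1, 2)   # (batch, num_patches, d_model)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, x], dim=1) + self.pos  # input to a Transformer encoder
```

The resulting token sequence goes into a standard Transformer encoder, with the [CLS] output feeding a classification head; segmentation and synthesis models combine similar tokenizations with decoder heads.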
Transformers for video processing and retrieval.
Our recent work at the intersection of math and Transformers: Sinkhorn transformations for postprocessing in video retrieval, and topological data analysis for AI-generated text detection and model interpretation.
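The Sinkhorn idea can be sketched as the Sinkhorn-Knopp iteration: alternately normalizing the rows and columns of a nonnegative query-by-video score matrix, which (for a square, strictly positive matrix) converges to a doubly stochastic matrix. This is only the core iteration under simplifying assumptions, not our exact post-processing method.

```python
import torch

def sinkhorn(sim, n_iters=10, eps=1e-8):
    """Sinkhorn-Knopp: alternating row/column normalization of a
    nonnegative similarity matrix; the rebalanced matrix can replace
    raw scores at ranking time."""
    p = sim.clamp_min(0) + eps                   # ensure strictly positive entries
    for _ in range(n_iters):
        p = p / p.sum(dim=1, keepdim=True)       # normalize rows
        p = p / p.sum(dim=0, keepdim=True)       # normalize columns
    return p

sim = torch.rand(7, 7)       # toy query-by-video similarity scores
scores = sinkhorn(sim)       # rows and columns now sum to ~1
```

For a rectangular query-gallery matrix the marginals cannot all equal one, but running the same iteration for a fixed number of steps still rebalances the scores before ranking.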