Contents (Auto-generated)
Top NLP papers
Llama
Llama 2
Normalization Layer: RMSNorm
Transformer vs Llama
Comparison of Llama 1, Llama 2, and Original Transformer Architectures
Mistral
Transformer vs Mistral vs Llama
Mistral Architecture
Sliding Window Attention (SWA) vs Self-Attention
KV Cache
Mixture of Experts
Attention Mechanisms and Their Variants
Introduction to Attention Mechanisms
MHA: Multi-Head Attention
MQA: Multi-Query Attention
GQA: Grouped Query Attention
Sliding Window Attention
Other:
References:
KV Cache
Position Encoding Methods in Transformers
Why do we need Position Encodings in Transformers?
Absolute Positional Embeddings
Types of Position Encodings
- Sine and Cosine Positional Encodings
- Relative Positional Encodings
- Other Types of Relative Positional Encodings:
Rotary Positional Embeddings
Transformer Toolkit
Transformer based architectures
Transformer: Optimizer and Regularization
Optimizer
Regularization
Example of a GAN in PyTorch
Generative Adversarial Networks
(To Do)
Fine Tuning
Parameter-Efficient Finetuning (PEFT) Methods
- Taxonomy of PEFT: a bird's-eye view
- Additive methods
- Why add parameters?
- Selective methods
- Reparametrization-based methods
Reparametrization-based parameter-efficient finetuning methods leverage low-rank representations to minimize the number of trainable parameters. The notion that neural networks have low-dimensional representations has been widely explored in both empirical and theoretical analyses of deep learning; see the sketch under the LoRA heading below.
LoRA: Low-Rank Adaptation
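Since LoRA is the canonical reparametrization-based method, a minimal PyTorch sketch may help make the low-rank idea concrete. This is an illustrative example, not a reference implementation: the class name `LoRALinear` and the hyperparameter choices (`r=8`, `alpha=16`) are assumptions made for the sake of the demo.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative):
    h = W x + (alpha / r) * B A x, with A of shape (r, d_in) and B of shape (d_out, r)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / r
        # A gets small random values, B starts at zero, so the adapter
        # is a no-op at initialization (standard LoRA initialization).
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Only the two low-rank factors are trainable:
layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 = 8*512 + 512*8, vs. 262,656 for full finetuning
```

Because the update is just the product B A added to W, it can be merged into the base weight after training, so a LoRA adapter adds no extra latency at inference time.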
Home
Coding Notebook
Low-Level Neural Network Implementation
Perceptron Implementation
Optimizer
Activation Functions
Sigmoid
Softmax
Tanh
ReLU (Rectified Linear Unit)
Leaky ReLU
Parametric ReLU (PReLU)
GELU (Gaussian Error Linear Unit)
Swish function (\(Swish_{\beta}\))
SiLU - Sigmoid Linear Unit (\(Swish_{1}\))
GLU (Gated Linear Unit)
SwiGLU Activation Function (Llama-2)
Comparison of Activation Functions
Other:
Monotonicity:
References
Evaluation metrics
Normalization
Basics
- Why do we need Data Normalization in Machine Learning?
- Normalization vs. Standardization:
- Types of Normalization Techniques:
- Linear Scaling / Min-Max Normalization:
- Log Scaling:
- Z-score Normalization (Standardization) / Zero-mean Standardization:
- Decimal Scaling:
- Mean Normalization:
- Unit Vector Normalization:
- RMS Normalization:
- References: