Top NLP papers
-
Attention is All You Need
- Authors: Vaswani et al.
- Published: 2017
- Paper: Attention is All You Need
-
LSTM: Long Short-Term Memory
- Authors: Hochreiter and Schmidhuber
- Published: 1997
- Paper: Long Short-Term Memory
Offers:
- A solution to the vanishing gradient problem in RNNs
- A way to capture long-term dependencies in RNNs through gated memory cells (see the sketch below)
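As a rough illustration of the gating idea (a toy sketch, not the paper's exact 1997 formulation; all shapes, weights, and inputs below are assumptions), here is a single LSTM cell step in NumPy:

```python
# Toy sketch of one LSTM cell step; the additive cell-state update is what
# lets gradients flow over long spans (biases omitted for brevity).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                        # input size, hidden size (toy values)
x = rng.normal(size=d_in)               # current input
h_prev = rng.normal(size=d_h)           # previous hidden state
c_prev = np.zeros(d_h)                  # previous cell state (long-term memory)

# One weight matrix per gate, acting on the concatenated [h_prev, x].
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(4))
z = np.concatenate([h_prev, x])

f = sigmoid(W_f @ z)                    # forget gate
i = sigmoid(W_i @ z)                    # input gate
o = sigmoid(W_o @ z)                    # output gate
c = f * c_prev + i * np.tanh(W_c @ z)   # additive cell update
h = o * np.tanh(c)                      # new hidden state
print(h, c)
```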
-
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Authors: Devlin et al.
- Published: 2018
- Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Offers:
- A pre-training approach for NLP: pre-train a deep bidirectional Transformer on unlabeled text, then fine-tune it on specific downstream tasks such as question answering and sentiment analysis
- Achieved state-of-the-art results on 11 NLP tasks
- BERT is a transformer-based model
- BERT is trained on two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP) (a masked-LM example follows this list)
- BERT is trained on a large corpus of text data (BooksCorpus and Wikipedia)
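A minimal sketch of the masked-language-model objective, assuming the Hugging Face transformers library is installed and using the public bert-base-uncased checkpoint (the example sentence is illustrative only):

```python
# BERT predicts the token hidden behind [MASK] from both left and right context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```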
-
Word2Vec
- Authors: Mikolov et al.
- Published: 2013
- Paper: Efficient Estimation of Word Representations in Vector Space
Offers:
- A method to learn word embeddings
- Word2Vec is a shallow neural network model
- Word2Vec has two model variants: Continuous Bag of Words (CBOW) and Skip-gram (a skip-gram training sketch follows this list)
- Word2Vec is trained on a large corpus of text data
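A minimal training sketch using the gensim library (assumed installed, gensim 4.x API; the toy corpus is an illustrative assumption):

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

# sg=1 selects the Skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("king", topn=3))  # nearest neighbours in embedding space
```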
-
GloVe: Global Vectors for Word Representation
- Authors: Pennington et al.
- Published: 2014
- Paper: GloVe: Global Vectors for Word Representation
Offers:
- A method to learn word embeddings, similar in purpose to Word2Vec
- GloVe is a matrix factorization technique applied to the word co-occurrence matrix
- Word2Vec uses local context windows to learn word embeddings, while GloVe uses global co-occurrence statistics
- Word2Vec is a predictive, shallow neural network model, while GloVe is a count-based model that fits word vectors with a weighted least-squares regression on the logarithm of the co-occurrence counts (see the sketch below)
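A toy NumPy sketch of GloVe's weighted least-squares objective (notation loosely follows the paper; the co-occurrence counts, dimensions, and weighting constants below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 8                                   # vocabulary size, embedding dimension
X = rng.integers(0, 20, size=(V, V))          # toy word-word co-occurrence counts

W = rng.normal(scale=0.1, size=(V, d))        # word vectors w_i
W_tilde = rng.normal(scale=0.1, size=(V, d))  # context vectors w~_j
b = np.zeros(V)                               # word biases
b_tilde = np.zeros(V)                         # context biases

def f(x, x_max=100.0, alpha=0.75):
    # Weighting function that caps the influence of very frequent co-occurrences.
    return np.minimum((x / x_max) ** alpha, 1.0)

loss = 0.0
for i in range(V):
    for j in range(V):
        if X[i, j] > 0:
            err = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
            loss += f(X[i, j]) * err ** 2     # weighted squared error on log counts
print("GloVe loss on toy data:", float(loss))
```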
-
RNN Encoder-Decoder: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
- Authors: Cho et al.
- Published: 2014
- Paper: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Offers:
- A model architecture for seq2seq tasks
- Encoder-Decoder is used in machine translation, text summarization, and other seq2seq tasks
- Encoder-Decoder is trained to map an input sequence to an output sequence (a minimal sketch follows this list)
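A minimal PyTorch sketch of the encoder-decoder idea (PyTorch assumed available; GRU cells in the spirit of the paper, with toy vocabulary sizes and no training loop):

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # The encoder compresses the source sequence into a fixed-size state.
        _, state = self.encoder(self.src_emb(src))
        # The decoder starts from that state and is unrolled over the target.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)              # per-step scores over the target vocabulary

model = EncoderDecoder(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))           # batch of 2 source sequences, length 7
tgt = torch.randint(0, 120, (2, 5))           # batch of 2 target prefixes, length 5
print(model(src, tgt).shape)                  # torch.Size([2, 5, 120])
```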
-
Attention Mechanism
- Authors: Bahdanau et al.
- Published: 2014
- Paper: Neural Machine Translation by Jointly Learning to Align and Translate
Offers:
- A mechanism to improve the performance of seq2seq models
- Attention mechanism is used in machine translation, text summarization, and other seq2seq tasks
- Attention mechanism allows the model to focus on different parts of the input sequence when generating each output token (see the sketch below)
- Attention mechanism is used in conjunction with RNNs and transformers
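A toy NumPy sketch of additive (Bahdanau-style) attention, where the decoder state scores each encoder state and a softmax over those scores yields the context vector (all shapes and weights below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T, h = 6, 8                              # source length, hidden size (toy values)
enc_states = rng.normal(size=(T, h))     # encoder hidden states h_1..h_T
dec_state = rng.normal(size=h)           # current decoder state s

W = rng.normal(scale=0.1, size=(h, h))
U = rng.normal(scale=0.1, size=(h, h))
v = rng.normal(scale=0.1, size=h)

# Alignment scores e_j = v^T tanh(W s + U h_j), one per source position.
scores = np.tanh(dec_state @ W + enc_states @ U) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax -> attention distribution

context = weights @ enc_states           # weighted sum of encoder states
print(np.round(weights, 3), context.shape)
```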
-
Seq2Seq: Sequence to Sequence Learning with Neural Networks
- Authors: Sutskever et al.
- Published: 2014
- Paper: Sequence to Sequence Learning with Neural Networks
Offers:
- A model architecture for seq2seq tasks
- Seq2Seq is used in machine translation, text summarization, and other seq2seq tasks
- Instead of the RNN Encoder-Decoder above, Seq2Seq uses a deep LSTM encoder and decoder
- Other differences between Seq2Seq and RNN Encoder-Decoder include the use of word embeddings and beam search
- Instead of mapping the sentence a, b, c to the sentence α, β, γ, the LSTM is asked to map c, b, a to α, β, γ, where α, β, γ is the translation of a, b, c
- Reversing the source makes it easier for SGD to “establish communication” between the input and the output, because the first source words end up close to the first target words (see the snippet below)
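A tiny illustration of the source-reversal trick (toy tokens; the LSTM itself is not shown):

```python
src = ["a", "b", "c"]                  # source sentence
tgt = ["alpha", "beta", "gamma"]       # its translation

reversed_src = list(reversed(src))     # fed to the encoder as c, b, a
print(reversed_src, "->", tgt)
# Reversing puts the first source word right next to the decoder's first step,
# shortening the path between corresponding input and output words.
```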
-
BLEU: A Method for Automatic Evaluation of Machine Translation
- Authors: Papineni et al.
- Published: 2002
- Paper: BLEU: A Method for Automatic Evaluation of Machine Translation
Offers:
- A metric to evaluate the performance of machine translation systems
- BLEU is used to compare the output of a machine translation system with a reference translation
- BLEU is based on the modified precision of n-grams in the output translation, combined with a brevity penalty for overly short outputs (a minimal precision sketch follows this list)
- BLEU is widely used in machine translation research and development
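A minimal sketch of BLEU's modified n-gram precision (unigram and bigram only, no brevity penalty; the candidate and reference sentences are toy examples):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    # Clip each candidate n-gram count by its count in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(modified_precision(candidate, reference, 1))  # unigram precision: 5/6
print(modified_precision(candidate, reference, 2))  # bigram precision: 3/5
```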
-
GPT-3: Language Models are Few-Shot Learners
- Authors: Brown et al.
- Published: 2020
- Paper: Language Models are Few-Shot Learners