in Posts on Reviews, Nlp, Deep_learning, Retrieval, Dialogue, Response_selection, Adversarial_dataset
TL;DR:
The authors argue that training the model to recognize the relevance between context and response is important for the response selection task.
The overall architecture seems similar to MDFN (Masking Decoupling Fusing Network) in that it fuses information at different levels, such as word, utterance, and context, through attention mechanisms.
Experiments with MSN (Multi-hop Selector Network) showed that weighted utterance representations generally improve the performance of PLM-based response matching models.
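For intuition about the weighted-utterance idea, the sketch below scores each utterance against the candidate response and aggregates the context as a relevance-weighted sum. This is a minimal PyTorch sketch under my own assumptions (pre-encoded utterance/response vectors and a simple projection-based scorer named `WeightedUtteranceAggregator`), not the paper's exact architecture.

```python
# Minimal sketch: weight per-utterance representations by their relevance to a
# candidate response, assuming utterances and the response are already encoded
# into fixed-size vectors by a PLM. Names are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedUtteranceAggregator(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # Projection used to score each utterance against the response.
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, utterance_vecs: torch.Tensor, response_vec: torch.Tensor) -> torch.Tensor:
        # utterance_vecs: (batch, num_utterances, hidden); response_vec: (batch, hidden)
        scores = torch.einsum("buh,bh->bu", self.proj(utterance_vecs), response_vec)
        weights = F.softmax(scores, dim=-1)  # relevance weight per utterance
        # Context vector as a relevance-weighted sum of utterance representations.
        return torch.einsum("bu,buh->bh", weights, utterance_vecs)
```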
in Posts on Reviews, Nlp, Deep_learning, Retrieval, Dialogue, Response_selection, Adversarial_dataset
TL;DR:
Previous neural response selection models lack a comprehensive understanding of the context, which results in biased response selection.
The proposed adversarial dataset, reviewed and filtered by experts, is meant to confirm that a model has learned comprehensive contextual information rather than relying on comparisons of similar tokens.
The proposed debiasing strategy, which utilizes a biased model, seems effective in mitigating the model's biased pattern learning.
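As a rough illustration, one common way to use a biased model for debiasing is to down-weight training examples that the biased model already classifies confidently. The sketch below assumes this example-reweighting variant; whether it matches the paper's exact formulation is an assumption, and the function name `debiased_loss` is illustrative.

```python
# Minimal sketch of example reweighting with a frozen biased model: examples the
# biased model already gets right with high confidence contribute less to the loss.
import torch
import torch.nn.functional as F

def debiased_loss(main_logits: torch.Tensor,
                  biased_probs: torch.Tensor,
                  labels: torch.Tensor) -> torch.Tensor:
    # main_logits:  (batch, num_classes) from the main response selection model
    # biased_probs: (batch, num_classes) from the frozen biased model (e.g., token-overlap based)
    # labels:       (batch,) gold response indices
    per_example_ce = F.cross_entropy(main_logits, labels, reduction="none")
    # If the biased model is confident on the gold label, the example likely
    # contains a superficial cue, so its weight in the loss is reduced.
    bias_confidence = biased_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    weights = 1.0 - bias_confidence
    return (weights * per_example_ce).mean()
```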
in Posts on Reviews, Nlp, Deep_learning, Retrieval, Dialogue, Response_selection, Auxiliary_task
TL;DR:
UMS-BERT is one of the approaches that propose complementary training tasks to address the limitation in learning sequential information.
There were performance gains for both PLM-based and non-PLM-based models, which suggests that the auxiliary tasks substantially enhance the models' capabilities for dialogue response selection (a joint-training sketch follows below).
The background of the paper, including the proposed problem, approach, and conclusion, seems quite similar to that of BERT-SL.
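The sketch below shows the general pattern of jointly optimizing a response selection objective with an auxiliary utterance-level task. The class name `MultiTaskResponseSelector`, the single auxiliary head, and the loss weighting are my own illustrative assumptions (the encoder is assumed to be a HuggingFace-style BERT returning `pooler_output`), not the paper's exact configuration.

```python
# Minimal sketch: joint training of response selection with one auxiliary task.
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskResponseSelector(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, num_aux_labels: int):
        super().__init__()
        self.encoder = encoder                                         # a PLM such as BERT
        self.selection_head = nn.Linear(hidden_size, 2)                # response matches context or not
        self.auxiliary_head = nn.Linear(hidden_size, num_aux_labels)   # e.g., an utterance-level label

    def forward(self, main_inputs: dict, aux_inputs: dict):
        main_repr = self.encoder(**main_inputs).pooler_output
        aux_repr = self.encoder(**aux_inputs).pooler_output
        return self.selection_head(main_repr), self.auxiliary_head(aux_repr)

def joint_loss(selection_logits, selection_labels, aux_logits, aux_labels, aux_weight=1.0):
    # The auxiliary loss is simply added to the main selection loss.
    main = F.cross_entropy(selection_logits, selection_labels)
    aux = F.cross_entropy(aux_logits, aux_labels)
    return main + aux_weight * aux
```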
in Posts on Reviews, Nlp, Deep_learning, Retrieval, Dialogue, Response_selection, Auxiliary_task
TL;DR:
BERT-SL is one of the approaches that propose complementary training tasks to address the limitation in learning sequential information.
There were performance gains for both PLM-based and non-PLM-based models, which suggests that the auxiliary tasks substantially enhance the models' capabilities for dialogue response selection.
in Posts on Reviews, Nlp, Deep_learning, Language_model
TL;DR:
Switch Transformer is a sparsely activated transformer that can reduce training time by introducing the MoE (Mixture of Experts) approach and parallelizing parts of the model.
The advantage of Switch Transformer is that the expert layers can be parallelized, so computation can be accelerated, and efficiency increases with the number of available cores. In addition, Switch Transformer shows quality improvements even with low compute resources.
However, optimizing the router and the MoE layers can also be a cause of training instability.
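To make the routing idea concrete, the sketch below implements top-1 ("switch") routing over experts plus the kind of load-balancing auxiliary loss the paper uses to keep experts evenly utilized. The class name `SwitchFeedForward`, shapes, and hyperparameters are illustrative assumptions, and expert/device parallelism is omitted.

```python
# Minimal sketch of switch (top-1) routing with a load-balancing auxiliary loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)   # (num_tokens, num_experts)
        gate, expert_idx = probs.max(dim=-1)        # top-1 routing decision per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale each expert output by its gate value so the router receives gradients.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        # Load-balancing loss: fraction of tokens per expert times mean router probability.
        num_experts = probs.size(-1)
        token_fraction = F.one_hot(expert_idx, num_experts).float().mean(dim=0)
        prob_fraction = probs.mean(dim=0)
        aux_loss = num_experts * torch.sum(token_fraction * prob_fraction)
        return out, aux_loss
```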
in Posts on Reviews, Nlp, Deep_learning, Language_model, Generative_model
Introduction
Reading backward from GPT-3 to GPT-1, one interesting observation is how much the claimed potential merit of unsupervised learning changes from paper to paper.
In GPT-1, the authors suggest that unsupervised pre-training may boost the performance of supervised downstream tasks, whereas in GPT-3 the authors emphasize that supervised fine-tuning is not indispensable if the model is large enough.
in Posts on Reviews, Nlp, Deep_learning, Language_model, Multilingual
TL;DR:
M-BERT (Multilingual BERT) is a BERT model trained on corpora from various languages.
M-BERT does not seem to learn systematic transformations between languages (i.e., complicated syntactic/semantic relationships across languages).
The significant factors in M-BERT's performance are:
Vocabulary memorization: the fraction of word overlap between languages (an illustrative overlap computation follows below), and
Mapping new vocabularies onto the learned structure.
Merely pre-training general representations of languages on unannotated corpora guarantees a baseline performance on downstream tasks in some circumstances.
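As a small illustration of the overlap factor, the sketch below computes the fraction of shared word pieces between two languages' corpora using the multilingual BERT tokenizer. The exact overlap definition used in the paper may differ; the function name `wordpiece_overlap` and the tokenizer choice are assumptions for illustration only.

```python
# Minimal sketch: word-piece overlap between two languages' tokenized corpora.
from transformers import BertTokenizer

def wordpiece_overlap(corpus_a, corpus_b, tokenizer_name="bert-base-multilingual-cased"):
    tokenizer = BertTokenizer.from_pretrained(tokenizer_name)
    pieces_a = {tok for sent in corpus_a for tok in tokenizer.tokenize(sent)}
    pieces_b = {tok for sent in corpus_b for tok in tokenizer.tokenize(sent)}
    # Overlap as the share of word pieces that appear in both languages.
    return len(pieces_a & pieces_b) / max(1, len(pieces_a | pieces_b))

# Example usage: compare a small English sample with a small Korean sample.
# overlap = wordpiece_overlap(["I like apples."], ["나는 사과를 좋아한다."])
```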