Multi-hop Selector Network for Multi-turn Response Selection in Retrieval-based Chatbots

TL;DR:

  • The authors argue that training the model to recognize the relevance between the context and the response is important for the response selection task.
  • The overall architecture seems similar to MDFN (Mask-based Decoupling-Fusing Network) in that it fuses information at different levels, such as word, utterance, and context, through an attention mechanism.
  • The experiments on MSN (Multi-hop Selector Network) showed that weighted utterance representations generally improved the performance of PLM-based response matching models (a small sketch of the selector idea follows this list).
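
The bullet above refers to MSN's core trick of re-weighting context utterances before matching. Below is a minimal sketch of that idea, not the authors' implementation; the tensor shapes, the mean-pooled selector key, and the simple averaging over hop sizes are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of MSN-style utterance selection:
# the last `hop` utterances form a key, the key scores every context utterance,
# and the scores re-weight the utterance representations before response matching.
import torch
import torch.nn.functional as F

def weighted_utterances(utt_reprs: torch.Tensor, hop: int) -> torch.Tensor:
    """utt_reprs: (num_utterances, dim) sentence-level context representations."""
    key = utt_reprs[-hop:].mean(dim=0)           # fuse the last `hop` utterances into a selector key
    scores = utt_reprs @ key                     # relevance of each utterance to the key
    weights = F.softmax(scores, dim=0)           # normalized selection weights
    return weights.unsqueeze(-1) * utt_reprs     # weighted (filtered) utterance representations

# Toy usage: combine several hop sizes ("multi-hop") before the matching network.
context = torch.randn(10, 256)                   # 10 utterances, 256-dim each
filtered = sum(weighted_utterances(context, k) for k in (1, 2, 3)) / 3
```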

Read more

An Evaluation Dataset and Strategy for Building Robust Multi-turn Response Selection Model

TL;DR:

  • Previous neural response selection models lack a comprehensive understanding of the context, which results in biased response selection.
  • The proposed adversarial dataset, reviewed and filtered by experts, is intended to confirm that the model has learned comprehensive contextual information rather than merely comparing similar tokens.
  • The proposed debiasing strategy, which utilizes a biased model, seems effective in mitigating the model's biased pattern learning.

Read more

Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection

TL;DR:

  • UMS-BERT is one of the approaches that propose complementary training tasks to address the limitation in learning sequential information.
  • There were performance gains for both PLM-based and non-PLM-based models, which suggests that the auxiliary tasks substantially enhance the models' capability for dialogue response selection.
  • The background of the paper, including the proposed problem, approach, and conclusion, seems quite similar to BERT-SL.

Read more

Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues

TL;DR:

  • BERT-SL is one of the approaches that propose complementary training tasks to address the limitation in learning sequential information.
  • There were performance gains for both PLM-based and non-PLM-based models, which suggests that the auxiliary tasks substantially enhance the models' capability for dialogue response selection.

Read more

Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

TL;DR:

  • Mask-based Decoupling-Fusing Network (MDFN) is one of the architecture-side approaches to address the limitation in learning sequential information.
  • A decoupling-and-fusing mechanism is proposed so that the model can encode utterance-level and speaker-level information (see the sketch below).
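
As a rough illustration of what "decoupling" means here, the sketch below applies self-attention to one token sequence under different boolean masks (same utterance vs. other utterances, same speaker vs. other speaker) and then fuses the resulting channels. The mask construction, single-head attention, and concatenation-based fusion are simplifying assumptions, not MDFN's exact design.

```python
# Minimal sketch of mask-based decoupling: one token sequence, several attention masks.
import torch
import torch.nn.functional as F

def masked_self_attention(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """x: (seq_len, dim) token states; mask: (seq_len, seq_len) boolean, True = may attend."""
    mask = mask | torch.eye(x.size(0), dtype=torch.bool)     # always allow attending to self
    scores = (x @ x.T) / x.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x

def decouple_and_fuse(x: torch.Tensor, utt_ids: torch.Tensor, spk_ids: torch.Tensor) -> torch.Tensor:
    """utt_ids / spk_ids: (seq_len,) utterance index and speaker index for each token."""
    same_utt = utt_ids.unsqueeze(0) == utt_ids.unsqueeze(1)  # intra-utterance channel
    same_spk = spk_ids.unsqueeze(0) == spk_ids.unsqueeze(1)  # intra-speaker channel
    channels = [masked_self_attention(x, same_utt),
                masked_self_attention(x, ~same_utt),         # inter-utterance channel
                masked_self_attention(x, same_spk),
                masked_self_attention(x, ~same_spk)]         # inter-speaker channel
    return torch.cat(channels, dim=-1)                       # fuse the channels (here: simple concatenation)
```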

Read more

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

TL;DR:

  • Switch Transformer is a sparsely activated transformer that reduces training time by introducing an MoE (Mixture of Experts) routing algorithm and parallelizing parts of the model (see the routing sketch after this list).
  • The advantage of Switch Transformer is that expert layers can be parallelized across devices and computation can be accelerated; efficiency increases with the number of available cores. In addition, Switch Transformer shows quality improvements even with limited compute resources.
  • However, optimizing the router and the MoE layers can also be a cause of training instability.
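
The routing mentioned above can be summarized in a few lines. The sketch below is a simplified single-device version under my own assumptions (dense expert MLPs, a plain loop over experts, no capacity factor or load-balancing loss), not the paper's TPU implementation; it only shows top-1 ("switch") routing, where each token is processed by exactly one expert and the expert output is scaled by the router probability so the router stays differentiable.

```python
# Minimal sketch of top-1 (switch) routing over a set of expert feed-forward networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchLayer(nn.Module):
    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (num_tokens, dim)."""
        probs = F.softmax(self.router(tokens), dim=-1)    # routing probabilities per expert
        gate, expert_idx = probs.max(dim=-1)              # top-1 expert for each token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):         # in practice experts live on different devices
            sel = expert_idx == i
            if sel.any():
                out[sel] = gate[sel].unsqueeze(-1) * expert(tokens[sel])
        return out
```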

Read more

Improving Language Understanding by Generative Pre-Training

Introduction

  • Having read backward from GPT-3 to GPT-1, one interesting thing is how clearly the degree of potential merit each author attributes to unsupervised learning changes over time.
  • In GPT-1, the authors suggest that unsupervised pre-training may boost the performance of supervised downstream tasks, whereas in GPT-3 the authors emphasize that supervised fine-tuning is not indispensable if the model is large enough.

Read more

How multilingual is Multilingual BERT?

TL;DR:

  • M-BERT (Multilingual BERT) is BERT trained on corpora from various languages.
  • M-BERT does not seem to learn systematic transformations between languages (i.e., complicated syntactic/semantic relationships across languages).
  • The significant factors in M-BERT's performance:
    • Vocabulary memorization: the fraction of word overlap between languages (a toy measurement sketch follows this list), and
    • Mapping new vocabularies onto the learned structure.
  • Merely pre-training general representations of languages on unannotated corpora guarantees baseline performance on downstream tasks in some circumstances.
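
To make the "word overlap" factor concrete, here is a toy sketch of one way to measure it; the corpora, whitespace tokenization, and Jaccard-style ratio are my assumptions, not the exact statistic used in the paper.

```python
# Toy measure of surface-form word overlap between two languages' corpora.
def word_overlap(corpus_a: list[str], corpus_b: list[str]) -> float:
    """Fraction of shared words between the vocabularies of two corpora."""
    vocab_a = {w for sent in corpus_a for w in sent.lower().split()}
    vocab_b = {w for sent in corpus_b for w in sent.lower().split()}
    if not vocab_a or not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

print(word_overlap(["the cat sat"], ["el gato the"]))  # toy example -> 0.2
```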

Read more



© 2022.03. by bigshane