Effective Chatbot Models for Question Answering
Table of Contents:
- Introduction
- Background Information
  - 2.1 SQuAD 2.0 Dataset
  - 2.2 Unit.ai Dataset
- Models Employed
  - 3.1 Vanilla Sequence-to-Sequence Model
  - 3.2 LSTM Model
  - 3.3 LSTM Model with Attention
  - 3.4 Transformer DistilBERT Model
- Evaluation of Baseline Model
- Improvement with Attention Mechanism
- Introduction to Transformer Encoder
- DistilBERT Model: A Distilled Version of BERT
- Pre-processing of SQuAD Data
- Implementation of BERT Model
- Introduction to DistilBERT
- Limitations and Future Enhancements
- Conclusion
Introduction
In this article, we explore the Unif.AI chatbot project, which uses models of varying complexity to build a question answering system. The primary dataset for the project is SQuAD 2.0, a reading comprehension dataset consisting of questions posed on a set of Wikipedia articles. In addition, the Unit.ai dataset comprises FAQ-style questions and answers about the logistics of the course. Our objective is to compare the performance of the different models and assess their effectiveness at question answering.
Background Information
2.1 SQuAD 2.0 Dataset
The SQuAD 2.0 dataset is a widely used reading comprehension dataset containing questions posed by crowd workers on a set of Wikipedia articles. The dataset was labeled using Amazon Mechanical Turk and includes over 53,000 new unanswerable questions about the same paragraphs. SQuAD 2.0 provides a rich source of data for training and evaluating question answering models.
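For readers who want to inspect the data themselves, the sketch below loads SQuAD 2.0 with the Hugging Face `datasets` library (an assumption about tooling; the project does not say how the data was loaded) and prints the fields referenced later in the pre-processing step.

```python
# Load SQuAD 2.0 and look at one training example.
from datasets import load_dataset

squad = load_dataset("squad_v2")          # train / validation splits
example = squad["train"][0]

print(example["context"][:200])           # paragraph from a Wikipedia article
print(example["question"])                # crowd-sourced question
print(example["answers"])                 # {"text": [...], "answer_start": [...]}
```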
2.2 Unit.ai Dataset
The Unit.ai dataset consists of an FAQ-style collection of question-answer pairs, along with a separate test set containing only questions. The dataset focuses on general questions asked by students about the logistics of the course. It is important to note, however, that the Unit.ai dataset is significantly smaller than SQuAD 2.0.
Models Employed
3.1 Vanilla Sequence-to-Sequence Model
The baseline model for this project is a vanilla sequence-to-sequence model consisting of two recurrent neural networks (RNNs): one in the encoder and one in the decoder. The simplicity of this model does not yield favorable results when predicting answers; it often fails to predict any word other than the end token.
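A minimal PyTorch sketch of such an encoder-decoder is shown below; the layer sizes, vocabulary size, and use of plain RNN cells are illustrative assumptions rather than the project's exact configuration.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # The whole question+context is squeezed into the final hidden state.
        _, hidden = self.rnn(self.embed(src))
        return hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt, hidden):
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden

# One forward pass with toy tensors.
enc, dec = Encoder(vocab_size=10000), Decoder(vocab_size=10000)
src = torch.randint(0, 10000, (4, 50))    # 4 encoded question+context sequences
tgt = torch.randint(0, 10000, (4, 20))    # 4 answer token sequences
logits, _ = dec(tgt, enc(src))            # (4, 20, 10000) next-token scores
```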
3.2 LSTM Model
To improve upon the baseline, an LSTM (Long Short-Term Memory) model is employed. The LSTM alleviates the exploding and vanishing gradient problems faced by plain RNNs. However, this model still struggles to accurately represent the lengthy contexts in the SQuAD data, resulting in subpar performance.
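The change from the baseline amounts to swapping the recurrent cell; the sketch below (with assumed hyper-parameters) shows an LSTM encoder that also returns its per-token outputs, which the attention variant in the next subsection can reuse.

```python
import torch.nn as nn

class LSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The LSTM carries a cell state alongside the hidden state, which is
        # what mitigates the vanishing/exploding gradient problems of plain RNNs.
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        outputs, (hidden, cell) = self.rnn(self.embed(src))
        # `outputs` keeps one vector per source token for later attention use.
        return outputs, (hidden, cell)
```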
3.3 LSTM Model with Attention
To address the limitations of the LSTM model, an attention mechanism is introduced. Incorporating attention into the LSTM-based sequence-to-sequence model yields some improvement in the predictions, but further training is required to optimize this model's performance.
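One common way to add attention, sketched below under that assumption, is Luong-style dot-product attention: at each decoder step the decoder state is scored against all encoder outputs, and the resulting context vector is concatenated with the decoder output before predicting the next token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoderStep(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim * 2, vocab_size)

    def forward(self, prev_token, state, enc_outputs):
        # prev_token: (batch, 1), enc_outputs: (batch, src_len, hid_dim)
        dec_out, state = self.rnn(self.embed(prev_token), state)   # (batch, 1, hid)
        scores = torch.bmm(dec_out, enc_outputs.transpose(1, 2))   # (batch, 1, src_len)
        weights = F.softmax(scores, dim=-1)                        # attention weights
        context = torch.bmm(weights, enc_outputs)                  # (batch, 1, hid)
        logits = self.out(torch.cat([dec_out, context], dim=-1))   # (batch, 1, vocab)
        return logits, state, weights
```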
3.4 Transformer DistilBERT Model
The final model used in this project is the Transformer-based DistilBERT model, a distilled version of the BERT architecture. DistilBERT has 40% fewer parameters than the base BERT model while running 60% faster. It uses attention mechanisms to learn contextual relations between the words in a text, resulting in improved performance on the question answering task.
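As a quick illustration of DistilBERT-based extractive question answering, the snippet below runs the Hugging Face `question-answering` pipeline with a publicly available DistilBERT checkpoint fine-tuned on SQuAD; the checkpoint choice is an assumption, not necessarily the one used in the project.

```python
from transformers import pipeline

# Extractive QA: the model selects an answer span from the given context.
qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

result = qa(
    question="What does DistilBERT reduce compared to BERT?",
    context="DistilBERT has 40% fewer parameters than BERT base and runs 60% faster, "
            "while retaining most of its language understanding performance.",
)
print(result["answer"], result["score"])
```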
Evaluation of Baseline Model
The baseline vanilla sequence-to-sequence model performed poorly at predicting accurate answers. Because of its limitations, such as the inability to represent the complete context in a single encoded vector, it struggled to yield satisfactory results.
Improvement with Attention Mechanism
By incorporating the attention mechanism into the LSTM-based sequence-to-sequence model, the predictions began to show some semantic sense. However, further training is necessary to optimize the performance of this model.
Introduction to Transformer Encoder
The Transformer encoder, an integral part of the DistilBERT model, uses attention mechanisms to learn contextual relations between the words in a text. The unidirectionality constraint of earlier language models is alleviated by pre-training with a masked language model, which masks some tokens in the input; the objective is to predict each masked word from its surrounding context.
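The masked language model objective can be seen directly with the Hugging Face `fill-mask` pipeline; the example below uses the `distilbert-base-uncased` checkpoint as an assumed stand-in for the encoder discussed here.

```python
from transformers import pipeline

# The model predicts the [MASK] token from both its left and right context.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

for prediction in fill_mask("The Transformer encoder uses [MASK] to relate words in a sentence."):
    print(prediction["token_str"], round(prediction["score"], 3))
```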
DistilBERT Model: A Distilled Version of BERT
The DistilBERT model, a smaller and faster variant of BERT, has proven effective on a range of natural language processing tasks. It has 40% fewer parameters than the base BERT model while preserving 97% of its performance on the GLUE language understanding benchmark.
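To make the idea of distillation concrete, the following is a generic knowledge-distillation loss (softened teacher targets plus the usual hard-label cross-entropy). DistilBERT's actual training objective also includes a cosine embedding loss, so this is a simplified sketch rather than the exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: standard cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```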
Pre-processing of SQuAD Data
To preprocess the SQuAD data, incorrect character offsets in the provided answers are corrected, and an answer-end index is added to mark the end position of each answer. The tokenizer used for this task has a maximum sequence length of 512 tokens, so answers beyond this limit may be truncated. The fast tokenizer provided by the Hugging Face library is used for its speed.
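A sketch of this pre-processing with a fast Hugging Face tokenizer is shown below: the answer-end character index is derived from the start offset and answer length, the question and context are tokenized with a 512-token limit, and the character offsets are mapped to start and end token positions. The exact field handling is an assumption based on the standard SQuAD recipe.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # fast tokenizer

def preprocess(example):
    # Assumes an answerable example; SQuAD 2.0 also contains unanswerable ones.
    answer = example["answers"]["text"][0]
    start_char = example["answers"]["answer_start"][0]
    end_char = start_char + len(answer)              # the added answer-end index

    enc = tokenizer(
        example["question"],
        example["context"],
        max_length=512,
        truncation="only_second",                    # truncate the context, not the question
        return_offsets_mapping=True,
    )

    # Map character positions to token positions within the context part.
    start_token, end_token = 0, 0
    sequence_ids = enc.sequence_ids()
    for idx, (s, e) in enumerate(enc["offset_mapping"]):
        if sequence_ids[idx] != 1:                   # skip question/special tokens
            continue
        if s <= start_char < e:
            start_token = idx
        if s < end_char <= e:
            end_token = idx
    enc["start_positions"], enc["end_positions"] = start_token, end_token
    return enc
```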
Implementation of BERT Model
The BERT model is implemented using its encoder segment, which is pre-trained on a large unlabeled corpus. The pre-trained model can then be fine-tuned and tailored to specific natural language processing problems.
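A minimal fine-tuning sketch is shown below: a pre-trained encoder is loaded with a span-prediction head and a single optimization step is run on an already pre-processed batch. The batch contents and hyper-parameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForQuestionAnswering

# Pre-trained encoder plus a randomly initialized start/end span-prediction head.
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Toy batch standing in for pre-processed SQuAD features.
batch = {
    "input_ids": torch.randint(0, model.config.vocab_size, (2, 512)),
    "attention_mask": torch.ones(2, 512, dtype=torch.long),
    "start_positions": torch.tensor([10, 42]),
    "end_positions": torch.tensor([15, 47]),
}

outputs = model(**batch)       # returns start/end logits and the span loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```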
Introduction to DistilBERT
DistilBERT is a compressed variant of the BERT model that loses only about three percent of BERT's language understanding performance while offering significant improvements in size and speed. This makes it attractive for applications that require faster processing and reduced memory consumption.
Limitations and Future Enhancements
While the project has achieved promising results, there are limitations and areas for future enhancement. One limitation is the small size of the Unit.ai dataset compared to SQuAD 2.0, which may limit prediction quality. Splitting long contexts (see the sketch below) and employing data augmentation could improve prediction accuracy. Further refinement of the training process, including the use of multiple contexts, could also enhance the models' performance.
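Splitting a long context can be done with the sliding-window support built into fast Hugging Face tokenizers, as sketched below; the chunk length and stride values are illustrative assumptions.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

enc = tokenizer(
    "What is the course late policy?",
    "A very long course-logistics context " * 200,   # stand-in for a long context
    max_length=384,
    truncation="only_second",
    stride=128,                          # overlap between consecutive chunks
    return_overflowing_tokens=True,      # extra chunks share the same question
    return_offsets_mapping=True,
)
print(len(enc["input_ids"]), "chunks produced from one context")
```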
Conclusion
In conclusion, the Unif.ai chatbot project demonstrates the effectiveness of different models at question answering. By comparing the performance of the various models, we gain insight into their strengths and weaknesses. Incorporating an attention mechanism and Transformer-based models such as DistilBERT has significantly improved prediction accuracy. It remains important to consider the limitations of the datasets and to explore the enhancements above to further optimize the question answering system.