Revolutionizing Essay Grading with Multi-Task Learning
Table of Contents:
- Introduction
- What is an Essay?
- The Process of Essay Grading
- Automatic Essay Grading
- Cognitive Aided Automatic Essay Grading
- Terminologies in Gaze Behavior
- Solutions for Collecting Gaze Behavior Data
- Multitask Learning Approach
- Learning Gaze Behavior as an Auxiliary Task
- Data Set Details
- Systems Used for Essay Grading
- Self-Attention System
- Co-Attention Based System
- Hyperparameters Used
- Experiments and Results
- Seen Essay Sets
- Unseen Essay Sets
- Native Speaker Experiments
- Importance of Gaze Behavior Attributes
- Conclusion
- Future Directions
1. Introduction
This article explores automatic essay grading using a multitask learning approach that draws on gaze behavior. Automatic essay grading is the process of assigning a score to an essay without human intervention. Incorporating gaze behavior data into the grading process can improve the performance of natural language processing systems. We cover the terminology of gaze behavior, solutions for collecting gaze behavior data, and the benefits of multitask learning for essay grading.
2. What is an Essay?
An essay is a piece of text written in response to a topic called the essay prompt. It is a structured and coherent composition that presents an argument or provides information on a specific subject. Essays are commonly assigned in educational settings to evaluate a student's understanding of a topic and their ability to communicate effectively through writing.
3. The Process of Essay Grading
Essay grading involves evaluating the quality of an essay based on factors such as content, organization, grammar, and coherence. Traditionally, essays were graded manually by human graders, which is time-consuming and subjective. With advances in technology, automatic essay grading has emerged as a viable alternative.
4. Automatic Essay Grading
Automatic essay grading, also known as computer-based essay grading or e-grading, is the process of using machines to assign scores to essays. It eliminates the need for human graders and improves the efficiency of the grading process. There are different approaches to automatic essay grading, and one of the promising methods is incorporating gaze behavior data.
- Cognitive Aided Automatic Essay Grading
Cognitively aided automatic essay grading is a technique where cognitive information is used to assist machines in performing automatic essay grading. By analyzing a reader's gaze behavior, which includes fixations and regressions, machines can gain insights into the reader's comprehension and focus areas. This aids in assessing the quality of the essay more accurately.
- Terminologies in Gaze Behavior
Gaze behavior refers to the movements and focus of a reader's eyes while reading an essay. Using it for automatic essay grading requires familiarity with a few terms: interest areas, fixations, saccades, progressions, and regressions.
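To make these terms concrete, here is a minimal sketch that assumes the standard eye-tracking definitions: an interest area is a region of the text (here, one word), a fixation is a pause of the eye on an interest area, a saccade is the movement between two consecutive fixations, a progression is a saccade that moves forward in the text, and a regression is one that moves backward. The `Fixation` class, the `classify_saccades` function, and the sample scanpath are hypothetical names and values for illustration, not part of any eye-tracker API or of the data set discussed here.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    word_index: int   # interest area (word position) the eye rested on
    duration_ms: int  # how long the fixation lasted, in milliseconds


def classify_saccades(fixations):
    """Label each movement between two consecutive fixations."""
    labels = []
    for prev, curr in zip(fixations, fixations[1:]):
        if curr.word_index > prev.word_index:
            labels.append("progression")  # eye moved forward in the text
        elif curr.word_index < prev.word_index:
            labels.append("regression")   # eye moved back to earlier text
        else:
            labels.append("refixation")   # eye stayed on the same word
    return labels


# Illustrative scanpath (made-up values): the reader jumps back from word 3 to word 2.
scanpath = [
    Fixation(0, 180),
    Fixation(1, 210),
    Fixation(3, 250),
    Fixation(2, 190),
    Fixation(4, 230),
]
saccade_labels = classify_saccades(scanpath)
```

Counting the `"regression"` labels per word or per essay gives one of the simplest regression-based gaze attributes.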
- Solutions for Collecting Gaze Behavior Data
Collecting gaze behavior data can be a challenging and resource-intensive task. However, there are two solutions that can make this process more feasible. The first solution is type aggregation from an existing gaze behavior dataset, where the gaze behavior of each token is calculated as the mean value of the gaze behavior for that token in the dataset. The second solution is multitask learning, where learning gaze behavior becomes an auxiliary task while the primary task remains essay grading.
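Type aggregation, as described above, can be sketched in a few lines. This is an illustration under stated assumptions rather than the original implementation: the function name `aggregate_by_type`, the choice to lowercase tokens before grouping, and the sample values are all assumptions made here.

```python
from collections import defaultdict


def aggregate_by_type(token_observations):
    """Average one gaze attribute over every occurrence of each token type.

    token_observations: iterable of (token, value) pairs, e.g. the dwell
    time recorded each time a word was read in an existing gaze corpus.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for token, value in token_observations:
        key = token.lower()  # assumption: case-insensitive token types
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up observations (milliseconds) purely for illustration.
observations = [
    ("the", 100.0),
    ("essay", 250.0),
    ("The", 120.0),
    ("essay", 350.0),
    ("prompt", 300.0),
]
type_means = aggregate_by_type(observations)
```

Any new essay can then be annotated by looking up each of its tokens in `type_means`, at the cost of ignoring the context in which the token appears.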
5. Multitask Learning Approach
Multitask learning is a machine learning paradigm where information from auxiliary tasks is used to help solve a primary task. In the context of automatic essay grading, multitask learning involves collecting gaze behavior for a small number of essays and using that data to train a model to score other essays. This approach enhances the performance of the grading model and provides insights into the reader's behavior.
- Learning Gaze Behavior as an Auxiliary Task
By considering gaze behavior as an auxiliary task, we can leverage the information it provides to improve the accuracy of essay grading. Multitask learning allows us to simultaneously learn the scoring of essays and the patterns in gaze behavior. This holistic approach enhances the overall performance of the grading system.
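One common way to train on a primary and an auxiliary task at once is to minimize a weighted sum of their losses. The sketch below assumes mean squared error for both tasks and hypothetical per-attribute weights; the source does not state the loss function or weights actually used, so `multitask_loss`, `gaze_weights`, and the sample numbers are illustrative only.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def multitask_loss(score_pred, score_gold, gaze_pred, gaze_gold, gaze_weights):
    """Primary essay-score loss plus weighted auxiliary gaze losses.

    gaze_pred / gaze_gold map an attribute name to per-word values.
    Essays without recorded gaze data pass an empty gaze_gold, so only
    the scoring loss contributes for them.
    """
    loss = mse(score_pred, score_gold)
    for attribute, weight in gaze_weights.items():
        if attribute in gaze_gold:
            loss += weight * mse(gaze_pred[attribute], gaze_gold[attribute])
    return loss


# Hypothetical weight for one auxiliary attribute.
weights = {"dwell_time": 0.5}

# Essay with gaze data: both tasks contribute to the loss.
loss_with_gaze = multitask_loss(
    [0.5], [1.0],
    {"dwell_time": [0.25, 0.75]},
    {"dwell_time": [0.25, 0.25]},
    weights,
)

# Essay without gaze data: only the primary scoring loss is used.
loss_without_gaze = multitask_loss([0.5], [1.0], {}, {}, weights)
```

Skipping the auxiliary term when no gaze labels exist is what lets a small gaze-annotated subset be combined with a much larger set of scored essays.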
- Data Set Details
To evaluate the effectiveness of multitask learning with gaze behavior, we use the ASAP Automatic Essay Grading data set, which consists of eight prompts with approximately 13,000 essays. Gaze behavior data is collected for a subset of 48 essays across four source-dependent response prompts. The data set includes attributes such as fixation duration, regressions, interest area attributes, and more.
6. Systems Used for Essay Grading
To implement the multitask learning approach, we utilize two systems: the self-attention system and the co-attention based system. These systems employ deep learning techniques to analyze and score essays based on various features, including gaze behavior attributes. Hyperparameters play a crucial role in the performance of these systems.
- Self-Attention System
The self-attention system is a modification of the Dong et al. (2017) self-attention system. It involves splitting the essay into sentences, embedding the sentences, and using word-level CNN and attention pooling layers to obtain sentence-level representations. The gaze behavior data is incorporated at the word level to enhance the scoring process.
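The attention pooling step mentioned above can be illustrated with a small sketch: each vector receives a score, the scores are normalized with a softmax, and the pooled representation is the weighted sum of the vectors. In a trained model the scores would come from learned parameters; here they are passed in directly as an assumption, and `attention_pool` is a hypothetical helper name rather than code from the system described.

```python
import math


def attention_pool(vectors, scores):
    """Softmax the scores, then return the weighted sum of the vectors.

    vectors: list of equal-length lists of floats (e.g. word features)
    scores:  one unnormalized attention score per vector
    """
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(dim)
    ]
    return pooled, weights


# With equal scores, attention pooling reduces to a plain average.
word_vectors = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(word_vectors, [0.0, 0.0])
```

Raising one score relative to the others shifts the pooled vector toward the corresponding word, which is how the layer emphasizes informative words when building a sentence representation.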
- Co-Attention Based System
The co-attention based system is similar to the system developed by Zhang and Litman (2018). It involves obtaining representations for both the essay and the corresponding source articles. These representations are then used in co-attention and modeling layers to generate a final output score. Gaze behavior data is integrated into the system to improve the grading accuracy.
- Hyperparameters Used
Various hyperparameters are utilized in the self-attention system and the co-attention based system. These hyperparameters include the number of layers, learning rate, dropout rate, batch size, and more. They are tuned to optimize the performance of the grading models.
7. Experiments and Results
To evaluate the effectiveness of the proposed multitask learning approach with gaze behavior, we conduct experiments on both seen essay sets and unseen essay sets. The seen essay sets contain essays from known prompts, while the unseen essay sets do not have corresponding source articles. We measure the performance using metrics such as quadratic weighted kappa, correct score agreement, and close score agreement.
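Quadratic weighted kappa (QWK) measures agreement between two sets of scores, penalizing each disagreement by the squared distance between the scores and correcting for chance agreement. The sketch below follows the standard definition; the function name and argument layout are choices made here, and it assumes integer scores, at least two possible score values, and raters who do not all give one identical score (otherwise the denominator is zero).

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores."""
    n = max_score - min_score + 1

    # Observed matrix: how often rater A gave score i while rater B gave j.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    total = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2      # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


# Identical scores give perfect agreement.
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3)

# Scores that agree exactly as often as chance predicts give zero.
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 0, 1)
```

A value of 1 means the system's scores match the human scores exactly, 0 means agreement no better than chance, and negative values mean agreement worse than chance.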
- Seen Essay Sets
In the experiments conducted on seen essay sets, we compare the performance of our systems with other state-of-the-art systems. The results show statistically significant improvements over the baseline co-attention system of Zhang and Litman (2018). The incorporation of gaze behavior data enhances the accuracy of automatic essay grading.
- Unseen Essay Sets
When evaluating the performance on unseen essay sets, where there are no corresponding source articles, our self-attention system still outperforms the baseline models. This demonstrates the effectiveness of using multitask learning with gaze behavior even in scenarios with limited data availability.
8. Native Speaker Experiments
Additionally, we perform experiments using gaze behavior data from native speakers alone. While the results are better than those obtained without gaze behavior, they remain inferior to those obtained with gaze behavior from multiple annotators. This highlights the importance of incorporating diverse gaze behavior data to achieve higher accuracy in essay grading.
9. Importance of Gaze Behavior Attributes
Ablation tests are conducted to determine the importance of different gaze behavior attributes in the grading process. The findings indicate that fixation-based attributes, specifically dwell time and first fixation duration, play a crucial role in enhancing the performance of the grading models.
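An ablation test of this kind can be sketched as a loop that removes one attribute at a time and records how much the evaluation score drops. The `ablation_study` function and the attribute names are hypothetical, and `toy_evaluate` is a placeholder that only exists so the sketch runs; its numbers are not experimental results. In practice `evaluate` would retrain the grading model with the given attributes and return a metric such as quadratic weighted kappa.

```python
def ablation_study(attributes, evaluate):
    """Return the score drop caused by removing each attribute in turn.

    attributes: list of gaze attribute names used by the full system
    evaluate:   callable taking a list of attribute names and returning
                an evaluation score (higher is better)
    """
    full_score = evaluate(attributes)
    drops = {}
    for removed in attributes:
        remaining = [a for a in attributes if a != removed]
        drops[removed] = full_score - evaluate(remaining)
    return drops


def toy_evaluate(active_attributes):
    """Placeholder scorer, NOT real results: counts the active attributes."""
    return len(active_attributes)


gaze_attributes = ["dwell_time", "first_fixation_duration", "regression_count"]
score_drops = ablation_study(gaze_attributes, toy_evaluate)
```

The attributes whose removal causes the largest drop are the ones the model relies on most.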
10. Conclusion
In conclusion, our study demonstrates that learning gaze behavior as an auxiliary task leads to significant improvements in automatic essay grading. By considering gaze behavior attributes, such as fixation duration, regressions, and interest areas, we can gain valuable insights into a reader's comprehension and focus areas. This improves the accuracy and reliability of essay grading systems.
11. Future Directions
In future research, we plan to explore the application of gaze behavior in cross-domain automatic essay grading and zero-shot automatic essay grading. These scenarios involve grading essays from different domains or prompts for which no specific training data is available. By further investigating the potential of gaze behavior, we aim to develop more robust and adaptable essay grading systems.
Highlights:
- Automatic essay grading using multitask learning with gaze behavior
- Terminologies and solutions for collecting gaze behavior data
- Comparison of self-attention and co-attention based systems
- Significant improvements in both seen and unseen essay sets
- Importance of fixation-based gaze behavior attributes
- Future directions for cross-domain and zero-shot automatic essay grading
FAQ:
Q: What is automatic essay grading?
A: Automatic essay grading is the process of assigning scores to essays using machines, eliminating the need for human graders.
Q: How does gaze behavior affect essay grading?
A: Gaze behavior provides insights into a reader's comprehension and focus areas, which can enhance the accuracy of essay grading systems.
Q: Can gaze behavior be collected easily?
A: Collecting gaze behavior can be challenging and expensive, but there are solutions such as type aggregation and multitask learning to make it more feasible.
Q: What are the advantages of multitask learning in essay grading?
A: Multitask learning allows us to simultaneously learn essay scores and gaze behavior patterns, resulting in improved grading accuracy.
Q: Are there different systems for automatic essay grading?
A: Yes, there are self-attention and co-attention based systems that utilize deep learning techniques to analyze and score essays.
Q: What are the key findings of this research?
A: The research shows statistically significant improvements in automatic essay grading by incorporating gaze behavior data, particularly fixation-based attributes.
Q: Can gaze behavior of native speakers alone improve grading accuracy?
A: While using native speakers' gaze behavior improves accuracy, incorporating gaze behavior from multiple annotators yields better results.
Q: What are the future directions for research in automatic essay grading?
A: Future research aims to explore the use of gaze behavior in cross-domain and zero-shot essay grading to develop more adaptable grading systems.