Detecting AI Writing & Maintaining Academic Integrity with Turnitin

Table of Contents:

  1. Introduction
  2. The Use Case: Academic Integrity and AI Writing
  3. Workflow Overview
  4. Uploading the Paper to Turnitin
  5. Processing Papers with SNS and SQS
  6. Lambda Functions and Rules Engine
  7. Breaking Up the Document
  8. Sentence Segmentation with SageMaker
  9. Concurrency Challenges and Serverless Solutions
  10. Inferencing and Document Scoring
  11. Orchestrating Lambdas with Java and Python
  12. Persisting Data in DynamoDB
  13. Micro Front End and Data Retrieval
  14. Global Distribution and Geo-Distributed System
  15. Conclusion

📚 Introduction

In this article, we will delve into the intricacies of an event-driven architecture focused on academic integrity in the era of generative AI writing. As education technology advances, it becomes imperative to detect instances of AI-generated content in academic work. We will explore how this architecture, built on AWS, combines various components to process and analyze academic papers, ensuring the accuracy and fairness of written work.

🎓 The Use Case: Academic Integrity and AI Writing

The foremost objective of this architecture is to detect generative AI writing and preserve academic integrity. Educators and institutions aim to deter the use of AI-generated content in academic submissions. By leveraging machine learning and advanced technologies, this event-driven architecture helps identify and flag instances of AI-generated content, ensuring a level playing field for all students.

🔄 Workflow Overview

Before diving into the technical details, let's take a high-level look at the workflow for processing academic papers for AI-generated content detection. The following steps outline the main components and stages of the workflow:

  1. Uploading the paper to Turnitin
  2. Processing papers with SNS and SQS
  3. Lambda functions and rules engine
  4. Breaking up the document
  5. Sentence segmentation with SageMaker
  6. Concurrency challenges and serverless solutions
  7. Inferencing and document scoring
  8. Orchestrating Lambdas with Java and Python
  9. Persisting data in DynamoDB
  10. Micro front end and data retrieval
  11. Global distribution and geo-distributed system

Now, let's delve into each step and explore the intricacies of this event-driven architecture.

📥 Uploading the Paper to Turnitin

The workflow begins with a student uploading their paper onto the Turnitin platform. This can be done directly through the Turnitin website or via integrations with other platforms. Once the paper is uploaded, an SNS (Simple Notification Service) topic is triggered, indicating that a paper has been received and is ready for processing.

✉ Processing Papers with SNS and SQS

Upon receiving the notification from SNS, the system leverages SQS (Simple Queue Service) to queue the papers for processing. This approach ensures efficient handling of the high paper submission volume, which can reach up to 2 million papers per day. The SQS queue acts as a buffer, allowing the system to process papers in a scalable manner.
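In this fan-out pattern, the SQS message body carries the SNS envelope, which in turn carries the application payload. A minimal sketch of the consuming Lambda's handler is shown below; the `paper_id` field and the event shape are illustrative assumptions, not the actual Turnitin message schema.

```python
import json

def handler(event, context):
    """Unwrap SQS records that carry an SNS envelope and collect paper IDs.

    Assumed payload field: 'paper_id' (hypothetical, for illustration).
    """
    paper_ids = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])      # SQS body holds the SNS message
        message = json.loads(envelope["Message"])  # SNS payload is itself JSON
        paper_ids.append(message["paper_id"])
    return paper_ids

# Example event shaped like an SQS trigger delivering an SNS notification
sample_event = {
    "Records": [
        {"body": json.dumps({"Message": json.dumps({"paper_id": "abc-123"})})}
    ]
}
```

Note the double `json.loads`: when SQS subscribes to an SNS topic without raw message delivery, the application payload is nested inside the SNS notification JSON.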

🤖 Lambda Functions and Rules Engine

To handle the processing of papers, the architecture utilizes Lambda functions. The first Lambda acts as a rules engine, determining which papers should be processed based on specific criteria. It considers factors such as language and size limitations to selectively process the papers. Once the selection is made, the first Lambda invokes a second Lambda function.
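The rules-engine step can be sketched as a simple eligibility predicate. The supported-language set and word-count bounds below are illustrative assumptions; the article only says language and size are among the criteria.

```python
SUPPORTED_LANGUAGES = {"en"}   # assumption: the detector targets English
MAX_WORDS = 30_000             # assumed size ceiling, illustrative only
MIN_WORDS = 300                # assumed floor so very short texts are skipped

def should_process(paper: dict) -> bool:
    """Rules engine: decide whether a paper is eligible for AI-writing detection."""
    if paper.get("language") not in SUPPORTED_LANGUAGES:
        return False
    return MIN_WORDS <= paper.get("word_count", 0) <= MAX_WORDS
```

Keeping this check in a dedicated function (and Lambda) means new rules can be added without touching the downstream document-processing code.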

📃 Breaking Up the Document

The second Lambda function focuses on breaking down the document into sentences. It processes the document, excluding non-essential components like headers and footers, to extract the core writing content. By eliminating irrelevant sections, the system can focus solely on analyzing the actual written material.
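A minimal sketch of this step might strip a header and footer line from each page and split the remaining body into sentences. The first/last-line heuristic and the regex-based sentence split are simplifying assumptions; a production system would use a proper document parser.

```python
import re

def extract_sentences(pages: list[str]) -> list[str]:
    """Drop the first and last line of each page (a crude header/footer
    heuristic), then split the remaining body text into sentences."""
    body_lines = []
    for page in pages:
        lines = page.splitlines()
        body_lines.extend(lines[1:-1] if len(lines) > 2 else lines)
    text = " ".join(body_lines)
    # naive split on sentence-ending punctuation followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```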

➗ Sentences Segmentation with SageMaker

Once the document is broken down into sentences, the architecture employs a sliding window approach. The second Lambda function sends segments of sentences to a SageMaker endpoint for inference. This segmentation allows for parallel processing, enhancing the system's performance.
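The sliding window can be expressed as overlapping slices over the sentence list, each of which becomes one inference request. The window and stride sizes here are illustrative assumptions, not values from the talk.

```python
def sliding_windows(sentences: list[str], window: int = 3,
                    stride: int = 1) -> list[list[str]]:
    """Produce overlapping windows of sentences, each destined for one
    call to the inference endpoint."""
    if len(sentences) <= window:
        return [sentences]
    return [sentences[i:i + window]
            for i in range(0, len(sentences) - window + 1, stride)]
```

Because each window is independent, the requests can be issued concurrently, which is what makes the parallelism mentioned above possible.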

🖥 Concurrency Challenges and Serverless Solutions

Due to the large scale of paper processing, concurrency becomes a crucial aspect. While the architecture aims to be serverless, current limitations require the use of SageMaker EC2 instances to handle the concurrency needs effectively. By utilizing serverless components where possible and resorting to alternative solutions as needed, the system achieves optimal performance.

💡 Inferencing and Document Scoring

The machine learning model, integrated into the architecture, performs inferencing on the segmented sentences. The model analyzes each sentence and returns the results to the last Lambda it interacted with. The final Lambda aggregates the scores for each sentence and derives an overall score for the entire document.
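The aggregation step can be sketched as below. Averaging the per-sentence scores and counting a flagged fraction are plausible aggregations chosen for illustration; the talk does not specify the actual scoring formula, and the 0.5 threshold is an assumption.

```python
def score_document(sentence_scores: list[float]) -> dict:
    """Aggregate per-sentence AI-likelihood scores into a document score."""
    if not sentence_scores:
        return {"overall": 0.0, "flagged_fraction": 0.0}
    flagged = sum(1 for s in sentence_scores if s >= 0.5)  # assumed threshold
    return {
        "overall": sum(sentence_scores) / len(sentence_scores),
        "flagged_fraction": flagged / len(sentence_scores),
    }
```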

🔄 Orchestrating Lambdas with Java and Python

One notable aspect of this architecture is the use of both Java and Python. The first Lambda, acting as the orchestrator and rules engine, is written in Java to take advantage of existing libraries and team expertise. The second Lambda, responsible for processing the document and working closely with the machine learning model, is developed in Python for its strong ecosystem of AI and machine learning libraries.

📊 Persisting Data in DynamoDB

To ensure the persistence of data, the architecture leverages DynamoDB, AWS's NoSQL database service. Each paper is stored at the sentence level in DynamoDB, allowing for detailed analysis and retrieval of data when required. By utilizing established patterns and libraries, the system efficiently stores and manages the relevant information related to each paper.
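One common way to model "one paper plus its sentence scores" in DynamoDB is a single-table layout where all items for a paper share a partition key, so a single Query retrieves everything. The `PK`/`SK` key names and `PAPER#`/`SENTENCE#` prefixes below are a conventional pattern assumed for illustration, not Turnitin's actual schema; actual writes would go through boto3 (e.g. a table's `batch_writer`).

```python
def build_items(paper_id: str, overall: float,
                sentence_scores: list[float]) -> list[dict]:
    """Model one paper and its sentence-level scores as DynamoDB items
    sharing a partition key."""
    items = [{
        "PK": f"PAPER#{paper_id}",
        "SK": "SUMMARY",
        "overall_score": overall,
    }]
    for i, score in enumerate(sentence_scores):
        items.append({
            "PK": f"PAPER#{paper_id}",
            "SK": f"SENTENCE#{i:05d}",  # zero-padded so keys sort in order
            "score": score,
        })
    return items
```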

🌐 Micro Front End and Data Retrieval

To provide users, particularly instructors, with access to the processed data, the architecture incorporates a micro front end. This web component can be embedded in existing applications, eliminating the need for extensive application rewriting. The micro front end interacts with the Lambda functions to retrieve the necessary data for displaying to instructors and other relevant stakeholders.

🌍 Global Distribution and Geo-Distributed System

Considering the global nature of education and the need for efficient access, the architecture follows a geo-distributed pattern. The system is deployed in three separate regions, namely Europe, Asia Pacific, and the US. This distribution allows for localized access and enhanced performance based on the user's geographical location.

🎓 Conclusion

In conclusion, this event-driven architecture provides an effective solution for detecting AI-generated content in the academic world while ensuring academic integrity. By utilizing various AWS services, Lambda functions for orchestration, and machine learning inference, the system processes papers efficiently and accurately. The persistence of data in DynamoDB and the integration of a micro front end enable instructors to access and evaluate the processed information seamlessly. With its global distribution, this architecture accommodates users worldwide, facilitating fair and unbiased academic evaluations.

Highlights:

  • Event-driven architecture for academic integrity and AI writing
  • Efficient paper processing with SNS, SQS, and Lambda functions
  • Breakdown of documents and focus on core writing content
  • SageMaker for parallel sentence-level segmentation and inferencing
  • Challenges and solutions for concurrency in a serverless environment
  • Java and Python integration for orchestrating Lambda functions
  • Data persistence and detailed analysis with DynamoDB
  • Easy data retrieval through a micro front end
  • Global distribution for optimal performance and accessibility

FAQ:

Q: How many papers can be processed per day with this architecture? A: The architecture is capable of processing up to 2 million papers per day.

Q: What are the criteria for determining which papers are processed? A: The first Lambda function considers language and size limitations to decide which papers to process.

Q: Why are separate Lambdas used for orchestration and document processing? A: The first Lambda, written in Java, serves as an orchestrator and rules engine, while the second Lambda, written in Python, handles document processing and AI-related tasks.
