Detecting AI Writing & Maintaining Academic Integrity with Turnitin

Table of Contents:

  1. Introduction
  2. The Use Case: Academic Integrity and AI Writing
  3. Workflow Overview
  4. Uploading the Paper to Turnitin
  5. Processing Papers with SNS and SQS
  6. Lambda Functions and Rules Engine
  7. Breaking Up the Document
  8. Sentence Segmentation with SageMaker
  9. Concurrency Challenges and Serverless Solutions
  10. Inferencing and Document Scoring
  11. Orchestrating Lambdas with Java and Python
  12. Persisting Data in DynamoDB
  13. Micro Front End and Data Retrieval
  14. Global Distribution and Geo-Distributed System
  15. Conclusion

📚 Introduction

In this article, we will delve into the intricacies of an event-driven architecture focused on academic integrity in the era of generative AI writing. As education technology advances, it becomes imperative to detect instances of AI-generated content in academic work. We will explore how this architecture, built on AWS, combines various components to process and analyze academic papers, ensuring the accuracy and fairness of written work.

🎓 The Use Case: Academic Integrity and AI Writing

The foremost objective of this architecture is to detect generative AI writing and preserve academic integrity. Educators and institutions aim to deter the use of AI-generated content in academic submissions. By leveraging machine learning and advanced technologies, this event-driven architecture helps identify and flag instances of AI-generated content, ensuring a level playing field for all students.

🔄 Workflow Overview

Before diving into the technical details, let's take a high-level look at the workflow for processing academic papers for AI-generated content detection. The following steps outline the main components and stages of the workflow:

  1. Uploading the paper to Turnitin
  2. Processing papers with SNS and SQS
  3. Lambda functions and rules engine
  4. Breaking up the document
  5. Sentence segmentation with SageMaker
  6. Concurrency challenges and serverless solutions
  7. Inferencing and document scoring
  8. Orchestrating Lambdas with Java and Python
  9. Persisting data in DynamoDB
  10. Micro front end and data retrieval
  11. Global distribution and geo-distributed system

Now, let's delve into each step and explore the intricacies of this event-driven architecture.

📥 Uploading the Paper to Turnitin

The workflow begins with a student uploading their paper onto the Turnitin platform. This can be done directly through the Turnitin website or via integrations with other platforms. Once the paper is uploaded, an SNS (Simple Notification Service) topic is triggered, indicating that a paper has been received and is ready for processing.

✉ Processing Papers with SNS and SQS

Upon receiving the notification from SNS, the system leverages SQS (Simple Queue Service) to queue the papers for processing. This approach ensures efficient handling of the high paper submission volume, which can reach up to 2 million papers per day. The SQS queue acts as a buffer, allowing the system to process papers in a scalable manner.
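In this fan-out pattern, the SQS message body carries the SNS envelope, which in turn carries the application payload. A minimal sketch of the consuming Lambda's handler is shown below; the `paper_id` field and the event shape are illustrative assumptions, not the actual Turnitin message schema.

```python
import json

def handler(event, context):
    """Unwrap SQS records that carry an SNS envelope and collect paper IDs.

    Assumed payload field: 'paper_id' (hypothetical, for illustration).
    """
    paper_ids = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])      # SQS body holds the SNS message
        message = json.loads(envelope["Message"])  # SNS payload is itself JSON
        paper_ids.append(message["paper_id"])
    return paper_ids

# Example event shaped like an SQS trigger delivering an SNS notification
sample_event = {
    "Records": [
        {"body": json.dumps({"Message": json.dumps({"paper_id": "abc-123"})})}
    ]
}
```

Note the double `json.loads`: when SQS subscribes to an SNS topic without raw message delivery, the application payload is nested inside the SNS notification JSON.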

🤖 Lambda Functions and Rules Engine

To handle the processing of papers, the architecture utilizes Lambda functions. The first Lambda acts as a rules engine, determining which papers should be processed based on specific criteria. It considers factors such as language and size limitations to selectively process the papers. Once the selection is made, the first Lambda invokes a second Lambda function.
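The rules-engine step can be sketched as a simple eligibility predicate. The supported-language set and word-count bounds below are illustrative assumptions; the article only says language and size are among the criteria.

```python
SUPPORTED_LANGUAGES = {"en"}   # assumption: the detector targets English
MAX_WORDS = 30_000             # assumed size ceiling, illustrative only
MIN_WORDS = 300                # assumed floor so very short texts are skipped

def should_process(paper: dict) -> bool:
    """Rules engine: decide whether a paper is eligible for AI-writing detection."""
    if paper.get("language") not in SUPPORTED_LANGUAGES:
        return False
    return MIN_WORDS <= paper.get("word_count", 0) <= MAX_WORDS
```

Keeping this check in a dedicated function (and Lambda) means new rules can be added without touching the downstream document-processing code.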

📃 Breaking Up the Document

The second Lambda function focuses on breaking down the document into sentences. It processes the document, excluding non-essential components like headers and footers, to extract the core writing content. By eliminating irrelevant sections, the system can focus solely on analyzing the actual written material.
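A minimal sketch of this step might strip a header and footer line from each page and split the remaining body into sentences. The first/last-line heuristic and the regex-based sentence split are simplifying assumptions; a production system would use a proper document parser.

```python
import re

def extract_sentences(pages: list[str]) -> list[str]:
    """Drop the first and last line of each page (a crude header/footer
    heuristic), then split the remaining body text into sentences."""
    body_lines = []
    for page in pages:
        lines = page.splitlines()
        body_lines.extend(lines[1:-1] if len(lines) > 2 else lines)
    text = " ".join(body_lines)
    # naive split on sentence-ending punctuation followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```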

➗ Sentences Segmentation with SageMaker

Once the document is broken down into sentences, the architecture employs a sliding window approach. The second Lambda function sends segments of sentences to a SageMaker endpoint for inference. This segmentation allows for parallel processing, enhancing the system's performance.
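The sliding window can be expressed as overlapping slices over the sentence list, each of which becomes one inference request. The window and stride sizes here are illustrative assumptions, not values from the talk.

```python
def sliding_windows(sentences: list[str], window: int = 3,
                    stride: int = 1) -> list[list[str]]:
    """Produce overlapping windows of sentences, each destined for one
    call to the inference endpoint."""
    if len(sentences) <= window:
        return [sentences]
    return [sentences[i:i + window]
            for i in range(0, len(sentences) - window + 1, stride)]
```

Because each window is independent, the requests can be issued concurrently, which is what makes the parallelism mentioned above possible.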

🖥 Concurrency Challenges and Serverless Solutions

Due to the large scale of paper processing, concurrency becomes a crucial aspect. While the architecture aims to be serverless, current limitations require the use of SageMaker EC2 instances to handle the concurrency needs effectively. By utilizing serverless components where possible and resorting to alternative solutions as needed, the system achieves optimal performance.

💡 Inferencing and Document Scoring

The machine learning model, integrated into the architecture, performs inferencing on the segmented sentences. The model analyzes each sentence and returns the results to the last Lambda it interacted with. The final Lambda aggregates the scores for each sentence and derives an overall score for the entire document.
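The aggregation step can be sketched as below. Averaging the per-sentence scores and counting a flagged fraction are plausible aggregations chosen for illustration; the talk does not specify the actual scoring formula, and the 0.5 threshold is an assumption.

```python
def score_document(sentence_scores: list[float]) -> dict:
    """Aggregate per-sentence AI-likelihood scores into a document score."""
    if not sentence_scores:
        return {"overall": 0.0, "flagged_fraction": 0.0}
    flagged = sum(1 for s in sentence_scores if s >= 0.5)  # assumed threshold
    return {
        "overall": sum(sentence_scores) / len(sentence_scores),
        "flagged_fraction": flagged / len(sentence_scores),
    }
```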

🔄 Orchestrating Lambdas with Java and Python

One notable aspect of this architecture is the use of both Java and Python. The first Lambda, acting as the orchestrator and rules engine, is written in Java to take advantage of existing libraries and team expertise. The second Lambda, responsible for processing the document and working closely with the machine learning model, is developed in Python for its strong ecosystem of AI and machine learning libraries.

📊 Persisting Data in DynamoDB

To ensure the persistence of data, the architecture leverages DynamoDB, AWS's NoSQL database service. Each paper is stored at the sentence level in DynamoDB, allowing for detailed analysis and retrieval of data when required. By utilizing established patterns and libraries, the system efficiently stores and manages the relevant information related to each paper.
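One common way to model "one paper plus its sentence scores" in DynamoDB is a single-table layout where all items for a paper share a partition key, so a single Query retrieves everything. The `PK`/`SK` key names and `PAPER#`/`SENTENCE#` prefixes below are a conventional pattern assumed for illustration, not Turnitin's actual schema; actual writes would go through boto3 (e.g. a table's `batch_writer`).

```python
def build_items(paper_id: str, overall: float,
                sentence_scores: list[float]) -> list[dict]:
    """Model one paper and its sentence-level scores as DynamoDB items
    sharing a partition key."""
    items = [{
        "PK": f"PAPER#{paper_id}",
        "SK": "SUMMARY",
        "overall_score": overall,
    }]
    for i, score in enumerate(sentence_scores):
        items.append({
            "PK": f"PAPER#{paper_id}",
            "SK": f"SENTENCE#{i:05d}",  # zero-padded so keys sort in order
            "score": score,
        })
    return items
```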

🌐 Micro Front End and Data Retrieval

To provide users, particularly instructors, with access to the processed data, the architecture incorporates a micro front end. This web component can be embedded in existing applications, eliminating the need for extensive application rewriting. The micro front end interacts with the Lambda functions to retrieve the necessary data for displaying to instructors and other relevant stakeholders.

🌍 Global Distribution and Geo-Distributed System

Considering the global nature of education and the need for efficient access, the architecture follows a geo-distributed pattern. The system is deployed in three separate regions, namely Europe, Asia Pacific, and the US. This distribution allows for localized access and enhanced performance based on the user's geographical location.

🎓 Conclusion

In conclusion, this event-driven architecture provides an effective solution for detecting AI-generated content in the academic world while ensuring academic integrity. By utilizing various AWS services, Lambda functions for orchestration, and machine learning inference, the system processes papers efficiently and accurately. The persistence of data in DynamoDB and the integration of a micro front end enable instructors to access and evaluate the processed information seamlessly. With its global distribution, this architecture accommodates users worldwide, facilitating fair and unbiased academic evaluations.

Highlights:

  • Event-driven architecture for academic integrity and AI writing
  • Efficient paper processing with SNS, SQS, and Lambda functions
  • Breakdown of documents and focus on core writing content
  • SageMaker for parallel sentence-level segmentation and inferencing
  • Challenges and solutions for concurrency in a serverless environment
  • Java and Python integration for orchestrating Lambda functions
  • Data persistence and detailed analysis with DynamoDB
  • Easy data retrieval through a micro front end
  • Global distribution for optimal performance and accessibility

FAQ:

Q: How many papers can be processed per day with this architecture? A: The architecture is capable of processing up to 2 million papers per day.

Q: What are the criteria for determining which papers are processed? A: The first Lambda function considers language and size limitations to decide which papers to process.

Q: Why are separate Lambdas used for orchestration and document processing? A: The first Lambda, written in Java, serves as an orchestrator and rules engine, while the second Lambda, written in Python, handles document processing and AI-related tasks.
