Build an AI Text Summarizer using Python and spaCy
Table of Contents
- Introduction
- Setting up the Environment
- Preprocessing the Text Data
- Creating a Word Count Dictionary
- Scoring the Sentences
- Sorting the Sentences
- Generating the Summary
- Conclusion
Building a Text Summarizer using Python and spaCy
In this article, we walk through the process of building a text summarizer using Python and the spaCy library. Text summarization is a natural language processing task in which a program condenses a long text into a shorter version that keeps its key information. The summarizer built here is extractive: it scores each sentence by how frequent its words are in the text and keeps the highest-scoring sentences. This can be especially useful when dealing with large numbers of news articles or reports.
Setting up the Environment
Before we begin, let's make sure the necessary tools and libraries are installed. First, we need to install spaCy, a Python library for natural language processing. Open your terminal or command prompt and run the following command:
pip install spacy
Next, we need to download a pre-trained model for English language processing. Run the following command to download the model:
python -m spacy download en_core_web_sm
Now that we have spaCy installed and the English model downloaded, we can proceed to write the code for our text summarizer.
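Before moving on, it can help to confirm that both installs succeeded. The following is only a minimal sketch: missing_packages is a hypothetical helper name, and it relies on the fact that models fetched with the download command are installed as ordinary Python packages, so they can be looked up by name.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that cannot be found as importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The model is looked up by its package name, the same way as the library.
    missing = missing_packages(["spacy", "en_core_web_sm"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("spaCy and the English model are available.")
```

If anything is reported as not installed, rerun the corresponding command from this section.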
Preprocessing the Text Data
To begin, we need to preprocess the text data before feeding it into our text summarizer. This involves removing any unnecessary characters or formatting. In our code, the text is assigned to a variable called text. Here it is written inline as a multi-line string, but it could just as well be read from a file.
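The clean-up step is described only in general terms, so here is one possible sketch, assuming the only formatting that needs removing is stray whitespace and line breaks. The helper name normalize_whitespace and the sample string are illustrative and not part of the original code.

```python
import re


def normalize_whitespace(raw):
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


sample = "\nFirst line of text.\nSecond   line,\twith odd spacing.\n"
cleaned = normalize_whitespace(sample)
print(cleaned)
```

A string cleaned this way could be used in place of the raw text before any further processing.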
text = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas at mattis tellus, in aliquet nunc. Phasellus in scelerisque sem. Sed in augue eget dolor tincidunt pharetra. Proin condimentum malesuada dolor in finibus. Donec luctus vulputate velit in pharetra. Morbi finibus orci dolor, vitae pulvinar nisl elementum non. Nam vitae mauris non odio placerat hendrerit. Suspendisse potenti. Sed varius pharetra enim, sed luctus tellus interdum non. Sed blandit mauris nec nibh dictum pulvinar. Vestibulum in fermentum diam.
Ut faucibus pellentesque nisl, et mollis lectus aliquam non. Nulla elementum elit et leo lobortis convallis. Proin purus nunc, volutpat vel consequat a, tempor eget odio. Maecenas ut pretium tortor. Suspendisse et dui id risus hendrerit facilisis vitae a est. Sed in elit neque. Integer pharetra at orci at tempus.
Phasellus non cursus justo, et viverra sem. Ut iaculis diam et placerat fringilla. Nullam feugiat pharetra arcu, sit amet consequat nunc bibendum nec. Mauris sed sapien diam. Sed porttitor cursus velit vitae sodales. Maecenas arcu felis, convallis in feugiat vitae, blandit in diam. In hac habitasse platea dictumst. Sed faucibus erat non libero auctor, vel eleifend ipsum lobortis. Aliquam placerat eget mauris in fermentum. Sed tristique, mauris non tincidunt hendrerit, nibh elit viverra nisi, id eleifend sem lectus ut justo. Mauris tempor placerat purus, in gravida mi suscipit eu. Sed condimentum luctus dolor nec malesuada. Sed vitae purus nisl.
Nam varius finibus est vel fringilla. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Sed ligula orci, eleifend a ante ac, porta fringilla enim. Fusce quis nisl eu erat tempor mollis. Aliquam erat volutpat. Curabitur mollis nisl a purus venenatis, eu tincidunt eros iaculis. Vivamus placerat lacinia scelerisque. Sed rutrum magna id turpis iaculis tincidunt. Duis ac nunc non magna pulvinar vehicula nec nec nisl. Sed est nibh, elementum ac aliquet a, tincidunt quis orci. Sed efficitur rhoncus lectus, sed consectetur nunc blandit et. Maecenas a arcu a ipsum venenatis gravida. In hac habitasse platea dictumst. Nullam varius felis sit amet erat efficitur pretium.
"""
Creating a Word Count Dictionary
Once we have the preprocessed text, we create a word count dictionary that tracks how often each word appears. We iterate over each token in doc, the document spaCy produces from text, lowercase the token's text so that capitalized and lowercase forms are counted together, and update the count in the dictionary.
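import spacy

# Load the English model and parse the text into a spaCy Doc
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

word_dict = {}
for word in doc:
    # Use the token's text; Token objects have no lower() method
    word_text = word.text.lower()
    if word_text in word_dict:
        word_dict[word_text] += 1
    else:
        word_dict[word_text] = 1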
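The loop above counts every token, including punctuation marks and very common words. As a hedged variation, the counting could skip those. The sketch below works on plain strings so that it runs without a model; count_words and its ignore parameter are hypothetical names introduced for illustration.

```python
from collections import Counter


def count_words(tokens, ignore=frozenset()):
    """Count lowercase tokens, skipping punctuation-only tokens and ignored words."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        # Skip tokens with no letters or digits (e.g. "," or ".") and ignored words
        if not any(ch.isalnum() for ch in word) or word in ignore:
            continue
        counts[word] += 1
    return counts


tokens = ["The", "cat", "sat", ".", "The", "dog", "sat", ",", "too", "."]
word_counts = count_words(tokens, ignore={"the"})
print(word_counts.most_common(1))
```

With spaCy, the token strings could come from the text attribute of each token in doc, and spaCy tokens also expose is_stop and is_punct flags that could replace the manual checks.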
Scoring the Sentences
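Next, we score each sentence based on the frequency of the words it contains. We iterate through each sentence in doc.sents, sum the frequencies of its words, and divide by the number of tokens in the sentence so that long sentences are not favored simply for being long. Each result is stored in a list called sentences as a tuple of the sentence's index, its text, and its score.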
sentences = []
for i, sentence in enumerate(doc.sents):
    sentence_score = 0
    for word in sentence:
        word_text = word.text.lower()
        sentence_score += word_dict.get(word_text, 0)
    sentences.append((i, sentence.text.replace('\n', ' '), sentence_score / len(sentence)))
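To make the arithmetic concrete, here is a small stand-alone sketch of the same scoring rule applied to lists of word strings. The function name score_sentence and the toy counts are assumptions made for illustration; the guard for empty input is an addition that the loop above does not have.

```python
def score_sentence(words, word_counts):
    """Average frequency of the words in one sentence (0.0 if it has no words)."""
    if not words:
        return 0.0
    total = sum(word_counts.get(word.lower(), 0) for word in words)
    return total / len(words)


word_counts = {"cats": 3, "sleep": 2, "a": 1, "lot": 1}
short_score = score_sentence(["Cats", "sleep"], word_counts)              # (3 + 2) / 2
long_score = score_sentence(["Dogs", "sleep", "a", "lot"], word_counts)   # (0 + 2 + 1 + 1) / 4
print(short_score, long_score)
```

Dividing by the sentence length means a short sentence made of frequent words can outrank a longer one that merely contains more words.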
Sorting the Sentences
To generate a summary, we need to sort the sentences by score. We use the sorted function and pass a lambda function as the key argument; the lambda returns the negated score, so the highest-scoring sentences come first.
sorted_sentences = sorted(sentences, key=lambda x: -x[2])
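Negating the key is one way to get a descending order. Two alternatives from the standard library are sketched below on made-up tuples in the same (index, text, score) shape; the sample data is illustrative only.

```python
import heapq

# (index, text, score) tuples in the same shape as the sentences list
sentences = [
    (0, "First sentence.", 1.5),
    (1, "Second sentence.", 3.0),
    (2, "Third sentence.", 0.5),
    (3, "Fourth sentence.", 3.0),
]

# reverse=True sorts descending without negating the key; ties keep input order
by_score = sorted(sentences, key=lambda item: item[2], reverse=True)

# heapq.nlargest returns only the k best entries instead of sorting everything
top_two = heapq.nlargest(2, sentences, key=lambda item: item[2])

print([item[0] for item in by_score])
print([item[0] for item in top_two])
```

For a handful of sentences the difference is negligible; nlargest mainly pays off when only a few entries are needed from a long list.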
Generating the Summary
Finally, we generate the summary by selecting the three sentences with the highest scores. We iterate through the first three entries of sorted_sentences and append the text of each to a string variable called summary_text.
summary_text = ""
for sentence in sorted_sentences[:3]:
    summary_text += sentence[1] + " "
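The loop above emits the selected sentences in score order. Since each tuple also carries the sentence's original index, one possible refinement is to restore document order before joining, which often reads more naturally. This is a sketch of that design choice rather than part of the original code; build_summary and top_n are hypothetical names.

```python
def build_summary(sentences, top_n=3):
    """Pick the top_n highest-scoring sentences and join them in document order."""
    best = sorted(sentences, key=lambda item: item[2], reverse=True)[:top_n]
    # Re-sort the selected sentences by their original index (first tuple field)
    best.sort(key=lambda item: item[0])
    return " ".join(text for _, text, _ in best)


sentences = [
    (0, "Opening point.", 2.0),
    (1, "Minor detail.", 0.5),
    (2, "Key finding.", 3.0),
    (3, "Closing remark.", 1.0),
]
summary = build_summary(sentences, top_n=2)
print(summary)
```

Using join also avoids the trailing space that repeated string concatenation leaves at the end of the summary.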
Conclusion
In this tutorial, we have learned how to build a text summarizer using Python and the spaCy library. With this summarizer, we can extract key information from large texts and generate concise summaries. Feel free to experiment with different texts and explore further features of the spaCy library.
Thank you for reading, and happy coding!
Highlights:
- Building a text summarizer using Python and spaCy
- Preprocessing the text data
- Creating a word count dictionary
- Scoring the sentences
- Sorting the sentences
- Generating the summary
FAQ:
- Q: Can I use this text summarizer for languages other than English?
  A: Yes. spaCy provides pre-trained models for a number of other languages; download the one you need and load it in place of en_core_web_sm.
- Q: How can I improve the accuracy of the text summarizer?
  A: You can experiment with other techniques, such as ignoring stop words and punctuation when counting, using more advanced models, incorporating semantic analysis, or applying machine learning algorithms.
- Q: Can I apply this text summarizer to larger texts?
  A: Yes, but processing time and memory use grow with the length of the text, so for very large inputs it is worth optimizing the code and keeping these performance limits in mind.
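As a rough illustration of the last answer, very long inputs can be split into smaller pieces before processing. spaCy pipelines enforce a character limit through nlp.max_length, which defaults to 1,000,000 characters. The sketch below groups paragraphs into chunks; split_into_chunks and its max_chars default are assumptions chosen for illustration, not values from this article.

```python
def split_into_chunks(text, max_chars=100_000):
    """Group non-empty lines (paragraphs) into chunks of roughly max_chars characters."""
    chunks, current, size = [], [], 0
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Start a new chunk if adding this paragraph would exceed the limit
        if current and size + len(paragraph) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        # A single paragraph longer than max_chars is kept whole in its own chunk
        current.append(paragraph)
        size += len(paragraph) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks


pieces = split_into_chunks("aaaa\nbbbb\n\ncccc", max_chars=10)
print(pieces)
```

Each chunk could then be parsed separately, with the word counts merged across chunks so that sentence scores still reflect the whole text.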