Master sed: Delete Lines with Ease
Table of Contents
- Introduction
- Using sed to delete lines in a file that match a pattern
- Example: Deleting lines that don't start with five spaces
- Inverted matches using the exclamation point
- Real-world use case: Identifying CodeIgniter config item references
- Introduction to CodeIgniter and config items
- The challenge of multiple code styles
- Writing a script to scan the code base
- Using greedy and specific regular expression patterns
- Separating matches into separate files
- Comparing and analyzing the results using sort and diff
- Programmatically converting config references
- Conclusion
- FAQ
Using sed to delete lines in a file that match a pattern
In this section, we explore how to use the sed command to delete lines in a file that match a specific pattern. This is useful whenever lines need to be removed from a file based on a condition.
Example: Deleting lines that don't start with five spaces
To illustrate, consider a scenario where we want to delete all the lines in a file that don't start with five spaces. We can do this with the sed command and a regular expression pattern.
sed '/^[^ ]/d' filename
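In the above command, ^ anchors the match to the start of a line, [^ ] matches any single character that is not a space, and d is the sed command that deletes the matching lines. Strictly speaking, the command deletes every line whose first character is not a space. It gives the intended result only when every indented line in the file is indented by at least five spaces: a line indented by two spaces, or an empty line, is kept. Note also that sed writes the result to standard output and leaves the file itself unchanged unless it is run with an in-place option such as -i.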
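To make the behavior concrete, here is a minimal sketch that runs the command on a throwaway file. The file name and its contents are hypothetical, chosen only to show which lines survive:

```shell
# Build a hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'heading' \
  '     five spaces' \
  '  two spaces' \
  'another heading' > "$tmpdir/sample.txt"

# Delete every line whose first character is not a space.
result=$(sed '/^[^ ]/d' "$tmpdir/sample.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Both indented lines are printed, including the one with only two leading spaces, which shows that the pattern tests the first character rather than counting five spaces.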
Inverted matches using the exclamation point
In some cases, we need an inverted match: deleting the lines that don't match a given pattern. The exclamation point (!) negates the address that precedes it.
sed '/^[^ ]/!d' filename
Placing the exclamation point before the d command makes sed delete every line that does not match the pattern and keep the lines that do. The command above therefore keeps only the lines whose first character is not a space.
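The inverted form also allows the five-space requirement to be stated directly. The following is a sketch under the assumption that "starts with five spaces" means "starts with at least five spaces"; the sample file is again hypothetical:

```shell
# Build a hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'heading' \
  '     five spaces' \
  '  two spaces' \
  '      six spaces' > "$tmpdir/sample.txt"

# /^ \{5\}/ matches lines that begin with at least five spaces;
# the ! applies d to every line that does NOT match.
kept=$(sed '/^ \{5\}/!d' "$tmpdir/sample.txt")

# Equivalent formulation: suppress default output, print only matches.
printed=$(sed -n '/^ \{5\}/p' "$tmpdir/sample.txt")

printf '%s\n' "$kept"
rm -r "$tmpdir"
```

Only the lines with five and six leading spaces remain. The -n and p variant produces the same output and can be easier to read when the goal is to keep lines rather than delete them.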
Real-world use case: Identifying CodeIgniter config item references
Now, let's look at a real-world use case where sed and pattern matching became essential: identifying CodeIgniter config item references within a large codebase.
Introduction to CodeIgniter and config items
CodeIgniter is a PHP web framework widely used for developing web applications. Config items are key-value pairs that store application settings in CodeIgniter. Because coding practices vary, references to config items can appear in many different styles.
The challenge of multiple code styles
In a specific client project, we encountered a 10-year-old codebase with thousands of different config item references. Surprisingly, there were over a dozen different code styles used to reference these config items. Identifying and resolving these references manually would have been tedious and time-consuming.
Writing a script to scan the code base
To tackle the challenge, we decided to write a script that scans the entire codebase and identifies all the CodeIgniter config item references. The script used the grep command with a combination of greedy and more specific regular expression patterns.
Using greedy and specific regular expression patterns
We started with a greedy, deliberately broad match on the general pattern config, which caught a large number of config item references along with many false positives. As we studied the results more closely, we gradually tightened the regular expressions to be more specific, eliminating false positives.
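The two passes might look roughly like the sketch below. The miniature codebase is hypothetical, and the specific pattern covers only two call styles ($this->config->item(...) and config_item(...)) that are assumed here for illustration; the real project's dozen-plus styles are not listed in this article.

```shell
# Hypothetical miniature codebase in a temporary directory.
src=$(mktemp -d)
cat > "$src/a.php" <<'EOF'
<?php
$url = $this->config->item('base_url');
$lang = config_item('language');
// reconfigure the cache here
EOF

# Broad first pass: any line containing "config", false positives included.
greedy_count=$(grep -rn 'config' "$src" | wc -l | tr -d ' ')

# Tighter second pass: only the two call styles assumed for this sketch.
specific_count=$(grep -rnE 'config->item\(|config_item\(' "$src" | wc -l | tr -d ' ')

echo "greedy: $greedy_count, specific: $specific_count"
rm -r "$src"
```

The comment line containing "reconfigure" is matched by the broad pattern but not by the specific one, which is the kind of false positive the tightening step is meant to remove.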
Separating matches into separate files
To analyze the matches efficiently, we wrote the results to two files: one containing the matches from the initial greedy pattern and another containing the matches from the more specific pattern. These files followed a similar format to the example shown earlier.
Comparing and analyzing the results using sort and diff
To isolate the false positives, we used the sort and diff commands to compare the two files. The lines that appear only in the greedy results are the ones the specific pattern did not match, so each is either a false positive or a reference style the specific pattern does not yet cover. By iterating this process, we gradually removed all the false positives and obtained a refined list of legitimate config item references.
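A minimal sketch of the comparison step, assuming the two result files hold grep-style file:line:text records. The file names and records below are hypothetical:

```shell
# Hypothetical result files from the greedy and specific passes.
work=$(mktemp -d)
cat > "$work/greedy.txt" <<'EOF'
app/b.php:7:// reconfigure the cache here
app/a.php:2:$url = $this->config->item('base_url');
app/a.php:3:$lang = config_item('language');
EOF
cat > "$work/specific.txt" <<'EOF'
app/a.php:3:$lang = config_item('language');
app/a.php:2:$url = $this->config->item('base_url');
EOF

# diff compares line by line, so both inputs are sorted first.
sort "$work/greedy.txt" > "$work/greedy.sorted"
sort "$work/specific.txt" > "$work/specific.sorted"

# Lines prefixed with "< " exist only in the greedy results.
only_greedy=$(diff "$work/greedy.sorted" "$work/specific.sorted" | sed -n 's/^< //p')
printf '%s\n' "$only_greedy"
rm -r "$work"
```

Each line printed here needs a manual look: it is either a false positive to discard or a reference style that the specific pattern should be extended to cover. Running comm -23 on the same two sorted files is an alternative way to get the same list.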
Programmatically converting config references
With the refined list of config item references, we proceeded to programmatically convert them into a different format as part of the client's migration to a different framework component. This step, however, is beyond the scope of this article. Nevertheless, the removal of false positives greatly facilitated the conversion process and ensured the identification of all legitimate references.
Conclusion
In this article, we explored how to use the sed command to delete lines in a file that match, or don't match, a specific pattern. We also looked at a real-world use case where pattern matching played a crucial role in identifying CodeIgniter config item references within a complex codebase. By combining greedy and specific regular expression patterns, we were able to separate out the false positives and obtain accurate results.