I have several files with text that needs replacing. The text starts and ends with the same pattern each time, but the content in between the patterns is variable. The patterns can appear in the middle of lines, and the content between them often spans multiple lines.
There will only be a single occurrence of the start and end pattern in each file.
I need a command line method to replace the text between the patterns, including the patterns themselves. Outputting to a new file, or editing in place is fine.
A command that operates on a single file will work, as I can loop through the files and apply the command myself. I attempted a sed solution but could only manage to replace entire lines.
An example of text would be:
Cable Type ID:135, Installation ID:62, Alpha Conductor Origin:
Tolerance Report B74 - 3rd June 1996, Beta Conductor Origin:
Tolerance Report B74 - 3rd June 1996, Phase Conductor Size:
45mm, Security: Security-Start Bs86gKI-734Lw#32_nP/5589Zfb8Wj-
sW93j9b Security-End, Location ID:889, Protective Earth Size:
67mm, Protective Earth Max Current (A): 4, Overload Time...
The start pattern is Security-Start and the end pattern is Security-End. I want to replace the patterns and everything in between with the word REDACTED.
I would like the output to be:
Cable Type ID:135, Installation ID:62, Alpha Conductor Origin:
Tolerance Report B74 - 3rd June 1996, Beta Conductor Origin:
Tolerance Report B74 - 3rd June 1996, Phase Conductor Size:
45mm, Security: REDACTED, Location ID:889, Protective Earth Size:
67mm, Protective Earth Max Current (A): 4, Overload Time...
Please note that the text between the two patterns is fairly random in length and may be long enough to span several lines. This is not clear in the example above.
Any language which is available by default on an Ubuntu system will be fine. My first thoughts are 'sed' or 'awk', but whatever you're comfortable with will be fine.
It should work for you:
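A minimal sketch of such a command, assuming GNU sed and a span that crosses at most one line break. The temporary sample file is only there for illustration:

```shell
# Hypothetical sample: the span to redact crosses exactly one line break.
tmp=$(mktemp)
printf '%s\n' \
  '45mm, Security: Security-Start Bs86gKI-734Lw#32_nP/5589Zfb8Wj-' \
  'sW93j9b Security-End, Location ID:889, Protective Earth Size:' > "$tmp"

# On a line containing the start pattern, append the next line if there
# is one ($!N), then replace everything from start pattern to end pattern.
result=$(sed '/Security-Start/{$!N;s/Security-Start.*Security-End/REDACTED/;}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```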
/Security-Start/ searches for "Security-Start".
s/Security-Start.*Security-End/REDACTED/ is then applied to the final result.
For more than two lines, use this one:
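One way to cover a span of any length, sketched here on the assumption that the whole file fits comfortably in memory, is to collect every line in sed's hold space and substitute once on the last line. The sample file is hypothetical:

```shell
# Hypothetical sample: the span to redact covers three lines.
tmp=$(mktemp)
printf '%s\n' \
  'Size: 45mm, Security: Security-Start Bs86gKI-734Lw#32' \
  '_nP/5589Zfb8Wj-' \
  'sW93j9b Security-End, Location ID:889' > "$tmp"

# 1h   : copy the first line into the hold space
# 1!H  : append every later line to the hold space
# ${…} : on the last line, fetch the accumulated text (g),
#        substitute across the embedded newlines, and print it (p)
result=$(sed -n '1h;1!H;${g;s/Security-Start.*Security-End/REDACTED/;p;}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The greedy .* is safe here only because each file contains a single start/end pair.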
If the files are not too large, then you could use perl in slurp mode:
The -0777 command line parameter effectively unsets the record separator so that the whole file is slurped. The s regex modifier makes the . metacharacter match newline characters too, so the expression can match across lines.
Alternatively, with a sed loop:
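A sketch of one possible loop, assuming GNU sed (it relies on semicolons after branch commands and on GNU's behaviour of printing the pattern space when N hits end of input). The sample file is hypothetical:

```shell
# Hypothetical sample with unrelated text before and after the span.
tmp=$(mktemp)
printf '%s\n' \
  'Phase Conductor Size:' \
  '45mm, Security: Security-Start Bs86gKI-734Lw#32' \
  '_nP/5589Zfb8Wj-' \
  'sW93j9b Security-End, Location ID:889' \
  'Protective Earth Size: 67mm' > "$tmp"

# 1. Lines without the start pattern are printed unchanged (b).
# 2. If both patterns sit on the same line, substitute and finish (t).
# 3. Otherwise keep appending lines (N) and retrying the substitution;
#    t leaves the loop on success, ba jumps back to :a on failure.
result=$(sed -e '/Security-Start/!b' \
             -e 's/Security-Start.*Security-End/REDACTED/; t' \
             -e ':a' \
             -e 'N; s/Security-Start.*Security-End/REDACTED/; t; ba' \
             "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Unlike the slurp approach, this only accumulates the lines between the two patterns rather than the whole file.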
With GNU sed, you can replace t; ba (branch out on successful replacement; otherwise branch to :a) by Ta (branch to :a on unsuccessful replacement).
A more manual approach would be to replace all newline characters in the input file with NUL characters, use a simple perl non-greedy regex to do the replacement and then put the newlines back.
Here's how you could do it with awk:
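A minimal awk sketch, assuming a single start/end pair per file as stated in the question. The variable names start, stop, repl and inside are made up for this example, the sample file is hypothetical, and the patterns are matched as literal strings with index() rather than as regular expressions:

```shell
# Hypothetical sample with unrelated text before and after the span.
tmp=$(mktemp)
printf '%s\n' \
  'Phase Conductor Size:' \
  '45mm, Security: Security-Start Bs86gKI-734Lw#32' \
  '_nP/5589Zfb8Wj-' \
  'sW93j9b Security-End, Location ID:889' \
  'Protective Earth Size: 67mm' > "$tmp"

result=$(awk -v start='Security-Start' -v stop='Security-End' -v repl='REDACTED' '
!inside && (i = index($0, start)) {
    # Emit the text before the start pattern, then the replacement word.
    printf "%s%s", substr($0, 1, i - 1), repl
    $0 = substr($0, i + length(start))
    inside = 1
}
inside {
    # Drop everything until the end pattern turns up.
    if ((j = index($0, stop)) == 0) next
    $0 = substr($0, j + length(stop))
    inside = 0
}
{ print }   # print untouched lines and the remainder after the end pattern
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

If a file contained the start pattern but no end pattern, this sketch would drop everything after the start pattern, so it may be worth checking for that case first.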