cat -e file.txt
gives:
{"yellow":"mango"}^M$
^M$
{"yellow":"banana"}^M$
^M$
{"yellow":"blabla"}^M$
^M$
and I would like to just have:
{"yellow":"mango"}^M$
{"yellow":"banana"}^M$
{"yellow":"blabla"}^M$
in place, for all files with the .txt extension in a folder. So I tried:
find . -type f -name "*.txt" -print0 | xargs -0 sed -i "s/^M$^M$/^M$/g"
to no avail. Does anyone have a better idea?
head -n 3 file.txt | od -bc
yields:
0000000 173 042 171 145 154 154 157 167 042 072 042 155 141 156 147 157
{ " y e l l o w " : " m a n g o
0000020 042 175 015 012 015 012 173 042 142 141 142 141 142 042 072 042
" } \r \n \r \n { " b a b a b " : "
0000040 155 141 156 147 157 042 175 015 012
m a n g o " } \r \n
0000051
this:
awk 1 RS='\r\n' ORS= < file.txt
removes the newlines completely, so it's not good (I want to keep one of each pair of successive line endings), but at least it does something.
You can use
sed -z 's/\r\n\r\n/\r\n/g'
Normally, sed only works on one line at a time. With GNU sed's -z option, sed instead works on "lines" that are separated by NUL (0) bytes, which normally don't exist in a text file, so the whole file is treated as one line and the newlines can be replaced. (Found on Stack Overflow; explanation added.)
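To apply this in place to every .txt file in a folder, as the question asks, something along these lines should work. This is a minimal sketch assuming GNU sed (for -z and -i); the scratch directory and the file name a.txt are made up for the demo.

```shell
# Scratch directory with one sample file (hypothetical name)
demo=$(mktemp -d)
printf '{"yellow":"mango"}\r\n\r\n{"yellow":"banana"}\r\n\r\n' > "$demo/a.txt"

# -z: records are NUL-separated, so the whole file is a single record
# -i: edit the file in place
# Each doubled CRLF is collapsed into a single CRLF.
find "$demo" -type f -name '*.txt' -exec sed -i -z 's/\r\n\r\n/\r\n/g' {} +

# Inspect the bytes: every line should now end in a single \r \n
od -c < "$demo/a.txt"
```

Because the whole file becomes one record, sed holds it in memory, which may matter for very large files.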
You can also delete lines that contain only the carriage return.
With GNU Sed:
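A minimal sketch of what that could look like, assuming GNU sed (which understands \r in a regex and supports -i); the file name is hypothetical:

```shell
# Sample file in a scratch directory (hypothetical name)
demo=$(mktemp -d)
printf 'a\r\n\r\nb\r\n\r\n' > "$demo/file.txt"

# Delete every line that consists of nothing but a carriage return
sed -i '/^\r$/d' "$demo/file.txt"

# For every .txt file under the current directory, the same idea would be:
#   find . -type f -name '*.txt' -exec sed -i '/^\r$/d' {} +
```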
For a minimal but POSIX-compliant machine (here we need to generate the carriage return with printf):
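One way this might be written, as a sketch: POSIX sed has no -i, so the result goes to a temporary file that is then moved over the original. The variable name cr and the file names are assumptions, and mktemp is only demo scaffolding.

```shell
# Sample file in a scratch directory (hypothetical name)
demo=$(mktemp -d)
printf 'a\r\n\r\nb\r\n\r\n' > "$demo/file.txt"

# Generate a literal carriage return with printf
cr=$(printf '\r')

# Delete lines containing only a carriage return, then replace the original
sed "/^${cr}\$/d" "$demo/file.txt" > "$demo/file.txt.tmp" &&
  mv "$demo/file.txt.tmp" "$demo/file.txt"
```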
^ matches the start-of-line, and the last $ matches the end-of-line (\n). For example:
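A small illustration of the anchoring, as a sketch with made-up input: only the line that is exactly one carriage return is deleted, while a line holding a space plus a carriage return survives.

```shell
# Input lines: "a\r", "\r" (carriage return only), and " \r" (space + CR).
# In the sed script, the first $ starts the command substitution;
# the final \$ is the regex end-of-line anchor.
out=$(printf 'a\r\n\r\n \r\n' | sed "/^$(printf '\r')\$/d" | od -c)

# Show the byte dump of what is left
printf '%s\n' "$out"
```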
If it's okay to remove all blank lines, you can do:
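A minimal sketch of one way to do this, assuming a sed with POSIX character classes. The choice of sed is an assumption here and not necessarily the tool this answer had in mind; the file names are hypothetical.

```shell
# Sample file: "a", a CR-only line, a spaces+tab+CR line, "b"
demo=$(mktemp -d)
printf 'a\r\n\r\n  \t\r\nb\r\n' > "$demo/file.txt"

# Delete every line made up solely of whitespace
# ([[:space:]] covers spaces, tabs and carriage returns)
sed '/^[[:space:]]*$/d' "$demo/file.txt" > "$demo/out.txt"
```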
And if you prefer to overwrite your file(s), you can use the -i (in-place) switch:
The above line will copy the original files as *.bak files. If you don't care about having backups, then you can just leave out the .bak part, like this:
(You can even use wildcards, so instead of file1 file2 file3 ... you can write file* .)
The advantage of this approach is that it makes changes to your files all at once (instead of having to run it once for each file).
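The in-place variant with backups could be sketched like this, assuming a sed that accepts a suffix attached to -i (GNU and BSD sed do). Again, sed is an assumed stand-in for the tool, and the file names are made up.

```shell
# Two sample files in a scratch directory (hypothetical names)
demo=$(mktemp -d)
printf 'a\r\n\r\nb\r\n' > "$demo/file1.txt"
printf 'c\r\n \r\nd\r\n' > "$demo/file2.txt"

# -i.bak edits each file in place and keeps the original as <name>.bak;
# the wildcard handles all matching files in one invocation
sed -i.bak '/^[[:space:]]*$/d' "$demo"/file*.txt

# Lists the edited files alongside their .bak copies
ls "$demo"
```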
But remember: This will only keep lines that contain at least one non-whitespace character. So if a line consists only of five spaces, a tab, a carriage return, and a line-feed character, it won't be kept.
I think you could use awk's record separator (RS) and output record separator (ORS) to achieve the goal, which should be more efficient on very large files than the sed -z ... approach.
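A sketch of the RS/ORS idea, assuming an awk that accepts a regular expression as RS (gawk and mawk do; POSIX leaves multi-character RS unspecified). The file names are hypothetical.

```shell
# Sample file in a scratch directory (hypothetical name)
demo=$(mktemp -d)
printf '{"yellow":"mango"}\r\n\r\n{"yellow":"banana"}\r\n\r\n' > "$demo/file.txt"

# RS: any run of CRLF pairs ends a record
# ORS: write exactly one CRLF after each record
# The pattern 1 is always true, so every record is printed.
awk -v RS='(\r\n)+' -v ORS='\r\n' '1' "$demo/file.txt" > "$demo/file.txt.tmp" &&
  mv "$demo/file.txt.tmp" "$demo/file.txt"
```

awk reads one record at a time here, so it does not need to hold the whole file in memory.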
Using Raku (the language formerly known as Perl 6)
The example above only prints lines that contain non-whitespace characters (\S matches a single character that is not whitespace). A very readable version below:
HTH.
https://raku.org
https://rakudo.org/downloads