How can I use docker without sudo?

Question

Zanna

Asked: 2016-09-18 14:37:40 +0800 CST2016-09-18 14:37:40 +0800 CST 2016-09-18 14:37:40 +0800 CST

Series of sed commands work on command line, but not in a script

772

I'm working with the .csv output of this SE data query which looks like this (only with 5022 entries):

"{
  ""id"": 281952,
  ""title"": ""Flash 11.2 No Longer Supported by Google Play""
}"
"{
  ""id"": 281993,
  ""title"": ""Netbeans won't open in Ubuntu""
}"

(And it has line ^M endings between [number], and ""title""). I need it to look like this:

281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

I fixed this in a certain text editor which shall remain nameless quite easily, but I wanted to make a script so that I don't have to do it again every time the query is refreshed & so others can use it. I used sed...

This series of commands works perfectly (although it may well be inefficient; it is just a trial-and-error solution):

# Print the ^M and remove them, write to a new file:
cat -v QueryR* | sed 's/\^M//' > QueryNew
# remove all the other junk:
sed -i 's/{//' QueryNew
sed -i 's/}//' QueryNew
sed -i 's/""//g' QueryNew
sed -i 's/^"//' QueryNew
sed -i '/,/{N;/\n.*title:\s/{s/,\n.*title:\s/,\ /}}' QueryNew
sed -i 's/^\s\+//' QueryNew
sed -i '/^\s*$/d' QueryNew
sed -i 's/^id:\ //' QueryNew
sed -i 's/,\ /,/' QueryNew
sed -i 's/\\//g' QueryNew

So, why doesn't this? Only the ^M and {} get removed, and everything else is still there.

#!/bin/bash
cat -v QueryR* | sed 's/\^M//' > QueryNew
sed -i '{
       s/{//
       s/}//
       s/""//g
       s/^"//
       /,/{N;/\n.*title:\s/{s/,\n.*title:\s/,\ /}}
       s/^\s\+//
       /^\s*$/d
       s/^id:\ //
       s/,\ /,/
       s/\\//g
}' QueryNew

I'm sure my mistake is really obvious...

6 Answers

Voted

steeldriver · Answer 1 · 2016-09-18T15:28:12+08:00

Using cat -v to turn CR characters into literal ^M sequences seems fundamentally ugly to me - if you need to remove DOS line endings, use dos2unix, tr, or sed 's/\r$//'

If you insist on using sed, then I suggest you print the bits you do want, rather than trying to delete all the random bits you don't - for example

$ sed -rn -e 's/\"//g' -e 's/(.*): (.*)\r/\2/p' QueryR | paste -d '' - -
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

You could get fancy and roll the quote removal into the key-value extraction by matching zero or more quotes at each end of the value sequence

$ sed -rn 's/(.*): \"*([^"]*)\"*\r/\2/p' QueryR | paste -d '' - -
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

You could get really fancy and emulate the paste in sed by first joining pairs of lines on the ,\r$ ending and then matching the key-value pairs multiply (g) and non-greedily

$ sed -rn '/,\r$/ {N; s/([^:]*): \"*([^:"]*)\"*\r\n?/\2/gp}' QueryR
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

(Personally I'd favor the KISS approach and use the first one).

FWIW, since your input appears to be over-quoted JSON, I'd suggest installing a proper JSON parser such as jq

sudo apt-get install jq

You can then do something like

$ sed -e 's/["]["]/"/g' -e 's/"{/{/' -e 's/}"/}/' QueryR | jq '.id, .title' | paste -d, - -
281952,"Flash 11.2 No Longer Supported by Google Play"
281993,"Netbeans won't open in Ubuntu"

which removes the superfluous quotes and then uses jq to extract the fields of interest - note that jq seems to handle the DOS-style line endings, so there's no need to to take special steps to remove those.

Change to jq '.[]' to dump all the attribute-value pairs.

Credit for inspiration and basic jq syntax taken from Overcoming newlines with grep -o

Zanna · Answer 2 · 2016-09-18T23:47:58+08:00

Zanna

2016-09-18T23:47:58+08:002016-09-18T23:47:58+08:00

I fixed it thanks to steeldriver & further tinkering. Unrefined but works.

sed  '{
       s/"{//
       s/}"//
       s/^"//
       /,\r/{N;/\n.*title.*:\s/{s/,\r\n.*title.*:\s/,/}}
       s/""//g
       s/^\s\+//
       /^\s*$/d
       s/^id:\ //
       s/\\//g
}' QueryR* | tee "$1"

translation:
s/"{// Remove "{
s/}"// Remove }"
s/^"// Remove " from start of line
/,\r/{N;/\n.*title.*:\s/{s/,\r\n.*title.*:\s/,\ /}} match ,\r on one line and [whatever]title[whatever]: on the next line, replace all that with ,
s/""//g Remove all the remaining double double quotes
s/^\s\+// Remove whitespace from start of lines
/^\s*$/d Remove empty lines
s/^id:\ // Remove id: and space after it
s/\\//g Remove backslashes (escape chars for " added to some title fields)
tee "$1" specify an outfile when running the script, for example ./queryclean newquery.csv

5

kcdtv · Answer 3 · 2016-09-18T15:18:37+08:00

kcdtv

2016-09-18T15:18:37+08:002016-09-18T15:18:37+08:00

This is not exactly answering your question or solving your issue, but to get rid off the unwanted characters you can use tr:

cat QueryR | tr -d '}{:"'

and you'll get:

4

Sergiy Kolodyazhnyy · Answer 4 · 2016-09-19T03:43:53+08:00

Sergiy Kolodyazhnyy

2016-09-19T03:43:53+08:002016-09-19T03:43:53+08:00

While the question asks for sed, one could work around sed's issues with Python:

from __future__ import print_function
import sys

with open(sys.argv[1]) as f:
     for line in f:
         if '""id""' in line:
            print(line.strip().split(':')[1],end="")
         if '""title""' in line:
            title = " ".join(line.strip().split(':')[1:])
            print(title.replace('""'," "))

This code is compliant with both python2 and python3 , so either will work

Sample run:

bash-4.3$ cat questions.txt 
"{
  ""id"": 281952,
  ""title"": ""Flash 11.2 No Longer Supported by Google Play""
}"
"{
  ""id"": 281993,
  ""title"": ""Netbeans won't open in Ubuntu""
}"
bash-4.3$ python3 parse_questions.py questions.txt 
 281952,  Flash 11.2 No Longer Supported by Google Play 
 281993,  Netbeans won't open in Ubuntu

4

terdon · Answer 5 · 2016-09-19T03:46:33+08:00

terdon

2016-09-19T03:46:33+08:002016-09-19T03:46:33+08:00

Three more approaches:

awk

$ awk -F'": ' '/\"id\"/{id=$NF;} 
              /\"title\"/{
                t=$NF; 
                sub(/^""/,"",t); 
                sub(/""$/,"",t); 
                print id,t
              }' OFS="" file 
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

Perl

$ perl -lne '$id=$1 if /id"":\s*(\d+)/; 
             if(/title"":\s*""(.*)""/){print "$id,$1"}' file 
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

GNU grep with perl compatible regexes and simple perl:

$ grep -oP '(id"":\s*\K.*)|(title"":\s*""\K.*(?=""))' file | 
    perl -pe 'chomp if $.%2'
281952,Flash 11.2 No Longer Supported by Google Play
281993,Netbeans won't open in Ubuntu

4

Anwar · Answer 6 · 2016-09-29T10:35:51+08:00

Anwar

2016-09-29T10:35:51+08:002016-09-29T10:35:51+08:00

This is another script written in Ruby. It will retain the commas in title, which can be easily imported into any spreadsheet program without breaking the columns.

csvfile = File.open('query-fixed.csv', 'w')

File.open('QueryResults2.csv') do |f|
    content = f.read
    content.gsub!(/\r\n?/, "\n")
    content.each_line do |line|
        id, title = '', ''
        if line.match('\"id\"')
            id = line.split(':')[1].strip[0..-2]
            csvfile.write(id + ',')
        end
        if line.match('\"title\"')
            title = line.partition(':')[2].scan(/"(.*)"/)[0][0]
            csvfile.write(title + "\n")
        end
    end
end

After the program is run the produced output will look like these

281952,"Flash 11.2 No Longer Supported by Google Play"
281993,"Netbeans won't open in Ubuntu"

1

Series of sed commands work on command line, but not in a script

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?