
Question

BeMy Friend

Asked: 2014-10-28 20:15:34 +0800 CST

How can I delete a file if it starts with <html> in bash?

772

I need a bash command to delete the entire file if the file itself begins with <html>.

I'm not sure of the best way to go about this...

Context: I download a series of files via curl requests. Most of the time the downloads and processing work fine. But other times the download request results in a 404 for whatever reason. When that happens, the contents of the downloaded file begin with an <html> tag. When the rest of my processing hits this file, it hangs. So I want to run a command prior to my other processing to cat each of the files and delete the file if it starts with this <html> tag.

4 Answers


hvd · Answer 1 · 2014-10-29T04:32:45+08:00

To address the question that prompted you to ask this one, rather than the one you actually asked:

curl can tell you the status code in addition to downloading the file, so you do not need to check the file's contents for that. One way to check the status is:

status=$(curl -w '%{http_code}' "${url}" -o "${file}")
test "${status}" -eq 200 || rm -- "${file}"

The various options you can use with -w are documented in the manual, and depending on your needs, you may want to extend this to output more information and parse it, and/or change the check of the status code to allow more than merely 200.
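As a minimal sketch of the "allow more than merely 200" variation, assuming any 2xx status should count as success: the function names is_success_status and fetch_or_discard are hypothetical, and only curl's -s, -w and -o options come from curl itself.

```shell
# Hypothetical helper: succeed only when the status code ($1) is 2xx.
is_success_status() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: download url ($1) to file ($2) and delete the
# file again unless the server answered with a 2xx status code.
fetch_or_discard() {
    local url=$1 file=$2 status
    # -s silences the progress meter; -w prints the status code to stdout
    status=$(curl -s -w '%{http_code}' -o "$file" "$url")
    if ! is_success_status "$status"; then
        rm -f -- "$file"
        return 1
    fi
}
```

It could be called as fetch_or_discard "${url}" "${file}" || echo "skipped ${url}". When curl cannot get an HTTP response at all it reports 000 as the code, which this sketch also treats as a failure.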

20

Sylvain Pineau · Answer 2 · 2014-10-28T23:35:35+08:00

You could use this find command to delete all files whose first line consists solely of <html>:

find . -type f -exec sh -c 'sed q "$0" | grep -qP "^<html>$" && rm "$0"' {} \;
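Before deleting anything, it may help to list what would be removed. The following is a sketch of a dry-run variant of the same idea, not a replacement for the command above; the function name list_html_files is hypothetical, and it assumes head and grep with the standard -x and -F options.

```shell
# Hypothetical helper: print (do not delete) regular files under the
# directory $1 (default: .) whose first line is exactly <html>.
list_html_files() {
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            # head reads only the first line; grep -x requires a whole-line match
            if head -n 1 < "$f" | grep -qxF "<html>"; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Running list_html_files . prints one matching path per line, which can be reviewed before running the deleting command.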

12

Seth · Answer 3 · 2014-10-28T20:21:34+08:00

I just tested this, it works.

Enable nullglob first, so that the * glob expands to nothing when no files match (using a glob also means we don't have to parse ls):

shopt -s nullglob

then use a simple bash for loop to find files that begin with <html> and remove them:

for i in *; do if [ "$(head -n 1 -- "$i")" == '<html>' ]; then rm -- "$i"; fi; done

It would be safer to use:

for i in *; do if [ "$(head -n 1 -- "$i")" == '<html>' ]; then rm -i -- "$i"; fi; done

to have rm ask before removing any files, just in case.

Note that shopt isn't strictly needed, but with nullglob set the loop simply does not run in an empty directory, instead of running once on a literal *.
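Where nullglob is not available (for example in a plain sh script), a file test inside the loop can serve a similar purpose. This is a sketch under that assumption; the function name remove_html_files is hypothetical.

```shell
# Hypothetical helper: delete regular files in the directory $1
# (default: .) whose first line is exactly <html>.
remove_html_files() {
    for f in "${1:-.}"/*; do
        # An unmatched glob stays a literal pattern; -f skips it,
        # and also skips directories.
        [ -f "$f" ] || continue
        if [ "$(head -n 1 < "$f")" = '<html>' ]; then
            rm -- "$f"
        fi
    done
}
```

Like the loop above, this ignores hidden files, because * does not match names that start with a dot.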

8

Siyuan Ren · Answer 4 · 2014-11-02T08:07:46+08:00

Not every automation task should be done in shell. Here is a Python script instead:

#!/usr/bin/env python3
import os

def is_html_file(file_name):
    # Actually, try/except is better,
    # but it is less readable for someone not familiar with Python
    if not os.path.isfile(file_name):
        return False
    with open(file_name, 'rb') as f:
        # Many HTML files start with a doctype;
        # it would be better to check for that too.
        # The file is opened in binary mode, so compare against bytes.
        return f.read(6) == b'<html>'

def main():
    # Use os.walk if recursion is needed
    for fn in os.listdir('.'):
        if is_html_file(fn):
            print('Removing', fn, '...')
            os.remove(fn)

main()

It may be more verbose than the equivalent bash commands, but it is:

More readable
More extensible
Never ever going to be screwed up by file names with spaces and shell metacharacters, however careless you are.
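Since the question asked for bash, here is a shell sketch of the doctype check that the script's comment suggests. The function name looks_like_html is hypothetical, and the rule it applies (a first line starting with <html or <!doctype html, in any letter case) is an assumption about what counts as an HTML error page, not something the downloads are known to guarantee.

```shell
# Hypothetical helper: succeed if the first line of file $1 starts with
# <html or <!doctype html, ignoring letter case.
looks_like_html() {
    local first_line
    # Lowercase the first line so that <HTML> and <!DOCTYPE html> also match
    first_line=$(head -n 1 < "$1" | tr '[:upper:]' '[:lower:]')
    case "$first_line" in
        '<html'*|'<!doctype html'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as looks_like_html "$file" && rm -- "$file".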

1
