I have a file storage directory used together with a MySQL DB.
Some of the files in the directory are orphaned (i.e. created by mistake, deleted in the DB but not on disk, or otherwise unused).
I was able to generate a list of such files (without file extensions), but now what is the best way to move them out of the storage directory? The problem is that the storage is multi-level, so I have to find each file first somehow.
Sample of the orphan list content (200K lines in total):
10218
10219
10220
10221
10370
10371
10372
10373
10374
Directory structure (example):
If you wonder how I ended up with such a file (a rough sketch of these steps follows below):
- first, saved a list of files in the directory recursively to one file, per https://stackoverflow.com/a/5456136/505984
- second, dumped the DB table IDs to another file with the MySQL CLI (because each filename without the extension matches the DB record ID)
- diffed the two files as advised here: https://stackoverflow.com/a/25407317/505984
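
For illustration, a minimal sketch of those three steps; the storage path, database, table, and column names here are placeholders, and `comm` is used for the comparison:

```bash
# All names below (path/to/storage, mydb, files, id) are placeholders.
# 1. Basenames of stored files, with the extension stripped:
find path/to/storage -type f -printf '%f\n' | sed 's/\.[^.]*$//' | sort -u > disk-ids.txt

# 2. Record IDs from the database:
mysql -N -B -e 'SELECT id FROM files' mydb | sort -u > db-ids.txt

# 3. IDs present on disk but not in the DB -> the orphan list:
comm -23 disk-ids.txt db-ids.txt > tmp/orphans
```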
I will concentrate on making the selection because you said:
Assuming the bash shell:
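A minimal sketch of that selection, assuming the storage root is `path/to/storage`, the orphan list is `tmp/orphans`, and the stored files are PDFs (hence the `.pdf` suffix in the clarification below):

```bash
# List every path in the tree whose name matches an orphan ID followed by ".pdf"
find path/to/storage | grep -f <(awk '{print $0".pdf"}' tmp/orphans)
```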
Clarification:

- `find` makes a list of all files in the tree.
- `grep -f somefile` applies a filter to the piped output of `find`.
- `<( something )` is an ephemeral file.
- `awk '{print $0".pdf"}'` appends ".pdf" to every line in the orphans list, so `grep -f` does not match directory names.
- `tmp/orphans` is the input list of orphans.

Your question isn't exactly clear, but assuming you want to
- turn a list of files without extension into a list of patterns that will match files with arbitrary extensions
- pass that list of patterns efficiently to a sequence of `find` commands

then you could try something like this, using `sed` and GNU `parallel` (available from the Ubuntu universe repository):
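A sketch of that kind of invocation; `path/to/data` and `tmp/orphans` are placeholders, and the exact quoting in the original may differ:

```bash
# srcdir: top of the storage tree (placeholder); tmp/orphans: the ID list
export srcdir=path/to/data

# Turn each ID into a glob pattern (10218 -> 10218.*) and let parallel -X pack
# as many "-name <pattern> -o" triples as possible into each find invocation.
sed 's/$/.*/' tmp/orphans |
  parallel -X -I%% --env srcdir \
    find '"$srcdir"' '\(' %% -false '\)' -print \
    ::: -name :::: - ::: -o
```

Here `:::` and `::::` supply the three input sources described below, with `:::: -` reading the `sed` output from standard input.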
With the `-X` option, `parallel` will try to fit as many arguments as possible into each `find` invocation; the arguments are drawn from three input sources:

- the string `-name`
- a line from standard input
- the string `-o`

thus building up an argument list that looks like

`-name '1406.*' -o -name '6179.*' -o -name '17526.*' -o ...`

(which you can verify by adding parallel's `--dry-run` option). The final `-false` predicate mops up the trailing `-o`, and `-I%%` changes parallel's replacement string from the default `{}` so that it does not conflict with find's own use of `{}` in `-exec`.

This is certainly more efficient than running `find` 200,000 times, but it may not be as efficient as running a single `find` command and grepping the result, as suggested in this answer. On my 64-bit Ubuntu 24.04 VM it manages to cram a file of 200,000 integer IDs (generated from the bash `$RANDOM` variable) into approximately 100 invocations of `find`.
If that successfully identifies the files, you can change `-print` to something like

`-exec mv -vnt "$dstdir" {} +`

where, similarly to `srcdir`, you `export dstdir=path/to/newdata` and pass it to `parallel` via `--env dstdir`:
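Again only a sketch, with the same placeholder paths as above and `dstdir` assumed to already exist:

```bash
# dstdir: destination for the orphaned files (placeholder); srcdir exported as before
export dstdir=path/to/newdata

sed 's/$/.*/' tmp/orphans |
  parallel -X -I%% --env srcdir --env dstdir \
    find '"$srcdir"' '\(' %% -false '\)' -exec mv -vnt '"$dstdir"' {} + \
    ::: -name :::: - ::: -o
```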
(you can use `-execdir` in place of `-exec`, provided `$dstdir` is an absolute path).