I have a server which receives a file per client each day into a directory. The filenames are constructed as follows:
uuid_datestring_other-data
For example:
d6f60016-0011-49c4-8fca-e2b3496ad5a7_20160204_023-ERROR
uuid is a standard format uuid.
datestring is the output from date +%Y%m%d.
other-data is variable in length but will never contain an underscore.
I have a file of the format:
#
d6f60016-0011-49c4-8fca-e2b3496ad5a7 client1
d5873483-5b98-4895-ab09-9891d80a13da client2
be0ed6a6-e73a-4f33-b755-47226ff22401 another_client
...
I need to check that every uuid listed in the file has a corresponding file in the directory, using bash.
I've got this far, but feel like I'm coming from the wrong direction by using an if statement, and that I need to loop through the files in the source directory.
The source_directory and uuid_list variables have been assigned earlier in the script:
# Check the entries in the file list
while read -r uuid name; do
    # Ignore comment lines
    [[ $uuid = \#* ]] && continue
    if [[ -f "${source_directory}/${uuid}*" ]]
    then
        echo "File for ${name} has arrived"
    else
        echo "PANIC! - No File for ${name}"
    fi
done < "${uuid_list}"
How should I check that the files in my list exist in the directory? I'd like to use bash functionality as far as possible, but am not against using commands if need be.
Walk over the files and build an associative array keyed on the uuids contained in their names (I used parameter expansion to extract the uuid). Then read the list, check the associative array for each uuid, and report whether the file was recorded or not.
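The two passes described above could look roughly like the sketch below. It is an illustration under assumptions, not necessarily the code this answer originally showed: check_uuids is a hypothetical function name, the messages are borrowed from the question, and associative arrays need bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch only: check_uuids is a hypothetical helper name.
# usage: check_uuids SOURCE_DIRECTORY UUID_LIST
check_uuids() {
    local source_directory=$1 uuid_list=$2
    local file base uuid name
    local -A seen=()

    # Pass 1: record the uuid part (everything before the first "_")
    # of every regular file in the directory.
    for file in "$source_directory"/*; do
        [[ -f $file ]] || continue          # also skips the literal "*" of an empty directory
        base=${file##*/}                    # strip the directory part
        [[ $base = ?*_* ]] || continue      # ignore names without a "uuid_" prefix
        seen[${base%%_*}]=1                 # strip from the first underscore onwards
    done

    # Pass 2: look each listed uuid up in the array.
    while read -r uuid name; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blank and comment lines
        if [[ -n ${seen[$uuid]:-} ]]; then
            echo "File for ${name} has arrived"
        else
            echo "PANIC! - No File for ${name}"
        fi
    done < "$uuid_list"
}
```

It would be called as check_uuids "$source_directory" "$uuid_list". The directory is scanned once, however many uuids are listed.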
Here's a more "bashy" and concise approach:
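One concise way to keep the question's loop is sketched below; it is an assumption about what such an approach might look like, not necessarily the code originally posted here. The question's test, [[ -f "${source_directory}/${uuid}*" ]], can never succeed because [[ ]] performs no pathname expansion and the quoted * is taken literally. The sketch replaces it with the bash builtin compgen -G, which expands a glob pattern and succeeds only if something matches. check_by_glob is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: check_by_glob is a hypothetical helper name.
# usage: check_by_glob SOURCE_DIRECTORY UUID_LIST
check_by_glob() {
    local source_directory=$1 uuid_list=$2 uuid name
    while read -r uuid name; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blank and comment lines
        # compgen -G prints the matches of the pattern and returns non-zero
        # when there are none. It matches any kind of directory entry,
        # not only regular files.
        if compgen -G "${source_directory}/${uuid}_*" > /dev/null; then
            echo "File for ${name} has arrived"
        else
            echo "PANIC! - No File for ${name}"
        fi
    done < "$uuid_list"
}
```

Every uuid triggers its own scan of the directory, so the running time grows with the length of the list.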
Note that while the above is pretty and will work fine for a few files, its speed depends on the number of UUIDs and will be very slow if you need to process many. If that is the case, either use @choroba's solution or, for something truly fast, avoid the shell and call perl:
Just to illustrate the time differences, I tested my bash approach, choroba's and my perl on a file with 20000 UUIDs of which 18001 had a corresponding file name. Note that each test was run by redirecting the script's output to /dev/null.
My bash (~3.5 min)
Choroba's (bash, ~0.7 sec)
My perl (~0.1 sec)
This is pure Bash (i.e. no external commands), and it's the most concise approach that I can think of.
But performance-wise it is really not much better than what you currently have.
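A minimal pure-Bash sketch of this kind of check follows. It is a reconstruction under assumptions rather than the original script: report_missing is a hypothetical name and the wording of the message is invented for illustration. It takes the list file first and the directory second, and only reports uuids that have no file.

```shell
#!/usr/bin/env bash
# Sketch only: report_missing is a hypothetical helper name.
# usage: report_missing LIST_FILE DIRECTORY
report_missing() {
    local list=$1 dir=$2 uuid rest matches
    while read -r uuid rest; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blank and comment lines
        # Expand the glob into an array. Without a match the pattern stays
        # literal (or the array is empty under nullglob), so the -e test
        # on the first element fails in both cases.
        matches=( "$dir/$uuid"* )
        [[ -e ${matches[0]:-} ]] || printf 'No file found for %s\n' "$uuid"
    done < "$list"
}
```

Saved in a file whose last line is report_missing "$@", it can be run as a script with the list file and the directory as its two arguments.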
It reads each line from path/to/file; for each line, it stores the first field in $uuid and prints a message if no file matching the pattern path/to/directory/$uuid* is found.
Call it with path/to/script path/to/file path/to/directory.
Sample output using the sample input file in the question on a test directory hierarchy containing the sample file in the question:
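A different take lets the shell's own redirection do the existence check. The sketch below is an assumption about how that can be written, not a verbatim copy of any answer: announce_arrivals is a hypothetical name, grep is used only to drop comment lines, and the behaviour relies on bash (outside POSIX mode) expanding a glob in a redirection target and refusing it when it matches more than one file.

```shell
#!/usr/bin/env bash
# Sketch only: announce_arrivals is a hypothetical helper name.
# usage: announce_arrivals SOURCE_DIRECTORY UUID_LIST
announce_arrivals() {
    local dir=$1 list=$2
    set -f                               # no globbing while the list is split into words
    set -- $(grep -v '^#' "$list")       # positional parameters: uuid1 name1 uuid2 name2 ...
    set +f
    while [ "$#" -gt 1 ]; do
        # Opening the glob for reading fails, with a diagnostic from bash on
        # stderr, if nothing matches or if more than one file matches.
        : < "$dir/$1"_* && printf 'File for %s has arrived\n' "$2"
        shift 2
    done
}
```

Only the arrivals are printed on stdout; the complaints about missing or ambiguous files come from bash itself on stderr.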
The idea here is not to worry about reporting errors the shell will report for you. If you try to < open a file which doesn't exist, your shell will complain. In fact, it will prepend your script's $0 and the line number on which the error occurred to the error output when it does. This is good information that is provided by default already, so don't bother.
You also don't need to take the file in line-by-line like that; it can be awfully slow. This expands the whole thing in a single shot out to a whitespace-delimited array of arguments and handles them two at a time. If your data is consistent with your example, then $1 will always be your uuid and $2 will be your $name. If bash can open a match to your uuid, and only one such match exists, then printf happens. Otherwise it doesn't, and the shell writes diagnostics to stderr about why.
The way I'd approach it is to get the uuids from the file first, then use find. For readability, the command is best split over several lines.
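As a rough sketch of the find approach, assuming the list format from the question: the first column of every non-comment line is used as a name prefix for find. find_listed is a hypothetical name, and -maxdepth is an extension supported by GNU and BSD find rather than a POSIX option.

```shell
#!/usr/bin/env bash
# Sketch only: find_listed is a hypothetical helper name.
# usage: find_listed SOURCE_DIRECTORY UUID_LIST
find_listed() {
    local dir=$1 list=$2 uuid rest
    # Take the first column of every non-comment line ...
    grep -v '^#' "$list" |
    while read -r uuid rest; do
        [ -n "$uuid" ] || continue       # skip blank lines
        # ... and let find print the files whose names start with that uuid.
        find "$dir" -maxdepth 1 -type f -name "${uuid}_*"
    done
}
```

This prints the path of every file that belongs to a listed uuid and prints nothing for uuids that have no file.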
Example with a list of files in /etc/, looking for passwd, group, fstab, and THISDOESNTEXIST filenames.
Since you've mentioned the directory is flat, you could use the -printf "%f\n" option to just print the filename itself.
What this doesn't do is list missing files.
find's small disadvantage is that it doesn't tell you if it doesn't find a file, only when it matches something. What one could do, however, is check the output: if the output is empty, then we have a file missing.
More readable:
And here's how it performs as a small script:
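A small script built on the empty-output idea might look like the sketch below. It is a reconstruction under assumptions: check_with_find is a hypothetical name, the messages are taken from the question, and -maxdepth is a GNU/BSD find extension.

```shell
#!/usr/bin/env bash
# Sketch only: check_with_find is a hypothetical helper name.
# usage: check_with_find SOURCE_DIRECTORY UUID_LIST
check_with_find() {
    local dir=$1 list=$2 uuid name found
    while read -r uuid name; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blank and comment lines
        # Capture whatever find prints for this uuid.
        found=$(find "$dir" -maxdepth 1 -type f -name "${uuid}_*")
        if [[ -n $found ]]; then
            echo "File for ${name} has arrived"
        else
            # Empty output means find matched nothing.
            echo "PANIC! - No File for ${name}"
        fi
    done < "$list"
}
```

One find process is started per uuid, so this is convenient for short lists rather than fast for long ones.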
One could use stat as an alternative, since it's a flat directory, but the code below won't work recursively for subdirectories if you ever decide to add those:
If we take the stat idea and run with it, we could use the exit code of stat as an indication of whether a file exists or not. Effectively, we want to do this:
Sample run:
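The exit-code idea can be sketched as follows, again as an illustration under assumptions rather than the original code: check_with_stat is a hypothetical name and the messages come from the question. The shell expands the glob before stat runs; when nothing matches, stat receives the literal pattern, fails, and returns a non-zero status.

```shell
#!/usr/bin/env bash
# Sketch only: check_with_stat is a hypothetical helper name.
# usage: check_with_stat SOURCE_DIRECTORY UUID_LIST
check_with_stat() {
    local dir=$1 list=$2 uuid name
    while read -r uuid name; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blank and comment lines
        # Only the exit status matters, so all output is discarded.
        if stat -- "$dir/$uuid"_* > /dev/null 2>&1; then
            echo "File for ${name} has arrived"
        else
            echo "PANIC! - No File for ${name}"
        fi
    done < "$list"
}
```

Because the pattern only covers the top level of the directory, files in subdirectories are not seen.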