Is there a name-parsing tool that is part of the official RPM tool package?
I have a list of filenames. Each is the filename of an RPM package. I don't have the actual packages, just the filenames. For each I need to extract the package name and version ($NAME and $VERSION). I need this because I am writing a script that makes sure that "yum install $NAME" installs $VERSION. This is part of a system that builds packages and verifies they are properly uploaded.
The list of filenames looks like:
$ cat /tmp/packages.txt
/home/builder/packages/testing-dev/CentOS/6/x86_64/emacs-mercurial-2.8-3.el6.x86_64.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/emacs-mercurial-el-2.8-3.el6.x86_64.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/mercurial-2.8-3.el6.x86_64.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/mercurial-hgk-2.8-3.el6.x86_64.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/python-redis-2.8.0-2.el6.noarch.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/redis-2.6.16-1.el6.1.x86_64.rpm
/home/builder/packages/testing-dev/CentOS/6/x86_64/sei_dnsmaster-1.0-99.el6.x86_64.rpm
I found the following code which is a BASH function that does the task:
function parse_rpm() {
    RPM=$1
    B=${RPM##*/}; B=${B%.rpm}   # strip directory and .rpm
    A=${B##*.}; B=${B%.*}       # arch
    R=${B##*-}; B=${B%-*}       # release
    V=${B##*-}; B=${B%-*}       # version
    N=$B                        # name
    echo "$N $V $R $A"
}
for i in $(</tmp/packages.txt); do
    parse_rpm "$i"
done
It works. Mostly. There are some exceptions:
$ parse_rpm CentOS/6/x86_64/sei_dnsmaster-1.0-99.el6.x86_64.rpm
sei_dnsmaster 1.0 99.el6 x86_64
Notice that it didn't get the version correctly (it should be 1.0-99).
I'm wondering: (1) Is there a tool in the rpmdev package that does this correctly? (2) If not, is there an official regex I could use? (3) What is the Python equivalent of that regex?
Thanks in advance!
You don't need to do any of this; RPM has a query format argument which will let you specify exactly the data you want to receive. It will even output without line endings if you don't specify them.
For instance:
The complete list of variables you can use can be obtained with rpm --querytags.
Note that in the case of RELEASE, output like 84.el6 is normal and expected, since this is actually how RPM packages are versioned when packaged by or for a distribution.
I've been told the official way to do what I'm seeking is in Python:
I've written a short Python program that does what I need. I will offer the script to the rpmdev project for inclusion.
I worked out regular expressions that fit all the data I was able to test them with. I had to use a mixture of greedy and non-greedy matches. That said, here are my Perl and Python versions:
Perl:
Python:
I'd rather have a regex that comes from the RPM project. The one that I invented above will have to do for now.
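For readers who want something concrete to test against the sample list, here is a minimal Python sketch of a regex that mixes a greedy name with hyphen-free version and release fields. The pattern and the function name parse_rpm_filename are assumptions made for illustration; this is not the regex referred to above and it does not come from the RPM project.

```python
import re

# Assumed pattern (illustrative only, not from the RPM project):
# - name is greedy and may contain hyphens
# - version and release contain no hyphens
# - arch is the last dot-separated field before ".rpm"
RPM_FILENAME = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^-.]+)\.rpm$"
)


def parse_rpm_filename(path):
    """Return (name, version, release, arch), or None if the pattern does not match."""
    basename = path.rsplit("/", 1)[-1]  # drop any leading directories
    m = RPM_FILENAME.match(basename)
    if m is None:
        return None
    return m.group("name", "version", "release", "arch")
```

Like any filename-only approach, this can only report what the filename says, which may differ from the metadata stored inside the package.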
RPM files can have some funky filenames in extreme cases, but generally you can split the NVR on the hyphens. The catch is that the N (name) portion of the NVR may contain hyphens and underscores, but the V (version) and R (release) are guaranteed not to contain any extraneous hyphens. So you can start by trimming off the VR portion to derive the name.
Building on that, you can isolate the version and release portion.
Just split on the hyphen again to isolate the part you need, and clean out the arch and .rpm file extension strings, which is a given. This is just to give you an idea of how it could be approached in bash.
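The same idea can be sketched in Python by splitting from the right, so that hyphens inside the name are left alone. The function name split_nvra is hypothetical, and the sketch assumes the filename follows the usual name-version-release.arch.rpm layout.

```python
def split_nvra(path):
    """Split an RPM filename into (name, version, release, arch).

    Assumes the usual name-version-release.arch.rpm layout; raises
    ValueError if the filename has fewer than two hyphens or no dot.
    """
    basename = path.rsplit("/", 1)[-1]        # drop leading directories
    if basename.endswith(".rpm"):
        basename = basename[: -len(".rpm")]   # drop the file extension
    nvr, arch = basename.rsplit(".", 1)       # arch is the last dotted field
    # Version and release contain no hyphens, so the last two hyphens
    # separate them from the name, which may itself contain hyphens.
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch
```

Splitting from the right is what makes names such as emacs-mercurial-el come out whole.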
Use the -q --queryformat options from rpm as said before. If you want to do this on a package that is not installed, you can specify the RPM file with the -p option, like this:
e.g.
gives me
So just splitting the filename is wrong!
So pay attention: the filename does not carry the correct details of the RPM, e.g. 1.fedora is actually 1.fc10 in the rpm.
IMHO the simplest shell way is:
That is: reverse each line, using slash cut just the first part (emanelif), then using hyphen cut all but the first two parts (i.e. leave behind ESAELER including emanelif eth fo tser and NOISREV) and reverse the enil back.
With your example file:
Getting the other parts is an exercise in reading cut(1).
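For comparison, the reverse, cut, reverse steps described above can be mirrored in Python. This is a sketch that assumes the pipeline is rev, then cut on "/" keeping the first field, then cut on "-" dropping the first two fields, then rev again; the function name name_via_reverse_cut is made up for illustration, and the edge-case behaviour of cut on lines without delimiters is not reproduced.

```python
def name_via_reverse_cut(line):
    """Extract the package name by reversing, cutting, and reversing back."""
    reversed_line = line.strip()[::-1]            # rev
    reversed_file = reversed_line.split("/")[0]   # cut on "/", keep field 1 (the filename)
    fields = reversed_file.split("-")             # cut on "-"
    reversed_name = "-".join(fields[2:])          # drop fields 1-2 (arch+release, version)
    return reversed_name[::-1]                    # rev
```

Reversing first turns "the last two hyphen-separated fields" into "the first two", which is something cut can express directly.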
If you're familiar with Regular Expressions and/or Perl, that's quite easy.
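or the regex alone:
If you split it, that's:
Anything but a hyphen, one or more times: [^\-]+ (escaped because hyphens have a special meaning in character classes)
The same, non-greedy: [^\-]+?
Captured in a group: ([^\-]+?)
Followed by a hyphen: ([^\-]+?)-
Followed by anything (up to .rpm): ([^\-]+?)-(.*).rpm$ (the dollar means "end of line")
As a Perl match: m#([^\-]+?)-(.*).rpm$#
Done! Just get both parts from the variables $1 and $2.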
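To see what the two capture groups hold, the same pattern can be tried with Python's re module. This is only a sketch for experimentation; the function name split_with_answer_regex is hypothetical, and the pattern is copied unchanged from the breakdown above.

```python
import re

# Pattern copied as-is from the breakdown above.
PATTERN = re.compile(r"([^\-]+?)-(.*).rpm$")


def split_with_answer_regex(filename):
    """Return the two captured groups (Perl's $1 and $2), or None if no match."""
    m = PATTERN.search(filename)
    return m.groups() if m else None


print(split_with_answer_regex("mercurial-2.8-3.el6.x86_64.rpm"))
# ('mercurial', '2.8-3.el6.x86_64')
```

Because the first group stops at the first hyphen, a name that itself contains hyphens (for example emacs-mercurial) ends up split between the two groups, so it is worth checking the pattern against your own list before relying on it.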
Comment on the first one-liner:
I was in a directory with many rpm files, hence the ls.
perl -p is equivalent to wrapping the code in a while (<>) { ... } loop that prints $_ after each line, which explains why I had to put an empty string in $_ to stop perl from printing the line back after I had extracted and custom-printed it. Note that I could have used substitutions to avoid this little 'hack'.
You can utilize dnf info. Here is an example Bash script to get the values and set them as variables:
It will give a result even if the package is not installed.
By far the easiest way is to create a regex that does one thing: work ;-) The substring regex we're discussing:
That's the substring regex in Perl!
Example:
Output:
4.8.5-36.0.1.el7_6.2.x86_64
So the version is right. The regex covers 99.9% of RPM package names in a single line ;-)
Short explanation:
(?:) Non-capturing group (I don't want this part; always use it for speed)
\w+: One or more word characters (including _)
?: Optional match
\S+: One or more non-space characters
{1,}: Match the whole non-capturing group one or more times
Worx for you and for me !