I got InfiniBand running on RHEL 6.3:
[root@master ~]# ibv_devinfo
hca_id: mthca0
transport: InfiniBand (0)
fw_ver: 4.7.927
node_guid: 0017:08ff:ffd0:6f1c
sys_image_guid: 0017:08ff:ffd0:6f1f
vendor_id: 0x08f1
vendor_part_id: 25208
hw_ver: 0xA0
board_id: VLT0060010001
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 2
port_lid: 3
port_lmc: 0x00
link_layer: InfiniBand
port: 2
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 512 (2)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
But it only works as root.
When I try as a non-root user, I get nothing:
[nicolas@master ~]$ ibv_devices
device node GUID
------ ----------------
mthca0 001708ffffd06f1c
So, how do I allow regular users to use InfiniBand?
OK, this is a bug in the RHEL 6.3 release.
A udev rule is missing:
/etc/udev/rules.d/90-rdma.rules
See https://www.centos.org/modules/newbb/viewtopic.php?topic_id=38586&forum=55
It is better to simply update the package with the repaired version, rdma-3.3-4. More details here: http://rhn.redhat.com/errata/RHBA-2012-1423.html
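To check whether a machine is affected, something like the following sketch may help. It assumes the verbs device nodes live in the usual /dev/infiniband directory. `check_rdma_perms` is just a name made up for this example, and it only reports what it finds without changing anything.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper, not part of the rdma package).
# Reports whether the udev rules file exists and lists the device nodes
# so their permissions can be inspected.

check_rdma_perms() {
    # Both arguments are optional; the defaults are the usual locations.
    rules_file="${1:-/etc/udev/rules.d/90-rdma.rules}"
    dev_dir="${2:-/dev/infiniband}"

    if [ -f "$rules_file" ]; then
        echo "rules file present: $rules_file"
    else
        echo "rules file missing: $rules_file"
    fi

    if [ -d "$dev_dir" ]; then
        # A non-root user needs read/write access to the device nodes here.
        ls -l "$dev_dir"
    else
        echo "no device directory: $dev_dir"
    fi
}

check_rdma_perms "$@"
```

Run it once as root and once as a regular user and compare the permissions shown for the device nodes.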
Here is more complete information for anyone trying to solve this issue, which I faced on RHEL 6.3 (Linux 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux):
1. Create the missing file as root:
on the management node (i.e. the head node, service node, etc.)
2. Copy this file via ssh or any preferred method to every compute node in the cluster.
etc
3. Verify that the file exists on every compute node in
/etc/udev/rules.d
4. Restart all the compute nodes and management nodes.
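Steps 2 and 3 above could be scripted roughly like this. This is only a sketch: `push_rules` and the node names are hypothetical, it assumes root ssh access to the compute nodes, and `DRY_RUN` is a variable invented here to preview the commands before running them.

```shell
#!/bin/sh
# Sketch: copy the udev rules file to each compute node and confirm it arrived.
# push_rules and DRY_RUN are made up for this example.

RULES_FILE=/etc/udev/rules.d/90-rdma.rules

push_rules() {
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print what would be executed.
            echo "scp $RULES_FILE root@$node:$RULES_FILE"
            echo "ssh root@$node ls -l $RULES_FILE"
        else
            # Step 2: copy the file to the compute node.
            scp "$RULES_FILE" "root@$node:$RULES_FILE" || return 1
            # Step 3: verify the file exists on the node.
            ssh "root@$node" ls -l "$RULES_FILE" || return 1
        fi
    done
}

# Example with placeholder host names:
# DRY_RUN=1; push_rules node01 node02
```

The function stops at the first node where the copy or the check fails, so a failure is easy to spot before restarting anything.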
NOTE: a. After the change, a regular user will still get this result when running the command
but don't worry: just run your preferred MPI application and it will be fine.
b. The issue occurs regardless of the HCA vendor; it is tied directly to the OS.
c. This seems to be caused by an upstream change to the rdma package (no more udev rules), so the InfiniBand devices get created by the kernel with the wrong permissions. This problem has also been reported by users of CentOS 6.3 and Scientific Linux 6.3.
Hope this helps others.
I guess you have run into a situation similar to mine.
I ran rping and ib_write_bw, and the output looked like
Couldn't allocate MR
This is the problem Dotan describes.
The solution is simple, as Dotan explains here: https://www.rdmamojo.com/2014/10/11/working-rdma-redhatcentos-7/
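One common cause of memory registration failures like this is a low locked-memory limit, since registering an MR pins memory. Whether that is the cause here is an assumption on my part, not something stated above. A quick way to see the current limit for your user (`memlock_status` is a made-up helper name):

```shell
#!/bin/sh
# Sketch: print the locked-memory limit of the current shell.
# memlock_status is a hypothetical helper for this example.

memlock_status() {
    limit=$(ulimit -l)
    if [ "$limit" = "unlimited" ]; then
        echo "memlock: unlimited"
    else
        # ulimit -l usually reports the value in kilobytes.
        echo "memlock: ${limit} KB (may be too low for RDMA memory registration)"
    fi
}

memlock_status
```

If the value is small, compare it with what root gets on the same machine.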