I'm having trouble with NFS over RoCE on Ubuntu 16.04 using the latest OFED package provided by Mellanox (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz). My cards are Mellanox 10GbE and are RoCE v1 enabled.
Works with Inbox drivers/software but not so much with latest OFED
I managed to get NFS working with RoCE by following the docs on this site using the Inbox drivers/software (included with Ubuntu 16.04). I was having a few minor issues, and I know the Ubuntu packages are quite out of date, so I installed the latest OFED/mlx4 drivers as recommended on mellanox.com. The install went as planned. IP functionality is all there and the RDMA tools/tests all work. Everything seems to work great, except for one thing.
The svcrdma and xprtrdma modules won't load, so there is no RDMA support for NFS. The errors are listed at the end of this post. I get the same errors if I install only the latest mlx4 driver from the Mellanox site and leave the rest of the packages alone.
I have a feeling this can be resolved somehow, for example by recompiling kernel modules, but that is over my head at the moment. Or maybe I just messed something up (crossing fingers)? Can anyone help?
Someone commented on this Mellanox community article (https://community.mellanox.com/docs/DOC-2132) that they had the same issue with Ubuntu 14.04. According to the same document, it should work just fine with CentOS 7. What's the difference?
The end result I want is to have the latest driver and software (preferably) working on Ubuntu 16.04 with NFS over RoCE. If not the latest OFED package, at least the latest mlx4 driver. I read somewhere that newer kernel versions will have updated drivers and RDMA code (I forgot most of what I read). If this goes nowhere, my answer may have to be to wait for a newer Ubuntu release.
Thanks
Error messages when loading modules
NFS server:
# modprobe svcrdma
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument
dmesg errors:
[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
[105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr
[105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
[105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0)
[105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr
[105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
[105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp
[105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22)
[105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect
[105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22)
[105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr
[105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
[105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
[105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
[105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route
[105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
[105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr
[105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
[105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp
[105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22)
[105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
[105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq
[105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
[105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id
[105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22)
[105699.697582] rpcrdma: disagrees about version of symbol rdma_listen
[105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22)
[105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp
[105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
[105699.697597] rpcrdma: disagrees about version of symbol ib_query_device
[105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22)
[105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr
[105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
[105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd
[105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
[105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
[105699.697734] rpcrdma: disagrees about version of symbol rdma_connect
[105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22)
[105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
[105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id
[105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
[105699.697872] rpcrdma: disagrees about version of symbol rdma_accept
[105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22)
[105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp
[105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
[105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd
[105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
NFS client:
# modprobe xprtrdma
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument
dmesg errors:
[106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq
[106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22)
[106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr
[106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
[106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0)
[106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr
[106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
[106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp
[106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22)
[106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect
[106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22)
[106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr
[106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
[106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
[106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
[106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route
[106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
[106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr
[106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
[106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp
[106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22)
[106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
[106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq
[106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
[106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id
[106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22)
[106055.692613] rpcrdma: disagrees about version of symbol rdma_listen
[106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22)
[106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp
[106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
[106055.692617] rpcrdma: disagrees about version of symbol ib_query_device
[106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22)
[106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr
[106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
[106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd
[106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
[106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
[106055.692657] rpcrdma: disagrees about version of symbol rdma_connect
[106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22)
[106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
[106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id
[106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
[106055.692698] rpcrdma: disagrees about version of symbol rdma_accept
[106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22)
[106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp
[106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
[106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd
[106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
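The logs above seem to mix two kinds of failure: symbols that exist but whose version does not match ("disagrees about version", followed by err -22), and symbols that appear to be missing from the loaded RDMA core entirely ("Unknown symbol ... (err 0)"). To make that easier to see, here is a rough sketch that groups a dmesg capture by symbol. The function name summarize_rpcrdma_errors is made up for illustration, and the matching relies only on the message formats shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): reads dmesg text on stdin
# and prints one line per symbol, tagged either
#   mismatch  = "disagrees about version of symbol X"
#   missing   = "Unknown symbol X (err 0)"
# The "(err -22)" lines are skipped because they repeat a mismatch already seen.
summarize_rpcrdma_errors() {
    awk '
        /rpcrdma: disagrees about version of symbol/ { kind[$NF] = "mismatch" }
        /rpcrdma: Unknown symbol .* [(]err 0[)]/     { kind[$(NF-2)] = "missing" }
        END { for (s in kind) print kind[s], s }
    ' | LC_ALL=C sort
}
```

It can be run as `dmesg | summarize_rpcrdma_errors`. On the logs above, the "missing" group should be rdma_event_msg, ib_event_msg, ib_map_mr_sg, ib_alloc_mr and ib_wc_status_msg, and everything else should be reported as a mismatch.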