When using ABAQUS 6.14 (and also ABAQUS 2018) on Ubuntu 18.04, everything seems to work fine except the termination of the standard process (the process started when performing an implicit analysis; if you are not familiar with this, it doesn't matter).
The analysis itself does complete: the log file (the .sta file, for those familiar with Abaqus) contains the message THE ANALYSIS HAS COMPLETED SUCCESSFULLY, and the output database contains the analysis results. However, after the analysis has completed, the standard process remains in a sleeping state, using 0% CPU and keeping the same amount of RAM as when it was running.
From strace I get:
[pid 23191] close(8) = 0
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23193] <... futex resumed> ) = 0
[pid 23191] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23191] munmap(0x7f3ab130b000, 327680) = 0
[pid 23191] munmap(0x7f3ab136b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab16db000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0fbb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0ddb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0a0b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab03fb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab050b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab00cb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab02eb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab14eb000, 1114112) = 0
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 8, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 12, NULL <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
It looks as if the two processes were deadlocked. Moreover, the commands
pid -p 7002
and
pid -p 7010
give an empty output, and the directories /proc/7002 and /proc/7010 do not exist.
The only Abaqus-related processes still running are:
david 6995 0.0 0.1 295428 51388 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python /opt/abaqus/6.14-1
david 6998 0.0 0.2 368744 97948 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python std_inst.com
david 7001 0.1 0.0 122076 20096 pts/0 Sl 17:00 0:03 /opt/abaqus/6.14-1/code/bin/eliT_DriverLM -job std_in
david 7008 0.4 0.5 735812 185364 pts/0 Sl 17:00 0:07 /opt/abaqus/6.14-1/code/bin/standard -standard -acade
On Ubuntu 16.04 the exact same version works like a charm. Here is the same strace on Ubuntu 16.04 (with the same kernel version as my 18.04 machine, i.e. 4.15.0-29):
3890 close(8) = 0
3892 <... select resumed> ) = 0 (Timeout)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3892 <... futex resumed> ) = 0
3890 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 select(7, [4 5 6], NULL, NULL, {0, 20000} <unfinished ...>
3890 munmap(0x7f29c7adb000, 327680) = 0
3890 munmap(0x7f29c7b3b000, 1114112) = 0
3890 munmap(0x7f29c7eab000, 1114112) = 0
3890 munmap(0x7f29c778b000, 1114112) = 0
3890 munmap(0x7f29c75ab000, 1114112) = 0
3890 munmap(0x7f29c71db000, 1114112) = 0
3890 munmap(0x7f29c6bcb000, 1114112) = 0
3890 munmap(0x7f29c6cdb000, 1114112) = 0
3890 munmap(0x7f29c689b000, 1114112) = 0
3890 munmap(0x7f29c6abb000, 1114112) = 0
3890 munmap(0x7f29c7cbb000, 1114112) = 0
3890 exit_group(0) = ?
3891 +++ exited with 0 +++
3893 +++ exited with 0 +++
3892 +++ exited with 0 +++
3890 +++ exited with 0 +++
3880 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3890
3880 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3890, si_uid=1000, si_status=0, si_utime=107, si_stime=7} ---
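Comparing the two traces by eye is tedious. A small, hypothetical helper that keeps only the last event strace printed for each thread makes the difference explicit: in the 16.04 trace every thread ends in `+++ exited`, while in the 18.04 trace thread 23191 stays in an unfinished `FUTEX_WAIT` and the other threads keep polling with `select`. The function names and the classification rules are illustrative assumptions; judging a thread by its final line is only a heuristic (a thread caught in a short, legitimate futex wait when the log was cut would also be flagged). A log file like these can be captured with something like `strace -f -o trace.log -p <pid>`.

```python
import re

# strace -f typically prefixes each line with the thread id, either as
# "[pid 23191] ..." (printed to the terminal) or as "3890 ..." (written with -o).
LINE_RE = re.compile(r"^\s*(?:\[pid\s+(\d+)\]|(\d+))\s+(.*\S)\s*$")


def last_event_per_thread(log_text):
    """Map each thread id to the last strace event recorded for it."""
    last = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            tid = int(m.group(1) or m.group(2))
            last[tid] = m.group(3)
    return last


def classify(event):
    """Heuristic state of a thread, judged only by its final strace event."""
    if event.startswith("+++ exited"):
        return "exited"
    if "FUTEX_WAIT" in event and event.endswith("<unfinished ...>"):
        return "blocked in futex wait"
    if event.startswith(("select(", "<... select resumed>")):
        return "polling with select"
    return "other"


def summarize(log_text):
    """Return {thread id: heuristic state} for every thread in the log."""
    return {tid: classify(ev) for tid, ev in last_event_per_thread(log_text).items()}
```

Run on the 18.04 excerpt, `summarize` should report 23191 as blocked and 23185/23193 as polling; on the 16.04 excerpt every thread that printed an exit line is reported as exited.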
Does anyone have an idea how to solve this, or in which direction I should investigate further?
I found a solution that circumvents the deadlock by using a Singularity container, as proposed by Will Furnass here: http://learningpatterns.me/posts-output/2018-01-30-abaqus-singularity/
Although it is a bit complicated at first, it works like a charm once set up properly. I modified my Abaqus aliases on my host system (Manjaro/Arch Linux) so that they point to the installation inside the Singularity container and execute the command in the container's environment. However, since I need the Intel Fortran compiler, I created a basic CentOS 7 container and modified it afterwards to install the compilers and Abaqus (v2019 in this case), rather than using the .def script proposed by Will Furnass.
It takes some time to set up, but now I have a container image I can work with on any system that runs Singularity, which is quite nice :)
EDIT: I also tested copying a working installation to a more recent Linux system (to avoid a fresh install of Abaqus), and I can confirm that this didn't work in my case (a CentOS 7 installation copied to a Manjaro system).
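The alias setup described above can be sketched as a small launcher script that forwards its arguments to Abaqus inside the container. This is a hypothetical example, not the configuration used in the answer: the image path `~/containers/abaqus.sif`, the `ABAQUS_SIF` environment variable, the script name `abaqus-sif` and the in-container command name `abaqus` are all assumptions. It only prepends `singularity exec <image>` to the arguments it receives.

```python
#!/usr/bin/env python3
# Hypothetical launcher ("abaqus-sif"): forwards its arguments to the Abaqus
# command inside a Singularity image, similar in spirit to an alias such as
#   alias abaqus='singularity exec /path/to/abaqus.sif abaqus'
import os
import shlex
import subprocess
import sys

# Assumed image location; can be overridden with the (hypothetical) ABAQUS_SIF variable.
IMAGE = os.environ.get("ABAQUS_SIF", os.path.expanduser("~/containers/abaqus.sif"))
# Assumed name of the Abaqus launcher installed inside the container.
ABAQUS_CMD = "abaqus"


def build_command(args, image=IMAGE, abaqus_cmd=ABAQUS_CMD):
    """Return the argv that runs the containerised Abaqus launcher with `args`."""
    return ["singularity", "exec", image, abaqus_cmd, *args]


def main(args):
    cmd = build_command(args)
    print("running:", shlex.join(cmd), file=sys.stderr)
    try:
        return subprocess.call(cmd)
    except FileNotFoundError:
        print("singularity executable not found on PATH", file=sys.stderr)
        return 127


if __name__ == "__main__":
    if sys.argv[1:]:
        sys.exit(main(sys.argv[1:]))
    print("usage: abaqus-sif <abaqus arguments>", file=sys.stderr)
```

For example, `abaqus-sif job=Job-1 interactive` would run `singularity exec ~/containers/abaqus.sif abaqus job=Job-1 interactive`.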
I would like to present my workaround for this issue. I've made a Python wrapper for the abq2018 solver that checks the .sta file for completeness. Once the .sta file is complete, any process named standard is killed. I've found that the solver exits gracefully when standard is killed after the analysis is complete.
This workaround is not a perfect solution. Current issues with this workaround:
How to use this workaround:
chmod +x abq
abq job=Job-1
This will execute Job-1.inp and then kill the standard solver once Job-1.sta is complete. Code for abq is below
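A minimal, hypothetical sketch of a wrapper along these lines (Linux only) is shown below; it is not the author's code. Assumptions: the launcher is called `abq2018` (the name used in the answer), the `.sta` file is written to the current directory as `<job>.sta`, the wrapper polls every 5 seconds, and it sends SIGTERM. Unlike the description ("any process named standard"), it only signals `standard` processes owned by the current user, since signalling other users' processes would fail anyway.

```python
#!/usr/bin/env python3
# Hypothetical sketch of an `abq` wrapper (Linux only): run Abaqus and, once the
# .sta file reports success, terminate this user's lingering `standard` processes.
import os
import signal
import subprocess
import sys
import time

ABAQUS = "abq2018"  # assumed launcher name; change to match your installation
DONE_MARKER = "THE ANALYSIS HAS COMPLETED SUCCESSFULLY"
POLL_SECONDS = 5  # assumed polling interval


def job_name(args):
    """Return the value of the job=... argument."""
    for arg in args:
        if arg.startswith("job="):
            return arg.split("=", 1)[1]
    raise SystemExit("usage: abq job=<name> [other abaqus options]")


def analysis_completed(sta_path):
    """True once the .sta file contains the success message."""
    try:
        with open(sta_path, errors="replace") as fh:
            return DONE_MARKER in fh.read()
    except FileNotFoundError:
        return False


def own_pids_named(name):
    """PIDs of the current user's processes whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat(f"/proc/{entry}").st_uid != os.getuid():
                continue
            with open(f"/proc/{entry}/comm") as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited or is not readable
    return pids


def main(args):
    sta_path = f"{job_name(args)}.sta"
    if "interactive" not in args:
        args = [*args, "interactive"]  # keep the launcher in the foreground
    proc = subprocess.Popen([ABAQUS, *args])
    while proc.poll() is None:
        if analysis_completed(sta_path):
            for pid in own_pids_named("standard"):
                try:
                    os.kill(pid, signal.SIGTERM)
                except ProcessLookupError:
                    pass  # already gone
        time.sleep(POLL_SECONDS)
    return proc.returncode


if __name__ == "__main__":
    if sys.argv[1:]:
        sys.exit(main(sys.argv[1:]))
    print("usage: abq job=<name> [other abaqus options]", file=sys.stderr)
```

Like the approach described, this sketch cannot tell concurrent jobs apart: once this job's .sta file reports success, every `standard` process of the user is terminated, including those belonging to other running analyses.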
Dassault Systèmes published a bug fix this month:
You need to update from Abaqus 2018 to Abaqus 2018-HF16, available from https://software.3ds.com/. More details can be found at https://github.com/willfurnass/abaqus-2017-centos-7-singularity/issues/5#issue-713025844
I tried it by updating from Abaqus 2020 to Abaqus 2020-HF5, and it worked on Ubuntu 20.04 as well as Fedora 32.
I ran into this problem on Linux Mint 19 too, with Abaqus 6.14-5 installed. The process is not terminated automatically, although the .sta file shows that the analysis has completed. I think this problem is related to the kernel. By the way, have you found a solution yet?