I want to start a big simulation on an Ubuntu desktop computer. I have physical (not remote) access to this PC. The simulation may take several weeks. The command to start the process is:
mpirun -np 100 icoFoam -parallel | tee log
Where icoFoam is the executable and -parallel is needed as its option.
This command prints data to the terminal. Sometimes the terminal is closed or the OS logs out unexpectedly during long simulations, and the process then terminates. I tried to work around this with several alternative commands:
nohup mpirun -np 100 icoFoam -parallel > log &
nohup mpirun -np 100 icoFoam -parallel > log & disown &
nohup mpirun -np 100 icoFoam -parallel | tee log & disown &
nohup mpirun -np 100 icoFoam -parallel | tee log & disown & > /dev/null 2>& 1 & nohup mpirun -np 100 icoFoam -parallel > /dev/null 2>& 1 &
systemd-run --scope --user mpirun -np 100 icoFoam -parallel | tee log &
systemd-run --scope --user mpirun -np 100 icoFoam -parallel | tee log & disown &
systemd-run --scope --user nohup mpirun -np 100 icoFoam -parallel | tee log & disown &
tmux
Results
Except for tmux, with any of these commands the process is terminated when I close the terminal. tmux is also terminated when I log out of my user account.
My Findings
- As the simplest workaround, I combined nohup and disown (from here).
- I guessed that commands including tee are terminated because of SIGPIPE caused by closing the terminal (from here). Therefore I used redirection to a log file or /dev/null (from here), both of which were also terminated by closing the terminal.
- I also examined systemd-run, but it is also terminated by closing the terminal.
To see whether the program has installed its own handler, I executed this:
nohup mpirun -np 100 icoFoam -parallel > log & grep Sig /proc/$!/status
Which returns
SigIgn: 0000000000000000
Therefore, I guess this is the case, i.e. mpirun has installed its own handler, overriding the protection of nohup (from here).
- I don't know if it is possible to pass a custom handler to mpirun so that it does not override nohup.
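For anyone repeating the check above, here is a minimal sketch of how the hex masks in /proc/<pid>/status can be decoded. Bit n-1 of a mask stands for signal n, so SIGHUP (signal 1) is the lowest bit. SigIgn lists ignored signals and SigCgt lists signals that have a handler installed. The helper names mask_has_signal and sig_mask are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).

# mask_has_signal HEXMASK SIGNUM
# Succeeds if signal SIGNUM is set in HEXMASK (bit SIGNUM-1).
mask_has_signal() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# sig_mask PID FIELD
# Prints the hex mask of FIELD (e.g. SigIgn or SigCgt) for process PID.
sig_mask() {
    awk -v field="$2:" '$1 == field { print $2 }' "/proc/$1/status"
}

# Example (not run here), with $pid set to the process of interest:
#   mask_has_signal "$(sig_mask "$pid" SigIgn)" 1 && echo "SIGHUP ignored"
#   mask_has_signal "$(sig_mask "$pid" SigCgt)" 1 && echo "SIGHUP has a handler"
```

A mask read immediately after starting a job with & may describe the process before nohup has finished setting things up, so it is probably worth reading it again a moment later before drawing conclusions.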
My Question
I want to execute the following command such that it prints output in the terminal as long as the terminal is not closed, and also the process is not terminated by closing the terminal or logging out from the user account.
mpirun -np 100 icoFoam -parallel
OS: Ubuntu 18.04
Executable: OpenFOAM
mpirun (Open MPI): 2.1.1
Update
By "log out", I mean pressing the log-out button (image), not locking the screen (Super+L).
Thank you in advance.
The problem is that you are starting the job from within a desktop environment, so the jobs are children of that desktop. When the desktop ends, for whatever reason, all children automatically end, too. 'nohup' won't save them - logout removes the display that output should print to, which should also cause a fatal error.
Consider running tmux in a tty instead of a terminal window. Then the process can run forever regardless of whatever the desktop is doing.
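As a rough sketch of that suggestion: after logging in on a text console, the job can be started in a detached tmux session and attached to whenever you want to watch the output. The function name start_in_tmux and the session name foam are placeholders chosen for this example, and whether the session survives a desktop logout on your machine is something to verify rather than assume.

```shell
#!/bin/sh
# start_in_tmux SESSION COMMAND...
# Hypothetical helper: runs COMMAND in a new detached tmux session named SESSION.
start_in_tmux() {
    session=$1
    shift
    # -d: do not attach now; -s: session name; the command string is run by a shell
    tmux new-session -d -s "$session" "$*"
}

# Example (not run here):
#   start_in_tmux foam 'mpirun -np 100 icoFoam -parallel | tee log'
#   tmux attach -t foam      # watch the output; detach again with Ctrl+B, then D
```

Because tee still writes the log file, the output remains available even when no terminal is attached to the session.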
I have the same configuration (Ubuntu 18.04, OpenFOAM v7, Open MPI 2.1.1) and am facing the same issue. The only solution that helped was the following procedure, described in this post:
1. Open a terminal, type screen and press Enter.
2. In the screen console, enter your commands as needed, e.g. the mpirun command from the question.
3. Press Ctrl+A and then Ctrl+D to detach the terminal from the created screen session. The mpi processes keep running in it.
4. To return to the session later, type screen -DR. It should open the last screen.
5. Type exit if you want to exit the screen.
Note: If you created more than one screen, screen -DR shows a list of all screen sessions. Type screen -r [session number] to go to a screen, or screen -X -S 63896 quit to quit a screen (where 63896 is the session number).
It is a somewhat clumsy workaround, but I hope it helps. I look forward to this bug (or feature?) being resolved in future versions. For further information, refer to man screen.
A second way is to use setsid to run mpirun in a new session. The advantage is that this session is not killed when the terminal is closed (hang-up signal, SIGHUP), as suggested in general here and more specifically here. The syntax is simple:
setsid mpirun -np 100 icoFoam -parallel > log &
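To still see the output while a terminal is open, one option is to let the detached job write to the log and follow that file separately. A minimal sketch, assuming setsid from util-linux is installed. The function name run_detached is made up for this example, and stdin is redirected from /dev/null so the job holds no reference to the terminal.

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper: starts COMMAND in a new session with all
# standard streams detached from the terminal.
run_detached() {
    logfile=$1
    shift
    # stdin from /dev/null, stdout and stderr to the log file
    setsid "$@" < /dev/null > "$logfile" 2>&1 &
}

# Example (not run here):
#   run_detached log mpirun -np 100 icoFoam -parallel
#   tail -f log    # follow the output; Ctrl+C stops tail only, not the job
```

Closing the terminal then only ends tail, and the log can be followed again from any new terminal.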
To terminate mpirun manually for any reason, select one of the icoFoam processes in htop, press F9 and send a SIGKILL by pressing 9. All other icoFoam processes and the mpirun process should then be killed too. As an alternative, type killall mpirun as proposed here.
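If you prefer to give the job a chance to shut down cleanly first, a gentler variant is to send SIGTERM, wait a little, and only then fall back to SIGKILL. This is a sketch, not something taken from the linked posts. The function name stop_pid and the 10 second default are arbitrary choices, and whether mpirun passes the termination on to all ranks should be checked for your Open MPI version.

```shell
#!/bin/sh
# stop_pid PID [TIMEOUT_SECONDS]
# Hypothetical helper: SIGTERM first, SIGKILL if the process is still
# there after TIMEOUT_SECONDS (default 10).
stop_pid() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0      # process already gone
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0     # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null || true          # last resort
    return 0
}

# Example (not run here): stop the oldest process named exactly "mpirun"
#   stop_pid "$(pgrep -o -x mpirun)"
```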