Ubuntu 18.04 LTS. I'm trying to figure out the best way to diagnose what is happening when an Amazon EC2 instance (free tier) hangs.
There are experimental services running, and there may be a memory leak.
For quality of life I'm using a utility called lnav to browse the system logs, and I've installed monitorix to visualise what is happening.
Can I identify the specific process causing the problem from the system logs, and if so, how? Which log might help me? (/var/log/syslog does not.)
The monitorix charts show high CPU load as system swap space is consumed, until catastrophic failure occurs.
But they do not point to a specific process. How can I find it from the terminal?
Is there some other process monitoring I could configure?
Any help appreciated...
Edit: thanks to the hint from @Rinzwind, sar
is now installed and cron runs it every 2 minutes, but it doesn't give process-level information. So, with help from this other answer:
pidstat 5 > pidhist.log
pipes the output to a text file, and running it in a persistent session will aid diagnosis when the event happens again.
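A minimal sketch of that setup, assuming pidstat from the sysstat package and a writable pidhist.log in the current directory; the flags, interval, and sample count are illustrative:

```shell
# Sample per-process memory (-r) and disk I/O (-d) every 5 seconds,
# appending to a log that survives the SSH session. 720 samples ~ 1 hour;
# drop the count to run indefinitely. Guarded so it is a no-op on
# machines where sysstat is not installed.
if command -v pidstat >/dev/null 2>&1; then
    nohup pidstat -r -d 5 720 >> pidhist.log 2>&1 &
    MSG="pidstat logging to pidhist.log (pid $!)"
else
    MSG="pidstat not found; install it with: sudo apt install sysstat"
fi
echo "$MSG"
```

A screen or tmux session works just as well as nohup; the point is that the sampler keeps running after you disconnect, so the log already covers the window when the next hang occurs.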
@heynnema suggested iotop. Running
iotop -P -a
which is top for file I/O, with -a accumulating totals, indicated that the experimental process (a mono service) was the one consuming the most swap (see the SWAPIN column).
In monitorix we can see the same pattern of consumption, then a return to normal (~20%) after restarting the process.
The system is stable for weeks on end between these random events. The evidence from iotop
shows the underlying issue is within the experimental process!
Yet this is still a run-time diagnosis. Is there a way to determine from existing logs which process was at fault after the fact, without preemptive monitoring and logging?
That proof of what went wrong is the critical issue to be resolved. How can we get it without waiting for the event to recur, if no logging was enabled? Kernel logs?
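On the after-the-fact question specifically: if the kernel's OOM killer fired, it logged which process it killed, and those lines survive in /var/log/kern.log (or the kernel journal) with no setup in advance. A sketch of what to grep for, demonstrated against a fabricated sample log excerpt (the host name, PID, process, and sizes below are invented):

```shell
# On a real Ubuntu 18.04 host you would search the existing logs directly:
#   grep -i "out of memory\|oom" /var/log/kern.log /var/log/kern.log.1
#   journalctl -k | grep -i "oom"
#   dmesg -T | grep -i "killed process"   # only if the box has not rebooted
#
# Demo against a made-up kern.log excerpt:
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jan 10 03:12:45 host kernel: Out of memory: Kill process 1234 (mono) score 912 or sacrifice child
Jan 10 03:12:45 host kernel: Killed process 1234 (mono) total-vm:3145728kB, anon-rss:2097152kB
EOF
MATCHES=$(grep -ci "out of memory\|killed process" "$LOG")
grep -i "killed process" "$LOG"    # this line names the killed process
```

The caveat: the OOM killer only fires on a true out-of-memory condition. If the instance merely thrashes swap until it becomes unreachable, there may be no such line at all, which is why preemptive sampling with pidstat/sar remains the reliable route.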
Thanks for any help.
From the comments...
We looked at the output of free -h, sysctl vm.swappiness, and cat /etc/fstab, and installed iotop to determine why swap is used so much. There are a few reasons why the system may be thrashing:
you don't have enough RAM
you don't have enough swap
vm.swappiness has been modified incorrectly
The fix...
add more RAM
increase /swapfile space
set vm.swappiness to 60-90 (60 is the default)
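A sketch of the last two fixes on an 18.04 box, assuming the swap lives in /swapfile as listed in /etc/fstab; the 2G target size is an assumption, so size it to your workload:

```shell
# Grow the swap file (requires root; the instance briefly runs with no swap).
sudo swapoff /swapfile
sudo fallocate -l 2G /swapfile        # assumed target size
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Check and, if it was modified, restore swappiness (Ubuntu default is 60).
cat /proc/sys/vm/swappiness
sudo sysctl vm.swappiness=60          # temporary, until reboot
# For a persistent change, set "vm.swappiness=60" in /etc/sysctl.conf.
```

Afterwards, free -h should show the new swap total.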
We don't add RAM to solve this issue.
Identifying the process causing a memory leak has nothing to do with the system configuration.
iotop -P -a
helped identify the process consuming swap during a recurrence of the event. Steps for a digital-forensic log investigation would be a better solution.