Where I work, multiple users log in to the servers through SSH and perform various tasks. In general, they are only supposed to run commands on the servers, not copy files off them. And yes, they have root access.
What I need to find out is the amount of data transferred out of each server by each user. In other words, from the moment a user logs in to the moment they disconnect, I need the total data transferred, sent as well as received, between their machine and the SSH server. This would help in tracking users who might have transferred lots of sensitive data out of the servers. There are some huge files on the servers.
Since the users have root access, nothing you do on the machine itself is trustworthy. You should use a traffic sniffer that is configured to see all of the traffic to and from the server under observation (by setting up a monitor port on a managed Ethernet switch). A traffic sniffer won't be able to decrypt the contents of any SSH session, but it can see the quantity of data exchanged, and that's all you're interested in.
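As a rough sketch of the counting step, the snippet below sums bytes per SSH connection from packet records that have already been exported from the capture. The record layout (source IP, source port, destination IP, destination port, length), the function name bytes_per_session, the addresses, and the assumption that sshd listens on port 22 are all illustrative and not part of the original answer.

```python
from collections import defaultdict

SSH_PORT = 22  # assumption: sshd listens on the default port


def bytes_per_session(packets, server_ip):
    """Sum bytes per SSH client endpoint, keyed by (client_ip, client_port).

    `packets` is an iterable of
    (src_ip, src_port, dst_ip, dst_port, length) tuples.
    Lengths are whatever the capture reports, so they may include headers.
    """
    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    for src_ip, src_port, dst_ip, dst_port, length in packets:
        if src_ip == server_ip and src_port == SSH_PORT:
            # Server to client: data leaving the server.
            totals[(dst_ip, dst_port)]["sent"] += length
        elif dst_ip == server_ip and dst_port == SSH_PORT:
            # Client to server: data arriving at the server.
            totals[(src_ip, src_port)]["received"] += length
        # Anything else is not SSH traffic for this server and is ignored.
    return dict(totals)


# Hypothetical records; 10.0.0.5 stands in for the observed server.
packets = [
    ("192.0.2.10", 51234, "10.0.0.5", 22, 120),
    ("10.0.0.5", 22, "192.0.2.10", 51234, 1500),
    ("10.0.0.5", 22, "192.0.2.10", 51234, 1500),
    ("10.0.0.5", 443, "192.0.2.99", 40000, 900),
]
totals = bytes_per_session(packets, "10.0.0.5")
print(totals)
```

A session whose "sent" total is far larger than an interactive shell would normally produce is a candidate for closer inspection.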
I can think of two ways to correlate TCP streams observed on the traffic sniffer with users.
If you can identify the user by the IP address of the remote system (the SSH client), for example because the clients all have static IP addresses, use that.
Otherwise, configure remote syslog on the server under observation and get the information from the authentication log on the remote syslog server. OpenSSH logs a message every time a user authenticates successfully, and that message records the username along with the client's IP address and TCP port.
Because the TCP port number and username are both found in the syslog message, you can identify which user the TCP stream (as captured by the packet sniffer) belongs to.
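To illustrate the correlation, here is a minimal sketch that extracts the user, client IP, and client port from sshd's "Accepted ... for USER from IP port PORT" log lines. The sample lines, host names, and user names are made up for illustration, and the exact text after the port number can vary between OpenSSH versions, so the pattern deliberately stops at the port.

```python
import re

# Matches the successful-login message written by sshd, e.g.
# "Accepted publickey for alice from 192.0.2.10 port 51234 ssh2".
ACCEPTED = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


def sessions_from_log(lines):
    """Map (client_ip, client_port) to the username that logged in."""
    sessions = {}
    for line in lines:
        match = ACCEPTED.search(line)
        if match:
            key = (match["ip"], int(match["port"]))
            sessions[key] = match["user"]
    return sessions


# Hypothetical log lines as they might appear on the remote syslog server.
log_lines = [
    "Jan  1 12:00:00 host sshd[1234]: Accepted publickey for alice "
    "from 192.0.2.10 port 51234 ssh2",
    "Jan  1 12:05:00 host sshd[1240]: Failed password for bob "
    "from 192.0.2.11 port 40000 ssh2",
]
sessions = sessions_from_log(log_lines)
print(sessions)
```

The resulting (IP, port) keys can be looked up against the TCP streams seen by the sniffer. Client ports are reused over time, so a real correlation would also need to compare the log timestamp with the start of the stream.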
With the second approach, the users still have an opportunity to compromise the system. Since they have root access, they can modify the log messages that get sent to the remote syslog server if they really want to. Still, using a remote syslog server makes it a little more challenging for the users to interfere with the logging, and impossible to do so after the fact.