I am running an ETL process using cron on a daily basis on my server every 2 hourly. The ETL process populates a reporting database which runs Greenplum. I have noticed that the ETL process normally will get stalled and "Idle in Transactions" usually is the one holding it up. For these kinds of processes, how do I go about investigating what kinds of services it originated from ? I am guessing but I am not really sure, as when I run "sudo /etc/init.d/apache2 graceful" it usually clears the Idle transactions off.
I am running the ETL process on an Ubuntu server using the Sun Java. Would appreciate some debugging techniques or solutions to improve the process.
"Idle in Transaction" means that a transaction was started on a database connection and not completed and there is no longer any queries running.
In the process list of the database server (for example:
ps -ef | grep "idle in"
) you will find the connection that is in that state. It will show something like:postgres 15268 12917 0 22:36 ? 00:00:03 postgres: user user x.x.x.x(59830) idle in transaction
The
(59830)
is the port on thex.x.x.x
machine.On the
x.x.x.x
machine, you can then run the following to find which process has established that database connection:netstat -np | grep 59830
This will give you something like:
tcp6 0 0 x.x.x.x:59830 dbserver:5432 ESTABLISHED 25254/java
(or Apache, or whatever is the process). In this example,
25254
is the PID of the process.So that answers your question in the body of your post.
To handle, of course is a more complicated subject. Why is this connection starting a transaction and not completing it = bad coding. Resolution: code correctly.
Note:
Pentaho PDI has a bad habit of also leaving transactions idle for a long time. Let's say you have a step in PDI that updates some rows. It goes:
input step --> filter step --> update step
And let's say you set the commit batch to 100 on the
update step
. You have say, 75 rows through the update step and theinput step
is still pulling in rows and thefilter step
filtering rows but due to the condition, nothing is going over to theupdate step
for a little while because no rows match your criteria in thefilter step
. So what do you have? A database connection which isidle in transaction
(75 rows updated but not committed).So this is all fine except it's annoying for the DBA who gets alerted about that long running transaction.
But now, let's say you have another step that branches off the
filter step
and updates the same table but differently and that somehow, one record is part of the 75 updated rows (but not committed) and thatupdate step 2
now needs to update this row. What happens? A stall.update step 2
can't update the row untilupdate step
has committed the batch.Not saying that it's what you are experiencing as you seem to have discovered, but not confirmed, that your blocking transaction is running under Apache, not PDI. But the above is a description of a problem that can occur to illustrate what is generally happening.