Update 4,215:
After looking at space usage inside HDFS, I see that .oldlogs is using a lot of space:
1485820612766 /hbase/.oldlogs
So new questions:
- What is it?
- How do I clean it up?
- How do I keep it from growing again?
- What caused it to start growing in the first place? (See the sketch after this list for a quick way to check file timestamps.)
- Also, .archive is big too; what is that, my snapshots?
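To dig into those last two questions, the quickest thing I can do is list the files in .oldlogs with their sizes and modification times and see when the pile-up started. A minimal sketch using only stock hadoop fs commands (this assumes the usual hadoop fs -ls output columns: permissions, replication, owner, group, size, date, time, path):
hadoop fs -ls /hbase/.oldlogs | grep '^-' | sort -k6,7 | head -n 20   # oldest files
hadoop fs -ls /hbase/.oldlogs | grep '^-' | sort -k6,7 | tail -n 20   # newest files
hadoop fs -du -s /hbase/.oldlogs                                      # same total as the figure above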
Also, as homework, scollector will now monitor the disk space usage of various HDFS directories...
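A minimal sketch of what that check could look like, assuming scollector runs it as an external collector script that reads OpenTSDB-style "metric timestamp value tags" lines from stdout, and that hadoop fs -du -s prints size then path on this version (the metric name hdfs.dir.bytes is made up for this example):
#!/bin/sh
# Emit one OpenTSDB-style line per top-level HDFS directory under /hbase.
now=$(date +%s)
hadoop fs -du -s /hbase/* | while read size path; do
  echo "hdfs.dir.bytes $now $size dir=$path"
done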
Also, it looks like the following errors started filling the logs repeatedly around that time; I'm not sure what they mean exactly:
2014-11-25 01:44:47,673 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:310)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1405)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1349)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1511)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1301)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:308)
... 5 more
Caused by: java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.7.0.231:50010, 10.7.0.233:50010], original=[10.7.0.231:50010, 10.7.0.233:50010])
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:857)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:917)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:821)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
2014-11-25 01:44:47,673 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, requesting close of hlog
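The interesting bit is the cause: the client could not find a replacement datanode for the write pipeline, and the message points at dfs.client.block.write.replace-datanode-on-failure.policy. With only 3 datanodes and a replication factor of 3 there is little room to find a replacement when one node is unhealthy, so this may just be a symptom of a node hiccup. To see what the client configuration currently has (run on a node that carries the cluster's client config):
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy
Whether relaxing that policy is a good idea on a 3-node cluster is a separate question; I'm only noting the knob the exception is talking about.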
My Journey:
On my HBase cluster that stores OpenTSDB data, my disk space started to climb rather rapidly (even though, from what I can tell, our insert rate has been consistent).
The disks that are filling up are the HDFS storage disks. The directories are roughly evenly sized.
My setup is an HBase cluster (built with Cloudera) of 3 machines with an HDFS replication factor of 3. There is also another single-machine cluster that the main cluster replicates to. The replica doesn't show this same growth.
I am taking snapshots on the master, but list_snapshots from the hbase shell doesn't show any going back more than a day, so I think those are being culled as they should be. My HBase experience isn't great; any suggestions on what else to look at?
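For reference, this is roughly how I'm checking that; a quick non-interactive way to run the shell command (output format varies a bit by HBase version):
echo "list_snapshots" | hbase shell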
Making Progress...:
[root@ny-tsdb01 ~]# hadoop fs -dus /hbase/*
dus: DEPRECATED: Please use 'du -s' instead.
3308 /hbase/-ROOT-
377401 /hbase/.META.
220097161480 /hbase/.archive
0 /hbase/.corrupt
1537972074 /hbase/.logs
1485820612766 /hbase/.oldlogs
8948367 /hbase/.snapshot
0 /hbase/.tmp
38 /hbase/hbase.id
3 /hbase/hbase.version
192819186494 /hbase/tsdb
905 /hbase/tsdb-meta
899 /hbase/tsdb-tree
1218051 /hbase/tsdb-uid
I think my replication went bad. It seems that .oldlogs is where the write-ahead logs (WALs) go, according to this Safari article. They should be cleaned up, but were not for some reason.
I used the following to clean it up:
HADOOP_USER_NAME=hdfs hadoop fs -rm -skipTrash /hbase/.oldlogs/*
Since I noticed this while working on building a replacement cluster as the replication target, I've stopped replication for now, and the directory no longer seems to be growing unbounded. This is something I will monitor going forward, in particular because it seems this might be a bug, per HBASE-3489.
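For anyone following along, this is roughly how I'm looking at and pausing replication from the hbase shell (the peer id '1' is just an example; use whatever list_peers reports, and the exact commands available depend on the HBase version):
echo "list_peers" | hbase shell
echo "disable_peer '1'" | hbase shell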
HBase is crash-safe, and .logs is the location of the WALs (hlogs) that are needed for crash recovery. Once a region server's in-memory data (the MemStore) is flushed to HFiles, the WALs are no longer needed for crash recovery and are moved to .oldlogs. Old logs are typically kept around for cluster-to-cluster replication. .oldlogs has a configurable retention period, for example 3 days; in that case, if something breaks your replication, you have 3 days to fix it without needing to reseed. Hope this helps you investigate what happened on Nov 24 to cause the growth in .oldlogs, and when to expect automatic deletion of the hlogs in .oldlogs.
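For reference, the retention period mentioned above is controlled by hbase.master.logcleaner.ttl (in milliseconds), and, as I understand it, when replication is enabled the master's log cleaner will also hold on to any hlog that has not yet been shipped to the peer, regardless of that TTL; that would be consistent with .oldlogs growing once replication broke. One way to print the effective TTL (run on a node with the cluster's HBase client configuration):
hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.master.logcleaner.ttl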