Update 4,215:
After looking at space usage inside HDFS, I see that .oldlogs is using a lot of space:
1485820612766 /hbase/.oldlogs
So new questions:
- What is it?
- How do I clean it up?
- How do I keep it from growing again?
- What caused it to start growing in the first place?
- Also, .archive is big too; what is that, my snapshots? (I've sketched a few commands below to start digging into these.)
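This is the kind of thing I'm planning to run next to dig in (just a sketch with standard hadoop fs commands; I haven't double-checked the exact flags on this CDH version):
hadoop fs -ls /hbase/.oldlogs | head -20    # spot-check timestamps on the retained WALs
hadoop fs -ls /hbase/.archive    # top-level layout of the archive dir
hadoop fs -count /hbase/.oldlogs /hbase/.archive    # file counts alongside total bytes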
Also, as homework, scollector will now monitor the disk space usage of various HDFS directories....
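Something like this is what I have in mind for that collector (a minimal sketch, assuming a tcollector-style external collector that prints one datapoint per line as "metric timestamp value tag=value"; the metric name hdfs.dirsize.bytes and the dir tag are made up):
#!/bin/sh
# Emit the size of each top-level /hbase directory as an OpenTSDB-style datapoint.
# Note: newer Hadoop versions print an extra column from du; adjust the read if so.
now=$(date +%s)
hadoop fs -du -s /hbase/* 2>/dev/null | while read bytes path; do
  echo "hdfs.dirsize.bytes $now $bytes dir=$path"
done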
It also looks like the following errors started filling the logs repeatedly around that time; I'm not sure what they mean exactly:
2014-11-25 01:44:47,673 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:310)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1405)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1349)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1511)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1301)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:308)
... 5 more
Caused by: java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.7.0.231:50010, 10.7.0.233:50010], original=[10.7.0.231:50010, 10.7.0.233:50010])
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:857)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:917)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:821)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
2014-11-25 01:44:47,673 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Error while syncing, requesting close of hlog
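Since the exception explicitly names dfs.client.block.write.replace-datanode-on-failure.policy, my first check is to see what the client policy actually is and whether all three datanodes are up (sketch only; hdfs getconf and dfsadmin -report are standard commands, but I haven't verified their output against this CDH build):
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy
hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
hdfs dfsadmin -report    # confirm all 3 datanodes are live and have space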
My Journey:
On my HBase cluster that stores OpenTSDB data, my disk space started to climb rather rapidly (even though, from what I can tell, our insert rate has been consistent):
The disks that are increasing are the HDFS storage disks. The directories are roughly evenly sized.
My setup is an HBase cluster (built with Cloudera) that has 3 machines with an HDFS replication factor of 3. There is also another cluster with a single machine that the main cluster replicates to. The replica doesn't show this same growth:
I am taking snapshots on the master, but list_snapshots from the hbase shell doesn't show any going back more than a day, so I think those are being culled as they should be. My HBase experience isn't great; any suggestions on what else to look at?
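For completeness, this is roughly what I'm checking from the hbase shell (command names as I understand them for this 0.94-era CDH; treat it as a sketch):
echo "list_snapshots" | hbase shell    # confirm old snapshots really are being culled
echo "list_peers" | hbase shell    # the replication peer (the single-machine replica)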
Making Progress...:
[root@ny-tsdb01 ~]# hadoop fs -dus /hbase/*
dus: DEPRECATED: Please use 'du -s' instead.
3308 /hbase/-ROOT-
377401 /hbase/.META.
220097161480 /hbase/.archive
0 /hbase/.corrupt
1537972074 /hbase/.logs
1485820612766 /hbase/.oldlogs
8948367 /hbase/.snapshot
0 /hbase/.tmp
38 /hbase/hbase.id
3 /hbase/hbase.version
192819186494 /hbase/tsdb
905 /hbase/tsdb-meta
899 /hbase/tsdb-tree
1218051 /hbase/tsdb-uid
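For easier reading, the human-readable version of the same listing (assuming this Hadoop version supports the -h flag on du) shows that .oldlogs (roughly 1.4 TB) dwarfs the actual tsdb table (roughly 0.2 TB):
hadoop fs -du -s -h /hbase/*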