We are using Lustre in a cluster with approximately 200 TB of storage, 12 Object Storage Targets (backed by a DDN storage system over QDR InfiniBand), and roughly 160 quad- and 8-core compute nodes. Most users of this system have no problems at all, but my jobs are I/O intensive. When I run an array job with 250-500 processes simultaneously pounding the file system, typically between 10 and 20 of my processes fail. The log files indicate that the load on the OSTs goes over 2 and that the Lustre client returns either bad data or failed read() calls.
Currently the only way we have of resolving this is to run fewer simultaneous jobs. That is unsatisfactory, because there is no way to know in advance whether my workload will be CPU-heavy or I/O-heavy. Besides, simply turning down the load isn't the way to run a supercomputer: we would like it to run slower under load, not produce incorrect answers.
I'd like to know how to configure Lustre so that clients block when the load on the OSTs goes too high, rather than having the clients get bad data.
How do I configure Lustre to make the clients block?
Have you thought of adding more OSSs and spreading out the OSTs? That should decrease the load. In that vein, what kind of I/O pattern are you using? Do you have many large files, and if so, are they striped? The default stripe count is 1, which means each file resides on only one OST; that can be changed per file (at creation) or per directory (for new files), as in the sketch below.
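For illustration, this is the usual way to inspect and set striping with lfs; the stripe count of 4 and the path /lustre/scratch/myjob are just placeholders for whatever fits your layout:

    # Show the current striping of an existing file or directory
    lfs getstripe /lustre/scratch/myjob

    # New files created in this directory will be striped across 4 OSTs
    lfs setstripe -c 4 /lustre/scratch/myjob

    # Create a new file striped across all available OSTs (-c -1);
    # the file must not already exist
    lfs setstripe -c -1 /lustre/scratch/myjob/bigfile.dat

Striping large shared files across several OSTs spreads the per-OST load, whereas lots of small files are usually better left at a stripe count of 1.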
You could also try increasing the timeouts in Lustre (lctl get_param / lctl set_param), namely:
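As a minimal sketch of that get_param/set_param workflow (the parameters shown, the global obd timeout and the adaptive-timeout bound, are only common examples, and the values are illustrative, not recommendations for your system):

    # Read the current values on a client
    lctl get_param timeout   # global obd timeout, in seconds
    lctl get_param at_max    # upper bound for adaptive timeouts

    # Raise them temporarily (illustrative values only)
    lctl set_param timeout=300
    lctl set_param at_max=600

Keep in mind that lctl set_param changes do not survive a remount; to make a setting persistent you would normally set it on the MGS (e.g. via lctl conf_param) rather than with set_param alone.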