I am considering Ceph as a distributed filesystem for my home-made MAID (massive array of idle drives).
As far as I understand, Ceph is oriented towards cluster use: it spreads data evenly over OSDs (subject to the CRUSH map) and tries to parallelize read operations across different nodes.
In my case I don't need to maximize spread and throughput; ideally it would fill the first N OSDs (where N is the replication factor) and only then start filling the next N OSDs, to minimize the number of drives that need to be active to retrieve adjacent data.
Can I achieve such behaviour by tweaking the placement group count and the CRUSH map? If that is not possible, can I at least make Ceph stop splitting files into more than one block?
I don't think what you want to achieve is possible with Ceph. As far as I understand, Ceph is a distributed file system that ensures high fault tolerance through replication. Read here:
Ceph aims primarily to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available.
Ceph's power is its scalability and high availability.
What I'm trying to point out is that Ceph is made to manage physical disk usage in a cluster environment in a way that ensures resilience, high availability and transparency. Not quite what you are looking for.
If you are worried about performance or disk I/O, there is an option called primary affinity, which can be employed, for example, to prioritize SAS disks over SATA. Read more here and here.
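To give a rough idea, lowering the primary affinity of a slower OSD looks something like this (the OSD ID and the affinity value are only placeholders for your own setup; on older releases the monitors must first be told to allow the setting):

    # allow primary affinity to be changed (needed on older releases)
    ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=1'

    # affinity ranges from 0.0 (avoid using this OSD as primary) to 1.0 (default)
    ceph osd primary-affinity osd.2 0.25

Note that this only influences which replica serves reads; it does not change where CRUSH places the data, so by itself it will not reduce the number of drives that have to spin up.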
I know this doesn't exactly answer all your questions, but it may provide some food for thought.
See details here: http://docs.ceph.com/docs/master/rados/operations/crush-map/#primary-affinity
And here is a nice blog post explaining the Ceph cluster.