I am considering Ceph as a distributed filesystem for my home-made MAID (massive array of idle drives).
As far as I understand, Ceph is oriented towards cluster use: it spreads data evenly over OSDs (subject to the CRUSH map) and tries to parallelize read operations across different nodes.
In my case I don't need to maximize spread and throughput; ideally it would fill the first N OSDs (where N is the replication factor) and only then start filling the next N OSDs, to minimize the number of drives that need to be active to retrieve adjacent data.
Can I achieve such behaviour by tweaking the placement group count and the CRUSH map? If that is not possible, can I at least make Ceph stop splitting files into more than one block?
I don't think what you want to achieve is possible with Ceph. As far as I understand, Ceph is a distributed file system that ensures high fault tolerance through replication. Read here:
Ceph aims primarily to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available.
Ceph's power is its scalability and high availability.
What I'm trying to point out is that Ceph is made to manage physical disk usage in a cluster environment in a way that ensures resilience, high availability and transparency. Not quite what you are looking for.
If you are worried about performance or disk I/O, there is an option called primary affinity, which can be employed, for example, to prioritize SAS disks over SATA. Read more here and here.
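To give a rough idea, lowering the primary affinity of a slower OSD looks something like this (the OSD ID and the affinity value are only placeholders for your own setup; on older releases the monitors must first be told to allow the setting):

    # allow primary affinity to be changed (needed on older releases)
    ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=1'

    # affinity ranges from 0.0 (avoid using this OSD as primary) to 1.0 (default)
    ceph osd primary-affinity osd.2 0.25

Note that this only influences which replica serves reads; it does not change where CRUSH places the data, so by itself it will not reduce the number of drives that have to spin up.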
I know this doesn't exactly answer all your questions, but it may provide some food for thought.
See details here: http://docs.ceph.com/docs/master/rados/operations/crush-map/#primary-affinity
And here is a nice blog post explaining the Ceph cluster.