I have a ceph-rgw installation with a large bucket (~60M objects) and 16 OSDs; the bucket index is sharded into 997 shards. In this environment a single directory listing takes more than 30 seconds:
$ time rclone lsd t:bucket/non/existent/path/ --contimeout=1h --timeout=1h
real 0m34.816s
This is very annoying, and many clients (e.g. rclone itself) do a list-dir op before a PUT to check or verify something. (Preventing clients from sending a list_objects/list_bucket request is not a viable option.)
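For illustration, the pre-PUT check such clients issue is an ordinary delimiter listing. A minimal sketch of the query string involved (parameter values mirror the request captured in the rgw log further down; built here with Python's urllib, not rclone's own code):

```python
from urllib.parse import urlencode

# The ListObjectsV2-style query a client sends to "list" a directory
# before a PUT; delimiter "/" and the prefix get percent-encoded as %2F.
params = {
    "delimiter": "/",
    "max-keys": 1000,
    "prefix": "non/existent/path/",
}
query = urlencode(params)
print("GET /bucket?" + query)
# GET /bucket?delimiter=%2F&max-keys=1000&prefix=non%2Fexistent%2Fpath%2F
```

Even though the prefix matches nothing, RGW still has to consult the bucket index to prove that, which is where the latency below comes from.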
The rgw daemon log looks normal. Part of it is:
08:57:45.267+0000 7f0492db2700 1 ====== starting new request req=0x7f05039a9620 =====
08:57:45.267+0000 7f0492db2700 20 req 412648 0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/bucket
08:57:45.267+0000 7f0492db2700 10 req 412648 0.000000000s canonical request = GET
08:57:45.267+0000 7f0492db2700 2 req 412648 0.000000000s s3:list_bucket verifying op params
08:57:45.267+0000 7f0492db2700 2 req 412648 0.000000000s s3:list_bucket pre-executing
08:57:45.267+0000 7f0492db2700 2 req 412648 0.000000000s s3:list_bucket executing
08:57:45.267+0000 7f0492db2700 20 req 412648 0.000000000s s3:list_bucket RGWRados::Bucket::List::list_objects_ordered starting attempt 1
08:57:45.267+0000 7f0492db2700 10 req 412648 0.000000000s s3:list_bucket RGWRados::cls_bucket_list_ordered: :bucket[e6fb9c7c-74a2-4819-a0ed-e740d4eb590c.4751590.1]) start_after="[]", prefix="/non/existent/path/" num_entries=1001, list_versions=0, expansion_factor=1
08:57:45.271+0000 7f0492db2700 10 req 412648 0.004000000s s3:list_bucket RGWRados::cls_bucket_list_ordered request from each of 997 shard(s) for 8 entries to get 1001 total entries
08:58:07.495+0000 7f04efe6c700 10 librados: Objecter returned from call r=0
08:58:08.779+0000 7f04cd627700 4 rgw rados thread: no peers, exiting
08:58:18.803+0000 7f0492db2700 2 req 412648 33.535980225s s3:list_bucket completing
08:58:18.803+0000 7f047bd84700 2 req 412648 33.535980225s s3:list_bucket op status=0
08:58:18.803+0000 7f047bd84700 2 req 412648 33.535980225s s3:list_bucket http status=200
08:58:18.803+0000 7f047bd84700 1 ====== req done req=0x7f05039a9620 op status=0 http_status=200 latency=33.535980225s ======
08:58:18.803+0000 7f047bd84700 1 beast: 0x7f05039a9620: 192.168.1.1 - rgwuser [10/Nov/2021:08:57:45.267 +0000] "GET /bucket?delimiter=%2F&max-keys=1000&prefix=non%2Fexistent%2Fpath%2F HTTP/1.1" 200 413 - "rclone/v1.57.0" - latency=33.535980225s
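The fan-out visible in the log is the core of the problem: to return at most 1000 keys, RGW asks every one of the 997 shards for 8 entries. A quick sketch of the resulting read amplification, using the numbers straight from the `cls_bucket_list_ordered` line (the exact per-shard formula RGW uses internally is not shown in this log):

```python
# Numbers taken directly from the log above.
shards = 997        # bucket index shards
per_shard = 8       # entries requested from each shard
requested = 1001    # num_entries (max-keys + 1)

fetched = shards * per_shard
amplification = fetched / (requested - 1)
print(f"index entries fetched: {fetched}")          # 7976
print(f"keys returned at most: {requested - 1}")    # 1000
print(f"read amplification: {amplification:.1f}x")  # 8.0x
```

So every ordered listing touches all 997 shards and merges ~8000 index entries, even when, as here, the prefix matches nothing at all.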
Environment details: Ceph version 16.2.5, installed with Rook; each OSD is ~4T with a 256G SSD metadata device.
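For context on the shard count, a rough sanity check, assuming the common guideline of ~100k objects per shard (the `rgw_max_objs_per_shard` default is 100000 in this Ceph release; 997 was presumably chosen as a prime above that floor):

```python
import math

# Hypothetical back-of-the-envelope estimate, not RGW's own resharding logic.
objects = 60_000_000
objs_per_shard = 100_000  # rgw_max_objs_per_shard default
min_shards = math.ceil(objects / objs_per_shard)
print(min_shards)  # 600
```

The shard count is reasonable for the object count, but since every ordered listing fans out to all shards, listing latency grows with the shard count regardless of how few keys match the prefix.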