Question

Zulakis

Asked: 2024-07-03 19:59:01 +0800 CST

How does sort compare strings?

I would expect sort, run from bash, to compare strings like this:

Start at the first char (of both strings)
If the chars are equal, proceed to the next char
If they are unequal, return the greater/lesser result to the sort algorithm
If there are no more chars, return equals
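The comparison described in the list above can be sketched as a Python comparison function. This only illustrates the expected behaviour, not what sort actually does. It compares by Unicode code point and also handles the case the list leaves implicit, where one string is a prefix of the other (the shorter string then sorts first). The helper name charwise_compare is hypothetical.

```python
from functools import cmp_to_key


def charwise_compare(s1: str, s2: str) -> int:
    """Hypothetical helper: compare two strings one character at a time."""
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            # The first differing character decides the result (by code point).
            return -1 if c1 < c2 else 1
    # No differing character was found: the shorter string sorts first,
    # and strings of equal length are equal.
    return (len(s1) > len(s2)) - (len(s1) < len(s2))


print(sorted(["a", "b", ".", "-"], key=cmp_to_key(charwise_compare)))  # ['-', '.', 'a', 'b']
print(sorted(["bb.de", "b.de"], key=cmp_to_key(charwise_compare)))     # ['b.de', 'bb.de']
```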

For some reason, this does not seem to be the case.

Let's take the following input:

a
b
.
-

sort orders this as:

-
.
a
b

Now, for input

b.de
bb.de

I would expect the following sort result:

b.de
bb.de

This is because the first characters are equal, and for the second character, '.' comes before 'b' (as seen in the first test).

For some reason, this is not the case; the strings are sorted like this:

bb.de
b.de

Why is sort behaving this way, and is there a way to make it behave "as expected"?

I have tested the same examples with Python, and Python sorts as expected.
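Python gives the expected order because its built-in string comparison uses Unicode code points and ignores the locale. A small check of that on the question's second input (the variable names are only for illustration). For UTF-8 text, code-point order is the same as plain byte order, which is also the order a byte-wise (C locale) comparison gives.

```python
words = ["bb.de", "b.de"]

# Python compares str values by Unicode code point and ignores the locale.
by_code_point = sorted(words)

# For UTF-8 text, code-point order matches plain byte order.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point)              # ['b.de', 'bb.de']
print(by_code_point == by_bytes)  # True
print(ord("."), ord("b"))         # 46 98
```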

2 Answers

Jasen · Answer 1 · 2024-07-03T21:07:59+08:00

Best Answer

By default, sort does a locale-aware sort, which uses the collation rules of your locale; see strcoll(3).

Running sort under ltrace(1) got me this:

strcoll("b.de", "bb.de") = 20

Locale-aware comparisons seem to split strings into words and sort on those. As words never start with '.', sort sees a zero-length word and puts it at the start of the list. However, '.' is allowed within words, e.g. "Jr." and "Ph.D".
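The strcoll result shown above can be reproduced from Python, whose locale.strcoll uses the C library's locale collation. This is a sketch under assumptions: compare_in_locale is a hypothetical helper, and en_US.UTF-8 is only an example locale that may not be installed (the helper then returns None). In the C locale the result is -1, so b.de sorts first; the sign for en_US.UTF-8 depends on your system's locale data, and on a system behaving like the one in the ltrace output it would be 1.

```python
import locale
from typing import Optional


def compare_in_locale(name: str, a: str, b: str) -> Optional[int]:
    """Hypothetical helper: sign of locale.strcoll(a, b) under LC_COLLATE=name."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        result = locale.strcoll(a, b)
        return (result > 0) - (result < 0)
    finally:
        # Restore the caller's collation setting.
        locale.setlocale(locale.LC_COLLATE, previous)


for name in ("C", "en_US.UTF-8"):
    print(name, compare_in_locale(name, "b.de", "bb.de"))
```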

If you require a byte-wise comparison instead, export LC_COLLATE=C or LC_COLLATE=POSIX.
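A sketch of that fix, driving the external sort command from Python with the C locale forced. It sets LC_ALL=C rather than LC_COLLATE=C because LC_ALL, when set in the environment, takes precedence over LC_COLLATE; the shell equivalent is LC_ALL=C sort. It assumes a sort binary on PATH, and sort_bytewise is a hypothetical name.

```python
import os
import subprocess


def sort_bytewise(lines: list[str]) -> list[str]:
    """Hypothetical helper: run the external sort command in the C locale."""
    # LC_ALL overrides LC_COLLATE, so this forces C collation even if the
    # caller's environment already sets LC_ALL to something else.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


print(sort_bytewise(["bb.de", "b.de"]))     # ['b.de', 'bb.de']
print(sort_bytewise(["a", "b", ".", "-"]))  # ['-', '.', 'a', 'b']
```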

Matthew Ife · Answer 2 · 2024-07-03T20:22:59+08:00

I checked the coreutils package, and if you don't provide any arguments, it looks as if sort (eventually) uses the C strcmp routine. The only exception is where the values in the lines can be interpreted as integers.

The strcmp(3) man page says:

In glibc, as in most other implementations, the return value is the arithmetic result of subtracting the last compared byte in s2 from the last compared byte in s1. (If the two characters are equal, this difference is 0.)

This means that the strcmp result for bb.de and b.de really comes down to the last character.

That is, whether 'd' < 'e', which (in ASCII at least) means whether 100 < 101, which is true.
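To see which byte the quoted man page means by "the last compared byte", here is a small Python model of that strcmp behaviour; strcmp_like is a hypothetical name, and the C standard itself only guarantees the sign of strcmp's result. With these inputs the scan stops at index 1, where 'b' (98) meets '.' (46), so the result is positive and a plain strcmp orders b.de before bb.de.

```python
def strcmp_like(s1: bytes, s2: bytes) -> int:
    """Hypothetical model of glibc strcmp semantics on two byte strings."""
    # C strings end with a NUL byte; appending one makes a string that is a
    # prefix of the other compare as smaller.
    a, b = s1 + b"\0", s2 + b"\0"
    i = 0
    # Advance while the bytes match and the end of the strings is not reached.
    while a[i] == b[i] and a[i] != 0:
        i += 1
    # Difference of the last compared bytes, as the man page describes.
    return a[i] - b[i]


print(strcmp_like(b"bb.de", b"b.de"))  # 52, i.e. ord("b") - ord(".")
```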
