Question

Peter Schramm

Asked: 2020-11-04 02:09:22 +0800 CST

Why is the character "£" in a string interpreted strangely by the cut command?

I'm developing a bash script and came across the following strange behaviour:

$ echo £ |cut -c 1
�

The £ sign is piped to cut, which is told to select only the first character.

When I change the cut range to select two characters, the £ comes through intact:

$ echo £ |cut -c 1-2
£

It's not a serious problem, and I have a workaround in the script, but why does cut need two positions instead of one to select a £ sign?

2 Answers

FedKad · Answer 1 · 2020-11-04T02:46:16+08:00

The cut command in Ubuntu is not multibyte-aware: for this version of cut, a character is the same as a byte (GNU cut documents -c as currently equivalent to -b).

In UTF-8, the pound sign (£) is encoded as two bytes, c2 and a3:

$ echo £ | od -t x1
0000000 c2 a3 0a
0000003

Note: The trailing 0a byte is the newline (ASCII line feed) that echo appends.

When you cut the first "character" from the line, you select only the c2 byte of £, which is not a valid UTF-8 sequence on its own. As a result, the terminal displays the replacement character � instead:

$ echo £ | cut -c 1 | od -t x1
0000000 c2 0a
0000002

Note: The above was tested with the latest version of cut in Ubuntu 20.10 (GNU coreutils version 8.32).
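Why c2 alone is invalid: in UTF-8 the high bits of a sequence's first byte announce how long the sequence is. 110xxxxx starts a two-byte character, while 10xxxxxx marks a continuation byte, so c2 followed directly by the newline 0a is an incomplete sequence. A minimal bash sketch of that classification rule; `utf8_seq_len` is a hypothetical helper name, and it looks only at the bit pattern (it does not reject every invalid lead byte, such as c0 or c1).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many bytes a UTF-8 sequence should have,
# judging only from the bit pattern of its first byte (given in hex, e.g. c2).
utf8_seq_len() {
  local b=$(( 16#$1 ))
  if   (( (b & 0x80) == 0x00 )); then echo 1   # 0xxxxxxx: plain ASCII
  elif (( (b & 0xE0) == 0xC0 )); then echo 2   # 110xxxxx: two-byte lead
  elif (( (b & 0xF0) == 0xE0 )); then echo 3   # 1110xxxx: three-byte lead
  elif (( (b & 0xF8) == 0xF0 )); then echo 4   # 11110xxx: four-byte lead
  else echo 0                                  # 10xxxxxx continuation, or never a lead byte
  fi
}

utf8_seq_len c2   # 2: lead byte of £, one more byte must follow
utf8_seq_len a3   # 0: continuation byte, cannot start a character
utf8_seq_len 0a   # 1: the newline, plain ASCII
```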

If you want to select multi-byte characters, you can use the grep (GNU grep version 3.4) command like this:

$ echo x£β | grep -o '^.'
x
$ echo x£β | grep -o '^..'
x£
$ echo x£β | grep -o '^...'
x£β
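grep's `.` matches one character as defined by the current locale, so the commands above rely on a UTF-8 locale; under `LC_ALL=C`, `.` matches a single byte again. As a rough, locale-independent alternative, the bash sketch below applies the rule the bytes themselves encode: a byte of the form 10xxxxxx continues a character, and any other byte starts one. `utf8_head_chars` is a hypothetical name, not a standard command; the sketch assumes valid UTF-8 input and a `head` that supports `-c` (GNU coreutils and BSD `head` do).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first N characters of a UTF-8 string
# without relying on the locale. Assumes the input is valid UTF-8.
utf8_head_chars() {
  local str=$1 n=$2 offset=0 count=0 b
  # od -An -v -tu1 prints each byte as an unsigned decimal number.
  for b in $(printf '%s' "$str" | od -An -v -tu1); do
    # Bytes of the form 10xxxxxx continue the previous character;
    # any other byte starts a new one.
    if (( (b & 0xC0) != 0x80 )); then
      # Stop at the first byte of character N+1.
      if (( count == n )); then
        break
      fi
      count=$((count + 1))
    fi
    offset=$((offset + 1))
  done
  # offset is now the byte length of the first N characters.
  printf '%s' "$str" | head -c "$offset"
  echo
}

utf8_head_chars 'x£β' 1   # x
utf8_head_chars 'x£β' 2   # x£
utf8_head_chars '£' 1     # £ (both bytes, unlike cut -c 1)
```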

(This answer was improved with the help of the comments.)

Ravexina · Answer 2 · 2020-11-04T02:47:24+08:00

In UTF-8, £ is encoded as the bytes 0xC2 0xA3 (c2a3), which is 11000010 10100011 in binary.

So it takes two bytes, which cut counts as two characters: cut -c treats each byte as a character, so selecting only the first one produces �.

$ echo -n £ | xxd
00000000: c2a3                                     ..

$ echo -n £ | wc --bytes
2
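The binary form above follows from UTF-8's two-byte pattern 110xxxxx 10xxxxxx: the code point of £ is U+00A3, whose 11 payload bits 00010 100011 fill the x positions, giving 11000010 (c2) and 10100011 (a3). A small bash sketch of that arithmetic; `utf8_encode_2byte` is a hypothetical name, and it only handles code points U+0080 to U+07FF, the range UTF-8 stores in exactly two bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode a code point (hex, e.g. 00A3) in the range
# U+0080..U+07FF, which UTF-8 stores as 110xxxxx 10xxxxxx.
utf8_encode_2byte() {
  local cp=$(( 16#$1 ))
  if (( cp < 0x80 || cp > 0x7FF )); then
    echo "U+$1 is not a two-byte code point" >&2
    return 1
  fi
  # The high 5 of the 11 payload bits go into the first byte,
  # the low 6 bits into the second.
  printf '%02x %02x\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
}

utf8_encode_2byte 00A3   # c2 a3  (£)
utf8_encode_2byte 03B2   # ce b2  (β)
```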
