How can I use docker without sudo?

Question

Seyed Mohammad

Asked: 2015-04-15 03:38:44 +0800 CST2015-04-15 03:38:44 +0800 CST 2015-04-15 03:38:44 +0800 CST

Convert Text File Encoding

772

I frequently encounter text files (such as subtitle files in my native language, Persian) with character encoding problems. These files are created on Windows, and saved with an unsuitable encoding (seems to be ANSI), which looks gibberish and unreadable, like this:

enter image description here

In Windows, one can fix this easily using Notepad++ to convert the encoding to UTF-8, like below:

enter image description here

And the correct readable result is like this:

enter image description here

I've searched a lot for a similar solution on GNU/Linux, but unfortunately the suggested solutions (e.g this question) don't work. Most of all, I've seen people suggest iconv and recode but I have had no luck with these tools. I've tested many commands, including the followings, and all have failed:

$ recode ISO-8859-15..UTF8 file.txt
$ iconv -f ISO8859-15 -t UTF-8 file.txt > out.txt
$ iconv -f WINDOWS-1252 -t UTF-8 file.txt > out.txt

None of these worked!

I'm using Ubuntu-14.04 and I'm looking for a simple solution (either GUI or CLI) that works just as Notepad++ does.

One important aspect of being "simple" is that the user is not required to determine the source encoding; rather the source encoding should be automatically detected by the tool and only the target encoding should be provided by the user. But nevertheless, I will also be glad to know about a working solution that requires the source encoding to be provided.

If someone needs a test-case to examine different solutions, the above example is accessible via this link.

8 Answers

Voted

Incnis Mrsi · Answer 1 · 2015-08-19T05:38:02+08:00

These Windows files with Persian text are encoded in Windows-1256. So it can be deciphered by command similar to OP tried, but with different charsets. Namely:

~~recode Windows-1256..UTF-8 <Windows_file.txt > UTF8_file.txt~~
(denounced upon original poster’s complaints; see comments)

iconv -f Windows-1256 Windows_file.txt > UTF8_file.txt

This one assumes that the LANG environment variable is set to a UTF-8 locale. To convert to any encoding (UTF-8 or otherwise), regardless of the current locale, one can say:

iconv -f Windows-1256 Windows_file.txt -t ${output_encoding} > ${output_file}

Original poster is also confused with semantic of text recoding tools (recode, iconv). For source encoding (source.. or -f) one must specify encoding with which the file is saved (by the program that created it). Not some (naïve) guesses based on mojibake characters in programs that try (but fail) to read it. Trying either ISO-8859-15 or WINDOWS-1252 for a Persian text was obviously an impasse: these encodings merely do not contain any Persian letter.

Seyed Mohammad · Answer 2 · 2015-10-06T08:02:54+08:00

Best Answer

Seyed Mohammad

2015-10-06T08:02:54+08:002015-10-06T08:02:54+08:00

The working solution I found is using the Microsoft Visual Studio Code text editor which is Freeware and available for Linux.

Open the file you want to convert its encoding in VS-Code. At the bottom of the window, there are a few buttons. One of them is related to the file encoding, as shown below:

Clicking this button pops up an overhead menu which includes two items. From this menu select the "Reopen with Encoding" option, just like below:

This will open another menu which includes a list of different encoding, as shown below. Now select "Arabic (Windows 1256)":

This will fix the gibberish text like this:

Now click the encoding button again and this time select the "Save with Encoding" option, just as below:

And in the new menu select the "UTF-8" option:

This will save the corrected file using the UTF-8 encoding:

Done! :)

3

Ken Mollerup · Answer 3 · 2015-04-15T10:02:51+08:00

Ken Mollerup

2015-04-15T10:02:51+08:002015-04-15T10:02:51+08:00

I don't know if this works with Farsi: I use Gedit, it gives a fault with wrong encoding, and I can chose what I want to translate to UTF-8, it was just text not lit format, but here is a screenshot!

enter image description here

Sorry I finally got through my text files, so now they are all converted.

I loved notepad++ too, miss it still.

2

Giorgos_tph · Answer 4 · 2017-04-15T16:12:30+08:00

Giorgos_tph

2017-04-15T16:12:30+08:002017-04-15T16:12:30+08:00

Apart from iconv, which is a very useful tool either on its own or in a script, there is a really simple solution I found trying to figure out same problem for Greek charsets (Windows-1253 + ISO-8859-7).

All you need to do is to open the text file through Gedit's "Open" dialog and not by double-clicking it. At the bottom of the dialog box there is a drop-down for Encoding, which is set to "Automatically Detected". Change it to "Windows-125x" or other suitable codeset and the text will be perfectly readable in Gedit. You can then save it using UTF-8 encoding, just to be sure you won't have the same issue again in the future...

2

Seyed Mohammad · Answer 5 · 2015-08-21T04:02:16+08:00

Seyed Mohammad

2015-08-21T04:02:16+08:002015-08-21T04:02:16+08:00

As a complementary solution to the problem, I have prepared a useful Bash script based on the iconv command from Incnis Mrsi's answer:

#!/bin/bash

if [ $# -lt 1 ]
then
   echo 'Specify at least one file to fix.'
   exit 1
fi

# Temp file to store conversion attempt(s).
tmp='tmp.fixed'

for file in "$@"
do
  # Try to fix the file encoding.
  if iconv -f WINDOWS-1256 "$file" -t UTF-8 > $tmp; then
    echo "Fixed: '$file'"
    cat $tmp > "$file"
  else
    echo "Failed to fix: '$file'"
  fi
done
rm $tmp

Save this script as fix-encoding.sh, give it execute permission using chmod +x fix-encoding.sh and use it like this:

./fix-encoding.sh myfile.txt my2ndfile.srt my3rdfile.sub

This script will try to fix the encoding of any number of files it is provided as input. Note that the files will be fixed in-place, so the contents will be overwritten.

1

Christos · Answer 6 · 2017-04-21T13:36:46+08:00

Christos

2017-04-21T13:36:46+08:002017-04-21T13:36:46+08:00

If you like working in GUI instead of CLI, like I do:

Open file with Geany (editor)
Go to File menu -> Reload as
Choose the assumed encoding to change the gibberish into identifiable characters in your language. For example, to read Greek subs I would reload as West European -> Greek (Windows-1253)
Go to Document menu > Set Encoding -> Unicode -> UTF-8
Save

1

muru · Answer 7 · 2021-04-09T07:28:16+08:00

muru

2021-04-09T07:28:16+08:002021-04-09T07:28:16+08:00

You can use Vim to do the encoding conversion:

vim '+set fileencoding=utf-8' '+wq' file.txt

But this depends on Vim detecting the original encoding correctly. To make it use the correct one if it doesn't, you can do something like:

vim '+e ++enc=cp1256 file.txt | set fileencoding=utf-8 | wq'

Or, to save to a different file instead of doing it in place:

vim '+e ++enc=cp1256 file.txt | w ++enc=utf-8 file-utf.txt | q'

1

Amir Mahdavi · Answer 8 · 2021-02-09T04:13:34+08:00

Amir Mahdavi

2021-02-09T04:13:34+08:002021-02-09T04:13:34+08:00

I figured out it in manjaro with gaupol and work perfect but you must do it one by one and don't have batch mode

https://github.com/otsaloma/gaupol https://pkgs.org/download/gaupol

Just open a file (no matter source encoding) Save As (Shift + Ctrl + S) In opened window, change Encoding to UTF-8 Hit Save and finished

0

Convert Text File Encoding

How to install Google Chrome

Is there a command to list all users? Also to add, delete, modify users, in the terminal?

How to delete a non-empty directory in Terminal?

How to unzip a zip file from the Terminal?

How can I copy the contents of a folder to another folder in a different directory using terminal?

How do I install a .deb file via the command line?

How do I run .sh scripts?

How do I install a .tar.gz (or .tar.bz2) file?

How to list all installed packages

Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?