Question

Mike B

Asked: 2012-10-22 18:21:59 +0800 CST

How can I easily convert HTML special entities from a standard input stream in Linux?

CentOS

Is there an easy way to convert HTML special entities from a data stream? I'm passing data to a bash script and sometimes that data includes special entities. For example:

"test" &amp; test $test ! test @ # $ % ^ &amp; *

I'm not sure why some characters show up fine and others don't, but unfortunately I don't have control over the incoming data.

I'm thinking I might be able to use sed here, but that seems cumbersome and possibly prone to false positives. Is there a Linux command I could pipe to that specializes in decoding this type of data?

6 Answers

Jason Tan · Answer 1 · 2012-10-22T21:51:48+08:00

Perl is (as always) your friend. I think this will do it:

perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'

E.g.:

echo '"test" &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'

With output:

[someguy@somehost ~]$ echo '"test" &amp; test $test ! test @ # $ % ^ &amp; *' | perl -n -mHTML::Entities -e 'print HTML::Entities::decode_entities($_);'
"test" & test $test ! test @ # $ % ^ & *

14

Michael Hampton · Answer 2 · 2012-10-22T18:33:16+08:00

Best Answer

PHP is well suited to this. This example requires PHP 5:

cat file.html | php -R 'echo html_entity_decode($argn), PHP_EOL;'

10

Skippy le Grand Gourou · Answer 3 · 2017-04-04T01:57:29+08:00

recode appears to be available in the default package repositories of the major GNU/Linux distributions. For example, to decode HTML entities into UTF-8:

…|recode html..utf8

10

ariddell · Answer 4 · 2018-03-28T04:59:51+08:00

With Python 3:

python3 -c 'import html,sys; print(html.unescape(sys.stdin.read()), end="")' < file.html
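For reference, here is a minimal sketch of what html.unescape does with the sample line from the question. The decode_entities wrapper and the second test string are illustrative assumptions, not part of the original answer. It shows named entities along with the decimal and hexadecimal numeric forms:

```python
import html


def decode_entities(text: str) -> str:
    """Hypothetical helper: decode named and numeric HTML entities."""
    return html.unescape(text)


# Sample line from the question.
sample = '"test" &amp; test $test ! test @ # $ % ^ &amp; *'
print(decode_entities(sample))

# Named entities plus decimal (&#39;) and hexadecimal (&#x27;) references.
print(decode_entities("&lt;b&gt; &#39; &#x27; &quot;"))
```

Text without entities passes through unchanged, so the filter should be safe to apply to input where only some lines are encoded.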

5

bobom · Answer 5 · 2013-10-05T10:42:18+08:00

Takes text file from stdin:

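#!/bin/bash
# Decode a few HTML entities, reading lines from stdin.
while IFS= read -r lin; do
  newl=${lin//&gt;/>}
  newl=${newl//&lt;/<}
  # ...other entities
  # Decode &amp; last so that input such as &amp;lt; is not decoded twice.
  # The quotes keep & literal on bash versions that treat it specially here.
  newl=${newl//&amp;/'&'}
  printf '%s\n' "$newl"
done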

It probably needs bash >= version 4
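The order of the substitutions matters in a hand-rolled decoder like this one. A rough sketch of why, in Python, assuming a small hypothetical table of entities (REPLACEMENTS, decode_basic and decode_amp_first are names made up for illustration):

```python
# Hypothetical table of handled entities; "&amp;" is deliberately last.
REPLACEMENTS = [
    ("&gt;", ">"),
    ("&lt;", "<"),
    ("&quot;", '"'),
    ("&amp;", "&"),
]


def decode_basic(text: str) -> str:
    """Apply the replacements in table order, so "&amp;" is decoded last."""
    for entity, char in REPLACEMENTS:
        text = text.replace(entity, char)
    return text


def decode_amp_first(text: str) -> str:
    """Same table in reverse order, so "&amp;" is decoded first (buggy)."""
    for entity, char in reversed(REPLACEMENTS):
        text = text.replace(entity, char)
    return text


# "&amp;lt;" is the encoded form of the literal text "&lt;".
print(decode_basic("&amp;lt;"))      # one level of decoding
print(decode_amp_first("&amp;lt;"))  # decoded twice by mistake
```

Decoding "&amp;" first produces a fresh "&lt;" that the later substitution decodes again, which is one kind of false positive the question worries about. A table like this also only covers the entities listed in it, so a library decoder is probably the safer choice for arbitrary input.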

0

HappyFace · Answer 6 · 2020-08-02T14:25:20+08:00

I use this script. Save it as html2utf.py, make it executable, and use it like this: echo "$some_html" | html2utf.py

#!/usr/bin/env python3
"""
An alternative to `perl -Mopen=locale -MHTML::Entities -pe '$_ = decode_entities($_)'` (the module can be installed with `cpanm HTML::Entities`) and to `recode html..`.
"""

import fileinput
import html

for line in fileinput.input():
    print(html.unescape(line.rstrip('\n')))

0
