Question

lepe

Asked: 2021-05-23 01:08:28 +0800 CST

How to match Japanese in spamassassin?

I live in Japan. Recently there has been a lot of spam coming from China with messages written in Chinese. As SpamAssassin does not include rules for Chinese, most of those emails pass with a low score.

I would like to identify when an email is written in Chinese only. As most Japanese kanji fall within the CJK Unified Ideographs range that Chinese also uses (U+4E00 to U+9FFF), one way to identify Japanese is to look for Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF). If the message contains either Hiragana or Katakana, I can safely assume it is Japanese; otherwise it is Chinese.

If I test individual characters, for example あ or ア, they match correctly, but when I use ranges it doesn't work. This is what I have tried:

body    CHINESE       /[\xe4-\xe9]/                  # this form seems to work fine
body    JAPANESE      /[\x30-\x31]/                  # not sure what is actually matching
body    JAPANESE      /(あ|え)/                      # this matches a single character just fine
body    JAPANESE      /[あ-ん]/                      # doesn't work
body    JAPANESE      /[U+3040-U+30FF]/              # doesn't work
body    JAPANESE      /[\xe3\x81\x81-\xe3\x82\x96]/  # doesn't work
body    JAPANESE      /[\x{3040}-\x{30FF}]/          # doesn't work

I really don't know what I am doing anymore. I know some of the above make no sense...

What is the correct way to specify those ranges?
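
One possible explanation, offered as an assumption rather than confirmed SpamAssassin behavior: if body rules are matched against the raw UTF-8 bytes of the message, a character class such as [あ-ん] becomes a set of single bytes rather than a range of code points. That would also explain why /[\xe4-\xe9]/ appears to work, since those values are UTF-8 lead bytes. Under that assumption, Hiragana and Katakana (U+3040 to U+30FF) encode as the three-byte sequences E3 81 80 through E3 83 BF, so the range can be written as a byte sequence instead of a character class. The rule name and score below are hypothetical:

```
# Assumption: the body text reaches the rule as raw UTF-8 bytes.
# U+3040..U+30FF (Hiragana + Katakana) encodes as E3 81 80 .. E3 83 BF,
# i.e. lead byte E3, second byte 81-83, third byte 80-BF.
body     JAPANESE_KANA  /\xe3[\x81-\x83][\x80-\xbf]/
describe JAPANESE_KANA  Body contains Hiragana or Katakana (UTF-8 bytes)
score    JAPANESE_KANA  -0.1
```

This sketch would only see kana in messages whose body is UTF-8 at match time. Mail sent as ISO-2022-JP or Shift_JIS would need charset normalization first (for example via the normalize_charset option) or separate patterns.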

1 Answer

AnFi · Answer 1 · 2021-05-23T07:31:07+08:00

Best Answer

Have you tried using Mail::SpamAssassin::Plugin::TextCat (a language detector)?
IMHO you should evaluate it first.

header LANGUAGE_ZH X-Languages =~ /\b(?:zh)\b/
describe LANGUAGE_ZH Chinese language
score LANGUAGE_ZH 1.0

header LANGUAGE_JA X-Languages =~ /\b(?:ja)\b/
describe LANGUAGE_JA Japanese language
score LANGUAGE_JA -0.1

You can modify it to match "only one language detected/guessed" or some mixes of languages.
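
As a minimal sketch of such a combination, a meta rule can build on the two header rules above so that it fires only when Chinese was guessed and Japanese was not. The rule name LANGUAGE_ZH_NOT_JA and its score are hypothetical and would need tuning against real mail:

```
# Hypothetical rule: TextCat guessed Chinese but did not guess Japanese.
# Relies on LANGUAGE_ZH and LANGUAGE_JA as defined above.
meta     LANGUAGE_ZH_NOT_JA  (LANGUAGE_ZH && !LANGUAGE_JA)
describe LANGUAGE_ZH_NOT_JA  Chinese detected without Japanese
score    LANGUAGE_ZH_NOT_JA  2.0
```

Scoring the combination separately keeps the individual language rules mild while still penalizing Chinese-only mail.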

WARNING: Make sure the plugin is loaded by your SpamAssassin configuration.
On Debian Linux, it is configured in the /etc/spamassassin/v310.pre file.
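
For reference, loading the plugin comes down to a single loadplugin line. The sketch below assumes the Debian path mentioned above; other distributions may keep their .pre files elsewhere:

```
# In /etc/spamassassin/v310.pre (Debian path, as mentioned above).
# Remove any leading '#' on this line if it is commented out.
loadplugin Mail::SpamAssassin::Plugin::TextCat
```

After editing, running spamassassin --lint is a quick way to check that the configuration still parses.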
