
Question

Zareh Kasparian

Asked: 2022-02-13 11:40:35 +0800 CST

multiple portion selection of a string in python


I have a log file as below:

12-02-2022 15:18:22 +0330 SOCK5.6699 00000 user144 97.251.107.125:38605 1.1.1.1:443 51766 169369 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:27 +0330 SOCK5.6699 00094 user156 32.99.193.2:51242 1.1.1.1:443 715 388 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40048 1.1.1.1:443 18105 29029 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40070 1.1.1.1:443 674 26805 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:24 +0330 SOCK5.6699 00000 user143 112.199.63.119:60682 1.1.1.1:443 475 445 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:37 +0330 SOCK5.6699 00000 user105 191.184.66.98:40102 1.1.1.1:443 12913 18780 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:42 +0330 SOCK5.6699 00000 user143 112.199.63.119:60688 1.1.1.1:443 4530 34717 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:44 +0330 SOCK5.6699 00000 user127 212.167.145.49:2972 1.1.1.1:443 827 267 0 CONNECT 1.1.1.1:443

My goal is to extract two fields from each line of this log file:

Username
Source IP address of the user

Below is a sample line; the portions needed are the username (user144) and the source IP (97.251.107.125).

12-02-2022 15:18:22 +0330 SOCK5.6699 00000 user144 97.251.107.125:38605 1.1.1.1:443 51766 169369 0 CONNECT 1.1.1.1:443

So I wrote a Python script that extracts both items, stores them in separate lists, and then joins them with the zip function.

import pprint
import collections

iplist=[]
for l in data:
    ip_port=l[53:71]
    iplist.append(ip_port.split(':')[0])


userlist=[]
for u in data:
    user=u[42:52]
    userlist.append(user.replace(" ", ""))

a=list(zip(iplist,userlist))
most_ip=collections.Counter(a).most_common(5)
pprint.pprint(most_ip)

This code works fine, and I'm able to get the most frequently used IPs with their corresponding usernames. I should also mention that I didn't use the re module, because it also matched the second IP (the destination IP, 1.1.1.1), which I don't care about.

Question: Is there a neater way to write this code?

2 Answers


Zareh Kasparian · Answer 1 · 2022-02-15T09:52:44+08:00


Following the suggestion from "shearn89", I have edited my code as below. It is much simpler, with a single iteration:

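import collections
import pprint

# data: an iterable of log lines, as in the question
userlist=[]
iplist=[]
for i in data:
    ip=i.split(' ')[6].split(':')[0]   # field 6 is "source_ip:port"
    user=i.split(' ')[5]               # field 5 is the username
    iplist.append(ip)
    userlist.append(user)

top_used=collections.Counter(zip(iplist,userlist)).most_common(5)
pprint.pprint(top_used)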
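
The question mentions that the re module also returned the destination IP. A minimal sketch of one way around that, assuming usernames contain no whitespace and source addresses are IPv4: anchor the pattern on the username token that precedes the first ip:port pair, and use re.search, which stops at the first match. The pattern and the name LINE_RE are illustrative, not something from the original post.

```python
import re

# Hypothetical pattern: a whitespace-delimited token (the username)
# followed by the first "ipv4:port" pair on the line.
LINE_RE = re.compile(r"\s(\S+)\s(\d{1,3}(?:\.\d{1,3}){3}):\d+\s")

# Sample line copied from the question's log.
line = ("12-02-2022 15:18:22 +0330 SOCK5.6699 00000 user144 "
        "97.251.107.125:38605 1.1.1.1:443 51766 169369 0 CONNECT 1.1.1.1:443")

match = LINE_RE.search(line)  # search() returns only the first match
if match:
    user, ip = match.groups()
    print(user, ip)
```

For lines in this fixed layout, splitting on whitespace is probably still the simpler choice; the regex is mainly useful if the surrounding fields vary.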

1

Misc08 · Answer 2 · 2022-02-18T15:36:29+08:00

Best Answer


There is still room to optimize your new code. The two things that stand out most to me:

Do not call split() more than once for each line of the log. Call it once and store the result in a variable, because each call takes some time (not much, but it adds up the more data you process).

s = i.split(' ')
ip=s[6].split(':')[0]
user=s[5]

Why create two lists and then zip them together afterwards? Just store the tuples directly in a list:

import collections

l = []
for i in data:
    s = i.split(' ')          # split once per line
    ip = s[6].split(':')[0]   # source IP without the port
    user = s[5]
    l.append((ip, user))      # a tuple literal; wrapping it in tuple() is redundant
top_used = collections.Counter(l).most_common(5)
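
Going one step further, the intermediate list can be skipped by feeding a generator straight into Counter. The following is a sketch under a few assumptions: the helper name ip_user_pairs is made up for illustration, split() is called without an argument so that runs of spaces do not shift the field positions, and lines with fewer than seven fields are skipped in case the log contains blank or truncated lines.

```python
import collections

# Sample lines copied from the question's log.
data = [
    "12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40048 1.1.1.1:443 18105 29029 0 CONNECT 1.1.1.1:443",
    "12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40070 1.1.1.1:443 674 26805 0 CONNECT 1.1.1.1:443",
    "12-02-2022 15:20:24 +0330 SOCK5.6699 00000 user143 112.199.63.119:60682 1.1.1.1:443 475 445 0 CONNECT 1.1.1.1:443",
]

def ip_user_pairs(lines):
    """Yield (ip, user) for each log line that has enough fields."""
    for line in lines:
        fields = line.split()  # no argument: split on any run of whitespace
        if len(fields) < 7:
            continue  # assumption: blank or truncated lines may occur
        ip = fields[6].partition(":")[0]  # drop the port
        yield ip, fields[5]

top_used = collections.Counter(ip_user_pairs(data)).most_common(5)
print(top_used)
```

Because Counter consumes the generator lazily, the same function should also work when passed an open file object instead of a list.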

1
