The following code is meant to extract /support/security/*.html links from a file (urlfile contains about 1000 links) and write them to a file named urlsort using a regex. I'm weak at regex, though. Can anyone show me how to do that?
#!/usr/bin/env python
import re,sys
fileHandle = open('urlfile', 'r')
f1 = open('urlsort', 'w')
for line in fileHandle.readlines():
    links = re.findall(r"(\/support\/security\/*.html.*?)", line)
    for link in links:
        sys.stdout = f1
        print ('%s' % (link[0]))
        sys.stdout = sys.__stdout__
f1.close()
fileHandle.close()
Your regex has two mistakes: a missing . before the first * and an extra ? near the end. Here is some code that writes URLs matching your pattern to urlsort using some Python idioms.
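A minimal sketch along those lines, assuming urlfile holds one URL per line. The pattern applies the two fixes described above (".*" instead of the bare "*", and no trailing "?"). Escaping the dot in ".html" is an extra tightening that is assumed here, not something required by the fixes. The function names extract_links and write_links are made up for illustration.

```python
#!/usr/bin/env python
import re

# Corrected pattern: "/support/security/" followed by anything, then ".html",
# then the rest of the line. The escaped dot (\.) is an assumed tightening so
# that only a literal "." is accepted before "html".
PATTERN = re.compile(r"/support/security/.*\.html.*")


def extract_links(lines):
    """Yield each matching link found in an iterable of text lines."""
    for line in lines:
        # findall returns whole matches because the pattern has no groups;
        # "." never matches the newline, so matches stop at end of line.
        for link in PATTERN.findall(line):
            yield link


def write_links(src="urlfile", dst="urlsort"):
    """Copy every matching link from src to dst, one per line."""
    # "with" closes both files even if an error occurs, and iterating the
    # file object reads it lazily instead of loading it via readlines().
    with open(src) as infile, open(dst, "w") as outfile:
        for link in extract_links(infile):
            outfile.write(link + "\n")
```

Calling write_links() with no arguments reads urlfile and writes urlsort in the current directory. Writing straight to the output file avoids reassigning sys.stdout, and using the whole match avoids link[0], which in the original would print only the first character of each link.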