A Python script to check whether URLs are accessible

The idea behind checking URL access status with Python:
1. Start with the URLs (around 2,000 of them) in a txt file, one per line.
2. Have Python read the URLs from that file into a list, one by one.
3. Open each URL with a simulated browser (a custom User-Agent header).
4. If the request succeeds, report success; otherwise report an error.
Here is the code, plain and simple. (The screenshots are blurred because they contain private information.)
Script 1
import urllib.request
import urllib.error
import time

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/49.0.2')]

# This is the file that holds your URLs; change the name to match yours
file = open('test.txt')
lines = file.readlines()
aa = []
for line in lines:
    temp = line.replace('\n', '')
    aa.append(temp)
print(aa)

print('Starting the check:')
count = 0  # number of sites in the txt file
newfile = open("URL_open.txt", "a")
for a in aa:
    tempUrl = a
    try:
        opener.open(tempUrl)
        print(tempUrl + ' is OK')
        newfile.write(a + "\n")
    except urllib.error.HTTPError:
        print(tempUrl + ' = error opening the page')
        time.sleep(2)
    except urllib.error.URLError:
        print(tempUrl + ' = error opening the page')
        time.sleep(2)
    time.sleep(0.1)
newfile.close()
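One caveat with Script 1: opener.open() never closes the responses and sets no timeout, so a dead or very slow host can stall the loop and open connections pile up (the same root cause as the timeout error discussed after Script 2). A minimal sketch of a safer loop body, reusing the opener, aa and newfile names from Script 1 (the 5-second timeout is just an example value):

for a in aa:
    try:
        # the with-block closes the connection even if reading fails;
        # timeout (in seconds) keeps a dead host from blocking the loop forever
        with opener.open(a, timeout=5) as resp:
            print(a + ' is OK, status ' + str(resp.status))
            newfile.write(a + "\n")
    except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
        print(a + ' = error opening the page: ' + str(e))
        time.sleep(2)
    time.sleep(0.1)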
Script 2

import requests

f = open('url2.txt', 'r')
url = f.readlines()
length = len(url)
url_result_success = []
url_result_failed = []
for i in range(0, length):
    try:
        response = requests.get(url[i].strip(), verify=False, allow_redirects=True, timeout=5)
        if response.status_code != 200:
            raise requests.RequestException("Status code error: {}".format(response.status_code))
    except requests.RequestException as e:
        url_result_failed.append(url[i])
        continue
    url_result_success.append(url[i])
f.close()

result_len = len(url_result_success)
for i in range(0, result_len):
    print('URL: %s' % url_result_success[i].strip() + ' -> 200 ok')
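Script 2 builds url_result_failed but never reports it; a short hedged addition that reuses that list (the url_failed.txt output name is just an example):

# report and save the URLs that did not return 200
with open('url_failed.txt', 'w') as failed_file:
    for u in url_result_failed:
        print('URL: %s -> failed' % u.strip())
        failed_file.write(u.strip() + '\n')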
Fixing the connection timeout error: HTTPConnectionPool(host='xx.xx.xx.xx', port=xx): Max retries exceeded with url: (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000015A25025EB8>...))

Cause of the error: too many HTTP connections are opened and never closed.
Solutions:
1. Increase the number of connection retries:
requests.adapters.DEFAULT_RETRIES = 5
2. Close the extra connections:
requests uses the urllib3 library, and its HTTP connections are keep-alive by default; set keep_alive to False on the session to turn this off.
s = requests.session()
s.keep_alive = False
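Both fixes can be combined directly in the checking loop. A minimal sketch, assuming the same url2.txt input as Script 2 (the retry count and timeout values are only examples):

import requests

requests.adapters.DEFAULT_RETRIES = 5  # fix 1: allow more connection retries

s = requests.session()
s.keep_alive = False  # fix 2: do not keep connections alive between requests

with open('url2.txt', 'r') as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        try:
            response = s.get(url, verify=False, allow_redirects=True, timeout=5)
            print('%s -> %s' % (url, response.status_code))
            response.close()  # release the connection explicitly
        except requests.RequestException as e:
            print('%s -> failed: %s' % (url, e))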