Python script for checking whether URLs are accessible
The approach for checking URL access status with Python:
1. Start with the 2,000 URLs; they can be placed in a txt file, one per line.
2. Have Python read the URLs from that file into a list, one by one.
3. Open a simulated browser (a request carrying a browser-style User-Agent) and visit each URL.
4. If the URL can be reached, report success; otherwise report an error.
Here is the code, plain and simple. (For privacy reasons, the screenshots have been redacted.)
Script 1
import urllib.request
import urllib.error
import time

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/49.0.2')]

# This is the file that holds your URLs; change the name to match yours
file = open('test.txt')
lines = file.readlines()
file.close()

# Strip the trailing newline from every line and collect the URLs in a list
aa = []
for line in lines:
    aa.append(line.replace('\n', ''))
print(aa)

print('Starting check:')
# Reachable URLs are appended to this output file
newfile = open('URL_open.txt', 'a')
for tempUrl in aa:
    try:
        opener.open(tempUrl)
        print(tempUrl + ' is OK')
        newfile.write(tempUrl + '\n')
    except urllib.error.HTTPError:
        print(tempUrl + ' = error opening the page')
        time.sleep(2)
    except urllib.error.URLError:
        print(tempUrl + ' = error opening the page')
        time.sleep(2)
    time.sleep(0.1)
newfile.close()
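One small caveat (not in the original script): opener.open() is called without a timeout, so a single unresponsive host can stall the whole run. A minimal sketch of the same check with a 5-second timeout, where example.com is only a placeholder URL:

import socket
import urllib.error
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/49.0.2')]

tempUrl = 'http://example.com'  # placeholder URL
try:
    # timeout=5 makes the call give up after 5 seconds instead of hanging
    opener.open(tempUrl, timeout=5)
    print(tempUrl + ' is OK')
except (urllib.error.HTTPError, urllib.error.URLError, socket.timeout):
    print(tempUrl + ' = error opening the page')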
Script 2
import requests

# File with one URL per line
f = open('url2.txt', 'r')
url = f.readlines()
f.close()

url_result_success = []
url_result_failed = []
for line in url:
    try:
        response = requests.get(line.strip(), verify=False,
                                allow_redirects=True, timeout=5)
        if response.status_code != 200:
            raise requests.RequestException(
                "Status code error: {}".format(response.status_code))
    except requests.RequestException:
        url_result_failed.append(line)
        continue
    url_result_success.append(line)

# Print every URL that answered with 200
for item in url_result_success:
    print('URL: %s' % item.strip() + ' -> 200 ok')
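Because the script passes verify=False, urllib3 emits an InsecureRequestWarning for every HTTPS URL it checks. If that output is unwanted, the warning can be silenced before the loop (an optional addition, not part of the original script):

import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)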
Fixing the connection-timeout error HTTPConnectionPool(host='xx.xx.xx.xx', port=xx): Max retries exceeded with url: (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000015A25025EB8>...))
Cause of the error: too many HTTP connections are opened and never closed.
Solutions (a combined code sketch follows the list):
1. Increase the number of connection retries:
requests.adapters.DEFAULT_RETRIES = 5
2. Close the extra connections:
requests is built on the urllib3 library, whose HTTP connections are keep-alive by default; the snippet below sets keep_alive to False in requests to close them.
s = requests.session()
s.keep_alive = False
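Putting the two suggestions together, a minimal sketch is shown below. example.com is only a placeholder URL, and the mount()/'Connection: close' lines are the equivalents that the requests documentation spells out explicitly, included here as assumptions about what the snippets above intend:

import requests
from requests.adapters import HTTPAdapter

requests.adapters.DEFAULT_RETRIES = 5    # 1. more connection retries, as suggested above

s = requests.session()
s.keep_alive = False                     # 2. do not keep connections alive, as suggested above

# Documented equivalents of the two settings above
s.headers['Connection'] = 'close'                 # close the connection after each request
s.mount('http://', HTTPAdapter(max_retries=5))    # retry failed connections
s.mount('https://', HTTPAdapter(max_retries=5))

response = s.get('http://example.com', timeout=5)  # example.com is a placeholder
print(response.status_code)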