
The Way of Python, Farewell to Script Kiddies series: Writing All Kinds of URL Collectors

Author: 阿甫哥哥, contracted writer at i春秋



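0x04 Writing a simple Baidu URL collection script
Start by crawling the URLs on a single results page. As an example, crawl the URLs returned for the keyword 阿甫哥哥.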
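
As a minimal sketch of the URL side of this step, the helper below builds the search URL for a keyword and a result page. It assumes Baidu takes the keyword in `wd` and a result offset in `pn`, with ten results per page. The name `build_search_url` and its `page` parameter are hypothetical and not part of the original script; the host is the one used in the script in this article.

```python
from urllib.parse import urlencode

BASE = 'https://www.baidu.com.cn/s'  # host taken from the script in this article


def build_search_url(word, page=1):
    """Return the search URL for `word` on the 1-based result page `page`."""
    if page < 1:
        raise ValueError('page must be >= 1')
    # Assumption: `pn` is the result offset, with 10 results per page.
    query = urlencode({'wd': word, 'pn': (page - 1) * 10})
    return BASE + '?' + query


print(build_search_url('阿甫哥哥'))
print(build_search_url('python', page=3))
```

`urlencode` percent-encodes the keyword, so non-ASCII keywords such as 阿甫哥哥 and keywords containing spaces end up in a valid query string.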

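# -*- coding: utf-8 -*-
import datetime
import re
import time

import requests
from bs4 import BeautifulSoup as bs

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
# Placeholder (assumption): sign() needs the cookies of a logged-in session; supply your own.
cookie = {}


def getfromBaidu(word, pageout):
    # pageout is taken as a parameter here because the page loop depends on it.
    # Single page: the search URL for the keyword.
    url = 'https://www.baidu.com.cn/s?wd=' + word
    # Several pages: one URL per result offset (0, 10, 20, ...).
    for k in range(0, (pageout - 1) * 10, 10):
        # Assumption: the offset is passed in Baidu's pn parameter.
        url = 'https://www.baidu.com.cn/s?wd=' + word + '&pn=' + str(k)
    # Fetching and parsing of the result pages is not shown here.


def sign():
    # Assumption: the formhash is read from the plugin page itself
    # (sign_url below without its operation parameters).
    url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign'
    r = requests.get(url=url, cookies=cookie, headers=headers)
    # The formhash is the hidden anti-CSRF token that the sign-in form submits.
    rows = re.findall(r'input type="hidden" name="formhash" value="(.*?)" />', r.text)
    formhash = rows[0] if rows else None
    if formhash is not None:
        print('[-]Formhash is: ' + formhash)
    else:
        print('[-]None formhash!')
    # Site message: "You have already signed in today, or sign-in has not started yet".
    if '您今天已经签到过了或者签到时间还未开始' in r.text:
        print('[-]Already signed!!')
    elif formhash is not None:
        sign_url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign&operation=qiandao&infloat=1&inajax=1'
        sign_payload = {
            'formhash': formhash,
            'qdxq': 'fd',
            'qdmode': '2',
            'todaysay': '',
            'fastreply': 0,
        }
        sign_req = requests.post(url=sign_url, data=sign_payload, headers=headers, cookies=cookie)
        # Site message: "Sign-in successful".
        if '签到成功' in sign_req.text:
            print('[-]Sign success!!')
        else:
            print('[-]Something error...')
    # Sleep past the scheduled minute so main() does not trigger twice.
    time.sleep(60)


def main(h=0, m=0):
    # Run sign() once a day at h:m, polling the clock every 20 seconds.
    while True:
        while True:
            now = datetime.datetime.now()
            if now.hour == h and now.minute == m:
                break
            time.sleep(20)
        sign()


if __name__ == '__main__':
    main()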
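
A URL collector of this kind ultimately has to pull the links out of a fetched results page. The following is a minimal sketch of that step, not the author's implementation. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages. It also simply keeps every absolute `http(s)` link, since which anchors on a Baidu results page count as results is not specified here. `LinkCollector`, `collect_links` and the `sample` HTML are hypothetical names and data used for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that points to an absolute http(s) URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        # Keep absolute links only, and skip duplicates while preserving order.
        if href and href.startswith(('http://', 'https://')) and href not in self.links:
            self.links.append(href)


def collect_links(html):
    """Return the unique absolute links found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical stand-in for the HTML of a fetched results page.
sample = '''
<div><a href="https://example.com/a">A</a>
<a href="/relative">skip</a>
<a href="http://example.org/b">B</a>
<a href="https://example.com/a">A again</a></div>
'''
print(collect_links(sample))
```

In a real run, the string passed to `collect_links` would be the body of the response fetched for each search URL, and the filter in `handle_starttag` would need tightening to match only the result anchors.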

