Before crawling a website, check its crawler restrictions by default:
robots.txt
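A minimal sketch of how those restrictions could be checked with the standard-library `urllib.robotparser` module. The robots.txt content, the crawler names (`BadCrawler`, `GoodCrawler`) and the `example.com` URLs are made-up assumptions for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the example runs without network access.
SAMPLE_ROBOTS = """\
User-agent: BadCrawler
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /trap
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded from the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# BadCrawler is banned from the whole site.
print(parser.can_fetch("BadCrawler", "http://example.com/"))
# Any other crawler may fetch normal pages, but not /trap.
print(parser.can_fetch("GoodCrawler", "http://example.com/index.html"))
print(parser.can_fetch("GoodCrawler", "http://example.com/trap"))
# Requested delay (in seconds) between requests for generic crawlers.
print(parser.crawl_delay("GoodCrawler"))
```

Against a live site, `set_url()` followed by `read()` would load the real robots.txt instead of the sample string.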
How to estimate how much there is to crawl on a website
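The note does not say which method to use. One possible approach, sketched here as an assumption, is to count the `<loc>` entries in the site's sitemap when one is published. The sample sitemap and the helper name `count_sitemap_urls` are hypothetical; a sitemap may be incomplete, so the count is only a rough lower bound.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap content, used so the example runs without network access.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/page1</loc></url>
  <url><loc>http://example.com/page2</loc></url>
</urlset>
"""


def count_sitemap_urls(sitemap_xml):
    """Count the <loc> entries listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"))


print(count_sitemap_urls(SAMPLE_SITEMAP))
```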
Every website is built with its own mix of technologies. How do you determine which ones a site uses?
pip install builtwith
>>> import builtwith
>>> builtwith.parse('http://www.douban.com')
{u'javascript-frameworks': [u'jQuery'], u'tag-managers': [u'Google Tag Manager'], u'analytics': [u'Piwik']}
# Who owns the website
pip install python-whois
>>> import whois
>>> print(whois.whois('cnblogs.com'))
{
  "updated_date": [
    "2014-11-12 00:00:00",
    "2014-11-12 01:07:15"
  ],
  "status": [
    "clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited",
    "clientTransferProhibited https://icann.org/epp#clientTransferProhibited"
  ],
  "name": "du yong",
  "dnssec": "unsigned",
  "city": "Shanghai",
  "expiration_date": [
    "2021-11-12 00:00:00",
    "2021-11-11 04:00:00"
  ],
  "zipcode": "201203",
  "domain_name": [
    "CNBLOGS.COM",
    "cnblogs.com"
  ],
  "country": "CN",
  "whois_server": "whois.35.com",
  "state": "Shanghai",
  "registrar": "35 Technology Co., Ltd.",
  "referral_url": "http://www.35.com",
  "address": "Room 312, No.22 BOXIA Rd, Pudong New District",
  "name_servers": [
    "NS3.DNSV4.COM",
    "NS4.DNSV4.COM",
    "ns3.dnsv4.com",
    "ns4.dnsv4.com"
  ],
  "org": "Shanghai Yucheng Information Technology Co. Ltd.",
  "creation_date": [
    "2003-11-12 00:00:00",
    "2003-11-11 04:00:00"
  ],
  "emails": [
    "abuse@35.cn",
    "dudu.yz@gmail.com"
  ]
}