robots.txt limits the crawl time window and request frequency, but Baidu ignores it completely.
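For reference, the sort of restrictions meant here look roughly like the sketch below. Note that Crawl-delay, Visit-time and Request-rate are all non-standard robots.txt extensions, and whether a crawler honors them (or the placeholder times/rates chosen here) is entirely up to the crawler:

User-agent: Baiduspider
# non-standard: wait at least 10 seconds between requests
Crawl-delay: 10
# non-standard: only crawl between 02:00 and 06:00 server time
Visit-time: 0200-0600
# non-standard: at most 1 page per 10 seconds
Request-rate: 1/10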
# pull the client IPs of Baiduspider requests and collapse them to /24 ranges
sudo grep 'http://www.baidu.com/search/spider.html' access.log | awk '{print $1}' | awk -F'.' '{print $1"."$2"."$3".0"}' | sort -u
One morning alone: 5 IP ranges, 300+ IPs crawling flat out - - (a quick way to tally the per-IP counts follows the range list below)
123.125.71.0
220.181.108.0
180.76.15.0
220.181.38.0
183.60.243.0
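To actually put numbers on the frequency (a sketch, assuming the usual combined log format with the client IP in field 1, same as the command above):

# requests per crawler IP, busiest first
sudo grep 'http://www.baidu.com/search/spider.html' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
# total number of distinct crawler IPs
sudo grep 'http://www.baidu.com/search/spider.html' access.log | awk '{print $1}' | sort -u | wc -l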
A partial list of the IPs (a firewall sketch for blocking these ranges follows the list):
220.181.108.103
220.181.108.107
220.181.108.97
220.181.108.85
220.181.108.120
220.181.108.102
220.181.108.87
220.181.108.83
220.181.108.77
220.181.108.93
220.181.108.109
220.181.108.90
220.181.108.119
220.181.108.81
220.181.108.104
220.181.108.91
220.181.108.114
220.181.108.99
220.181.108.108
220.181.108.92
220.181.108.101
123.125.71.91
123.125.71.81
123.125.71.110
123.125.71.115
123.125.71.107
123.125.71.108
123.125.71.96
123.125.71.80
123.125.71.94
123.125.71.89
123.125.71.95
123.125.71.98
123.125.71.111
123.125.71.101
123.125.71.88
123.125.71.103
123.125.71.97
123.125.71.113
180.76.15.149
180.76.15.137
180.76.15.159
180.76.15.140
180.76.15.150
180.76.15.136
180.76.15.155
180.76.15.152
180.76.15.160
180.76.15.163
180.76.15.157
180.76.15.158
180.76.15.134
180.76.15.161
180.76.15.151
180.76.15.142
180.76.15.145
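If you would rather just cut them off at the firewall than wait for Baidu to behave, a minimal sketch with iptables (this drops Baiduspider outright, so Baidu will stop indexing the site; the /24 masks are an assumption based on the ranges above):

# drop the five /24 ranges observed above
for net in 123.125.71.0/24 220.181.108.0/24 180.76.15.0/24 220.181.38.0/24 183.60.243.0/24; do
  sudo iptables -A INPUT -s "$net" -j DROP
done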