V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
推荐学习书目
Learn Python the Hard Way
Python Sites
PyPI - Python Package Index
http://diveintopython.org/toc/index.html
Pocoo
值得关注的项目
PyPy
Celery
Jinja2
Read the Docs
gevent
pyenv
virtualenv
Stackless Python
Beautiful Soup
结巴中文分词
Green Unicorn
Sentry
Shovel
Pyflakes
pytest
Python 编程
pep8 Checker
Styles
PEP 8
Google Python Style Guide
Code Style from The Hitchhiker's Guide
ninestep
V2EX  ›  Python

python urllib2 代理访问 google 报错[Errno 10060]

  •  1
     
  •   ninestep · 2016-01-03 22:53:39 +08:00 · 21544 次点击
    这是一个创建于 3006 天前的主题,其中的信息可能已经有所发展或是发生改变。

    我想要抓取 google 搜索结果,设置全局代理可以抓取,但是在 urllib 中就会报错[Errno 10060]
    代理绝对可用,谁遇到过这种事情
    我的代码

    name=urllib.quote(wd)
    proxy ='127.0.0.1:8787'
    opener = urllib2.build_opener( urllib2.ProxyHandler({'socks':proxy}) )
    urllib2.install_opener( opener )
    url='https://www.google.co.jp/search?hl=en&q=intitle:%s+site:%s'%(name,url)
    # url='http://www.baidu.com/s?wd=intitle:%s+site:%s'%(name,url)
    request=urllib2.Request(url)
    user_agents = ['Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130406 Firefox/23.0', \
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0', \
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533+ \
    (KHTML, like Gecko) Element Browser 5.0', \
    'IBM WebExplorer /v0.94', 'Galaxy/1.0 [en] (Mac OS X 10.5.6; U; en)', \
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)', \
    'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14', \
    'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) \
    Version/6.0 Mobile/10A5355d Safari/8536.25', \
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) \
    Chrome/28.0.1468.0 Safari/537.36', \
    'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; TheWorld)']
    index = random.randint(0, 9)
    user_agent = user_agents[index]
    request.add_header('User-Agent',user_agent)
    try:
    html=urllib2.urlopen(request,timeout=120)
    except urllib2.URLError, e:
    print(e)
    return False
    else:
    text=html.read()

    目前尚无回复
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   我们的愿景   ·   实用小工具   ·   3054 人在线   最高记录 6543   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 27ms · UTC 12:48 · PVG 20:48 · LAX 05:48 · JFK 08:48
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.