
Can someone help me find what's wrong with my Python 3 scraper? It throws an encoding error.

    onsfhg · May 16, 2015 · 2960 views
    This topic was created 4040 days ago; the information in it may have changed since.
    #coding=utf-8
    import requests
    from bs4 import BeautifulSoup
    page = 1
    url = 'http://www.v2ex.com/recent?p=' + str(page)
    print(url)
    response = requests.get(url)
    soup = BeautifulSoup(response.text)
    print(soup.title.text)


    Traceback (most recent call last):
      File "lasttest.py", line 9, in <module>
        print(soup.title.text)
    UnicodeEncodeError: 'gbk' codec can't encode character
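The traceback is cut off before the offending character, so we don't know which one failed. A common culprit when printing scraped HTML to a Chinese-locale Windows console (encoding GBK/cp936) is a character with no GBK mapping, such as U+00A0 (non-breaking space). The sketch below uses that character as an assumed example. `can_print` is a hypothetical helper that checks whether a string survives being encoded to a given console encoding, which is the step where `print()` fails here.

```python
import sys

def can_print(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Assumed example: U+00A0 (non-breaking space) is common in scraped HTML
# and cannot be represented in GBK.
sample = "V2EX\xa0title"

# print() encodes text with sys.stdout's encoding; getattr guards against
# replacement stream objects that lack the attribute.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("GBK can encode sample:", can_print(sample, "gbk"))
print("UTF-8 can encode sample:", can_print(sample, "utf-8"))
```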
    8 replies    2015-05-16 21:11:24 +08:00
    #1 billlee · May 16, 2015
    print(soup.title.text.decode('<encoding name>'))
    #2 omengye · May 16, 2015
    soup = BeautifulSoup(response.text.encode('utf-8'))
    #3 onsfhg (OP) · May 16, 2015
    @billlee @omengye I had already tried both of those; neither worked.
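A plausible reason neither suggestion helped is that neither changes the string that eventually reaches `print()`. In Python 3, `soup.title.text` is a `str`, and `str` has no `.decode()` method (only `bytes` does), so the first suggestion raises `AttributeError`. The second encodes the page to UTF-8 bytes, but BeautifulSoup decodes bytes back into `str`, so `print()` receives the same text as before. A stdlib-only sketch with a made-up title string:

```python
# Made-up title standing in for soup.title.text (an assumption, not the real page title).
title = "V2EX\xa0最近的主题"

# Suggestion 1: Python 3 str objects have no decode(); only bytes do.
print("str has decode():", hasattr(title, "decode"))
print("bytes has decode():", hasattr(title.encode("utf-8"), "decode"))

# Suggestion 2, roughly: encode to UTF-8 bytes, then (as BeautifulSoup does)
# decode back to str. The result is identical to the original text.
round_trip = title.encode("utf-8").decode("utf-8")
print("unchanged after round trip:", round_trip == title)
```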
    #4 omengye · May 16, 2015
    @onsfhg Your code runs without errors for me in PyCharm.
    #5 onsfhg (OP) · May 16, 2015
    Strange. It fails when I run it, but after a small change it runs without errors on Python 2.7.
    #6 omengye · May 16, 2015
    @onsfhg This is most likely an output problem. If you are printing UTF-8 text in the Windows cmd console, you first need to set the cmd code page to UTF-8.
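If changing the console isn't an option, one workaround is to make the output step tolerate characters it cannot encode. `safe_print` below is a hypothetical helper (not part of any library): it encodes with the stream's own encoding using `errors="replace"`, so unsupported characters become `?` instead of raising. The GBK console is simulated with an in-memory buffer.

```python
import io
import sys

def safe_print(text, stream=None):
    """Print `text`, replacing characters the stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    # io.StringIO and some wrapped streams report no encoding; fall back to UTF-8.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    printable = text.encode(encoding, errors="replace").decode(encoding)
    print(printable, file=stream)

# Simulate a GBK console with a byte buffer (newline="\n" disables newline translation).
gbk_console = io.TextIOWrapper(io.BytesIO(), encoding="gbk", newline="\n")
safe_print("V2EX\xa0title", gbk_console)
gbk_console.flush()
```

Replacing characters loses information and only hides the error. On Python 3.6 and later, PEP 528 makes the interactive Windows console use UTF-8, so this exact error is less likely there. Output redirected to a file or pipe still uses the locale encoding.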
    #7 imn1 · May 16, 2015
    cmd + enter
    chcp 65001 + enter
    run...
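Instead of switching the code page with `chcp`, you can set the `PYTHONIOENCODING` environment variable, which overrides the encoding Python uses for stdin/stdout/stderr (in cmd, run `set PYTHONIOENCODING=utf-8` before the script). The sketch below starts a child Python process with that variable set and captures its output through a pipe, so it doesn't depend on the console's code page. The child script is an assumed example.

```python
import os
import subprocess
import sys

# Assumed child script: prints a character GBK cannot encode (U+00A0).
child_code = "print('V2EX\\xa0title')"

# Copy the current environment and force UTF-8 for the child's standard streams.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    check=True,  # raise if the child exits with an error (e.g. UnicodeEncodeError)
)

# Strip the trailing newline ("\r\n" on Windows) and decode the UTF-8 bytes.
output = result.stdout.rstrip(b"\r\n").decode("utf-8")
print("child output matches:", output == "V2EX\xa0title")
```

If the console itself stays on GBK, forcing UTF-8 output avoids the exception but may display garbled text. A value in the `encoding:errors` form, such as `gbk:replace`, keeps GBK and substitutes `?` for unsupported characters.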
    #8 killpanda · May 16, 2015 via iPhone
    soup = BeautifulSoup(response.text, from_encoding='foo')
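`from_encoding` tells BeautifulSoup how to decode *bytes* input. When the markup is already a `str`, such as `response.text`, BeautifulSoup ignores it and emits a warning. In both cases the parameter affects decoding the page, not encoding the output, so it would not change the traceback in the question. The stdlib sketch below only shows what a decoding choice does to some made-up page bytes, which stand in for `response.content`.

```python
# Made-up page bytes standing in for response.content (an assumption).
raw = "<title>V2EX › 最近的主题</title>".encode("utf-8")

# Decoding with the page's real encoding recovers the original text.
right = raw.decode("utf-8")

# latin-1 maps every byte to some character, so it never raises,
# but the result is mojibake rather than the original text.
wrong = raw.decode("latin-1")

print("correct decode matches source:", right == "<title>V2EX › 最近的主题</title>")
print("wrong decode matches correct one:", wrong == right)
```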