Python 2.7 - Getting Inner Nested Tag Data with BeautifulSoup
I want the information in an inner tag, but my code keeps returning an empty result. Code:
    import requests
    from bs4 import BeautifulSoup

    url = "http://www.krak.dk/cafe/s%c3%b8g.cs?consumer=suggest?search_word=cafe"
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    gendata = soup.find_all("ol", {"class": "hit-list"})
    print gendata
    for infox in gendata:
        print infox.text

What am I missing?
The HTML is broken, so you need a different parser. You can use lxml if you have it installed:
    soup = BeautifulSoup(r.content, 'lxml')
Or use html5lib:
    soup = BeautifulSoup(r.content, 'html5lib')
lxml has dependencies such as libxml2, while html5lib can be installed with pip.
    In [9]: url = "http://www.krak.dk/cafe/s%c3%b8g.cs?consumer=suggest?search_word=cafe"
    In [10]: r = requests.get(url)
    In [11]: soup = BeautifulSoup(r.content, 'html.parser')
    In [12]: len(soup.find_all("ol", {"class": "hit-list"}))
    Out[12]: 0
    In [13]: soup = BeautifulSoup(r.content, 'lxml')
    In [14]: len(soup.find_all("ol", {"class": "hit-list"}))
    Out[14]: 1
    In [15]: soup = BeautifulSoup(r.content, 'html5lib')
    In [16]: len(soup.find_all("ol", {"class": "hit-list"}))
    Out[16]: 1
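Since the lenient parsers may not be installed everywhere, one option is to try them in order and fall back to the standard-library parser. The following is only a sketch: the helper name make_soup, the preference order, and the inline sample markup are assumptions for illustration, not part of the original code. It relies on bs4 raising FeatureNotFound when a requested parser is missing, and it is written to run on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed preference order: lenient parsers first, stdlib parser as a last resort.
PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup, parsers=PARSERS):
    """Return (soup, parser_name) for the first parser that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


# Hypothetical stand-in for r.content, so the sketch needs no network access.
sample = '<ol class="hit-list"><li>Cafe A</li><li>Cafe B</li></ol>'

soup, used = make_soup(sample)
hits = soup.find_all("ol", {"class": "hit-list"})
print(used)
print(len(hits))
```

With the real page you would pass r.content instead of sample. If only html.parser is available, the hit-list may still come back empty, as in the session above.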
There is only one hit-list, so you can use find in place of find_all. You can also look it up by its id: soup.find(id="hit-list"). If you run the HTML through W3C's HTML validator, you can see there are lots of issues.
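A short sketch of the find-by-id approach, for illustration only. The sample markup, the hit-name class, and the helper hit_names are made up here and do not come from the real page; only the id hit-list is taken from the answer. Note that find returns None when nothing matches, so it is worth checking before reading the inner tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="results">
  <ol id="hit-list" class="hit-list">
    <li><h2 class="hit-name">Cafe A</h2></li>
    <li><h2 class="hit-name">Cafe B</h2></li>
  </ol>
</div>
"""


def hit_names(html, parser="html.parser"):
    """Return the text of each <li> inside the element with id 'hit-list'."""
    soup = BeautifulSoup(html, parser)
    hit_list = soup.find(id="hit-list")
    if hit_list is None:
        # find() returns None when no element matches.
        return []
    return [li.get_text(strip=True) for li in hit_list.find_all("li")]


names = hit_names(SAMPLE_HTML)
print(names)
```

For the real page, pass r.content and 'lxml' or 'html5lib' as the parser, since html.parser did not find the list there.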