
[Web Crawling] Crawling the Kyobo Book Centre Bestseller Top 20

Installing requests and beautifulsoup4

$ pip install requests beautifulsoup4

Note: the code in this post fetches the page with urlopen from the standard library's urllib.request, so only beautifulsoup4 is strictly required to run it.

Code

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

# Kyobo Book Centre bestseller page
html = urlopen("http://www.kyobobook.co.kr/bestSellerNew/bestseller.laf")

bsObject = bs(html, "html.parser")
# Date range the ranking is based on
week_standard = bsObject.find('h4', {'class': 'title_best_basic'}).find('small').text
# Each bestseller entry is a div.detail inside ul.list_type01
bestseller_contents = bsObject.find('ul', {'class': 'list_type01'})
bestseller_list = bestseller_contents.find_all('div', {'class': 'detail'})
title_list = [b.find('div', {'class': 'title'}).find('strong').text for b in bestseller_list]  # titles
subtitle_list = [b.find('div', {'class': 'subtitle'}).text.strip() for b in bestseller_list]  # book descriptions

print("\n" + week_standard + "\n\n")
for title, subtitle in zip(title_list, subtitle_list):
    print("Title: " + title)
    print("Description: " + subtitle + "\n")
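
The script above assumes every tag it looks for is present, so a single missing element raises an AttributeError. As a rough sketch only, the same extraction could be wrapped in functions that tolerate missing nodes and that fetch the page with requests, which is installed above. The names fetch_html, parse_bestsellers, and SAMPLE_HTML are hypothetical and not part of the original post. The sample markup is invented to mimic the class names used above, and the real page layout may have changed since this post was written.

```python
import requests
from bs4 import BeautifulSoup

# URL taken from the post; the site may have changed since it was written.
BESTSELLER_URL = "http://www.kyobobook.co.kr/bestSellerNew/bestseller.laf"


def fetch_html(url=BESTSELLER_URL, timeout=10):
    """Download the page with requests and raise on HTTP error statuses."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_bestsellers(html):
    """Return (title, description) pairs, skipping entries without a title."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul", class_="list_type01")
    if container is None:  # layout changed, or an unexpected page was returned
        return []
    books = []
    for detail in container.find_all("div", class_="detail"):
        title_div = detail.find("div", class_="title")
        title_tag = title_div.find("strong") if title_div else None
        if title_tag is None:
            continue
        subtitle_tag = detail.find("div", class_="subtitle")
        subtitle = subtitle_tag.get_text(strip=True) if subtitle_tag else ""
        books.append((title_tag.get_text(strip=True), subtitle))
    return books


# Hypothetical sample markup that mimics the structure the post relies on.
SAMPLE_HTML = """
<ul class="list_type01">
  <li><div class="detail">
    <div class="title"><a><strong>Book A</strong></a></div>
    <div class="subtitle"> First description </div>
  </div></li>
  <li><div class="detail">
    <div class="title"><a><strong>Book B</strong></a></div>
  </div></li>
</ul>
"""

sample_books = parse_bestsellers(SAMPLE_HTML)
for title, description in sample_books:
    print(f"Title: {title}")
    print(f"Description: {description}\n")
```

To run this against the live site, one would call parse_bestsellers(fetch_html()). An empty list then suggests that the class names assumed here no longer match the page.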

Results