Crawling

1. Text crawling

A. Install the Beautiful Soup 4 library

pip install beautifulsoup4

 

B. Test code

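from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/Main_Page'

# Fetch the page and parse it; the with-block closes the connection afterwards.
with urlopen(URL) as response:
    soup = BeautifulSoup(response, 'html.parser')

# Print the href of every <a> tag, falling back to '/' when the attribute is missing.
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))

# Print the text of every element matching a CSS selector as a numbered ranking.
# Note: 'span.ah_k' looks like a class name from a different site and is unlikely
# to match anything on this page; replace it with a selector the target page uses.
with urlopen(URL) as response2:
    soup2 = BeautifulSoup(response2, 'html.parser')

for rank, item in enumerate(soup2.select('span.ah_k'), start=1):
    print(f'Rank {rank}: {item.get_text()}\n')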
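
To see how find_all and select behave without depending on a live page, the sketch below parses a small inline HTML string. The markup, the class name rank, and the helper names extract_links and extract_ranked are invented for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/A"><span class="rank">Alpha</span></a></li>
  <li><a href="/wiki/B"><span class="rank">Beta</span></a></li>
  <li><a>No href here</a></li>
</ul>
"""


def extract_links(html):
    """Return the href of every <a> tag, using '/' when the attribute is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    return [a.get('href', '/') for a in soup.find_all('a')]


def extract_ranked(html, selector):
    """Return numbered text for every element matching the CSS selector."""
    soup = BeautifulSoup(html, 'html.parser')
    matches = soup.select(selector)
    return [f'Rank {i}: {el.get_text()}' for i, el in enumerate(matches, start=1)]


if __name__ == '__main__':
    print(extract_links(SAMPLE_HTML))
    print(extract_ranked(SAMPLE_HTML, 'span.rank'))
    # A selector that matches nothing yields an empty list, not an error.
    print(extract_ranked(SAMPLE_HTML, 'span.ah_k'))
```

A selector with no matches returns an empty list silently, so an empty result usually means the selector does not fit the page rather than that the request failed.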

 

C. Run (with the test code above saved as index.py)

python index.py
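
The script prints href values exactly as they appear in the page, so many of them are relative paths such as /wiki/... rather than full URLs. As a possible follow-up step, the sketch below resolves them against the page URL with the standard library's urljoin. The helper name to_absolute and the sample list are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = 'https://en.wikipedia.org/wiki/Main_Page'


def to_absolute(base_url, hrefs):
    """Resolve each href against base_url, skipping in-page fragments and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        # Skip empty values and links that only jump within the same page.
        if not href or href.startswith('#'):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == '__main__':
    # Hypothetical sample of href values, as the crawler might print them.
    sample = ['/wiki/Python', '#top', '//commons.wikimedia.org/',
              'https://example.org/', '/wiki/Python']
    for url in to_absolute(BASE_URL, sample):
        print(url)
```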

 

2. Image crawling

A. Install the google_images_download library

pip install google_images_download

or

$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download && sudo python setup.py install

 

B. Test code

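from google_images_download import google_images_download

# Create the downloader object.
downloader = google_images_download.googleimagesdownload()

# keywords: comma-separated search terms; limit: maximum images per keyword;
# print_urls: print each image URL as it is downloaded.
arguments = {"keywords": "장원영,안유진", "limit": 20, "print_urls": True}

# Run the download and print the result, which describes where the images were saved.
paths = downloader.download(arguments)
print(paths)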

 

C. Run (with the test code above saved as google.py)

python google.py
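
After the script finishes, it can be useful to check how many files actually landed on disk. The sketch below assumes the images are saved under a downloads directory with one sub-directory per keyword, which is believed to be the library's default layout; pass a different root if your setup differs. The helper name count_images and the extension set are hypothetical.

```python
from pathlib import Path

# Assumed set of image file extensions; extend it as needed.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}


def count_images(root):
    """Return {sub-directory name: number of image files} for each folder under root."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        # Nothing has been downloaded yet, or the root path is wrong.
        return counts
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1 for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == '__main__':
    # 'downloads' is the assumed output directory.
    for keyword, n in count_images('downloads').items():
        print(f'{keyword}: {n} image(s)')
```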