1. Text crawling
a. Install the Beautiful Soup 4 library
pip install beautifulsoup4
b. Test code
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/Main_Page'

# Print the href of every link on the page ('/' when an anchor has no href).
with urlopen(URL) as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))

# Print the text of every element matching a CSS selector, numbered from 1.
# Note: if the fetched page has no 'span.ah_k' elements, this loop prints nothing.
with urlopen(URL) as response2:
    soup2 = BeautifulSoup(response2, 'html.parser')
    for i, anchor2 in enumerate(soup2.select('span.ah_k'), start=1):
        print(f"Rank {i}: {anchor2.get_text()}\n")
c. Run
python index.py
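The first loop in the test code prints each href exactly as it appears in the page, so many values are relative paths. A minimal sketch, assuming you want absolute URLs instead, resolves them with the standard library's urllib.parse.urljoin. The helper name to_absolute_urls and the sample hrefs are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin


def to_absolute_urls(base_url, hrefs):
    """Resolve each href against base_url, skipping in-page fragments."""
    absolute = []
    for href in hrefs:
        # Links such as "#top" only point inside the current page.
        if href.startswith("#"):
            continue
        absolute.append(urljoin(base_url, href))
    return absolute


if __name__ == "__main__":
    base = "https://en.wikipedia.org/wiki/Main_Page"
    # Hypothetical sample values standing in for anchor.get('href') results.
    sample = ["/wiki/Python", "#top", "https://example.org/"]
    for url in to_absolute_urls(base, sample):
        print(url)
```

In the crawler, the list passed in would be the href values collected from soup.find_all('a') rather than the sample list.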
2. Image crawling
a. Install the google_images_download library
pip install google_images_download --use-feature=2020-resolver
or
$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download && sudo python setup.py install
b. Test code
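from google_images_download import google_images_download

# Create the downloader object.
response = google_images_download.googleimagesdownload()

# keywords: comma-separated search terms; limit: number of images to download;
# print_urls: print the URL of each image as it is downloaded.
arguments = {"keywords": "장원영, 안유진", "limit": 20, "print_urls": True}

paths = response.download(arguments)
print(paths)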
c. Run
python google.py
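When the search terms come from a list rather than a hand-written string, the arguments dict can be built by a small helper. This is only a sketch: it assumes the library reads "keywords" as a comma-separated string, as the value in the test code suggests, and the function name build_arguments is hypothetical. The keys "keywords", "limit", and "print_urls" are the ones used in the test code.

```python
def build_arguments(keywords, limit=20, print_urls=True):
    """Build the arguments dict to pass to download().

    keywords is a list of search terms; blank entries are dropped and the
    rest are joined into one comma-separated string.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    return {
        "keywords": ",".join(cleaned),
        "limit": limit,
        "print_urls": print_urls,
    }


if __name__ == "__main__":
    # Same search terms as the test code, supplied as a list.
    print(build_arguments(["장원영", "안유진"]))
```

The returned dict can then be passed to response.download() in place of the literal arguments dict.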