web scraping

Let's try web scraping with Python's BeautifulSoup.

Using Naver's investment information, we print Samsung Electronics' current price, upper limit price, and PER.

We will use only BeautifulSoup's select syntax.

soup.select("div")   find by tag
soup.select("#per")  find by id
soup.select(".per")  find by class

soup.select("div a")  find a tags that are descendants of a div tag
soup.select("head > title")  find title tags that are direct children of head

soup.select("#code ~ .price")  all tags with class price that are later siblings of the tag with id code
soup.select("#code + .price")  the tag with class price that immediately follows the tag with id code
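
To make the selector patterns above concrete, here is a small sketch that runs them against an inline HTML string. The markup, ids, and class names are made up for illustration and are not taken from the Naver page. It uses the built-in html.parser so it runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this illustration only.
html = """
<html><head><title>demo</title></head>
<body>
  <div id="box">
    <span id="code">005930</span>
    <span class="price">76,700</span>
    <span class="name">Samsung</span>
    <span class="price">99,100</span>
    <p><a href="#">link</a></p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Direct child: <title> directly under <head>
title = soup.select_one("head > title").get_text()

# Descendant: any <a> somewhere inside a <div>
links = [a.get_text() for a in soup.select("div a")]

# Direct child only: the <a> sits inside <p>, so nothing matches
direct_links = soup.select("div > a")

# Later siblings of #code that have class "price"
following = [t.get_text() for t in soup.select("#code ~ .price")]

# Only the sibling immediately after #code, if it has class "price"
adjacent = [t.get_text() for t in soup.select("#code + .price")]

print(title, links, direct_links, following, adjacent)
```

select returns a list of every match, while select_one returns the first match or None.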

Print Samsung Electronics' current price

import requests
from bs4 import BeautifulSoup

url = 'https://finance.naver.com/item/sise.naver?code=005930'

response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html5lib')

price = soup.select_one("#_nowVal").text
print('current price=', price)

Result)
current price= 76,700
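
The scraped price is a string with a thousands separator. If you want to do arithmetic with it, one possible approach is to convert it to an integer. The helper name parse_price below is an assumption for illustration, not part of BeautifulSoup.

```python
def parse_price(text: str) -> int:
    """Convert a price string such as '76,700' to an integer."""
    # Drop surrounding whitespace and the thousands separators
    return int(text.strip().replace(",", ""))


print(parse_price("76,700"))
```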

Print Samsung Electronics' PER

import requests
from bs4 import BeautifulSoup

url = 'https://finance.naver.com/item/sise.naver?code=005930'

response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html5lib')

per = soup.select_one("#_sise_per")
per = per.get_text()
per = per.replace('\t', '')
per = per.replace('\n', '')
per = per.replace('\r', '')
per = per.replace(' ', '')
print(f'per=[{per}]')

The PER is found by the id _sise_per.

Result)
per=[35.99]
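
The four replace calls above remove tabs, newlines, carriage returns, and spaces one at a time. As a sketch of an alternative, str.split() with no arguments splits on any run of whitespace, so joining the pieces removes all of it in one step. The helper name remove_whitespace and the sample string are assumptions for illustration.

```python
def remove_whitespace(text: str) -> str:
    """Remove every whitespace character from text."""
    # split() with no argument splits on runs of any whitespace
    return "".join(text.split())


# Hypothetical raw text resembling what get_text() might return
raw = "\n\t\t35.99\r\n\t"
print(f"per=[{remove_whitespace(raw)}]")
```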

Print Samsung Electronics' upper limit price


The upper limit price has no id. In this case, you can use Chrome's developer tools to find the element's location.

To get the selector for the element you want in Chrome, do the following.
  1. Move the mouse over the element you want
  2. Right-click > select the Inspect menu
  3. Select the Elements tab in Chrome DevTools
  4. Right-click the element you want > Copy > Copy selector

Pasting it into Notepad gives the following.

#content > div.section.inner_sub > div:nth-child(1) > table > tbody > tr:nth-child(8) > td:nth-child(2) > span


import requests
from bs4 import BeautifulSoup

url = 'https://finance.naver.com/item/sise.naver?code=005930'

response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html5lib')

price = soup.select_one("#_nowVal").text
print('current price=', price)

# The upper limit price has no id, so use the selector copied from Chrome
price = soup.select_one("#content > div.section.inner_sub > div:nth-child(1) > table > tbody > tr:nth-child(8) > td:nth-child(2) > span")
price = price.get_text()
price = price.replace('\t', '')
price = price.replace('\n', '')
price = price.replace('\r', '')
price = price.replace(' ', '')
print(f'upper limit price=[{price}]')

Result)
current price= 76,700
upper limit price=[99,100]
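
One thing to watch with selectors copied from Chrome: the browser adds a tbody element to tables even when the raw HTML has none, and the html5lib parser does the same, which is why the copied selector works with it. Python's built-in html.parser does not add tbody. The sketch below shows the difference on made-up markup. Whether the raw Naver page contains tbody is not verified here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup without <tbody>, as a server might send it.
html = """
<table id="t">
  <tr><th>upper limit</th><td><span>99,100</span></td></tr>
</table>
"""

# html.parser keeps the tree as written and does not insert <tbody>
soup = BeautifulSoup(html, "html.parser")

# A browser-style selector that expects <tbody> finds nothing
with_tbody = soup.select_one("#t > tbody > tr > td > span")

# Dropping tbody from the selector matches the element
without_tbody = soup.select_one("#t > tr > td > span")

print(with_tbody)
print(without_tbody.get_text())
```

If select_one returns None for a copied selector, checking the parser and the tbody step is a reasonable first thing to try.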