BeautifulSoup4

1. Compile the pattern you want with p = re.compile("pattern").

2. Check for matches using one of the functions below.

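m = p.match("string to test") : checks whether the string matches the pattern from its beginning

m = p.search("string to test") : checks whether the pattern matches anywhere in the string

lst = p.findall("string to test") : returns every match as a list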
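The three functions above can be compared side by side. This is a minimal sketch; the sample strings are made up for illustration and are not from the original notes.

```python
import re

p = re.compile("ca.e")  # "ca", then any one character, then "e"

# match() succeeds only if the pattern matches at the start of the string
at_start = p.match("careless")    # match object covering "care"
not_at_start = p.match("good care")  # None: the string does not start with a match

# search() scans the whole string and returns the first match it finds
m = p.search("good care")
print(m.group(), m.start(), m.end())  # care 5 9

# findall() returns every non-overlapping match as a list of strings
found = p.findall("care cafe caffe case")
print(found)  # ['care', 'cafe', 'case']
```

Both match() and search() return None when nothing matches, so check the result before calling m.group().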

※ Pattern syntax

. (ca.e) : matches exactly one character > care, cafe, case (O) | caffe (X)

^ (^de) : start of the string > desk, destination (O) | fade (X)

$ (se$) : end of the string > case, base (O) | face (X)
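The (O)/(X) examples above can be checked directly. In this sketch, `matches` is a hypothetical helper written only for this illustration; it uses search() so that the `^` and `$` anchors decide where a match may occur.

```python
import re

def matches(pattern, words):
    """Return the words in which `pattern` is found by re.search (helper for this example only)."""
    p = re.compile(pattern)
    return [w for w in words if p.search(w)]

# "." stands for exactly one character, so "caffe" (two characters between "ca" and "e") fails
dot = matches("ca.e", ["care", "cafe", "case", "caffe"])
print(dot)    # ['care', 'cafe', 'case']

# "^" anchors the pattern to the start of the string
start = matches("^de", ["desk", "destination", "fade"])
print(start)  # ['desk', 'destination']

# "$" anchors the pattern to the end of the string
end = matches("se$", ["case", "base", "face"])
print(end)    # ['case', 'base']
```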

import requests
res = requests.get("http://naver.com")  # fetch the URL and store the response in res
res.raise_for_status()  # does nothing on success; raises an exception on an HTTP error status
with open("mycrowling.html", "w", encoding="utf8") as f:
    f.write(res.text)  # save the returned HTML to a file


If a site such as Naver does not grant access to the client, a request like the one above fails with a 403 error.

Use the following site to look up your own User-Agent string:

**https://www.whatismybrowser.com/detect/what-is-my-user-agent/**

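import requests
url = "http://naver.com"
headers = {"User-Agent": "your User-Agent string"}  # paste the value shown by the site above
res = requests.get(url, headers=headers)  # send the request with the browser's User-Agent
res.raise_for_status()  # does nothing on success; raises an exception on an HTTP error status
with open("mycrowling.html", "w", encoding="utf8") as f:
    f.write(res.text)  # save the returned HTML to a file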

With this code, the request returns the same information a browser would receive.

Limitations: