How to handle Data, and Images(20) Web Crawling
Using Web Crawling to get Data
Lesson Notes in .ipynb file
Topics
- Web Crawler
- Getting HTML code from specific website (1)
- Getting HTML code from specific website (2)
- Summary
Web Crawler
- Web crawling is an automated way to retrieve data from websites
- It can collect the information you need from publicly accessible web pages
- Often done with Python
Getting HTML code from specific website (1)
import requests

# send a GET request to a specific URL
response = requests.get('https://hyeonukim.github.io/devblog/')
# the response body is the website's HTML source code
html = response.text.strip()
print(html)
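The request above assumes the server answers quickly and successfully. As a minimal sketch of a slightly more defensive version, the same steps can be wrapped in a helper that sets a timeout and raises on HTTP error codes. The function name `fetch_html` and the 10-second default are illustrative assumptions, not part of the original lesson.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML source of `url` as a stripped string.

    `fetch_html` and its `timeout` default are hypothetical names
    chosen for this sketch.
    """
    # Stop waiting if the server does not respond within `timeout` seconds
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page
    response.raise_for_status()
    # Same as the lesson code: take the body text and trim whitespace
    return response.text.strip()
```

Calling `fetch_html('https://hyeonukim.github.io/devblog/')` should return the same string that the lesson code prints, as long as the site is reachable.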
Getting HTML code from specific website (2)
import requests
from bs4 import BeautifulSoup

# send a GET request to a specific URL
response = requests.get('https://hyeonukim.github.io/devblog/')
# the response body is the website's HTML source code
html = response.text.strip()
# parse the HTML source code into a BeautifulSoup object
soup = BeautifulSoup(html, 'html.parser')
# get all <a> elements
links = soup.select('a')
for link in links:
    # only consider links that have an 'href' attribute
    if link.has_attr('href'):
        # keep only links whose href contains the text 'posts'
        if 'posts' in link.get('href'):
            print(link.text)
Output:
Leetcode 167. Two Sum II - Input Array Is SortedExplanation for Leetcode 167 - Two Sum II - Input Array Is Sorted, and its solution in Python. Jan 18, 2025 Leetcode, Two Pointers, Medium
Leetcode 15. 3SumExplanation for Leetcode 15 - 3Sum, and its solution in Python. Jan 18, 2025 Leetcode, Two Pointers, Medium
Leetcode 36. Valid SudokuExplanation for Leetcode 36 - Valid Sudoku, and its solution in Python. Jan 17, 2025 Leetcode, Arrays & Hashing, Medium
Leetcode 238. Product of Array Except SelfExplanation for Leetcode 238 - Product of Array Except Self, and its solution in Python. Jan 17, 2025 Leetcode, Arrays & Hashing, Medium
Leetcode 128. Longest Consecutive SequenceExplanation for Leetcode 128 - Longest Consecutive Sequence, and its solution in Python. Jan 17, 2025 Leetcode, Arrays & Hashing, Medium
How to handle Data, and Images(19) Matplotlib UsageDrawing different types of Graphs with Matploblib Jan 16, 2025 Handling Data and Images, Matplotlib
How to handle Data, and Images(18) Matplotlib IntroIntroduction to Matplotlib and how to plot a line graph and save the figure Jan 16, 2025 Handling Data and Images, Matplotlib
Leetcode 49. Group AnagramsExplanation for Leetcode 49 - Group Anagrams, and its solution in Python. Jan 16, 2025 Leetcode, Arrays & Hashing, Medium
Leetcode 347. Top K Frequent ElementsExplanation for Leetcode 347 - Top K Frequent Elements, and its solution in Python. Jan 16, 2025 Leetcode, Arrays & Hashing, Medium
How to handle Data, and Images(18) Pandas UsageMore in depth of some operations that Pandas offers and their usage Jan 15, 2025 Handling Data and Images, Pandas
Leetcode 15. 3Sum
Leetcode 167. Two Sum II - Input Array Is Sorted
Leetcode 238. Product of Array Except Self
Leetcode 128. Longest Consecutive Sequence
Leetcode 36. Valid Sudoku
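The filtering loop above can also be expressed directly in the selector passed to `select`. The sketch below runs on a small inline HTML string, so it needs no network access. The HTML snippet and the class name `post-link` are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<html><body>
  <a class="post-link" href="/devblog/posts/first/">First post</a>
  <a class="post-link" href="/devblog/posts/second/">Second post</a>
  <a href="/devblog/about/">About</a>
  <a>No href here</a>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# a[href*="posts"] matches <a> tags whose href contains 'posts',
# replacing the has_attr check and the substring check in one step
post_links = soup.select('a[href*="posts"]')
titles = [link.text.strip() for link in post_links]

# a.post-link matches <a> tags whose class list includes 'post-link'
by_class = soup.select('a.post-link')

print(titles)
print(len(by_class))
```

Both selectors pick out the same two links here. On a real page, which one is more convenient depends on how that page's HTML is structured.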
Summary
- Press F12 on a website to open the browser's developer tools and view its HTML code
- We can select specific elements, such as those with a given class, using BeautifulSoup's select method
This post is licensed under CC BY 4.0 by the author.