Introduction to Web Technology 02: Fetching HTML with Python | Programming learning site [paiza Learning]

# coding: utf-8
import requests

uri = ''  # exercise starter: set this to the URL of the page to fetch
html = requests.get(uri)
print(html.text)

# coding: utf-8
import requests
from bs4 import BeautifulSoup

uri = 'http://localhost/~ubuntu/paijo.html'
html = requests.get(uri)

soup = BeautifulSoup(html.text, 'html.parser')
print(soup)

# coding: utf-8
import requests
from bs4 import BeautifulSoup

uri = 'http://localhost/~ubuntu/paijo.html'
html = requests.get(uri)

soup = BeautifulSoup(html.text, 'html.parser')

for element in soup.find_all('div'):
    print(element)

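Exercise: Load a web page with Python

In the environment on the right, fetch.py is provided in your home directory. It contains Python code that loads a web page and prints it.
Modify the code so that it loads the following sample page.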

```
http://localhost/~ubuntu/paijo.html
```

Submit your code for grading. The exercise is cleared once every judge case passes.

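Exercise: Extract the title of a web page with Python

In the environment on the right, fetch.py is provided in your home directory. It contains Python code that loads a web page and prints it.
Modify the code so that it prints only the title of the sample page.

Submit your code for grading. The exercise is cleared once every judge case passes.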

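Exercise: Extract multiple elements of a web page with Python

In the environment on the right, fetch.py is provided in your home directory. It contains Python code that loads a web page and prints it.
Modify the code so that it prints the following elements of the sample page.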

```
div tags whose class attribute is "p-head"
```

Submit your code for grading. The exercise is cleared once every judge case passes.

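#07: Fetching HTML with Python

Let's write a Python program that fetches the HTML of a web page and extracts specific information from it. We start with a simple web page and learn the basic techniques.

How to install BeautifulSoup

Run the following command in a terminal: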

$ sudo pip3 install beautifulsoup4
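
A quick import check can confirm that the install worked. This is a minimal sketch, assuming the `requests` library used in the rest of this chapter is installed in the same environment. The helper name `installed_versions` is made up for illustration.

```python
# Minimal sketch: confirm that the libraries used in this chapter can be imported.
# Assumption: requests is installed alongside beautifulsoup4.
import bs4
import requests


def installed_versions():
    """Return the version strings of both libraries (hypothetical helper)."""
    return {"beautifulsoup4": bs4.__version__, "requests": requests.__version__}


print(installed_versions())
```

If either import fails with `ModuleNotFoundError`, the package is not installed for the Python interpreter that runs the script.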

Load a page by specifying its URI

# coding: utf-8
import requests

uri = 'https://(url)/paiza.html'
html = requests.get(uri)
print(html.text)

Replace the (url) placeholder with the address copied from the browser in your own learning environment.
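
The script above prints whatever the server returns, including the body of an error page. As a hedged variation, the sketch below wraps the same `requests.get` call in a helper that raises on HTTP errors and sets a timeout. The name `fetch_html` and the 10-second default are assumptions for illustration, not part of the lesson.

```python
# Minimal sketch: fetch a page and fail loudly on HTTP errors.
import requests


def fetch_html(uri, timeout=10):
    """Return the body of the page at `uri` as text (hypothetical helper)."""
    response = requests.get(uri, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

Calling `print(fetch_html(uri))` with the address copied from your browser prints the same HTML as the script above, but a missing page now raises an exception instead of printing the error page.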

Extract the title element from the loaded web page

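# coding: utf-8
import requests
from bs4 import BeautifulSoup

uri = 'https://paiza-webtech.paiza-user.cloud/~ubuntu/paiza.html'
soup = BeautifulSoup(requests.get(uri).text, 'html.parser')
print(soup.find('title').string)  # text inside the <title> element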
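
Title extraction can be tried without a network connection by parsing an HTML string directly. The sketch below assumes a made-up `SAMPLE_HTML` string and a hypothetical helper `page_title`. It also handles a page with no `<title>` element, where `soup.find('title')` returns `None`.

```python
# Minimal sketch: extract the <title> text from HTML that is already in memory.
from bs4 import BeautifulSoup

# Assumption: illustrative markup, not the real sample page.
SAMPLE_HTML = (
    "<html><head><title>Sample page</title></head>"
    "<body><h1>Hello</h1></body></html>"
)


def page_title(html_text):
    """Return the text of the <title> element, or None if there is none."""
    soup = BeautifulSoup(html_text, "html.parser")
    title = soup.find("title")  # None when the page has no <title>
    return title.get_text(strip=True) if title is not None else None


print(page_title(SAMPLE_HTML))
print(page_title("<p>no title here</p>"))
```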

Extract elements with a specified tag

# coding: utf-8
import requests
from bs4 import BeautifulSoup

uri = 'https://(url)/paiza.html'
html = requests.get(uri)
# print(html.text)

soup = BeautifulSoup(html.text, 'html.parser')
# print(soup.find('title').string)

for element in soup.find_all('h2'):
    print(element)
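
The loop above prints each matching element with its tags. If only the text is needed, `get_text()` can be applied to each result of `find_all`. The sketch below runs on a made-up `SAMPLE_HTML` string, and `heading_texts` is a hypothetical helper name.

```python
# Minimal sketch: collect the text of every element with a given tag name.
from bs4 import BeautifulSoup

# Assumption: illustrative markup, not the real sample page.
SAMPLE_HTML = """
<h1>Profile</h1>
<h2>Skills</h2>
<p>Python</p>
<h2>History</h2>
"""


def heading_texts(html_text, tag="h2"):
    """Return the text of every `tag` element, in document order."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag)]


print(heading_texts(SAMPLE_HTML))
```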

Extract elements with a specified attribute

for element in soup.find_all('h2', class_='resume'):
    print(element)
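
The keyword is spelled `class_` because `class` is a reserved word in Python. As a sketch on made-up markup, the example below shows that `class_='resume'` also matches an element carrying several classes, and that the CSS selector `h2.resume` passed to `select` finds the same elements.

```python
# Minimal sketch: filter elements by class with find_all and with a CSS selector.
from bs4 import BeautifulSoup

# Assumption: illustrative markup, not the real sample page.
SAMPLE_HTML = """
<h2 class="resume" id="skills">Skills</h2>
<h2 class="news">News</h2>
<h2 class="resume main" id="history">History</h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keyword-argument form, as in the lesson.
by_keyword = [e.get_text() for e in soup.find_all("h2", class_="resume")]

# CSS selector form.
by_selector = [e.get_text() for e in soup.select("h2.resume")]

print(by_keyword)
print(by_selector)
```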

Extract the value of a specified attribute

for element in soup.find_all('h2', class_='resume'):
    print(element['id'])
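
Indexing with `element['id']` raises `KeyError` when an element has no `id` attribute. A hedged alternative is `element.get('id')`, which returns `None` instead. The sketch below uses made-up markup in which one heading lacks an `id`.

```python
# Minimal sketch: read an attribute value without failing on elements that lack it.
from bs4 import BeautifulSoup

# Assumption: illustrative markup, not the real sample page.
SAMPLE_HTML = """
<h2 class="resume" id="skills">Skills</h2>
<h2 class="resume">History</h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# .get() returns None for the second heading, which has no id attribute.
ids = [element.get("id") for element in soup.find_all("h2", class_="resume")]
print(ids)
```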

Helpful web pages

Requests: HTTP for Humans™
https://requests.readthedocs.io/en/latest/

Beautiful Soup: We called him Tortoise because he taught us.
http://crummy.com/software/BeautifulSoup/
