# Micro Political Data Analysis Practicum (ミクロ政治データ分析実習): Assignment 11
Data Handling (2)
Problem 1: Load the {tidyverse} package.
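Problem 2: Read the assignment data downloaded from the LMS (Micro_HW11_1.csv and Micro_HW11_2.csv). Store them as objects named raw_df1 and raw_df2, respectively, and print both.
- Create a folder named Data inside the project folder and upload the data there.
- Use the read_csv() function, not read.csv(). If you use read.csv(), the output will look different from the sample page, and points will, of course, be deducted.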
```r
# Write your R code here
```

```
# A tibble: 19 × 4
Country Population Freedom HDI
<chr> <dbl> <chr> <dbl>
1 Argentina 45195774 F 0.845
2 Australia 25499884 F 0.944
3 Brazil 212559417 F 0.765
4 Canada 37742154 F 0.929
5 China 1447470092 NF 0.761
6 France 68147691 F 0.901
7 Germany 83783942 F 0.947
8 India 1380004385 PF 0.645
9 Indonesia 273523615 PF 0.718
10 Italy 60461826 F 0.892
11 Japan 126476461 F 0.919
12 South Korea 51269185 F 0.916
13 Mexico 128932753 PF 0.779
14 Russia 145934462 NF 0.824
15 Saudi Arabia 34813871 NF 0.854
16 South Africa 59308690 F 0.709
17 Turkey 84339067 NF 0.82
18 United Kingdom 68517621 F 0.932
19 United States 334308644 F 0.926
```
```r
# Write your R code here
```

```
# A tibble: 19 × 365
Country `2021/01/02` `2021/01/03` `2021/01/04` `2021/01/05` `2021/01/06`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Argentina 5240 5884 8222 13790 13441
2 Australia 24 20 13 19 10
3 Brazil 15353 17190 25039 58083 62507
4 Canada 5814 9625 9717 7528 8911
5 China 82 77 122 96 144
6 France 3466 12491 4084 20833 25186
7 Germany 12690 10315 9847 11897 21237
8 India 18177 16504 16375 18088 20346
9 Indonesia 7203 6877 6753 7445 8854
10 Italy 11825 14245 10798 15375 20326
11 Japan 3071 3166 3343 4949 6049
12 South Korea 651 1020 715 839 868
13 Mexico 6359 5211 6464 11271 13345
14 Russia 25938 23845 23015 23955 23902
15 Saudi Arabia 101 82 94 104 118
16 South Africa 15002 11859 12601 14410 21832
17 Turkey 11180 9877 13695 14494 13830
18 United King… 57877 55116 58919 61093 62554
19 United Stat… 272279 203401 185502 232864 259763
# ℹ 359 more variables: `2021/01/07` <dbl>, `2021/01/08` <dbl>,
# `2021/01/09` <dbl>, `2021/01/10` <dbl>, `2021/01/11` <dbl>,
# `2021/01/12` <dbl>, `2021/01/13` <dbl>, `2021/01/14` <dbl>,
# `2021/01/15` <dbl>, `2021/01/16` <dbl>, `2021/01/17` <dbl>,
# `2021/01/18` <dbl>, `2021/01/19` <dbl>, `2021/01/20` <dbl>,
# `2021/01/21` <dbl>, `2021/01/22` <dbl>, `2021/01/23` <dbl>,
# `2021/01/24` <dbl>, `2021/01/25` <dbl>, `2021/01/26` <dbl>, …
```
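The reading step can be sketched as follows. This is a minimal example, not the official solution. The course CSV files are not available here, so the sketch writes a hypothetical three-row stand-in to a temporary file, using values copied from the raw_df1 printout above. With the real data, the path would be `Data/Micro_HW11_1.csv`, and `Data/Micro_HW11_2.csv` for raw_df2.

```r
library(readr)  # attached by library(tidyverse)

# Hypothetical stand-in for Data/Micro_HW11_1.csv: three rows from the printout above
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Population,Freedom,HDI",
  "Argentina,45195774,F,0.845",
  "Australia,25499884,F,0.944",
  "Brazil,212559417,F,0.765"
), path)

# read_csv() returns a tibble, which prints as "# A tibble: ..."
# (with the real file: raw_df1 <- read_csv("Data/Micro_HW11_1.csv"))
raw_df1 <- read_csv(path, show_col_types = FALSE)
raw_df1

# read.csv() returns a plain data.frame, which prints differently
class(read.csv(path))
```

`show_col_types = FALSE` only suppresses the column-specification message that readr prints. It does not change the data that is read.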
Problem 3: Print the dimensions (number of rows and columns) of raw_df1 and raw_df2.
```r
# Write your R code here
```

```
[1] 19 4
```

```r
# Write your R code here
```

```
[1] 19 365
```
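Here is a minimal sketch of Problem 3. It uses a hypothetical three-row tibble with raw_df1's columns in place of the real data. `dim()` returns the number of rows and then the number of columns; `nrow()` and `ncol()` return each one separately.

```r
library(tibble)

# Hypothetical toy tibble with the same columns as raw_df1
raw_df1 <- tibble(
  Country    = c("Argentina", "Australia", "Brazil"),
  Population = c(45195774, 25499884, 212559417),
  Freedom    = c("F", "F", "F"),
  HDI        = c(0.845, 0.944, 0.765)
)

dim(raw_df1)   # [1] 3 4
nrow(raw_df1)  # [1] 3
ncol(raw_df1)  # [1] 4
```

With the real data, the same `dim()` calls should give `19 4` and `19 365`, as in the expected output.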
Problem 4: Print the variable (column) names of raw_df1 and raw_df2.
```r
# Write your R code here
```

```
[1] "Country" "Population" "Freedom" "HDI"
```

```r
# Write your R code here
```

```
 [1] "Country" "2021/01/02" "2021/01/03" "2021/01/04" "2021/01/05"
[6] "2021/01/06" "2021/01/07" "2021/01/08" "2021/01/09" "2021/01/10"
[11] "2021/01/11" "2021/01/12" "2021/01/13" "2021/01/14" "2021/01/15"
[16] "2021/01/16" "2021/01/17" "2021/01/18" "2021/01/19" "2021/01/20"
[21] "2021/01/21" "2021/01/22" "2021/01/23" "2021/01/24" "2021/01/25"
[26] "2021/01/26" "2021/01/27" "2021/01/28" "2021/01/29" "2021/01/30"
[31] "2021/01/31" "2021/02/01" "2021/02/02" "2021/02/03" "2021/02/04"
[36] "2021/02/05" "2021/02/06" "2021/02/07" "2021/02/08" "2021/02/09"
[41] "2021/02/10" "2021/02/11" "2021/02/12" "2021/02/13" "2021/02/14"
[46] "2021/02/15" "2021/02/16" "2021/02/17" "2021/02/18" "2021/02/19"
[51] "2021/02/20" "2021/02/21" "2021/02/22" "2021/02/23" "2021/02/24"
[56] "2021/02/25" "2021/02/26" "2021/02/27" "2021/02/28" "2021/03/01"
[61] "2021/03/02" "2021/03/03" "2021/03/04" "2021/03/05" "2021/03/06"
[66] "2021/03/07" "2021/03/08" "2021/03/09" "2021/03/10" "2021/03/11"
[71] "2021/03/12" "2021/03/13" "2021/03/14" "2021/03/15" "2021/03/16"
[76] "2021/03/17" "2021/03/18" "2021/03/19" "2021/03/20" "2021/03/21"
[81] "2021/03/22" "2021/03/23" "2021/03/24" "2021/03/25" "2021/03/26"
[86] "2021/03/27" "2021/03/28" "2021/03/29" "2021/03/30" "2021/03/31"
[91] "2021/04/01" "2021/04/02" "2021/04/03" "2021/04/04" "2021/04/05"
[96] "2021/04/06" "2021/04/07" "2021/04/08" "2021/04/09" "2021/04/10"
[101] "2021/04/11" "2021/04/12" "2021/04/13" "2021/04/14" "2021/04/15"
[106] "2021/04/16" "2021/04/17" "2021/04/18" "2021/04/19" "2021/04/20"
[111] "2021/04/21" "2021/04/22" "2021/04/23" "2021/04/24" "2021/04/25"
[116] "2021/04/26" "2021/04/27" "2021/04/28" "2021/04/29" "2021/04/30"
[121] "2021/05/01" "2021/05/02" "2021/05/03" "2021/05/04" "2021/05/05"
[126] "2021/05/06" "2021/05/07" "2021/05/08" "2021/05/09" "2021/05/10"
[131] "2021/05/11" "2021/05/12" "2021/05/13" "2021/05/14" "2021/05/15"
[136] "2021/05/16" "2021/05/17" "2021/05/18" "2021/05/19" "2021/05/20"
[141] "2021/05/21" "2021/05/22" "2021/05/23" "2021/05/24" "2021/05/25"
[146] "2021/05/26" "2021/05/27" "2021/05/28" "2021/05/29" "2021/05/30"
[151] "2021/05/31" "2021/06/01" "2021/06/02" "2021/06/03" "2021/06/04"
[156] "2021/06/05" "2021/06/06" "2021/06/07" "2021/06/08" "2021/06/09"
[161] "2021/06/10" "2021/06/11" "2021/06/12" "2021/06/13" "2021/06/14"
[166] "2021/06/15" "2021/06/16" "2021/06/17" "2021/06/18" "2021/06/19"
[171] "2021/06/20" "2021/06/21" "2021/06/22" "2021/06/23" "2021/06/24"
[176] "2021/06/25" "2021/06/26" "2021/06/27" "2021/06/28" "2021/06/29"
[181] "2021/06/30" "2021/07/01" "2021/07/02" "2021/07/03" "2021/07/04"
[186] "2021/07/05" "2021/07/06" "2021/07/07" "2021/07/08" "2021/07/09"
[191] "2021/07/10" "2021/07/11" "2021/07/12" "2021/07/13" "2021/07/14"
[196] "2021/07/15" "2021/07/16" "2021/07/17" "2021/07/18" "2021/07/19"
[201] "2021/07/20" "2021/07/21" "2021/07/22" "2021/07/23" "2021/07/24"
[206] "2021/07/25" "2021/07/26" "2021/07/27" "2021/07/28" "2021/07/29"
[211] "2021/07/30" "2021/07/31" "2021/08/01" "2021/08/02" "2021/08/03"
[216] "2021/08/04" "2021/08/05" "2021/08/06" "2021/08/07" "2021/08/08"
[221] "2021/08/09" "2021/08/10" "2021/08/11" "2021/08/12" "2021/08/13"
[226] "2021/08/14" "2021/08/15" "2021/08/16" "2021/08/17" "2021/08/18"
[231] "2021/08/19" "2021/08/20" "2021/08/21" "2021/08/22" "2021/08/23"
[236] "2021/08/24" "2021/08/25" "2021/08/26" "2021/08/27" "2021/08/28"
[241] "2021/08/29" "2021/08/30" "2021/08/31" "2021/09/01" "2021/09/02"
[246] "2021/09/03" "2021/09/04" "2021/09/05" "2021/09/06" "2021/09/07"
[251] "2021/09/08" "2021/09/09" "2021/09/10" "2021/09/11" "2021/09/12"
[256] "2021/09/13" "2021/09/14" "2021/09/15" "2021/09/16" "2021/09/17"
[261] "2021/09/18" "2021/09/19" "2021/09/20" "2021/09/21" "2021/09/22"
[266] "2021/09/23" "2021/09/24" "2021/09/25" "2021/09/26" "2021/09/27"
[271] "2021/09/28" "2021/09/29" "2021/09/30" "2021/10/01" "2021/10/02"
[276] "2021/10/03" "2021/10/04" "2021/10/05" "2021/10/06" "2021/10/07"
[281] "2021/10/08" "2021/10/09" "2021/10/10" "2021/10/11" "2021/10/12"
[286] "2021/10/13" "2021/10/14" "2021/10/15" "2021/10/16" "2021/10/17"
[291] "2021/10/18" "2021/10/19" "2021/10/20" "2021/10/21" "2021/10/22"
[296] "2021/10/23" "2021/10/24" "2021/10/25" "2021/10/26" "2021/10/27"
[301] "2021/10/28" "2021/10/29" "2021/10/30" "2021/10/31" "2021/11/01"
[306] "2021/11/02" "2021/11/03" "2021/11/04" "2021/11/05" "2021/11/06"
[311] "2021/11/07" "2021/11/08" "2021/11/09" "2021/11/10" "2021/11/11"
[316] "2021/11/12" "2021/11/13" "2021/11/14" "2021/11/15" "2021/11/16"
[321] "2021/11/17" "2021/11/18" "2021/11/19" "2021/11/20" "2021/11/21"
[326] "2021/11/22" "2021/11/23" "2021/11/24" "2021/11/25" "2021/11/26"
[331] "2021/11/27" "2021/11/28" "2021/11/29" "2021/11/30" "2021/12/01"
[336] "2021/12/02" "2021/12/03" "2021/12/04" "2021/12/05" "2021/12/06"
[341] "2021/12/07" "2021/12/08" "2021/12/09" "2021/12/10" "2021/12/11"
[346] "2021/12/12" "2021/12/13" "2021/12/14" "2021/12/15" "2021/12/16"
[351] "2021/12/17" "2021/12/18" "2021/12/19" "2021/12/20" "2021/12/21"
[356] "2021/12/22" "2021/12/23" "2021/12/24" "2021/12/25" "2021/12/26"
[361] "2021/12/27" "2021/12/28" "2021/12/29" "2021/12/30" "2021/12/31"
```
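Problem 4 can be sketched with `names()`; for a tibble, `colnames()` gives the same result. The toy raw_df2 below is hypothetical and keeps only two date columns, with values copied from the printout above. Names such as `2021/01/02` are not syntactic R names, so they must be wrapped in backticks when referenced directly.

```r
library(tibble)

# Hypothetical toy version of raw_df2: Country plus one column per date
raw_df2 <- tibble(
  Country      = c("Japan", "South Korea"),
  `2021/01/02` = c(3071, 651),
  `2021/01/03` = c(3166, 1020)
)

names(raw_df2)  # "Country" "2021/01/02" "2021/01/03"

# Non-syntactic column names need backticks
raw_df2$`2021/01/02`
```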
Details of raw_df1

| Variable | Description | Notes |
|---|---|---|
| Country | Country name | |
| Population | Population | |
| Freedom | Freedom House rating | F = Free; PF = Partly Free; NF = Not Free |
| HDI | Human Development Index | 2019 values |
Details of raw_df2

| Variable | Description |
|---|---|
| Country | Country name |
| All other columns | Number of new COVID-19 cases on the date given by the column name |
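Problem 5: Recode the Freedom and HDI variables in raw_df1. After recoding, overwrite raw_df1 and print it.
- Freedom variable: if the value of Freedom is "F", set it to "Free"; otherwise set it to "Others". Overwrite the Freedom column with the result. Convert Freedom to a factor with the levels in the order "Free", "Others".
- HDI variable: if the value of HDI is 0.8 or higher, set it to "Very High"; if it is 0.7 or higher, set it to "High"; otherwise set it to "Middle". Overwrite the HDI column with the result. Convert HDI to a factor with the levels in the order "Very High", "High", "Middle".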
```r
# Write your R code here
```

```
# A tibble: 19 × 4
Country Population Freedom HDI
<chr> <dbl> <fct> <fct>
1 Argentina 45195774 Free Very High
2 Australia 25499884 Free Very High
3 Brazil 212559417 Free High
4 Canada 37742154 Free Very High
5 China 1447470092 Others High
6 France 68147691 Free Very High
7 Germany 83783942 Free Very High
8 India 1380004385 Others Middle
9 Indonesia 273523615 Others High
10 Italy 60461826 Free Very High
11 Japan 126476461 Free Very High
12 South Korea 51269185 Free Very High
13 Mexico 128932753 Others High
14 Russia 145934462 Others Very High
15 Saudi Arabia 34813871 Others Very High
16 South Africa 59308690 Free High
17 Turkey 84339067 Others Very High
18 United Kingdom 68517621 Free Very High
19 United States 334308644 Free Very High
```
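One way to express the Problem 5 recoding uses dplyr's `if_else()` and `case_when()`. The sketch runs on a hypothetical four-row subset of raw_df1, with values from the printout above. `case_when()` evaluates its conditions from top to bottom and uses the first one that is `TRUE`, so the `>= 0.8` test must come before `>= 0.7`.

```r
library(tibble)
library(dplyr)

# Hypothetical four-row subset of raw_df1 (values from the printout above)
raw_df1 <- tibble(
  Country    = c("Brazil", "China", "India", "Russia"),
  Population = c(212559417, 1447470092, 1380004385, 145934462),
  Freedom    = c("F", "NF", "PF", "NF"),
  HDI        = c(0.765, 0.761, 0.645, 0.824)
)

raw_df1 <- raw_df1 |>
  mutate(
    # "F" -> "Free"; "PF" and "NF" -> "Others"
    Freedom = if_else(Freedom == "F", "Free", "Others"),
    Freedom = factor(Freedom, levels = c("Free", "Others")),
    # First matching condition wins, so test 0.8 before 0.7
    HDI = case_when(
      HDI >= 0.8 ~ "Very High",
      HDI >= 0.7 ~ "High",
      TRUE       ~ "Middle"
    ),
    HDI = factor(HDI, levels = c("Very High", "High", "Middle"))
  )
raw_df1
```

Inside a single `mutate()` call, later arguments see the columns created by earlier ones. That is why `factor(Freedom, ...)` receives the already recoded strings.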
Problem 6: Reshape raw_df2 into long format and overwrite raw_df2 with the result. Name the date column Date and the new-case column NewCases. Print the reshaped raw_df2.
```r
# Write your R code here
```

```
# A tibble: 6,916 × 3
Country Date NewCases
<chr> <chr> <dbl>
1 Argentina 2021/01/02 5240
2 Argentina 2021/01/03 5884
3 Argentina 2021/01/04 8222
4 Argentina 2021/01/05 13790
5 Argentina 2021/01/06 13441
6 Argentina 2021/01/07 13835
7 Argentina 2021/01/08 13346
8 Argentina 2021/01/09 11057
9 Argentina 2021/01/10 7808
10 Argentina 2021/01/11 8704
# ℹ 6,906 more rows
```
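The reshaping in Problem 6 corresponds to tidyr's `pivot_longer()`. Below is a minimal sketch on a hypothetical two-country, two-date excerpt of raw_df2, with values from the printout above.

```r
library(tibble)
library(tidyr)

# Hypothetical excerpt of the wide raw_df2
raw_df2 <- tibble(
  Country      = c("Argentina", "Japan"),
  `2021/01/02` = c(5240, 3071),
  `2021/01/03` = c(5884, 3166)
)

raw_df2 <- raw_df2 |>
  pivot_longer(cols      = -Country,    # every column except Country
               names_to  = "Date",      # former column names go here
               values_to = "NewCases")  # former cell values go here
raw_df2
```

With the full data, 19 countries × 364 date columns give 19 × 364 = 6,916 rows, which matches the expected output.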
Problem 7: Split the Date column of raw_df2 into year (Year), month (Month), and day (Day). After splitting, overwrite raw_df2 and print it.
```r
# Write your R code here
```

```
# A tibble: 6,916 × 5
Country Year Month Day NewCases
<chr> <chr> <chr> <chr> <dbl>
1 Argentina 2021 01 02 5240
2 Argentina 2021 01 03 5884
3 Argentina 2021 01 04 8222
4 Argentina 2021 01 05 13790
5 Argentina 2021 01 06 13441
6 Argentina 2021 01 07 13835
7 Argentina 2021 01 08 13346
8 Argentina 2021 01 09 11057
9 Argentina 2021 01 10 7808
10 Argentina 2021 01 11 8704
# ℹ 6,906 more rows
```
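Problem 7 can be sketched with tidyr's `separate()`, splitting on `/`. The new columns stay character vectors. This keeps leading zeros such as `01` and matches the `<chr>` types in the expected output. The two-row input is a hypothetical excerpt with values from the printout above. Since tidyr 1.3.0, `separate()` has been superseded by `separate_wider_delim()`, but it still works.

```r
library(tibble)
library(tidyr)

# Hypothetical excerpt of the long raw_df2
raw_df2 <- tibble(
  Country  = c("Argentina", "Argentina"),
  Date     = c("2021/01/02", "2021/01/03"),
  NewCases = c(5240, 5884)
)

# Split "2021/01/02" into Year = "2021", Month = "01", Day = "02"
raw_df2 <- raw_df2 |>
  separate(Date, into = c("Year", "Month", "Day"), sep = "/")
raw_df2
```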
Problem 8: Using raw_df2, compute the total NewCases for each month and output the result as a column named NewCases. Sort the result so that months with more new cases appear at the top.
```r
# Write your R code here
```

```
# A tibble: 12 × 2
Month NewCases
<chr> <dbl>
1 12 17436298
2 04 15888981
3 05 14599455
4 01 13366267
5 08 12045598
6 09 9889881
7 07 9228822
8 03 8924379
9 11 8577309
10 10 7607929
11 06 7373985
12 02 7116346
```
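Problem 8 is a grouped summary followed by a descending sort. The sketch below uses hypothetical, made-up case counts, not values from the assignment data, so that the sort order is easy to check by hand.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data with made-up numbers
raw_df2 <- tibble(
  Country  = c("A", "B", "A", "B", "A"),
  Month    = c("01", "01", "02", "02", "03"),
  NewCases = c(100, 200, 50, 25, 400)
)

monthly <- raw_df2 |>
  group_by(Month) |>
  summarise(NewCases = sum(NewCases)) |>  # totals: 01 = 300, 02 = 75, 03 = 400
  arrange(desc(NewCases))                 # largest total first: 03, 01, 02
monthly
```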
Problem 9: Join raw_df1 and raw_df2, using Country as the key variable. Store the joined data in an object named df and print df.
- Confirm that the joined data has 6,916 rows and 8 columns.
```r
# Write your R code here
```

```
# A tibble: 6,916 × 8
Country Population Freedom HDI Year Month Day NewCases
<chr> <dbl> <fct> <fct> <chr> <chr> <chr> <dbl>
1 Argentina 45195774 Free Very High 2021 01 02 5240
2 Argentina 45195774 Free Very High 2021 01 03 5884
3 Argentina 45195774 Free Very High 2021 01 04 8222
4 Argentina 45195774 Free Very High 2021 01 05 13790
5 Argentina 45195774 Free Very High 2021 01 06 13441
6 Argentina 45195774 Free Very High 2021 01 07 13835
7 Argentina 45195774 Free Very High 2021 01 08 13346
8 Argentina 45195774 Free Very High 2021 01 09 11057
9 Argentina 45195774 Free Very High 2021 01 10 7808
10 Argentina 45195774 Free Very High 2021 01 11 8704
# ℹ 6,906 more rows
```
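Here is a minimal sketch of the Problem 9 join with dplyr's `left_join()`. It uses hypothetical two-country excerpts of raw_df1 and raw_df2, with values from the printouts above. Each row of raw_df1 is repeated once for every matching row of raw_df2. The expected 6,916 rows suggest that every country in raw_df1 has matches in raw_df2. If so, `inner_join()` would presumably give the same result.

```r
library(tibble)
library(dplyr)

# Hypothetical excerpt of the recoded raw_df1
raw_df1 <- tibble(
  Country    = c("Argentina", "Japan"),
  Population = c(45195774, 126476461),
  Freedom    = factor(c("Free", "Free"), levels = c("Free", "Others")),
  HDI        = factor(c("Very High", "Very High"),
                      levels = c("Very High", "High", "Middle"))
)
# Hypothetical excerpt of the long, split raw_df2
raw_df2 <- tibble(
  Country  = c("Argentina", "Argentina", "Japan", "Japan"),
  Year     = "2021",
  Month    = "01",
  Day      = c("02", "03", "02", "03"),
  NewCases = c(5240, 5884, 3071, 3166)
)

# Country is the key; raw_df1's columns come first, then the rest of raw_df2's
df <- left_join(raw_df1, raw_df2, by = "Country")
df
dim(df)  # [1] 4 8
```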
Problem 10: Using df, compute the number of new cases per one million people and add it as a column named NewCases_per_1M. After adding the column, overwrite df and print it.
```r
# Write your R code here
```

```
# A tibble: 6,916 × 9
Country Population Freedom HDI Year Month Day NewCases NewCases_per_1M
<chr> <dbl> <fct> <fct> <chr> <chr> <chr> <dbl> <dbl>
1 Argentina 45195774 Free Very… 2021 01 02 5240 116.
2 Argentina 45195774 Free Very… 2021 01 03 5884 130.
3 Argentina 45195774 Free Very… 2021 01 04 8222 182.
4 Argentina 45195774 Free Very… 2021 01 05 13790 305.
5 Argentina 45195774 Free Very… 2021 01 06 13441 297.
6 Argentina 45195774 Free Very… 2021 01 07 13835 306.
7 Argentina 45195774 Free Very… 2021 01 08 13346 295.
8 Argentina 45195774 Free Very… 2021 01 09 11057 245.
9 Argentina 45195774 Free Very… 2021 01 10 7808 173.
10 Argentina 45195774 Free Very… 2021 01 11 8704 193.
# ℹ 6,906 more rows
```
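Problem 10 needs only a single `mutate()`. Here is a sketch on a hypothetical excerpt containing the first two Argentina rows of df shown above. Rounded, the results are 116 and 130, which agree with the expected output.

```r
library(tibble)
library(dplyr)

# Hypothetical excerpt of df (only the columns needed here)
df <- tibble(
  Country    = c("Argentina", "Argentina"),
  Population = c(45195774, 45195774),
  NewCases   = c(5240, 5884)
)

# Cases per one million people = NewCases / Population * 1,000,000
df <- df |>
  mutate(NewCases_per_1M = NewCases / Population * 1000000)
df
```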
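Problem 11: Using df, compute the total number of new cases per one million people for each country. Sort the result so that countries with fewer cases appear in the top rows.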
```r
# Write your R code here
```

```
# A tibble: 19 × 2
Country NewCases_per_1M
<chr> <dbl>
1 China 17.4
2 Saudi Arabia 5554.
3 South Korea 11170.
4 Japan 11807.
5 Indonesia 12838.
6 Australia 15570.
7 India 17794.
8 Mexico 19720.
9 South Africa 40203.
10 Canada 43112.
11 Russia 49107.
12 Germany 64544.
13 Italy 66096.
14 Brazil 68630.
15 Turkey 86101.
16 Argentina 89053.
17 United States 103095.
18 France 108305.
19 United Kingdom 152679.
```
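Here is a minimal sketch of Problem 11 on a hypothetical excerpt containing two days each for Argentina and Japan, with values from the printouts above. `arrange()` sorts in ascending order by default, so no `desc()` is needed here.

```r
library(tibble)
library(dplyr)

# Hypothetical excerpt of df, with the per-million column added as in Problem 10
df <- tibble(
  Country    = c("Argentina", "Argentina", "Japan", "Japan"),
  Population = c(45195774, 45195774, 126476461, 126476461),
  NewCases   = c(5240, 5884, 3071, 3166)
) |>
  mutate(NewCases_per_1M = NewCases / Population * 1000000)

by_country <- df |>
  group_by(Country) |>
  summarise(NewCases_per_1M = sum(NewCases_per_1M)) |>
  arrange(NewCases_per_1M)  # ascending: fewest cases per million first
by_country
```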
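Problem 12: Using df, compute the mean number of new cases per one million people for each level of political and civil liberties (Freedom), and output it as a column named NewCases_per_1M. Also show the number of countries in each group in a column named N.
Because df has one row per country per day, n() alone counts country-days rather than countries. Divide it by 364 (the number of observation days) to get the number of countries.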
```r
# Write your R code here
```

```
# A tibble: 2 × 3
Freedom NewCases_per_1M N
<fct> <dbl> <dbl>
1 Free 177. 12
2 Others 75.0 7
```
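The note about dividing `n()` can be sketched as follows. The data are hypothetical and made up: three countries observed on two days each, so the divisor is 2 instead of 364. `n_days` is an illustrative name, not part of the assignment.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data: 3 countries x 2 days, made-up values
n_days <- 2  # 364 in the assignment data
df <- tibble(
  Country         = c("A", "A", "B", "B", "C", "C"),
  Freedom         = factor(c("Free", "Free", "Free", "Free", "Others", "Others"),
                           levels = c("Free", "Others")),
  NewCases_per_1M = c(10, 20, 30, 40, 1, 3)
)

by_freedom <- df |>
  group_by(Freedom) |>
  summarise(
    NewCases_per_1M = mean(NewCases_per_1M),  # mean over all country-days
    N               = n() / n_days            # rows per group / days = countries
  )
by_freedom
```

Here "Free" has 4 rows, giving N = 4 / 2 = 2 countries, and "Others" has 2 rows, giving N = 1. The division makes `N` a double, which is why it prints as `<dbl>` in the expected output.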
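Problem 13: Using df, compute the mean number of new cases per one million people for each Human Development Index (HDI) category, and output it as a column named NewCases_per_1M. Also show the number of countries in each group in a column named N.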
```r
# Write your R code here
```

```
# A tibble: 3 × 3
HDI NewCases_per_1M N
<fct> <dbl> <dbl>
1 Very High 170. 13
2 High 77.7 5
3 Middle 48.9 1
```
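The same grouped summary works for HDI. As an alternative to dividing `n()` by 364, this hypothetical sketch counts countries with dplyr's `n_distinct(Country)`. That approach does not depend on every country having the same number of rows. It returns an integer, however, so `N` would print as `<int>` rather than the `<dbl>` shown in the expected output. The data are made up.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data: 4 countries x 2 days, made-up values
df <- tibble(
  Country         = c("A", "A", "B", "B", "C", "C", "D", "D"),
  HDI             = factor(c("Very High", "Very High", "Very High", "Very High",
                             "High", "High", "Middle", "Middle"),
                           levels = c("Very High", "High", "Middle")),
  NewCases_per_1M = c(10, 20, 30, 40, 4, 6, 1, 1)
)

by_hdi <- df |>
  group_by(HDI) |>
  summarise(
    NewCases_per_1M = mean(NewCases_per_1M),
    N               = n_distinct(Country)  # counts countries directly (integer)
  )
by_hdi
```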