Reading Excel files

Problem

How would you import an Excel file like the one below into R?

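The table has no text data in its cells, but the fill colors carry meaning.
The readxl package would not be much help here, for a few reasons:

The data is not "tidy". The merged cells on the left convey categorical information;

The cell formatting is where the information actually lives.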

Solution

Enter the tidyxl and unpivotr packages.
The tidyxl package imports every cell of an Excel file as a row in a data frame, with columns describing its position, content, and formatting.
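To get a feel for this one-row-per-cell layout, here is a minimal sketch. It assumes the example workbook that ships with tidyxl rather than the file used in this post, so the path below is only a stand-in.

```r
library(tidyxl)

# Assumption: the example workbook bundled with tidyxl, not this post's file
path <- system.file("extdata/examples.xlsx", package = "tidyxl")

# One row per cell, across every sheet in the workbook
cells <- xlsx_cells(path)

# These columns locate each cell and point at its formatting
cells[, c("sheet", "address", "row", "col", "data_type", "local_format_id")]
```

The `local_format_id` column is the one that matters for this problem: it is the key used to look up a cell's formatting, including its fill color.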
The unpivotr package takes the data frame produced by tidyxl and makes it possible to tidy it.
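As a rough illustration of what that tidying looks like, the sketch below uses a made-up miniature sheet (the values and the `X1`..`X3` column names are hypothetical) and unpivotr's `as_cells()` to put it in the same one-row-per-cell shape. In the real workbook the values come from fill colors rather than text, but the header handling is the same idea.

```r
library(dplyr)
library(unpivotr)

# Hypothetical miniature sheet (not the real workbook): months across the
# top, produce names down the left, one value per produce/month pair
sheet <- tibble::tribble(
  ~X1,       ~X2,      ~X3,
  "Produto", "Jan",    "Fev",
  "Abiu",    "Low",    "High",
  "Caju",    "Medium", "High"
)

# as_cells() gives one row per cell, similar in shape to xlsx_cells() output
cells <- as_cells(sheet)

# Each behead() strips one layer of headers and attaches it as a new column
tidy <- cells %>%
  behead("up", month) %>%
  behead("left", produce) %>%
  select(produce, month, value = chr) %>%
  arrange(produce, month)

tidy
```

Each data cell ends up on its own row, labelled with the month from the header above it and the produce name to its left.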
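First, I converted the original PDF file to Excel using a popular online service. The converted Excel file can be downloaded from the original post (see Reference).
With the packages above, making the table readable looks like this: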

library(dplyr)
library(purrr)
library(tidyxl)
library(unpivotr)
library(here)

filename <- here("raw-data/produtos_epoca-converted.xlsx")

# The workbook contains several sheets. We first import all tables to a
# list
tables_names <-
  c("Table 1", "Table 3", "Table 4", "Table 5", "Table 6", "Table 7")
tables_to_read <- map(tables_names, xlsx_cells, path = filename)

# We create a function to read each sheet
import_table <- function(df) {
  # Each fill color represents different information. First, we create a
  # palette of the fill colors in the workbook that can be indexed by the
  # `local_format_id` of a given cell to get the fill color of that cell.
  # Formats are stored per workbook, so xlsx_formats() takes only the path.
  fill_color_palette <-
    xlsx_formats(filename)$local$fill$patternFill$fgColor$rgb

  # Since the table has different headings, we have to filter out these
  # headings in order to have only the cells with data. Then, we create a
  # new column for the fill colors by looking up the `local_format_id` of
  # each cell in the palette. Next, we create another column where
  # we codify this information.
  availability <-
    df %>%
    filter(row >= 2, col >= 3) %>%   # filter out headers
    mutate(fill_color = fill_color_palette[local_format_id]) %>%
    mutate(
      availability = case_when(
        fill_color == "FFFF7F7F" ~ "Low",
        fill_color == "FFFFFFCC" ~ "Medium",
        fill_color == "FFCCFFCC" ~ "High"
      )
    ) %>%
    select(availability)

  # We now transform all the headings so we can have tidy data
  df %>%
    behead("left-up", category) %>%
    behead("left", produce) %>%
    behead("up", month) %>%
    bind_cols(availability) %>%
    select(category, produce, month, availability)
}

# Let's apply the function to our list of sheets
availability_ceagesp <- map_dfr(tables_to_read, import_table)

Done!
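The color lookup inside `import_table()` is the least obvious step, so here it is in isolation as a small base R sketch. The palette and the format ids are made up for illustration, and a named lookup vector stands in for the `case_when()` call used above.

```r
# Hypothetical palette: position i holds the fill color of format id i
# (NA means that format has no fill color)
fill_color_palette <- c(NA, "FFFF7F7F", "FFFFFFCC", "FFCCFFCC")

# Hypothetical cells, represented only by their local_format_id
local_format_id <- c(2, 4, 3, 1, 2)

# Step 1: index the palette by format id to get each cell's fill color
fill_color <- fill_color_palette[local_format_id]

# Step 2: translate the color codes into labels with a named lookup vector
labels <- c(FFFF7F7F = "Low", FFFFFFCC = "Medium", FFCCFFCC = "High")
availability <- unname(labels[fill_color])

availability
```

Cells whose format has no recognized fill color come out as `NA`, which is also what `case_when()` returns when no condition matches.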

Result

So we have imported an Excel file that contained untidy data and formatting that conveys meaning.
The data is now tidy and ready to be explored as needed.

availability_ceagesp %>%
  filter(month == "Set",
         availability == "High")

## # A tibble: 89 x 4
##    category produce                 month availability
##    <chr>    <chr>                   <chr> <chr>       
##  1 Frutas   Abacate Breda/Margarida Set   High        
##  2 Frutas   Abiu                    Set   High        
##  3 Frutas   Acerola                 Set   High        
##  4 Frutas   Banana Maçã             Set   High        
##  5 Frutas   Banana Prata            Set   High        
##  6 Frutas   Caju                    Set   High        
##  7 Frutas   Graviola                Set   High        
##  8 Frutas   Jabuticaba              Set   High        
##  9 Frutas   Kiwi Estrangeiro        Set   High        
## 10 Frutas   Laranja Lima            Set   High        
## # ... with 79 more rows

Reference

More information on this topic (Reading Excel files) can be found here: https://dev.to/leonardoshibata/reading-complex-excel-files-2fjk

This text may be freely shared or copied. However, please keep the URL of this document as a reference URL.

Collection and Share based on the CC Protocol
