Building a GCS Bucket and BigQuery Tables with Terraform

Terraform helps data scientists and engineers build infrastructure and manage its life cycle.
There are two ways to use it: locally or in the cloud. This post explains how to install and use it locally to build infrastructure on Google Cloud Platform (GCP).

First, the installations: Terraform and Google Cloud SDK

To install Terraform, choose the guide for your operating system on the official Terraform webpage.
Once Terraform is installed, you need a GCP account and a project. Note down the project ID, because Terraform will ask for it later.

The next step is to obtain a key for accessing and controlling your GCP project. Select the project you created from the pull-down menu in the header of the GCP console, then go to:

         Navigation Menu >> IAM & Admin >> Service Accounts >> Create Service Account

Then complete the following steps:

Step 1: Assign a name of your choice.

Step 2: Choose the role "Viewer" to begin with.

Step 3: Skip this optional step for personal projects.

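The new account should now appear in the list of service accounts. From that account's entry, download a key to your local machine.
The next step is to install the Google Cloud SDK on your local machine by following the official installation instructions.
Then open a terminal (the example below is for GNU/Linux), point the environment variable at the downloaded key (a JSON file), and authenticate: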

export GOOGLE_APPLICATION_CREDENTIALS="/--path to your JSON---/XXXXX-dadas2a4cff8.json"

gcloud auth application-default login
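A quick sanity check on the environment variable can save some debugging later. The following is only a minimal sketch: `check_credentials` is a hypothetical helper name introduced here for illustration, and it verifies only that the variable is set and that the file is readable, not that the key itself is valid.

```shell
# Sketch: check that GOOGLE_APPLICATION_CREDENTIALS points at a readable file.
# check_credentials is a hypothetical helper, not part of gcloud or Terraform.
check_credentials() {
  key_path="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$key_path" ]; then
    echo "Key file not found or not readable: $key_path" >&2
    return 1
  fi
  echo "Using key file: $key_path"
}
```

Calling `check_credentials` in the same terminal session where the variable was exported prints the key path on success and returns a non-zero status otherwise.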

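The login command redirects you to the browser to select the corresponding Google account. Your local SDK now has the credentials to reach and configure your cloud services. With this initial authentication in place, however, you still need to adjust the permissions of the service account for the specific GCP services you want to build, namely Google Cloud Storage (GCS) and BigQuery.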

   Navigation Menu >> IAM & Admin >> IAM

Select your project and edit the service account's permissions so that it can manage Cloud Storage and BigQuery resources.
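The same permission change can also be made from the terminal. The sketch below is an assumption-laden illustration: `grant_roles` is a hypothetical helper name, and the three roles listed (Storage Admin, Storage Object Admin, BigQuery Admin) are an assumption about what this setup needs, since the text does not list them. Adjust them to your own requirements.

```shell
# Sketch: grant a service account roles for GCS and BigQuery via gcloud.
# grant_roles is a hypothetical helper; the role list is an assumption.
grant_roles() {
  project_id="$1"   # your GCP project ID
  sa_email="$2"     # service account email, e.g. NAME@PROJECT_ID.iam.gserviceaccount.com
  for role in roles/storage.admin roles/storage.objectAdmin roles/bigquery.admin; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role"
  done
}
```

It would be called as `grant_roles xxx-yyy NAME@xxx-yyy.iam.gserviceaccount.com`, with your own project ID and service account email in place of the placeholders.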

The next step is to enable the following APIs for your project:

IAM API

IAM Credentials

(Make sure the correct GCP account and project are selected while enabling the APIs.)
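As an alternative to the console, the two APIs can be enabled with gcloud. This is a minimal sketch: `enable_iam_apis` is a hypothetical helper name, and it assumes gcloud is already authenticated against an account that is allowed to enable services on the project.

```shell
# Sketch: enable the IAM API and the IAM Service Account Credentials API.
# enable_iam_apis is a hypothetical helper name used only for illustration.
enable_iam_apis() {
  project_id="$1"   # your GCP project ID
  gcloud services enable iam.googleapis.com iamcredentials.googleapis.com \
    --project="$project_id"
}
```

Passing the project explicitly, as in `enable_iam_apis xxx-yyy`, avoids enabling the APIs on whichever project happens to be the gcloud default.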

Building GCP Services with Terraform

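With the required installations (Terraform and Google Cloud SDK) and authentication complete, we are ready to build these two GCP services from your local machine through Terraform. Two files are needed to configure the setup: main.tf and variables.tf. The former contains the code below and creates the GCP services based on the variables defined in the latter (shown in the second code snippet).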

# The code below is from https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/week_1_basics_n_setup/1_terraform_gcp
# --------------------------------------------------

terraform {
  required_version = ">= 1.0"
  backend "local" {}  # Keeps the Terraform state on the local machine
  required_providers {
    google = {
      source  = "hashicorp/google"
    }
  }
}

provider "google" {
  project = var.project
  region = var.region
  // credentials = file(var.credentials)  # Use this if you do not want to set env-var GOOGLE_APPLICATION_CREDENTIALS
}

# Data Lake Bucket
# Ref: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket
resource "google_storage_bucket" "data-lake-bucket" {
  name          = "${local.data_lake_bucket}_${var.project}" # Concatenating DL bucket & Project name for unique naming
  location      = var.region

  # Optional, but recommended settings:
  storage_class = var.storage_class
  uniform_bucket_level_access = true

  versioning {
    enabled     = true
  }

  lifecycle_rule {
    action {
      type = "Delete"
    }
    condition {
      age = 30  // days
    }
  }

  force_destroy = true
}

# DWH
# Ref: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_dataset
resource "google_bigquery_dataset" "dataset" {
  dataset_id = var.BQ_DATASET
  project    = var.project
  location   = var.region
}

The code for variables.tf:

# The code below is from https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/week_1_basics_n_setup/1_terraform_gcp
# The comments are added by the author

locals {
  data_lake_bucket = "BUCKET_NAME"  # Write a name for the GCS bucket to be created (GCS bucket names must be lowercase)
}

variable "project" {
  description = "Your GCP Project ID"   # Don't write anything here: it will be prompted during installation
}

variable "region" {
  description = "Region for GCP resources. Choose as per your location: https://cloud.google.com/about/locations"
  default = "europe-west6"  # Pick a data center location in which your services will be located
  type = string
}

variable "storage_class" {
  description = "Storage class type for your bucket. Check official docs for more info."
  default = "STANDARD"
}

variable "BQ_DATASET" {
  description = "BigQuery Dataset that raw data (from GCS) will be written to"
  type = string
  default = "Dataset_Name" # Write a name for the BigQuery Dataset to be created
}

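Once the files above are placed in a folder, it is time to run them. The Terraform CLI has only a handful of main commands:

init : prepares the directory by adding the folders and files needed by the commands that follow

validate : checks whether the existing configuration is valid

plan : shows the planned changes for the given configuration

apply : creates the infrastructure for the given configuration

destroy : destroys the existing infrastructure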
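The commands are typically run in the order listed, stopping at the first failure. The sketch below chains them in a small wrapper. `tf_deploy` is a hypothetical helper name, and it assumes it is called from the folder that holds main.tf and variables.tf. It also sets `TF_VAR_project`, which Terraform reads as the value of the `project` variable, so the project ID prompt is skipped.

```shell
# Sketch: run the main Terraform commands in order, stopping on the first error.
# tf_deploy is a hypothetical helper; run it from the folder containing the .tf files.
tf_deploy() {
  project_id="$1"   # your GCP project ID
  # Terraform reads TF_VAR_<name> environment variables as input variables,
  # so this supplies var.project without the interactive prompt.
  export TF_VAR_project="$project_id"
  terraform init &&
    terraform validate &&
    terraform plan &&
    terraform apply
}
```

A call such as `tf_deploy xxx-yyy` still pauses at the apply step, because `terraform apply` asks for confirmation before creating anything.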

The init, plan, and apply commands produce the following output (shortened):

x@y:~/-----/terraform$ terraform init

Initializing the backend...

Successfully configured the backend "local"! Terraform will automatically
use this backend unless the backend configuration changes.
.
.
.

x@y:~/-----/terraform$ terraform plan
var.project
  Your GCP Project ID

  Enter a value: xxx-yyy # write your GCP project ID here

x@y:~/-----/terraform$ terraform apply

var.project
  Your GCP Project ID

  Enter a value: xxx-yyy # write your GCP project ID here

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
  + create

Terraform will perform the following actions:
.
.
.

After running the three simple commands above, a new GCS bucket and a new BigQuery dataset will appear in your GCP account.
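The result can also be checked from the terminal with the gsutil and bq tools that ship with the Google Cloud SDK. This is a minimal sketch: `list_created_resources` is a hypothetical helper name, and it simply lists every bucket and dataset in the project rather than only the ones Terraform created.

```shell
# Sketch: list the buckets and BigQuery datasets in a project.
# list_created_resources is a hypothetical helper name used only for illustration.
list_created_resources() {
  project_id="$1"   # your GCP project ID
  gsutil ls -p "$project_id"         # lists the GCS buckets in the project
  bq ls --project_id="$project_id"   # lists the BigQuery datasets in the project
}
```

After a successful apply, `list_created_resources xxx-yyy` should show the bucket and dataset named in variables.tf among the results.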

Reference

More information on this topic (Building a GCS Bucket and BigQuery Tables with Terraform) can be found at https://dev.to/cemkeskin84/building-gcs-bucket-and-bigquery-tables-with-terraform-4hf4
