Hands-on Apache Superset, Amazon S3, and Amazon Athena

What is Apache Superset?

"Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application". (Apache Software Foundation)

Comparable tools you might have heard of include Tableau and Power BI, but both are commercially licensed, whereas Superset is open source.

What about Amazon S3 and Athena?

S3: "Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance." (Amazon Web Services)

Athena: "Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run." (Amazon Web Services)

What You'll Need Beforehand

An AWS account (S3 storage and Athena queries are billed, so expect a small cost).
AWS credentials configured (for example, with aws configure).
An Ubuntu 18.04+ environment.
A Mapbox account.
pip installed.

Installation

PyAthena

Apache Superset needs a database driver to talk to Amazon Athena. PyAthena provides one, together with a SQLAlchemy dialect.

pip install "PyAthena>1.2.0"

Apache Superset

Install superset

pip install apache-superset

Initialize the database

superset db upgrade

Create an admin user (you will be prompted to set a username, first and last name before setting a password)

export FLASK_APP=superset

superset fab create-admin

Load some data to play with

superset load_examples

Create default roles and permissions

superset init

Workflow

Start a development web server on port 8088 (use -p to bind to another port).

superset run -p 8088 --with-threads --reload --debugger

Switch to your browser and go to http://127.0.0.1:8088/. You should now see the Superset web UI.

Log in with the admin account you have just created. You'll see that some examples have been loaded if you followed the tutorial. Play with them if you want to, but we'll be using other data for this demonstration.

Upload some data to an Amazon S3 bucket to work with. I'll be using the Airbnb dataset from Kaggle (AB_NYC_2019.csv).

aws s3 cp ~/PATH/TO/AB_NYC_2019.csv s3://YOUR-BUCKET/

Now return to the Superset UI, click Databases, then the + button in the top right-hand corner.
Fill in the placeholders in the following text and paste it into the SQLAlchemy URI field.

awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}
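AWS secret keys and s3:// paths often contain characters such as "/", "+" and ":" that are not safe inside a URI, so it can help to percent-encode each part before pasting. The following is a minimal sketch using only the Python standard library; the function name and all sample values are placeholders made up for illustration, not real credentials.

```python
from urllib.parse import quote_plus


def build_athena_uri(access_key, secret_key, region, schema, staging_dir):
    """Return a SQLAlchemy URI in the awsathena+rest form shown above."""
    # Percent-encode every user-supplied part so that characters like
    # "/", "+" and ":" cannot be misread as URI delimiters.
    return (
        "awsathena+rest://"
        f"{quote_plus(access_key)}:{quote_plus(secret_key)}"
        f"@athena.{region}.amazonaws.com/{schema}"
        f"?s3_staging_dir={quote_plus(staging_dir)}"
    )


# Placeholder values only (assumptions); substitute your own.
uri = build_athena_uri(
    access_key="AKIAEXAMPLEKEY",
    secret_key="abc/def+ghi",
    region="us-east-1",
    schema="default",
    staging_dir="s3://YOUR-BUCKET/athena-results/",
)
print(uri)
```

The staging directory is where Athena writes query results. Pointing it at a prefix separate from the uploaded CSV is probably the safer choice, so that result files are not picked up as table data.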

Log in to the Amazon Athena console and define a table over the uploaded file with the columns you need. For simplicity, I won't be using all the columns. (Make sure that Athena and the S3 bucket you created are in the same region.) If you're familiar enough with Athena, you can run the same query from Superset's UI instead.

Back in the Superset UI, you should now see the Athena database you just added.
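The original post does not show its table definition, so here is one possible sketch of how such a statement could be generated. Everything in it is an assumption: the table name ab_nyc_2019, the use of the default database, the OpenCSVSerde (chosen because listing names may contain quoted commas), and the column names, which are taken from the header of AB_NYC_2019.csv and should be checked against your copy. CSV columns are matched by position, so the sketch declares every column up to price even though only a few are used later.

```python
def build_create_table(database, table, columns, location):
    """Build an Athena CREATE EXTERNAL TABLE statement for a CSV file
    that has a single header row."""
    column_sql = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Assumed column order (verify against the CSV header). All columns are
# declared as string here and cast to numbers at query time.
columns = [
    (name, "string")
    for name in (
        "id", "name", "host_id", "host_name", "neighbourhood_group",
        "neighbourhood", "latitude", "longitude", "room_type", "price",
    )
]

# LOCATION must be a bucket or folder, not the CSV file itself.
ddl = build_create_table("default", "ab_nyc_2019", columns, "s3://YOUR-BUCKET/")
print(ddl)
```

Paste the printed statement into the Athena query editor. Athena reads every file under LOCATION, so it is worth keeping that folder free of anything other than the CSV.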

Run a query of your preference and click on Explore.
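As one example of such a query, the sketch below builds a SELECT that returns coordinates and price as numbers, which is roughly what a map chart needs. The table name default.ab_nyc_2019 and the column names are assumptions carried over from the Kaggle CSV header. TRY_CAST is used so that rows with malformed values become NULL instead of failing the query. The optional run_on_athena helper is hypothetical: it shows how the same SQL could be checked directly through PyAthena, and it relies on AWS credentials already being available in your environment.

```python
def price_points_query(table, limit=5000):
    """Return SQL selecting latitude, longitude and price as numeric columns."""
    return (
        "SELECT\n"
        "  TRY_CAST(latitude AS double) AS latitude,\n"
        "  TRY_CAST(longitude AS double) AS longitude,\n"
        "  TRY_CAST(price AS integer) AS price\n"
        f"FROM {table}\n"
        "WHERE TRY_CAST(price AS integer) > 0\n"
        f"LIMIT {int(limit)}"
    )


def run_on_athena(sql, staging_dir, region):
    """Hypothetical helper: run the SQL through PyAthena and return all rows."""
    # Imported lazily so the query builder works without PyAthena installed.
    from pyathena import connect

    cursor = connect(s3_staging_dir=staging_dir, region_name=region).cursor()
    cursor.execute(sql)
    return cursor.fetchall()


sql = price_points_query("default.ab_nyc_2019")
print(sql)
```

The printed statement can be pasted into Superset's SQL editor as is. Calling run_on_athena(sql, "s3://YOUR-BUCKET/athena-results/", "us-east-1") would execute it outside Superset, with both arguments being placeholders.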

Running a deck.gl visualization gives us ...

Oops, it seems we need an access token from Mapbox. Register an account and export your token as the MAPBOX_API_KEY environment variable. See the official documentation for details.

export MAPBOX_API_KEY=your-token
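Superset's default configuration reads this value from the environment, so the export alone should be enough. If you prefer to keep the setting in a config file, a superset_config.py placed on your PYTHONPATH could look roughly like the sketch below. The helper name read_mapbox_key and the warning text are illustrative assumptions, not part of Superset.

```python
import os


def read_mapbox_key(env=os.environ):
    """Return the Mapbox token from the given environment mapping,
    or an empty string when it is not set."""
    return env.get("MAPBOX_API_KEY", "").strip()


# Superset picks up upper-case names from superset_config.py as settings.
MAPBOX_API_KEY = read_mapbox_key()

if not MAPBOX_API_KEY:
    print("MAPBOX_API_KEY is not set; deck.gl charts need it for the base map")
```

Stripping whitespace guards against a stray newline when the token is pasted into a shell profile.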

Restart your server and you should now see ...

Darker grid cells are where the average Airbnb price is higher.

Here's an area where some of the more expensive Airbnb rooms are clustered, and the reasons might be apparent.

Conclusions

There is a lot left to talk about with Apache Superset, Amazon S3, and Amazon Athena, but the general idea here is to demonstrate a data analysis workflow that combines several tools. The same workflow can of course be built without any of the above, for instance by combining Tableau and Google BigQuery.

Reference

Apache Software Foundation, "Apache Superset (incubating)", Apache Software Foundation. https://superset.incubator.apache.org/#apache-superset-incubating. 22 August 2020.
Amazon Web Services, "Amazon S3", Amazon Web Services. https://aws.amazon.com/s3/. 22 August 2020.
Amazon Web Services, "Amazon Athena", Amazon Web Services. https://aws.amazon.com/athena/. 22 August 2020.

Author And Source

Original article: Hands-on Apache Superset, Amazon S3, and Amazon Athena, https://qiita.com/andy971022/items/91adc5c826f95ee4ea42

Author attribution: information about the original author is available at the original URL. Copyright belongs to the original author.