Connecting Glue job to Snowflake Private Endpoint using Private Link


Background

When you want to transform data in Snowflake for a data analytics use case, you would usually implement the transformation logic in SQL and create a view or table. If the logic is too complicated to implement in SQL, Snowpark, which is Snowflake's version of the DataFrame API, comes in handy. Your DataFrame transformation logic is translated into SQL, and your data is then processed in Snowflake.

However, as of May 2022 only the Snowpark Scala API is generally available. The Snowpark Python API is still in Private Preview and cannot be used in production systems. As an alternative to the Snowpark Python API, AWS Glue, which is a serverless Spark service on AWS, is very useful.

In this article, I will explain how to connect a Glue job to Snowflake, specifically via a VPC endpoint using AWS Private Link. AWS Private Link is widely used in large enterprises to connect VPCs in their own AWS accounts with third-party services on AWS. If you are using Private Link for your Snowflake account, your Glue job requires a custom Glue connection with VPC configuration to connect to the Snowflake private endpoint.
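To make "VPC configuration" concrete, here is a minimal sketch of the VPC part of a Glue connection definition, shaped like the ConnectionInput structure of the Glue CreateConnection API. It is an illustration under assumptions, not the exact setup used later in this article. The function name and all IDs are hypothetical, and the sketch uses the simpler NETWORK connection type as a stand-in for a custom connector-based connection.

```python
def build_connection_input(name, subnet_id, security_group_ids, availability_zone):
    """Build a ConnectionInput-style dict that places a Glue job inside a VPC.

    All arguments are placeholders for your own resources. The subnet should be
    one that can reach the VPC endpoint for Snowflake.
    """
    return {
        "Name": name,
        # Stand-in type: NETWORK connections carry only VPC settings.
        "ConnectionType": "NETWORK",
        "ConnectionProperties": {},
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


# Hypothetical IDs, for illustration only.
connection_input = build_connection_input(
    name="snowflake-privatelink-connection",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    availability_zone="ap-northeast-1a",
)
```

A dict like this could be passed as the ConnectionInput argument of boto3's Glue client create_connection call. The security group must allow the Glue job to reach the VPC endpoint.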

Network configuration

Assume that you have a network configuration as follows.

[Figure: network configuration]

Glue job setup

In this section, I will explain how to set up a Glue job that can connect to the Snowflake private endpoint using Private Link.

Private Link must be set up in advance

This section assumes that the following are already in place.

  • Private Link is set up between your own AWS VPC and Snowflake's VPC.
  • You can connect to Snowflake via the private endpoint.
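One quick way to sanity-check the second prerequisite is to confirm, from a machine inside your VPC, that the Snowflake Private Link hostname resolves to private IP addresses. The following is a rough sketch using only the Python standard library. The function names are my own, and the hostname in the usage comment is a hypothetical placeholder for your account's Private Link URL.

```python
import ipaddress
import socket


def all_private(addresses):
    """Return True if the collection is non-empty and every IP address is private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


def resolves_to_private_ips(hostname, port=443):
    """Resolve hostname over IPv4 and check that all returned addresses are private."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return all_private({info[4][0] for info in infos})


# Usage (run inside the VPC; the hostname is a placeholder):
# resolves_to_private_ips("<account_identifier>.privatelink.snowflakecomputing.com")
```

If the hostname resolves to public addresses or does not resolve at all, the DNS settings for the private endpoint are worth checking before you move on to the Glue configuration. This only checks name resolution, so it does not prove that a login will succeed.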

If not, please follow the instructions below.

Create a Glue connector

As mentioned previously, you need to create a Glue connector before you can create a Glue connection for Snowflake JDBC. To create a Glue connector, please follow the instructions below.
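A JDBC-based connector typically needs the driver class name and a JDBC URL. As a hedged sketch, the helper below builds a Snowflake JDBC URL that points at the Private Link hostname. The helper name, the account identifier, and the database and warehouse names are hypothetical. The driver class name is the one shipped with the Snowflake JDBC driver.

```python
from urllib.parse import urlencode

# Driver class provided by the Snowflake JDBC driver.
SNOWFLAKE_JDBC_DRIVER_CLASS = "net.snowflake.client.jdbc.SnowflakeDriver"


def build_privatelink_jdbc_url(account_identifier, **params):
    """Build a Snowflake JDBC URL that targets the Private Link hostname.

    account_identifier is a placeholder such as "xy12345.ap-northeast-1".
    params are optional connection parameters such as db, schema, or warehouse.
    """
    host = f"{account_identifier}.privatelink.snowflakecomputing.com"
    query = urlencode(params)
    return f"jdbc:snowflake://{host}/" + (f"?{query}" if query else "")


# Hypothetical values, for illustration only.
jdbc_url = build_privatelink_jdbc_url(
    "xy12345.ap-northeast-1", db="MYDB", warehouse="MYWH"
)
```

The important difference from a public connection is the ".privatelink" segment in the hostname, which routes traffic through the VPC endpoint instead of the public internet.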