Access Google BigQuery Metadata From AWS Lambda

Requirements:

  • Retrieve metadata from Google BigQuery using an AWS Lambda function
  • Build a connection between AWS Lambda and Google BigQuery
  • Write Python code to get the metadata from Google BigQuery
  • Format the metadata schema

Solution

Get google-cloud-bigquery.zip: Method 1

  • Instead of building the layer on EC2 via a Session Manager session (Method 2), you can also build the zip file locally by running the following commands
pip install google-cloud-bigquery -t ./python/
zip -r google-cloud-bigquery.zip python
  • Add the zip file to the Lambda function as a layer from your local machine (a boto3 sketch follows)
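
One way to register the layer without the console is sketched below with boto3; this is a minimal, hedged example, and the layer name and runtime ("gcp-bigquery", python3.9) are assumptions rather than values from the original setup.

import boto3

lambda_client = boto3.client('lambda')

# publish the locally built zip as a new layer version
with open('google-cloud-bigquery.zip', 'rb') as zip_file:
    layer = lambda_client.publish_layer_version(
        LayerName='gcp-bigquery',               # assumed layer name
        Content={'ZipFile': zip_file.read()},
        CompatibleRuntimes=['python3.9']        # assumed runtime
    )

print(layer['LayerVersionArn'])                 # ARN to attach to the Lambda function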

Get google-cloud-bigquery.zip: Method 2

Create EC2 in AWS

  • Create an EC2 instance
  • Amazon Linux 2
  • The free tier is enough; instance type: t2.micro
  • Attach an IAM role that grants only “AmazonEC2RoleforSSM” so that Session Manager (in Systems Manager) can connect to this EC2 instance
  • Leave the other options at their defaults (a boto3 sketch of this step follows the list)
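
For reference, the same instance can be launched with boto3. This is a hedged sketch: the AMI ID and instance profile name are placeholders to replace with values from your own account.

import boto3

ec2 = boto3.client('ec2')

# launch one free-tier Amazon Linux 2 instance with an SSM-capable instance profile
ec2.run_instances(
    ImageId='ami-xxxxxxxxxxxxxxxxx',                          # an Amazon Linux 2 AMI in your region
    InstanceType='t2.micro',
    MinCount=1,
    MaxCount=1,
    IamInstanceProfile={'Name': '<my-ssm-instance-profile>'}  # assumed profile wrapping the SSM role
)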

Get AWS Access Key ID and Secret Access Key

  • Go to the “Identity and Access Management” (IAM) service
  • Click “Users” and find your user name
  • Click “Security credentials”
  • Create an access key and remember that you can only view the secret access key once (an optional verification sketch follows the list)
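
As an optional sanity check, you can confirm the new key works with boto3 and STS; the key values below are the same placeholders used later with aws configure.

import boto3

session = boto3.Session(
    aws_access_key_id='<my-access-key>',
    aws_secret_access_key='<my-secret-key>',
    region_name='ap-southeast-2'
)

# prints the ARN of the IAM user the key belongs to
print(session.client('sts').get_caller_identity()['Arn'])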

Start a Session in AWS

  • Go to the “AWS Systems Manager” service
  • Click the “Session Manager” tab
  • Click “Start a Session” and run the following commands in the session
# become the ec2-user
sudo su - ec2-user

# pip installation
sudo yum install python3 -y
curl -O https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
export PATH=$PATH:/usr/local/bin

# SDK installation & zip
pip install google-cloud-bigquery -t ./python/
zip -r google-cloud-bigquery.zip python
# added because protobuf is required
pip install protobuf --upgrade -t ./python/
zip -r google-cloud-bigquery.zip ./python/google/protobuf

# AWS CLI settings; enter the following at the prompts:
#   AWS Access Key ID [None]: <my-access-key>
#   AWS Secret Access Key [None]: <my-secret-key>
#   Default region name [None]: ap-southeast-2
#   Default output format [None]: json
aws configure

# save the zip to S3
aws s3 mb s3://<my-bucket>
aws s3 cp google-cloud-bigquery.zip s3://<my-bucket>

Check S3 Bucket in AWS

  • Go to the S3 bucket you just created from the Session Manager session
  • You should see that google-cloud-bigquery.zip has been uploaded (a boto3 check is sketched below)
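
The same check can be scripted with boto3; this is a small sketch, and the bucket name is the placeholder used earlier.

import boto3

s3 = boto3.client('s3')

# list the objects in the bucket and expect google-cloud-bigquery.zip among them
response = s3.list_objects_v2(Bucket='<my-bucket>')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])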

Other Steps

Create a Lambda function and register the created library as a Lambda layer in AWS

  • Go to the Lambda service
  • Click “Functions” and then “Create function”
  • Change the settings:
  • Timeout: 30 seconds
  • Create a layer from the zip in S3
  • Add the layer to the function: gcp-bigquery
  • You should see that the layer has been created and attached successfully (a scripted version of these steps follows the list)
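
The console steps above can also be scripted. The sketch below is a hedged example (function name, bucket name, and runtime are placeholders) that publishes the layer from S3, raises the timeout, and attaches the layer.

import boto3

lambda_client = boto3.client('lambda')

# publish the layer from the zip uploaded to S3
layer = lambda_client.publish_layer_version(
    LayerName='gcp-bigquery',
    Content={'S3Bucket': '<my-bucket>', 'S3Key': 'google-cloud-bigquery.zip'},
    CompatibleRuntimes=['python3.9']
)

# set the 30-second timeout and attach the layer to an existing function
lambda_client.update_function_configuration(
    FunctionName='<my-bq-function>',            # assumed function name
    Timeout=30,
    Layers=[layer['LayerVersionArn']]
)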

Get GCP certificate file in Google Cloud Platform

  • Log in to Google Cloud Platform
  • Go to the “IAM & Admin” service and then to “Service Accounts”
  • Click “Create Service Account” and make sure that you choose “BigQuery Admin” as the role
  • Once you have created the account, click the account and then “ADD KEY”
  • You can then download a JSON key file in the following format (a local sanity check of the key follows the example)
{
  "type": "xxx",
  "project_id": "xxx",
  "private_key_id": "xxx",
  "private_key": "xxx",
  "client_email": "xxx",
  "client_id": "xxx",
  "auth_uri": "xxx",
  "token_uri": "xxx",
  "auth_provider_x509_cert_url": "xxx",
  "client_x509_cert_url": "xxx"
}
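
Before packaging the key into the Lambda function, you can sanity-check it locally. This is a small sketch assuming the file has been saved as bq-manager.json (the name used in the next step).

from google.cloud import bigquery
from google.oauth2 import service_account

# build credentials straight from the downloaded key file
credentials = service_account.Credentials.from_service_account_file('bq-manager.json')
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

# list the datasets the service account can see; an auth error here means the key is wrong
print([dataset.dataset_id for dataset in client.list_datasets()])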

Add functions and credential file in AWS Lambda Function

  • Copy the contents of the downloaded JSON key into a file named “bq-manager.json” in the Lambda package
  • Copy the following code into “lambda_function.py”
import json
from google.cloud import bigquery

def lambda_handler(event, context):
    # the client picks up the key file referenced by GOOGLE_APPLICATION_CREDENTIALS
    client = bigquery.Client()
    sql = """
    SELECT *
    FROM `<project name>.<dataset name>.<table name>`
    """

    # Run a Standard SQL query using the environment's default project
    results = client.query(sql).result()
    metadata_objects = []
    for row in results:
        # first column: id, second column: the metadata payload
        id_dic = {"id": list(row)[0]}
        other_dic = {"metadata": list(row)[1]}
        metadata_objects.append({**id_dic, **other_dic})

    return metadata_objects
  • Add “GOOGLE_APPLICATION_CREDENTIALS: bq-manager.json” as an environment variable of the function

Test Result

You should be able to get a JSON result that mirrors the table format in BigQuery.

By default the query returns a SQL table; you can reshape the result based on your code. One way to run the test from outside the console is sketched below.
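
This is a minimal boto3 sketch; the function name is an assumption.

import json

import boto3

lambda_client = boto3.client('lambda')

# invoke the function synchronously with an empty event and print the metadata it returns
response = lambda_client.invoke(FunctionName='<my-bq-function>', Payload=b'{}')
print(json.loads(response['Payload'].read()))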

Updated Solutions

AWS SSM - Parameter Store

  • Store the individual credential fields (for example client_email and private_key) in the SSM Parameter Store, like the key-value pairs of a Python dictionary (a boto3 sketch follows)
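
A hedged boto3 sketch of this step; the parameter names and values are placeholders. The plain String type matches the get_parameter calls in the updated Lambda code below, whereas SecureString would also require WithDecryption=True there.

import boto3

ssm = boto3.client('ssm')

# one parameter per credential field; store the private key with literal \n sequences,
# which the Lambda code replaces with real newlines
ssm.put_parameter(Name='/bq/client_email', Value='<client_email>', Type='String', Overwrite=True)
ssm.put_parameter(Name='/bq/private_key', Value='<private_key>', Type='String', Overwrite=True)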

AWS Lambda - Environment Variable

  • Set all the SSM parameter names and BigQuery identifiers as environment variables of the function (a boto3 sketch follows)
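
A matching boto3 sketch; the function name, parameter names, and BigQuery identifiers are placeholders, and the variable names are the ones read by the updated Lambda code below.

import boto3

boto3.client('lambda').update_function_configuration(
    FunctionName='<my-bq-function>',
    Environment={'Variables': {
        'SSM_BQ_CLIENT_EMAIL': '/bq/client_email',
        'SSM_BQ_PRIVATE_KEY': '/bq/private_key',
        'SSM_BQ_TOKEN_URI': 'https://oauth2.googleapis.com/token',
        'BQ_PROJECT_ID': '<project id>',
        'BQ_DATASET_NAME': '<dataset name>',
        'BQ_TABLE_NAME': '<table name>'
    }}
)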

Updated Lambda code

from google.cloud import bigquery
from google.oauth2 import service_account
import boto3
import json
import os

ssm = boto3.client('ssm')


class GCPConnectionError(Exception):
    """Raised when the BigQuery client cannot be created."""


def lambda_handler(event, context):

    # retrieve parameters from SSM and prepare the connection with BigQuery
    client_email_var = ssm.get_parameter(Name=os.getenv('SSM_BQ_CLIENT_EMAIL'))['Parameter']['Value']
    private_key_var = ssm.get_parameter(Name=os.getenv('SSM_BQ_PRIVATE_KEY'))['Parameter']['Value'].replace('\\n', '\n')
    token_uri_var = os.getenv('SSM_BQ_TOKEN_URI')
    credentials = service_account.Credentials.from_service_account_info(
        {
            "client_email": client_email_var,
            "private_key": private_key_var,
            "token_uri": token_uri_var
        }
    )

    project_id_var = os.getenv('BQ_PROJECT_ID')
    dataset_name_var = os.getenv('BQ_DATASET_NAME')
    table_name_var = os.getenv('BQ_TABLE_NAME')
    try:
        client = bigquery.Client(credentials=credentials, project=project_id_var)
    except Exception as error:
        raise GCPConnectionError("could not create the BigQuery client") from error

    sql = f"""
    SELECT *
    FROM `{project_id_var}.{dataset_name_var}.{table_name_var}`
    """

    # run a Standard SQL query against the configured project and
    # convert the returned metadata rows into JSON-serialisable dictionaries
    results = client.query(sql).result()
    metadata_objects = []
    for row in results:
        id_dic = {"id": list(row)[0]}
        other_dic = {"metadata": list(row)[1]}
        metadata_objects.append({**id_dic, **other_dic})
    return metadata_objects