AWS - Redshift - Data Traffic


Imagine the following scenario:

You use Redshift as the data warehouse or data lake for your reporting and data loads. You see an opportunity to make your life easier and give customers the freedom to access that data and build reports however they think is coolest, with whatever tool they want. But remember that AWS charges for data transfer out. You look in the AWS portal and discover that there is no specific monitoring of who is pulling data out, even though the charge shows up on the bill... incredible... Still, you don't want to abandon the idea, and you'd like to recover some of that cost.

What I am going to show you is not a perfect solution, and it still needs some improvements, but it is already a guide to help with this idea...

Redshift is a modified PostgreSQL, so many queries against the PG system tables work fine in Redshift...

For this scenario, you can create an Integration Services package and run the following query against Redshift:

select
    trim(q.database) as db,
    trim(u.usename) as usename,
    sum(bytes) / 1024 as kbytes,
    sum(packets) as packets,
    date(b.starttime) as dates
from
    stl_bcast b
    join stl_query q
        on b.query = q.query
        and b.userid = q.userid
    join pg_user u
        on u.usesysid = q.userid
where
    packets > 0
    and b.starttime >= dateadd(day, -7, current_date)
    and datediff(seconds, b.starttime, b.endtime) > 0
    --and u.usename like 'usr%'
    --and querytxt not like 'fetch%'
group by
    trim(q.database),
    u.usename,
    date(b.starttime)

This query returns the volume of data, in KB, transferred by the executed queries, grouped by database, user, and day.

This allows you to build an incremental report and prorate AWS's data transfer out cost across users.
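As a rough sketch of that proration step (this is my own illustration, not from the article; the usernames and the monthly egress cost are made-up numbers), you could allocate the bill in proportion to each user's share of the measured traffic:

```python
# Prorate a hypothetical monthly data-transfer-out bill across users,
# in proportion to the kbytes each one generated per the query above.

def prorate_egress_cost(traffic_kb, monthly_cost):
    """traffic_kb: dict of username -> total kbytes measured for the period.
    Returns a dict of username -> allocated cost, rounded to cents."""
    total = sum(traffic_kb.values())
    if total == 0:
        return {user: 0.0 for user in traffic_kb}
    return {user: round(monthly_cost * kb / total, 2)
            for user, kb in traffic_kb.items()}

# Example with made-up numbers:
traffic = {"usr_finance": 750_000, "usr_sales": 200_000, "usr_bi": 50_000}
print(prorate_egress_cost(traffic, 120.00))
# {'usr_finance': 90.0, 'usr_sales': 24.0, 'usr_bi': 6.0}
```

In practice you would feed this from the incremental report's daily totals rather than hard-coded values.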

Is it 100% accurate? No, but at least it's something, since AWS doesn't provide granular data on who is consuming the data transfer out.

New code will also be published in another repository:
