
Build a Serverless Real-Time Data Processing App

Overview

Serverless applications don’t require you to provision, scale, and manage any servers. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.

Serverless architectures can be used for many types of applications. For example, you can process transaction orders, analyze click streams, clean data, generate metrics, filter logs, analyze social media, or perform IoT device data telemetry and metering.

In this project, you’ll learn how to build a serverless app to process real-time data streams. You’ll build infrastructure for a fictional ride-sharing company. In this case, you will enable operations personnel at a fictional Wild Rydes headquarters to monitor the health and status of their unicorn fleet. Each unicorn is equipped with a sensor that reports its location and vital signs.

You’ll use AWS to build applications to process and visualize this data in real-time. You’ll use AWS Lambda to process real-time streams, Amazon DynamoDB to persist records in a NoSQL database, Amazon Kinesis Data Analytics to aggregate data, Amazon Kinesis Data Firehose to archive the raw data to Amazon S3, and Amazon Athena to run ad-hoc queries against the raw data.

This workshop is broken up into four modules. You must complete each module before proceeding to the next.

1. Build a data stream
    Create a stream in Kinesis and write to and read from the stream to track Wild Rydes unicorns on the live map. In this module you'll also create an Amazon Cognito identity pool to grant live map access to your stream.

2. Aggregate data
    Build a Kinesis Data Analytics application to read from the stream and aggregate metrics like unicorn health and distance traveled each minute.

3. Process streaming data
    Persist aggregate data from the application to a backend database stored in DynamoDB and run queries against that data.

4. Store & query data
    Use Kinesis Data Firehose to flush the raw sensor data to an S3 bucket for archival purposes. Using Athena, you'll run SQL queries against the raw data for ad-hoc analyses.

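To make the per-minute aggregation in module 2 concrete, here is a minimal Python sketch of a one-minute tumbling window over sensor records. The field names and values are illustrative assumptions, not the workshop's exact schema or the Kinesis Data Analytics SQL you'll actually write:

```python
from collections import defaultdict

# Hypothetical sensor records; field names are illustrative, not the
# workshop's exact schema.
records = [
    {"Name": "Shadowfax", "StatusTime": "2024-01-01 12:00:05", "Distance": 175, "HealthPoints": 253},
    {"Name": "Shadowfax", "StatusTime": "2024-01-01 12:00:35", "Distance": 162, "HealthPoints": 251},
    {"Name": "Shadowfax", "StatusTime": "2024-01-01 12:01:10", "Distance": 170, "HealthPoints": 248},
]

# Group by (unicorn, minute) and compute total distance and minimum health,
# mirroring what a one-minute tumbling window aggregation would produce.
windows = defaultdict(lambda: {"Distance": 0, "MinHealthPoints": None})
for r in records:
    minute = r["StatusTime"][:16]  # truncate "YYYY-MM-DD HH:MM:SS" to the minute
    w = windows[(r["Name"], minute)]
    w["Distance"] += r["Distance"]
    hp = r["HealthPoints"]
    w["MinHealthPoints"] = hp if w["MinHealthPoints"] is None else min(w["MinHealthPoints"], hp)

for (name, minute), agg in sorted(windows.items()):
    print(name, minute, agg)
```

In the workshop itself this grouping is expressed declaratively in SQL inside the Kinesis Data Analytics application rather than in application code.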
Requirements

AWS Experience: Beginner to Intermediate

Time to complete: 110 minutes

Cost to complete: Each service used in this architecture is eligible for the AWS Free Tier. If you are outside the usage limits of the Free Tier, completing this project will cost you less than $0.50 (assuming all services are running for 2 hours)*

To complete this tutorial you will use:

• Active AWS Account**

• Browser (Chrome recommended)

• AWS Lambda

• Amazon Kinesis

• Amazon S3

• Amazon DynamoDB

• Amazon Cognito

• Amazon Athena

• AWS IAM

*This estimate assumes you follow the recommended configurations throughout the tutorial and terminate all resources within 2 hours.

**Accounts that have been created within the last 24 hours might not yet have access to the resources required for this project.


(Architecture diagram)

In order to complete this workshop, you’ll need an AWS account and access to create AWS Identity and Access Management (IAM), Amazon Cognito, Amazon Kinesis, Amazon S3, Amazon Athena, Amazon DynamoDB, and AWS Cloud9 resources within that account. The step-by-step guide below explains how to set up all the prerequisites.

Step 1. Create an AWS Account

The code and instructions in this workshop assume only one participant is using a given AWS account at a time. If you attempt to share an account with another participant, you will encounter naming conflicts for certain resources. You can work around this by adding a suffix to your resource names or by using distinct Regions, but the instructions do not provide details on the changes required to make this work.


Use a personal account or create a new AWS account for this workshop rather than using an organization’s account, to ensure you have full access to the necessary services and don’t leave behind any resources from the workshop.


Step 2. Region

Use US East (N. Virginia), US West (Oregon), or EU (Ireland) for this workshop. Each supports the complete set of services covered in the material. Consult the Region Table to determine which services are available in a Region.


Step 3. Set up your AWS Cloud9 IDE

AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Cloud9 comes pre-packaged with essential tools for popular programming languages and the AWS Command Line Interface (CLI) pre-installed, so you don’t need to install anything or configure your laptop for this workshop. Your Cloud9 environment will have access to the same AWS resources as the user with which you logged into the AWS Management Console.


Take a moment now and set up your Cloud9 development environment.


  1. Go to the AWS Management Console, select Services, then select Cloud9 under Developer Tools.
  2. Select Create environment. Enter Development into Name and optionally provide a Description.
  3. Select Next step.
  4. You may leave the Environment settings at their defaults of launching a new t2.micro EC2 instance, which will be paused after 30 minutes of inactivity.
  5. Select Next step.
  6. Review the environment settings and select Create environment. It will take several minutes for your environment to be provisioned and prepared.
  7. Once ready, your IDE will open to a welcome screen.
  8. You can run AWS CLI commands here just like you would on your local computer. Verify that your user is logged in by running aws sts get-caller-identity.
  9. You'll see output indicating your account and user information.
  10. Keep your AWS Cloud9 IDE opened in a tab throughout this workshop as you'll use it for activities like building and running a sample app in a Docker container and using the AWS CLI.

Admin:~/environment $ aws sts get-caller-identity
{
    "Account": "123456789012",
    "UserId": "AKIAI44QH8DHBEXAMPLE",
    "Arn": "arn:aws:iam::123456789012:user/Alice"
}


Step 4. Set up the Command Line Clients

The modules use two command-line clients to simulate and display sensor data from the unicorns in the fleet. These are small programs written in the Go programming language. The instructions in the Installation section below walk through downloading pre-built binaries, but you can also download the source and build the clients manually:

•   producer.go

•   consumer.go


Producer

The producer generates sensor data from a unicorn taking a passenger on a Wild Ryde. Each second, it emits the location of the unicorn as a latitude and longitude point, the distance traveled in meters in the previous second, and the unicorn’s current level of magic and health points.
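As a rough sketch of the kind of record the producer might emit each second, here is a minimal Python example. The field names and value ranges are illustrative guesses at the record shape, not the Go client's exact schema:

```python
import datetime
import json
import random

def make_reading(name, latitude, longitude):
    """Build one sensor reading. Field names and ranges are illustrative
    guesses, not the workshop client's exact schema."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "Name": name,
        "StatusTime": now.strftime("%Y-%m-%d %H:%M:%S.%f"),
        "Latitude": latitude,
        "Longitude": longitude,
        "Distance": random.randint(120, 180),     # meters traveled in the previous second
        "MagicPoints": random.randint(120, 160),
        "HealthPoints": random.randint(150, 250),
    }

# One reading; the real producer emits one of these per second to Kinesis.
print(json.dumps(make_reading("Shadowfax", 40.7431, -73.9931)))
```

The real producer serializes each reading to JSON and writes it to the Kinesis stream; the loop and the PutRecord call are omitted here.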


Consumer


The consumer reads and displays formatted JSON messages from an Amazon Kinesis stream, which allows you to monitor, in real time, what’s being sent to the stream. Using the consumer, you can monitor the data the producer and your applications are sending.
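The consumer's core behavior, parsing each raw message and pretty-printing it, can be sketched in a few lines of Python. The input here is simulated; in the real client the raw strings come from Kinesis GetRecords calls, and the field names are illustrative:

```python
import json

# Simulated raw messages as the consumer might receive them from the
# stream; field names are illustrative, not the exact schema.
raw_messages = [
    '{"Name":"Shadowfax","Distance":171,"HealthPoints":250}',
    '{"Name":"Rocinante","Distance":155,"HealthPoints":243}',
]

for raw in raw_messages:
    msg = json.loads(raw)
    # Re-serialize with indentation so each record is easy to scan.
    print(json.dumps(msg, indent=2, sort_keys=True))
```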


Installation

Switch to the tab where you have your Cloud9 environment opened.

Download and unpack the command line clients by running the following command in the Cloud9 terminal:

curl -s https://dataprocessing.wildrydes.com/client/client.tar | tar -xv

This will unpack the consumer and producer files to your Cloud9 environment.

Tips
NOTE: Keep an open scratch pad in Cloud9 or a text editor on your local computer for notes. When the step-by-step directions tell you to note something such as an ID or Amazon Resource Name (ARN), copy and paste that into the scratch pad.
Recap

🔑 Use a unique personal or development AWS account

🔑 Use one of the US East (N. Virginia), US West (Oregon), or EU (Ireland) Regions

🔑 Keep your AWS Cloud9 IDE opened in a tab
