
Build a Serverless Real-Time Data Processing App

Overview

Serverless applications don’t require you to provision, scale, and manage any servers. You can build them for nearly any type of application or backend service, and everything required to run and scale your application with high availability is handled for you.

Serverless architectures can be used for many types of applications. For example, you can process transaction orders, analyze click streams, clean data, generate metrics, filter logs, analyze social media, or perform IoT device data telemetry and metering.

In this project, you’ll learn how to build a serverless app to process real-time data streams. You’ll build infrastructure for Wild Rydes, a fictional ride-sharing company, enabling operations personnel at headquarters to monitor the health and status of their unicorn fleet. Each unicorn is equipped with a sensor that reports its location and vital signs.

You’ll use AWS to build applications to process and visualize this data in real-time. You’ll use AWS Lambda to process real-time streams, Amazon DynamoDB to persist records in a NoSQL database, Amazon Kinesis Data Analytics to aggregate data, Amazon Kinesis Data Firehose to archive the raw data to Amazon S3, and Amazon Athena to run ad-hoc queries against the raw data.

This workshop is broken up into four modules. You must complete each module before proceeding to the next.

1. Build a data stream
    Create a stream in Kinesis and write to and read from the stream to track Wild Rydes unicorns on the live map. In this module you'll also create an Amazon Cognito identity pool to grant live map access to your stream. (A minimal CLI sketch of creating a stream follows this list.)

2. Aggregate data
    Build a Kinesis Data Analytics application to read from the stream and aggregate metrics like unicorn health and distance traveled each minute.

3. Process streaming data
    Persist aggregate data from the application to a backend database stored in DynamoDB and run queries against that data.

4. Store & query data
    Use Kinesis Data Firehose to flush the raw sensor data to an S3 bucket for archival purposes. Using Athena, you'll run SQL queries against the raw data for ad-hoc analyses.
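As a preview of the first module, here's a minimal sketch of creating and inspecting a Kinesis stream from a terminal with the AWS CLI. The stream name wildrydes is an assumption for illustration; the module itself gives the exact resource names and settings.

aws kinesis create-stream --stream-name wildrydes --shard-count 1
# The stream takes a moment to become ACTIVE; confirm its status with:
aws kinesis describe-stream-summary --stream-name wildrydes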
Requirements

AWS Experience: Beginner to Intermediate

Time to complete: 110 minutes

Cost to complete: Each service used in this architecture is eligible for the AWS Free Tier. If you are outside the usage limits of the Free Tier, completing this project will cost you less than $0.50 (assuming all services are running for 2 hours)*

To complete this tutorial you will use:

• Active AWS Account**

• Browser (Chrome recommended)

• AWS Lambda

• Amazon Kinesis

• Amazon S3

• Amazon DynamoDB

• Amazon Cognito

• Amazon Athena

• AWS IAM

*This estimate assumes you follow the recommended configurations throughout the tutorial and terminate all resources within 2 hours.

**Accounts that have been created within the last 24 hours might not yet have access to the resources required for this project.


[Architecture diagram]

In order to complete this workshop, you’ll need an AWS account and access to create AWS Identity and Access Management (IAM), Amazon Cognito, Amazon Kinesis, Amazon S3, Amazon Athena, Amazon DynamoDB, and AWS Cloud9 resources within that account. The step-by-step guide below explains how to set up all the prerequisites.

Step 1. Create an AWS Account

The code and instructions in this workshop assume only one participant is using a given AWS account at a time. If you attempt to share an account with another participant, you will encounter naming conflicts for certain resources. You can work around this by either using a suffix in your resource names or using distinct Regions, but the instructions do not provide details on the changes required to make this work.


Use a personal account or create a new AWS account for this workshop rather than using an organization’s account, to ensure you have full access to the necessary services and do not leave behind any resources from the workshop.


Step 2. Region

Use US East (N. Virginia), US West (Oregon), or EU (Ireland) for this workshop. Each supports the complete set of services covered in the material. Consult the Region Table to determine which services are available in a Region.
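Your Cloud9 environment (set up in the next step) normally inherits the Region it was created in, but you can verify or override the Region the AWS CLI targets. A minimal sketch, assuming you chose US East (N. Virginia):

aws configure get region
# Override for the current terminal session if needed:
export AWS_DEFAULT_REGION=us-east-1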


Step 3. Set up your AWS Cloud9 IDE

AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It includes a code editor, debugger, and terminal. Cloud9 comes pre-packaged with essential tools for popular programming languages and with the AWS Command Line Interface (CLI) pre-installed, so you don’t need to install anything or configure your laptop for this workshop. Your Cloud9 environment will have access to the same AWS resources as the user with which you logged into the AWS Management Console.


Take a moment now and set up your Cloud9 development environment.


  1. Go to the AWS Management Console, select Services then select Cloud9 under Developer Tools.
  2. Select Create environment. Enter Development into Name and optionally provide a Description.
  3. Select Next step.
  4. You may leave Environment settings at their defaults of launching a new t2.micro EC2 instance which will be paused after 30 minutes of inactivity.
  5. Select Next step.
  6. Review the environment settings and select Create environment. It will take several minutes for your environment to be provisioned and prepared.
  7. Once ready, your IDE will open to a welcome screen.
  8. You can run AWS CLI commands in here just like you would on your local computer. Verify that your user is logged in by running aws sts get-caller-identity.
  9. You'll see output indicating your account and user information, as shown below.
  10. Keep your AWS Cloud9 IDE open in a tab throughout this workshop as you'll use it for activities like building and running a sample app in a Docker container and using the AWS CLI.

Admin:~/environment $ aws sts get-caller-identity
{
    "Account": "123456789012",
    "UserId": "AKIAI44QH8DHBEXAMPLE",
    "Arn": "arn:aws:iam::123456789012:user/Alice"
}


Step 4. Set up the Command Line Clients

The modules utilize two command-line clients to simulate and display sensor data from the unicorns in the fleet. These are small programs written in the Go programming language. The instructions in the Installation section below walk through downloading pre-built binaries, but you can also download the source and build them manually:

•   producer.go

•   consumer.go
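If you prefer building from source, here is a minimal sketch in the Cloud9 terminal. This assumes the Go toolchain is available (it is in a default Cloud9 environment) and that the two source files are in your working directory:

go build -o producer producer.go
go build -o consumer consumer.go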


Producer

The producer generates sensor data from a unicorn taking a passenger on a Wild Ryde. Each second, it emits the location of the unicorn as a latitude and longitude point, the distance traveled in meters in the previous second, and the unicorn’s current level of magic and health points.
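Concretely, each reading is a small JSON record along these lines (the field names and values below are illustrative, not authoritative):

{
    "Name": "Shadowfax",
    "StatusTime": "2018-03-13 10:05:08.941",
    "Latitude": 42.264444,
    "Longitude": -71.97959,
    "Distance": 175,
    "MagicPoints": 140,
    "HealthPoints": 148
}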


Consumer


The consumer reads and displays formatted JSON messages from an Amazon Kinesis stream, which allows you to monitor in real time what’s being sent to the stream. Using the consumer, you can monitor the data the producer and your applications are sending.
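If you'd like to peek at a stream without the consumer, the AWS CLI can read records directly. A minimal sketch, assuming a single-shard stream named wildrydes already exists (the name is illustrative):

# Get an iterator positioned at the newest records in the stream's only shard:
ITERATOR=$(aws kinesis get-shard-iterator --stream-name wildrydes \
    --shard-id shardId-000000000000 --shard-iterator-type LATEST \
    --query ShardIterator --output text)
# Fetch a batch of records; note the Data fields are Base64-encoded:
aws kinesis get-records --shard-iterator "$ITERATOR"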


Installation

Switch to the tab where you have your Cloud9 environment open.

Download and unpack the command line clients by running the following command in the Cloud9 terminal:

curl -s https://dataprocessing.wildrydes.com/client/client.tar | tar -xv

This will unpack the consumer and producer files to your Cloud9 environment.
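To confirm the clients unpacked correctly, list them from the same terminal (this assumes the archive places both binaries in the current directory):

ls -l consumer producer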

Tips
NOTE: Keep an open scratch pad in Cloud9 or a text editor on your local computer for notes. When the step-by-step directions tell you to note something such as an ID or Amazon Resource Name (ARN), copy and paste that into the scratch pad.
Recap

🔑 Use a unique personal or development AWS account

🔑 Use one of the US East (N. Virginia), US West (Oregon), or EU (Ireland) Regions

🔑 Keep your AWS Cloud9 IDE open in a tab
