AWS S3

Evolution of Storage

Magnetic Tapes-->Floppy disks-->CD/DVD (Optical) Disks/ROMs-->USB flash drives-->Hard (Magnetic) Disk Drives-->Solid State Drives (SSDs)

Storage: A medium that can store data and can be accessed using an application service.

Elastic Block Store (EBS): Amazon EBS is block storage that attaches to an EC2 instance.

Simple Storage Service (S3)

Amazon S3 has a simple web services interface that we can use to store and retrieve any amount of data, at any time, from anywhere on the web.

It gives any user access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. The service aims to maximize the benefits of scale and to pass those benefits on to users.

Buckets: Root-level "folders" we create in S3 are referred to as buckets. Any "subfolders" we create in a bucket are referred to as folders.

Objects: Files stored in a bucket are referred to as objects.

Creating an S3 Bucket:

  • Choose a bucket name. Bucket names must follow a set of rules:
    • Bucket names must be unique across all of AWS.
    • Bucket names must be 3 to 63 characters in length.
    • Bucket names can only contain lowercase letters, numbers, hyphens, and periods.
    • Bucket names must not be formatted as an IP address.
  • Select a region.
  • Block/allow public access
  • Set storage class
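
The naming rules above can be checked locally before calling AWS. The following is a minimal sketch, not an official validator: the function name `is_valid_bucket_name` is hypothetical, it accepts periods (which S3 permits, and which is what makes the IP-address rule meaningful), and it cannot check global uniqueness, which only AWS can confirm when the bucket is created.

```python
import re

# Lowercase letters, digits, hyphens, and periods (S3 permits periods too).
_ALLOWED = re.compile(r"[a-z0-9.-]+")
# Four dot-separated groups of 1 to 3 digits, e.g. 192.168.5.4.
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check the locally verifiable bucket naming rules (hypothetical helper)."""
    if not 3 <= len(name) <= 63:
        return False  # too short or too long
    if not _ALLOWED.fullmatch(name):
        return False  # contains uppercase letters, underscores, spaces, etc.
    if _IP_LIKE.fullmatch(name):
        return False  # formatted like an IP address
    return True


if __name__ == "__main__":
    for candidate in ["my-notes-bucket", "My_Bucket", "ab", "192.168.5.4"]:
        print(candidate, is_valid_bucket_name(candidate))
```

A name that passes this check can still be rejected by AWS if another account already owns it.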

What is an S3 Storage Class?

A storage class represents the "classification" assigned to each object in S3. Available storage classes include:

  • Standard
  • Standard-IA (Infrequent Access)
  • One Zone-IA
  • Intelligent-Tiering
  • Glacier
  • Glacier Deep Archive

Each storage class has varying attributes that dictate things like:

  • Storage cost
  • Object availability
  • Object durability
  • Frequency of access (to the object)

Each object must be assigned a storage class ("Standard" is the default class). We can change the storage class of an object at any time.

Object Durability: The likelihood, expressed as a percentage, that an object stored in S3 will not be lost over a one-year period.

Object Availability: The percentage of time over a one-year period that an object stored in S3 will be accessible.
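
To make the availability percentage concrete, the sketch below converts it into the downtime it allows over one year. The function name and the 99.9% and 99.99% figures are illustrative assumptions, not values quoted for a specific storage class.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year an object may be unreachable at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.99):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% available -> about {minutes:.2f} minutes of downtime per year")
```

At 99.99% this works out to roughly 52.56 minutes per year. Durability is read the same way, but as the chance that an object is not lost rather than the share of time it is reachable.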

Object Sharing: The ability to make any object publicly available via a URL link.

Object Lifecycles: Set rules to automatically transfer objects between storage classes at defined time intervals.
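
A lifecycle rule of this kind can be written as a configuration document. The sketch below builds one in the shape that boto3's `put_bucket_lifecycle_configuration` call accepts. The rule ID, the 30-day and 90-day thresholds, and the function names are assumptions chosen for illustration, and the apply step needs boto3 and AWS credentials.

```python
def build_lifecycle_configuration(days_to_ia: int = 30, days_to_glacier: int = 90) -> dict:
    """Build a lifecycle configuration with two storage-class transitions."""
    if days_to_glacier <= days_to_ia:
        raise ValueError("the Glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": "archive-old-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                "Transitions": [
                    {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                    {"Days": days_to_glacier, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket_name: str, configuration: dict) -> None:
    """Send the configuration to S3 (requires boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=configuration,
    )


if __name__ == "__main__":
    print(build_lifecycle_configuration())
```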

Object Versioning: Automatically keep multiple versions of an object (when enabled).

Additional benefits:

  • Durable, reliable, scalable
  • Security (offers three different kinds of encryption)
  • Integrates with almost all other AWS services
  • Can run big data analytics on objects directly in S3
  • Easy to get data in and out of S3
  • Robust admin and access management options available

Storage Gateway

  • AWS Storage Gateway is a hybrid storage service that enables our on-premises applications to seamlessly use AWS cloud storage.
  • We can use this service for backup and archiving, disaster recovery, cloud data processing, storage tiering, and migrations.
  • This service helps us reduce and simplify the storage infrastructure in our data center and in branch or remote offices.
  • Our applications connect to the service through a virtual machine or hardware gateway appliance using standard storage protocols, such as NFS, SMB and iSCSI.
  • The gateway connects to AWS storage services such as Amazon S3, Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, Amazon EBS, and AWS Backup, providing storage for files, volumes, snapshots and virtual tapes in AWS.
  • The service includes a highly optimized data transfer mechanism with bandwidth management, automated network resilience and efficient data transfer, along with a local cache for low-latency on-premises access to our most active data.

Storage Gateway Deployment Types

File Gateway

Data is uploaded to S3 for use with object-based workloads. S3 file storage can also be used for storage tiering, so that data is kept on the most cost-effective storage class.

Volume Gateway

Volumes are created in the AWS cloud. The applications in the customer data center can access these volumes. There are two types:

    • Stored volumes: all data is stored at the customer's location and periodically backed up to AWS using snapshots.
    • Cached volumes: data is stored in the AWS cloud and cached in the customer's data center for fast access.

Tape Gateway

Cost-effective, long-term, off-site data archiving. A virtual tape library (VTL) interfaces with the customer's existing tape backup software.

To create a bucket and upload a file from the console:

  1. On the AWS Services menu, search for S3 and click S3.
  2. Click Create Bucket.
  3. In the pop-up, enter the Bucket Name and Region, then click Next.
  4. Keep the default values for Configure options and click Next.
  5. Keep Block all public access checked and click Next.
  6. Click Create. The new bucket appears in the bucket list.
  7. Click the newly created bucket, click Upload, and choose a file.

Once the upload completes, the file is listed in the bucket, and we can download it again from there.
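
The same console steps can be scripted. The following is a hedged sketch using boto3 (the AWS SDK for Python), which these notes do not otherwise use. The function names, bucket name, region, and file names are placeholders, and calling `main()` requires boto3 plus configured AWS credentials.

```python
def create_bucket_and_upload(s3_client, bucket: str, region: str, local_path: str, key: str) -> None:
    """Create a private bucket, then upload one file into it."""
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed as a constraint.
        s3_client.create_bucket(Bucket=bucket)
    else:
        s3_client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    # Equivalent of keeping "Block all public access" checked in the console.
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    s3_client.upload_file(local_path, bucket, key)


def main() -> None:
    import boto3  # requires `pip install boto3` and configured credentials

    region = "us-west-2"  # placeholder region
    bucket = "my-notes-bucket-example"  # placeholder; must be globally unique
    client = boto3.client("s3", region_name=region)
    create_bucket_and_upload(client, bucket, region, "notes.txt", "notes.txt")
    # Download the object again to confirm the round trip.
    client.download_file(bucket, "notes.txt", "notes-copy.txt")
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real account.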

CloudFront

Amazon CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment.

CloudFront is integrated with AWS, both through physical locations that are directly connected to the AWS global infrastructure and through other AWS services. It works seamlessly with AWS Shield for DDoS mitigation, with Amazon S3, Elastic Load Balancing, or Amazon EC2 as origins for our applications, and with Lambda@Edge to run custom code closer to end users and to customize the user experience. Lastly, if we use AWS origins such as Amazon S3, Amazon EC2, or Elastic Load Balancing, we don’t pay for any data transferred between these services and CloudFront.

To create a CloudFront distribution for the bucket:

  1. On the AWS Services menu, search for CloudFront and click CloudFront.
  2. Click Create Distribution, then click Get Started.
  3. Select Origin Domain Name. Our S3 bucket appears in the drop-down; select it.
  4. Keep the default options for the remaining settings.
  5. Click Create Distribution.

Installing Apache Web Server and testing from web browser

  1. Run an EC2 instance with Public IP enabled.
  2. Copy the Public IP and SSH into EC2.
  3. Run the following commands
    1. sudo yum update -y
    2. sudo yum install -y httpd
    3. sudo service httpd start
    4. sudo chkconfig httpd on
  4. Open a new browser window, paste the Public IP into the address bar, and press Enter. The Apache web server test page should be visible (the instance's security group must allow inbound HTTP on port 80).
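
The browser check in the last step can also be done from a script. This is a minimal sketch using only the Python standard library; the function name `http_status` and the address in the usage comment are placeholders. It treats any HTTP status as proof that the server answered, since on Amazon Linux the default Apache test page is typically served with a 403 status rather than 200.

```python
import urllib.error
import urllib.request


def http_status(host: str, port: int = 80, timeout: float = 5.0):
    """Return the HTTP status code the server answers with, or None if unreachable."""
    url = f"http://{host}:{port}/"
    # Connect directly, ignoring any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server replied, just not with a 2xx status (e.g. 403 for the test page).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not found.
        return None


# Usage (placeholder address; substitute the instance's Public IP):
# print(http_status("203.0.113.10"))
```

A result of `None` usually points at the security group, the instance's Public IP setting, or the httpd service not running.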

Relational Database Service (RDS)

In the world of databases, there are two main categories:

  • Relational databases known as SQL
  • Non-relational databases known as NoSQL

AWS offers services for both types of databases:

  • RDS for SQL databases
  • DynamoDB for NoSQL databases

What is RDS?

Amazon RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees us to focus on our applications so we can give them the fast performance, high availability, security and compatibility they need.

SQL Options include:

  • Amazon Aurora
  • MySQL
  • MariaDB
  • PostgreSQL
  • Oracle (Several Oracle options are available)
  • Microsoft SQL Server (several Microsoft options are available)

To create a PostgreSQL database from the console:

  1. On the Services menu, search for RDS and click RDS.
  2. Click Create Database and select Standard Create.
  3. In Engine options, select PostgreSQL.
  4. In Templates, select Free Tier.
  5. Type in the Master password and confirm it.
  6. In DB instance size, keep the Free Tier default.
  7. In Connectivity, keep the Default VPC.
  8. In Database authentication, select Password and IAM database authentication.
  9. In Additional configuration, enter the Initial database name.
  10. Click Create database, then click the newly created database to see its details.
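
The console choices above map onto parameters of boto3's `create_db_instance` call. The sketch below only illustrates that mapping: the helper names, the `db.t3.micro` instance class, and the 20 GiB storage size are assumptions standing in for whatever the Free Tier template selects, and sending the request requires boto3 and AWS credentials.

```python
def build_db_instance_request(identifier: str, db_name: str,
                              master_username: str, master_password: str) -> dict:
    """Translate the console selections into create_db_instance parameters."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",  # Engine options: PostgreSQL
        "DBInstanceClass": "db.t3.micro",  # assumed small instance size
        "AllocatedStorage": 20,  # GiB, assumed
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
        "DBName": db_name,  # Additional configuration: initial database name
        "EnableIAMDatabaseAuthentication": True,  # Password and IAM authentication
    }


def create_database(rds_client, request: dict):
    """Send the request; rds_client is expected to be boto3.client("rds")."""
    return rds_client.create_db_instance(**request)


# Usage (requires boto3 and credentials; all values are placeholders):
# import boto3
# request = build_db_instance_request("notes-db", "notes", "postgres", "change-me-please")
# create_database(boto3.client("rds"), request)
```

No VPC settings are passed here, which leaves the connectivity choice to the account's defaults.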

What is DynamoDB?

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

DynamoDB can replace (or is very similar to):

  • MongoDB
  • Cassandra DB
  • Oracle NoSQL

To SQL or NoSQL...What is the difference?

  1. SQL stores related data in tables (using columns and rows), while NoSQL stores related data in JSON-like name-value documents.
  2. SQL is typically used for very structured data, such as contact lists, while NoSQL is typically used for non-structured data, such as cataloging documents.
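
The two models can be contrasted in a few lines. This is an illustrative sketch only: Python's built-in `sqlite3` stands in for a relational engine on RDS, a plain dictionary stands in for a document store such as DynamoDB, and the table, names, and values are invented.

```python
import json
import sqlite3

# Relational model: a fixed set of columns, one row per contact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("INSERT INTO contacts (name, phone) VALUES (?, ?)", ("Ada", "555-0100"))
row = conn.execute(
    "SELECT name, phone FROM contacts WHERE name = ?", ("Ada",)
).fetchone()
print(row)
conn.close()

# Document model: each item is a JSON-like document, and items may differ in shape.
documents = {
    "doc-1": {"title": "Invoice", "tags": ["finance"], "pages": 2},
    "doc-2": {"title": "Photo", "camera": {"make": "ExampleCam"}},
}
print(json.dumps(documents["doc-2"]))
```

Every row in `contacts` has the same columns, while the two documents share only a `title` field.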

ElastiCache

What is ElastiCache?

It is a web service that makes it easy to deploy, operate and scale an in-memory data store or cache in the cloud. The service improves the performance of web applications by allowing us to retrieve information from fast, managed, secure in-memory data stores instead of relying entirely on slower disk-based databases. It supports two open-source in-memory engines.

Redis: A fast, open-source, in-memory data store and cache

Memcached: A widely adopted memory object caching system
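
A common way applications use such a cache is the cache-aside (lazy loading) pattern: look in the cache first, and query the database only on a miss. The sketch below shows the idea under loud assumptions: a plain dictionary stands in for Redis or Memcached, and the class name, the 60-second time-to-live, and the loader function are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the slow store."""

    def __init__(self, load_from_database, ttl_seconds: float = 60.0):
        self._load = load_from_database
        self._ttl = ttl_seconds
        self._cache = {}  # stand-in for Redis/Memcached: key -> (expiry, value)

    def get(self, key):
        entry = self._cache.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database round trip
        value = self._load(key)  # cache miss or expired entry: query the database
        self._cache[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    database_calls = []

    def slow_lookup(key):
        database_calls.append(key)  # record each trip to the "database"
        return f"value-for-{key}"

    cache = CacheAside(slow_lookup)
    cache.get("user:1")
    cache.get("user:1")
    print(len(database_calls), "database call(s) for two reads")
```

With a real ElastiCache cluster the dictionary would be replaced by a client for the chosen engine, and the time-to-live would be set on the cached key itself.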

What is Redshift?

It is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all our data using standard SQL and our standard Business Intelligence (BI) tools. It allows us to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks and massively parallel query execution.

Hands-on Lab for RDS

Lambda

It is a compute service that lets us run code without provisioning or managing servers. We pay only for the compute time - there is no charge when our code is not running. With AWS Lambda, we can run code for virtually any type of application or backend service - all with zero administration. AWS Lambda runs our code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging. All we need to do is supply our code in one of the languages that AWS Lambda supports.

Additional Benefits

  • No servers to manage
  • Continuous scaling
  • Subsecond metering
  • Integrates with almost all other AWS services

Primary Use Cases

  • Data processing
  • Real-time file processing
  • Real-time stream processing
  • Build serverless back-ends for web, mobile, IoT and third-party API requests

To create a first Lambda function from the console:

  1. On the AWS Services menu, search for Lambda and click Lambda.
  2. Click Create Function and select Author from scratch.
  3. In the Basic information section, enter the Function name, choose the Runtime (a supported language), and select Create a new role with basic Lambda permissions.
  4. Click Create Function. Our first Lambda function is created.
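
Once the function exists, its code is a handler that Lambda calls with an event and a context object. The following is a minimal sketch for the Python runtime, assuming the function is triggered by S3 upload notifications (the real-time file processing use case). The sample event is trimmed to the two fields the handler reads, and the bucket and key values are invented.

```python
import json


def lambda_handler(event, context):
    """Collect the bucket and key of each uploaded object from an S3 event."""
    uploaded = [
        {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        for record in event.get("Records", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"processed": uploaded})}


if __name__ == "__main__":
    # Local dry run with a hand-written event; Lambda supplies the real one.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "notes.txt"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Because the handler is an ordinary function, it can be tried locally like this before the code is pasted into the console editor.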

Next, Billing and Pricing