AWS

Misc

Optimizations

  • Increase the number of threads that the AWS CLI uses to some large number (the default is 10) with aws configure set default.s3.max_concurrent_requests 50
  • If downloading data is more of a bottleneck than cpu power, use a network speed optimized ec2 instance (“n” in the name) such as c5n.4xl.
  • Decrease EC2 boot time (link)
    • Components
      • Instance Warming
        • Boot the instance once; shut the instance down; then boot it again when needed.
          • From a billing perspective, AWS does not charge for the EC2 instance itself when stopped, as there’s no physical hardware being reserved; a stopped instance is just the configuration that will be used when the instance is started next. Note that you do pay for the root EBS volume though, as it’s still consuming storage.
          • Therefore, it’s possible to boot an EC2 instance once, let it perform whatever initialization it needs, then stop the EC2 instance, and this creates a “warmed” EBS root volume.
          • It also caches configurations and the assigned private ip address
        • Call EC2 instances directly with the LaunchInstances and StartInstances API calls
        • FYI AWS has some kind of service called “Warm Pools for EC2 Auto Scaling” but it isn’t as fast as the method described.
      • Instance Resizing
        • Change the instance’s type with UpdateInstance before starting it again.
        • You can use cheaper instance types to perform the warming and avoid consuming AWS capacity for the warming process - e.g, t3.large instance types for first launching the instance.
        • When AWS has capacity issues and a particular instance type is unavailable, it’s possible to update the instance type to another one with availability and still use the pre-warmed EBS root volume.
    • GH Action Runner Workflow
      1. Created as a t3.large instance with a root EBS volume
      2. Assigned a private IP address in the target VPC
      3. The kernel and userspace processes start once, causing the relevant data blocks to stream from S3 onto the EBS volume
      4. The instance is stopped
      5. When a GitHub job request arrives, the instance type is updated to a m7a instance type, depending on how many CPUs were requested, and the instance is started
      6. If AWS responds that there is no current capacity for m7a instances, the instance is updated to a backup type (like m7i) and started again

Create and Manage Account

  • aws.amazon.com  – Create a free account (top right)
    • e: <email>
    • p: <password>
    • Acct Name:
    • All info needs entered:
      • Tick personal,
      • Full name,phone number, country, address, city, state, postal code,
      • Tick you’ve read user agreement
  • Payment info
    • credit card, exp date, name
  • Verify phone
    • country, phone number (again), the “i am not a robot” thing, click call me now
      • 4 digit code pops up on screen, computer calls you, and you enter code
      • Click continue
  • Selecting the Basic/free acct option (1-year),
  • You can skip the Personalize your experience stuff. Hit “sign into the console” (mid-right)
    • Enter email and password
  • Basic account page set-up
    • Upper right corner –> name of the account –> dropdown menu –> My Account
      • Main Body –> Alternate Contact info –> right side, click edit
      • Main Body –> Security Challenge Questions –> click edit
      • Main Body –> IAM User and Role Access –>  click edit –> check Activate box –> click update
        • Think this is just for generating roles that have access to billing
      • Main Body –> Manage Communication preferences –> edit
        • newsletters, services info, etc.
      • Main Body –> Manage AWS support plan
        • Already selected free earlier
        • Where you can upgrade from free tier to a paid plan
      • (nothing to fill out) left panel –> Dashboard, Bills, Cost Explorer, Reports, allocation tags, credits, misc accounting stuff
        • Stats on your spending
        • Invoice summary of this month’s usage
        • Visual breakdown of the cost/services you’re incurring
        • Reports
        • Create tags to designate which departments/projects are using which services
        • Where to input promo codes
      • Create Spending Alerts
        • Left Panel –> Budgets –> Create Budget
        • Select Type: Cost, usage, reservation, utilization
          • use cost
            • $20
        • Give budget a name
        • Period: monthly, etc.
          • Use monthly, it’s the smallest period available
        • Select start date and end date
        • Amount
        • Can set limits per service
        • Notifications
          • Actual, greater than, 50% of budget amount
            • Other option besides actual is forecasted
          • Email address
          • Option for SNS service but didn’t discuss what that is
          • Create new notification
            • Add additional email alert for 75%
        • Click create budget (bottom right)
      • Left Panel –> Preferences
        • Set notifications for when you exceed the free tier services
        • Get emailed billing invoice
        • Billing alerts, reports
  • Identity Access Management
    • Home Console – top search box under AWS Services, type “IAM”
      • Select Identity and access management
    • Change User Sign-in URL to something more memorable
      • Example: root account name
        • ercbk
      • Click customize (mid-right) and type name
      • URL will be changed to <acct name>.signin.aws.amazon.com/console
    • Security Status section (main body)
      • Activate multi-factor authentication (MFA)
        • This is for root user, see change password section below if you’re a user that’s part of a group
        • Physical MFA (e.g. usb drive) or virtual MFA
          • Choose virtual
        • Click next twice to the bar code
        • Use authenticator app to scan bar code
          • Click + in app, then click bar code, and scan bar code on screen
        • Asks for 2 consecutive pins that display on the app
        • Click finish
        • (May need to refresh page on main screen in order to see green tick mark)
    • Create individual IAM users
      • Allows you to distribute various permissions to people with different roles in your org
      • Need to set up an admin user to get access keys so you can use AWS programmatically
      • left panel –> Users –> add users –> enter user name – select access types
        • Choose access type
          • Tick programmatic and management console
        • Enter console password
        • Require password reset: nope (untick box)
        • Click next (bottom right) to set permissions
      • Set permissions
        • If you’re creating user groups (left panel), you can skip this and set the permissions group-wide
        • Select Attach existing policies directly
          • Tick box “AdministratorAccess” –> click next (bottom right)
            • Dude did this through groups and policies, so may have to go that route if AdministratorAccess isn’t available, then add the user to the group. Also there’s a button to attach additional “policies” later on if needed
      • Review and click create User
      • A lot more to this… groups, policies (see vid for details)
    • Select password policy
      • Tick allow users to set password and untick everything else –> apply policy
      • activate/deactivate security token regions - he didn’t go into this but all US and EU regions were activated by default, so probably doesn’t need to be messed with.
    • Copy URL, log out as root user, Now you can starting going to URL and sign in under the administrator acct you made
      • To change password
        • left panel –> Users –> select yourself –> security credentials tab –> Console password
        • Can also create MFA in same tab
  • Cloud Trail
    • log of actions taken on account
    • type cloud trail in search box
    • Create trail - lets you store the logs in a S3 bucket 

Overview of Basic Services

  • Compute
    • EC2 - Elastic Cloud Compute - Computer Clusters
      • Closely related to on-premise set-ups
    • Elastic Container Service (ECS) - Spin up containers on top of EC2
    • Lambda - Serverless Architecture
      • Computing resources can scale and descend automatically based on real-time demands.
      • Handles security patches and OS updates automatically
      • Issue with “timeouts” - not optimal for long running applications
        • A timeout defines the maximum amount of time a Lambda function can execute before it’s terminated.
        • Default timeout is set at 3 seconds, but you can configure it in the AWS console in increments of 1 second, with a maximum of 15 minutes (900 seconds).
          • AWS charges for Lambda usage in increments of 100 milliseconds (0.1 seconds). Setting the timeout in 1-second increments helps ensure you’re not billed for extra time if your function runs slightly longer than expected.
          • Price optimization entails tuning this timeout parameter to be as close to your run time as possible. Need to take into account timeouts of other services included in your lambda function. (Guide)
        • Monitor your Lambda functions for timeouts using CloudWatch metrics.
      • Dependency limit at 50 MB, can add 512 MB more to a tmp file after function has executed
      • Spins down when not in use, so you don’t pay for downtime, but takes 5 or more secs to spin back up
        • Can ping server from time to time to keep it “warm”
      • Less tweakable than EC2, if problems occur, less flexible in terms of your team handling it
      • The code you run on AWS Lambda is uploaded as a “Lambda function”. Each function has associated configuration information, such as its name, description, entry point, and resource requirements. The code must be written in a “stateless” style i.e. it should assume there is no affinity to the underlying compute infrastructure. Local file system access, child processes, and similar artifacts may not extend beyond the lifetime of the request, and any persistent state should be stored in Amazon S3, Amazon DynamoDB, or another Internet-available storage service. Lambda functions can include libraries, even native ones.
  • Storage
    • S3
  • Database -automanaged by AWS
    • relational db service (RDS)
    • dynamoDB - noSQL
    • ElasticCache - redis
    • Redshift - cadillac data warehouse service
  • Management Tools
    • CloudWatch
      • monitor cpu usage
      • latency
      • set alarms around metrics

CLI

  • google “download aws cli”
  • Get keys
    • log into aws
    • search IAM –> goto iam
    • left panel –> users –> your usesrname –> security credentials tab –> create access key –> 2 keys are created –> click download csv file
      • access key id (kind of like a username)
      • secret access key (kind of like a password)
  • Set-up
    • Configure profile
      • aws configure --profile <name>
        • choose a name, can be anything
      • You’ll be asked for your
        • “access key id” and “secret access key”, enter those
        • optional: default region: us-east-2 or hit enter to skip
        • optional: default output format: just hit enter (he didn’t go into this)
  • commands
    • Help commands
      • gives description, flags, inputs, etc.
      • aws help
      • aws <service> help
        • eg aws iam help
      • aws <service> <command> help
        • eg aws iam list-users help
      • Can also go to AWS documentation at website

EC2

  • On home screen, in search window, type “ec2”, click link

EC2 Page

  • left panel – Dashboard
    • Main Body – Under Resources
      • Instances - which cluster instances you have running
      • Volumes - which storage services you have
      • key pairs - you get a key pair for each running instance
      • Other stuff…
  • left panel – Instances
    • instances
      • where you launch on-demand instances
      • no bidding, you pay full price
    • launch templates
      • stored instance configurations
      • alternate method to launch instances
    • spot instances (requests)
      • bid for available instances for a cheaper price
      • If the Spot price increases above your bid price, capacity is no longer available, or the spot request has constraints that can’t be met, then the Spot Instance can be “interrupted.”
        • https://aws.amazon.com/premiumsupport/knowledge-center/spot-instance-terminate/
        • EBS volumes can be attached, snapshots taken, or results sent to s3 buckets to prepare for a potential terminations (see above link)
    • reserved instances
      • you can make a reservation for an instance to guarantee it’s availability at the time you specify. Can be cheaper than on-demand.
    • dedicated host
      • When you buy instances you share hosts (servers) with other people. Here you can guarantee only you are on the host with your instances.
  • left panel – images
    • AMI
      • scenario: you launch an ec2 and load more packages onto it and want to save that AMI to reuse at a later time. Here you can take a snapshot of that image.
      • you can share these images with other users, sell the image in the marketplace
  • left panel – Elastic Block Store
    • volumes
      • hard drives - hdd (cold or optimized, ssd, iops ssd )
    • snapshots
      • back-up of the volume
      • can be used to launch another instance if the current one is terminated
  • left_panel – Network and Security
    • Security Groups (also see docker notebook – aws – running a task)
      • firewall you place in front of an EC2 instance
      • specify allowed ports and allowed ip connections
      • can’t restrict to inbound or outbound traffic; open is open
    • Elastic IPs
      • allows you to fix static ips (up to 5) to your instances
        • one less thing you’d have to configure if you’re starting and stopping ec2 instances a lot.
    • Placement Groups
      • decreases latency by placing all of your instances in the same or closely placed hosts
    • Key Pairs
      • a form of tags that you can use to access your instance more easily,
        • use values as descriptors of use case for instance to organize and monitor
        • Can use to SSH into your instance
        • pairs are key:value. Example: key= sales, value  = sales_forecast
    • Network Interfaces
      • gives a network card to your instance so it can connect to the internet
  • left panel – Load Balancing
    • scaling managed by aws

    • routes traffic to different instances

    • Load Balancers (also see Docker notebook, aws create load balancer section)

      • types: application, network
    • Target Groups

  • left panel – auto scaling
    • adds instances if triggers set in load balancers (latency, cpu usage, memory)
    • shuts down instances once load decreases
    • keeps cost lower
  • left panel – systems manager
    • allows you to run a command across multiple instances by specifying tags, ids, etc
    • can use scripts that update packages

Launch

  • Search ec2 in the search window
  • Steps
    • top-right header – choose a region
      • where are the users located (latency)
    • left panel – Instances – instances (or Spot Instances)
      • click launch instances
      • choose an AMI, click select on right-hand side
      • choose compute option by ticking box on the left and click next on bottom right
        • eg t2 - micro (free tier option)
      • configure instance details
        • select how many instances you want to launch
        • network, subnet, auto-assign public ip
          • defaults should be fine
        • IAM role - mentions something about databases
        • shutdown behavior: “stop” terminates instance when you stop it
        • enable termination protection: protects against accidental termination
          • think this might have to do with spot instances being terminated
        • monitoring - enable CloudWatch ($)(see Overview of basic services section)
          • free version updates metrics every 5 min
          • detailed version updates metrics every 1 min
        • T2 unlimited
          • enabling says if your t2 instance cpu goes over 20% usage, then start using my credits, and once my credits are used, then bill me for the rest
          • guessing if this is unticked then your t2 is throttled below 21% to remain totally free
      • Add storage
        • see Elastic Block Store above for details on available drives
        • comes with a “root” drive by default
          • Choose HD type, size of storage, and whether to delete on termination (volume gets deleted)
          • General Purpose SSD and up to 30 GB available for free tier.
          • Also saw a snapshot ID I think but he didn’t discuss
        • click add volume to add multiple HDs
      • Add tags
        • add tag
        • enter key and value (different from key pair below)
        • tick which resources you want to use the tag for: instance and/or volume
          • can create unique tags for different resources
      • Configure security group
        • Assign a new security group or select an existing group
          • tick select existing –> tick default
      • Click review and launch
        • shows all the options you selected above
        • Hit launch
      • Choose existing or create new key pair
        • file that’s necessary to ssh (linux) or log into (windows)
        • Create new key pair –> enter name –> download key pair file
          • or you can select previous key pair and use the same file
        • Launch instance
      • Launch Status
        • Click View Instances
          • Watch status column, usually takes a couple minutes to launch
          • screen is split in half, use divider to increase size of lower screen if you want to view the details of the instance you started
          • I think you’re in left panel – instances – instances

Connect to/ Terminate Instance

  • Also see Docker, AWS >> Create Cluster: EC2 with SSH Access
  • left panel – instances – instances
    • highlight instance you want to connect to
    • In bottom of split EC2 screen, copy IPv4 Public IP and save it in notebook++ or somewhere
  • Open an inbound port to instance
    • left panel – Network and Security – Security Groups
      • click on security group you selected during launch of instance
      • lower half of screen – inbound tab
        • “all traffic” means full communication allowed between all instances within this security group
        • click edit – add rule
          • Type – click dropdown – choose SSH
            • after choosing SSH, it also inputs port 22
          • Source – ditto – choose “My IP”
            • automatically finds your ip and inputs it
          • Description - “SSH for ip” or whatever you want
        • for Windows Server AMI instance
          • Type – click dropdown – choose RDP
            • chooses port 3389
          • Source – ditto – choose “My IP”
            • automatically finds your ip and inputs it
          • Description - “RDP for <your name> ip” or whatever you want
  • If on a linux machine (locally), click instance, choose connect, follow instructions
  • If on windows (locally):
    • Open Puttygen (key generator)
      • Click load – find and select the key pair file (.pem) you downloaded before launching the instance
      • Click Save private key
        • It’ll ask if you’re sure you don’t want to attach a passphrase – click yes
        • enter name of .ppk file (no need to add extension) – Click Save
    • Open Putty
      • Configuration window opens
        • In Host Name (IP address) box – paste IPv4 Public IP (that you copied from earlier)
        • left panel – Connections – SSH – Auth
          • Options-for-controlling-SSH-authentication window opens
            • private-key-for-authentication box – click browse
              • select the .ppk file you made
        • Click Open (bottom right) to open connection
          •  Windows pop-up – click yes
    • Putty CLI opens
      • “login as” 
        • type username for the AMI you used
          • example was a basic linux AMI with username “EC2-user”
          • For RStudio AMI, should be “ubuntu”
      • Hit enter and should be connected
  • Terminate
    • left panel – instances – instances
      • select instance you want to terminate – right click instance_id (or anywhere on row) — instance state – terminate
  • Connect to a Windows Server AMI instance
    • Windows Search “Remote Desktop Connection”
      • open it
    • For “Computer:”, type in the IPv4 public ip – click connect
    • For base ami
      • username: administrator
      • password
        • left panel – instance – instance
          • right click instance row – Get Windows Password
            • click Choose FIle – select key pair file .pem file (not .ppk)
            • click Decrypt Password (bottom right)
            • Copy password, paste into Remote Desktop Connection
        • Click OK
    • a window with opens up with Windows OS on it (takes a minute or two to completely load)

Configure Load Balancer and Application Ports

  • Also see Docker, AWS >> Create Application Load Balancer (ALB)
  • left panel – Network and Security – Security Groups (also see docker notebook – aws – running a task)
  • Example: We want the Application Instances to only talk to the load balancer (inbound) and the load balancer talks to the public (inbound) and application instances (outbound) (Reminder all inbound ports are also outbound ports)
    • Create security group for EC2 Instance
      1. Create Security Group (blue button, upper left)
        • Enter name and description
        • click create
      2. Inbound tab (lower half)
        • click edit
          1. Type: HTTP (auto-inputs protocol TCP and port 80)
          2. Source: type name of Load Balancer security group (long box)
            • autocomplete will help
            • after clicking name, a group id is entered into the field
              • Weird, but id resembles the group id listed in the top half of screen but doesn’t exactly match. Should be correct though.
            • drop down should be “custom”
          3. click add rule
          4. Type: HTTPS
            • same thing but auto-inputs port 443
          5. Click Save
      3. outbound tab
        • By default all outbound traffic goes out on the inbound ports, but you can specify additional ports
        • click edit
          1. Type: all traffic (auto-inputs protocol all, port range 0-65535)
          2. Source: same as for inbound
    • Create security group for Load Balancer
      1. Create Security Group
        • Same procedure as above, different name
      2. Inbound tab
        • click edit
          1. Type: HTTP
            • keep Source as is
              • 0.0.0.0 means accept traffic from everywhere
          2. click add rule
          3. Type: HTTPS
            • same
          4. click Save
      3. Outbound tab
        • click edit
          1. Type: HTTP (auto-inputs protocol TCP and port 80)
          2. Source: type the application security group name (long box)
            • see load balancer section for other details
          3. click add rule
          4. Type: HTTP
            • same
          5. click Save

S3

  • Search S3
  • autoscales,, replicates your data to prevent total loss, ability to version data, max file size 5TB
  • charged for what you use (GB/mo)
  • 3 different classes
    • standard
      • highest availability (99.999%), durability (replicated across hosts multiple times)
      • most expensive of the classes
    • infrequently accessed
      • less durability (replicates), should be data that doesn’t end your world if lost
    • glacial
      • for archiving purposes
  • Security
    • IAM
      • give certain persons or departments permissions
    • S3 policies
      • make certain buckets public, private, etc.
  • Buckets
    • must have unique name across all AWS
      • assume this is taken care of by some auto-generated id
      • Can’t contain periods (dns-compliant)
    • region-specific
    • cross-region replication
      • copy objects from one bucket to another for a fee
  • Objects
    • files, artifacts, etc
    • key:value look-up
      • the keys are essentially just file paths
        • bucket_name/folder/file.txt
        • bucket_name/folder/*
          • gets everything
      • the values are the objects
    • You can directly download files from the console but not folders
      • select file, click on “more” button (top left), select “download as”, follow directions
    • To make publicly available through a web link you have to specify that policy (see below)
  •  Create a basic bucket
    1. Click Create Bucket (top left)
      • Opens Wizard
        1. Name and Region
          • Enter unique DNS compliant name (no periods)
          • Enter Region
          • Click next
        2. Set Properties
          • keep defaults
          • click next
        3. Set Permissions
          • keep defaults
          • click next
        4. Review
          • click create bucket
    2. Upload to bucket
      1. Upload via aws console
        1. Click bucket name
        2. click upload button (top left)
        3. Opens Wizard
          1. Select files
            • click add files or drag and drop folders into the window
            • click next
          2. Set Permissions
            • keep defaults
            • click next
          3. Set Properties
            • Storage class
              • standard, standard-ia (infrequently accessed), reduced redundancy
            • encryption
            • keep defaults
            • click next
          4. Review
            • click upload
      2. See below for uploading via CLI
  • Create Bucket Policy to allow for public download (from a weblink)
    1. click bucket name
    2. click Permissions tab (top mid)
    3. Bucket policy requires a json expression
      1. click policy generator (bottom left)
        • opens up new browser tab
      2. Select type of policy
        • S3 Bucket Policy
      3. effect (allow or deny)
        • select allow
      4. Principal
        • type “*” (asterisk) to give everyone access
      5. Actions
        • can select more than one action if you want
        • select “GetObject”
      6. Amazon Resource Name
        • The format of the required expression is given below the box
        • your substituting your bucket_name/key_name into the expression
          • for key name he used * to signify “everything”, but I think this would be your path to the resources (not including bucket_name)
            • e.g. folder1/folder2/file.csv or folder1/folder2/*
        • Click add statement
        • Click generate policy
        • copy json expression
    4. Go back to previous browser tab with the permissions tab and paste the expression into the window
    5. Click Save (mid right)
      • There’s also a DELETE button if you want to remove a policy from the bucket
    6. Should see a “public” tag on the permissions tab confirming the policy has been set
  • Download from bucket
    • Get download link for a file that has a public permission set
      • tick box of selected file
        • window with info about the file should open on the right side
      • copy download link in overview section, paste in browser or use programatically
    • Download via console
      • click file name – overview tab
        •  various downloading options
    • Programatically: see section Upload to bucket via CLI below (uses copy command)
  • Give bucket viewing (and downloading) access via IAM (need to have administrator permissions)
    1. search IAM
    2. left panel – Policies
    3. click create policy (top left)
    4. click choose service (mid)
      • type or select “S3”
    5. click select actions
      • Under Access Level
        • List
          • select all (4)
        • Read
          • View policy
            • GetBucketAcl, GetBucketCORS, GetBucketLocation, GetBucketLogging, GetBucketPolicy, GetBucketTagging, GetObjectAcl, ListBucketByTags, ListBucketVersions
          • Download policy (should be a separate policy from View)
            • GetObject
        • Resources
          • you have to select resources (e.g. buckets, objects (i.e. folders, files) that the actions above affect
            • View policy
              • choose all resources (which is buckets and objects)
            • Download policy
              • choose specific
              • click ARN
                • Enter the bucket name you want to give access to (or tick “any” box to give access to all buckets)
                • enter object name (or tick “any” box for access to all objects)
              • Click add
        • Click Review Policy (bottom right)
      • Enter a Name
        • eg ListAllBucketsObjsS3
      • Click Create Policy (bottom right)
      • Select or click newly created policy (should be back to left panel – policies console
        • new policy name should be in a clickable banner at top of screen
        • Or choose it from list of policies that’s displayed
      • Go to Attached entities tab
        • click attach
        • select users or groups you want the policy to apply to
        • click attach (bottom right)
  • View buckets you have permissions with: aws s3 ls or aws s3api list-buckets --output text
  • Upload to bucket via CLI
    • Also see above for uploading via aws console
    • aws s3 help, aws s3 <command> help
    • aws s3 cp <from> <to> <region> <profile> can copy files via:
      • bucket to bucket
      • local to bucket
      • bucket to local
    • local to bucket
      • aws s3 cp <C:\\Users\\path\\to\\file.csv> <s3://<bucket/path/to/folder/name.csv> --region <bucket? region> --profile <profile name>
      • see cli section above on how to create profile
      • refresh console if you already have it open to see the file
    • bucket to local
      • aws s3 cp <s3://<bucket/path/to/folder/name.csv> <C:\\Users\\path\\to\\file.csv> --region <bucket? region> --profile <profile name>
      • same thing as before, just reversing the <from> and <to> URIs
  • Versioning
    • If you enable versioning and then disable it, then the versioned objects will remain but new objects won’t be versioned
      • Would have to manually delete the versioned objects manually
    • bucket name – properties tab
      • click on “Versioning” card
      • tick Enable versioning
      • click save
    • To see the different file versions
      • bucket name – folder
      • click versions: show button (upper left)
    • Once versions are displayed you can download and delete files/versions in various menus
      • click file name – “latest version” drop down (next to file name, top left)
        • shows all versions with options to download or delete
      • click file name – overview tab
        • various download options
      • tick file name/version box – click more button (upper left) – select delete – click delete button