An attempt to explain how open source tools for automatization are used for minimizing costs and maximizing control over infrastructure in the cloud.
Automatization or Infrastracture-as-Code (IaC) is the idea where all the infrastructure is written in scripts and the scripts are executed when needed. In the “old days” (and some vital parts of organization’s solutions) the infrastructure represented physical servers in the basement with software installed and maintained by the in-house engineers with the help of vendor’s consultants. With the Infrastructure-as-Code the only thing maintained are the scripts while the basement is housing the table-tennis table. The scripts are maintained by data engineers (so called DevOps engineers) and broader audience can now build, maintain and destroy the infrastructure. It does help that the cloud vendors have simplified the services that were once the domain of the network engineers, for example.
The tendency in the areas of data storage and data processing (or everywhere in the IT fields) is to move to a cloud. A private cloud, a public cloud or a hybrid. Those are the options. Moving everything to a public cloud (big three: AWS, Google Cloud Platform or Azure) will make you a smart consumer of those services moneywise. Your goal is to pay-as-you-go, meaning run your applications when needed on the infrastructure you need and destroy the infrastructure when results are saved.
“Pay-as-you-go in cloud”
For succeeding in pay-as-you-go concept, two things have emerged on the market:
- cheap object storage (S3 on AWS, Blob Storage on Azure and Cloud Storage on Google Cloud Platform).
- tools for Infrastructure-as-Code (IaC)
Cheap object storage is exactly that: low cost storage of files in all form, shapes and types. This allows to store data cheap and build infrastructure for processing when needed. This follows the idea of dividing storage and processing.
“Division of storage and processing resources”
Days of having Hadoop just to have Hadoop are over and a company needs a good reason to justify having and maintaining a Hadoop cluster. The division of storage and processing works if the infrastructure is dynamic, rather, if the infrastructure-as-code can fulfil user’s needs. The responsibility falls on the DevOp engineers and the tools.
There is no doubt that the tools are there, plenty to choose from already from the open source community. Since I am following the philosophy where companies pay less for licence and more for knowledge I focus on open source technologies in cloud.
“Organizations will pay less for licenses and more for knowledge”
Infrastructure-as-code should offer a robust and general solution where the infrastructure is configured through input parameters. With other words, users define the input parameters, run the code and get the customized solution. This is what I attempt to demonstrate in a few of my GitHub repositories. I will come to this in my later posts.
Choosing the tools to do the job is not simple. As it is not simple to pick the most suitable cloud distributor. Here in Norway, Azure is the most popular cloud solution, in my opinion, not because of quality but because of the market position and good sales people at Microsoft.
Myself, I have experience mostly with AWS (a reader might observe that I write Amazon Web Services as AWS while Google Cloud Platform is not GCP) and some with OpenStack and VMWare. Choosing a cloud vendor is not as problematic as it is choosing the architecture in your cloud. Using services provided by the cloud vendor results in a possible risk to be locked to one technology or vendor. Migration to another, similar, solution might be costly. And this should be an option always when working with new technologies where there are uncertainties if the proposed architecture will deliver.
“Locking yourself to one distributor can be risky”
The technology stack I use in my examples is the following:
Cloud vendor: AWS
Cloud Vendor’s services: S3 (object storage), VPC (virtual private cloud – mandatory for launching instances in AWS), EC2 (instances in the cloud – Linux servers)
Object storage S3 for storing data is separated from the processing resources (made up from EC2 instances) which are in the mandatory VPC. Any other storage can be used if it has connectors, as well as S3 can be accessed externally.
Infrastructure as Code tools: Terraform (automatization of services in the cloud), Consul (configuration of infrastructure to be created) and Ansible (software installation and administration of instances built in the cloud)
Symbiosis between the IaC tools: user stores configuration of desired infrastructure to Consul, Terraform reads the configuration at provisioning, saves new parameters back to Consul and at the same time executes the Ansible scripts which install and setup the software for the desired solution.
Work environment: Docker for Windows (container with Linux environment on local machine), PowerShell (for Docker creation and development and test of scripts)
Version control: GitHub and GitHub Desktop (for pulling and pushing to the repositories)
Repository with files for Docker container creation is cloned from GitHub using GitHub Desktop. PowerShell is used to create the Docker image and start the Docker container. This Docker container represents the entry point to the Infrastructure-as-Code development and testing.
IDEs for coding: Atom (for Terraform, Ansible and Consul configuration), PyCharm and Jupyter (for Python scripts) and Intellij IDEA (for Scala scripts)
Next post goes in depth on the DevOps environment.