Design end to end application stack provisioning solution (architecture interview):
- Hardware vs Cloud differences (one solution to control them all)
- OS provisioning
- Golden image vs bootstrap vs other patterns
- Bare Metal implications
- Cloud implications
- Cohesive application stack deployment
- Configuration management
- Custom Code application provisioning
- Must be complete stack
- Cannot be a store-bought-only solution (no BladeLogic)
- Address diff challenges in solution tool chain
End to End Environment Provisioner
This tool will need to accomplish a few key objectives:
- Should be easy to use (simple flow to provision resources and deploy code)
- Should support a hybrid cloud: a mix of data centers and cloud providers (AWS)
- Should not get in the way of engineers trying to ship code
The Provisioner stack I’m proposing will mix open source tools and custom code. It’s a web-based application with an asynchronous job queue. It will leverage Packer, Chef, Fog (a generic cloud provider interface), and SSHKit (a tool for running commands on remote servers over SSH).
Before I launch into the details of what the various parts of the application do, I want to provide a high-level overview of the steps that need to happen to allow us to provision a full custom application.
First, we need to choose a standard operating system. For this exercise, I’m going to choose Ubuntu 14.04.1 (64 bit). Automation is all about standardization, and having one OS reduces variation in securing and managing servers at the OS level.
A web application requires a number of different resource types (application server, load balancer, Postgres, Redis, Memcached, Elasticsearch, etc.). Other than the application servers, which need to be built to run the application they are intended for, these resource types should be built as generic instances. Even the application server itself should be built as generically as possible. At my current company, we have Ruby applications using Ruby 1.8.7, 1.9.3, and 2.1.3 and Java applications using Java 6 and 7. We’ve standardized our applications so that all Java servers run Tomcat, and all Ruby servers run Passenger.
Golden images can cause a great deal of trouble if they are not properly managed. If they are properly built and managed, they can provide a lightning-fast means of provisioning and/or scaling. I’m going to use golden images, but tightly control how they are built, tested, and managed. To build images, I’m going to use Chef (Chef is easily interchangeable with Ansible, Puppet, Salt, Bash, etc.). A base cookbook should be used for installing and configuring libraries present across all servers. Application-specific cookbooks should be used for configuring the core server application (Postgres/Redis/Tomcat/etc.), and resource-specific cookbooks should be used for applying the unique set of application configuration specific to Backcountry’s needs. These cookbooks should be developed locally using Vagrant. Serverspec tests need to be written to ensure the end result works as desired. The development of cookbooks should mirror the process of application development, with tests passing through a CI server before changes are merged into the master branch. Packer will be used to build images. Packer allows us to simultaneously build, tag, and push images for AWS, OpenStack, and VirtualBox. Serverspec tests are run before images are finalized. Failing tests mean images are not created, ensuring that only valid golden images are available for future provisioning.
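As a sketch of the image-gating tests, a Serverspec spec for a generic Postgres image might look like the following. The package name, service name, and port are illustrative; this is a spec fragment that runs (via RSpec) against the machine being baked, not a standalone script:

```ruby
require 'serverspec'
set :backend, :exec  # run checks directly on the machine under test

# Illustrative checks for a generic Postgres image. If any of these fail,
# the Packer build is aborted and no golden image is published.
describe package('postgresql-9.3') do
  it { should be_installed }
end

describe service('postgresql') do
  it { should be_enabled }
  it { should be_running }
end

describe port(5432) do
  it { should be_listening }
end
```

The same pattern applies to every resource type: each resource cookbook ships with a spec file asserting its packages, services, and listening ports.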
The Provisioner application acts as an information broker and manager. It’s responsible for starting and stopping, provisioning and destroying, configuring and managing applications and environments. It keeps track of what servers are currently running and has SSH access to those machines.
In the context of a provisioner, an application consists of a number of different resource types (ex. load balancer, Java 7 application server, Postgres database) and environment-specific settings (staging/production/experimental/etc.). Environment-specific settings include information like application domain, database connection credentials, external service API keys, and resource counts (two load balancers and 10 application servers in production vs one load balancer and one application server in staging). Separating credentials by environment also allows tighter control over who can access each environment’s credentials, where that is desired.
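To make this concrete, here is a minimal Ruby sketch of how environment settings might layer over application defaults. The application name, keys, and counts are illustrative, not a fixed schema:

```ruby
# Hypothetical application record: defaults plus per-environment overrides.
APPLICATION = {
  name: "storefront",
  resources: %w[load_balancer app_server postgres],
  defaults: {
    domain: "storefront.example.com",
    counts: { "load_balancer" => 1, "app_server" => 1, "postgres" => 1 }
  },
  environments: {
    "staging" => {
      domain: "staging.storefront.example.com"
    },
    "production" => {
      counts: { "load_balancer" => 2, "app_server" => 10, "postgres" => 1 }
    }
  }
}

# Resolve the effective settings for one environment by layering its
# overrides over the application defaults (nested hashes merge one level deep).
def settings_for(app, environment)
  overrides = app[:environments].fetch(environment, {})
  app[:defaults].merge(overrides) do |_key, base, override|
    base.is_a?(Hash) ? base.merge(override) : override
  end
end
```

Storing settings this way keeps the per-environment records small: an environment only records what differs from the defaults.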
After the Provisioner application provisions a server and it has come online, the resource application cookbook is run again, setting the configuration relevant to that resource for that application in the desired environment. For a load balancer, this might include adding the available backend nodes. For an application server, this might include setting the environment variables to configure that application. Configuration needs to happen in a particular order. Databases need to come up and be configured prior to application servers, which in turn need to be configured before being added to load balancers.
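The ordering constraint above can be expressed as a small dependency graph that the configuration job walks before running Chef on each node. The resource names are illustrative; a minimal sketch:

```ruby
# Each resource type declares which types must be configured before it.
DEPENDS_ON = {
  "postgres"      => [],
  "redis"         => [],
  "app_server"    => %w[postgres redis],
  "load_balancer" => %w[app_server]
}

# Simple depth-first topological sort: returns resource types in the order
# they should be configured (databases first, load balancers last).
# Assumes the graph is acyclic, which holds for this kind of stack.
def configuration_order(deps)
  order = []
  visit = lambda do |node|
    return if order.include?(node)
    deps.fetch(node, []).each { |dep| visit.call(dep) }
    order << node
  end
  deps.keys.each { |node| visit.call(node) }
  order
end
```

The job queue can then configure each tier serially in this order, while still configuring the servers within a tier in parallel.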
An application cannot be considered complete without its code. I like to separate code deploys from server provisioning. Code changes regularly, while the context that code runs in generally does not. When code requires a context change (upgrade from Java 6 to 7), new resources should be provisioned and moved through and tested in each of the environments. These major changes are not all that common in comparison to shipping code changes. I believe code should be shipped as a package, with all dependencies, if possible. This is easily done in Java applications with artifacts. For scripting languages, resetting the code against a particular Git tag and then running a dependency resolution tool (like Bundler for Ruby applications) is an alternative. The final step of provisioning the resources for an application is to deploy that application’s code to the application servers.
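For the Git-tag style of deploy, a hypothetical helper might build the command sequence the deploy job runs on each application server (handed to SSHKit’s `execute`, or to plain ssh). The path and service name are placeholders:

```ruby
# Build the per-server command list for deploying a given Git tag.
# app_dir and the passenger restart are illustrative assumptions.
def deploy_commands(tag, app_dir: "/srv/app")
  [
    "cd #{app_dir}",
    "git fetch --tags",
    "git reset --hard #{tag}",
    "bundle install --deployment --without development test",
    "sudo service passenger restart"
  ]
end
```

Keeping the deploy as a plain list of commands makes it easy to log exactly what ran on which host, which feeds the audit history described below.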
Now that I’ve provided an overview of functionality, I’ll dive into the implementation. The functionality will be wrapped up in a web application. Given my comfort with Ruby, it will be a Rails app (although it could easily be written in a variety of languages). The site will have users and groups to restrict access. All actions in the system are logged, providing a history of who did what, and when. The bulk of the heavy lifting will take place in jobs. I will use Postgres for the relational database, with column-level encryption for secure credentials and other private information. Redis will be used to back the job queue. All configuration data stored in Postgres will be versioned (image JSON and environment settings).
Packer templates will be part of the project, and be as generic as possible. Creating a new image means grabbing the JSON and Packer template for that particular image and shelling out to Packer to generate the image. Successful images will be added to the database and be selectable when provisioning resources. The image will also be tagged when it’s pushed remotely to OpenStack and AWS.
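As a sketch of the shell-out step, the job might write the stored template JSON to disk and assemble a `packer build` command line, passing image-specific variables via `-var`. The template shape and variable names here are placeholders:

```ruby
require 'json'
require 'tempfile'

# Write the stored template JSON to a temp file and build the packer
# argument list. In the real job, the returned arguments would be handed
# to a process spawner and the exit status checked before the image is
# recorded in the database.
def packer_command(template, variables = {})
  file = Tempfile.new(['image', '.json'])
  file.write(JSON.generate(template))
  file.flush
  args = ['packer', 'build']
  variables.each { |key, value| args += ['-var', "#{key}=#{value}"] }
  args << file.path
end
```

Because the template lives in the database as JSON, the same record can be versioned, diffed, and rebuilt on demand.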
Creating a new application involves naming the application and selecting the desired resources. Creating a new application will also create a default environment (staging?). Required information (ex. resource counts) will be added to the environment configuration. The job to provision an application will launch the server(s) based on the image for that resource (using Fog). Once all servers are present and registered, a second job will be run to configure them and deploy code. This job will leverage SSHKit. The resource cookbook will be installed locally on each resource, and Chef run again with the configuration information from the Provisioner application specific to that particular application and environment. Once the application is configured, the application code will be deployed and the application will be available for use.
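Fog is what keeps the provisioning job provider-agnostic: the job picks connection parameters for the target and the rest of the code is the same. A minimal sketch, with credential environment variable names as placeholders:

```ruby
# Pick Fog connection parameters for the two provisioning targets.
# Credential sources are illustrative; they would really come from the
# encrypted columns in the Provisioner database.
def fog_params(target)
  case target
  when :aws
    { provider: 'AWS',
      aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'] }
  when :openstack
    { provider: 'OpenStack',
      openstack_auth_url: ENV['OS_AUTH_URL'],
      openstack_username: ENV['OS_USERNAME'],
      openstack_api_key: ENV['OS_PASSWORD'] }
  else
    raise ArgumentError, "unknown target #{target}"
  end
end

# The provisioning job would then do something like:
#   compute = Fog::Compute.new(fog_params(:aws))
#   server  = compute.servers.create(image_id: image.remote_id, ...)
#   server.wait_for { ready? }
```

Everything after the connection is provider-neutral, which is exactly why rolling out OpenStack in the data centers matters so much.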
The application will be deployable through the Provisioner application or through a RESTful API. Exposing deployment functionality through an API allows CI servers to deploy changes when tests pass. At my current company, our CI server deploys straight to backstage, but staging and production release are handled manually.
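A CI hook into that API could be as simple as an authenticated POST. The endpoint path, payload fields, and hostname below are hypothetical, not a fixed contract:

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build the deploy request a CI server would send after tests pass.
# (Actually sending it would use Net::HTTP.start on the request's host.)
def build_deploy_request(base_url, app:, environment:, tag:, token:)
  uri = URI.join(base_url, "/api/applications/#{app}/deploys")
  req = Net::HTTP::Post.new(uri)
  req['Content-Type']  = 'application/json'
  req['Authorization'] = "Bearer #{token}"
  req.body = JSON.generate(environment: environment, tag: tag)
  req
end
```

Since the Provisioner is only reachable on internal networks, the CI server must live inside those networks too.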
I’ve glossed over how this will work in managed data centers. It’s no small part, but OpenStack should be rolled out prior to work being done on the Provisioner. OpenStack provides generic access to compute, storage, and network resources, and allows the data center to be API-driven just as AWS is.
In addition, all AWS-deployed applications should be deployed into a VPC. If they require access to resources in the data center, secure tunnels can be created between the two networks. Particularly for applications running in AWS, VPC ingress should be restricted by location (ex. port 22 accessible only from the offices and from the Provisioner, internal service traffic over SSL, and 443 open only to those applications requiring access). The Provisioner server itself should be accessible only on internal networks.
The biggest challenge in implementing this is rolling out OpenStack in the existing data centers. This is a strategic investment, and should provide better utilization of existing hardware for the company as a whole in the future. OpenStack largely removes the local-vs-cloud discrepancies. Another major challenge to implementing a new system is the simple fact that it’s not the way people currently work. Automation requires removing unique snowflake servers. There will need to be a lot of discussion around what is installed on these servers and how, to ensure they can be reliably replicated. Additional tooling and process may need to be implemented to help teams move from their current state to this proposed state.
In summary, this should be a web-based application, heavily utilizing a job queue. This queue will drive golden image creation through Chef, Packer, and Serverspec. Provisioning and scaling will leverage Fog to provision instances on either OpenStack or AWS (or both). SSHKit will be leveraged with Chef to finalize configuration and wire up resources within an application. Code and configuration will be deployed through SSHKit and Chef. The end result is a system that allows for the rapid provisioning and deployment of applications across a hybrid cloud environment.