This is a special post, as it marks my first contribution to a small open-source project, and a nod to my ex-colleagues from my brief time at SICS.

This project is called Karamel. Karamel is a tool that simplifies big data deployments on a selection of cloud providers, such as AWS and Google Compute Engine, in addition to bare-metal cluster setups. It is quite easy to get started: you simply define your cluster in a file, submit it to the application, and launch it with one click!

With this tool you can have a big data cluster up and running in a few clicks, with Apache Hadoop, Apache Flink, or even Apache Spark preinstalled. In addition, it is the preferred tool for trying out a new Hadoop distribution named Hops, which tries to address the challenges and issues you face when working with very large-scale Hadoop clusters (for example, how to achieve high availability of the namenode).

Deploying Clusters with Karamel

In the rest of this article, I will describe the process of creating a simple cluster running Apache Flink and deploying it with Karamel to an OpenStack environment. We will also go over how to write a cluster definition file for this tool and how to configure Karamel to communicate with your cluster.

Before we start

In order to deploy our Hadoop cluster on OpenStack, we first need to get a copy of the latest version of Karamel, which you can do here; there you will also find guidelines for getting started with other providers like AWS. You may download one of the available versions with OpenStack support (version 0.2.0 onwards should do it) or download the source code and build the application with Maven.
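If you go the source route, the build is a standard Maven build. A rough sketch, assuming the repository lives under the karamelcore organization on GitHub (substitute the URL from the project page if it differs):

```shell
# Clone the Karamel sources (repository URL is an assumption; use the
# one from the project page if it differs)
git clone https://github.com/karamelcore/karamel.git
cd karamel

# Build the application, skipping tests for a faster first build
mvn clean package -DskipTests
```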

Working with Karamel

Getting started with Karamel is quite easy. Before launching our cluster, we need to define its configuration. In a cluster definition file, we can identify four core elements that describe your cluster: provider, cookbooks, attributes, and groups.

To understand the role of each core element, we will go through a simple example. Let’s imagine that I want to deploy a simple Hadoop cluster with Apache Flink, running one namenode and 20 datanodes, on OpenStack infrastructure.

So how can I express this information in Karamel? And if we want to run a wordcount job once the cluster is ready, how can we achieve that? How does Karamel let us express these requirements? The following cluster definition file shows how:

name: flinknova
nova:
  flavor: 3
  image: "99f4625e-f425-472d-8e21-7aa9c3db1c3e"

cookbooks:
  hadoop:
    github: "hopshadoop/apache-hadoop-chef"
    branch: "master"
  flink:
    github: "hopshadoop/flink-chef"
    branch: "master"

groups:
  namenodes:
    size: 1
    recipes:
        - hadoop::nn
        - flink::jobmanager
        - flink::wordcount
  datanodes:
    size: 20
    recipes:
        - hadoop::dn
        - flink::taskmanager
        

In this cluster definition, we can see three code blocks gathering all the information Karamel needs to satisfy our requirements.

Provider

The first segment of the file contains the provider-specific information. For OpenStack, we use the keyword nova to tell Karamel that this is an OpenStack cluster and that we are going to use the OpenStack Nova controller.

After the keyword nova follow the key parameters OpenStack uses to deploy the resources our software will run on.

  • Flavor: This corresponds to a specific hardware configuration for the VM, similar to Amazon’s EC2 instance types. OpenStack calls these flavors; each one specifies a configuration (CPU, RAM), and they are not the same for everyone, since each organization configures its own. To keep things simple, Karamel refers to the flavor by the ID attached to your VM configuration.

  • Image: This corresponds to the VM image you want to launch, stored in your OpenStack system. In this case, we use the ID generated when you store your image in your OpenStack project.
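If you are unsure which IDs to use, the OpenStack command-line clients can list them. A quick sketch using the legacy nova client (the exact client and subcommands depend on your OpenStack release):

```shell
# List the available flavors with their IDs and specs (CPU, RAM, disk)
nova flavor-list

# List the images stored in your project, including their IDs
nova image-list
```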

Cookbooks

The following code block gathers the software we want to install on our nodes. For this purpose, Karamel uses Chef cookbooks, which are processed on the nodes by running Chef Solo. To simplify their transfer, the cookbooks should be accessible to the application through Git, so it can clone them and execute the recipes on the nodes.

For this example, we want to install Apache Hadoop and Apache Flink, so we indicate this to Karamel with the cookbooks keyword, specifying the repositories where our cookbooks are stored plus the branch we want to check out.

Groups

Under the groups section, we define the cluster structure around groups of components and services. Here, we specify the number of nodes in each group and the recipes that install the services those nodes will run.
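The example above leaves out the fourth core element, attributes, which lets you override cookbook attributes for the whole cluster. A hedged sketch of what such a section could look like (the attribute name below is illustrative, not taken from the actual cookbooks; check the cookbook documentation for the real ones):

```yaml
attrs:
  hadoop:
    # Illustrative attribute: override the Hadoop installation directory
    dir: "/srv/hadoop"
```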

Summary

In this blog post, we went over roughly how Karamel now supports deploying clusters on OpenStack-based infrastructure. This gives you the opportunity to play with Hadoop, Flink, or Spark directly in your private cloud, reducing the time you need to configure a whole cluster.

Recently, I had the opportunity to get my hands dirty with Docker containers at my workplace, with the purpose of containerizing our applications and increasing our velocity when deploying software onto our hardware. I have been focusing on how to simplify the deployment of a distributed application where different services (frontend services, backend services, and data services) need to interact in isolated and repeatable environments.

So far the experience has been quite promising in terms of how easy it is to create a container, compared to building, for example, a simple AMI (Amazon Machine Image) on Amazon Web Services, which involves running the Amazon EC2 tools inside the virtual machine you want to create the image from.

In this post, I will go over the process of working with Docker on your local machine.

What is Docker?

Docker is a virtualization technology built on a simple premise: allowing developers and devops to package their applications and dependencies in a medium that eases portability across different machines and environments while maintaining a high level of flexibility.

We tend to think of Docker containers as normal virtual machines, yet in reality they are more lightweight than normal VMs: they trim down the resources of the virtualized instance by removing the guest OS and the need for a hypervisor, both of which involve extra overhead.

Containers deployed on the same machine therefore share resources provided by the host OS where Docker is running, such as the kernel, the filesystem, and disk space.
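You can see this kernel sharing for yourself: a container reports the host's kernel version, since it has no guest OS of its own. A small sketch, assuming Docker is installed and an ubuntu image is available:

```shell
# Kernel version of the host
uname -r

# Kernel version seen inside a fresh Ubuntu container: identical,
# because the container shares the host's kernel
docker run --rm ubuntu uname -r
```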

One thing I believe Docker does really well is focusing on isolation while keeping your application's behaviour identical across environments.

Once you create your Docker image and deploy it in a container, it will behave the same in your testing, staging, and production environments, just like a virtual machine, yet with faster deployments. For example, a full Ubuntu VM will be at least 650 MB with everything configured and our application embedded in it; in the Docker case, the footprint will be around the size of the built package itself.

In addition, Docker has its own version control for images, allowing incremental updates of their content (similar to a version control system like git), which helps you keep track of the versions of your applications.
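You can inspect this layered versioning with commands such as docker history, which lists the layers an image is built from, and docker commit, which records a container's changes as a new image. A quick sketch assuming a local ubuntu image (the image and tag names are illustrative):

```shell
# Show the layers (incremental changes) that make up the image
docker history ubuntu

# Record a running container's changes as a new image version
# (<container-id> is a placeholder for your container's ID)
docker commit <container-id> myapp:v2
```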

Getting started with docker

It is quite simple to get started with Docker and begin creating containers; if you use IntelliJ IDEA, there is a quite nice plugin for it.

You can start creating Docker images easily by describing them in a Dockerfile. This file acts like a script that Docker follows to generate the image for your container.
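As a taste of what is coming, here is a minimal hedged sketch of a Dockerfile; the base image, packages, and application paths are illustrative assumptions, not taken from a real project:

```dockerfile
# Start from an official base image (illustrative choice)
FROM ubuntu:14.04

# Install the application's runtime dependencies (illustrative)
RUN apt-get update && apt-get install -y openjdk-7-jre-headless

# Copy the built application package into the image (hypothetical path)
COPY target/myapp.jar /opt/myapp/myapp.jar

# Command executed when a container starts from this image
CMD ["java", "-jar", "/opt/myapp/myapp.jar"]
```

With that file in place, building and running would be a matter of `docker build -t myapp .` followed by `docker run myapp`.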

We will see that in a following post, where I will try to go over how Docker works with a working example.

Welcome to My New Page!

I was always quite keen on finding something more customizable that could better support my needs as a software engineer and my tech interests. I had an initial try with my old WordPress site, where I attempted to create something personal, but I was never satisfied with how WordPress handled things.

Luckily, being an active user of GitHub, I was pleasantly surprised by their support for personal sites through GitHub Pages. Curious about what it could offer, I quickly saw that you can have some nice things going on with Jekyll and Ruby.

So this is my second try at creating a personal site. Hopefully, thanks to this nice Jekyll theme, I will be able to satisfy my customization needs and give the site the personal touch I always wanted. Of course, I am not a great frontend guru (my background is more in backend/distributed systems), but I really like how this Jekyll theme works.

Hopefully, this second try at a personal site will be more rewarding than the first one. I will try to migrate some material from the old site.

GitHub Pages HPSTR theme from Made Mistakes
Background images from Subtle Patterns / CC BY-SA 3.0