Input and output is one of the core bread-and-butter aspects of the cloud: being able to push data to and from disk and network is central to how data flows in a cloud environment. It is worth noting that the general trend in the industry seems to be to push users onto network attached storage instead of local storage. The upside is that the claimed failure rates for network attached storage are lower than for local disk (we're big, but not big enough to have our own raw-disk failure statistics). The downside is that when network attached disk has problems, you may see them on multiple instances at the same time, either taking the instances down completely (easy to manage) or inducing a brownout (hard to manage). When a local disk fails, the solution is to kill that instance and let the HA built into your application recover on a new VM. When network disk fails or has a multi-instance brownout, you're just stuck and have to fail over to another failure domain, which is usually another availability zone or in some cases another region! We know this because this kind of failure has caused production outages for us in AWS. This trend towards network attached storage is one of the scariest industry trends for big data in the cloud, and there will probably be more growing pains before it is resolved. Of the two cloud vendors, AWS has the better offering for local disk solutions.
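The recovery policy above can be sketched as a small decision rule. This is a minimal illustration with hypothetical names and labels, not Metamarkets' actual tooling:

```python
# Sketch (hypothetical names) of the recovery policy above: local-disk
# failures are handled by replacing the VM, while network attached disk
# trouble forces a failover to another failure domain.

def disk_failure_action(disk_errors: int, disk_is_local: bool) -> str:
    """Pick a reaction to disk trouble on a single instance."""
    if disk_errors == 0:
        return "healthy"
    if disk_is_local:
        # Kill the instance; the application's HA layer reschedules the
        # workload on a fresh VM with a fresh local disk.
        return "terminate-and-replace"
    # Network attached disk: many instances may share the degraded
    # backend, so replacing one VM rarely helps -- fail over to a
    # different AZ (or region) instead.
    return "failover-to-other-zone"
```

A monitoring loop would feed I/O-error counts into a rule like this before calling the cloud API to terminate or drain the instance.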
While the rest of this post centers on some of the higher-level considerations of GCP and AWS, it is worth calling out the key use cases Metamarkets has for each cloud. For AWS, we have a large pool of compute resources running at high CPU utilization, which also dips into spot-market resources as needed. The workloads commonly run on our AWS instances receive instructions for a chunk of local computation that needs to be performed; the aggregated results are then sent back or shuffled to other nodes in the cluster. These compute resources use a combination of local disk and various EBS attached volume types, depending on the SLO of the services using them. Metamarkets runs all of our real-time components (for which we have our own throughput-based autoscaling) on GCP. This, combined with per-minute pricing, makes GCP a natural choice for things which scale up and down regularly with real-time data. The distributed load-balancer intake methodology GCP employs also means clients often hop onto the GCP network very close to their point of presence. During part of our initial investigations, node spin-up time on GCP was so fast that we found race conditions in our cluster management software.
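Throughput-based autoscaling can be illustrated with a back-of-the-envelope sizing function. This is a sketch under assumed parameters (per-instance capacity, 20% headroom), not our production autoscaler:

```python
import math

# Sketch of throughput-based fleet sizing: run enough instances to absorb
# the measured intake rate plus a safety headroom. All names and numbers
# here are illustrative assumptions.

def desired_instance_count(events_per_sec: float,
                           per_instance_capacity: float,
                           headroom: float = 1.2) -> int:
    """Size the fleet from measured throughput plus headroom."""
    if events_per_sec <= 0:
        return 1  # keep a minimum footprint for sudden traffic
    needed = events_per_sec * headroom / per_instance_capacity
    return max(1, math.ceil(needed))
```

With per-minute billing, scaling back down soon after throughput drops actually saves money, which is part of why this pattern fits GCP well.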
In the interest of transparency: our primary operations in AWS are in us-east, which is the oldest AWS region and subject to a lot of cloud legacy, both in its users and (I suspect) in its internal hardware and design. Some of the comparisons below will be listed as unfair comparisons; in those instances, we believe the level of service Metamarkets has subscribed to differs between the two cloud providers.
It's kind of like inflating a balloon in a porcupine farm: you know it is a bad idea, but while you're trying to figure out where to start inflating a new balloon, the prior one keeps filling up with more air! As we investigated growth strategies outside of a single AZ, we realized that a lot of the infrastructure changes we needed to make to accommodate multiple availability zones were the same changes we would need to make to accommodate multiple clouds. After some looking around, we decided the Google Cloud Platform was potentially a very good fit for the way Metamarkets' business and teams operate, and for the way some forces in the infrastructure industry are trending. This post will cover some of the pragmatic differences we have experienced between AWS and GCP as cloud providers as of 2017.
We started and grew Metamarkets in AWS's us-east region, and the majority of our footprint was in a single availability zone (AZ). As we grew, we started to see the side effects of being restricted to one AZ, then the side effects of being restricted to one region. At this scale, the ability to fail over gracefully, to detect and eliminate brownouts, and to efficiently operate huge quantities of byte-banging machines is a necessity.
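Brownout detection, as opposed to hard-down detection, can be sketched as comparing an instance's current tail latency to its baseline. The percentile math and the 3x threshold here are illustrative assumptions, not our actual monitoring stack:

```python
# Sketch: flag instances that are degraded ("browning out") rather than
# dead, which plain up/down health checks miss. Threshold is illustrative.

def is_browning_out(latencies_ms, baseline_p99_ms, factor=3.0):
    """True when the observed p99 latency drifts far above its baseline."""
    if not latencies_ms:
        return False  # no traffic in the window, nothing to judge
    ordered = sorted(latencies_ms)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return p99 > factor * baseline_p99_ms
```

An instance flagged this way can be drained and replaced before it turns a slow tail into a visible outage.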
The torrent of data that clients send to us surpasses a petabyte a week.