New AWS Instances Chew Through Very Large Data Sets

The new family of cloud instances is powerful enough to handle Massively Parallel Processing (MPP) data warehouse, log processing, and MapReduce jobs

Data Center Knowledge

March 31, 2015


Amazon Web Services has introduced new dense-storage instances for its EC2 cloud, designed for processing multi-terabyte data sets.

The new D2 instances provide more compute power and memory than the HS1 instances, as well as the ability to sustain high rates of sequential disk I/O for access to extremely large data sets (or, if you’re reading this 10 years from now, small data sets). As users grow more comfortable storing and processing large amounts of data in the cloud, instance sizes are growing to handle heavier jobs.

The instances are based on Intel's Haswell processors running at a base clock frequency of 2.4 GHz. Each virtual CPU (vCPU) is a hardware hyperthread on an Intel Xeon E5-2676 v3 chip.

The largest of the new instances can deliver up to 3,500 MB/second of read throughput and 3,100 MB/second of write throughput on Linux.
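To put those peak figures in perspective, here is a rough back-of-the-envelope sketch of how long a full sequential pass over a data set would take at the quoted rates. The data set sizes (1 TB and 10 TB) and the helper name `transfer_seconds` are illustrative assumptions, not figures from AWS, and real workloads will rarely sustain peak throughput.

```python
def transfer_seconds(dataset_tb, rate_mb_per_s):
    """Seconds needed to stream dataset_tb terabytes at rate_mb_per_s.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the peak
    rate is sustained for the whole transfer, which is optimistic.
    """
    dataset_mb = dataset_tb * 1_000_000
    return dataset_mb / rate_mb_per_s


# Peak rates quoted for the largest D2 instance on Linux.
READ_MB_PER_S = 3500
WRITE_MB_PER_S = 3100

# Hypothetical data set sizes, chosen only for illustration.
for size_tb in (1, 10):
    read_min = transfer_seconds(size_tb, READ_MB_PER_S) / 60
    write_min = transfer_seconds(size_tb, WRITE_MB_PER_S) / 60
    print(f"{size_tb} TB: read ~{read_min:.1f} min, write ~{write_min:.1f} min")
```

At these rates a single pass over a multi-terabyte data set is measured in minutes rather than hours, which is the point of the dense-storage design.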

New D2 instances are meant for very large data sets. Pricing is based on US-East and US-West AWS regions.

The largest instance also comes with two bonus features: NUMA support and CPU power management. NUMA (Non-Uniform Memory Access) support lets you specify an affinity between an application and a processor, so that the application uses memory that is “closer” to that processor and therefore faster to access.
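As a minimal sketch of what such an affinity can look like on Linux, the snippet below reads the CPU list that the kernel exposes for a NUMA node and pins the current process to those CPUs. The function names `parse_cpulist` and `pin_to_node` are hypothetical, and this only sets CPU affinity: it relies on Linux's default local-allocation policy to place memory on the same node, whereas tools such as `numactl` give explicit control over memory placement.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-17,36-53' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node=0):
    """Pin the current process to the CPUs of one NUMA node (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as handle:
        cpus = parse_cpulist(handle.read())
    os.sched_setaffinity(0, cpus)  # 0 means "the calling process"
    return cpus
```

Calling `pin_to_node(0)` early in a worker process, before it allocates its large buffers, is one simple way to keep that worker's memory accesses local to a single processor.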

It’s possible to launch multiple D2 instances in a placement group (a logical grouping of instances in a single availability zone, meant for applications that need low network latency, high network throughput, or both).

The D2 instances provide the best disk performance when you use a Linux kernel that supports Persistent Grants – an extension to the Xen block ring protocol that significantly improves disk throughput and scalability.

Storage on D2 is local, so it’s advisable to build redundancy into the storage architecture and use a fault-tolerant file system. Each instance is EBS-optimized by default. "EBS" stands for "Elastic Block Store."

Enhanced networking is available on D2, as it already is on the C3, C4, and I2 families. Enabling enhanced networking results in higher performance (packets per second), lower latency, and lower jitter.

“With Enhanced Networking and extremely high sequential I/O rates, these instances will chew through your Massively Parallel Processing (MPP) data warehouse, log processing, and MapReduce jobs,” AWS chief evangelist Jeff Barr wrote in a blog post. “They will also make great hosts for your network file systems and data warehouses.”
