IBM Systems Magazine, Mainframe - November/December 2016
TRENDS

IDEAL PLATFORM
Co-locating analytics on Linux on z Systems adds value

Avijit Chatterjee is an analytics evangelist in the IBM Competitive Project Office.

Linux* on z Systems* is well regarded as an ideal consolidation platform for database workloads. Databases traditionally have been hosted on bare-metal servers, and with the proliferation of commodity x86 servers, it's not uncommon to find hundreds, if not thousands, of distributed servers running database workloads in a data center. This has resulted in complexity, with explosive software license charges and significant operational costs for administration, floor space and energy.

Some companies have addressed their server proliferation challenges by sweeping their data center floor of distributed servers and consolidating them on Linux on z Systems. Laboratory studies conducted by the IBM Competitive Project Office have shown that this exercise often cuts total cost of ownership in half because of the tremendous software license savings achieved from the drastic reduction in cores. While these databases could be running either online transaction processing (OLTP) or analytical workloads, more often than not they are systems of record (SoRs) running OLTP; in most customer shops, the analytical systems are special-purpose appliances. This article presents two studies that demonstrate the tremendous value of co-locating the analytics workload with the systems of record on Linux on z Systems.

An Important Role

Analytics drives competitive differentiation for many companies, so it's high on CEOs' radar. As a result, new chief data officer and chief analytics officer positions have been added to the C-suites of many companies. Digital business is all about speed, which drives the need for real-time analytics directly on top of the OLTP data. This type of analytics is also called operational analytics. There's also deep analytics, which consists of queries run against historical, unified enterprise data stored in data warehouses and marts. Operational analytics consists of short-running queries, whereas deep analytics queries are long-running, for they often involve a full table scan of extremely large fact tables consisting of billions of rows of measures, which are then sliced and diced using dimension tables with attributes such as product, demographics and time.

We conducted two studies using best-of-breed in-memory technologies, Spark and DB2* BLU, for the operational and deep analytics scenarios, respectively.

The first study involves an operational analytics scenario, where a supervisor of a brokerage firm wants to examine, in real time, the performance of brokers in terms of total trade amount handled. The brokerage OLTP database in this case is hosted on Linux on z Systems. Spark SQL was used to run the aggregate query that measures the performance of the brokers. The database is over 100 GB in size, and the trade table over which the aggregate query is computed has more than 360 million rows of trading transactions. We ran an apples-to-apples test, running Spark on the same number of cores on z Systems as on an x86 server, and obtained up to 3x higher throughput by co-locating Spark with the OLTP database on the same platform. Spark provides a universal operating system for end-to-end analytics, and though we tested with Spark SQL, we expect similar co-location throughput benefits for other Spark modules, such as MLlib, GraphX and Streaming.
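For illustration, here is a minimal sketch of the kind of per-broker aggregate query the first study describes, written in Scala against Spark SQL with a JDBC source. The connection URL, credentials, and the table and column names (TRADE, BROKER, t_broker_id, b_id, b_name, t_trade_amount) are assumptions for the sketch, not the study's actual schema or configuration.

import org.apache.spark.sql.SparkSession

object BrokerPerformance {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BrokerPerformance")
      .getOrCreate()

    // Read the OLTP tables over JDBC. The URL, credentials and table
    // names below are illustrative placeholders, not the study's setup.
    val jdbcUrl = "jdbc:db2://localhost:50000/BROKERDB"
    def table(name: String) = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("driver", "com.ibm.db2.jcc.DB2Driver") // assumes driver on classpath
      .option("user", "db2user")                      // placeholder credentials
      .option("password", "********")
      .option("dbtable", name)
      .load()

    table("TRADE").createOrReplaceTempView("trade")
    table("BROKER").createOrReplaceTempView("broker")

    // Aggregate total trade amount per broker -- the shape of the
    // operational query described in the article.
    val brokerTotals = spark.sql(
      """SELECT b.b_name AS broker, SUM(t.t_trade_amount) AS total_trade_amt
        |FROM trade t JOIN broker b ON t.t_broker_id = b.b_id
        |GROUP BY b.b_name
        |ORDER BY total_trade_amt DESC""".stripMargin)

    brokerTotals.show(20)
    spark.stop()
  }
}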
The second study is for deep analytics on a 2.5 TB data mart, containing data from a retail store with multiple channels (in
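The deep analytics pattern described earlier, a scan of a large fact table whose results are sliced and diced by dimension attributes, corresponds to a classic star-join query. The sketch below shows that query shape in Spark SQL, with tiny in-memory stand-in tables so it runs self-contained; the study itself used DB2 BLU against the 2.5 TB data mart, and all table and column names here are illustrative assumptions, not the study's schema.

import org.apache.spark.sql.SparkSession

object StarJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StarJoinSketch").getOrCreate()
    import spark.implicits._

    // Tiny in-memory stand-ins for a billions-of-rows fact table and its
    // dimension tables; names and columns are illustrative assumptions.
    Seq((1L, 1, 101, 250.0), (2L, 1, 102, 75.5), (3L, 2, 101, 310.0))
      .toDF("s_id", "s_date_id", "s_product_id", "s_amount")
      .createOrReplaceTempView("sales")
    Seq((1, 2015, "Q1"), (2, 2016, "Q1"))
      .toDF("d_id", "d_year", "d_quarter")
      .createOrReplaceTempView("date_dim")
    Seq((101, "Electronics"), (102, "Apparel"))
      .toDF("p_id", "p_category")
      .createOrReplaceTempView("product_dim")

    // The star-join shape: scan the fact table, join to the dimensions,
    // then group by dimension attributes to slice and dice the measures.
    spark.sql(
      """SELECT d.d_year, p.p_category, SUM(s.s_amount) AS revenue
        |FROM sales s
        |JOIN date_dim d ON s.s_date_id = d.d_id
        |JOIN product_dim p ON s.s_product_id = p.p_id
        |GROUP BY d.d_year, p.p_category
        |ORDER BY revenue DESC""".stripMargin).show()

    spark.stop()
  }
}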