Monday, February 25, 2013

Microsoft Completes Journey To Big Data Through Hadoop

There's no beating around this bush: today Hortonworks announced a new beta version of its Hortonworks Data Platform that will run on Microsoft Windows Server, a move that shows Microsoft's own big data efforts will forever be connected to open source innovation.


This is a highly significant (and even expected) move in the big data sector, even as this is a very strange intro to write. Hortonworks is one of the big Hadoop vendors in the market, though more in terms of innovation than sales, where Cloudera is currently regarded as the leader. Hortonworks' founder and architect Arun Murthy is one of the original Hadoop coders who came out of Yahoo back in the day, and he also serves as the VP of the open source Apache Hadoop project at the Apache Software Foundation.


Which all means that any major platform move like this is sure to impact the rest of Hadoop development and, by extension, the rapidly growing Hadoop ecosystem that's driving much of the big data sector.


Why Windows?


Until today's announcement, Hadoop of any flavor was typically to be found on a Linux-based machine (physical or virtual). This made a lot of sense, since one of the big advantages of Hadoop is the capability to expand its data warehousing out on any number of clustered computers. When the operating system of those clustered machines is Linux, growth is frictionless in terms of licensing and configuration.


But when the underlying operating system is Windows Server, then wouldn't the licensing of Windows create much more friction when trying to build a Hadoop cluster? Or, to put it more frankly, wouldn't using Windows Server as the OS for a Hadoop system be too expensive?


David McJannet, VP of Marketing at Hortonworks, doesn't seem to think so. From Hortonworks' perspective, there were just too many Windows-based shops out there that were shying away from using Hadoop because they didn't want to mess around with heterogeneous resource management that would be part of the package of deploying a Linux-based Hadoop solution.


Infrastructure management is a big component of the reasoning McJannet gave to explain Microsoft's work with Hortonworks over the past 18 months. Numbers were also a big part of the reason for the new version; McJannet said that a "majority of servers" in the enterprise now run Windows.


The company's press release backs that up: "According to IDC, Windows Server owned 73 percent of the market in 2012 (IDC, Worldwide and Regional Server 2012–2016 Forecast, Doc # 234339, May 2012)."


It is not clear just what server class this 73 percent represents, since the report itself costs $4,500 and is thus a little hard to access. File servers? Application servers? It's sure not web servers, where, according to Web analytics from Netcraft, Microsoft currently has 16.93% of the market share, dwarfed by Apache's 55.26%.


McJannet also cited ease of data exploration as another reason for Hadoop on Windows. Using SQL-based queries that can now directly integrate with the Hadoop Distributed File System (HDFS), products like SQL Server and Excel can tap straight into Hadoop-stored data, enabling end-users to more easily navigate through the lakes of data Hadoop contains.


Embracing Open Source


This is not the company's first foray into Windows land. Late last year, Hortonworks released the Windows Azure HDInsight product - essentially Hadoop for the Azure cloud platform.


As odd as it may seem to see Hadoop on Windows Server, the move makes a lot of sense from the Microsoft side of the arrangement. The company needed a big-data entry ever since it decided to drop its own Dryad data warehousing framework back in 2011. The expectation a year ago, when Microsoft announced it would build in tools within SQL Server to connect to Hadoop, was that this day would eventually come.


McJannet emphasized that to date, Microsoft was playing well with others within the open source development model that Hadoop uses, so much of this innovation will be dropped back to the rest of the Hadoop community.


Expect, then, to see more Hadoop vendors announce their own connections to Windows in the near future.


Image courtesy of Shutterstock.


