Apache Spark is defined as “a fast and general engine for large-scale data processing.” The open source big data project started in 2009 and is now being backed heavily by IBM. Over the years, hundreds of contributions have come from researchers and developers in the field of big data. Those contributions have culminated in the latest release, which has changed the way organizations integrate big data into their processes. IBM intends to remain a leader in open source innovation through Apache Spark.

According to IBM’s press release, the company is aiming to make one of the decade’s most important investments in big data software. IBM is dedicating more than 3,500 researchers and developers, along with hundreds of millions of dollars in funding and resources, to Spark, which was originally developed at the University of California, Berkeley. The Apache Spark project looks to push data analytics beyond the standards of current technology.

Rob Thomas, the vice president of product development for IBM analytics, told TechCrunch,

“I like to think of Spark as the analytics operating system.”

Spark essentially allows massive amounts of data to be analyzed at unprecedented speeds. It not only improves the operation of data-dependent applications but also helps in their development by streamlining complex tools.

The initiative appeals to virtually every business, from energy services to real-time transportation to health care. As a result, IBM could have a tremendous impact on the big data vendor industry.

“Our belief is anyone using data in the future is going to be leveraging Spark. It allows universal access to data,” Thomas explained.

The project is still in its early stages, and before everyone is using Spark, much education and implementation work is required on IBM’s part. As part of its strategy moving forward, the company laid out six vital actions, one of which is the aforementioned commitment to putting thousands of developers and researchers on Spark-related projects.

In addition to growing the team, IBM will open source its machine learning technology, SystemML; incorporate Spark into its commerce and data analytics platforms; and offer Spark as a cloud service on its Bluemix programming platform.

IBM also wants to accelerate the growth of the Spark community through partnerships that will provide education to more than 1 million data scientists and engineers. Moreover, a Spark Technology Center is planned to open in San Francisco to encourage further innovation.

Spark has already won praise from some of IBM’s clients. One example is NASA and the SETI Institute, which are collaborating in the search for extraterrestrial life. Faced with enormous volumes of radio signals, they can use Spark’s machine learning capabilities to look for patterns in the data that might reveal the presence of other intelligent life forms.

Spark technology might “really deliver on the promise of big data,” said Robert Picciano, the senior vice president for IBM’s data analytics business.

