Bizcubed's Zachary Zeus and Neeraj Khurana organised last week's data warehousing meetup at Brisbane’s Jade Buddha.

Pentaho does the trick for big data

Events such as GovHack and Code for America have given open data a recent boost in popularity, but the technology behind data integration often goes ignored.

Within Brisbane’s robust meetup community is one answer to this divide: the relatively small Data Warehousing Brisbane group, run by Bizcubed managing director and founder Zachary Zeus.

Zeus is hoping to build a better, broader understanding of information architecture with the open source data integration and business analytics tool Pentaho.

“Most organisations don’t do a good job of managing their existing data that well,” Zeus says. “Big data provides new ways and new techniques to address that gap, but it doesn’t solve it on its own, and that’s really why I like the Pentaho platform, because there are no limits to how widely it can be used.

“We’ve focused particularly on Pentaho as an open tool to facilitate our work with standard databases, and a lot of the organisations that we talk to have a minimum of five different data stores, often times hundreds of different data stores, that need to be integrated in an interesting way,” he says. “They can have apps for specific purposes, such as a social monitoring app, a customer app, an app to manage new channel apps.

“Each one of these apps are spitting data out their back, and they will have their own analytics piece, but they won’t integrate with the rest of the data analytics framework.”

Outwardly passionate about enhancing a communal understanding of data productivity, Zeus started Data Warehousing Brisbane roughly six months ago, having introduced Pentaho with Bizcubed’s arrival in Australia eight years ago. Last week’s meetup was held at Brisbane’s Jade Buddha with Bizcubed regional manager Neeraj Khurana, where both Pentaho and the open-source document database MongoDB were explained to a small audience.

Bizcubed is a distributor of the enterprise version of Pentaho, but Zeus says they are also an advocate for the community version “because it means that more people can interact with data in a cost-effective, scalable, easy-to-use way.” He also stresses an understanding of ETL (extract-transform-load) as necessary in the discussion of data analytics.

“If someone says, ‘I’ve got some data about my town and I want to put it on a map’, you do some sort of geo lookup, which is an ETL process,” Zeus says. “Then you’ve got to load it to a new data store, and a series of steps that need to be followed.”

“When a mayor or whoever asks for it, it sounds like a really simple thing,” he says. “‘Put this data on a map’ turns into a relatively involved ETL process or a process you need a developer to script, and script can get kind of brittle.”
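
To make the "put this data on a map" example concrete, here is a minimal sketch of the three ETL steps Zeus describes, written in Python rather than Pentaho. Everything in it is an assumption for illustration: the sample CSV, the `GAZETTEER` dictionary standing in for a real geo lookup service, the approximate coordinates, and the `places` table are all hypothetical, and this is not how Pentaho itself is configured.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: town records as they might arrive in a CSV export.
RAW_CSV = """name,suburb
Library,South Brisbane
Pool,Spring Hill
Depot,Unknownville
"""

# Hypothetical gazetteer standing in for a real geocoding service.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "South Brisbane": (-27.48, 153.02),
    "Spring Hill": (-27.46, 153.02),
}


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: attach coordinates; set aside rows that cannot be located."""
    located, rejected = [], []
    for row in rows:
        coords = GAZETTEER.get(row["suburb"])
        if coords is None:
            rejected.append(row)
        else:
            located.append((row["name"], row["suburb"], coords[0], coords[1]))
    return located, rejected


def load(conn, located):
    """Load: write the located rows into a new data store (SQLite here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS places "
        "(name TEXT, suburb TEXT, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", located)
    conn.commit()


def run_etl(text, conn):
    """Run all three steps; return counts of loaded and rejected rows."""
    located, rejected = transform(extract(text))
    load(conn, located)
    return len(located), len(rejected)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    loaded, skipped = run_etl(RAW_CSV, connection)
    print(f"loaded {loaded} rows, skipped {skipped}")
```

Even this toy version has to decide what happens to a record with no matching location, which hints at why hand-written scripts for such jobs can become brittle as the data sources multiply.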

About Chris Woods

Chris Woods (@tophermwoods) is the Tech Street Journal's Editor-in-Chief. He lives in Brisbane, has worked in places like Sydney and New York (State of), and will someday update his media-news blog.