Anna Frazzetto's Blog

Digital Innovations and Technology Solutions

Harvey Nash USA Webinar: The Big Opportunities of Big Data

It's 2011 - Do You Know What All Your Machine Data Is Worth?


Last week, I had the pleasure of welcoming Doug Harr, CIO of Splunk, to the Harvey Nash webinar stage. In a presentation called "The Big Opportunities of Big Data," Harr opened by continuing a discussion that recurred at several of our Harvey Nash CIO Survey forum events this fall.

All year long, our CIO Survey forums have been addressing the importance of innovation. The Harvey Nash CIO Survey took a deep look at innovation and what it takes to be an innovation CIO who not only contributes to the ground-breaking work that their companies do, but leads and shapes it. The question for so many IT leaders at our forum events was this: How do I become a bigger force for innovation within my business? Doug Harr gave a compelling answer in his presentation.

Pointing out that the most important letter in CIO is the I (representing intelligence and innovation), Doug introduced our audience to big data and the growing innovation opportunity it represents to businesses of all sizes and their senior IT practitioners. One of his most elegant points was to remind CIOs that to be successfully innovative they must break out of the back office and focus on the customer-facing world.

Big data (also called machine data), he explained, offers CIOs a unique pathway to this customer-facing world. Every day businesses produce awesome amounts of machine data, and the vast majority of it is unstructured. That means it can't be measured, analyzed or used to improve the business unless today's CIOs decide to harness this powerful business intelligence resource. Doug explained how businesses can better understand customer activities and behavior, identify patterns of usage and revolutionize their transaction visibility simply by tapping into their ever-growing big data resources.

Based on the questions we heard at the end of the webinar, I know that many attendees were both excited and overwhelmed by the opportunity of big data. And that's what any innovation opportunity should be: challenging enough to test you and valuable enough to reel you in.

I want to thank Doug for taking the time to share his insights and experience with our webinar audience and being a part of the Harvey Nash learning series and CIO Survey forums. For those of you who would like to learn more about big data, I invite you to review Doug's presentation below.

The Big Opportunities of Big Data: It's Unstructured, It's Unwieldy, It's an Unreal Business Opportunity



Harvey Nash October Webinar Q&A Transcript

On October 27, 2011, Harvey Nash's SVP of Technology Solutions, Anna Frazzetto, hosted a webinar for IT professionals, including senior IT leaders. Anna was joined by Doug Harr, CIO of Splunk, to discuss big data - what it is and how to use it to your advantage. The presentation concluded with a short Q&A session; the following is a transcript of the session.

Q: Is there a size of organization that this [Splunk] is best suited for?

DOUG: I do not think that it is related to the size of the organization. We have had customers that have just downloaded and used a free product, and we have people who are paying for terabytes of indexing a day. I think you can make use of this, no matter what the size of your business is.

Q: Are there any examples of unstructured workforce data that drive operational excellence?

DOUG: As you look at what can be done with this information, you can take the data that employees are generating in your systems and gain insight into how they are using those systems. You might have a professional services automation system that employees are logging into and using, or you might have a help desk solution. We are working on implementing ServiceNow (http://www.service-now.com/) as our help desk tool, and we are looking to feed its data into Splunk. The idea is that if there are any dashboards or reports that ServiceNow does not provide, we might be able to add them. One example is marrying up the help desk phone line information, like who is calling and how long it is taking, with how long a ticket is taking to close out.

Q: You did mention defining the problem of unstructured data and then you presented some examples where companies were able to structure the data into dashboards; the piece that I don't understand is how you were able to derive those answers out of this bucket of unstructured data, and the process to get there? I am assuming it takes iterations, and maybe a lot of people.

DOUG: Yes, there are iterations, though not necessarily a lot of people. In my own implementations of Splunk, we have system admins who help when we want to forward additional log data, events or machine data into Splunk; that is one step. In the next step, Splunk indexes the information. It lines it all up by date and time, across different functions, and gives you a way to start searching through and playing with the information. Then you can look for key values, and as soon as you find one and define it, it is available from that point on for use in named queries, searches and reports. You do refine the data over time, but you can also start search and discovery immediately. So you start with this unstructured or semi-structured information, enrich it as you go through the process, and discover what you want to do with it.
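
As a rough illustration of the steps Doug describes (line events up by date and time, then pull key values out of the raw text as you discover them), here is a minimal Python sketch. It is not Splunk's implementation or API; the log lines, field names and the helper functions index_events and extract_fields are all made up for the example.

```python
import re
from datetime import datetime

# Hypothetical raw log lines from two different systems (made-up data).
RAW_EVENTS = [
    "2011-10-27 12:00:05 web action=purchase product_line_id=12345 status=200",
    "2011-10-27 11:59:58 helpdesk ticket=881 state=opened",
    "2011-10-27 12:00:01 web action=add_to_cart product_line_id=12345 status=200",
]

TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def index_events(lines):
    """Attach a parsed timestamp to each raw line and order events by time."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines with no recognizable timestamp
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        events.append({"time": when, "raw": line})
    return sorted(events, key=lambda event: event["time"])


def extract_fields(event):
    """Pull key=value pairs out of the raw text when the event is searched."""
    return dict(KEY_VALUE.findall(event["raw"]))


indexed = index_events(RAW_EVENTS)

# Once a field such as "action" has been discovered, it can be reused in searches.
purchases = [e for e in indexed if extract_fields(e).get("action") == "purchase"]
```

The raw text is kept untouched and the fields are only extracted when an event is searched, which is one way to let the "structure" grow over several iterations rather than being fixed up front.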

Q: Can you compare your solution more closely with Hadoop?

DOUG: Splunk is fundamentally different from Hadoop in many ways. Hadoop is used by developers to store big data streams from their Apache web code for later use by custom-coded routines for that website or other related purposes. Splunk can now also ingest and do analytics on Hadoop data, or use Hadoop as a cold tier of long-term storage for our big data streams, although Splunk itself is more of a tool for analytics on top of machine data.

We do have customers who are using both. Actually, just this week, Splunk announced the planned availability of a new software package called Splunk Enterprise™ with Hadoop. This new offering will include Splunk Enterprise™, the Splunk Hadoop integration layer and Apache™ Hadoop™.

The Splunk Hadoop integration layer will provide more than just point-to-point connectivity, with support currently planned for the following operations:
• Issuing MapReduce queries or higher-level queries (using Pig or Hive, for example) from the Splunk search language, and pulling the resulting data sets back into Splunk
• Indexing the output of Hadoop jobs in Splunk
• Indexing data stored in HDFS in Splunk
• Delivering data from Splunk to HDFS
• Calling Splunk APIs directly from Hadoop jobs


Q: How do you tie unstructured taxonomy to structured taxonomies for big data BI?

DOUG: Unstructured data sent into Splunk can be examined to determine the name/location of meaningful fields, whether they are actual data such as a product line="Garments" or a product line ID such as "12345." With Splunk, you can then call out to a database or other data source to pull in lookup values. So, for instance, you could look up the product line of "Garments" in Siebel, matching to id=12345. In that case, you would have taken an "unstructured" log file entry and pulled in data from a structured source to give you activity by human-readable categories (e.g., how many people dumped their shopping cart of garments on 11/11/11 at noon?).
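
The lookup idea in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PRODUCT_LINES dictionary stands in for a structured source such as the Siebel lookup Doug mentions, and the events, the "cart_abandoned" action name and the helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical lookup table standing in for a structured source (e.g. a CRM).
PRODUCT_LINES = {"12345": "Garments", "67890": "Footwear"}

# Hypothetical events already reduced to extracted fields (made-up data).
EVENTS = [
    {"action": "cart_abandoned", "product_line_id": "12345"},
    {"action": "cart_abandoned", "product_line_id": "12345"},
    {"action": "purchase", "product_line_id": "67890"},
    {"action": "cart_abandoned", "product_line_id": "99999"},
]


def enrich(event, lookup):
    """Return a copy of the event with a human-readable product line added."""
    enriched = dict(event)
    enriched["product_line"] = lookup.get(event["product_line_id"], "unknown")
    return enriched


def abandoned_by_product_line(events, lookup):
    """Count abandoned carts per human-readable product line."""
    counts = Counter()
    for event in events:
        if event["action"] == "cart_abandoned":
            counts[enrich(event, lookup)["product_line"]] += 1
    return counts


counts = abandoned_by_product_line(EVENTS, PRODUCT_LINES)
```

IDs that are missing from the lookup table are reported as "unknown" instead of being dropped, so gaps in the structured source stay visible in the report.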


Q: How do you analyze market research data in local PCs and on the data center/servers in an organization to provide instant insights and analysis?

DOUG: We can Splunk data from PCs, servers and the like and correlate it in the same Splunk instance. Although it depends on what form the market research data is in, most data can be forwarded into Splunk for later reporting. We can pull in CSV files, for instance, and treat them like "log entries" or "events" for use in Splunk. Or, for example, we can put these into lookup tables, taking CDRs and looking up rates for a mobile phone company.
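
To make the CDR example concrete, here is a small Python sketch that treats each CSV row as an event and looks up a rate for it. It is an assumption-laden illustration, not how Splunk ingests CSV files: the column names, phone numbers, rates and helper functions are all invented for the example.

```python
import csv
import io

# Hypothetical call detail records (CDRs) in CSV form; all values are made up.
CDR_CSV = """caller,destination,minutes
555-0100,domestic,10
555-0101,international,4
555-0100,international,3
"""

# Hypothetical lookup table of rates, in cents per minute.
RATE_CENTS_PER_MINUTE = {"domestic": 5, "international": 25}


def read_events(text):
    """Treat every CSV row as one event, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def bill_cents(events, rates):
    """Look up the rate for each call and total the cost per caller."""
    totals = {}
    for event in events:
        cost = int(event["minutes"]) * rates[event["destination"]]
        totals[event["caller"]] = totals.get(event["caller"], 0) + cost
    return totals


totals = bill_cents(read_events(CDR_CSV), RATE_CENTS_PER_MINUTE)
```

Amounts are kept in whole cents to avoid floating-point rounding in the totals.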