Tuesday, December 29, 2009

2009 Retrospective

I was thinking about top data mining trends in 2009, and searched for what others thought about it. I'll combine a few 2009 "top 3" lists here, including top trends (as described at Enterprise Regulars here), and posts here that generated the most buzz.

First, the top data mining news story was IBM's purchase of SPSS. It will be very interesting to see if this continues the trend toward integration of Business Intelligence and Predictive Analytics that one sees with SAS, Tibco and now IBM/SPSS.

The Enterprise Regulars post included a few interesting 2010 trends (predictions rather than retrospectives, but data mining is all about using historical data to predict future behavior, assuming past behavior will continue). In particular, four of them were of interest to me:
  1. The holy grail of the predictive, real-time enterprise (his #2)
  2. SaaS / Cloud BI Tools will steal significant revenue from on-premise vendors but also fight for limited oxygen amongst themselves. (his #5)
  3. Advanced Visualization will continue to increase in depth and relevance to broader audiences. (his #7)
  4. Open Source offerings will continue to make in-roads against on-premise offerings. (his #8)
I agree with his #2 and #7 (integration of BI/PA and visualization). Several customers I work with are trying to integrate predictive analytics into the database to make better decisions. The difference now is that there is also interest in integrating this process with other data-centric (BI) operations to provide the right information to decision-makers with the right level of granularity (detail). This is typically a combination of creating the ability to perform ad hoc queries along with examining the results (rankings and projections) from predictive analytics.

However, I have not seen cloud computing and open source take off from the perspective of the customers I work with. Both have certainly generated buzz, and in the courses I teach there is considerable interest in open source computing (R in particular), but so far it has been interest rather than action. I expect, though, that as the allure of data mining and predictive analytics extends its reach deeper into organizations, the need for inexpensive tools (in dollars) will result in increased use of open source and free tools such as R, RapidMiner, Weka, Tanagra, Orange, Knime, and others.

Lastly, from this blog, the top posts of 2009 were:
  1. Why normalization matters with K-Means
  2. How many software packages are too much?
  3. Data Mining: Does it get any better than this?
  4. Text Mining and Regular Expressions

Happy New Year!

Tuesday, December 15, 2009

Overlap in the Business Intelligence / Predictive Analytics Space

I've received considerable feedback on the post Business Intelligence vs. Business Analytics, which has also caused me to think more about the BI space and its overlap with data mining (DM) / predictive analytics (PA) / business analytics (BA). One place to look for this, of course, is with Gartner, how they define Business Intelligence, and which vendors overlap between these industries. (I think of this in much the same way as I do DM; I look to data miners to define themselves and what they do rather than to other industries and how they define data mining).

I found the Gartner Magic Quadrant for Business Intelligence in 2009 here, and was very curious to understand (1) how they define BI, and (2) which BI players are also big players in the data mining space. Answering the first question, data analysis in the BI world is defined here as comprising four parts: OLAP, visualization, scorecards, and data mining. So DM in this view is a subset of BI.

Second, the key players in the quadrant interestingly include only a few vendors I would consider to be top data mining vendors: SAS, Oracle, IBM (Cognos), and Microsoft in the "Leaders" category, and Tibco in the "Visionaries" category. Of these, only SAS (with Enterprise Miner) and Microsoft (SQL Server) showed up in the top 10 of the Rexer Analytics 2008 software tool survey, though Tibco showed up in the top 20 (with Tibco Spotfire Miner).

I think this emphasizes again that BI and DM/PA/BA approach analysis differently, even if the end result is the same (a scorecard, dashboard, report, or transactional decisioning system).

Sunday, December 06, 2009

Business Analytics vs. Business Intelligence

I used to be one that thought the term "data mining" would stay as the description of the kind of analytic work I do. To a large degree it has, but there are always new spins on things, and it seems that quite often in the business world, Predictive Analytics or Business Analytics are the terms of the day.

I just came across this post from the Smart Data Collective: OLAP is Dead (Long Live Analytics), which had some fascinating graphs on hits related to the phrases OLAP and Analytics. The first shows the steady decline of OLAP as a searched term to the point where even the OLAP report has been renamed to The BI Verdict. Meanwhile, "analytics" has been increasing steadily in hits. SAS even touts itself as a leader in "Business Analytics" now.

Which brings me to the question in the title of this post. It seems to me that Business Intelligence has taken over the role that OLAP and dashboarding used to take on (at least in the circles I worked in). Is there a difference between Business Intelligence and Business Analytics? James Taylor, someone whom I respect tremendously, doesn't think so.
"As SAS talked about its business analytics framework it became clear that they envision the results of data mining and predictive analytics (where they genuinely have offerings superior to almost everyone) will be delivered in reports or dashboards. This is what I have somewhat dismissively called 'predictive reporting' and while it is better than purely historical reporting, it does not do much to make every decision analytically based as it leaves out the decisions made by machines (which don't read reports) and those made by people with too little time to read a report (most call center or retail staff, for instance) or no skill at interpreting it.

"I guess I just don't see the difference between BI and BA..."

If all of business analytics is reduced to "predictive reporting", then I can see why some might consider it no more than business intelligence. But even so, are they the same? I am not asking whether the results are the same. For that matter, the final decisions from analytics for, say, classification look just like human decisions (buy or not buy? fraud or not?). But is the process the same? I would argue "no". Much of the power of predictive analytics comes from automating the search for and assessment of nonlinearities, interaction effects, and combinatorics relating observables to outcomes. So, rather than assessing these manually, one automates the process through the use of "decision trees", "neural networks", or some other algorithm. The difference lies in the efficiency of the process.

Now how the predictive information is used, in a report, as part of an automated system or in some other way, is a critically important question, but independent of how the decisions are generated.
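
As a rough illustration of the automated search for interaction effects described above, here is a minimal Python sketch. It is not a decision tree or a neural network; it is a brute-force stand-in that tries every single feature and every pair of features and scores how well each predicts the outcome. The data set, the 0.5 threshold, and the function names (make_data, cell_accuracy, search) are all hypothetical, made up for this example. The outcome is built as an XOR-style interaction between features 0 and 1, so neither feature is useful on its own.

```python
import itertools
import random


def make_data(n=400, seed=0):
    """Hypothetical data: 4 uniform features, outcome is an XOR of features 0 and 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(4)]
        y = int((x[0] > 0.5) != (x[1] > 0.5))  # pure interaction effect
        rows.append((x, y))
    return rows


def cell_accuracy(rows, features, threshold=0.5):
    """Split the rows into cells by thresholding the chosen features,
    predict the majority class in each cell, and return the accuracy."""
    cells = {}
    for x, y in rows:
        key = tuple(x[f] > threshold for f in features)
        counts = cells.setdefault(key, [0, 0])
        counts[y] += 1
    correct = sum(max(counts) for counts in cells.values())
    return correct / len(rows)


def search(rows, n_features=4):
    """Score every single feature and every pair of features."""
    results = {}
    for k in (1, 2):
        for feats in itertools.combinations(range(n_features), k):
            results[feats] = cell_accuracy(rows, feats)
    return results


rows = make_data()
results = search(rows)
best = max(results, key=results.get)

# Print candidates from best to worst
for feats, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(feats, round(acc, 3))
print("best candidate:", best)
```

In this toy setup the pair (0, 1) should come out on top, while each feature taken alone looks close to a coin flip. An analyst checking variables one at a time would miss the relationship entirely; the automated search finds it without being told where to look, which is the efficiency argument in a nutshell.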

Tuesday, December 01, 2009

Computer Science and Theology

I have been reading a book by Don Knuth called Things a Computer Scientist Rarely Talks About (Center for the Study of Language and Information - Lecture Notes)--a very good read for those of you interested in theology as well as analytics. This post is not about the theology of the book (as interesting as that is to me), but rather the reason described in this book for his writing of another book called 3:16, a study of all the 3:16 verses in the Bible. In his chapter on randomized testing (I like to think of model ensembles here), he describes how random sampling is a good way to get an idea of the content of "stuff", whether computer science assignments (he actually does this--randomly take page X of a project and look at that in depth), or understanding books (like the Bible). His 3:16 book takes this verse from every book in the Bible to get a sense of the overall message of the Bible. He admittedly chose 3:16 because of John 3:16 so that he would get at least one great verse, but this was a concession to making the book marketable.

At first I wasn't a big fan of this idea. After all, it is a small sample. But he describes how he then studied these verses in depth. Whereas his prior understanding of the Bible was vague and general (which has its positive points), this exercise also led to a deeper (albeit narrow) understanding. I recommend this approach very much.

What does this have to do with analytics? Data Mining often is viewed as a way to get the gist of your data, see the big picture, understand patterns through summarized views. But just as important is the deep view, looking at a few examples (prototypes) in depth. In the text mining project I'm working on right now, while we extract "concepts", much of our time is also spent tracing a few text blocks through the processing to understand in detail why the analytics is working the way it does. I'm a "both / and" kind of guy, so this suits me well; big picture analytics as well as deep dives into record-level descriptions.
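
The "both / and" approach above can be sketched in a few lines of Python: one function for the summarized view, and one that pulls a small random sample of records to read in depth, in the spirit of Knuth's "page X" sampling. The records, field names, and function names here are hypothetical placeholders, not data from the text mining project.

```python
import random
import statistics

# Hypothetical records standing in for text blocks; "length" is a made-up field
records = [{"id": i, "length": 50 + (i * 37) % 200} for i in range(1000)]


def big_picture(rows):
    """Summarized view: a few aggregate statistics over all records."""
    lengths = [r["length"] for r in rows]
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def deep_dive_sample(rows, k=5, seed=316):
    """Deep view: a small random sample of records to trace by hand.
    A fixed seed makes the same records come back on every run."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


summary = big_picture(records)
sample = deep_dive_sample(records)

print("summary:", summary)
for record in sample:
    print("inspect in depth:", record)
```

Fixing the seed is a deliberate choice in this sketch: the same handful of records can then be traced through each processing step again after the processing changes.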