to analyze quantitative data and produce information that monitors business performance. The analyses may be summaries or drill-downs that present details on subsets of data. More broadly, business intelligence can include any information, such as articles and reports, that offers insights into an industry or company (see sidebar page 9). Quantitative data and text information have usually been handled separately, but quantitative analysis is now being paired with text analysis to achieve a deeper understanding than either can provide alone.
After evaluating several solutions, Budge selected Clementine, a predictive modeling tool from SPSS (spss.com). The flexibility of Clementine was appealing. It supports a wide range of analytic techniques, including regression, neural networks and decision trees. In addition, it has clustering tools that group customers according to multiple behavioral variables.
The intended users of Clementine were not statistical analysts, but business sales and marketing analysts who have a good understanding of the business and the associated data.
The modeling component lets the employees define analyses that predict which customer clusters are likely to buy which kinds of products. The marketing department can then target its campaigns more precisely, improving ROI. Being able to predict customer behavior is a key goal of those analyses. One model developed using Clementine identified a group of customers that was three times more likely than average to develop bad debts early in the customer life cycle. The analysis would enable the company's debt teams to manage those customers differently, for example by offering products that limit debt risk, such as prepayment.
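The "three times more likely than average" figure is what analysts usually call lift: a segment's rate divided by the overall rate. The sketch below shows that arithmetic only. It is not Clementine's method, and the segment labels, the `segment_lift` helper and the data are all invented for illustration.

```python
from collections import defaultdict


def segment_lift(records):
    """Return {segment: lift}, where lift is the segment's bad-debt rate
    divided by the overall bad-debt rate.

    `records` is an iterable of (segment, had_bad_debt) pairs.
    The function name and record shape are hypothetical.
    """
    totals = defaultdict(int)
    bad = defaultdict(int)
    for segment, had_bad_debt in records:
        totals[segment] += 1
        bad[segment] += int(had_bad_debt)

    n_all = sum(totals.values())
    n_bad = sum(bad.values())
    if n_bad == 0:
        raise ValueError("no bad-debt cases; lift is undefined")

    # (bad_s / total_s) / (n_bad / n_all), rearranged so that only one
    # floating-point division is performed.
    return {s: (bad[s] * n_all) / (totals[s] * n_bad) for s in totals}


# Made-up data: segment A has a 30% bad-debt rate, B and C have 5% each,
# so the overall rate is 50 / 500 = 10%.
records = (
    [("A", True)] * 30 + [("A", False)] * 70
    + [("B", True)] * 10 + [("B", False)] * 190
    + [("C", True)] * 10 + [("C", False)] * 190
)

lift = segment_lift(records)
print(lift)
```

With this invented data, segment A comes out at a lift of 3.0, which is the kind of group a debt team might choose to manage differently.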
"Traditional BI is about reporting the facts," says Olivier Jouve, VP of market strategy at SPSS, "but text mining explains more about why things are happening." Jouve developed the technology that allows analyses of structured and text data to be combined.
Text mining is a bottom-up approach that starts with the data, to see what it shows. Search is a top-down approach, most useful when the researcher has a direction for the inquiry. Trying to find the key words to detect customer sentiment can be difficult. Searching call center notes, for example, may not be particularly revealing.
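The contrast between the two approaches can be sketched in a few lines. In the sketch below, `search` is the top-down case, where the analyst must guess a keyword, and `top_terms` is a very crude bottom-up stand-in that simply reports which terms appear in the most notes. Real text mining tools do far more than count words; the call center notes, the stopword list and both function names are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "about", "again", "the", "and", "since", "every"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def search(notes, keyword):
    """Top-down: return only the notes containing a keyword chosen up front."""
    kw = keyword.lower()
    return [n for n in notes if kw in tokenize(n)]


def top_terms(notes, k=3):
    """Bottom-up: return the k terms that occur in the most notes."""
    counts = Counter()
    for note in notes:
        counts.update(set(tokenize(note)))  # count each term once per note
    return counts.most_common(k)


# Invented call center notes.
notes = [
    "Customer called about the router dropping connection again",
    "Router dropping connection every evening, customer wants refund",
    "Billing question, resolved",
    "Connection dropping since router firmware update",
]

print(search(notes, "angry"))  # a guessed sentiment keyword
print(top_terms(notes))        # terms surfaced from the data itself
```

In this made-up sample, a search for a guessed sentiment word finds nothing, while the term counts point to the router, dropping and connection complaints that no one thought to search for.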
Fruitful results
Medical data is another environment in which combined analyses of structured and unstructured data can prove fruitful. At the University of Louisville (louisville.edu), a team of researchers headed by Dr. Patricia Cerrito is using SAS Text Miner from SAS (sas.com) to analyze data from area hospitals. Analyses of text records gathered from medication orders and chart notes are helping to explain the relationship between physician practices and patient outcomes.
The ability of Text Miner to find patterns in clinical reports and other medical documents and to provide quantitative analyses of text is valuable in the research at Louisville. In addition, the close integration of Text Miner with Enterprise Miner provides an easy way to combine analyses of structured and unstructured data.
Mining structured data and unstructured text is something SAS customers have applied to many different industry needs, says Mary Crissey, SAS product marketing manager for data mining and text mining.
For example, American Honda (honda.com) now uses SAS Text Miner to monitor warranty claims, in order to detect early warning signs of engineering problems. Honda analyzes text from call centers, technician feedback and other areas across its dealer network to find patterns in the records that may be early indications of potential problems. Then, Honda engineers can investigate further to pinpoint the root cause of the issue.
Attention to text-based feedback as part of early-warning analysis is becoming essential for manufacturers that want to identify potential issues and resolve them before they snowball into larger, more expensive problems.
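One simple way to picture an early-warning signal in text is to compare how often a term appears in recent records against an earlier baseline. The sketch below does only that. It is not how SAS Text Miner or Honda performs the analysis, and the claim texts, thresholds and function names are invented for illustration.

```python
import re
from collections import Counter


def term_doc_counts(claims):
    """Count, for each term, how many claim texts mention it."""
    counts = Counter()
    for text in claims:
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


def emerging_terms(baseline, recent, min_ratio=3.0, min_recent=2):
    """Flag terms whose share of recent claims is at least `min_ratio`
    times their (smoothed) share of baseline claims.

    Both thresholds are arbitrary illustrative defaults.
    """
    base = term_doc_counts(baseline)
    now = term_doc_counts(recent)
    flagged = {}
    for term, n in now.items():
        if n < min_recent:
            continue  # ignore terms seen only once in the recent window
        # Add-one smoothing so terms absent from the baseline
        # do not cause a division by zero.
        base_rate = (base[term] + 1) / (len(baseline) + 1)
        recent_rate = n / len(recent)
        ratio = recent_rate / base_rate
        if ratio >= min_ratio:
            flagged[term] = round(ratio, 2)
    return flagged


# Invented warranty-claim notes.
baseline = [
    "brake pads worn", "windshield chip", "brake squeal", "battery dead",
    "paint scratch on door", "battery replaced", "wiper blade torn",
    "radio static",
]
recent = [
    "transmission slips in second gear",
    "transmission shudder at low speed",
    "battery dead",
    "transmission fluid leak",
]

flagged = emerging_terms(baseline, recent)
print(flagged)
```

With this made-up data, only "transmission" is flagged, which is the sort of pattern an engineer would then investigate for a root cause.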
SAS has traditionally been strong in analytics; its Enterprise Miner product is used to mine structured data. In order to apply some of the same skills to text, the company turned to Inxight (inxight.com), which specializes in discovery and visualization of text information.
"Using Inxight's technology in our Text Miner product allowed us to analyze text for concepts using some of the same algorithms we had developed for Enterprise Miner," says Crissey. "We added some graphical interfaces that allow visualization of patterns found in the text, ranging from basics like word counts to a more sophisticated understanding of word usage that might indicate a specific predictive trend."
The issue of combining analyses of structured and unstructured data to provide more meaningful pictures of business and technical information is receiving increasing attention.
With more non-technical people (whether internal to the company, customers or supply chain partners) now seeking information, the ability to find information without knowing its location has become critical.
One interface that users know and are comfortable with is that of Google (google.com), so IBI opted to integrate that search technology with its BI solution, webFOCUS, to create its Intelligent search tool. That top-down strategy does require the user to know the search target, as opposed to a text mining situation in which the software discovers patterns. However, for actions such as finding all the information on a particular customer, the combination of structured BI and a search engine offers advantages. Besides offering a familiar interface, it lets users find the data regardless of which repository it resides in, in contrast to situations where the data must be in a dedicated warehouse. IBI's iWay Software integration tool provides adaptors to many structured databases and document management repositories, making everything accessible.
By the end of this year, IBI expects to have a template that presents BI reports and relevant unstructured content within one interface.
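The idea of finding everything about one customer, whichever repository holds it, can be sketched as a single query run over several sources. The sketch below uses in-memory lists as stand-ins for a database and a document store. It does not use or represent webFOCUS, iWay or Google's search technology, and the repository names, records and `search_all` function are invented.

```python
def search_all(query, repositories):
    """Return (repository_name, record) pairs whose field values
    contain the query, ignoring case.

    `repositories` maps a name to a list of dict records; this shape
    is an assumption made for the sketch.
    """
    q = query.lower()
    hits = []
    for name, records in repositories.items():
        for record in records:
            text = " ".join(str(value) for value in record.values())
            if q in text.lower():
                hits.append((name, record))
    return hits


# Invented stand-ins for a structured database and a document repository.
repositories = {
    "orders_db": [
        {"customer": "Acme Corp", "order_id": 1001, "total": 2500.0},
        {"customer": "Globex", "order_id": 1002, "total": 480.0},
    ],
    "support_docs": [
        {"title": "Call note", "body": "Acme Corp reported a late delivery"},
        {"title": "Email", "body": "Globex asked about volume pricing"},
    ],
}

hits = search_all("acme corp", repositories)
for source, record in hits:
    print(source, record)
```

Here one query returns both the structured order row and the free-text call note, each tagged with the repository it came from.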
Although the practice of combining quantitative analytics and text analytics has not yet been widely adopted, it is becoming increasingly feasible, thanks to advances in technology, and is likely to see greater use in the relatively near term.