Smart Companies Dig Data

November 2007
By Maryann Lawlor

Mining medical data in the private sector presents several challenges because of liability and privacy concerns. Some of these issues also exist in the military medical environment; however, medical information routinely is being collected digitally in the battlespace today.
New opportunities surface when organizations capitalize on captured client information.

Boosting business by improving customer service requires a bit of digging, but the information gold mine already is in place. With the aid of a few algorithms, companies are excavating data to unearth insights about their customers that emerge when small particles of information are fused into a gold nugget.

Corporations need only be willing to invest some time burrowing into the data they collect about their clients every day. Charlie Berger believes that it is the combination of digging and fusing that sets some firms apart from their competitors. Most companies in the same line of business already understand the basics about their customers, he says, but organizations that take the time to aggregate their client data know how to offer customers just what they may be looking for even before they know it themselves. The senior director of product management, Life Sciences and Data Mining, at Oracle Corporation, Redwood Shores, California, Berger says that so few companies are capitalizing on their customer information that the opportunity to pull ahead of the competitive pack is wide open, and so are the opportunities to create innovative data mining tools.

One indication that companies are not taking full advantage of data mining can be found in the banking industry’s automated teller machines (ATMs). Although financial institutions have a plethora of information about their customers, the first screen on many ATMs generally asks whether the user wants to see information in English or Spanish. Berger points out that financial institutions should certainly have that information based on previous transactions and should incorporate it into their user interfaces.

According to Berger, one of the reasons that more companies do not fully employ data mining tools is that they do not understand the benefits these tools offer. Usage is a matter of the harmonic balance between the level of effort it takes to implement the solutions and the benefit the companies will gain from having implemented them, he points out. In many cases, the companies are not aware of the benefits they could reap.

Most consumers are familiar with companies that are doing data mining right, Berger relates. For example, some financial institutions have linked their various customer information silos such as credit cards, retirement funds and college savings accounts. Service representatives can see the across-the-board information that prompts them to make suggestions about each of these items during any customer interaction. “Now when you call a business, you get product information about the full range of relationships you have with that firm. That’s the starting point for data mining in companies, which for some of them is profound,” Berger explains.

Online companies take this type of data mining a step further. By creating predictive models based on a customer’s purchasing data already on file, these firms can offer suggestions about other products a buyer may be interested in. Online bookselling is one example of this data mining usage. When e-customers purchase a book, they immediately see other books that might interest them based on the data the company already has collected about customers who have bought the item.
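The "customers who bought this item" idea can be illustrated with a simple co-purchase count. This is only a minimal sketch: the purchase histories, item names and the also_bought function are hypothetical, and production recommendation systems rely on far richer predictive models than a tally.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of items bought.
orders = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b"},
    "cat": {"book_a", "book_d"},
    "dan": {"book_c", "book_d"},
}

def also_bought(item, orders, top_n=3):
    """Rank other items by how many buyers of `item` also bought them."""
    counts = Counter()
    for basket in orders.values():
        if item in basket:
            counts.update(basket - {item})
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

suggestions = also_bought("book_a", orders)
print(suggestions)
```

Two of the three buyers of book_a also bought book_b, so it ranks first; the remaining titles tie with one co-purchase each.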

Businesses do not need their customer data at the level of detail that would allow them to make specific product suggestions such as movie or book titles, Berger shares. “Even though some companies may not know much about me, if they see that my title is doctor, for example, they may suggest a jumbo home loan because their data shows that the probability is that people with the title doctor are more interested in one. Or as a customer, I might even want them to use my information to recommend something and send me information about it. They may have a model based on a number of different characteristics about customers like me, and a byproduct for the companies is cross-selling opportunities,” he offers.

This type of focused advertising and “up-selling” may be in the rudimentary stage now, but Berger says it is not out of the realm of possibility to one day see directed advertising at the level portrayed in the futuristic movie Minority Report, in which advertisements in public areas changed based on who was passing by them. In fact, Oracle is now working with a cell phone company that is considering introducing ads on its service targeted toward what the company has learned to be a customer’s specific interest, he relates.

Berger explains that this type of focused advertising is made possible by building on classical data mining, which begins with a two-dimensional information table. The initial table would list the customers’ names augmented with transactional data, such as what the customers buy and how often they buy it. The result is a profound improvement for the seller “that’s new and different,” he says.

But data mining can go beyond who is buying what. Berger shares that data can be used to add context to other information. In legal claims, for example, insurance companies can use data mining to predict which customer transactions may be fraudulent. Using next-generation technology, data from police reports or other official sources can be matched with claims information. In this case, merging two pieces of information can help companies more easily spot claims that need to be reviewed more closely or customers who may be trying to defraud the company.

“We can find the words that are suspicious or excessive or even words that indicate certain conditions. I was told they [insurance companies] flag claims when they see submissions that say ‘fractured wrist’ or ‘broken femur.’ An accident report and claims submission that includes broken femur is something that they want to route through an analyst at a higher pay grade to review,” Berger explains.
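The flagging Berger describes can be sketched as a phrase match over a claim's free text. The two phrases come from his example; the claim records, the WATCH_PHRASES list and the flag_claim function are hypothetical, and a real text mining tool would handle misspellings, synonyms and context rather than exact matches.

```python
import re

# Watch list: the two phrases quoted in the article. Everything else is assumed.
WATCH_PHRASES = ["fractured wrist", "broken femur"]

def flag_claim(text, phrases=WATCH_PHRASES):
    """Return the watch-list phrases that appear in a claim's free text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    return [p for p in phrases if p in normalized]

# Hypothetical claim submissions keyed by claim number.
claims = {
    "C-001": "Rear-end collision. Driver reports a Broken\nfemur and whiplash.",
    "C-002": "Minor scratch on bumper, no injuries reported.",
}

# Only claims with at least one hit are routed to a senior analyst.
needs_review = {}
for claim_id, text in claims.items():
    hits = flag_claim(text)
    if hits:
        needs_review[claim_id] = hits

print(needs_review)
```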

One way context can be added to information is by looking at unstructured data. Using a clustering algorithm, massive amounts of data can be sorted into piles based on similar themes. A simple example is flagging the number of times the words war, peace, Iraq and Iran appear in a number of documents, which indicates the subject of those documents. Statistical modeling allows the user to notice the frequency of words, word pairs and word triplets and then, using a thesaurus, sort out and know, for instance, that both a car and a truck are vehicles and all vehicles fall in the category of transportation.
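The counting and thesaurus steps can be sketched in a few lines. This is not a clustering algorithm, only the word, word-pair and word-triplet tallies that such an algorithm might take as input. The sample sentence, the toy BROADER thesaurus and the function names are assumptions made for illustration.

```python
from collections import Counter
import re

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_counts(text):
    """Count words (1), word pairs (2) and word triplets (3) in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {n: Counter(ngrams(tokens, n)) for n in (1, 2, 3)}

# Hypothetical toy thesaurus mapping each term to its broader category.
BROADER = {"car": "vehicle", "truck": "vehicle", "vehicle": "transportation"}

def roll_up(term):
    """Follow the thesaurus from a term up to its broadest category."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain

doc = "The car passed the truck. The truck passed the car."
counts = term_counts(doc)
print(counts[1].most_common(2))
print(roll_up("car"))
```

Documents whose tallies roll up to the same categories would land in the same pile.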

Many researchers in the life sciences field are using this technique to pore through medical findings and data that may be readily available but dispersed in a multitude of locations. For example, medical literature can be explored using clustering algorithms that allow researchers to narrow down the number of documents they need to review. Berger compares this to conducting an Internet search but notes that the results are much more focused. “We’re talking about automating the harvesting of information out of both structured and unstructured data,” he says.

However, Berger points out that the use of data mining in the medical field is a complicated issue that is likely to get more complicated. Today, much of the patient information is not yet in digital format; however, the amount will grow as more doctors use laptops to record patients’ medical data. But while this will enhance the opportunities for data mining, it raises contentious issues such as privacy. In addition, as the amount of data concerning illnesses and treatments grows, Berger wonders if there may be a day when a patient’s family could hold a physician responsible for a misdiagnosis or death because the doctor did not mine the available data before choosing a course of treatment.

Whether it is the fear of lawsuits or a lack of understanding of the balance between the investment and benefit, Berger shares that many organizations are not convinced about how helpful data mining can be to their business until they see it demonstrated. And spreading the word has proved to require repetition. In more than one instance, Berger has met with clients who are very impressed with Oracle’s data mining tools but then fail to follow up to have them installed. He says that several years later, he has found himself at the same organizations demonstrating the latest capabilities to a new group of staff members who are once again impressed but often do not follow up with implementation. Instead, many administrators admit that when it comes to identifying fraud or creating new lines of business, humans most often carry out the data mining work, he relates.

Perhaps one of the reasons some companies are reluctant to take advantage of data mining is that they are not certain of the reliability of predictive models, Berger allows. However, data mining could be used to verify the veracity of results, he adds. By employing a feedback loop and champion-challenger predictive model, several different models can be compared. From this point on, a good-better-best system is used to define the confidence level of the resultant data.
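A champion-challenger comparison can be sketched as scoring each model against outcomes that come back through the feedback loop and keeping whichever performs best. The two rules, the feedback records and the pick_champion function below are hypothetical stand-ins, not Oracle's implementation.

```python
def accuracy(model, labeled):
    """Fraction of labeled examples the model predicts correctly."""
    hits = sum(1 for features, outcome in labeled if model(features) == outcome)
    return hits / len(labeled)

def pick_champion(champion, challengers, labeled, margin=0.0):
    """Keep the champion unless a challenger beats it by more than `margin`."""
    best_name, best_model = champion
    best_score = accuracy(best_model, labeled)
    for name, model in challengers:
        score = accuracy(model, labeled)
        if score > best_score + margin:
            best_name, best_score = name, score
    return best_name, best_score

# Two hypothetical rules predicting whether a customer will buy.
def rule_a(c):
    return c["visits"] >= 3

def rule_b(c):
    return c["visits"] >= 2 and c["bought_before"]

# Feedback loop: observed outcomes for customers the models scored earlier.
feedback = [
    ({"visits": 4, "bought_before": True}, True),
    ({"visits": 3, "bought_before": False}, False),
    ({"visits": 2, "bought_before": True}, True),
    ({"visits": 1, "bought_before": False}, False),
]

winner = pick_champion(("rule_a", rule_a), [("rule_b", rule_b)], feedback)
print(winner)
```

The margin parameter is a design choice: setting it above zero keeps the incumbent in place unless a challenger is clearly better, which avoids swapping models on noise.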

“In fraud, sometimes the accuracy that we get is really not so great at all. When we look at the actual accuracy numbers, they are not very impressive compared to my ability to predict whether you’re going to buy a bicycle or not. But in fraudulent cases, the value of taking the haystack of data on customer transactions and reducing that down to a much smaller pile of suspects to analyze is such a huge savings that it’s great,” he says.
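Some arithmetic shows why an unimpressive accuracy figure can still pay off. All numbers below are invented for illustration and do not come from Berger or Oracle: assume 100,000 claims of which 1,000 are fraudulent, and a model that flags 5,000 claims containing 600 of the fraudulent ones.

```python
def triage_summary(total, fraud, flagged, fraud_in_flagged):
    """Summarize how well a flagged pile concentrates the fraud cases."""
    precision = fraud_in_flagged / flagged   # share of flagged claims that are fraud
    recall = fraud_in_flagged / fraud        # share of all fraud that was flagged
    base_rate = fraud / total                # fraud rate with no model at all
    return {
        "precision": precision,
        "recall": recall,
        "lift": precision / base_rate,       # how much richer the flagged pile is
        "workload_cut": 1 - flagged / total, # claims analysts no longer review
    }

# Hypothetical figures; see the assumptions stated above.
summary = triage_summary(total=100_000, fraud=1_000, flagged=5_000, fraud_in_flagged=600)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions only about one flagged claim in eight is actually fraudulent, yet analysts review 95 percent fewer claims and work a pile roughly 12 times richer in fraud than the full haystack.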

Predictive modeling also has been used by the U.S. military to determine which officers would be the best leaders under battlefield conditions as well as the likelihood of military personnel to separate from the service, or churn rate. The latter application can be useful for service businesses to help them predict which customers may be planning to move to a different provider when their contracts have expired. “In the cellular telephone business, for example, they have a 25 percent churn rate. Just being able to anticipate that 20 percent or 30 percent of customers who might be leaving and to turn it around means great amounts of money,” Berger says.

“There are a lot of people who have started doing data mining as a whole, basic data mining. There are some people who are doing some really amazing things with data mining, and there are a lot of people who probably should be doing some of the amazing things,” he states.

Increasing the use of data mining also requires effort from companies that create data mining tools, Berger admits. To reach the harmonic balance between the level of effort and the benefits, vendors must make usage easier. “The benefits that companies get from data mining tools are so profound that I just have to believe that we’re going to see a mad rush to do the sorts of things that some companies are already doing,” Berger says.
