Programs capable of pattern recognition in personnel employment data add teeth to background checks.
A set of software and algorithms developed to identify criminal activity in the gambling industry is now available to the federal government to help detect employee fraud and collusion. The system correlates data from a variety of sources to shed light on questionable personal relationships and transactions. In the federal sector, this system’s potential uses cover internal security, background investigations and intelligence gathering.
Fraud costs U.S. businesses an estimated $400 billion a year, according to the Association of Certified Fraud Examiners. In the government, the cost of security breaches can go beyond dollar amounts, as the Aldrich Ames case shows. As a Central Intelligence Agency officer in the 1980s and 1990s, Ames provided the Soviet Union and Russia with vital information about U.S. spy networks in Eastern Europe. In return, he was paid several million dollars, which funded an extravagant lifestyle that largely went unnoticed by his superiors.
Technologies that allow organizations to search for specific internal and external threats can help alleviate immediate security issues. In addition, by extracting strands of information from vast amounts of data, collusion detection software and algorithms give analysts and strategic planners a tool for developing policies and procedures that improve security.
Systems Research and Development (SRD), a Las Vegas, Nevada-based custom software firm, developed collusion detection technology (CDTECH) as a tool for conducting background checks. The product is available both as a software set and as a collection of algorithms. According to Jeff Jonas, SRD’s president and the technology’s inventor, the latest version of CDTECH has been licensed to Digital Data Development, Washington, D.C., a firm focused on the federal market.
At its most basic level, the technology allows an organization to create a database containing all available employee data. Information such as residence histories is cross-checked against a list of known gaming industry criminals to determine whether an employee has ever shared an address, phone number or bank account with such a perpetrator. Besides cross-referencing for criminal activity, the software also highlights questionable relationships, such as a supervisor and a direct subordinate who share the same home address. Depending on the client, the data can extend to any public domain information that yields, for example, address histories and lifestyle indicators.
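The two checks described above can be sketched in a few lines. This is a minimal illustration, not CDTECH’s implementation: the `Person` record layout and the function names `shared_identifier_hits` and `supervisor_cohabitation` are hypothetical, and identifiers are assumed to be already standardized. The watch list is indexed once so that each employee identifier is checked with a dictionary lookup.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record layout; CDTECH's real schema is not published.
@dataclass
class Person:
    person_id: str
    addresses: set = field(default_factory=set)
    phones: set = field(default_factory=set)
    bank_accounts: set = field(default_factory=set)
    supervisor_id: Optional[str] = None

IDENTIFIER_KINDS = ("addresses", "phones", "bank_accounts")

def shared_identifier_hits(employees, watchlist):
    """List (employee, watch-list person, kind, value) for every shared identifier."""
    index = defaultdict(list)  # (kind, value) -> watch-list person ids
    for w in watchlist:
        for kind in IDENTIFIER_KINDS:
            for value in getattr(w, kind):
                index[(kind, value)].append(w.person_id)
    hits = []
    for e in employees:
        for kind in IDENTIFIER_KINDS:
            for value in sorted(getattr(e, kind)):
                for wid in index.get((kind, value), []):
                    hits.append((e.person_id, wid, kind, value))
    return hits

def supervisor_cohabitation(employees):
    """List (supervisor, subordinate, address) where both report the same address."""
    by_id = {e.person_id: e for e in employees}
    hits = []
    for e in employees:
        boss = by_id.get(e.supervisor_id)
        if boss is not None:
            for addr in sorted(e.addresses & boss.addresses):
                hits.append((boss.person_id, e.person_id, addr))
    return hits

employees = [
    Person("E1", addresses={"12 ELM ST"}, phones={"555-0100"}),
    Person("E2", addresses={"12 ELM ST"}, supervisor_id="E1"),
    Person("E3", bank_accounts={"ACCT-9"}),
]
watchlist = [Person("W1", phones={"555-0199"}, bank_accounts={"ACCT-9"})]
print(shared_identifier_hits(employees, watchlist))  # [('E3', 'W1', 'bank_accounts', 'ACCT-9')]
print(supervisor_cohabitation(employees))            # [('E1', 'E2', '12 ELM ST')]
```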
It is the amount of money in gaming that brings out the opportunists, Jonas says. He notes that large sums of money also change hands in banking, but in that industry the outcome of every transaction is known in advance and can be verified. “In gaming, the outcome of every transaction is not certain. The house is probably going to win,” Jonas suggests. However, the possibility of beating the house opens the door for opportunists, because they can look gaming operators in the eye and claim they simply got lucky, he says. Some people genuinely win. But if a person comes in every Tuesday and wins $50,000, gaming management becomes suspicious that the player is either using an illegal method or has found a way to change the odds, Jonas explains.
Likewise, in government circles there are policies and procedures in place to protect valuable assets. But collusion, or the insider threat, renders those policies useless, Jonas observes. For example, if a security person charged with examining employees’ bags as they leave a facility is the roommate of someone working there, a potential threat may exist. The guard might be inclined to simply let a friend through, or both may be accomplices, he says.
Collusion detection works by correlating data from different sources. However, when the data set is medium or large, meaning more than a million names, the number of possible relationships can become overwhelming. Because that volume of information would quickly render a system useless, alert rules are introduced. These parameters define which relationships matter to the client.
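The scale problem and the role of alert rules can be made concrete. Among n names there are n(n − 1)/2 possible pairs, so a million names yield roughly half a trillion candidate relationships. The sketch below counts those pairs and then filters relationships through a small set of client-defined rules. The `Relationship` fields, the role labels and the rule names are hypothetical; the guard rule echoes the bag-check example in the article.

```python
from dataclasses import dataclass

def candidate_pairs(n):
    """Distinct pairs among n names: n * (n - 1) / 2."""
    return n * (n - 1) // 2

print(candidate_pairs(1_000_000))  # 499999500000

@dataclass(frozen=True)
class Relationship:
    a_role: str
    b_role: str
    link: str  # e.g. "shared_address", "shared_phone"

# Hypothetical client-defined alert rules: rule name -> predicate.
ALERT_RULES = {
    "linked_to_watchlist": lambda r: "watchlist" in (r.a_role, r.b_role),
    "guard_shares_home_with_staff": lambda r: (
        r.link == "shared_address" and {r.a_role, r.b_role} == {"guard", "staff"}
    ),
}

def apply_alert_rules(relationships, rules):
    """Keep only the relationships that at least one rule says matter."""
    return [(name, rel) for rel in relationships
            for name, predicate in rules.items() if predicate(rel)]

rels = [
    Relationship("staff", "staff", "shared_phone"),
    Relationship("guard", "staff", "shared_address"),
    Relationship("staff", "watchlist", "shared_bank_account"),
]
print([name for name, _ in apply_alert_rules(rels, ALERT_RULES)])
# ['guard_shares_home_with_staff', 'linked_to_watchlist']
```

Everything not matched by a rule is simply never reported, which is what keeps the output reviewable as the population grows.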
Depending on what is loaded into it, CDTECH can scan for employee fraud or criminal activity. It can help intelligence gathering efforts or assess an individual’s relationship to co-workers or neighbors. On a larger scale, the product can study entire populations or governments for different types of activities.
One of those activity types is transactions. For example, a hypothetical search for overseas corruption might focus on certain countries to identify influential people and transaction patterns that suggest an arms deal is taking place. If a person of modest means suddenly becomes wealthy, an alert might be posted.
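A “sudden wealth” alert of the kind just described might look like the following. This is a hypothetical rule, not CDTECH’s: it assumes a per-person series of yearly net-worth estimates, and the `modest_ceiling` and `jump_factor` thresholds are illustrative tuning parameters.

```python
def sudden_wealth_alerts(histories, modest_ceiling=50_000, jump_factor=10.0):
    """Flag anyone whose estimated net worth jumps at least jump_factor-fold
    from a starting point at or below modest_ceiling.

    histories: person id -> list of (year, estimated_net_worth), sorted by year.
    Both thresholds are hypothetical tuning parameters.
    """
    alerts = []
    for pid, history in histories.items():
        # Compare each year with the next one.
        for (y0, w0), (y1, w1) in zip(history, history[1:]):
            if w0 <= modest_ceiling and w1 >= jump_factor * max(w0, 1):
                alerts.append((pid, y0, y1, w0, w1))
    return alerts

histories = {
    "P1": [(1990, 30_000), (1991, 35_000)],        # ordinary growth
    "P2": [(1990, 40_000), (1991, 900_000)],       # modest means, then wealthy
    "P3": [(1990, 2_000_000), (1991, 30_000_000)], # already wealthy: not "modest"
}
print(sudden_wealth_alerts(histories))  # [('P2', 1990, 1991, 40000, 900000)]
```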
CDTECH is not tied to a single programming language and can run on a variety of platforms, including UNIX and Windows NT. The data structure lends itself to acquiring and analyzing information from a virtually unlimited number of sources, Jonas says.
CDTECH presents the processed information to clients in the form of an alert report. Depending on the system’s configuration and user requirements, these reports can arrive daily, weekly or monthly. Time is critical when detecting fraud, and even a delay of a few hours can cost money. “There is nothing worse at the end of the day or week than figuring out that there was a relationship going on, and you’ve already gotten nailed,” he emphasizes.
SRD is testing an updated system that can report events in near real time. As soon as a user enters a new element into a source system, such as a telephone number on an employment record or a new mailing address, the information is instantly tested against the entire repository. If an alert is triggered, the system notifies an administrator within seconds. This capability will benefit many of SRD’s clients. Jonas cites the example of a client with a $40-million-a-year fraud problem. Shortening the time needed to detect a crime can save millions of dollars, he says.
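Testing each new element against the whole repository “instantly” is practical only if the repository is indexed by value. The sketch below assumes a hypothetical `IncrementalRepository` that keeps a map from (kind, value) to entity ids, so every insert costs one lookup rather than a rescan. Who counts as a watch-list entity, and the simple whitespace/case normalization, are assumptions for illustration.

```python
from collections import defaultdict

class IncrementalRepository:
    """Hypothetical near-real-time checker: an index of (kind, value) -> entity
    ids lets each new element be tested against everything already loaded
    with one dictionary lookup instead of a full rescan."""

    def __init__(self, watch_ids):
        self.index = defaultdict(set)
        self.watch_ids = set(watch_ids)

    def add(self, entity_id, kind, value):
        """Insert one element and return the alerts it triggers right away."""
        key = (kind, value.strip().upper())
        alerts = [
            (entity_id, other, kind, key[1])
            for other in sorted(self.index[key])
            if other != entity_id
            and (other in self.watch_ids or entity_id in self.watch_ids)
        ]
        self.index[key].add(entity_id)
        return alerts

repo = IncrementalRepository(watch_ids={"W7"})
repo.add("W7", "phone", "702-555-0142")
repo.add("E1", "address", "9 Oak Ln")
for alert in repo.add("E1", "phone", "702-555-0142 "):
    print("notify administrator:", alert)
# notify administrator: ('E1', 'W7', 'phone', '702-555-0142')
```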
The CDTECH data model can be modified to fit client needs and computing requirements. The system is modular, allowing the introduction of new types of data without the need to re-engineer the database architecture. If the network had to be restructured every time a new data source was added—a new subject category, for example—it would be a maintenance monster, Jonas observes.
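One common way to let new kinds of data arrive without re-engineering the schema is an entity-attribute-value layout, in which the data type is a row value rather than a column. The article does not say CDTECH uses this layout; the sketch below simply assumes it, using Python’s built-in `sqlite3` and a hypothetical `feature` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity-attribute-value layout: each data type is a row value, not a column,
# so a new kind of data needs no ALTER TABLE.
conn.executescript("""
CREATE TABLE feature (
    entity_id    TEXT NOT NULL,
    source       TEXT NOT NULL,
    feature_type TEXT NOT NULL,
    value        TEXT NOT NULL
);
CREATE INDEX feature_lookup ON feature (feature_type, value);
""")

def add_feature(entity_id, source, feature_type, value):
    conn.execute("INSERT INTO feature VALUES (?, ?, ?, ?)",
                 (entity_id, source, feature_type, value))

add_feature("E1", "payroll", "phone", "555-0100")
add_feature("V9", "vendor_file", "phone", "555-0100")
# A brand-new data source and data type: no schema change required.
add_feature("E1", "parking_permits", "vehicle_plate", "NV-123ABC")

# The same self-join finds shared values for every data type, old or new.
shared = conn.execute("""
    SELECT a.entity_id, b.entity_id, a.feature_type, a.value
    FROM feature a JOIN feature b
      ON a.feature_type = b.feature_type
     AND a.value = b.value
     AND a.entity_id < b.entity_id
""").fetchall()
print(shared)  # [('E1', 'V9', 'phone', '555-0100')]
```

The trade-off is that type-specific constraints move out of the schema and into the loading code.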
Problems with source data are avoided by cleaning the information before it is loaded. Addresses are checked and standardized using off-the-shelf address validation software. For example, if a person lists his or her address as 460 Main Street and the location is really 460 South Main Avenue, the validation process corrects the street name before it enters the database. The same process is applied to names. The name Richard belongs to a name family made up of variants such as Ricky, Ricardo and Dick. These variants are standardized using a name cross-reference table. All personal numbers provided in employee records also are checked: birth dates, driver’s licenses, credit cards, checking accounts and Social Security numbers. Any erroneous or invalid information is discarded.
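Two of these cleaning steps are easy to sketch: mapping name variants to a family root, and rejecting numbers that cannot be valid. The cross-reference table below contains only the article’s Richard example and is hypothetical. The Social Security number check is structural only (the Social Security Administration does not assign area numbers 000, 666 or 900 to 999, group 00 or serial 0000), and the Luhn checksum is the standard check digit on payment card numbers. Address standardization is left to the commercial software the article mentions.

```python
import re

# Hypothetical name cross-reference table built from the article's example.
NAME_FAMILIES = {"RICHARD": ["RICKY", "RICARDO", "DICK"]}
CANONICAL = {v: root for root, variants in NAME_FAMILIES.items() for v in [root, *variants]}

def standardize_first_name(name):
    """Map a name variant to its family root; unknown names pass through."""
    key = name.strip().upper()
    return CANONICAL.get(key, key)

def ssn_is_plausible(ssn):
    """Structural check only: 9 digits, and no area 000, 666 or 900-999,
    group 00 or serial 0000 (ranges that are never assigned)."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        return False
    area, group, serial = int(digits[:3]), int(digits[3:5]), int(digits[5:])
    return area not in (0, 666) and area < 900 and group != 0 and serial != 0

def luhn_ok(card_number):
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in re.sub(r"\D", "", card_number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0

print(standardize_first_name(" dick"), ssn_is_plausible("666-12-3456"), luhn_ok("4111 1111 1111 1111"))
# RICHARD False True
```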
Each of these verification activities is conducted separately in what Jonas describes as a node. Because any number of data sources can be handled this way, the system can scale to a client’s needs, he says. Multiple nodes can run on a single computer or be spread across a network. The standardized and formatted information from these nodes is then merged into a consolidated database. “This node-based architecture is capable of supporting a multiterabyte environment, potentially handling 100 million entities and billions of transaction-activity references such as job applications and denials or acceptances,” he says.
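The node idea can be illustrated with independent cleaning functions that run in parallel, one per source, whose outputs are merged afterward. Everything here is hypothetical: `clean_node` stands in for whatever verification a real node performs (only a 10-digit phone check is shown), and a thread pool stands in for nodes spread across machines.

```python
from concurrent.futures import ThreadPoolExecutor

def clean_node(source_name, records):
    """One hypothetical 'node': standardizes a single source independently."""
    cleaned = []
    for rec in records:
        phone = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
        if len(phone) != 10:
            continue  # this sketch skips records whose phone number is invalid
        cleaned.append({"source": source_name, "id": rec["id"], "phone": phone})
    return cleaned

def run_nodes(sources, max_workers=4):
    """Run one node per source concurrently, then merge into one phone index."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(clean_node, sources.keys(), sources.values()))
    merged = {}
    for batch in results:
        for rec in batch:
            merged.setdefault(rec["phone"], []).append((rec["source"], rec["id"]))
    return merged

sources = {
    "payroll": [{"id": "E1", "phone": "(702) 555-0142"}, {"id": "E2", "phone": "12"}],
    "vendors": [{"id": "V3", "phone": "702.555.0142"}],
}
print(run_nodes(sources))
# {'7025550142': [('payroll', 'E1'), ('vendors', 'V3')]}
```

Because no node depends on another, adding a source means adding a node rather than changing the others.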
The database closely resembles a data warehouse but is not one. According to Jonas, information comes in from many sources and is then retrieved and analyzed for collusion patterns. As the data is loaded, it triggers alerts if it meets certain criteria. At the end of the day, the software applies its full computational power to investigating these relationships. Jonas is quick to note that not all of the associations are criminal, which is why the analysis and review process is necessary. Once the clean information is in one place, the software becomes a powerful tool for finding data while saving time and manpower, he says.
“What you end up with is a database that has all this information correlated,” Jonas says. In an investigative mode, a user might have a suspect telephone number or address. Analysts only have to search the database once, because doing so is like searching all of the source databases at the same time, he says. A user who obtains a telephone number can simply type it in, and the effect is like searching across every available system, Jonas explains. “That would take armies of people to do. You can’t really go to a payroll system and say ‘does anybody have this phone number?’” he stresses.
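Searching once instead of system by system amounts to building an inverted index from normalized values back to the records, fields and source systems that contain them. The sketch below is an assumption about how such a lookup could work, not CDTECH’s design; the source names, field names and the crude normalization (keep letters and digits, uppercase) are hypothetical.

```python
from collections import defaultdict

def normalize(value):
    """Crude normalization: keep letters and digits, uppercase."""
    return "".join(ch for ch in value.upper() if ch.isalnum())

def build_index(sources):
    """sources: {source: {record_id: {field: value}}} -> value -> [(source, record_id, field)]"""
    index = defaultdict(list)
    for source, records in sources.items():
        for record_id, fields in records.items():
            for field_name, value in fields.items():
                index[normalize(value)].append((source, record_id, field_name))
    return index

def search(index, value):
    """One lookup answers 'does anybody, in any system, have this value?'"""
    return sorted(index.get(normalize(value), []))

sources = {
    "payroll": {"E1": {"phone": "702-555-0142", "address": "9 Oak Ln"}},
    "applications": {"A7": {"home_phone": "(702) 555 0142"}},
    "vendors": {"V3": {"contact_phone": "702-555-0199"}},
}
idx = build_index(sources)
print(search(idx, "702.555.0142"))
# [('applications', 'A7', 'home_phone'), ('payroll', 'E1', 'phone')]
```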
CDTECH also includes components called confidence evaluators, which can be tuned to the size of the data set. For example, a small set of under a million names would be tuned differently than a large set used for studying the entire population of China.
The confidence evaluators also allow the alert and matching rules to change. If an organization is studying a million people in the United States, it might decide that Social Security numbers are sufficient for identification. But if the search were expanded to 30 million people, the chance of error would be high, because some Social Security numbers would probably have been keyed in incorrectly. Too many duplicate numbers would raise false alarms, Jonas says. Instead, the confidence evaluators might be adjusted to require a Social Security number plus a first, middle or last name or a date of birth, followed by a secondary check to validate the data.
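A size-dependent matching rule along these lines might look like the sketch below. It reads the article as “Social Security number alone for small sets; Social Security number plus a matching name part or date of birth for large ones.” That reading, the `Record` layout and the one-million cutoff in `large_population` are assumptions, and the secondary validation check the article mentions is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Record:
    ssn: str
    first: str
    middle: str
    last: str
    dob: str  # ISO "YYYY-MM-DD"

def same_person(a, b, population_size, large_population=1_000_000):
    """Hypothetical matching rule whose strictness depends on data-set size."""
    if a.ssn != b.ssn:
        return False
    if population_size <= large_population:
        return True  # small set: an SSN match alone is treated as sufficient
    # Large set: keying errors make SSN collisions likely, so require
    # corroboration from a non-empty name part or the date of birth.
    name_agrees = any(x and x == y for x, y in
                      ((a.first, b.first), (a.middle, b.middle), (a.last, b.last)))
    return name_agrees or a.dob == b.dob

r1 = Record("123456789", "ANN", "M", "LEE", "1960-04-03")
r2 = Record("123456789", "BOB", "", "KING", "1971-09-12")  # likely a keying error
print(same_person(r1, r2, 1_000_000), same_person(r1, r2, 30_000_000))  # True False
```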
Confidence evaluators can also screen for situations such as two people with the same name sharing the same address. The evaluator would then check the dates of birth to determine whether the month and day had been transposed. Such a discrepancy might trigger an alarm because it is either an error or an attempt to alter a birth date, Jonas indicates.
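The transposition test itself is simple: two dates in the same year where each one’s month equals the other’s day. The record layout and the `transposition_alert` rule below are hypothetical; the sketch only shows the check the article describes, using Python’s `datetime.date`.

```python
from datetime import date

def is_transposed(d1, d2):
    """True if d2 is d1 with month and day swapped (and the two dates differ)."""
    return d1 != d2 and d1.year == d2.year and d1.month == d2.day and d1.day == d2.month

def transposition_alert(a, b):
    """Hypothetical evaluator rule: same name and address, swapped birth month/day."""
    return (a["name"] == b["name"] and a["address"] == b["address"]
            and is_transposed(a["dob"], b["dob"]))

rec1 = {"name": "JOHN SMITH", "address": "460 S MAIN AVE", "dob": date(1965, 3, 7)}
rec2 = {"name": "JOHN SMITH", "address": "460 S MAIN AVE", "dob": date(1965, 7, 3)}
print(transposition_alert(rec1, rec2))  # True
```

Dates such as May 5 are excluded automatically, because swapping month and day leaves them unchanged.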
Once all of this data is loaded and cleaned, the relationships are worked out. After they have been determined (did the employee in question share a residence with a suspected terrorist, for example?), the confidence evaluator decides whether to generate an alert.
Because the system lets users quickly search large amounts of data for specific information, the company sees potential applications in the public sector. According to Jonas, the U.S. Department of Defense has a backlog of some 700,000 employment background checks. He believes these could be analyzed quickly to identify the individuals who present the highest risks.
Jonas will not divulge client names to avoid alerting employees to the monitoring; however, he notes that a growing interest exists in the government sector for collusion detection technology. Digital Data Development is working to meet those needs, he says.