Mining big data for salient information presents a plethora of challenges, but in Europe a different kind of obstacle has emerged as a concern. Regulations prohibiting researchers and others from searching through the data in certain documents are putting countries on the continent at a competitive disadvantage in a number of fields, studies are revealing. With several economies there already in dire straits, the legal encumbrances could make improving their financial situations more difficult.
The report, “Standardisation in the Area of Innovation and Technological Development, Notably in the Field of Text and Data Mining,” lays out the problems of restricting mining in texts. It explains that because text- and data-mining technology is relatively affordable, it is available even to individual and small-organization researchers. The document was produced by an expert group appointed by the European Commission. The commission or its departments establish such groups to provide advice and expertise. They include at least six public- and/or private-sector members who meet more than once.
The personnel chosen for this particular task write in their report that “There is growing recognition that we are at the threshold of the mass automation of service industries (automation of thinking) comparable with the robotic automation of manufacturing production lines (automation of muscle) in an earlier era. [Text and data mining] will be widely used to provide insights in the redesign of this digital services economy. When it comes to the deployment of [text and data mining], there are worrying signs that European researchers may be falling behind, especially with regard to researchers in the United States.”
Because the technologies apply to multiple fields in the public and private sectors, competing on an equal footing has wide-reaching ramifications. The report explains that European researchers believe the laws they must follow regarding copyright, database protection and data privacy put them at a disadvantage when compared with the fair-use laws in the United States. Scientific publishers in Europe have proposed licensing terms to make mining their archives easier, but the report states that many researchers “dismiss these efforts as insufficient … effective research demands freedom to mine all public domain databases without restriction.” Scientific publishers and researchers especially are driving the call for changes.
The report makes three recommendations to increase the international competitiveness of the European Union’s research base. First, licensing of works for text and data mining should be made easier. This should both add value to the economy and add to the human factors necessary for big-data research in the digital economy. It is considered a prologue to true reform. Second, leaders should consider a “specific and mandatory exception to remove text and data mining for scientific purposes from the reach of European copyright and database law.” This second suggestion is a midterm effort that lays the groundwork for the third effort.
The third recommendation states that the best approach to reform will be to establish within European law a durable distinction between protecting authors’ rights and allowing copyright in the digital age to become a barrier to modern research. Within this final suggestion, the writers assert that proposed reforms to European data protection laws should avoid creating further impediments for scientific researchers.
The short time frame allowed for generation of the report means that serious primary research into the issue remains to be conducted. Researchers had less than six months to complete their work, which allowed them time only to review the state of existing knowledge. “It was an assessment of available evidence,” explains Ian Hargreaves, a professor at Cardiff University in the United Kingdom who chaired the expert group. Team members carried out interviews and conversations with what they characterize as insightful sources, but from the beginning they knew the time frame did not allow for comprehensive research. Hargreaves says it would be easy to spend two or three years looking into this topic.
In conducting their research, the expert group members talked to people in multiple countries within and outside of Europe. They also drew on unpublished work about patents on data- and text-mining apps. Wherever they looked, the researchers found that Europe was less knowledgeable, less active and less ambitious than the United States.
Hargreaves explains that the report is important because of how critical data analytics is to the current phase of the digital revolution and to its next phase, which will be the digital experience economy. “We also know that research as an input is crucial to the kind of economies that exist in Europe and North America,” he says. “It’s very important that Europe does not put itself at a disadvantage either with the conduct of data analytics or the scale or ambition or efficiency of research agenda by having a suboptimal legal regime in regard to text and data mining. Europe does precisely that.” He has no doubt that conditions in the United States are more favorable than in Europe. “That is not something Europe can afford to persist with,” Hargreaves states.
Concerns about competitive disadvantage extend to other parts of the world, such as Australia, Japan and Israel. One of the challenges facing the European Union is its multinational composition. Rules and ownership among sovereign states come into play. At the same time, the union has an open market in many goods and services. “Some people argue, myself included, that Europe needs a unified or single digital market,” Hargreaves says. “But at the moment, it has a territorially segmented patchwork of markets.” The setup affects a range of fields, from mobile telephony to data analytics. Hargreaves and others believe the restrictions cause serious economic problems for the European Union.
The differences between digital goods and others stem from several factors, including the fact that the industries these technologies have most dramatically disrupted traditionally have conducted their business on a territory-by-territory basis. Some companies have a strong interest in maintaining such distinctions. When it comes to copyright law, the European Union has an overarching set of directives, but with a list of exceptions and limitations to the law that are optional in member states. The states make varying choices, resulting in a diversity of practice.
Hargreaves explains he is “completely convinced that Europe urgently needs to change the way it thinks about this. And if it doesn’t, I think it will be a very serious handicap for the European research community in the years ahead. So I think it’s quite a big deal.”
European leaders seem to be thinking along the same lines. Other work examining data mining was funded by the European Commission around the same time as Hargreaves’, including a report by De Wolf and Partners titled “Study on the Legal Framework of Text and Data Mining.” Hargreaves says his team was aware of other efforts and drew on them as they came into the public domain.
What will happen with the studies moving forward is unknown. An official with the European Commission said the organization is not commenting on the standardization report at this time, though he did share that results would be used in future deliberations. Further complicating matters, the European Parliament held elections earlier this year, with the new members beginning their roles in July. While center-right and center-left parties continue to control the governing body, more extreme parties from both sides saw major gains, including groups with strong anti-European Union bias. Waves of anti-collaboration sentiment are rising in places such as the United Kingdom and France.
Hargreaves does not expect an active policy-determining mode to begin in the union until the end of this year or beginning of next year, so effects from his team’s report and others likely will not be felt for months. “My own view is that Europe needs to change its approach to copyright law and database law and adapt these in a way that is better suited to the digital circumstances in which we live,” he states. The report’s recommendations are framed so that the newly elected parliament could potentially act on them, though the work would take some years.
Part of the debate in Europe about text- and data-mining laws revolves around whether to draw a distinction between commercial and noncommercial research. The report recognizes a distinctive set of arguments around publicly funded research fueling the debate around open data access and open systems of academic publishing. It also acknowledges the difficulty in drawing a clear distinction between the commercial and noncommercial sectors. Hargreaves makes the point that economic benefit occurs whether research is carried out in the public or private sector. Spending on research drives innovation and consequently economic growth; a relatively well-understood relationship exists between the two, he explains.