
First, thanks to Bob @ for the heads-up & links on this worthy endeavor. I'm right in the middle of this topic, and have been for most of my career, so I thought I would share a few brief thoughts:

Agree that big data is over-hyped. My perspective is actually surprisingly close to Gartner's hype cycle on this one, as are my forecasts & views on combining BD & HUMINT, which, like the authors, we've been working on for a long time: almost two decades for us, three if counting everything that should be. Our perspective comes from a truly independent effort, self-funded to growth phase and well aligned with org mission & customers, not to be confused with guilds, agencies, units, or even mgmt teams. That's an important distinction these days, especially given long-term fiscal trajectories around the world and the related influence of the service sector, which increasingly seems to have an unhealthy influence on sustainable tech & economics.

Regarding the definition of BD: another broken record usually found in close proximity to over-hyped tech or methods/services is the avoidance of definition, & therefore of accountability. As with other cases, the science does have great potential and is quite real, but it doesn't exist in some black hole; inconvenient little realities are ever-present, like economics, rule of law, governance, conflicts, human/org behavior, & physics.

However, in the case of big data a clean, science-based definition is doable, provided it's limited to scale. Google provides a case in point, even though their work also includes other algorithms. We had a good presentation from a Google engineer/VC on this specific point at the SFI symposium a couple of weeks ago, where he shared some actual examples that had been tested out (an advantage they have that most others don't). For some specific use cases with highly specific algorithms, applied to the quite imperfect info typical of the web, higher scale can and does equate to higher accuracy.
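The scale-to-accuracy point can be illustrated with a toy simulation. This is only a sketch of the statistical intuition, not anything Google presented: estimating a rate from noisy, web-style binary observations, where the error of a simple aggregate shrinks as the sample grows (the hypothetical `TRUE_VALUE` and sample sizes are assumptions for illustration).

```python
import random

# Illustrative only: with noisy, imperfect observations typical of
# web data, a simple aggregate estimate tends to get closer to the
# true value as the sample size grows.
random.seed(42)
TRUE_VALUE = 0.6  # hypothetical ground-truth rate we try to estimate

def estimate(n_samples: int) -> float:
    """Average n noisy binary observations drawn around TRUE_VALUE."""
    hits = sum(1 for _ in range(n_samples) if random.random() < TRUE_VALUE)
    return hits / n_samples

for n in (100, 10_000, 1_000_000):
    err = abs(estimate(n) - TRUE_VALUE)
    print(f"n={n:>9,}  error={err:.4f}")
```

Running this shows the error trending toward zero as n grows, which is the narrow sense in which "bigger" alone buys accuracy, for this kind of highly specific aggregation task.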

If we were to limit the definition to 'big' data, with primarily one V rather than the many (perhaps Volume + Velocity for most purposes), the science is pretty clear: there are highly specific functions within data physics that depend on scale and cannot be achieved otherwise, especially when conjoined with velocity within the time window needed. With some HPC queries still reported to take up to a year (although likely in part for other reasons, like data quality and skill), this is not a trivial issue.

However, and this is a very important point, even with vastly improved algorithms, scale alone is limited for most use cases and missions; at least in the data I have consumed, big alone isn't nearly sufficient for most critical purposes. During roughly the past 15 years of R&D, for example, we've seen BD algorithm efficiency improve radically, from around 30% to around 70%, while returns on highly structured, high-quality data are almost perfect (99%+), which for some purposes is still not good enough.

The evidence is clear: while BD should be leveraged, the priority for most critical operations & decisions should be on smart data at the confluence of humans & machines. By smart data we mean data that is continuously adaptive & tailored to the specific needs of each entity within org parameters (governance, regulatory, policy, mission, time).

Thanks for the discussion.

Mark Montgomery
Founder & CEO