Avoiding Lofty Big Data Promises

By Erhan on March 5, 2016

A "cult of big data" in the security market has created an echo chamber that makes it impossible to tell the difference between companies building data fiefdoms—that is, companies that restrict the visibility of their data—and companies developing technologies that unearth new data.

Specifically, many of the companies building data fiefdoms are using easily collectible data and assuring their customers that the analysis within their black box is innovative.

While most of these products seem defensible because of the technical endeavor required to perform analysis, an astute observer should be skeptical of the actual value the scoring and analytics produce. In fact, the lofty promises of value from big data usually amount to nothing more than cluttered dashboards devoid of any usefulness.

Here are some guidelines to avoid market noise created by the cult of big data:

1. Not all data is created equal: be skeptical of composite scores, over-fitting and extrapolation.

Truly insightful data can be simple, but analysts often obscure its value by using measurements, statistics, and ratings to cram it into a single composite score. This is a terrible idea because combining data of different types presumes that each type holds equal value.

Many companies try to dance around this issue by relying instead on models and heuristics; however, most models and heuristics in security have not been academically verified. As a result, the models and heuristics amount to nothing more than a company's best guess.

Instead, look for products that let you configure how the data is analyzed, ground their methods in academic research for specific outcomes, and provide direct visibility into how each computed part was factored into a final score.
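As an illustration of what that visibility could look like, here is a minimal Python sketch of a weighted score that returns each part's contribution alongside the total. Everything in it is hypothetical: the Component type, the score_with_breakdown function, the component names, and the assumption that values are already normalized to the range 0 to 1. It does not describe any particular product's scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One input to the composite score (hypothetical structure)."""
    name: str
    value: float   # assumed to be normalized to the range 0.0-1.0
    weight: float  # set by the user, not hidden inside a black box


def score_with_breakdown(components):
    """Return (score, breakdown); breakdown maps each name to its contribution."""
    total_weight = sum(c.weight for c in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Each contribution is the component's value scaled by its share of the weight.
    breakdown = {c.name: c.value * c.weight / total_weight for c in components}
    return sum(breakdown.values()), breakdown


# Made-up example inputs, for illustration only.
components = [
    Component("patch_latency", value=0.8, weight=2.0),
    Component("open_ports", value=0.5, weight=1.0),
    Component("tls_config", value=0.2, weight=1.0),
]
score, breakdown = score_with_breakdown(components)
print(f"score={score:.3f}")
for name, contribution in breakdown.items():
    print(f"  {name}: {contribution:.3f}")
```

Because the weights are explicit inputs and the breakdown is returned with the score, a reader can see which part drove the result and change the weighting if they disagree with it.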

2. Data collection should be close to its source of origin. Favor products that control the creation of the data.

Data actively created as a result of a stimulated or controlled process will have significantly less guesswork and fewer gaps than data collected from a hodge-podge of sources. More importantly, a controlled process ensures reliability with consistent quality. Data collected without control over the process is much more prone to unaccounted-for edge cases, third-party politics, and scalability issues.

Look for products that integrate directly with, or own, the processes that generate new data.

3. V5: Volume, Velocity, Variety, Veracity and Variability

To establish utility, you need to understand and interpret the data being collected from a theoretical, philosophical, and organizational (or societal) perspective. The “V5” is a useful way to frame this context. Consider the following:

  • Volume: How much data can be collected?
  • Velocity: How often can the data be collected?
  • Variety: How can the data be categorized?
  • Veracity: How can the data be verified?
  • Variability: How much do data points of the same variety differ from one another?

These questions are open-ended because there are no right answers; the business value of the data depends on how it influences decision-making. With that in mind, the V5 will ultimately determine whether the product's data is meaningful.

Final Thoughts

More and more security companies are trying to pass off novel data collection techniques as valuable analysis. Use the above tips to spot dubious big data promises and navigate today’s security echo chamber.


About Apozy

Founded in April of 2014 in San Francisco, we are a venture-backed motley crew of passionate hackers building cybersecurity technologies to make the world's information faster, cleaner and safer to access.