April 1, 2014

Detection tool leverages big data, machine learning

Emerging techniques for detecting security threats combine big data with behavioral analytics and machine learning to spot and prevent intrusions.

A white paper released by intrusion detection specialist FileTrek describes a detection tool that aggregates real-time data generated from users, applications, files and systems, then computes and tracks risks using machine learning. The company also claims its “risk equation” becomes “smarter” over time.

Ottawa-based FileTrek’s approach is part of a growing trend toward automating big data techniques, such as sifting through large volumes of network activity to identify potential intrusion risks. These tools could eventually be refined to anticipate threats before they occur.

FileTrek CTO Stephan Jou said the company’s approach leverages big data and behavioral analytics to “capture the context of events and connect the relationships between those events to quantify” risk.

The intrusion detection tool relies on inputs from users, activities, files and methods of operation to generate a “risk score” for each input. The resulting baseline for normal business operations is then used to focus on anomalous events. The company cites the example of former National Security Agency contract employee Edward Snowden: the system could analyze anomalous behavior such as an IT administrator copying sensitive files to an external USB drive.
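The article does not say how FileTrek builds its baseline, so the following is only a rough sketch of the general idea of comparing an observed event against normal activity. It swaps in a simple z-score over a hypothetical per-user history; the function name, the sample counts and the threshold of 3.0 are all illustrative assumptions.

```python
from statistics import mean, stdev

def anomaly_score(history, observed):
    """Return how many standard deviations `observed` sits above the baseline mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 points
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma

# Hypothetical baseline: files copied to external drives per day by one administrator.
baseline = [0, 1, 0, 2, 1, 0, 1]
today = 40

score = anomaly_score(baseline, today)
is_anomalous = score > 3.0  # arbitrary illustrative threshold, not from the white paper
print(f"score={score:.1f} anomalous={is_anomalous}")
```

A production system would presumably draw on far richer context (user role, file sensitivity, time of day) than a single count.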

The white paper describes a recipe, or equation, for computing the behavioral risk of network users. In short, the overall risk of a certain behavior is computed by multiplying the probability that an activity is anomalous, and therefore a potential security threat, by the negative “impact” of the observed event.
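As a minimal sketch of that multiplication (not FileTrek's implementation), the risk of an event can be written as probability times impact. The function name, the 0-to-1 probability scale, the impact scores and the sample events below are hypothetical values chosen for illustration.

```python
def behavioral_risk(p_anomalous: float, impact: float) -> float:
    """Risk = probability the activity is anomalous x negative impact of the event."""
    if not 0.0 <= p_anomalous <= 1.0:
        raise ValueError("p_anomalous must be between 0 and 1")
    if impact < 0:
        raise ValueError("impact must be non-negative")
    return p_anomalous * impact

# Hypothetical events: (description, probability of anomaly, impact score).
events = [
    ("user opens a routine spreadsheet", 0.05, 2.0),
    ("admin copies sensitive files to USB drive", 0.9, 10.0),
]

# Rank events so the highest-risk behavior is reviewed first.
ranked = sorted(events, key=lambda e: behavioral_risk(e[1], e[2]), reverse=True)
for description, p, impact in ranked:
    print(f"{behavioral_risk(p, impact):5.2f}  {description}")
```

Under this formulation, a highly unusual action on a low-value file and a mildly unusual action on a critical file can end up with similar scores, which is why both factors matter.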

Such “behavioral risk indicators” would be flagged by the tool, which would then use machine learning to scan for similar risky or suspicious behavior. FileTrek touts the tool as becoming increasingly capable over time of detecting suspicious behavior in real time across enterprises.

While big data techniques tend to rely on sheer volume, data mining and machine learning techniques also require “appropriate behavior data,” the white paper notes. Examples of the types of behavior related to users and files include:

  • Copying of IP from one document into the clipboard and pasting that content to another document
  • Assembling a .zip file from individual component files
  • Transactions with cloud sharing services, such as Dropbox
  • Time spent working on specific documents or in specific applications
  • Screenshots taken of documents and subsequent activity with those image files, and
  • Opening and closing applications, including when, where and how those applications are used

The company said it is targeting markets where users must secure large amounts of proprietary or customer assets. These include energy, financial services, high technology, life sciences, and manufacturing companies along with government agencies.