You keep seeing security products tout machine learning. You hear that it can make your life as a security professional easier and more efficient: perhaps it can reduce false positives, provide more sophisticated analytics and enhance alerting. Sounds pretty cool! What are the questions you should ask a potential vendor to determine whether their machine learning offering is going to help you?

Today, let's talk about supervision.

There are many algorithms that fall under the umbrella of machine learning, but at a high level, you can divide machine learning into two forms: supervised and unsupervised. In supervised learning, you provide both the inputs and the corresponding outputs to a system, and the system's job is to develop a function that maps the inputs to the outputs. As one example of supervised learning, we might provide a set of pictures along with the location where each picture was taken. The system should learn how to determine where a picture was taken, so that as new pictures are added, it can accurately note their location. Critically, humans will be there to tell the system when the function is incorrect, so that it can continue to tune the function. For example, it might tag a picture of the Eiffel Tower as being in Paris, until we point out that the picture in question is of the Paris Las Vegas hotel in Las Vegas, Nevada; then it will adjust its function so that it doesn't make that mistake in the future. That's the supervised part: there's a human in the picture to help update and correct the system.
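To make the picture example concrete, here is a minimal sketch of that feedback loop in Python. Everything in it is an assumption made for illustration: the `NearestNeighborTagger` class, the two numeric features (how tower-like the picture is, how much neon signage it shows) and the sample values are all hypothetical, and a real image tagger would be far more sophisticated. It simply labels a new picture with the location of the most similar labeled example it has seen so far.

```python
from math import dist


class NearestNeighborTagger:
    """Toy supervised learner: tags a picture with the label of the
    closest labeled example (hypothetical class, for illustration only)."""

    def __init__(self):
        self.examples = []  # list of (features, label) pairs supplied by humans

    def learn(self, features, label):
        # A human provides an input together with its correct output.
        self.examples.append((tuple(features), label))

    def predict(self, features):
        # Pick the label of the nearest labeled example.
        nearest = min(self.examples, key=lambda ex: dist(ex[0], features))
        return nearest[1]


tagger = NearestNeighborTagger()
# Hypothetical features: (tower-likeness, neon signage), both between 0 and 1.
tagger.learn((0.9, 0.1), "Paris")
tagger.learn((0.1, 0.2), "London")

# A picture of the replica tower: very tower-like, but lots of neon.
replica = (0.8, 0.9)
first_guess = tagger.predict(replica)  # closest labeled example is Paris

# The human supervisor corrects the mistake by supplying the right label.
tagger.learn(replica, "Las Vegas")
corrected_guess = tagger.predict((0.75, 0.85))  # a similar picture, later on

print(first_guess, "->", corrected_guess)
```

The point of the sketch is the second `learn` call: the system only improves because a person noticed the wrong answer and supplied the right one.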

In unsupervised learning, we provide inputs to a system, but no outputs – it is up to the system to make sense of the inputs and develop functions that provide insights into the data. Unsupervised learning is closer to what you'd expect for many security and identity use cases, which are much more complex than mapping a picture to a location. If a machine learning system processes a set of data about user logins and access in a particular enterprise, it isn't necessarily looking for a correct answer – it is more likely looking for patterns in how users access systems and for outliers that might indicate risk. An important element of this is that there is no human looking over the machine's shoulder to check its answers.
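As a rough illustration of the login example, the sketch below flags unusual login times without ever being told which logins are "bad". The function name `find_outliers`, the sample login hours and the threshold of two standard deviations are all assumptions chosen for this example; a production system would look at many more signals and use more robust statistics.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values that sit more than `threshold` standard
    deviations from the mean. No labels are involved: the data alone
    defines what counts as unusual."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]


# Hypothetical data: hour of day (24-hour clock) of one user's recent logins.
login_hours = [9, 9, 10, 8, 9, 10, 9, 3]

unusual = find_outliers(login_hours)
print(unusual)  # the 3 a.m. login stands out from the usual morning pattern
```

Notice that nothing in the code says a 3 a.m. login is wrong. It may be an attacker, or it may be an employee on a business trip, which is exactly why the question of who checks the machine's answers matters.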

In reality, a lot of systems use semi-supervised learning, which, as you might guess, incorporates aspects of both supervised and unsupervised learning – and here's where you should start to ask questions.

  • How much supervision is in place?
  • How much is a human correcting the system?
  • Does your vendor provide that supervision? For how long?
  • Do you, the customer, need to provide some supervision?
  • What expertise is necessary to do so?
  • How complex is the system – is a human going to be able to tell if the machine got the "right" answer?

Asking these questions will help you start to understand the complexity of the machine learning offered by a potential vendor, the ongoing support and oversight required, and how much of that will fall to you. In turn, you will be better able to assess the value of the system and the potential hidden costs to you.

Sandra Carielli
Sandy Carielli has spent over a dozen years in the cyber security industry, with particular focus on identity, PKI, key management, cryptography and security management. As Director of Security Technologies for Entrust Datacard, Sandy guides the organization’s next generation security and technology strategy. Prior to Entrust Datacard, Sandy was Director of Product Management at RSA, where she was responsible for SecurID and data protection. She has also held positions at @stake and BBN. Sandy has been a speaker at RSA Conference, SOURCE Boston, the NYSE Cyber Risk Board Forum and BSides Boston. She has a Sc.B. in Mathematics from Brown University and an M.B.A. from the MIT Sloan School of Management.