
Apr 24, 2013

Machine Learning for Anti-virus Software


Symantec is the largest anti-virus software vendor. It has 120 million subscribers, who visit 2 billion websites a day and generate 700 billion submissions. Given such a large volume of data, it is paramount that anti-virus software detect viruses quickly and accurately.

Anti-virus software was originally built by hand. Security experts reviewed each piece of malware and constructed its “signature”, and each computer file was checked against those signatures. Given how rapidly malware changes and how many variations exist, there are not enough human experts to write all the exact signatures. This gave rise to heuristic or generic signatures, which can handle more variations of the same file. However, new types of malware are created every day, so we need a more adaptive approach that identifies malware automatically (without the manual effort of creating signatures). This is where machine learning can help.

Computer viruses have come a long way. The first virus, “Creeper”, appeared in 1971. Then came Rabbit (or Wabbit). After that came computer worms like “Love Letter” and Nimda. Today's malware is much more sophisticated: it evolves much faster and changes constantly. Virus creation is now funded by organizations and some governments. There is a big incentive to steal users’ financial information or companies’ trade secrets. In addition, malware enables certain governments to spy on, or potentially wage cyber war against, their targets.

Symantec uses about 500 features for its machine learning model. Feature values can be continuous or discrete. Such features include:
How did the file arrive on this machine (through browser, email, ...)?
When/where?
How many other files on this machine?
How many clean files on this machine?
Is the file packed or obfuscated? (mutated?)
Does it write, communicate?
How often does it run?
Who runs it?
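
To make the mix of continuous and discrete features concrete, here is a minimal sketch of how a few of the questions above could be turned into a numeric vector. The field names, the channel list, and the clean-file ratio are my own assumptions for illustration; they are not Symantec's actual features.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical arrival channels; the list above only names browser and email.
CHANNELS = ["browser", "email", "other"]


@dataclass
class FileObservation:
    """A handful of illustrative attributes for one file on one machine."""
    channel: str            # how the file arrived (discrete)
    files_on_machine: int   # how many other files are on the machine
    clean_files: int        # how many of those are known to be clean
    is_packed: bool         # packed or obfuscated? (discrete)
    runs_per_day: float     # how often it runs (continuous)


def to_feature_vector(obs: FileObservation) -> List[float]:
    """One-hot encode the discrete channel and append the numeric features."""
    channel = obs.channel if obs.channel in CHANNELS else "other"
    one_hot = [1.0 if channel == c else 0.0 for c in CHANNELS]
    # Guard against division by zero on an empty machine.
    clean_ratio = obs.clean_files / obs.files_on_machine if obs.files_on_machine else 0.0
    return one_hot + [
        float(obs.files_on_machine),
        clean_ratio,
        1.0 if obs.is_packed else 0.0,
        obs.runs_per_day,
    ]
```

A real model with about 500 features would have a much longer vector, but the idea is the same: every question becomes one or more numbers that a classifier can consume.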

Researchers at Symantec experiment with SVM, decision tree and linear regression models.

In building a classifier, they are not simply optimizing accuracy or true positive rate. They are also concerned about false positives, where benign software is classified as malware. Such false positive predictions can carry a high cost for users. Balancing true positives against false positives leads to the ROC (Receiver Operating Characteristic) curve.
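
For readers who want the two rates spelled out, here is a small sketch that computes them from labels and predictions. The function name and the 1 = malware, 0 = benign convention are my own choices for illustration.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate).

    Labels and predictions use 1 for malware and 0 for benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

The true positive rate only looks at actual malware and the false positive rate only looks at benign files, which is why a classifier can score well on one and badly on the other.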

An ROC curve plots the trade-off between true positive rate and false positive rate. Each point on the curve corresponds to a cutoff we choose. They use the ROC curve to select a target operating point. Below is an illustration of the tradeoff.

The chart above suggests that when we aim for a 90% true positive rate, we will have a 20% false positive rate. However, when we only aim for an 80% true positive rate, the false positive rate can be reduced to 20%. (A better classifier could shift the ROC curve up, so that we achieve a higher true positive rate for any given false positive rate.)
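
Here is a rough sketch of how the points of an ROC curve can be computed from classifier scores, and how a target point might be picked under a false positive budget. The function names, the toy data, and the "highest true positive rate within the budget" rule are my own assumptions; I do not know how Symantec actually selects its target point.

```python
def roc_points(scores, labels):
    """Return a list of (fpr, tpr, threshold) tuples.

    A file is predicted as malware when its score >= threshold.
    Labels use 1 for malware and 0 for benign; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0, float("inf"))]  # cutoff so high that nothing is flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos, thr))
    return points


def pick_operating_point(points, max_fpr):
    """Pick the point with the highest TPR whose FPR stays within max_fpr."""
    feasible = [p for p in points if p[0] <= max_fpr]
    if not feasible:
        return None
    # Prefer higher TPR; break ties in favor of lower FPR.
    return max(feasible, key=lambda p: (p[1], -p[0]))


# Toy scores and labels, made up purely for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
strict = pick_operating_point(points, max_fpr=0.0)   # tolerate no false positives
relaxed = pick_operating_point(points, max_fpr=0.4)  # tolerate some false positives
```

Lowering the cutoff moves us up and to the right along the curve: more malware is caught, but more benign files get flagged too.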

According to their researcher, Symantec has achieved a high accuracy rate (the average of the true positive and true negative rates) of 95%. Its true positive rate is above 98% and its false positive rate is below 1%.
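
Written out, the accuracy measure described here (the average of the two rates, sometimes called balanced accuracy) is the following, where TNR is the true negative rate:

```latex
\text{accuracy} = \frac{\text{TPR} + \text{TNR}}{2},
\qquad
\text{TNR} = 1 - \text{FPR}
```

Taken at the boundary values quoted above (TPR = 0.98, FPR = 0.01), that average would come to (0.98 + 0.99) / 2 = 0.985, so the 95% figure was presumably measured under different conditions. That is my guess, not something the researcher stated.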

I am a user of Norton software (by Symantec) and enjoy it. I hope to see more success from Symantec, and I hope we are winning the war against malware!
