Abstract: The Android operating system has been widely adopted by application developers because of its open-source nature and good compatibility, which has greatly enriched the range of available applications. However, the lack of strict security supervision mechanisms has made it a target for malware authors, leading to a rapid growth in malware and serious security risks for users. It is therefore critical to detect Android malware effectively. In general, the permissions declared in AndroidManifest.xml reflect the functions and behavior of an application to a large extent. Because the current Android system places no restriction on the number of permissions an application can request, developers tend to request more permissions than are actually needed to make sure the application runs successfully, which results in permission abuse. Some traditional detection methods consider only the requested permissions and ignore whether they are actually used, which leads to the misidentification of some malware. This paper therefore proposes a machine learning detection method based on combinations of actually used permissions and API calls. Several experiments are conducted to evaluate the methodology. The results show that it detects unknown malware effectively, with a higher true positive rate and accuracy while maintaining a low false positive rate. The AdaBoostM1 (J48) classification algorithm combined with information gain feature selection gives the best detection result, achieving an accuracy of 99.8%, a true positive rate of 99.6%, and a false positive rate of 0.
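To make the information gain feature selection step concrete, the following is a minimal Python sketch of how binary features (a used permission or an API call being present or absent) could be ranked against malware/benign labels. The feature names, the toy samples, and the helper functions `entropy`, `information_gain`, and `rank_features` are illustrative assumptions and are not taken from the paper; the classifier itself (AdaBoostM1 with J48) is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(samples, labels):
    """Return (feature, information gain) pairs, highest gain first."""
    names = sorted({name for sample in samples for name in sample})
    scores = {
        name: information_gain([s.get(name, 0) for s in samples], labels)
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = feature observed as actually used, 0 = not used.
samples = [
    {"SEND_SMS": 1, "INTERNET": 1},
    {"SEND_SMS": 1, "INTERNET": 0},
    {"SEND_SMS": 0, "INTERNET": 1},
    {"SEND_SMS": 0, "INTERNET": 0},
]
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign (assumed labels)

ranking = rank_features(samples, labels)
print(ranking)
```

In this toy set, `SEND_SMS` separates the two classes perfectly and ranks first, while `INTERNET` carries no information about the label. In a real pipeline, the top-ranked features would then be passed to the classifier.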
Abstract: In this paper, we describe the use of formal methods
to model malware behaviour. The modelling of harmful behaviour
rests upon syntactic structures that represent malicious procedures
inside malware. The malicious activities are modelled by a formal
grammar, where the components of API calls are the terminals and sets
of API calls used in combination to achieve a goal are designated as
non-terminals. Combinations of different non-terminals in various
ways and tiers make up the attack vectors that are used by harmful
software. Based on these syntactic structures, a parser can be
generated which takes execution traces as input for pattern
recognition.
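
As an illustration of the idea, the sketch below encodes a small grammar as a Python dictionary and checks whether an execution trace contains a derivation of a goal non-terminal. The grammar, the non-terminal names (`ATTACK`, `INJECT`, `PERSIST`), and the functions `match` and `recognizes` are hypothetical and not taken from the paper. Terminals are simplified to bare API call names, unrelated calls in the trace are skipped, and the grammar is assumed to have no left recursion.

```python
# Hypothetical grammar: keys are non-terminals, values are lists of
# alternatives, and each alternative is a sequence of symbols.
# Any symbol that is not a key is treated as a terminal (an API call name).
GRAMMAR = {
    "ATTACK": [["INJECT", "PERSIST"]],
    "INJECT": [[
        "OpenProcess",
        "VirtualAllocEx",
        "WriteProcessMemory",
        "CreateRemoteThread",
    ]],
    "PERSIST": [["RegOpenKeyEx", "RegSetValueEx"], ["CreateService"]],
}


def match(symbol, trace, start):
    """Return the trace positions just past each match of `symbol`
    that begins at or after `start`; unrelated calls are skipped."""
    if symbol not in GRAMMAR:  # terminal: look for the call itself
        return {i + 1 for i in range(start, len(trace)) if trace[i] == symbol}
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for part in alternative:  # match the parts in order
            positions = {e for p in positions for e in match(part, trace, p)}
        ends |= positions
    return ends


def recognizes(trace, goal="ATTACK"):
    """True if the trace contains a derivation of `goal`."""
    return bool(match(goal, trace, 0))


suspicious = [
    "OpenProcess", "ReadFile", "VirtualAllocEx",
    "WriteProcessMemory", "CreateRemoteThread", "CreateService",
]
benign = ["ReadFile", "RegOpenKeyEx", "RegSetValueEx"]

print(recognizes(suspicious))  # True
print(recognizes(benign))      # False
```

A parser generated from a full grammar would also account for call arguments and return a parse tree rather than a yes/no answer; this recognizer only shows how tiers of non-terminals compose into an attack pattern.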