The Use of Artificial Intelligence in Digital Forensics and Incident Response in a Constrained Environment

Digital investigators often have a hard time spotting evidence in digital information. It has become hard to determine which source of proof relates to a specific investigation. A growing concern is that the various processes, technology, and specific procedures used in the digital investigation are not keeping up with criminal developments. Therefore, criminals are taking advantage of these weaknesses to commit further crimes. In digital forensics investigations, artificial intelligence (AI) is invaluable in identifying crime. Providing objective data and conducting an assessment is the goal of digital forensics and digital investigation, which will assist in developing a plausible theory that can be presented as evidence in court. This research paper aims at developing a multiagent framework for digital investigations using specific intelligent software agents (ISAs). The agents communicate to address particular tasks jointly and keep the same objectives in mind during each task. The rules and knowledge contained within each agent are dependent on the investigation type. A criminal investigation is classified quickly and efficiently using the case-based reasoning (CBR) technique. The proposed framework development is implemented using the Java Agent Development Framework, Eclipse, Postgres repository, and a rule engine for agent reasoning. The proposed framework was tested using the Lone Wolf image files and datasets. Experiments were conducted using various sets of ISAs and VMs. There was a significant reduction in the time taken for the Hash Set Agent to execute. As a result of loading the agents, 5% of the time was lost, as the File Path Agent prescribed deleting 1,510, while the Timeline Agent found multiple executable files. In comparison, the integrity check carried out on the Lone Wolf image file using a digital forensic tool kit took approximately 48 minutes (2,880 ms), whereas the MADIK framework accomplished this in 16 minutes (960 ms). The framework is integrated with Python, allowing for further integration of other digital forensic tools, such as AccessData Forensic Toolkit (FTK), Wireshark, Volatility, and Scapy.

WormHex: A Volatile Memory Analysis Tool for Retrieval of Social Media Evidence

Social media applications are increasingly being used in our everyday communications. These applications utilise end-to-end encryption mechanisms which make them suitable tools for criminals to exchange messages. These messages are preserved in the volatile memory until the device is restarted. Therefore, volatile forensics has become an important branch of digital forensics. In this study, the WormHex tool was developed to inspect the memory dump files for Windows and Mac based workstations. The tool supports digital investigators by enabling them to extract valuable data written in Arabic and English through web-based WhatsApp and Twitter applications. The results confirm that social media applications write their data into the memory, regardless of the operating system running the application, with there being no major differences between Windows and Mac.

The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination

The forensic use of handwriting depends on the analysis, comparison, and evaluation decisions made by forensic document examiners. When using biometric technology in forensic applications, it is necessary to compute Likelihood Ratio (LR) for quantifying strength of evidence under two competing hypotheses, namely the prosecution and the defense hypotheses wherein a set of assumptions and methods for a given data set will be made. It is therefore important to know how repeatable and reproducible our estimated LR is. This paper evaluated the accuracy and reproducibility of examiners' decisions. Confidence interval for the estimated LR were presented so as not get an incorrect estimate that will be used to deliver wrong judgment in the court of Law. The estimate of LR is fundamentally a Bayesian concept and we used two LR estimators, namely Logistic Regression (LoR) and Kernel Density Estimator (KDE) for this paper. The repeatability evaluation was carried out by retesting the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, which are based on handwriting dataset, show that LR has different confidence intervals which therefore implies that LR cannot be estimated with the same certainty everywhere. Though the LoR performed better than the KDE when tested using the same dataset, the two LR estimators investigated showed a consistent region in which LR value can be estimated confidently. These two findings advance our understanding of LR when used in computing the strength of evidence in handwriting using forensics.

The Forensic Swing of Things: The Current Legal and Technical Challenges of IoT Forensics

The inability of organizations to put in place management control measures for Internet of Things (IoT) complexities persists to be a risk concern. Policy makers have been left to scamper in finding measures to combat these security and privacy concerns. IoT forensics is a cumbersome process as there is no standardization of the IoT products, no or limited historical data are stored on the devices. This paper highlights why IoT forensics is a unique adventure and brought out the legal challenges encountered in the investigation process. A quadrant model is presented to study the conflicting aspects in IoT forensics. The model analyses the effectiveness of forensic investigation process versus the admissibility of the evidence integrity; taking into account the user privacy and the providers’ compliance with the laws and regulations. Our analysis concludes that a semi-automated forensic process using machine learning, could eliminate the human factor from the profiling and surveillance processes, and hence resolves the issues of data protection (privacy and confidentiality).

Searching for Forensic Evidence in a Compromised Virtual Web Server against SQL Injection Attacks and PHP Web Shell

SQL injection is one of the most common types of attacks and has a very critical impact on web servers. In the worst case, an attacker can perform post-exploitation after a successful SQL injection attack. In the case of forensics web servers, web server analysis is closely related to log file analysis. But sometimes large file sizes and different log types make it difficult for investigators to look for traces of attackers on the server. The purpose of this paper is to help investigator take appropriate steps to investigate when the web server gets attacked. We use attack scenarios using SQL injection attacks including PHP backdoor injection as post-exploitation. We perform post-mortem analysis of web server logs based on Hypertext Transfer Protocol (HTTP) POST and HTTP GET method approaches that are characteristic of SQL injection attacks. In addition, we also propose structured analysis method between the web server application log file, database application, and other additional logs that exist on the webserver. This method makes the investigator more structured to analyze the log file so as to produce evidence of attack with acceptable time. There is also the possibility that other attack techniques can be detected with this method. On the other side, it can help web administrators to prepare their systems for the forensic readiness.

Digital Image Forensics: Discovering the History of Digital Images

Digital multimedia contents such as image, video, and audio can be tampered easily due to the availability of powerful editing softwares. Multimedia forensics is devoted to analyze these contents by using various digital forensic techniques in order to validate their authenticity. Digital image forensics is dedicated to investigate the reliability of digital images by analyzing the integrity of data and by reconstructing the historical information of an image related to its acquisition phase. In this paper, a survey is carried out on the forgery detection by considering the most recent and promising digital image forensic techniques.

Classification of Computer Generated Images from Photographic Images Using Convolutional Neural Networks

This paper presents a deep-learning mechanism for classifying computer generated images and photographic images. The proposed method accounts for a convolutional layer capable of automatically learning correlation between neighbouring pixels. In the current form, Convolutional Neural Network (CNN) will learn features based on an image's content instead of the structural features of the image. The layer is particularly designed to subdue an image's content and robustly learn the sensor pattern noise features (usually inherited from image processing in a camera) as well as the statistical properties of images. The paper was assessed on latest natural and computer generated images, and it was concluded that it performs better than the current state of the art methods.

An Analysis of Digital Forensic Laboratory Development among Malaysia’s Law Enforcement Agencies

Cybercrime is on the rise, and yet many Law Enforcement Agencies (LEAs) in Malaysia have no Digital Forensics Laboratory (DFL) to assist them in the attrition and analysis of digital evidence. From the estimated number of 30 LEAs in Malaysia, sadly, only eight of them owned a DFL. All of the DFLs are concentrated in the capital of Malaysia and none at the state level. LEAs are still depending on the national DFL (CyberSecurity Malaysia) even for simple and straightforward cases. A survey was conducted among LEAs in Malaysia owning a DFL to understand their history of establishing the DFL, the challenges that they faced and the significance of the DFL to their case investigation. The results showed that the while some LEAs faced no challenge in establishing a DFL, some of them took seven to 10 years to do so. The reason was due to the difficulty in convincing their management because of the high costs involved. The results also revealed that with the establishment of a DFL, LEAs were better able to get faster forensic result and to meet agency’s timeline expectation. It is also found that LEAs were also able to get more meaningful forensic results on cases that require niche expertise, compared to sending off cases to the national DFL. Other than that, cases are getting more complex, and hence, a continuous stream of budget for equipment and training is inevitable. The result derived from the study is hoped to be used by other LEAs in justifying to their management the benefits of establishing an in-house DFL.

Hash Based Block Matching for Digital Evidence Image Files from Forensic Software Tools

Internet use, intelligent communication tools, and social media have all become an integral part of our daily life as a result of rapid developments in information technology. However, this widespread use increases crimes committed in the digital environment. Therefore, digital forensics, dealing with various crimes committed in digital environment, has become an important research topic. It is in the research scope of digital forensics to investigate digital evidences such as computer, cell phone, hard disk, DVD, etc. and to report whether it contains any crime related elements. There are many software and hardware tools developed for use in the digital evidence acquisition process. Today, the most widely used digital evidence investigation tools are based on the principle of finding all the data taken place in digital evidence that is matched with specified criteria and presenting it to the investigator (e.g. text files, files starting with letter A, etc.). Then, digital forensics experts carry out data analysis to figure out whether these data are related to a potential crime. Examination of a 1 TB hard disk may take hours or even days, depending on the expertise and experience of the examiner. In addition, it depends on examiner’s experience, and may change overall result involving in different cases overlooked. In this study, a hash-based matching and digital evidence evaluation method is proposed, and it is aimed to automatically classify the evidence containing criminal elements, thereby shortening the time of the digital evidence examination process and preventing human errors.

Digital Forensics Compute Cluster: A High Speed Distributed Computing Capability for Digital Forensics

We have developed a distributed computing capability, Digital Forensics Compute Cluster (DFORC2) to speed up the ingestion and processing of digital evidence that is resident on computer hard drives. DFORC2 parallelizes evidence ingestion and file processing steps. It can be run on a standalone computer cluster or in the Amazon Web Services (AWS) cloud. When running in a virtualized computing environment, its cluster resources can be dynamically scaled up or down using Kubernetes. DFORC2 is an open source project that uses Autopsy, Apache Spark and Kafka, and other open source software packages. It extends the proven open source digital forensics capabilities of Autopsy to compute clusters and cloud architectures, so digital forensics tasks can be accomplished efficiently by a scalable array of cluster compute nodes. In this paper, we describe DFORC2 and compare it with a standalone version of Autopsy when both are used to process evidence from hard drives of different sizes.

Identity Verification Using k-NN Classifiers and Autistic Genetic Data

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

Three Tier Indoor Localization System for Digital Forensics

Mobile localization has attracted a great deal of attention recently due to the introduction of wireless networks. Although several localization algorithms and systems have been implemented and discussed in the literature, very few researchers have exploited the gap that exists between indoor localization, tracking, external storage of location information and outdoor localization for the purpose of digital forensics during and after a disaster. The contribution of this paper lies in the implementation of a robust system that is capable of locating, tracking mobile device users and store location information for both indoor and partially outdoor the cloud. The system can be used during disaster to track and locate mobile phone users. The developed system is a mobile application built based on Android, Hypertext Preprocessor (PHP), Cascading Style Sheets (CSS), JavaScript and MATLAB for the Android mobile users. Using Waterfall model of software development, we have implemented a three level system that is able to track, locate and store mobile device information in secure database (cloud) on almost a real time basis. The outcome of the study showed that the developed system is efficient with regard to the tracking and locating mobile devices. The system is also flexible, i.e. can be used in any building with fewer adjustments. Finally, the system is accurate for both indoor and outdoor in terms of locating and tracking mobile devices.

Smartphone Video Source Identification Based on Sensor Pattern Noise

An increasing number of mobile devices with integrated cameras has meant that most digital video comes from these devices. These digital videos can be made anytime, anywhere and for different purposes. They can also be shared on the Internet in a short period of time and may sometimes contain recordings of illegal acts. The need to reliably trace the origin becomes evident when these videos are used for forensic purposes. This work proposes an algorithm to identify the brand and model of mobile device which generated the video. Its procedure is as follows: after obtaining the relevant video information, a classification algorithm based on sensor noise and Wavelet Transform performs the aforementioned identification process. We also present experimental results that support the validity of the techniques used and show promising results.

Development of a Software System for Management and Genetic Analysis of Biological Samples for Forensic Laboratories

Due to the high reliability reached by DNA tests, since the 1980s this kind of test has allowed the identification of a growing number of criminal cases, including old cases that were unsolved, now having a chance to be solved with this technology. Currently, the use of genetic profiling databases is a typical method to increase the scope of genetic comparison. Forensic laboratories must process, analyze, and generate genetic profiles of a growing number of samples, which require time and great storage capacity. Therefore, it is essential to develop methodologies capable to organize and minimize the spent time for both biological sample processing and analysis of genetic profiles, using software tools. Thus, the present work aims the development of a software system solution for laboratories of forensics genetics, which allows sample, criminal case and local database management, minimizing the time spent in the workflow and helps to compare genetic profiles. For the development of this software system, all data related to the storage and processing of samples, workflows and requirements that incorporate the system have been considered. The system uses the following software languages: HTML, CSS, and JavaScript in Web technology, with NodeJS platform as server, which has great efficiency in the input and output of data. In addition, the data are stored in a relational database (MySQL), which is free, allowing a better acceptance for users. The software system here developed allows more agility to the workflow and analysis of samples, contributing to the rapid insertion of the genetic profiles in the national database and to increase resolution of crimes. The next step of this research is its validation, in order to operate in accordance with current Brazilian national legislation.

Towards a Proof Acceptance by Overcoming Challenges in Collecting Digital Evidence

Cybercrime investigation demands an appropriated evidence collection mechanism. If the investigator does not acquire digital proofs in a forensic sound, some important information can be lost, and judges can discard case evidence because the acquisition was inadequate. The correct digital forensic seizing involves preparation of professionals from fields of law, police, and computer science. This paper presents important challenges faced during evidence collection in different perspectives of places. The crime scene can be virtual or real, and technical obstacles and privacy concerns must be considered. All pointed challenges here highlight the precautions to be taken in the digital evidence collection and the suggested procedures contribute to the best practices in the digital forensics field.

A Method to Enhance the Accuracy of Digital Forensic in the Absence of Sufficient Evidence in Saudi Arabia

Digital forensics seeks to achieve the successful investigation of digital crimes through obtaining acceptable evidence from digital devices that can be presented in a court of law. Thus, the digital forensics investigation is normally performed through a number of phases in order to achieve the required level of accuracy in the investigation processes. Since 1984 there have been a number of models and frameworks developed to support the digital investigation processes. In this paper, we review a number of the investigation processes that have been produced throughout the years and introduce a proposed digital forensic model which is based on the scope of the Saudi Arabia investigation process. The proposed model has been integrated with existing models for the investigation processes and produced a new phase to deal with a situation where there is initially insufficient evidence.

A Real-Time Image Change Detection System

Detecting changes in multiple images of the same scene has recently seen increased interest due to the many contemporary applications including smart security systems, smart homes, remote sensing, surveillance, medical diagnosis, weather forecasting, speed and distance measurement, post-disaster forensics and much more. These applications differ in the scale, nature, and speed of change. This paper presents an application of image processing techniques to implement a real-time change detection system. Change is identified by comparing the RGB representation of two consecutive frames captured in real-time. The detection threshold can be controlled to account for various luminance levels. The comparison result is passed through a filter before decision making to reduce false positives, especially at lower luminance conditions. The system is implemented with a MATLAB Graphical User interface with several controls to manage its operation and performance.

The Use of Ontology Framework for Automation Digital Forensics Investigation

One of the main goals of a computer forensic analyst is to determine the cause and effect of the acquisition of a digital evidence in order to obtain relevant information on the case is being handled. In order to get fast and accurate results, this paper will discuss the approach known as Ontology Framework. This model uses a structured hierarchy of layers that create connectivity between the variant and searching investigation of activity that a computer forensic analysis activities can be carried out automatically. There are two main layers are used, namely Analysis Tools and Operating System. By using the concept of Ontology, the second layer is automatically designed to help investigator to perform the acquisition of digital evidence. The methodology of automation approach of this research is by utilizing Forward Chaining where the system will perform a search against investigative steps and atomically structured in accordance with the rules of the Ontology.

CVOIP-FRU: Comprehensive VoIP Forensics Report Utility

Voice over Internet Protocol (VoIP) products is an emerging technology that can contain forensically important information for a criminal activity. Without having the user name and passwords, this forensically important information can still be gathered by the investigators. Although there are a few VoIP forensic investigative applications available in the literature, most of them are particularly designed to collect evidence from the Skype product. Therefore, in order to assist law enforcement with collecting forensically important information from variety of Betamax VoIP tools, CVOIP-FRU framework is developed. CVOIP-FRU provides a data gathering solution that retrieves usernames, contact lists, as well as call and SMS logs from Betamax VoIP products. It is a scripting utility that searches for data within the registry, logs and the user roaming profiles in Windows and Mac OSX operating systems. Subsequently, it parses the output into readable text and html formats. One superior way of CVOIP-FRU compared to the other applications that due to intelligent data filtering capabilities and cross platform scripting back end of CVOIP-FRU, it is expandable to include other VoIP solutions as well. Overall, this paper reveals the exploratory analysis performed in order to find the key data paths and locations, the development stages of the framework, and the empirical testing and quality assurance of CVOIP-FRU.

High Speed Bitwise Search for Digital Forensic System

The most common forensic activity is searching a hard disk for string of data. Nowadays, investigators and analysts are increasingly experiencing large, even terabyte sized data sets when conducting digital investigations. Therefore consecutive searching can take weeks to complete successfully. There are two primary search methods: index-based search and bitwise search. Index-based searching is very fast after the initial indexing but initial indexing takes a long time. In this paper, we discuss a high speed bitwise search model for large-scale digital forensic investigations. We used pattern matching board, which is generally used for network security, to search for string and complex regular expressions. Our results indicate that in many cases, the use of pattern matching board can substantially increase the performance of digital forensic search tools.