Digital investigators commonly use one-way hash algorithms, such as MD5 or the SHA family, to generate effectively unique mathematical signatures of known files.
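As a minimal sketch of how such a signature is produced, the following Python function hashes a file in fixed-size chunks so that large evidence files do not need to fit in memory. The function name `hash_file` and its parameters are illustrative assumptions, not part of any particular forensic tool.

```python
import hashlib


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5",
    "sha1" or "sha256". The 1 MiB chunk size is an arbitrary choice.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical contents yield the same digest regardless of name or location, which is what makes the digest usable as a signature for known files.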
Traditionally, hashing is performed during postmortem forensic investigations and is used to maintain evidence integrity, as well as to identify known files (known good or known hostile).
Autonomous hashing (over the wire, or during direct overt or covert interactions) is the process of collecting hash values from live, running systems. It can significantly speed the identification of known threats and of known files that users should or should not possess. The performance gain comes from performing the hashing function with the target machine's own computing resources, in other words, off-loading the processing to the target. This approach has two important benefits. First, the contents of the files, directories or drives being hashed do not pass over the network, where they could expose proprietary data if not encrypted. Second, performance improves dramatically, especially when multiple targets are processed simultaneously, because network traffic congestion is reduced.
Autonomous hashing is accomplished by pushing a small software agent to each target machine. This requires credentialed access to the target under investigation, unless the agent has been installed a priori. The hashing agent is then instructed to gather hashes from the target machine and report the results when finished. The agent can be instructed to collect hashes from all drives and devices, whether permanently or temporarily attached, including USB or FireWire drives, local or remote network drives, and mounted or encrypted file systems. Searches can be further restricted to specific directories or file types. Once the collection of hashes (and associated file attributes) is complete, the agent delivers a report to the investigator's workstation. In most cases this report is delivered as a compressed and encrypted XML document that is ready for post-processing by the investigator. The document is encrypted to prevent disclosure of the file system data collected by the agent: even though file contents are not included, the file system information in the report may still contain proprietary data that requires protection.
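The collection and reporting steps can be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions rather than how any real agent works: the function names, the XML element and attribute names, and the choice of SHA-256 and gzip are all hypothetical. Encryption is deliberately left out because the standard library has no suitable cipher; an actual agent would encrypt the compressed bytes before sending them.

```python
import gzip
import hashlib
import os
import xml.etree.ElementTree as ET


def sha256_of(path, chunk_size=1 << 20):
    """Hash one file in chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_hashes(root, extensions=None):
    """Yield (path, size, mtime, sha256) for each readable file under root.

    `extensions` is an optional set such as {".doc", ".jpg"} that
    restricts the search to specific file types.
    """
    for folder, _dirs, names in os.walk(root):
        for name in sorted(names):
            suffix = os.path.splitext(name)[1].lower()
            if extensions and suffix not in extensions:
                continue
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
                yield path, info.st_size, int(info.st_mtime), sha256_of(path)
            except OSError:
                # Unreadable file: skipped in this sketch. A real agent
                # would record the failure instead of ignoring it.
                continue


def build_report(root, extensions=None):
    """Return a gzip-compressed XML report (NOT encrypted in this sketch)."""
    report = ET.Element("hash_report", root=root)
    for path, size, mtime, digest in collect_hashes(root, extensions):
        ET.SubElement(report, "file", path=path, size=str(size),
                      mtime=str(mtime), sha256=digest)
    return gzip.compress(ET.tostring(report, encoding="utf-8"))
```

Only paths, attributes and digests end up in the report, so file contents never leave the target. The paths themselves can still be sensitive, which is why the report is protected.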
Post-processing of the resulting discovery provides investigators with a wealth of data about the target. A file system inventory alone may reveal recent documents and the population of images, audio files, movies, application data and other documents. In addition, the collected hash values can be compared against sets of known good files (operating system programs, application files, development tools) and known bad files (rootkits, password crackers, botnet files, Trojan horses, encryption tools, steganography tools, key loggers, etc.). Beyond the known good or bad files identified in such a discovery, files containing proprietary data can be identified based on their hash values, known file names or known partial hashes.
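The comparison against known good and known bad hash sets amounts to a set lookup per collected file. The sketch below assumes the collected data has already been parsed into a mapping from path to hex digest, and that the reference sets hold lowercase hex digests. The function name and the result keys are hypothetical.

```python
def classify(collected, known_good, known_bad):
    """Sort collected files into known-good, known-bad and unknown lists.

    collected  -- dict mapping file path to hex digest
    known_good -- set of lowercase hex digests of trusted files
    known_bad  -- set of lowercase hex digests of hostile files
    """
    result = {"known_good": [], "known_bad": [], "unknown": []}
    for path, digest in sorted(collected.items()):
        key = digest.lower()  # hex digests may arrive in either case
        if key in known_bad:
            # Checked first so a digest in both sets is flagged as hostile.
            result["known_bad"].append(path)
        elif key in known_good:
            result["known_good"].append(path)
        else:
            result["unknown"].append(path)
    return result
```

Files in the "unknown" list are typically the ones that need an investigator's attention, since they match neither reference set.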
One criticism of autonomous agents that execute on the target platform is the potential untrustworthiness of the target's operating system (OS). Developers of autonomous discovery technologies are certainly aware of the threats posed by rootkits and other malicious code that can intercept OS calls and prevent the discovery of hidden directories or files. Without revealing the specific countermeasures that developers employ to overcome these hooks, it is safe to say that self-inspection of the operating environment is critical to effective autonomous hashing software. This implies that the software must perform a thorough inspection and determine whether the core API calls it will use can be judged safe. In addition to the trustworthiness concerns, there is anxiety that agent modifications of target evidence would call the efficacy of the discovery into question in court. This is certainly a valid concern, and those engaged in the development of such agents must take responsibility for it from the top down. For example, great care must be taken to audit every operation and potential modification that the agent may cause. In addition, robust solutions should include time stamping from a trusted source in order to prove exactly when the snapshot of the file system was taken and when the hash values were collected. Because the target machine is running before, during and after the discovery, the file system is likely to have changed by the very next moment. Trusted time stamps are especially important when collecting hashes across multiple targets that may be in different time zones.
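One way to picture the auditing and time-stamping requirement is a log that records every agent operation with a UTC time stamp, so that entries from targets in different time zones can be compared directly. The sketch below is an assumption-laden illustration: the `AuditLog` class and its `clock` hook are hypothetical, and the default clock is simply the local system clock, which is not a trusted source. A robust solution would substitute a trusted time service through that hook.

```python
from datetime import datetime, timezone


class AuditLog:
    """Record each agent operation with a UTC time stamp."""

    def __init__(self, clock=None):
        # `clock` is a hypothetical hook returning a datetime. The default
        # uses the local system clock, which is NOT a trusted time source.
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self.entries = []

    def record(self, operation, target):
        # Normalize to UTC so logs from different time zones line up.
        moment = self._clock().astimezone(timezone.utc)
        entry = {
            "time_utc": moment.isoformat(),
            "operation": operation,
            "target": target,
        }
        self.entries.append(entry)
        return entry
```

For example, an agent might call `log.record("hash", path)` before reading each file, giving the investigator an ordered account of what the agent touched and when.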
It is clear that autonomous hashing and live discovery technologies are advancing rapidly and provide value and expediency for investigators. As we advance these solutions, it is important that we consider not only what we collect, but also engineer solutions that can prove what we collected, where we collected it, when we collected it and by whom it was collected.