Unless you have had your head under a rock for the past year or so, you have probably run into ransomware in one form or another. Be it that late-night call from a family member or a ticket from the service desk saying “we have been getting calls that people have been unable to open Excel files - is the server broken?” (Because it’s always the server. Always.)
I have come across ransomware multiple times at my job. My first exposure was to one of the first widely “popular” ransomware variants, cryptolocker. Unfortunately, my first exposure was also when one of my employer’s largest clients was hit. More than 50% of their file server was encrypted: several hundred gigs and hundreds of thousands of files. At the time, I honestly felt like I was going down a series of rapids on a raft with no paddle.
Compare that to now. When a ticket comes in from the service desk, I punch a few commands into a terminal, walk away, get lunch, and come back to a restored file system. What I want to explore here is how I got from going down the rapids with no life vest and no paddle to feeling like I’m wading through the kiddie pool each time ransomware hits.
This post will not go over the methods I use to restore the files; I am saving that for when I finally post the code to GitHub. Instead, what I want to talk about here is how to use PowerShell to detect encrypted files on a file system.
In my experience, ransomware tends to fall into four main categories based on how it encrypts files:
- Append an extension to the end of a file name (Ex: TeslaCrypt)
- Leave the file name alone, but encrypt each file differently (Ex: Cryptolocker)
- Leave the file name alone, and encrypt each file exactly the same (Ex: Cryptowall)
- Change each file name to a random name, with the same extension on each file (Ex: Locky)
So with that information, how can one determine which class of ransomware has encrypted the file system? Numbers 1 and 4 are fairly obvious, at least to the eye. Getting a list of those files through a script is trickier for number 4, but not impossible, and when your goal is to restore the files you can do so without having a list of the encrypted files at all. I will tackle that when the code is released to GitHub sometime next week. Numbers 2 and 3, however, are tricky. Unfortunately, cryptolocker uses method 2, which is by far the rarest and trickiest method to analyze. So how do you determine which files were encrypted by ransomware that uses method 2 or 3?
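Number 1 is also the easiest to script. Below is a minimal sketch in Python (the post itself works in PowerShell) that walks a share and lists every file whose name ends in the extension the ransomware appended. The function names and the `.locked` extension are hypothetical placeholders. Substitute whatever extension you actually see on the affected files.

```python
import os


def has_appended_extension(filename: str, ransom_ext: str) -> bool:
    """True if filename ends with the ransomware's extension, e.g. 'budget.xlsx.locked'."""
    return filename.lower().endswith(ransom_ext.lower())


def find_type1(root: str, ransom_ext: str):
    """Yield the full path of every file under root that carries the appended extension.

    os.walk silently skips directories it cannot read, so a bad root yields nothing.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if has_appended_extension(name, ransom_ext):
                yield os.path.join(dirpath, name)
```

For example, `list(find_type1(r"\\fileserver\share", ".locked"))` would return the affected paths. The UNC path is a placeholder.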
File what nows?
File signatures, also sometimes called magic numbers, are the first few bytes of a file that identify its format. They are usually written out as hex, sometimes alongside the equivalent text. For example, to validate that a Microsoft Office 2010 file really is a Microsoft Office 2010 file, you can read the first few bytes of the file and see what data is returned. If the data matches the known starting bytes of other Office 2010 documents, then this is a valid Office file.
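To make the idea concrete, here is a minimal Python sketch (not the post's PowerShell) that reads the first four bytes of a file and checks them against `50 4B 03 04`. That is the zip header that Office 2007-and-later files (.docx, .xlsx, .pptx, including Office 2010) start with. The function and constant names are hypothetical.

```python
# Zip-based Office formats (.docx, .xlsx, .pptx) start with the zip
# local-file header: hex 50 4B 03 04, i.e. "PK" followed by two control bytes.
ZIP_MAGIC = bytes.fromhex("504b0304")


def read_header(path: str, length: int = 4) -> bytes:
    """Return the first `length` bytes of a file (fewer if the file is shorter)."""
    with open(path, "rb") as f:
        return f.read(length)


def has_zip_signature(header: bytes) -> bool:
    """True if the header matches the zip signature used by Office 2007+ files."""
    return header.startswith(ZIP_MAGIC)
```

Usage would be `has_zip_signature(read_header(r"c:\doc.docx"))`. A match only tells you the file is a zip container. Plain .zip, .jar and .odt files share the same four bytes, so one signature can stand for many file types.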
Most methods look only at the text representation of the signature bytes. While going through the signature database, I noticed some potential issues with relying on the text values alone. Based on some other research, I finally found a method that seems to be extremely reliable: it uses both the hex and the text representation to validate file signatures. What is nice about this method is that it can be extended beyond validating file types to find all files encrypted by ransomware that uses method 3.
So how does this look in action?
When looking for cryptolocker (type 2) encrypted files, we need to analyze every file and determine whether its first four bytes match any of our known good signatures. If none of the signatures match, we list the file as encrypted. This is the least reliable of the methods because it depends on having a complete list of ALL potential file signatures. Custom applications or vendor-specific file encodings will more than likely not be in any public database of magic numbers. However, it gives you an initial list of encrypted files that you can then pare down if need be.
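The type 2 test boils down to an allow-list lookup. A minimal Python sketch follows (the post uses PowerShell). It assumes the signature format the post shows later: four bytes as hex in two-byte words, then a text rendering with non-printable bytes as dots. The three entries in `KNOWN_GOOD` are a deliberately tiny, hypothetical list. A real one would come from a signature database plus your own vendor formats.

```python
def signature(header: bytes) -> str:
    """Format header bytes as hex words plus text, e.g. "504b 0304 'P K . .'"."""
    hex_part = " ".join(header[i:i + 2].hex() for i in range(0, len(header), 2))
    text_part = " ".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in header)
    return f"{hex_part} '{text_part}'"


# Hypothetical, deliberately tiny allow-list of known good signatures.
KNOWN_GOOD = {
    signature(bytes.fromhex("504b0304")),  # zip-based: .docx, .xlsx, .pptx, .zip
    signature(bytes.fromhex("25504446")),  # %PDF
    signature(bytes.fromhex("d0cf11e0")),  # OLE2: legacy .doc, .xls, .ppt
}


def is_suspect_type2(header: bytes, known_good=KNOWN_GOOD) -> bool:
    """Type 2: a file whose signature is not on the allow-list is probably encrypted."""
    return signature(header) not in known_good
```

Every file flagged here is only a suspect. Anything from a format missing from the list will show up too, which is why the initial list usually needs paring down.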
When looking for cryptowall (type 3) encrypted files, we take two of the encrypted files and extract the first four bytes of each to generate their signatures. The two signatures are then compared. If they are the same, this is a type 3 ransomware infection (if they do not match, it is a type 2, cryptolocker, infection). Once the signature is determined, we scan all the files on the file server and find every file that has the same signature.
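The two-sample comparison is sketched below in Python rather than the post's PowerShell. `header_a` and `header_b` stand for the first four bytes read from two files you already know are encrypted. The function name is hypothetical, and the signature format mimics the one shown in the post.

```python
def signature(header: bytes) -> str:
    """Format header bytes as hex words plus text, e.g. "504b 0304 'P K . .'"."""
    hex_part = " ".join(header[i:i + 2].hex() for i in range(0, len(header), 2))
    text_part = " ".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in header)
    return f"{hex_part} '{text_part}'"


def classify_from_samples(header_a: bytes, header_b: bytes):
    """Compare the headers of two known-encrypted files.

    Returns ("type3", shared_signature) if they match, otherwise ("type2", None).
    """
    sig_a, sig_b = signature(header_a), signature(header_b)
    if sig_a == sig_b:
        return "type3", sig_a
    return "type2", None
```

On a type 3 result, the returned signature is the value every file on the server gets compared against in the scan.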
So how does one get the signature of a file using both the hex and the text representation of the hex? The code to get the magic numbers of a file is attached below.
Setting the variable $FirstFilePath = “c:\doc.docx” and then running this code assigns the value 504b 0304 ‘P K . .’ to the variable $contentstring_firstfile. If you look at the magic number database (http://www.garykessler.net/library/file_sigs.html), you can see that this signature represents a lot of potentially valid file types (most of them based on the zip format), but it does indicate that the file is VALID. That is a simple example. A more complex one is the signature 0902 0600 ‘. . . .’, which represents a file format used by one of our vendors. If we used only the text representation, this signature would map to multiple file types. By using both the hex and the text, we are able to identify these files more specifically and scan for these signatures.
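As a minimal Python stand-in for the PowerShell described above (this is not the author's code), the sketch below produces signatures in the same shape as the output quoted: the first four bytes as hex in two-byte words, then a text rendering where anything outside printable ASCII becomes a dot. It also demonstrates the vendor example. The comparison header `d0cf 11e0` is the start of a legacy OLE2 Office file and was chosen only for illustration. All function names are hypothetical.

```python
def read_header(path: str, length: int = 4) -> bytes:
    """Return the first `length` bytes of a file (fewer if the file is shorter)."""
    with open(path, "rb") as f:
        return f.read(length)


def text_only(header: bytes) -> str:
    """Text rendering: printable ASCII kept, every other byte shown as '.'."""
    return " ".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in header)


def signature(header: bytes) -> str:
    """Hex in two-byte words followed by the text rendering."""
    hex_part = " ".join(header[i:i + 2].hex() for i in range(0, len(header), 2))
    return f"{hex_part} '{text_only(header)}'"


def magic_number(path: str) -> str:
    """Signature of the file at `path`, e.g. "504b 0304 'P K . .'" for a .docx."""
    return signature(read_header(path))


vendor_file = bytes.fromhex("09020600")    # the vendor signature quoted above
legacy_office = bytes.fromhex("d0cf11e0")  # start of an OLE2 file such as a legacy .doc

# Text alone cannot tell these two apart...
assert text_only(vendor_file) == text_only(legacy_office)
# ...but hex plus text can.
assert signature(vendor_file) != signature(legacy_office)
```

Calling `magic_number(r"c:\doc.docx")` on a real .docx should give `504b 0304 'P K . .'`, matching the value above.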
So now that we have the signatures we are looking for (either valid file types for type 2 or the encryption header for type 3), we can use a few simple commands to find ALL the encrypted files on the file server. The flow is as follows:
- Recurse through the file system starting at a specific root path looking at files only (we don’t care about directories in this case).
- For each file that is found we then get the magic number of the file.
- If we are looking for type 2 ransomware, we compare that signature against a list of known good file signatures (stored however you like). If the signature is found in this list, the file is not encrypted. If it is not found, the file is more than likely encrypted. If we are looking for type 3 ransomware, the file is encrypted if its magic number matches the magic number produced earlier. In both cases, each file identified as encrypted is logged in one manner or another and then restored later on.
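The three steps above can be sketched in Python (the post does this in PowerShell). It assumes the same hex-plus-text signature format and a plain text log with one path per line. The post leaves the logging format open, and the function names, `mode` strings and log layout here are hypothetical.

```python
import os


def signature(header: bytes) -> str:
    """Format header bytes as hex words plus text, e.g. "504b 0304 'P K . .'"."""
    hex_part = " ".join(header[i:i + 2].hex() for i in range(0, len(header), 2))
    text_part = " ".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in header)
    return f"{hex_part} '{text_part}'"


def is_encrypted(sig, mode, known_good=frozenset(), encrypted_sig=None) -> bool:
    """Type 2: not on the allow-list. Type 3: equal to the shared encrypted header."""
    if mode == "type2":
        return sig not in known_good
    if mode == "type3":
        return sig == encrypted_sig
    raise ValueError(f"unknown mode: {mode!r}")


def find_encrypted(root, mode, known_good=frozenset(), encrypted_sig=None, length=4):
    """Recurse from root, test each file's magic number, yield the encrypted ones."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # files only; directories are walked but not tested
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    header = f.read(length)
            except OSError:
                continue  # locked or permission-denied: skip rather than guess
            if is_encrypted(signature(header), mode, known_good, encrypted_sig):
                yield path


def log_encrypted(paths, log_path):
    """Write one path per line so a later restore step can consume the list."""
    with open(log_path, "w", encoding="utf-8") as log:
        for path in paths:
            log.write(path + "\n")
```

A type 3 run might look like `log_encrypted(find_encrypted(r"\\fileserver\share", "type3", encrypted_sig=sig), "encrypted.txt")`, where `sig` is the shared signature taken from two sample files. The share path and log name are placeholders.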
Using magic numbers, we are able to quickly find the encrypted files. Restoring those files is a different matter, and may need to be customized for your specific environment. Next week I’ll be publishing the code to GitHub for both finding and (if you're using NetApp snapshots) automatically restoring your files.