Does Entropy Matter? A Pseudoscientific Study!
How useful is entropy for identifying packed samples?
Overview
Entropy is obviously useful and tells us information about a binary. However, can we use entropy alone to determine if a sample is packed or not? Let's define the parameters for our study.
- Without looking at the binary in IDA (or your RE tool of choice), can you use entropy to determine if the sample is packed?
- We are defining "packed" as a file that contains an encrypted or compressed payload, where our analysis goal is to analyze the payload and not the packer.
- Are there specific data in the binary that can be tested for entropy that will give us a better answer than testing the full binary? For example, looking at the entropy of sections, or of resources.
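As a refresher, the entropy in question is Shannon entropy computed over byte values, which for binary data ranges from 0 (a single repeated byte) to 8 bits per byte (indistinguishable from random). The following is a minimal sketch of the calculation written only for illustration; pefile and bintropy each implement their own version.

import collections
import math

def shannon_entropy(data: bytes) -> float:
    # Shannon entropy of a byte string, in bits per byte (0.0 - 8.0)
    if not data:
        return 0.0
    counts = collections.Counter(data)
    total = len(data)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy

# Compressed or encrypted data approaches 8 bits per byte.
print(shannon_entropy(b"A" * 1024))            # 0.0
print(shannon_entropy(bytes(range(256)) * 4))  # 8.0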
References
- Understanding Shannon's Entropy metric for Information
- Using entropy to spot the malware hiding in plain sight
- merces/entropy (github)
- PowerShellArsenal/Misc/Get-Entropy.ps1
- Using Entropy Analysis to Find Encrypted and Packed Malware (bintropy)
- bintropy (github)
- Packer Detection for Multi-Layer Executables Using Entropy Analysis
- Generic unpacking using entropy analysis
Our Problem
Can we decide, based on entropy alone, whether or not we need to open a sample in IDA? If we cannot make this decision, we will end up opening the binary in IDA anyway, so why bother looking at entropy at all?
Our Study
For our study we are going to collect a set of known packed and unpacked (payload) samples to use as our ground truth. We will then run different types of entropy calculations on the binaries and look for a common cutoff where we could make a decision that the samples are packed/unpacked. If the cutoff is such that we cannot classify all samples within an error margin of ERROR-RATE-TBD, then we can conclude that entropy will not consistently answer our problem statement.
Tools
We are going to use bintropy and the standard section entropy calculation from pefile as our two tools.
import pefile
import bintropy


def pe_test(file_path, all_sections=True):
    #
    # Return the largest entropy value across the PE sections. When
    # all_sections=False, only non-executable (data) sections are tested.
    #
    pe = pefile.PE(file_path)
    entropy_list = []
    for s in pe.sections:
        if all_sections:
            entropy_list.append(s.get_entropy())
        elif not s.IMAGE_SCN_CNT_CODE:
            entropy_list.append(s.get_entropy())
    if len(entropy_list) == 0:
        return 0
    return max(entropy_list)


def is_dotnet(file_path):
    # Data directory 14 is the COM descriptor (CLR header); if it is
    # present, the binary is a .NET assembly.
    pe = pefile.PE(file_path)
    isDotNet = pe.OPTIONAL_HEADER.DATA_DIRECTORY[14]
    if isDotNet.VirtualAddress == 0 and isDotNet.Size == 0:
        return False
    else:
        return True


def bintropy_test(file_path, get_average=True):
    # With decide=False, bintropy returns the raw entropy values
    # (highest block entropy, average block entropy) instead of a verdict.
    h_e, av_e = bintropy.bintropy(file_path, decide=False)
    if get_average:
        return av_e
    else:
        return h_e
UNPACKED_DIR = '/tmp/unpacked'
PACKED_DIR = '/tmp/packed'

file_path = '/tmp/packed/ff5ac0eb80d90c6a2a46a4133fc8d90cd165b8b2bac1cbaa8fadd35b186bd5c8.bin'

#
# Packed-sample thresholds from the bintropy paper referenced above:
#   average entropy: 6.677
#   highest block entropy: 7.199
#

# test all sections
print("\ntesting all sections")
pe = pefile.PE(file_path)
for s in pe.sections:
    print(f"  is code: {s.IMAGE_SCN_CNT_CODE} -- {s.get_entropy()}")

# test the pe method, data sections only
print("\ntest highest entropy from data sections")
print(pe_test(file_path, all_sections=False))

# test the pe method, all sections
print("\ntest highest entropy from all sections")
print(pe_test(file_path, all_sections=True))

# test bintropy average
print("\ntest bintropy average")
print(bintropy_test(file_path, get_average=True))

# test bintropy highest
print("\ntest bintropy highest")
print(bintropy_test(file_path, get_average=False))
import os

from rich.console import Console
from rich.table import Table

# iterate over the files in the packed samples directory
directory = PACKED_DIR

table = Table(title="Packed Samples", expand=True)
table.add_column("file", justify="center", no_wrap=True)
table.add_column(".NET", justify="center", no_wrap=True)
table.add_column("pe data", justify="center", no_wrap=True)
table.add_column("pe all", justify="center", no_wrap=True)
table.add_column("pe all\npacked @ 7", justify="center", no_wrap=True)
table.add_column("bin ave", justify="center", no_wrap=True)
table.add_column("bin all", justify="center", no_wrap=True)
table.add_column("bin all\npacked @ 7", justify="center", no_wrap=True)
table.add_column("bin t/f", justify="center", no_wrap=True)

for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # skip anything that is not a file
    if os.path.isfile(f):
        file_path = f
        pe_data = pe_test(file_path, all_sections=False)
        pe_all = pe_test(file_path, all_sections=True)
        bin_ave = bintropy_test(file_path, get_average=True)
        bin_all = bintropy_test(file_path, get_average=False)
        bin_tf = bintropy.bintropy(file_path)
        dotnet = is_dotnet(file_path)
        table.add_row(filename[:5], str(dotnet),
                      str(pe_data)[:4],
                      str(pe_all)[:4],
                      str(True if pe_all > 7 else False),
                      str(bin_ave)[:4],
                      str(bin_all)[:4],
                      str(True if bin_all > 7 else False),
                      str(bin_tf))

console = Console()
console.print(table)
import os

from rich.console import Console
from rich.table import Table

# iterate over the files in the unpacked samples directory
directory = UNPACKED_DIR

table = Table(title="Unpacked Samples", expand=True)
table.add_column("file", justify="center", no_wrap=True)
table.add_column(".NET", justify="center", no_wrap=True)
table.add_column("pe data", justify="center", no_wrap=True)
table.add_column("pe all", justify="center", no_wrap=True)
table.add_column("pe all\npacked @ 7", justify="center", no_wrap=True)
table.add_column("bin ave", justify="center", no_wrap=True)
table.add_column("bin all", justify="center", no_wrap=True)
table.add_column("bin all\npacked @ 7", justify="center", no_wrap=True)
table.add_column("bin t/f", justify="center", no_wrap=True)

for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # skip anything that is not a file
    if os.path.isfile(f):
        file_path = f
        pe_data = pe_test(file_path, all_sections=False)
        pe_all = pe_test(file_path, all_sections=True)
        bin_ave = bintropy_test(file_path, get_average=True)
        bin_all = bintropy_test(file_path, get_average=False)
        bin_tf = bintropy.bintropy(file_path)
        dotnet = is_dotnet(file_path)
        table.add_row(filename[:5], str(dotnet),
                      str(pe_data)[:4],
                      str(pe_all)[:4],
                      str(True if pe_all > 7 else False),
                      str(bin_ave)[:4],
                      str(bin_all)[:4],
                      str(True if bin_all > 7 else False),
                      str(bin_tf))

console = Console()
console.print(table)
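To turn these tables into summary numbers, a quick tally over both directories is enough. This is a sketch that reuses the helpers above with the "highest entropy across all sections" measurement and the same cutoff of 7 used in the table columns; it is not the exact calculation behind the bintropy failure rate quoted below, which uses bintropy's default decision.

def tally(directory, threshold=7.0):
    # Count how many samples in a directory exceed the entropy cutoff.
    total = flagged = 0
    for filename in os.listdir(directory):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue
        total += 1
        if pe_test(path, all_sections=True) > threshold:
            flagged += 1
    return flagged, total

packed_hits, packed_total = tally(PACKED_DIR)
unpacked_hits, unpacked_total = tally(UNPACKED_DIR)

# For the packed set, "flagged" samples are true positives; for the
# unpacked set, "flagged" samples are false positives.
print(f"packed flagged as packed:   {packed_hits}/{packed_total}")
print(f"unpacked flagged as packed: {unpacked_hits}/{unpacked_total}")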
Conclusions
Entropy May Not Be Useful for .NET
For .NET, at least one packed sample (3664a0db89a9f1a8bf439d8117943d3e042abe488a761dc6c8e18b90d6081298) had an entropy in the 4 range, while its unpacked payload (2333c19020f6e928198cea31c05dd685055991c921f3a1cd32ad9817b6c704e6) had an entropy in the 6 range. From this we can conclude that entropy may have no relation to the packed status of a .NET binary.
Bintropy Has A Poor Packed Detection Rate with Default Values
For detecting packed samples using the default threshold, the failure rate was 16/26; however, there were no unpacked false positives. One conclusion we can draw from this is that the tool can be relied on when it detects a packed sample (i.e., the sample is probably packed), but it cannot be relied on for a negative decision, as it has a high false negative rate. This could be used as a filter, but not as a decision metric.
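As a sketch of what such a filter could look like, reusing the helpers above: the cutoff of 7 and the choice to skip .NET samples are based only on the observations in this study, not on any established threshold, and a non-packed verdict is deliberately reported as "unknown" rather than "unpacked".

def triage(file_path, threshold=7.0):
    # Return "probably packed" only when the entropy evidence is strong;
    # otherwise defer to manual analysis instead of claiming "unpacked".
    if is_dotnet(file_path):
        return "unknown - .NET, entropy unreliable"
    if pe_test(file_path, all_sections=True) > threshold:
        return "probably packed"
    if bintropy.bintropy(file_path):
        return "probably packed"
    return "unknown - needs manual analysis"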
General Conclusions
The results from our small "pseudoscientific" study do not match the results from the two academic papers referenced above (99% and 97% accuracy). In our study we had high fidelity when detecting packed samples (i.e., if a sample was detected as packed it was in fact packed); however, we also had a very high false negative rate (i.e., if a sample was detected as not packed, there was an over 50% chance that it was actually packed). From this we can draw two conclusions: first, and obviously, this was not a scientific study (more data is needed); second, we cannot use entropy alone for packer identification, it must be combined with other metrics.