Overview

We need a repeatable method to identify and mark up Visual C++ STL types.

References

Rolf STL Types Script

Rolf has a nice IDA script that adds some STL structs to IDA for us: STLTypes-ForDistribution.py. To use the script, run it in IDA, then call MakeListTypes(DWORD) in the Python CLI to define the structs. Once the structs are defined, you still need to apply them manually.

Note: In the end it may be that we should be doing this dynamically anyway (from the man who would know): Automation Techniques in C++ Reverse Engineering

HexRaysPyTools (Mishap fork)

There is also a modified version of the HexRaysPyTools plugin for IDA that can be used to apply some STL types: HexRaysPyTools.

Our Approach

We are going to use a contrived example but hopefully create a repeatable process.

Example: 9c7fa766649f100e7d2f17f1415782908182e719dd90abf37e69039088f052b6 (malshare)

Identify the MSVC version

Apparently the STL type definitions change with the version of MSVC that was used to build the binary. To make sure we are applying the correct types, we first need to identify the right version.

Assumptions

  • We can use Detect It Easy (DiE) to figure out the version.
  • We might be able to improve on the DiE signature and make some standalone sigs.

A Working Approach

DiE (and many other PE analysis tools) simply read the PE Rich header and extract the version information from it. We can read the Rich header as well as anyone, so let's just do this directly.

WARNING: This approach relies on the Rich header of the PE being both intact and correct (unmodified). If the header is mangled or modified we cannot use this approach. In these cases it may be necessary to try something more dynamic.
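
Reading the Rich header directly can be sketched as follows. This is a minimal sketch, not the DiE or PEStudio implementation. It assumes the commonly documented layout: a "DanS" marker XORed with a 32-bit key, three padding dwords, then XORed (comp.id, count) pairs, terminated by "Rich" and the key. The function name parse_rich_header is hypothetical. Mapping the returned product IDs to tool names such as Utc1900_CPP still needs a lookup table from the compiler/linker references.

```python
import struct


def parse_rich_header(data):
    """Return a list of (product_id, build, count) tuples.

    An empty list means no decodable Rich header was found.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return []
    # e_lfanew (offset 0x3C) points at the PE header; the Rich header sits before it.
    pe_offset = struct.unpack_from("<I", data, 0x3C)[0]
    rich_pos = data.find(b"Rich", 0x40, pe_offset)
    if rich_pos == -1 or rich_pos + 8 > len(data):
        return []
    # The XOR key is stored in the clear right after the "Rich" marker.
    key = struct.unpack_from("<I", data, rich_pos + 4)[0]
    # The header starts with "DanS" XORed with that key.
    dans = struct.pack("<I", struct.unpack("<I", b"DanS")[0] ^ key)
    start = data.find(dans, 0x40, rich_pos)
    if start == -1:
        return []
    entries = []
    # Skip "DanS" plus three padding dwords (16 bytes), then read 8-byte pairs.
    for off in range(start + 16, rich_pos - 7, 8):
        comp_id, count = struct.unpack_from("<II", data, off)
        comp_id ^= key
        count ^= key
        # High word is the product (tool) ID, low word is the build number.
        entries.append((comp_id >> 16, comp_id & 0xFFFF, count))
    return entries
```

Called on the raw bytes of the sample, each tuple is (product ID, build number, object count). An empty result is exactly the failure case described in the warning: the header is missing or could not be decoded.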

Compiler/Linker References

Example

From PEStudio: Utc1900_CPP, Visual Studio 2015 - 14.0,24 - this is MSVC 14.0 (Visual C++ 2015).

Import the correct type structs into IDA

Translating the type definitions into structs does not seem trivial! It's template madness! We do have the type definitions for the STL (Microsoft STL), but we need some way to parse them and some way to differentiate between versions.

A Working Approach

This is where the Holy Bible of MSVC types comes in. But we still have a parsing/translation problem.

TODO - More Research Needed: We need a tool/process to "compile" the template defs into structs, keeping all variations of the template intact. I'm not sure if this is possible, or if something like this already exists. For now we are doing this manually, NOT IDEAL!

  • One idea might be to compile dummy files and import the pdb to get the type info?
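
The dummy-file idea could start from a small generator like the one below. It is only a sketch: the function name make_dummy_source, the header list, and the g_var/use naming are all hypothetical, and whether the resulting PDB contains every template variation we care about is untested. Each type gets a global instance and a non-inlined accessor so that the type is actually instantiated.

```python
def make_dummy_source(types):
    """Build C++ source text that instantiates each STL type in `types`."""
    # Assumed header set; extend it to cover whatever types are requested.
    headers = ["string", "vector", "map", "list", "memory"]
    lines = ["#include <{}>".format(h) for h in headers]
    lines.append("")
    for i, type_name in enumerate(types):
        # A global instance forces the template to be instantiated.
        lines.append("{} g_var{};".format(type_name, i))
        # A non-inlined accessor keeps a reference to it in the binary.
        lines.append(
            "__declspec(noinline) void* use{0}() {{ return &g_var{0}; }}".format(i)
        )
    lines.append("")
    calls = " ".join("use{}();".format(i) for i in range(len(types)))
    lines.append("int main() {{ {} return 0; }}".format(calls))
    return "\n".join(lines) + "\n"
```

The output would then be compiled with the matching toolset, debug info on and optimization off (for example cl /Zi /Od /EHsc dummy.cpp), and the resulting PDB loaded into IDA to pull in the struct definitions.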

Example

We have located the def for the string type for our compiler version (14.0,24): std::string. This def is the same as the one already defined in the fork of HexRaysPyTools, but that one was written manually (not ideal).
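
To make the std::string def concrete, here is a sketch that decodes one instance from raw memory. It assumes the release-build layout (no iterator debugging) as we understand it for MSVC 14.x: a 16-byte union holding either an inline buffer or a heap pointer, followed by the size and the capacity. That layout must be checked against the headers for the exact toolset. The names decode_msvc_string and read_memory are hypothetical.

```python
import struct

BUF_SIZE = 16  # assumed size of the inline (small-string) buffer for char strings


def decode_msvc_string(raw, read_memory, ptr_size=4):
    """Decode a std::string image (24 bytes on x86, 32 bytes on x64).

    raw: bytes of the std::string object itself.
    read_memory: callback (address, length) -> bytes, used for heap strings.
    """
    word = "I" if ptr_size == 4 else "Q"
    # Size and capacity follow the 16-byte buffer/pointer union.
    size, capacity = struct.unpack_from("<" + word * 2, raw, BUF_SIZE)
    if size > capacity:
        raise ValueError("size exceeds capacity; probably not a std::string")
    if capacity < BUF_SIZE:
        # Small string: the characters live inline at the start of the object.
        return raw[:size]
    # Large string: the first pointer-sized field points at the heap buffer.
    ptr = struct.unpack_from("<" + word, raw, 0)[0]
    return read_memory(ptr, size)
```

The same capacity check is what distinguishes the inline and heap cases when reading decompiled code that touches a string.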

Locate the type helper functions

Assumptions

  • If we compile a "dummy" PE with a bunch of STL types and no optimization, we might be able to use BinDiff to ID some of the helper functions. At a minimum we can use the PDB to import the correct struct definitions.
    • My assumption is that even though the target will be optimized (destroying many of the helper functions), we only need one good one to ID the argument types, and we can propagate them backwards once we identify one!
  • We can use the existing FLIRT sigs (built into IDA) and maybe create our own (or steal some from GitHub), but this is again at the mercy of compiler optimization. We have noticed that when IDA identifies these functions the function prototype is incorrect because the correct STL type is not defined.

A Working Approach

Sig approach with FLIRT

There is some limited success with this: the default IDA sigs do pick up some helper functions, which can be used to ID and propagate the function types (this is a good start, and free work!). In addition to the built-in IDA FLIRT sigs we will also try these sigs from Mandiant: siglib. These sigs also have some success in identifying helper functions, which we can use to ID the type of an argument and then propagate that type to the other functions.

Neither of these has a high success rate due to compiler optimization (nor would any signature-based method).

BinDiff Approach

This is a complete bust! Optimization changes the code too much for us to realistically match much (and there are some false positives, etc.).

TODO: This seems like an unsolved problem currently. The best we can do is some signature matching... maybe we could build a big enough signature db? We need to research other approaches that have/haven't been tried.

Apply the types in IDA

Assumptions

  • We can either apply these directly if our "helper function" identification process works or
  • We can try the shape identification from the forked HexRaysPyTools to automatically identify known types

A Working Approach

  1. Use the helper functions that we have identified in the previous steps to locate variables (arguments) that we know the type of
  2. Apply the type to these variables
  3. "Back-propagate" the type information for each identified variable (this is somewhat automated by the fork of HexRaysPyTools).
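
The three steps above can be modelled outside IDA as a simple graph walk. This toy sketch is not how HexRaysPyTools does it: the function name propagate_types, the variable naming, and the idea of a flat list of "same value" edges are all assumptions made for illustration. Types seeded from identified helper functions are spread to every variable connected to them.

```python
from collections import defaultdict, deque


def propagate_types(known, flows):
    """Spread known variable types along value-flow edges.

    known: {variable: type}, seeded from identified helper functions.
    flows: iterable of (a, b) pairs meaning a and b hold the same value,
           e.g. a local passed as a call argument. Treated as undirected.
    Returns (types, conflicts); conflicts holds (variable, type_a, type_b)
    entries for variables reached with two different types.
    """
    graph = defaultdict(set)
    for a, b in flows:
        graph[a].add(b)
        graph[b].add(a)

    types = dict(known)
    conflicts = []
    queue = deque(known)
    while queue:
        var = queue.popleft()
        for neighbour in graph[var]:
            if neighbour not in types:
                # First time we reach this variable: inherit the type.
                types[neighbour] = types[var]
                queue.append(neighbour)
            elif types[neighbour] != types[var]:
                # Two seeds disagree; record it for manual review.
                conflicts.append((neighbour, types[var], types[neighbour]))
    return types, conflicts
```

A single well-identified helper argument is enough to type everything connected to it, which is why one good helper function match goes a long way. Conflicts are worth reviewing by hand, since they usually mean a bad seed.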

Future Research

  • If the PE Rich header is missing or mangled, what other ways can we use to identify the MSVC version?
  • We need a tool/process to "compile" the template defs into structs that can be imported into IDA
    • One approach might be to use PDB files from dummy compiled PEs
  • When attempting to locate helper functions using FLIRT sigs we are limited by compiler optimization. Could we build a FLIRT db big enough to handle all optimization paths? Is it realistic? How much is inlined and can't be sigged?