SoulSearcher Worm
What is going on with all these SoulSearcher samples?
Overview
We have a Malpedia Yara rule for the targeted campaign dubbed "SoulSearcher", detailed in a Fortinet blog, that is matching on thousands of samples!
These samples seem to be very similar (same metadata, etc.) but have different hashes. Our assumption is that these samples are part of some self-propagating worm, but we need to figure out...
1) Is this worm-set of samples related to SoulSearcher?
2) What is going on with the worm-set of samples? Are they actually a worm or something else?
Malpedia Yara Rule
rule win_soul_auto {
meta:
author = "Felix Bilstein - yara-signator at cocacoding dot com"
date = "2023-01-25"
version = "1"
description = "Detects win.soul."
info = "autogenerated rule brought to you by yara-signator"
tool = "yara-signator v0.6.0"
signator_config = "callsandjumps;datarefs;binvalue"
malpedia_reference = "https://malpedia.caad.fkie.fraunhofer.de/details/win.soul"
malpedia_rule_date = "20230124"
malpedia_hash = "2ee0eebba83dce3d019a90519f2f972c0fcf9686"
malpedia_version = "20230125"
malpedia_license = "CC BY-SA 4.0"
malpedia_sharing = "TLP:WHITE"
/* DISCLAIMER
* The strings used in this rule have been automatically selected from the
* disassembly of memory dumps and unpacked files, using YARA-Signator.
* The code and documentation is published here:
* https://github.com/fxb-cocacoding/yara-signator
* As Malpedia is used as data source, please note that for a given
* number of families, only single samples are documented.
* This likely impacts the degree of generalization these rules will offer.
* Take the described generation method also into consideration when you
* apply the rules in your use cases and assign them confidence levels.
*/
strings:
$sequence_0 = { 732f 8b55fc 3b5510 7327 8b55b0 8b5dec }
// n = 6, score = 400
// 732f | mov ecx, dword ptr [ebp - 0x18]
// 8b55fc | mov dword ptr [ebp - 0x2c], ecx
// 3b5510 | mov ecx, dword ptr [ebp - 0x3c]
// 7327 | mov dword ptr [ebp - 0x30], edx
// 8b55b0 | add ebx, edi
// 8b5dec | mov edi, dword ptr [ebp - 0x44]
$sequence_1 = { c1e90b 0faf4df0 3bf1 7306 }
// n = 4, score = 400
// c1e90b | mov eax, dword ptr [ebp + 8]
// 0faf4df0 | cmp eax, dword ptr [ebp - 4]
// 3bf1 | jae 0x202
// 7306 | movzx ecx, byte ptr [eax]
$sequence_2 = { d3e2 8515???????? 7405 e8???????? }
// n = 4, score = 400
// d3e2 | shl edx, cl
// 8515???????? |
// 7405 | je 7
// e8???????? |
$sequence_3 = { 81c7680a0000 81fa00000001 7318 3b45fc }
// n = 4, score = 400
// 81c7680a0000 | mov word ptr [edi], bx
// 81fa00000001 | add ecx, ecx
// 7318 | not edx
// 3b45fc | jmp 0x21
$sequence_4 = { 03d2 668939 8d7cd104 c745f800000000 c745e408000000 e9???????? 2bc7 }
// n = 7, score = 400
// 03d2 | dec eax
// 668939 | lea ecx, [ebp - 0x10]
// 8d7cd104 | je 0x5e
// c745f800000000 | jne 0x55
// c745e408000000 | dec eax
// e9???????? |
// 2bc7 | lea ecx, [0x1b354]
$sequence_5 = { ff25???????? ff25???????? 48895c2408 4889742410 }
// n = 4, score = 400
// ff25???????? |
// ff25???????? |
// 48895c2408 | dec eax
// 4889742410 | mov dword ptr [esp + 8], ebx
$sequence_6 = { 8b5dec 8b7d08 e9???????? 5f 5e b801000000 5b }
// n = 7, score = 400
// 8b5dec | add ecx, edx
// 8b7d08 | mov edx, ecx
// e9???????? |
// 5f | mov ecx, dword ptr [ebp - 8]
// 5e | mov word ptr [ecx + edi*2], dx
// b801000000 | cmp eax, dword ptr [ebp - 4]
// 5b | jae 0x1ff
$sequence_7 = { 5d c3 57 eb05 }
// n = 4, score = 400
// 5d | movzx ecx, byte ptr [eax]
// c3 | shl esi, 8
// 57 | mov dword ptr [ebp - 0x20], eax
// eb05 | mov eax, dword ptr [edi + 0x40]
$sequence_8 = { b801000000 e9???????? 8b4f24 8b5f38 3bcb 7305 8b4728 }
// n = 7, score = 400
// b801000000 | lea edi, [edi + edi + 1]
// e9???????? |
// 8b4f24 | cmp edi, 0x40
// 8b5f38 | jb 0xffffffc1
// 3bcb | sub edi, 0x40
// 7305 | mov dword ptr [ebp + 8], eax
// 8b4728 | cmp edi, 4
$sequence_9 = { 83c10b 894dec e9???????? 2bc7 2bf7 8bfa }
// n = 6, score = 400
// 83c10b | mov edi, eax
// 894dec | mov dword ptr [ebp - 8], edx
// e9???????? |
// 2bc7 | movzx edx, word ptr [edx]
// 2bf7 | jae 0xad
// 8bfa | cmp dword ptr [ebp - 0x14], 0x13
$sequence_10 = { e8???????? e9???????? 48c7458007000000 4c89742478 664489742468 33c0 4883c9ff }
// n = 7, score = 200
// e8???????? |
// e9???????? |
// 48c7458007000000 | cmp edx, dword ptr [ebp + 0x10]
// 4c89742478 | jae 0x2f
// 664489742468 | mov edx, dword ptr [ebp - 0x50]
// 33c0 | mov ebx, dword ptr [ebp - 0x14]
// 4883c9ff | add edi, 0xa68
$sequence_11 = { 488d4dd0 e8???????? ff15???????? 448bc0 488d153e250200 488d4dd0 ff15???????? }
// n = 7, score = 200
// 488d4dd0 | test ecx, ecx
// e8???????? |
// ff15???????? |
// 448bc0 | dec eax
// 488d153e250200 | mov dword ptr [esp + 8], ebx
// 488d4dd0 | dec eax
// ff15???????? |
$sequence_12 = { 745c 833d????????01 7553 488d0d54b30100 e8???????? 33ff }
// n = 6, score = 200
// 745c | sub esp, 0x30
// 833d????????01 |
// 7553 | dec eax
// 488d0d54b30100 | mov ebx, ecx
// e8???????? |
// 33ff | dec eax
$sequence_13 = { 4533c0 488d55ef 488d4da7 e8???????? }
// n = 4, score = 200
// 4533c0 | jae 0xe
// 488d55ef | jae 0x31
// 488d4da7 | mov edx, dword ptr [ebp - 4]
// e8???????? |
$sequence_14 = { 4533c0 488d55c8 488d4df0 e8???????? 488d542448 488d4df0 e8???????? }
// n = 7, score = 200
// 4533c0 | mov dword ptr [esp + 8], ebx
// 488d55c8 | dec eax
// 488d4df0 | mov dword ptr [esp + 0x10], esi
// e8???????? |
// 488d542448 | push edi
// 488d4df0 | dec eax
// e8???????? |
$sequence_15 = { 498bcc e8???????? 85c0 0f84e9000000 488d15ca0f0200 488bcb e8???????? }
// n = 7, score = 200
// 498bcc | dec eax
// e8???????? |
// 85c0 | mov dword ptr [esp + 8], ebx
// 0f84e9000000 | dec eax
// 488d15ca0f0200 | mov dword ptr [esp + 0x10], esi
// 488bcb | push edi
// e8???????? |
$sequence_16 = { 4883ec20 4d8b21 4533ff 4d8bf1 }
// n = 4, score = 200
// 4883ec20 | push edi
// 4d8b21 | jmp 9
// 4533ff | mov dword ptr [ebp - 0x1c], ecx
// 4d8bf1 | jmp 0x25
$sequence_17 = { 66f2af 6689442420 48f7d1 4c8d41ff 488d4c2420 e8???????? 488d4c2420 }
// n = 7, score = 200
// 66f2af | sub eax, edi
// 6689442420 | sub esi, edi
// 48f7d1 | mov edi, edx
// 4c8d41ff | shr edi, 5
// 488d4c2420 | mov edx, dword ptr [ebp - 8]
// e8???????? |
// 488d4c2420 | mov word ptr [edx], cx
$sequence_18 = { 03d0 8d0452 488b542418 c1e008 4d8d1442 }
// n = 5, score = 200
// 03d0 | sub edx, edi
// 8d0452 | mov word ptr [ecx + 0x646], dx
// 488b542418 | add edi, edx
// c1e008 | mov word ptr [ecx + 2], di
// 4d8d1442 | mov edx, 2
$sequence_19 = { ffe1 418b5610 85d2 7509 45895e08 e9???????? 83ff10 }
// n = 7, score = 200
// ffe1 | mov dword ptr [esp + 0x10], esi
// 418b5610 | push edi
// 85d2 | dec eax
// 7509 | sub esp, 0x30
// 45895e08 | dec eax
// e9???????? |
// 83ff10 | mov dword ptr [esp + 8], ebx
$sequence_20 = { e8???????? 4c8d1d111f0100 4c895c2428 488d154d6d0100 488d4c2428 }
// n = 5, score = 200
// e8???????? |
// 4c8d1d111f0100 | jmp 0x1f
// 4c895c2428 | shr ecx, 0xb
// 488d154d6d0100 | imul ecx, dword ptr [ebp - 0x10]
// 488d4c2428 | cmp esi, ecx
$sequence_21 = { 488d0587fdffff 48894740 488d058cfdffff 48894748 8b442424 4c896770 894768 }
// n = 7, score = 200
// 488d0587fdffff | mov ecx, dword ptr [ebp - 0x20]
// 48894740 | movzx edx, word ptr [ecx + ebx*2 + 0x180]
// 488d058cfdffff | cmp eax, 0x1000000
// 48894748 | sub eax, edi
// 8b442424 | sub esi, edi
// 4c896770 | mov edi, edx
// 894768 | shr edi, 5
$sequence_22 = { 4885c0 746a 4c8bc0 0fb7d3 8bce e8???????? }
// n = 6, score = 200
// 4885c0 | dec eax
// 746a | sub esp, 0x30
// 4c8bc0 | dec eax
// 0fb7d3 | mov ebx, ecx
// 8bce | dec eax
// e8???????? |
condition:
7 of them and filesize < 1400832
}
SoulSearcher Samples
- 579fa00bc212a3784d523f8ddd0cfc118f51ca926d8f7ea2eb6e27157ec61260
- 69a9ab243011f95b0a1611f7d3c333eb32aee45e74613a6cddf7bcb19f51c8ab
- 0f7af0cad4aade0e7058051a449059b35358ddda075d88b2d289625adc02deef
- 1af5252cadbe8cef16b4d73d4c4886ee9cecddd3625e28a59b59773f5a2a9f7f
- a6f75af45c331a3fac8d2ce010969f4954e8480cbe9f9ea19ce3c51c44d17e98
- c4efb58723fd75d51eb92302fbd7541e4462f438282582b5efa3c6c7685e69fd
Comparison
- The SoulSearcher samples are either DLLs or 64-bit; this does not match our worm-set of 32-bit EXE files
- The metadata of the SoulSearcher samples sometimes spoofs Microsoft files, but none of them spoof Word like our worm-set does
- There are three functions that are common between the samples
0F7AF0CAD4AADE0E7058051A449059B35358DDDA075D88B2D289625ADC02DEEF
These appear to be part of a compression algorithm, but this is uncertain (it may just be an open-source, statically compiled library, or... the same dev!)
Question 1 - Is The Worm-Set Part of SoulSearcher
It appears as though there is a shared library between the worm-set and the SoulSearcher set, but there is no other overlap. We can say with medium confidence that these samples are not related and that the overlap is highly likely a shared, statically compiled library.
Based on this, we should update the Malpedia rule: designate the shared library code signatures as "shared" and prevent them from producing a full match of the Yara rule on their own.
Worm-Set Samples
- 7a9397dc47ce8a604c359da17dded029580b56cc6d988e841eaaabe622b23fa4
- b3ba13bbd74e20a40d9c02c0398a474c93a352322211561f35d60cdc26739477
- a36f318370a5ace5f9c4611c1f8edfffdb05dd8e109a3dd2127ca3881670d7bd
- a803aa42fe614cafdb86da10d730cecbc8a2432eb7080fd004e640cc536b595c
- fc2642273c70d99a172ce45f57fe02d7f65b0ecd2b29b882ce84d0f3e01c8197
Older sample that seems to be related? (2014)
- 4824d76a4d15840e30bb0a3da01757c22502f0d9cfeac05e9219100c477beeee
Common Artifacts
Metadata
These samples all have many common metadata artifacts.
LegalCopyright © 2006 Microsoft Corporation. All rights reserved.
InternalName WinWord
FileVersion 12.0.4518.1014
CompanyName Microsoft Corporation
LegalTrademarks1 Microsoft® is a registered trademark of Microsoft Corporation.
LegalTrademarks2 Windows® is a registered trademark of Microsoft Corporation.
ProductName 2007 Microsoft Office system
ProductVersion 12.0.4518.1014
FileDescription Microsoft Office Word
OriginalFilename WinWord.exe
charsetID 1252
Translation 0x0000 0x04e4
LangID 0x0000
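Because these fields are identical across the worm-set, they can serve as a quick clustering fingerprint. The following is a minimal sketch, assuming the version-info strings have already been extracted into a plain dictionary by whatever PE parser is in use; the names `WORM_SET_VERSION_INFO` and `matches_worm_set_metadata` are hypothetical and not part of any tool mentioned here.

```python
# Version-info values shared by the worm-set (copied from the list above).
WORM_SET_VERSION_INFO = {
    "InternalName": "WinWord",
    "OriginalFilename": "WinWord.exe",
    "FileDescription": "Microsoft Office Word",
    "FileVersion": "12.0.4518.1014",
    "ProductVersion": "12.0.4518.1014",
    "ProductName": "2007 Microsoft Office system",
    "CompanyName": "Microsoft Corporation",
}


def matches_worm_set_metadata(version_info):
    """Return True if every fingerprint field is present with the same value.

    ASSUMPTION: version_info is a dict of version-resource strings that the
    caller extracted beforehand; extra keys are ignored.
    """
    return all(version_info.get(key) == value
               for key, value in WORM_SET_VERSION_INFO.items())
```

A legitimate WinWord.exe from Office 2007 could carry the same strings, so this check is only useful in combination with the other artifacts listed in this section.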
Compile Time
Tue Nov 11 14:39:16 2003 UTC
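The compile time comes from the TimeDateStamp field of the COFF header, which can be read without a full PE parser. The sketch below uses only the Python standard library; `pe_compile_time` and `WORM_SET_COMPILE_TIME` are illustrative names, and the function does no validation beyond the two signatures.

```python
import struct
from datetime import datetime, timezone

# Compile time shared by the worm-set samples (from the value above).
WORM_SET_COMPILE_TIME = datetime(2003, 11, 11, 14, 39, 16, tzinfo=timezone.utc)


def pe_compile_time(data):
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the PE signature) lives at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # TimeDateStamp follows the signature (4), Machine (2), NumberOfSections (2).
    (stamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Comparing `pe_compile_time(data)` against `WORM_SET_COMPILE_TIME` gives a cheap second filter. The timestamp is trivially forgeable, so a match is supporting evidence rather than proof.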
Imports
- KERNEL32.dll
- ole32.dll
- SHLWAPI.dll
Yara Match Strings
0x7a6d:$sequence_0: 73 2F 8B 55 FC 3B 55 10 73 27 8B 55 B0 8B 5D EC
0x80c8:$sequence_1: C1 E9 0B 0F AF 4D F0 3B F1 73 06
0x7f71:$sequence_3: 81 C7 68 0A 00 00 81 FA 00 00 00 01 73 18 3B 45 FC
0x7415:$sequence_4: 03 D2 66 89 39 8D 7C D1 04 C7 45 F8 00 00 00 00 C7 45 E4 08 00 00 00 E9 84 00 00 00 2B C7
0x7a7a:$sequence_6: 8B 5D EC 8B 7D 08 E9 0D F5 FF FF 5F 5E B8 01 00 00 00 5B
0x7d2c:$sequence_8: B8 01 00 00 00 E9 89 04 00 00 8B 4F 24 8B 5F 38 3B CB 73 05 8B 47 28
0x72c9:$sequence_9: 83 C1 0B 89 4D EC E9 90 07 00 00 2B C7 2B F7 8B FA
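Note that the sample hits exactly seven sequences, which is the minimum the rule's `7 of them` condition requires. The rule update proposed under Question 1 can be sketched as post-processing of the match results. This is an illustration only: treating these seven sequences as the "shared" set is an assumption (the shared code has not been conclusively identified), and `is_family_match` with its `min_unique` parameter is a hypothetical helper, not part of YARA or Malpedia.

```python
# Sequences hit in the worm-set sample, taken from the match output above.
WORM_SET_HITS = {
    "$sequence_0", "$sequence_1", "$sequence_3", "$sequence_4",
    "$sequence_6", "$sequence_8", "$sequence_9",
}

# ASSUMPTION: the same seven sequences are the ones to designate as "shared".
SHARED_SEQUENCES = set(WORM_SET_HITS)


def is_family_match(hits, threshold=7, min_unique=1):
    """Apply '7 of them', but also require hits outside the shared code."""
    hits = set(hits)
    unique_hits = hits - SHARED_SEQUENCES
    return len(hits) >= threshold and len(unique_hits) >= min_unique
```

With this check, `is_family_match(WORM_SET_HITS)` is false because every hit falls in the shared set, while a sample that also hits a non-shared sequence would still match.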
Worm Differences (Polymorphic)
- The PE header (DOS header) contains what appears to be junk that changes between samples
- The first .data section of the PE (0x42b000) contains data blobs that change between the samples
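One way to locate regions like these is a byte-wise comparison of two samples. The sketch below is a generic diff helper, not the method used for this analysis; `differing_ranges` and `merge_gap` are illustrative names.

```python
def differing_ranges(a, b, merge_gap=16):
    """Return (start, end) offset ranges where two byte strings differ.

    end is exclusive. Differences separated by at most merge_gap bytes are
    merged into one range. Only the length common to both inputs is compared.
    """
    ranges = []
    for offset in range(min(len(a), len(b))):
        if a[offset] == b[offset]:
            continue
        if ranges and offset - ranges[-1][1] <= merge_gap:
            # Close enough to the previous difference: extend that range.
            ranges[-1][1] = offset + 1
        else:
            ranges.append([offset, offset + 1])
    return [tuple(r) for r in ranges]
```

The results are raw file offsets. Mapping them to the DOS header or to a section such as .data still requires reading the section table, which is left out here.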
Reverse engineering the program flow, we can see that if the binary is executed with no command-line arguments, it will create a polymorphic copy of itself in the %TEMP% directory and launch the copy. Because the copy is also launched with no arguments, it repeats the same behaviour, creating an endless loop that fills the %TEMP% directory with copies of the malware.
Question 2 - Where Are All These Samples Coming From?
Why Does VT have 500,000 copies of this malware?!
Because VT runs some submissions in a sandbox and collects the dropped files from the sandbox run for analysis (in turn running some of those dropped files in a sandbox), a loop is created in which this worm endlessly creates new copies of itself. This issue is not limited to VT; it can affect any intel feed that reprocesses its own samples.