The hint refers to the previous YARA challenge, Threat 2 Challenge: The rest of us, we died with our honor.

To begin the challenge, we are given six Word documents.

Let’s open them all up.

The files open and display roughly the same Word document.

Running the file command on all of these files results in:

3673c9d7a5b2f978d3a34001d360ac485f22ed6fa868c8304eb99273a6efb268.doc: Microsoft Word 2007+

668bed5ed5d5effb3be659e8dab55c63369985064f7ee80f9365e75b34f6283d.doc: Microsoft Word 2007+

7717bd124dd0c0881afd6b327ff41b420bff77d3c5ae338a31cce5cfdcb3b5d0.doc: data

87f146c41082d7ba885f9433e0223b346f3032f7364bf18675b924a017994779.doc: Microsoft Word 2007+

afc502de73482404cc344301c207f27c7da7b31641cd2192b3bba40f3ab6964e.doc: Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Author: Micah Yates, Template: Normal.dotm, Last Saved By: Micah Yates, Revision Number: 2, Name of Creating Application: Microsoft Office Word, Create Time/Date: Wed Jul 13 17:19:00 2016, Last Saved Time/Date: Wed Jul 13 17:19:00 2016, Number of Pages: 1, Number of Words: 146, Number of Characters: 837, Security: 0

d48a2f4922bca81ce8fff8c18d788f41d2034c7999ca1ed03965d914dc06a9df.doc: Rich Text Format data, version 1, unknown character set

They’re not all the same file format, but they all contain the same basic content. There’s a .doc, .docx, .rtf, .mhtm, .dot, and .docm file with the same plaintext inside. Simply changing their extensions to .doc makes Word try to open each of them as a standard Word .doc.
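The file command makes its call largely from magic bytes at the start of each file. Here is a minimal sketch of the same idea in Python, assuming only three well-known signatures. The names SIGNATURES and sniff_format and the labels are made up for illustration, and this is far less thorough than file itself:

```python
# Well-known file signatures; the labels are illustrative only.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP container (Word 2007+ formats)"),
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE2 compound file (legacy Word formats)"),
    (b"{\\rtf", "Rich Text Format"),
]

def sniff_format(header: bytes) -> str:
    """Return a label for the first signature that matches the start of header."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

# Usage (hypothetical path):
#   with open("sample.doc", "rb") as handle:
#       print(sniff_format(handle.read(8)))
print(sniff_format(b"{\\rtf1\\ansi"))  # Rich Text Format
```

Anything that falls through to "unknown" needs a closer look by hand, much like the one file that was reported as plain data.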

Let’s open up the RTF (d48a2f4922bca81ce8fff8c18d788f41d2034c7999ca1ed03965d914dc06a9df.doc) in a hex editor. RTF files are typically fairly simple to follow. The header looks fine, so let’s scroll down to the bottom of the file.

Looks weird, right? Let’s take a look at the RTF file format on Wikipedia, specifically the Code Syntax:

Get all that?

The TL;DR of that section is that RTF data must sit within curly braces “{}”. This RTF file clearly has data appended after the final closing brace.
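If you would rather not eyeball the hex editor, the appended data can be carved out by matching braces. This is only a sketch: the function name rtf_trailing_data is made up, and it assumes a simple RTF where braces are either group delimiters or backslash-escaped (it ignores things like embedded binary data).

```python
def rtf_trailing_data(raw: bytes) -> bytes:
    """Return the bytes that follow the brace closing the first top-level group."""
    depth = 0
    i = 0
    while i < len(raw):
        c = raw[i]
        if c == 0x5C:        # backslash: skip the next byte (covers \{ and \})
            i += 2
            continue
        if c == 0x7B:        # '{' opens a group
            depth += 1
        elif c == 0x7D:      # '}' closes a group
            depth -= 1
            if depth == 0:
                return raw[i + 1:]
        i += 1
    return b""               # no balanced top-level group found

sample = b"{\\rtf1 hello {\\b world} \\{ \\}}\r\nAPPENDED"
print(rtf_trailing_data(sample))  # b'\r\nAPPENDED'
```

On a clean RTF file this should return nothing beyond perhaps some trailing whitespace.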

Remember the hint? “The same, but different.” Well, it seems this challenge is similar to the Threat 2 challenge. There is unknown data appended to a legitimate-looking file.

Let’s attack the data that seems to be repeating. The appended data looks similar to Base64 encoding, but somewhat obscured. Within this appended data there are three repeating sequences, each 48 bytes long.

Two of them are different, and of those, one is clearly not valid Base64. (See Wikipedia’s definition of Base64.)

When there are sequences of four repeating characters in Base64-encoded data, the underlying data typically repeats in some sort of pattern. For example, AAAAAAAAA Base64-encoded is QUFBQUFBQUFB.
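This is easy to check, and it also shows that the same repeating bytes encode differently depending on where they land within Base64’s 3-byte groups. A small sketch follows; the helper name encodings_by_offset and the sample pattern are arbitrary choices for illustration:

```python
import base64

def encodings_by_offset(pattern: bytes, repeats: int = 4) -> list:
    """Base64-encode the same repeating pattern starting at offsets 0, 1 and 2."""
    stream = pattern * (repeats + 1)
    size = len(pattern) * repeats
    return [
        base64.b64encode(stream[offset:offset + size]).decode("ascii")
        for offset in range(3)
    ]

print(base64.b64encode(b"A" * 9).decode("ascii"))  # QUFBQUFBQUFB

for encoded in encodings_by_offset(b"** "):
    print(encoded)
# KiogKiogKiogKiog
# KiAqKiAqKiAqKiAq
# ICoqICoqICoqICoq
```

Three different four-character repeats, one underlying pattern.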

Let’s assume these two different sequences are hiding the same data, but have encoded it differently due to data position. If that’s true, it looks like both sequences have also been obscured by a single byte operation. So what do we do to figure out that operation? Brute Force!

Let’s write a small script that performs single-byte operations on the two sequences and then tests whether the results are valid Base64 characters. In short, it runs a simple one-byte XOR over each byte in the sequence and checks whether the output consists only of characters from the Base64 alphabet.

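import sys, re

# Read the candidate byte sequence from the file given on the command line.
data_sequence = bytearray(open(sys.argv[1], 'rb').read())

test_output = ""

# Try every possible single-byte XOR key (0x00 through 0xFF).
for i in range(256):
    # XOR each byte of the sequence with the current key.
    for j in data_sequence:
        test_output += chr(j ^ i)
    base64_test = test_output
    test_output = ""
    # Report the key if the first 8 characters are all Base64 characters.
    if re.match(r"(?:[A-Za-z0-9+/]{8})", base64_test):
        print(hex(i), base64_test)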

I truncated each of the two sequences above to 8 bytes:

Running the script over these shortened sequences returns four XOR candidates:

So we have four potential XOR values that decode to valid Base64 characters.
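The same test can be wrapped in a function so that candidates from several sequences can be intersected. This is a sketch rather than the script used above: the name xor_candidates is made up, and the sample input is an assumption, namely the first eight bytes of $first and $second from the YARA rule at the end of this write-up, not the truncated sequences used for the run above.

```python
import string

# Standard Base64 alphabet, without padding.
B64_CHARS = set(string.ascii_letters + string.digits + "+/")

def xor_candidates(data: bytes) -> set:
    """Return every single-byte key that turns all of data into Base64 characters."""
    return {
        key for key in range(256)
        if all(chr(b ^ key) in B64_CHARS for b in data)
    }

# Assumed sample input: first eight bytes of $first and $second.
first = bytes.fromhex("50744C6215415310")
second = bytes.fromhex("4F40156B165E5409")

# Keys that work for both sequences at once.
common = xor_candidates(first) & xor_candidates(second)
print(sorted(hex(key) for key in common))
```

Intersecting the sets is a cheap way to shrink the list before trying each key by hand.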

XOR-ing the appended data with 0x26 first, we get this:

Looks pretty promising, and almost all characters are standard Base64, except there are no “+” characters, only “-”.

This looks like it may be using an alternate encoding string with “-” in place of “+”. If we try to decode with this alphabet: “ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-/”, nothing promising comes out.
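The XOR step itself is tiny. As a sketch, here it is applied to the twelve bytes that appear as $first in the YARA rule at the end of this write-up (the helper name xor_bytes is made up for illustration):

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)

first = bytes.fromhex("50744C62154153105F55445C")
print(xor_bytes(first, 0x26).decode("ascii"))  # vRjD3gu6ysbz
```

The result is the same run of characters that opens the input string in the rotation script further down.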

Some malicious Base64 encodings use alternate alphabets. Sometimes the change is simple and just shifts the starting position of the alphabet. Let’s write another brute-force script to rotate the position of the alphabet above, decode, and test whether the output is valid ASCII text.

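import base64

STANDARD_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
CUSTOM_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-/'

def is_ascii(s):
    # s is the decoded bytes object; every byte value must be below 128.
    return all(c < 128 for c in s)

# Appended data after XOR with 0x26 (only the start of the data is shown here).
input_str = "vRjD3gu6ysbzqvjbihjP1gu63gW6zgvOzwnOigfG1cbQyxjDyxrD1QTNigXAihrC0xm6zwT91Qr/zcb-"

def alt_decode(input):
    # Map the rotated alphabet back onto the standard one, then decode.
    return base64.b64decode(input.translate(DECODE_TRANS))

for rotate in range(len(CUSTOM_ALPHABET)):
    # Rotate the custom alphabet left by `rotate` positions.
    ROTATED_ALPHABET = CUSTOM_ALPHABET[rotate:] + CUSTOM_ALPHABET[:rotate]
    DECODE_TRANS = str.maketrans(ROTATED_ALPHABET, STANDARD_ALPHABET)
    decoded = alt_decode(input_str)
    if is_ascii(decoded):
        print(rotate, ROTATED_ALPHABET)
        print("***************")
        print(decoded.decode("ascii"))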

In the script, we fed in the data that had been XOR-ed with 0x26, defined an alternative Base64 alphabet, and looped through every rotation of that alphabet, shifting it by one character per iteration. We checked each decoded output for valid ASCII and printed the ones that passed. Running the above code returns:

Here’s the encoded data and clue. It appears that the decoding alphabet has been rotated by 26 characters.

Would you look at that, three lines of repeating characters, all the same length: “{ ** ** ** ** ** ** ** ** ** ** ** ** }”

So to recap, we brute-forced a single-byte XOR, then brute-forced an alternate Base64 alphabet that had been rotated left by 26 characters.
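Put together, the two steps fit in a few lines. The sketch below assumes the key (0x26) and rotation (26) recovered above and runs them over the three 12-byte hex strings from the YARA rule at the end of this write-up; the names TRANS and decode are made up for illustration.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
CUSTOM = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-/"
ROTATED = CUSTOM[26:] + CUSTOM[:26]          # custom alphabet rotated left by 26
TRANS = str.maketrans(ROTATED, STANDARD)     # rotated alphabet -> standard alphabet

def decode(blob: bytes, key: int = 0x26) -> bytes:
    """Undo the single-byte XOR, then decode with the rotated Base64 alphabet."""
    text = bytes(b ^ key for b in blob).decode("ascii")
    return base64.b64decode(text.translate(TRANS))

# The three hex strings from the YARA rule.
PATTERNS = {
    "first": "50744C62154153105F55445C",
    "second": "4F40156B165E54094F414310",
    "third": "4F45445E146709695C554411",
}

for name, hex_string in PATTERNS.items():
    print(name, decode(bytes.fromhex(hex_string)))
# first b'Write a Y'
# second b' Write a '
# third b'  Write a'
```

The three strings decode to the same text shifted by zero, one, and two bytes, which is why the encoded forms look different even though they hide the same data.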

Going back through the other documents, we find the two other variations of this encoded data and fill out the YARA rule like this:

rule enc_doc : enc_doc
{
	strings:
		$first = { 50 74 4C 62 15 41 53 10 5F 55 44 5C }
		$second = { 4F 40 15 6B 16 5E 54 09 4F 41 43 10 }
		$third = { 4F 45 44 5E 14 67 09 69 5C 55 44 11 }
	condition:
		1 of them
}
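Before submitting, the patterns can be sanity-checked locally against the six documents. The following is a rough plain-Python stand-in for the “1 of them” condition, not a replacement for running YARA itself; the function names and the sample buffer are made up for illustration.

```python
# The three hex strings from the rule, as raw bytes.
PATTERNS = {
    "$first": bytes.fromhex("50744C62154153105F55445C"),
    "$second": bytes.fromhex("4F40156B165E54094F414310"),
    "$third": bytes.fromhex("4F45445E146709695C554411"),
}

def matching_patterns(data: bytes) -> list:
    """Return the names of all patterns that occur anywhere in data."""
    return [name for name, pattern in PATTERNS.items() if pattern in data]

def rule_matches(data: bytes) -> bool:
    """Mimic the rule's condition: at least one pattern must be present."""
    return len(matching_patterns(data)) >= 1

# Usage (hypothetical path):
#   with open("sample.doc", "rb") as handle:
#       print(matching_patterns(handle.read()))
sample = b"{\\rtf1 }" + PATTERNS["$second"]
print(matching_patterns(sample))  # ['$second']
```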

Submitting the rule above will result in the key: PAN{7H1r7EEn-hOuR_71me_l1M17}