12 Mar 2018
Introduction
Analyzing and breaking external encrypted HDDs has been a “hobby” of mine for
quite some time. With my colleagues Joffrey Czarny and Julien Lenoir we looked
at several models in the past:
- Zalman VE-400
- Zalman ZM-SHE500
- Zalman ZM-VE500
Here I am going to detail how I had fun with one drive a colleague gave me: the
Chinese Aigo “Patriot” SK8671, which follows the classical design for external
encrypted HDDs: an LCD for information display and a keyboard to enter the PIN.
DISCLAIMER: This research was done on my personal time and is not related
to my employer.
(Figures: enclosure, packaging)
The user must input a password to access data, which is supposedly encrypted.
Note that the options are very limited:
- the PIN can be changed by pressing F1 before unlocking
- the PIN must be between 6 and 9 digits
- there is a wrong PIN counter, which (I think) destroys data when it reaches 15 tries.
In practice, F2, F3 and F4 are useless.
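To put these constraints in perspective, here is a quick back-of-the-envelope computation. It is only a sketch: it assumes PINs are made of decimal digits only and are picked uniformly at random, and it takes the 15-try limit at face value even though I have not confirmed it. The names are mine.

```python
# PIN length bounds and retry limit as described above.
MIN_DIGITS = 6
MAX_DIGITS = 9
MAX_TRIES = 15  # assumption: the exact threshold was not verified


def pin_keyspace(min_digits: int, max_digits: int) -> int:
    """Count all decimal PINs whose length lies in [min_digits, max_digits]."""
    return sum(10 ** n for n in range(min_digits, max_digits + 1))


total_pins = pin_keyspace(MIN_DIGITS, MAX_DIGITS)
# Chance of hitting a uniformly random PIN within the allowed number of tries.
guess_probability = MAX_TRIES / total_pins

print(f"{total_pins:,} possible PINs")
print(f"probability of guessing within {MAX_TRIES} tries: {guess_probability:.2e}")
```

So blind guessing through the keyboard is not realistic if the counter really works as described, which is why the rest of the post looks at the hardware instead.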
Hardware design
Of course one of the first things we do is tear down everything to identify the
various components.
Removing the case is actually boring, with lots of very small screws and
plastic to break.
In the end, we get this (note that I soldered the 5-pin header):
Main PCB
The main PCB is pretty simple:
Important parts, from top to bottom:
- connector to the LCD PCB (CN1)
- beeper (SP1)
- Pm25LD010 (datasheet) SPI flash (U2)
- JMicron JMS539 (datasheet) USB-SATA controller (U1)
- USB 3 connector (J1)
The SPI flash stores the JMS539 firmware and some settings.
LCD PCB
The LCD PCB is not really interesting:
It has:
- an unknown LCD character display (with Chinese fonts probably), with serial control
- a ribbon connector to the keyboard PCB
Keyboard PCB
Things get more interesting when we start to look at the keyboard PCB:
Here, on the back we can see the ribbon connector and a Cypress CY8C21434 PSoC
1 microcontroller (I’ll mostly refer to it as “µC” or “PSoC”):
The CY8C21434 uses the M8C instruction set, which is documented in the
Assembly Language User Guide. The product page states it supports CapSense,
Cypress’ technology for capacitive keyboards, the technology in use here.
You can see the header I soldered, which is the standard ISSP programming header.
Following wires
It is always useful to get an idea of what’s connected to what. Here the PCB
has rather big connectors and using a multimeter in continuity testing mode is
enough to identify the connections:
Some help to read this poorly drawn figure:
- the PSoC is represented as in the datasheet
- the next connector on the right is the ISSP header, which thankfully matches what we can find online
- the rightmost connector is the clip for the ribbon, still on the keyboard PCB
- the black square contains a drawing of the CN1 connector from the main PCB,
where the cable goes to the LCD PCB. P11, P13 and P4 are linked to the PSoC
pins 11, 13 and 4 through the LCD PCB.
Attack steps
Now that we know what the different parts are, the basic steps would be the
same as for the drives analyzed in previous research:
- make sure basic encryption functionality is there
- find how the encryption keys are generated / stored
- find out where the PIN is verified
However, in practice I was not really focused on breaking the security but more
on having fun. So, I did the following steps instead:
- dump the SPI flash content
- try to dump PSoC flash memory (see part 2)
- start writing the blog post
- realize that the communications between the Cypress PSoC and the JMS539
actually contain keyboard presses
- verify that nothing is stored in the SPI when the password is changed
- be too lazy to reverse the 8051 firmware of the JMS539
- TBD: finish analyzing the overall security of the drive (in part 3?)
Dumping the SPI flash
Dumping the flash is rather easy:
- connect probes to the CLK, MOSI, MISO and (optionally) EN pins of the flash
- sniff the communications using a logic analyzer (I used a Saleae Logic Pro 16)
- decode the SPI protocol and export the results in CSV
- use decode_spi.rb to parse the results and get a dump
Note that this works very well with the JMS539 as it loads its whole firmware
from flash at boot time.
$ decode_spi.rb boot_spi1.csv dump
0.039776 : WRITE DISABLE
0.039777 : JEDEC READ ID
0.039784 : ID 0x7f 0x9d 0x21
---------------------
0.039788 : READ @ 0x0
0x12,0x42,0x00,0xd3,0x22,0x00,
[...]
$ ls --size --block-size=1 dump
49152 dump
$ sha1sum dump
3d9db0dde7b4aadd2b7705a46b5d04e1a1f3b125 dump
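decode_spi.rb itself is not reproduced here. As a rough idea of what such a parser has to do, here is a minimal Python sketch, not the actual script. It assumes the capture has already been split into one transaction per chip-select assertion, each given as a list of (MOSI, MISO) byte pairs. That input format and the names rebuild_dump and capture are made up for the illustration. The 0x03 READ opcode followed by a 24-bit address is the usual SPI NOR convention.

```python
READ = 0x03  # usual SPI NOR "read data" opcode, followed by a 24-bit address


def rebuild_dump(transactions, size):
    """Rebuild a flash image from sniffed SPI READ transactions.

    `transactions` is a hypothetical input format: one list per chip-select
    assertion, each holding (mosi, miso) byte pairs in bus order.
    """
    image = bytearray(b"\xff" * size)  # erased NOR flash reads back as 0xff
    for pairs in transactions:
        if len(pairs) < 4 or pairs[0][0] != READ:
            continue  # not a READ command, ignore it
        address = (pairs[1][0] << 16) | (pairs[2][0] << 8) | pairs[3][0]
        if address >= size:
            continue  # outside of the image we are rebuilding
        data = bytes(miso for _, miso in pairs[4:])
        end = min(address + len(data), size)
        image[address:end] = data[: end - address]
    return bytes(image)


# Tiny made-up capture: a JEDEC READ ID, then a READ at 0x0 returning 3 bytes.
capture = [
    [(0x9F, 0x00), (0x00, 0x7F), (0x00, 0x9D), (0x00, 0x21)],
    [(0x03, 0x00), (0x00, 0x00), (0x00, 0x00), (0x00, 0x00),
     (0x00, 0x12), (0x00, 0x42), (0x00, 0x00)],
]
image = rebuild_dump(capture, size=8)
print(image.hex())
```

Areas that the controller never reads stay at 0xff in this sketch, so the result is only as complete as the sniffed traffic.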
Unfortunately, the dump does not seem obviously useful, as:
- the content did not change after changing the PIN
- the flash is actually never accessed after boot
So it probably only holds the firmware for the JMicron controller, which embeds
an 8051 microcontroller.
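Checking that the content did not change boils down to comparing two dumps, one taken before and one after the PIN change. A minimal sketch of that comparison, with stand-in data instead of real dump files (the function name is mine):

```python
def diff_offsets(before: bytes, after: bytes):
    """Return the offsets at which two equally sized dumps differ."""
    if len(before) != len(after):
        raise ValueError("dumps have different sizes")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Stand-in data; in practice these would be read from the two dump files.
dump_before = bytes([0x12, 0x42, 0x00, 0xD3, 0x22, 0x00])
dump_after = bytes(dump_before)

changed = diff_offsets(dump_before, dump_after)
print("identical" if not changed else f"differs at offsets {changed}")
```

Getting the offsets rather than just comparing hashes would help locate a settings area if something did change.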
Sniffing communications
One way to find which chip is responsible for what is to check communications
for interesting timing/content.
As we know, the USB-SATA controller is connected to the screen and the Cypress
µC through the CN1 connector and the two ribbons. So, we hook probes to the 3
relevant pins:
- P4, generic I/O in the datasheet
- P11, I²C SCL in the datasheet
- P13, I²C SDA in the datasheet
We then launch the Saleae logic analyzer, set the trigger and enter “123456✓”
on the keyboard, which gives us the following view:
You can see 3 different types of communications:
- on the P4 channel, some short bursts
- on P11 and P13, almost continuous exchanges
Zooming on the first P4 burst (blue rectangle in previous picture), we get this:
You can see here that P4 carries almost 70 ms of a perfectly regular signal, which
could be a clock. However, after spending some time making sense of this, I realized
that it was actually a signal for the “beep” that goes off every time a key is
touched… So it is not very useful in itself, but it is a good marker to
know when a keypress was registered by the PSoC.
However, we have one extra “beep” in the first picture, which is slightly
different: the sound for “wrong PIN”!
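Using the beep as a keypress marker can be automated by grouping the edges seen on P4 into bursts. The following is a sketch under assumptions: it expects a plain list of edge timestamps in microseconds (a made-up input format, not a Saleae export), and the 5 ms gap threshold and the 500 µs edge spacing in the example are arbitrary values, not measurements.

```python
def find_bursts(edge_times_us, max_gap_us=5000):
    """Group edge timestamps (in microseconds) into bursts.

    Two consecutive edges belong to the same burst when they are at most
    `max_gap_us` apart. Returns a list of (start, end, edge_count) tuples.
    """
    bursts = []
    start = previous = None
    count = 0
    for t in sorted(edge_times_us):
        if previous is not None and t - previous > max_gap_us:
            bursts.append((start, previous, count))
            start, count = None, 0
        if start is None:
            start = t
        previous = t
        count += 1
    if start is not None:
        bursts.append((start, previous, count))
    return bursts


# Made-up capture: a 70 ms burst with an edge every 500 µs, then a second
# identical burst one second later.
edges = [i * 500 for i in range(141)]
edges += [1_000_000 + i * 500 for i in range(141)]
bursts = find_bursts(edges)
for start, end, count in bursts:
    print(f"burst at {start / 1000:.1f} ms, {(end - start) / 1000:.1f} ms long, {count} edges")
```

Each burst start then gives the time of a keypress, and a burst with a different duration or edge count would be a candidate for the “wrong PIN” sound.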
Going back to our keypresses, when zooming at the end of the beep (see the blue
rectangle again), we get:
Where we have a regular pattern, with a (probable) clock on P11 and data on P13.
Note how the pattern changes after the end of the beep. It could be interesting
to see what’s going on here.
2-wire protocols are usually SPI or I²C, and the Cypress datasheet says the
pins correspond to I²C, which is apparently the case:
The USB-SATA chipset constantly polls the PSoC to read the key state, which is
‘0’ by default. It then changes to ‘1’ when key ‘1’ was pressed.
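Since the key state is polled continuously, a held key shows up in many consecutive reads. A minimal sketch of how the polled values could be turned back into discrete keypresses follows. It assumes the values have already been extracted from the I²C decode as small integers, with 0 meaning idle. I have not verified the exact encoding on the wire, so treat that as an assumption.

```python
IDLE = 0  # assumed value read while no key is touched


def key_events(polled_values):
    """Turn a stream of polled key-state values into discrete key presses.

    A press is reported each time the value changes to something other
    than IDLE, so a key held across many polls is only counted once.
    """
    events = []
    previous = IDLE
    for value in polled_values:
        if value != IDLE and value != previous:
            events.append(value)
        previous = value
    return events


# Made-up poll results: key 1 held for three polls, then key 2 for two.
polls = [0, 0, 1, 1, 1, 0, 0, 2, 2, 0]
print(key_events(polls))
```

Anyone able to probe these two lines while the PIN is typed would recover it this way.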
The final communication, right after pressing “✓”, is different if a valid PIN
is entered.
However, for now I have not checked what the actual transmission is and it does
not seem that an encryption key is transmitted.
Anyway, see part 2 to read how I dumped the PSoC internal flash.
09 May 2017
Foreword
This post is the result of some thinking about reverse engineering tools. I
have been reverse engineering for more than 15 years but it has been only
very recently that I have begun feeling disappointed by the current tools.
Of course, I am not the first, and as Halvar said: “I am regularly infuriated
about the state of reverse engineering tools, and have only myself to blame.”
source
That being said, as most of the reverse work I do is static reversing on
“exotic” platforms or operating systems, your perception may differ quite a bit,
particularly if your focus is automated analysis, which is not my case. As I
almost exclusively reverse interactively, I think it’s very important for
the tools to be easily integrated in the analyst’s workflow: some tools are
really awesome but only usable for automated analysis.
And of course, this is my own ranting :)
Reverse engineering techniques
10 years ago most tools were pure disassemblers with nonexistent to poor advanced
static analysis capabilities. But recently, techniques for binary code analysis
have improved greatly and are getting practical. I will cover them quickly,
describing how I understand them and how they can be useful.
Static analysis techniques
I will not discuss here the merits of symbolic execution, abstract
interpretation or any other technique, as my point is about the practical tools
available to the reverser. Which underlying technique is or could be used is
out of scope.
Type propagation and reconstruction
Type propagation is quite simple to understand: knowing some types, either from
external APIs, FLIRT or from the analyst, use data flow analysis to propagate
types to arguments and relevant data. The challenge here is to do it both ways:
- forward, for example with argument types inside a function, or return values
- backward, when calling a function with known argument types.
IDA has been doing it, in a limited way, for a long time (more than 10 years).
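To make the two directions concrete, here is a toy sketch. It has nothing to do with how IDA implements it: the instruction format, the variable names and the PROTOTYPES table are all invented for the example, and the analysis is flow-insensitive (a variable gets one type for the whole listing, the first one learned).

```python
# Hypothetical prototypes, standing in for type libraries or analyst input.
PROTOTYPES = {
    "strlen": ("size_t", ["const char *"]),
    "malloc": ("void *", ["size_t"]),
}


def propagate_types(instructions, prototypes=PROTOTYPES):
    """Propagate types over a toy straight-line IR until nothing changes.

    Supported instructions (made up for this sketch):
      ("mov", dst, src)              dst = src
      ("call", dst, func, [args])    dst = func(args...)
    """
    types = {}

    def learn(var, type_name):
        # Record a type only if the variable has none yet.
        if var not in types:
            types[var] = type_name
            return True
        return False

    changed = True
    while changed:
        changed = False
        for ins in instructions:
            if ins[0] == "call":
                _, dst, func, args = ins
                if func not in prototypes:
                    continue
                ret, params = prototypes[func]
                changed |= learn(dst, ret)  # forward: return value
                for arg, param in zip(args, params):
                    changed |= learn(arg, param)  # backward: call arguments
            elif ins[0] == "mov":
                _, dst, src = ins
                if src in types:
                    changed |= learn(dst, types[src])  # forward through the copy
                elif dst in types:
                    changed |= learn(src, types[dst])  # backward through the copy
    return types


program = [
    ("mov", "s", "arg0"),
    ("call", "n", "strlen", ["s"]),
    ("mov", "m", "n"),
    ("call", "buf", "malloc", ["m"]),
]
for name, type_name in sorted(propagate_types(program).items()):
    print(f"{name}: {type_name}")
```

Here arg0 only gets its type on the second pass, going backward from the strlen call through the copy, which is the direction that is harder to get right in practice.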
Type reconstruction is more advanced: using both type propagation and access
patterns, reconstruct complex types such as structures or vtables.
The only two practical tools that I know of are:
both using the Hex-Rays decompiler SDK to analyse the decompiled output and create
the advanced structures.
Note that this is a research topic with several academic papers covering the
subject, but I don’t know of any tool with IDA integration.
Also, interesting approaches have been proposed for dynamic analysis, for
example Trace Surfing by A. Gianni.
Taint analysis
Taint analysis is also very useful for the reverser as it can help pinpoint
interesting parts of a binary or function, depending on the source of taint.
Ponce is very interesting as it uses Triton to provide taint analysis directly
in IDA, with an easy to use GUI. I think it is a good way to provide advanced
analytics, too bad it is limited to dynamic analysis.
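For readers unfamiliar with the idea, here is a toy forward taint propagation. It is a static sketch over an invented straight-line instruction format and is unrelated to the Ponce or Triton APIs, which work on real, dynamically executed code.

```python
def propagate_taint(instructions, sources):
    """Forward taint propagation over a toy straight-line program.

    Each instruction is (dst, [srcs]), meaning dst is computed from srcs.
    Returns the set of variables tainted at the end and the indices of the
    instructions that consume tainted data.
    """
    tainted = set(sources)
    hits = []
    for index, (dst, srcs) in enumerate(instructions):
        if any(src in tainted for src in srcs):
            tainted.add(dst)
            hits.append(index)
        else:
            tainted.discard(dst)  # overwritten with clean data
    return tainted, hits


program = [
    ("len", ["input"]),       # 0: derived from the taint source
    ("key", ["const"]),       # 1: clean
    ("idx", ["len", "key"]),  # 2: mixes tainted and clean data
    ("len", ["const"]),       # 3: len is overwritten with clean data
    ("out", ["len"]),         # 4: clean, since len is no longer tainted
]
tainted, hits = propagate_taint(program, {"input"})
print(sorted(tainted), hits)
```

The list of hit instructions is what makes this useful to a reverser: it points at the code that actually handles the data of interest.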
Data slicing
Data slicing could be described as a kind of backward taint analysis, where
the goal is to find which instructions and data inputs are used for a given
resulting register or memory space.
miasm’s blog gives a very good example and a practical tool ;)
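As a much simplified illustration of the idea (this is not miasm’s implementation or API), a backward slice over an invented straight-line instruction format can be computed by walking the listing from the end and tracking which values are still needed:

```python
def backward_slice(instructions, target):
    """Backward slice of `target` at the end of a toy straight-line program.

    Each instruction is (dst, [srcs]), meaning dst is computed from srcs.
    Returns the indices of the instructions that influence the final value
    of `target`, and the set of inputs that are never defined in the listing.
    """
    wanted = {target}
    kept = []
    for index in range(len(instructions) - 1, -1, -1):
        dst, srcs = instructions[index]
        if dst in wanted:
            wanted.discard(dst)  # this definition provides dst...
            wanted.update(srcs)  # ...so its operands are needed instead
            kept.append(index)
    kept.reverse()
    return kept, wanted


program = [
    ("a", ["in1"]),     # 0
    ("b", ["in2"]),     # 1: dead, b is redefined at 3 before being used
    ("c", ["a"]),       # 2
    ("b", ["in3"]),     # 3
    ("d", ["c", "b"]),  # 4
    ("e", ["in1"]),     # 5: unrelated to d
]
kept, inputs = backward_slice(program, "d")
print(kept, sorted(inputs))
```

Real code adds control flow and memory accesses, which is where the actual tools earn their keep.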
Decompilation
Of course the holy grail of reversers is a good decompiler, which is currently
Hex-Rays.
Some academic papers such as this one claim interesting results, but are not
available.
Lots of very interesting tools have appeared in recent years, covering part
of the techniques I mentioned before. For example:
While they all provide powerful features, they certainly do not cover the use
case I covered in my introduction. Most of them could be (and have been) used
as external helpers to add output to IDA but they do not provide a platform to
build interactive tools upon.
Disassemblers
In addition to the previous tools, the two main challengers to IDA are:
- Binary Ninja
- Relyze
While I did not try them extensively (Relyze does not work on Linux, Binary
Ninja was slow when I tried the beta version), they look promising.
In particular, Binary Ninja’s IL and API seem to cover many of the points I
will raise in the next section. “Be a Binary Rockstar” by Sophia d’Antoine,
Peter LaFosse and Rusty Wagner is a good showcase.
IDA
IDA is, like it or not, the only real tool you can use for serious reverse
engineering work, particularly on exotic platforms.
Why is IDA still reigning?
Based on the features I outlined before, IDA is not really good on most of
them. So why is it still the default RE tool?
For several reasons:
- its GUI works, really, sometimes it’s painful but it works :)
- it supports so many architectures that it’s very rare to have something not
supported.
- its plugin system allows extending it and compensating for missing features (if
one bears the pain of using the SDK).
- its included library of information: type infos and FLIRT signatures.
- its very responsive and knowledgeable support.
The decompiler is of course a killer feature for efficient reverse engineering,
particularly of C code.
Missing features
While this part may seem like throwing stones at IDA, I really think it’s a
great tool. Read it as an extended wishlist :)
Collaborative work
Clearly, one missing, essential, feature of IDA is the ability to do
collaborative work. One just needs to look at the various attempts to create
plugins attempting to fix the problem:
collabREate, SolIDArity, polichombr, YaCO, etc.
One basic aspect of a collaboration feature would be to be able to
simultaneously work on the same IDB, synchronising information like a git
repository.
But, while a life changing enhancement, that wouldn’t be enough. Hopefully, one
could also share structures through a server. For example, someone working on a
client and server could share the structures for the protocol while working on a
different binary.
Also, several attempts have been made over the years to create plugins to
integrate analyst’s knowledge in IDA, by recognizing functions already reversed
in the past. polichombr, crowdRE, IDA toolbag, etc.
The common point with (almost) all those tools is that they die slowly as their
authors move on to other things. Which is definitely not helped by IDA’s
internals which are not suited for such low level integration.
Multiple files handling
Another painful aspect of IDA is the inability to work on several files at the same
time. One trivial example is a binary that uses a shared library. One has to
switch all the time between two IDBs, copy-pasting info (typing for example) to
“synchronise” information.
This gets particularly painful when working on more than 2 binaries at the
same time.
Semantics
This is where IDA lags behind most other recent tools: instruction semantics
and intermediate representation.
Currently the only way to search for instructions is syntactic, which is
definitely not enough if we need to search for a changing pattern. A trivial
example is argument lookup for function parameters, which is basically
impossible.
Having an IR would also help tremendously writing scripts independently of the
underlying architecture. Some would argue that the Hex-Rays decompiler provides
such IR, but it is expensive and, most importantly, it is quite often wrong.
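A toy example of the difference between a syntactic and a semantic search, looking for instructions that clear a register. Everything here is invented for the illustration: the three-field instruction format, the tiny 32-bit interpreter, and the “semantic” check, which simply runs the instruction on a few sample values. A real tool would reason on the IR symbolically rather than by sampling.

```python
MASK = 0xFFFFFFFF  # toy 32-bit registers


def execute(instruction, regs):
    """Apply one toy instruction (op, dst, src) and return the new registers."""
    op, dst, src = instruction
    value = regs[src] if isinstance(src, str) else src  # register or immediate
    out = dict(regs)
    if op == "mov":
        out[dst] = value & MASK
    elif op == "xor":
        out[dst] = (regs[dst] ^ value) & MASK
    elif op == "sub":
        out[dst] = (regs[dst] - value) & MASK
    elif op == "and":
        out[dst] = regs[dst] & value & MASK
    else:
        raise ValueError(f"unknown op {op}")
    return out


SAMPLES = [0x0, 0x1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]


def clears(instruction, reg):
    """Approximate semantic check: is `reg` zero afterwards for every sample?"""
    for a in SAMPLES:
        for b in SAMPLES:
            if execute(instruction, {"eax": a, "ebx": b})[reg] != 0:
                return False
    return True


listing = [
    ("xor", "eax", "eax"),
    ("mov", "eax", 0),
    ("sub", "eax", "eax"),
    ("and", "eax", 0),
    ("xor", "eax", "ebx"),
    ("mov", "eax", "ebx"),
]
syntactic = [ins for ins in listing if ins == ("xor", "eax", "eax")]
semantic = [ins for ins in listing if clears(ins, "eax")]
print(len(syntactic), "syntactic match,", len(semantic), "semantic matches")
```

The syntactic search only finds the exact pattern it was given, while the semantic one also catches the other instructions with the same effect, whatever the mnemonic.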
Others
Some other points:
- the SDK is a pain, inconsistently named, poorly documented, with only partial
Python support. But it is powerful.
- C++ support is nonexistent.
- Porting information (typing, names) from one IDB to another can be painful.
Future?
warning: personal feelings here
I think one of the main reasons IDA has not evolved much in 15 years is because
there was simply no competition. The market was a niche but it feels like more
and more people are doing RE, expanding the market somehow.
Considering that Hex-Rays has made several million euros in profit over the past
years with 5-6 full-time employees, I am surprised that they did not start a new
project to replace the definitely outdated base that is the IDA core.
“Just” porting the app to 64 bits seems a major pain. So with all their
experience, their market share and money, I think Hex-Rays could start IDA-ng
from scratch, and be very successful! :)
Hopefully, the appearance of real competition like Binary Ninja or Relyze may
stir the field a bit and force Hex-Rays to fix the fundamental problems :)