The Pursuit of Artificial Intelligence – Part 2

ai, ruby

My previous post introduced Joan, the conversational AI bot that’s not so good at conversations, and stated my goal of getting two Joan bots to converse with each other. In this post, I’ll go into the methods and tools I used to extract Joan’s source code and the de-obfuscation process I used to understand it.

Finding The Source

To get Joan to talk to herself I needed access to the API. For this I had to determine Joan’s input and output elements. The easiest way to do that? Firebug! With Firebug I tracked down the control ids for Joan’s main UI elements: her response area, my response area and the say button.

I decided at the beginning of this foray that I was going to use Ruby for scripting. To download Joan’s source I cooked up this Ruby script:

require 'net/http'
require 'uri'

# Download the HTML source of the page that hosts Joan.
def get_source
  url = URI.parse('http://www.icogno.com/')
  request = Net::HTTP::Get.new(url.path)
  response = Net::HTTP.start(url.host, url.port) do |http|
    http.request(request)
  end
  response.body
end

The source didn’t contain Joan’s UI elements, so I concluded that Joan’s code was inside an iframe. Sure enough, it was: http://jabberwacky.icogno.com/joan2?bot=joan… I downloaded the source from that URL and found the UI elements. I also found obfuscated JavaScript code.
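The iframe hunt could also be scripted. Here is a minimal sketch, assuming the page source is already in a string. The `iframe_sources` helper and the sample markup are hypothetical, and the regex is a shortcut that only copes with simple markup; a real HTML parser would be more robust.

```ruby
# Hypothetical helper: list the src attribute of every <iframe> tag in an HTML string.
# A regex is a shortcut for simple markup, not a substitute for an HTML parser.
def iframe_sources(html)
  html.scan(/<iframe\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
end

# Made-up sample markup, not Joan's actual page.
sample = '<html><body><iframe id="bot" src="http://example.com/chat?bot=demo"></iframe></body></html>'
puts iframe_sources(sample)
```

Feeding it the body returned by the download script would list candidate URLs to fetch next.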

At the time, Google had just released the Closure Tools project, and I probably could have used it to de-obfuscate the code. However, I wanted to reverse engineer the code by hand.

De-Obfuscation

As much as I enjoyed the process of de-obfuscation, it doesn’t make for a fun read. Instead, I’ll share the four things I learned about de-obfuscation.

Number 1: Readability is the number one priority.

If you can’t read the code, you can’t understand it. To make the code readable I did the following:

Expand methods: JavaScript is not a whitespace-sensitive language, so unnecessary whitespace is stripped out during obfuscation. Proper whitespace, indentation and newlines make a huge difference to human readability, so the first step is to expand each method back out.
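I did this by hand, but a first pass can be automated. Below is a rough sketch of a deliberately naive expander; `expand_js` is a hypothetical helper, not something from Joan's code or a library. It only breaks lines after `;`, `{` and `}` and indents by brace depth, so it will mangle string literals, regex literals and `for(;;)` headers.

```ruby
# Hypothetical helper: a deliberately naive pretty-printer for minified JavaScript.
# It breaks lines after ';', '{' and '}' and indents by brace depth. It does not
# understand strings, regex literals, or for(;;) headers, so treat it as a first pass.
def expand_js(source, indent: '  ')
  depth = 0
  lines = []
  current = +''
  source.each_char do |ch|
    case ch
    when '{'
      current << ch
      lines << (indent * depth) + current.strip
      current = +''
      depth += 1
    when '}'
      lines << (indent * depth) + current.strip unless current.strip.empty?
      current = +''
      depth = [depth - 1, 0].max
      lines << (indent * depth) + '}'
    when ';'
      current << ch
      lines << (indent * depth) + current.strip
      current = +''
    else
      current << ch
    end
  end
  lines << (indent * depth) + current.strip unless current.strip.empty?
  lines.join("\n")
end

puts expand_js('function f(a){if(a){g(a);}return a;}')
```

The output still needs a human eye, but it turns one long line into something you can scroll through.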

Remove variables: In the code I found variables like:

T=true;
F=false;

These variables exist only to shorten the code (compression commonly goes hand in hand with obfuscation), so I replaced each one with its literal value.

Remove methods: When you expand some methods, they turn out to be thin wrappers around native functions. You can remove these wrappers and put the native calls straight into the code. During my de-obfuscation I came across functions like `sT(f,m)`, `gE(e)` and `iH(e,h)`, which translated like this:

sT(f,m)  =>  window.setTimeout(f,m)
gE(e)    =>  document.getElementById(e)
iH(e,h)  =>  document.getElementById(e).innerHTML = h

Rename methods: Acronyms are used to shorten function names, and they are hard to work with, especially if you’re unfamiliar with the domain. `Tts` was a common acronym, and a Google search suggested it stood for `TextToSpeech`. Replacing `Tts` with `TextToSpeech` helped a lot.
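The variable, wrapper and acronym replacements above are all the same mechanical step: swap a short identifier for a longer one. Here is a sketch of how that could be scripted; the `RENAMES` table and `apply_renames` helper are hypothetical and built only from the examples in this post. `iH` is left out because it expands to an assignment rather than a simple rename.

```ruby
# Hypothetical rename table built from the examples in this post. Each short name
# maps to the longer name it stands for; extend it as more aliases are identified.
RENAMES = {
  'sT'  => 'window.setTimeout',
  'gE'  => 'document.getElementById',
  'Tts' => 'TextToSpeech',
  'T'   => 'true',
  'F'   => 'false'
}.freeze

# Replace whole identifiers only, so 'T' does not clobber the T inside 'Tts'.
# Caveat: this is plain text substitution and will also touch matching words
# inside string literals and comments, so review the result by hand.
def apply_renames(source, renames = RENAMES)
  pattern = /(?<![\w$.])(?:#{Regexp.union(renames.keys).source})(?![\w$])/
  source.gsub(pattern) { |name| renames[name] }
end

puts apply_renames('sT(function(){gE("out").value=T;},500);')
```

Matching whole identifiers is the cautious choice. Names that merely contain an acronym, such as a hypothetical `TtsStart`, would still need renaming by hand.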

Number 2: Know your native functions.

Knowing the native functions built into a language gives quick insight into what methods do. For example, knowing that XMLHttpRequest is used for AJAX calls, I was able to find the methods that used that object. With that information I could begin to build assumptions around those methods.
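A quick way to put this into practice is to search the expanded source for native names. The sketch below assumes the code has already been split across lines. The `NATIVE_APIS` list, the `find_native_calls` helper and the sample JavaScript are all made up for illustration.

```ruby
# Hypothetical list of native browser APIs worth searching for.
NATIVE_APIS = %w[XMLHttpRequest setTimeout getElementById innerHTML].freeze

# Return a hash of API name => line numbers (1-based) where that name appears.
def find_native_calls(source, apis = NATIVE_APIS)
  hits = Hash.new { |hash, key| hash[key] = [] }
  source.each_line.with_index(1) do |line, number|
    apis.each { |api| hits[api] << number if line.include?(api) }
  end
  hits
end

# Made-up sample, not Joan's actual code.
js = <<~JS
  function a(){
    var r = new XMLHttpRequest();
    r.open("GET", u);
  }
  function b(e,h){
    document.getElementById(e).innerHTML = h;
  }
JS

find_native_calls(js).each { |api, lines| puts "#{api}: lines #{lines.join(', ')}" }
```

Each hit is a starting point: the function surrounding it is probably doing something you can name.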

Number 3: What the heck do I do next?

You will hit a stage where everything looks like Perl, er, Greek, and nothing you look at makes sense. Sprinkled throughout my de-obfuscation notes are sentences like:

  • Didn’t know what to do next
  • I was stuck
  • There are a lot of variables!
  • huh???

For me there were a couple of strategies here. One was to take a break. The other was to scroll through the code until I found something eye-catching. In fact, the second strategy led me to some key code that cracked open the API. More on that in a bit.

Number 4: You’re going to make mistakes.

Some will be bad replacements, others will be bad assumptions. Everything is a hypothesis when you’re de-obfuscating and you have to be prepared to toss those assumptions out the window when they begin failing.

While de-obfuscating I came across `><` used as a boolean operator. At first I thought it was some esoteric JavaScript operator. To figure out what was going on, I went back to the original source code. It turned out that I had introduced the error myself by replacing a couple of `<` operations with `><`. Having the original source handy made for a quick fix.

Success

After spending a couple of hours applying the above techniques, I had many expanded methods, fewer variables, and a good idea of what the methods were doing. During a “What the heck do I do next?” moment I came across the variable U. It struck me as being out of place:

var U = 'whbshrvpchfkrm';
var U = U.replace(/h/g,'e').replace('k','o').replace('p','i');

After staring at it for a while, it became clear that this code had been purposely inserted to obscure the variable U. I performed the replacements by hand and 'whbshrvpchfkrm' became 'webserviceform'. At this point, I had everything I needed to make web requests to the Joan API. But I didn’t know it yet.
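The manual replacement can be double-checked with a few lines of Ruby. This is my own translation of the JavaScript snippet, not part of Joan's code. The one thing to watch is that JavaScript's `replace` with a plain string pattern only replaces the first match, which corresponds to `sub` rather than `gsub`.

```ruby
# Ruby translation of the JavaScript snippet above.
# JS replace(/h/g, 'e') replaces every 'h', which is gsub in Ruby.
# JS replace('k', 'o') with a string pattern replaces only the first match, which is sub.
obscured = 'whbshrvpchfkrm'
decoded = obscured.gsub('h', 'e').sub('k', 'o').sub('p', 'i')
puts decoded # prints "webserviceform"
```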

Part of the trouble with de-obfuscation is that you don’t know what you have until you realize you have it. You gather data and information, have a cup of coffee, someone says something about blue cheese and you say with a giant shout, “Of course! Why didn’t I think of that before!” That’s where I was: I had all the pieces, but I was missing the epiphany.

The final pieces to get Joan talking to herself can be found in Part 3. It will include the full Ruby source code so you can try things out at home!

This page was delicately crafted on by Gavin Miller.