In a previous post on File Access Vulnerabilities I mentioned the use of a sanitize function. Sanitize functions are needed because you don’t always have full control of the file names or file paths provided by a user. And when you can’t control file names/paths, the attack surface of your application increases.
This post will work through the creation of a file sanitization function, contrast whitelisting vs blacklisting, and look at a gem to handle sanitization.
Let’s start with an example of code that would need a sanitize function:
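A minimal sketch of the vulnerable pattern (the method name and path here are illustrative, not the questioner’s exact code): user input is interpolated straight into a file path.

```ruby
# Illustrative sketch of the vulnerable pattern: params[:code] flows
# unchecked into a file path.
def locale_path(code)
  "config/locales/#{code}.yml"
end

# A benign request behaves as expected:
locale_path("en")
# => "config/locales/en.yml"

# ...but a traversal payload escapes the locales directory entirely:
locale_path("../../../config/database")
# => "config/locales/../../../config/database.yml"
```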
This is from a question asked on StackOverflow. The questioner stated that `params[:code]` was dynamic and couldn’t be determined a priori. They were correct in assessing that this is vulnerable to an attacker submitting an HTTP request with the parameter `code=../../../config/database`. Bam! Compromised. This means that the input to the above function needs to be sanitized so that the system doesn’t get compromised.
Whitelisting vs Blacklisting
There are two main methods you can use to sanitize user input: whitelisting or blacklisting.
- Whitelisting is the act of setting what characters are allowed.
- Blacklisting is setting what characters are not allowed.
The distinction is subtle but makes a huge difference for the security and usability of a function.
Generally speaking you want to reach for a whitelisting function before a blacklisting function. This is because whitelists (if done properly) are safer – you’re stating what is allowed vs trying to exclude all the bad things that shouldn’t be allowed. With a blacklist you’ll typically miss something and voilà, an attacker has an in. You’re smart, but when someone is motivated they’ll figure out a way to be smarter than you!
It’s trivial to enumerate the files in this directory and provide a list to the user mapped to a hash or GUID and then provide a drop down list for the user to choose from. This is called restriction via identifier which I’ve written about before.
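That mapping can be sketched like this (a hypothetical implementation – the file list and token scheme are my own assumptions, not code from the earlier post):

```ruby
require "securerandom"

# Hypothetical sketch of restriction via identifier: every allowed file
# is mapped to a random token up front, and lookups only ever go
# through that map, so user input never becomes a file path.
def build_index(paths)
  paths.to_h { |path| [SecureRandom.uuid, path] }
end

index = build_index(["locales/en.yml", "locales/de.yml"])

# Render the tokens in a drop-down; resolve the chosen one server-side.
token = index.keys.first
index.fetch(token) # => "locales/en.yml"
# index.fetch("../../etc/passwd") would raise KeyError -- an unknown
# "identifier" never touches the disk.
```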
This particular instance is nice since the download is restricted to `.yml` files, meaning you can be extra aggressive in your whitelisting. Let’s write a naive whitelist function:
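Something like this (a sketch under the assumption that only letters and digits are allowed, with everything else replaced by an underscore):

```ruby
# Naive whitelist sanitize: only letters and digits survive; every
# other character is replaced with an underscore.
def sanitize(filename)
  filename.gsub(/[^a-zA-Z0-9]/, "_")
end

sanitize("../../../config/database")
# => "_________config_database"
```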
In the above case, if you used the malicious string `../../../config/database`, the output is just what you’d expect: `_________config_database`. The slashes and dots are all removed and your `database.yml` is safe. You could have skipped replacing the ‘bad’ characters with an underscore `_` and simply stripped them, but I prefer underscores since the result is more friendly/readable for the normal, non-attacker use case.
But! (there’s always a but) You’ve got some additional considerations. While the above function is safe, it is limited to a minimal character set. What happens if you insert accented letters, CJK characters, or other perfectly legitimate non-alphanumeric characters into that function? They get stripped out!
In this case, you’re probably ok with that given the context of the files. You likely have full control of the language files so you can make assertions in your sanitization. But that’s not always the case.
This is where whitelists can become unwieldy. As a programmer you don’t want to go and define every single character that you want to allow; that’s tedious. That’s where the blacklist function comes in. Let’s see that:
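A blacklist version might look something like this (a sketch – the exact character list is my assumption, covering path separators plus punctuation reserved on common filesystems):

```ruby
# Blacklist sanitize: drop null bytes, then replace only the characters
# known to be dangerous (path separators and Windows-reserved
# punctuation); everything else, including Unicode, passes through.
def sanitize(filename)
  filename
    .delete("\x00")                # null bytes
    .gsub(%r{[/\\?%*:|"<>]}, "_")  # separators and reserved characters
end

sanitize("../../../config/database")
# => ".._.._.._config_database"

sanitize("猪<lǝgit>.txt")
# => "猪_lǝgit_.txt"
```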
Using the function with some weird input – `猪<lǝgit> "input" °?I |s:*w*:é::ä::r: /\.?%ʎן ░` – you get the following back (results may vary by OS): `░`. And while this isn’t the prettiest filename, it’s what the user wanted!
This code is more complex than the whitelisting sanitize, and it’s more permissive. It’s also more user-friendly since it gives the user back what they put in.
The last piece to mention is alternatives. If you’re looking for a good gem that does this for you, I’d recommend Zaru. It handles the same “bad characters” as the blacklist sanitize above, and also handles some Windows edge cases for reserved words. Plus it’s got a test suite, which is a comfort when you’re looking at filename sanitization!