Creating a Safe Filename Sanitization Function

brakeman, ruby, security

In a previous post on File Access Vulnerabilities I mentioned the use of a sanitize function. Sanitize functions are needed because you don’t always have full control of the file names or file paths provided by a user. And when you can’t control file names or paths, the attack surface of your application increases.

This post will work through the creation of a file sanitization function, contrast whitelisting vs blacklisting, and look at a gem to handle sanitization.

Let’s start with an example of code that would need a sanitize function:

def download
  language_code = params[:code]
  send_file(
    "#{Rails.root}/config/locales/#{language_code}.yml",
    filename: "#{language_code}.yml",
    type: "application/yml"
  )
end

This is from a question asked on Stack Overflow. The questioner stated that params[:code] was dynamic and couldn’t be determined a priori. They were correct in assessing that this is vulnerable to an attacker submitting an HTTP request with the parameter code=../../../config/database. Bam! Compromised database.yml file.

This means that the input to the above function needs to be sanitized so that the system doesn’t get compromised.
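To see why the unsanitized parameter is dangerous, it helps to resolve the path the way the filesystem would. The snippet below is a minimal sketch, not part of the original question: the helper names and the /srv/app root are hypothetical, and it assumes a Unix-style filesystem.

```ruby
# Resolve the path the download action would build, collapsing "../" segments.
# The default locales directory is a hypothetical location used for illustration.
def resolved_locale_path(code, locales_dir = "/srv/app/config/locales")
  File.expand_path("#{code}.yml", locales_dir)
end

# True when the resolved path is no longer inside the locales directory.
def escapes_locales_dir?(code, locales_dir = "/srv/app/config/locales")
  !resolved_locale_path(code, locales_dir).start_with?(locales_dir + File::SEPARATOR)
end

escapes_locales_dir?("en")                       # => false
escapes_locales_dir?("../../../config/database") # => true
```

Any value for which escapes_locales_dir? returns true points at a file outside config/locales, which is exactly what the sanitize function has to prevent.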

Whitelisting vs Blacklisting

There are two main methods you can use to sanitize user input: whitelisting or blacklisting.

  • Whitelisting is the act of setting what characters are allowed.
  • Blacklisting is setting what characters are not allowed.

The distinction is subtle but makes a huge difference to the security and usability of a function.

Generally speaking, you want to reach for a whitelisting function before a blacklisting function. This is because whitelists (if done properly) are safer: you’re stating what is allowed rather than trying to exclude all the bad things that shouldn’t be allowed. With a blacklist you’ll typically miss something and voilà, an attacker has an in. You’re smart, but when someone is motivated they’ll figure out a way to be smarter than you!

This particular instance is nice since the download is restricted to .yml files, meaning you can be extra aggressive in your whitelisting. Let’s write a naive whitelist function:

def sanitize(filename)
  # Replace any character that isn't 0-9, A-Z, or a-z with an underscore
  filename.gsub(/[^0-9A-Z]/i, '_')
end

In the above case, if you used the malicious string ../../../config/database the output is just what you’d want: _________config_database. The slashes and dots are all replaced and your database.yml is safe. You could have simply deleted the ‘bad’ characters instead of replacing them with an underscore _, but I prefer underscores since they’re more friendly and readable for the normal, non-attacker use case.
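Here is a small sketch of the whitelist function in use. The locale_path helper and the /srv/app root are hypothetical; they only mirror how the download action builds its path.

```ruby
def sanitize(filename)
  # Replace any character that isn't 0-9, A-Z, or a-z with an underscore
  filename.gsub(/[^0-9A-Z]/i, '_')
end

# Hypothetical helper mirroring the path built in the download action.
def locale_path(root, language_code)
  File.join(root, "config", "locales", "#{sanitize(language_code)}.yml")
end

sanitize("en")                        # => "en"
sanitize("../../../config/database")  # => "_________config_database"
locale_path("/srv/app", "../../../config/database")
# => "/srv/app/config/locales/_________config_database.yml"
```

Because no slash or dot survives, the resulting path can only ever name a file directly inside config/locales.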

But! (there’s always a but) You’ve got some additional considerations. While the above function is safe, it is limited to a minimal character set. What happens if you inserted any of these characters: é 猪 pig into that function? Every non-ASCII character is replaced with an underscore!

In this case, you’re probably ok with that given the context of the files. You likely have full control of the language files so you can make assertions in your sanitization. But that’s not always the case.
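When you do control the file names, you can go one step further and assert the expected shape of the input instead of rewriting it. The following is a sketch under an assumed naming convention (codes such as en, fr, or pt-BR); the constant and method names are hypothetical, so adjust the pattern to whatever your locale files are actually called.

```ruby
# Assumed naming convention: two or three lowercase letters,
# optionally followed by a region such as "-BR".
LANGUAGE_CODE = /\A[a-z]{2,3}(-[A-Z]{2})?\z/

def valid_language_code?(code)
  code.is_a?(String) && LANGUAGE_CODE.match?(code)
end

valid_language_code?("en")                       # => true
valid_language_code?("pt-BR")                    # => true
valid_language_code?("../../../config/database") # => false
```

Note the \A and \z anchors: in Ruby, ^ and $ match at line breaks, so a multi-line value could slip past a pattern anchored with them.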

This is where whitelists can become unwieldy. As a programmer you don’t want to go and define every single character that you want to allow; that’s tedious. That’s where the blacklist function comes in. Let’s see that:

def sanitize(filename)
  # Bad as defined by wikipedia: https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
  # Also have to escape the backslash
  bad_chars = [ '/', '\\', '?', '%', '*', ':', '|', '"', '<', '>', '.', ' ' ]
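  # Work on a copy so the caller's string isn't mutated (frozen strings also work)
  sanitized = filename.dup
  bad_chars.each do |bad_char|
    sanitized.gsub!(bad_char, '_')
  end
  sanitized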
end

Using the function, with some weird input: 猪<lǝgit> "input" °?I |s:*w*:é::ä::r: /\.?%ʎן octopus you get the following back (results may vary by OS): 猪_lǝgit___input_°__I__s__w__é__ä__r______ʎן octopus . And while this isn’t the prettiest filename, it’s what the user wanted!

This code is more complex than the whitelisting sanitize, and it’s more permissive. It’s also more user friendly since it’s giving the user what they put in.
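If the loop feels heavy, the same blacklist can be written as a single substitution. This is an alternative sketch rather than the version discussed above: it assumes the same list of bad characters, and the constant names are my own. Regexp.union escapes each string for you, so the backslash and dot need no special handling in the pattern.

```ruby
# Same characters as the blacklist above, collected into one pattern.
BAD_CHARS = ['/', '\\', '?', '%', '*', ':', '|', '"', '<', '>', '.', ' '].freeze
BAD_CHARS_PATTERN = Regexp.union(BAD_CHARS)

def sanitize(filename)
  # gsub (without the bang) returns a new string and leaves the input untouched
  filename.gsub(BAD_CHARS_PATTERN, '_')
end

sanitize('../../../config/database') # => "_________config_database"
sanitize('猪<legit>')                 # => "猪_legit_"
```

Keep in mind that the dot is on the list, so a file extension passed through this function loses its dot as well; append the extension after sanitizing, as the download example does.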

Alternatives

The last piece to mention is alternatives. If you’re looking for a good gem that does this for you, I’d recommend Zaru. It handles the same “bad characters” as the blacklist sanitize above, and also handles some Windows edge cases for reserved words. Plus it’s got a test suite, which is a comfort when you’re looking at filename sanitization!

This page was published by Gavin Miller.