
Input Sanitizing and Common Pitfalls of User Input

With the rise of malicious bots on the Internet, websites are becoming more vulnerable to data breaches. Implementing sanitizing techniques can protect your website and your users.

7/9/2024

Authored by

Joe Kepley

Categorized

Development
Thoughts

Most of the Internet is trying to steal your data. According to Imperva’s 2024 Bad Bot Report, nearly 50% of all Internet traffic now comes from non-human sources, and malicious bots alone are estimated to account for around a third of all Internet traffic. In previous ages of the Internet, a security mistake might have made you the victim of a teenage prank; these days, it will likely result in theft or ransom of data by organized hacking groups or state-sponsored bad actors.

How data is stolen.

So how does this work? At the root level, a web server is a pretty simple computer. Your web browser shows up and asks for a specific part of a website using a URL (which stands for Uniform Resource Locator). The web server sends back a text file full of HTML, and the browser displays it. Very easy.

Of course, a modern website is more than just simple pages, and that’s where we can start to get into trouble. As developers, we have to treat everything that is sent to us as though it may be potentially dangerous.

Let’s take a simple example. We have a basic search page on our website. People can type in a search term, and we’ll show them a list of results. We’ll also add a line that tells users what they searched for:

Screenshot 2024-04-17 at 12.56.32.png

Pretty harmless. But now we’ve taken something that a user gave us and we’ve put it in a web page. What if the user types in some HTML tags? Let’s try a simple ‘i’ tag that makes italics:

Screenshot 2024-04-17 at 13.01.41.png

Hmm. If the text turns italic, it means the page will render whatever we send it. On the surface, this doesn’t seem so bad. But what if, instead of an italics tag, I send a script tag containing JavaScript? What if that JavaScript pops up a username/password prompt and sends me whatever is typed in? And what if I email a link to that page to a lot of people? And what if that search page is on your bank’s website? Now I can send you an email that looks like it’s from your bank, linking to a web page on your bank’s site, that asks for your credentials and sends them to me. I could easily alter the page so it doesn’t look like the search page, and it would, of course, still have a lovely SSL lock icon in the address bar. After all, it really is your bank’s website!
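To make the problem concrete, here is a minimal Python sketch of a search page that echoes the term back with no cleaning at all. The function name `render_results_unsafe` and the markup are hypothetical, chosen only for illustration; they are not taken from any real site.

```python
def render_results_unsafe(search_term: str) -> str:
    # Hypothetical page builder: the search term is dropped into the HTML as-is.
    return f"<p>You searched for: {search_term}</p>"


# An italics tag survives intact, so a browser would render it as markup.
italic_page = render_results_unsafe("<i>kittens</i>")
print(italic_page)

# The same hole lets a script tag through untouched.
script_page = render_results_unsafe("<script>alert('injected')</script>")
print(script_page)
```

Because the function never distinguishes between text and markup, anything a visitor (or a crafted link) supplies becomes part of the page.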

How to address malicious inputs.

You can see how quickly we can go from an innocuous-seeming vulnerability to a full-blown data breach. It gets even more complicated once you consider user-generated content, like comments, forums, or social media. And this example only uses the exploit to send things to other users; an attacker could just as easily try to tamper with the server itself, its database, or any other system that powers the site.

There are a couple of ways we address this. First, we can clean out anything we’re given that we don’t want. This is known as "sanitizing" the input. For instance, if I’m expecting a credit card number, I can throw away anything that’s not a numeric digit. If I’m expecting a US state abbreviation, I can toss out anything that isn’t exactly two alphabetic letters.
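The two examples above might look something like this in Python. This is a sketch under assumptions: the function names are hypothetical, and a real form would also check things this code does not, such as the card number’s length or whether the two letters are an actual state.

```python
import re
from typing import Optional


def sanitize_card_number(raw: str) -> str:
    # Keep only the ASCII digits 0-9; spaces, dashes, and markup are dropped.
    return re.sub(r"[^0-9]", "", raw)


def sanitize_state(raw: str) -> Optional[str]:
    # Keep ASCII letters only, then require exactly two of them.
    letters = re.sub(r"[^A-Za-z]", "", raw).upper()
    return letters if len(letters) == 2 else None


print(sanitize_card_number("4111-1111 1111<script>1111"))  # 4111111111111111
print(sanitize_state(" mn "))                              # MN
print(sanitize_state("Minnesota"))                         # None
```

Both functions work from an allow-list: they describe what is acceptable and discard everything else, rather than trying to enumerate every dangerous character.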

For more general text inputs, most modern web frameworks provide some type of cleaning function that removes special characters, or turns them into versions that can be rendered safely. Here’s what our italic example looks like with sanitized input; the angle brackets are displayed as symbols instead of rendered as HTML:

Screenshot 2024-04-17 at 13.02.33.png
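As one illustration of such a cleaning function, Python’s standard library provides `html.escape`, which converts characters like angle brackets into HTML entities. The sketch below assumes a hypothetical `render_results` page builder; the framework behind the screenshots is not stated in this article, so treat this as an analogy rather than the code that produced them.

```python
import html


def render_results(search_term: str) -> str:
    # Escape &, <, >, and quote characters before the term reaches the page.
    safe_term = html.escape(search_term)
    return f"<p>You searched for: {safe_term}</p>"


print(render_results("<i>kittens</i>"))
# <p>You searched for: &lt;i&gt;kittens&lt;/i&gt;</p>
```

The browser shows the entities as literal angle brackets, so the visitor sees the text they typed instead of italicized output.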

In many modern systems, this sort of filtering is enabled by default. Frameworks like ASP.NET MVC make developers intentionally opt out of the filtering behavior, so nobody has to remember to clean input.

Beyond sanitization.

As with all security topics, it’s important to provide a layered defense. Even in situations where filtering is automatic, we still use code reviews to double-check that we’ve done things well. We also use regular external scans to make sure that if there’s a problem, we find it first. Finally, active web application firewalls will examine requests to try to prevent malicious input as a last line of defense. Here’s the result I got on the example above when I tried to send in a script tag, which shows that the firewall is doing its job:

Screenshot 2024-04-17 at 13.13.29.png
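To give a rough sense of what a firewall rule does, here is a deliberately naive Python sketch that decodes a query value and flags obvious script injection attempts. The pattern and the function name `looks_malicious` are assumptions made for illustration. Production web application firewalls rely on far larger, maintained rule sets, and a check like this is no substitute for escaping output.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical rule: flag script tags and javascript: URLs, ignoring case.
SUSPICIOUS = re.compile(r"<\s*script\b|javascript\s*:", re.IGNORECASE)


def looks_malicious(raw_query_value: str) -> bool:
    # Decode URL encoding first so %3Cscript%3E is checked as <script>.
    decoded = unquote_plus(raw_query_value)
    return SUSPICIOUS.search(decoded) is not None


print(looks_malicious("%3Cscript%3Ealert(1)%3C/script%3E"))  # True
print(looks_malicious("kittens"))                            # False
```

A request that trips a rule like this can be rejected before it ever reaches the application, which is why the firewall works as a last line of defense and not the only one.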

As methods of attack become increasingly sophisticated, the importance of robust security measures in web development can’t be overstated. By treating all user inputs as potential threats and implementing sanitizing techniques, developers can significantly reduce the risk of security breaches.