Software security
What are the common pitfalls when writing programs, from a security perspective?
Here, the focus is not on using appropriate security mechanisms in secure software, but on avoiding the gaping security holes that can be used to circumvent those mechanisms.
cf. von Braun's diagram showing the increasing technical level of attacks while the skill level of attackers is decreasing: these techniques can be automated and packaged in ready-made attack tools (e.g. "root kits" in Unix), so the attacker need not be an expert but can be a "script kiddie".
However, to understand these pitfalls you need to understand how programs really run on a real system; knowledge of programming, compilers, data structures, computer architecture and operating systems is all useful here.
Do not try these techniques on the real net, or on other people's systems! It is illegal, and if/when you are detected you will be prosecuted.
Pitfalls:
- Inputs: encodings (strings, URLs), validation/quoting (SQL/script injection)
- Overflows: integer, stack, heap
- Races: temp files
Defenses:
- typing, canonical forms, bounds checking, validation/quoting
- hardware support, source analysis/inspection (typing, functions, use), testing
- least privilege
Pitfalls
"Unexpected" inputs to programs can break security. Examples:
1.1.1. URL encodings (cf von Braun)
Various ways of hiding the "real" destination of an HTML link.
- explicit URL in link text, e.g. <a href="http://villain.si.te">http://www.nordea.se</a>: in some situations the browser may not even show the real link before the user clicks
- encoding the href, e.g. http://129.168.1.42, or http://quite-a-long-prefix-of.nordea.se:incomprehensible-junk@villain.si.te, where quite-a-long-prefix-of.nordea.se is interpreted as a username, incomprehensible-junk as a password, and the combined length hides the real host villain.si.te if the browser field showing the link is too short.
- homograph encodings: the "Internationalized Domain Name" (IDN) encoding allows the full Unicode character set, in which many characters look alike: using a letter that looks just like the ASCII "o", a lookalike of the domain www.nordea.com can be registered, and the user agent/client may not show the difference, making spoofing attacks easier.
1.1.2. Name-based protection
- e.g. file name case sensitivity: if access control is tied to the exact spelling "MyFile" (kept outside the file system itself), the file can still be accessed as "myfile", bypassing the access control
- encodings of names/identifiers, e.g. ASCII vs ISO-8859-x vs UTF-8 vs Unicode, or IP address vs DNS name
Good solution: rewrite to one canonical form before lookup/control. (Don't decode twice! Cf. foo&amp;amp;bar => foo&amp;bar => foo&bar)
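A minimal C sketch of "decode once, then check", here assuming URL-style %XX percent-encoding (the function name is made up for the example). All validation is then run on the decoded, canonical form, and the string is never decoded again:

    #include <ctype.h>
    #include <stdlib.h>

    /* Hypothetical helper: decode URL-style %XX escapes exactly once.
       Returns 0 on success, -1 on malformed input or overflow. */
    static int percent_decode_once(const char *in, char *out, size_t outsize)
    {
        size_t o = 0;
        while (*in != '\0') {
            if (o + 1 >= outsize)
                return -1;                    /* no room left in out */
            if (*in == '%') {
                if (!isxdigit((unsigned char)in[1]) ||
                    !isxdigit((unsigned char)in[2]))
                    return -1;                /* malformed escape */
                char hex[3] = { in[1], in[2], '\0' };
                out[o++] = (char)strtol(hex, NULL, 16);
                in += 3;
            } else {
                out[o++] = *in++;
            }
        }
        out[o] = '\0';
        return 0;
    }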
1.1.3. Script and SQL insertion
Scripts (typically interpreted high-level programs) are often used for programming web pages, handling web forms etc. Input is often received in variables (program-level or environment variables) corresponding to the fields of the web form and other information about the web request.
Problem: input may be used in un-controlled ways. Example from the book: web script accepting a mail address to send a file to (say $clientaddr), executing
- cat thefile | mail $clientaddr
in a standard shell.
A malicious client can now supply not only a mail address, but a partial shell command, e.g. "foo@bar.com; rm -rf /". What gets executed is
- cat thefile | mail foo@bar.com; rm -rf /
Similar standard attacks on SQL commands (database language used in many web applications); may be used to change the database query and find/change data.
Solution:
- always escape/quote special characters for the "command interpreter", whether it is a shell, a programming language, or an SQL database: e.g. replace ";" with "\;" and "|" with "\|" in shell commands, "'" with "\'" in SQL code, etc.
- validate the input before using it, e.g. checking for correct syntax (mail addresses, integers, ...), typing (if possible) and length; see the sketch below. Client-side validation (e.g. using JavaScript) is only a help for the user; what matters for security is server-side validation.
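A server-side validation sketch in C for the mail example above (the character whitelist and buffer sizes are assumptions for illustration): the address must consist solely of harmless characters before it reaches the shell, so ";" and "|" never get through.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Accept only characters that can appear in a simple mail address,
       so shell metacharacters like ';' and '|' are rejected outright. */
    static int valid_mailaddr(const char *s)
    {
        const char *ok = "abcdefghijklmnopqrstuvwxyz"
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                         "0123456789.-_@+";
        return *s != '\0' && strspn(s, ok) == strlen(s);
    }

    int main(void)
    {
        char clientaddr[256];
        if (fgets(clientaddr, sizeof clientaddr, stdin) == NULL)
            return 1;
        clientaddr[strcspn(clientaddr, "\n")] = '\0';  /* strip newline */

        if (!valid_mailaddr(clientaddr)) {
            fprintf(stderr, "bad address\n");
            return 1;
        }

        char cmd[512];   /* thefile is the example file from above */
        snprintf(cmd, sizeof cmd, "cat thefile | mail %s", clientaddr);
        system(cmd);     /* acceptable only because of the check above */
        return 0;
    }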
1.1.4. Pathnames
Files used by e.g. a web server are typically located in a specific "branch" of the file system, such as /var/www/html, and web clients should not have access to files outside that branch (e.g. /etc/passwd). File names are input (implicitly) from the client through the URL, and are typically relative to the "root of the web branch".
What happens if the file name is relative, e.g. ../../../../etc/passwd?
Solutions:
- disallow/remove "../" components - and check for encodings of that string (cf Gollmann 14.2.1)
- translate the pathname to an absolute one, and make sure it is within the correct "branch" (cf "canonical forms")
- in Unix, use chroot to make it impossible to access paths outside the intended branch (chroot changes what the root (/) means for the process and its subprocesses)
Each of these may be more or less practical.
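A sketch of the second solution in C (the branch /var/www/html is just the example from above): realpath resolves "..", "." and symlinks, after which a simple prefix check is meaningful.

    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>

    /* Canonicalize to an absolute path and check that it stays
       inside the web branch. */
    static int path_is_allowed(const char *requested)
    {
        static const char root[] = "/var/www/html/";
        char resolved[PATH_MAX];

        if (realpath(requested, resolved) == NULL)
            return 0;                 /* nonexistent or unresolvable path */
        /* realpath has removed all "..", ".", symlinks and double
           slashes, so the prefix check cannot be fooled by encodings. */
        return strncmp(resolved, root, strlen(root)) == 0;
    }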
1.2. Overflows
Integer representation is typically finite/fixed size (8, 16, 32, 64 bits).
- Wraps around: 255+1 = 0 (unsigned 8-bit), 127+1 = -128 (signed 8-bit, two's complement)
- Converting between representations may lose value: 32-bit to 16-bit throws away half the bits!
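Both effects in a small self-contained C program:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t u = 255;
        u = u + 1;                     /* wraps around: prints 0 */
        printf("%u\n", (unsigned)u);

        int8_t s = 127;
        s = s + 1;                     /* wraps to -128 on two's-complement
                                          machines (the conversion back to
                                          int8_t is implementation-defined) */
        printf("%d\n", (int)s);

        uint32_t big = 0x12345678;
        uint16_t small = (uint16_t)big;    /* upper half thrown away */
        printf("0x%x\n", (unsigned)small); /* prints 0x5678 */
        return 0;
    }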
1.2.1. Array bounds checking
In low-level languages like C, there is no array bounds checking. Specifically, strings are arrays of 8-bit bytes (terminated by 0 ("null" character)). Indexing an array is just doing pointer arithmetic:
- arr[index] is the same as *(arr + index)
i.e. the contents at the start address of arr plus the offset index*sizeof(array element)
Consider negative indexes: nothing stops arr[-1] from addressing the memory just before the array.
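A tiny C illustration of both points:

    #include <stdio.h>

    int main(void)
    {
        char arr[4] = { 'a', 'b', 'c', 'd' };
        char *p = &arr[2];

        /* indexing is pointer arithmetic: both print 'b' */
        printf("%c %c\n", arr[1], *(arr + 1));

        /* a negative index simply walks backwards in memory */
        printf("%c\n", p[-1]);         /* prints 'b', the element before p */
        return 0;
    }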
Standard problem:
- gets(buf) reads bytes into buf until EOF or newline - without checking that the contents fit!
Solution:
- fgets(buf, nbytes, stream) reads maximum nbytes (from stream, e.g. stdin)
Similar problems with strcpy(dest, src): use strncpy(dest, src, nbytes) or even better strlcpy(dest, src, nbytes), where nbytes is the size of dest including the terminating "null" character! (Note that strncpy does not null-terminate when src is too long: you need to leave room for the null byte and put it there yourself.)
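A short C sketch of the safe variants (the buffer sizes are arbitrary):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char buf[70];

        /* gets(buf) would write past the end of buf on long input;
           fgets stops after sizeof buf - 1 characters. */
        if (fgets(buf, sizeof buf, stdin) == NULL)
            return 1;

        char dest[16];
        /* strncpy does not null-terminate when the source fills dest,
           so leave room for the null byte and put it there yourself: */
        strncpy(dest, buf, sizeof dest - 1);
        dest[sizeof dest - 1] = '\0';

        printf("%s\n", dest);
        return 0;
    }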
An exploit of the missing bounds checking, in combination with how local variables are allocated on the stack. Consider the function
    void foo(void)                           /* no argument, no value */
    {
        char buf[70];
        gets(buf);                           /* read input, no bounds check! */
        if (strncmp(buf, "foo", 3) == 0)     /* are the first three characters "foo"? */
            printf("bar!\n");                /* then give some response */
    }
When the function is called, the stack has the following content (on a system where the stack grows from higher to lower addresses):
    Stack (high-to-low addresses):

        +---------------------+  higher addresses
        | ...                 |
        | return addr         |
        | saved frame pointer |
        | buf (70 bytes)      |  buf[0] at the bottom; writing proceeds
        +---------------------+  upwards, towards the return address
When input is read it is stored in buf without any range checking, so in particular the return address can be overwritten, changing where the process jumps when the function returns. The overflowing input can also fill stack memory with machine code for the overwritten return address to jump to. Thus an "attacker" can make the program execute arbitrary code! Cf. the Internet Worm of 1988, and Gollmann 14.4.3.
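The minimal repair of foo is the replacement described above, fgets instead of gets, so input can never run past buf:

    #include <stdio.h>
    #include <string.h>

    void foo(void)
    {
        char buf[70];
        if (fgets(buf, sizeof buf, stdin) == NULL)   /* at most 69 chars + null */
            return;
        if (strncmp(buf, "foo", 3) == 0)
            printf("bar!\n");
    }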
1.2.2. Heap overflow
The heap is the memory area used e.g. for dynamically allocated memory blocks. These blocks are structured and contain a header/trailer with information about the next (free/allocated) block, sizes, etc. By overwriting this info (in ways similar to the above) you can cause crashes, overwrite security-relevant data, etc. Cf. http://en.wikipedia.org/wiki/Heap_overflow.
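A deliberately broken C sketch of the effect (the sizes are arbitrary; what exactly breaks depends on the allocator):

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p = malloc(16);
        if (p == NULL)
            return 1;
        /* 40 bytes into a 16-byte block: the excess tramples whatever the
           allocator keeps next to the block (chunk headers, other objects) */
        memset(p, 'A', 40);
        free(p);        /* typically crashes here, or worse */
        return 0;
    }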
1.3. Races
If the system does not apply the separation principle well enough, race conditions may be used for attacks. A race condition is when the timing of the execution (severely) affects the outcome (of computations). Gollmann has a historic example from CTSS (the Compatible Time-Sharing System) of the 1960s, but similar problems are found on a daily/weekly/monthly basis even today.
Example:
- A program uses a file in /tmp to store intermediary data (e.g. the password file being edited(!)), which is later used (e.g. as the real password file(!))
- Even if files are by default protected, problems:
- without the "sticky bit" (see Unix security), anyone can replace the file, and if done at the right moment, may not be detected
- with the sticky bit, if the file name can be predicted, it can be created ahead of time with a malicious owner/protection
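In Unix/C, the standard way around both problems is mkstemp, which picks an unpredictable name and creates the file atomically; a minimal sketch:

    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* mkstemp fills in the XXXXXX with an unpredictable name and
           creates the file atomically (O_CREAT|O_EXCL), so it cannot be
           pre-created or swapped by someone else. */
        char name[] = "/tmp/mydataXXXXXX";
        int fd = mkstemp(name);
        if (fd == -1)
            return 1;
        /* ... use the file through fd, not by reopening the name ... */
        close(fd);
        unlink(name);
        return 0;
    }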
Defenses
Validate inputs: do type-checking, use canonical forms, do bounds-checking
- type checking can check not only for "memory integrity" but also for information flows and other security aspects
Hardware support: e.g. to protect against stack overflow attacks, the stack can be marked non-executable, or memory pages can be restricted to either write or execute permission but not both (W^X), etc. Sometimes this breaks old software, sometimes not.
Software support: examples
- change the function call mechanism to place a (random) check value (a "canary") below the return address on the stack, and check that it is intact before returning from the function (see the sketch after this list). Requires recompilation.
- use programming language with bounds checking
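A conceptual C sketch of the canary idea; note that real canaries are inserted automatically by the compiler (e.g. gcc's -fstack-protector) with a value chosen randomly at process startup, so the code below is only indicative:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical global check value; in reality chosen randomly
       when the process starts. */
    static long canary_secret = 0x5A5A5A5AL;

    void foo_protected(void)
    {
        long canary = canary_secret;   /* lives between buf and return addr */
        char buf[70];

        if (fgets(buf, sizeof buf, stdin) != NULL) {
            /* ... work with buf ... */
        }

        if (canary != canary_secret)   /* clobbered => overflow detected */
            abort();                   /* die rather than return through a
                                          corrupted return address */
    }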
Source analysis: compilers and related tools can check for "dangerous" functions (e.g. gets) and how they are used, do type checking, and perform "deeper" analysis of known security issues. The OpenBSD Unix distribution uses such tools to catch all the "simple" pitfalls.
Testing: similar to penetration testing over the network, testing for known vulnerabilities.
Least privilege and other design principles (cf end of Models) are always good to consider.
And of course, keep your system up to date with security patches and updates (including "virus" protection). When a new vulnerability is found, a patch/update is created, and new attacks based on the vulnerability quickly follow - sometimes because the vulnerability was made public, sometimes not.