Mordent 0 Report post Posted March 16, 2008 (edited) I'm curious as to the best methods of letting users submit data to a MySQL database, displaying that data, and removing any unwanted tags etc. from it. Currently, there's a handful of PHP functions that I know of to help with this: mysql_real_escape_string() - perhaps the best known and most commonly used function, it should be used in pretty much any MySQL query. It escapes characters that have SQL significance. ...which prepends backslashes to the following characters: \x00, \n, \r, \, ', " and \x1aI like to think I made a pretty good attempt at finding out what \x00 and \x1a are, but I can't find anywhere that will simply tell me. I'd assume that one of them is a hyphen (-), as that has special significance in SQL? strip_tags() - removes all HTML tags (including malformed ones) except those given in the second argument. nl2br() - converts all newlines (i.e. "\n") to "<br />" htmlspecialchars() - converts characters that have HTML significance (i.e. ", ', &, <, >) in to ones which will simply display instead of having any HTML meaning. htmlentities() - any idea what the difference between this and htmlspecialchars() is? This function is identical to htmlspecialchars() in all ways, except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.Anyone care to comment on how that would handle a string differently from htmlspecialchars()? addslashes() - adds "\" before quotes (both ' and "), other backslashes and NUL. According to php.net the function should be used when <i>entering</i> data into a database, although if magic_quotes_gpc is on (which, I believe, is the case both by default and on Xisto's servers) then it is entirely unnecessary, as apparently: The PHP directive magic_quotes_gpc is on by default, and it essentially runs addslashes() on all GET, POST, and COOKIE data. Do not use addslashes() on strings that have already been escaped with magic_quotes_gpc as you'll then do double escaping. The function get_magic_quotes_gpc() may come in handy for checking this. stripslashes() - effectively reverses (?) addslashes().So, given the function arsenal above, what can we get out of it? Clearly we could apply a lot of overkill to some strings, which would be unnecessary, but what's the minimum that needs to be done to make user-inputted data secure and still output what you want? Let's say we have a textarea which the user can input whatever they like in to, and in all cases the data will be stored in a MySQL database and can be displayed exactly as typed (mainly because that's the bit I'm working on ). Take this forum, as an example. I can quite happily type things such as "<b>foobar</b>" and they display exactly as entered. The quotation marks are left in, the bold tags are displayed, but not carried out. All formatting such as using bold text is done on the user's side with BB Code, which uses square brackets. For now, however, I want to leave this additional formatting alone, and just show precisely what's typed in. So, back to the textarea idea, let's say we have a form as below: <fieldset><legend>Update Text</legend><form action="update_text.php" method="post"><textarea cols="100" rows="10" name="text"></textarea><br /><input type="submit" name="update" value="Update" /></form></fieldset>So whatever the user types in is sent (via POST) to the script update_text.php. In that file we want to store it in a MySQL database. Given that we have a method of identifying the user by an ID (via sessions, most likely), and that the required file connects to the database. ...// process input$text = $_POST['text'];// access databaserequire('includes/db.php');mysql_query('UPDATE members SET text = "' . mysql_real_escape_string($text) . '" WHERE id = "' . mysql_real_escape_string($id) . '"') or die(mysql_error());...So, correct me if I'm wrong, but that would store the text so it can be recovered as entered? Newlines ("\n") would be put in, naturally, and any relevant characters would be escaped so that they're stored in MySQL correctly, and the possibility of SQL injection here would be low, right? The data would now be stored, theoretically exactly as inputted. If we want to get that data back out, so that it's shown by default in the form we could do so as shown below: ...// access databaserequire('includes/db.php');$getMember = mysql_query('SELECT text FROM members WHERE id = "' . mysql_real_escape_string($id) . '"') or die(mysql_error());if (mysql_num_rows($getMember) == 1){ // member found $row = mysql_fetch_array($getMember); $currentText = htmlspecialchars($row['text']);}......and then echo $currentText between the textarea tags in the form? htmlspecialchars() would need to be used, I believe, to stop people from closing the textarea early themselves and going on to do anything else they want. I'm pretty sure no other functions in the list above need to be used, but I'd like to confirm that. Then, when displaying the text (i.e. not in the textarea), I assume something like this could be used: ...// access databaserequire('includes/db.php');$getMember = mysql_query('SELECT text FROM members WHERE id = "' . mysql_real_escape_string($id) . '"') or die(mysql_error());if (mysql_num_rows($getMember) == 1){ // member found $row = mysql_fetch_array($getMember); $text = nl2br(htmlspecialchars($row['text']));}......which is identical to the previous method except for the use of nl2br() as well. Note that it's used after htmlspecialchars(), as otherwise the "<br />" tags would else be converted to "<br />" afterwards. Would any other functions need to be used, or would that simply do the job to a high enough level of security and still give the desired result? Thanks in advance for any feedback or comments, Mordent Edited March 16, 2008 by Mordent (see edit history) Share this post Link to post Share on other sites
Jared 0 Report post Posted April 19, 2008 I'm afraid I can't give you a very clear answer, but htmlspecialchars () would effectively remove anything that could be maliciously (bad choice of word) interpreted in HTML.. So as far as security goes, you're fine. Now we just have to worry about formatting. Essentially, the <pre> tag would make text appear exactly as shown, so we just have to think about what the <pre> tag really does. So it turns out the <pre> tag simply treats newlines and spaces as they are entered. So we just have to format those. nl2br () would take care of the newlines, but the spaces are still unaccounted for. But this may be a simple matter... we wouldn't be able to use a regular expression to replace multiple spaces with if there were tags in the midst--since <a href=""> is the same as <a href=""> and not <a href="">--HOWEVER there are no tags here! Text is being displayed exactly as it is. So we have a regular expression: (\s{2,}) Also... I just remembered... we have to watch out for tabs, too. Tabs (unfortunately) cannot be forced to print as a space can with , but you can use to make it slightly more html-friendly. So my final answer would be: <?php$text = 'your mysql variable';$text = htmlspecialchars ($text);$text = preg_replace_callback ('/(\x20{2,})/', create_function ('$matches', '$list = false; for ($i = 0; $i < strlen ($matches[1]); $i++) $list .= \' \'; return $lsit;'), $text);$text = preg_replace_callback ('/(\x09{2,})/', create_function ('$matches', '$list = false; for ($i = 0; $i < strlen ($matches[1]); $i++) $list .= \' \'; return $list;'), $text);$text = nl2br ($text);echo $text;?> That would work. My create_function is slightly sloppy, so you might want to fix that up if you can find a better way haha.. P.S. The only way you could "make tabs format" is if you decided to replace each tab with, say, 5 spaces. It's not the same idea as the tab (since a tab has variable space) but it's close. That would be this: $text = preg_replace_callback ('/(\x09{2,})/', create_function ('$matches', '$list = false; for ($i = 0; $i < 5 * strlen ($matches[1]); $i++) $list .= \' \'; return $list;'), $text); Hope this helps!!! - Jared Share this post Link to post Share on other sites