Unparallelogram

How To Parse Bbcode/scripts

I am planning to make a parser for BBCode and templating scripts. I am guessing I will be doing this in PHP, as it is the most widely available language for the web right now. I am wondering if anyone has advice on how I should structure this to make it easier on myself. The thing is, while I would probably be able to make a full-scale parser, that would be difficult and time-consuming, whereas BBCode, template include declarations, etc. seem like a far smaller subset of what most language parsers handle.

Really, I only need these features:

1. Tag recognition. If I see a (B) tag in the source, it should know to convert it into a <b> HTML tag.
2. Tag parameters. I need to be able to (include-template "abc.template"), for example, and pass extra parameters into certain tags/calls.
3. Tag matching. I need it to make sure that all tags are closed properly and nested in the correct order.

Does anyone have ideas as to the simplest way to implement this?

Are you going to be using these BBCodes on a forum or in some other already-established script? I would strongly suggest searching for plug-ins or mods, because otherwise you'll end up reinventing the wheel, and we don't want to do that :)

In any case, if you want to check input against a list of accepted BBCodes, I suppose the way to do it is with an array and pointers.

Without really working on it, just off the top of my head, you want to build a function

function bbcode($text)



$text will be the source page/paragraph/sentence

Then, with a series of str_replace and preg_replace calls, you filter (B) to <b>, etc. You'll need to build another function to check whether each open BBCode tag is closed properly.

After replacing (B) with <b>, etc., you then return $text.
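
A minimal sketch of what such a function might look like. It assumes the usual square-bracket form ([b]...[/b]) rather than the (B) written above, and the set of tags and the [url=...] rule are only illustrative choices, not a complete BBCode set.

```php
<?php
// Sketch only: assumes square-bracket tags such as [b]...[/b].
function bbcode($text)
{
    // Escape raw HTML first so user input cannot inject markup.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Simple tags with no parameters: a case-insensitive literal swap is enough.
    $simple = array(
        '[b]' => '<b>', '[/b]' => '</b>',
        '[i]' => '<i>', '[/i]' => '</i>',
    );
    $text = str_ireplace(array_keys($simple), array_values($simple), $text);

    // A tag with a parameter needs a pattern; only http(s) links are accepted here.
    $text = preg_replace(
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is',
        '<a href="$1">$2</a>',
        $text
    );

    return $text;
}
```

Note that plain replacement like this does not check nesting or unclosed tags; that still needs the separate check mentioned above.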

Try this page for guidance on writing a bbcode function

Actually, I WANT to reinvent the wheel. I want to get practice and build a portfolio of web apps that will show prospective employers that I know more than just theory and classroom assignments. I feel this should have some original stuff, but most things have already been done before, and redoing them is probably a better show of programming skills than nothing.

Should I be scanning the text input, or should I be doing searching? How do I minimize the amount of copying that goes on in memory? Are there string buffers? Does that even matter in the overall performance?

Thanks for the link.

You sound like an old-fashioned programmer. While I was taking Modular (useless code I spent 3 years of my life on), my professor said the same thing: minimize the memory buffer.

But I think that's yesterday's issue. Today's servers are so much more powerful that they can handle floating points here and there. And with new technologies updating CPUs by the second and the miracles of burst RAM, maximizing the bus is not that critical. But this also leads to dirty coding... I know (I'm so guilty of it).

The overall performance will depend on the length of the $text being parsed. If you'd like to know how BBCode is parsed, look at the source of this forum IP.Board, vBulletin and phpBB. They all use BBCode. Another forum, called AEF, is a completely free, open-source forum script. They will give you better insight.

You may want to look into regular expressions. It takes a while to get the hang of them, but regexes are an invaluable tool for searching through and replacing portions of text. You can find a fairly in-depth explanation and tutorial at http://www.regular-expressions.info/. Finding tags and parameters can easily be done with regexes, but tag matching is nearly impossible with regexes alone. You could look into how HTML Tidy works (http://tidy.sourceforge.net/), as it does tag matching and auto-correction of nesting, open tags, and such.
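
As a rough illustration of the "finding tags and parameters" part, here is a sketch in PHP. The helper name find_tags() and the bracket syntax [name "param" ...] are assumptions made for the example (the original post writes its tags with parentheses), so adjust the pattern to whatever syntax you settle on.

```php
<?php
// Sketch only: find_tags() is a hypothetical helper, and the bracket syntax
// [name "param" ...] is an assumption.
function find_tags($text)
{
    $tags = array();
    // Group 1: tag name. Group 2: zero or more double-quoted parameters.
    $pattern = '~\[([a-z][a-z0-9-]*)((?:\s+"[^"]*")*)\s*\]~i';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        // Pull each quoted parameter out of the parameter run.
        preg_match_all('~"([^"]*)"~', $m[2], $params);
        $tags[] = array(
            'name'   => strtolower($m[1]),
            'params' => $params[1],
        );
    }
    return $tags;
}

$found = find_tags('Header: [include-template "abc.template" "dark"] and [b]bold[/b]');
```

Here $found should hold one entry for include-template with its two parameters and one for b with none; closing tags such as [/b] are deliberately not matched by this pattern.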

You can use regexes as a tool, but with regexes alone, you won't be able to parse bbcode well.
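
One common way to combine the two is to let a regex find the tags and let a stack do the matching. The sketch below is only that, a sketch: tags_balanced() is a hypothetical helper, it assumes square-bracket tags, and it assumes every opening tag needs a closing tag.

```php
<?php
// Sketch only: tags_balanced() is a hypothetical helper and assumes
// square-bracket tags where every opening tag needs a closing tag.
function tags_balanced($text)
{
    $stack = array();
    // Group 1 is "/" for a closing tag; group 2 is the tag name.
    preg_match_all('~\[(/?)([a-z][a-z0-9-]*)[^\]]*\]~i', $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $stack[] = $name;            // opening tag: remember it
        } elseif (array_pop($stack) !== $name) {
            return false;                // closing tag does not match the latest open tag
        }
    }
    return count($stack) === 0;          // anything left over was never closed
}
```

A real parser would also need a list of tags that never take a closing tag (an include-style tag, for instance) and would skip pushing those onto the stack.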

This link may also help.  https://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html

Definitely consider using biterscripting. There is a good sample script posted at http://www.biterscripting.com/SS_WebPageToText.html that parses tags from a web page. It parses html tags, so you can convert it to parse BB tags. One good thing about it is that biterscripting is very simple to learn and in no time, you can create your own collection of scripts, which is what I did, and possibly productize your collection of scripts.

It would be nice if you posted some of the small sample scripts you create. They will be very useful to others as well.

Good luck with your project. Email me if you need any help. (I only read mail about my posts whenever time permits, so please be patient.)

Patrick
