The Web’s Syntax Problem

As @aefaradien notes, the web has a syntax problem. It’s this: A user wishes to post something complicated – text with links, formatting, even inline graphics. They go to a website and are faced with a text box and a flashing cursor. What do they type? What syntax will help them achieve their goal?

It depends entirely on which website they’re on and what powers it. With any luck the text box itself might have an area below explaining how to use it, but chances are, the user won’t read it. The knowledgeable user has a whole bunch of questions:

  • Can I use HTML? The internet is made of HTML (and cats). Once the post is submitted, it’ll be sent to everyone else’s browser as HTML, so can I just write in HTML anyway? But HTML is complex: am I restricted to a certain subset? Do I have to worry about breaking the website’s formatting? Is the site using some weird CSS that’s going to distort my post? Could I introduce security vulnerabilities?
  • Is the syntax HTML-like? Am I using a phpBB-powered forum, or another that supports its BBCode syntax? Or something else HTML-like but not true HTML? To make something bold, do I write <b> or [b]?
  • Is the syntax Wiki-like? And what even is Wiki-like? MediaWiki, which powers Wikipedia, probably has the most popular syntax out there, but each wiki is subtly different. If I CamelCase words, will they become links? If I surround a word with *asterisks*, will it become bold? What about apostrophes? Forward-slashes?
  • Is it something much stranger? Could it be something like Markdown, which might read unintended meaning into my text because I don’t know its syntax?

To my mind, there’s no simple solution to this problem. Each syntax has its own strengths and weaknesses, and the developers of each web platform, blog or forum app have their own preferences. BBCode has some traction, but it’s so close to HTML that you have to ask: why not just use HTML? Wiki markup is great for linking to internal wiki pages, and not so great for anything else. And Markdown and its cohort of technically superior solutions just don’t have any traction in the real (non-geek) world.
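To show how thin a layer simple BBCode is over HTML, here is a minimal sketch in Python that translates a hypothetical handful of paired tags by one-to-one substitution. It is only an illustration, not how phpBB or any particular forum actually implements BBCode; real dialects support many more tags and attributes such as [url=...].

```python
import html
import re

# A deliberately tiny, illustrative subset of BBCode: paired tags that map
# one-to-one onto HTML elements. Real BBCode dialects support far more.
SIMPLE_TAGS = {"b": "b", "i": "i", "u": "u", "quote": "blockquote"}


def bbcode_to_html(text: str) -> str:
    """Translate the simple [tag]...[/tag] pairs above into HTML."""
    # Escape the user's text first, so any real angle brackets stay literal.
    text = html.escape(text)
    for bb_tag, html_tag in SIMPLE_TAGS.items():
        # Non-greedy, so "[b]a[/b] [b]c[/b]" becomes two separate elements.
        pattern = re.compile(
            r"\[{0}\](.*?)\[/{0}\]".format(bb_tag), re.DOTALL | re.IGNORECASE
        )
        text = pattern.sub(r"<{0}>\1</{0}>".format(html_tag), text)
    return text


print(bbcode_to_html("[b]bold[/b] and [i]italic[/i]"))
# <b>bold</b> and <i>italic</i>
```

Apart from the square brackets, the output is the same markup the user could have typed directly. Nested tags of the same kind are not handled by this sketch.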

I think if this problem were to ever be solved — and I must say I don’t think it’s likely — we have no option but to pick the lowest common denominator, because nothing else will ever have enough traction.

And here’s where I make myself unpopular: the common denominator is HTML. But HTML used with some intelligence:

  • Auto-link URLs, but cope gracefully if users write their own <a> tags. Nothing’s more annoying than having to copy-paste a URL into your location bar because it’s not actually a hyperlink. Also, a URL that isn’t a link breaks the web.
  • Deal gracefully with special characters. If a user doesn’t know HTML, they should be penalised as little as possible for using triangular brackets in their text.
  • Limit HTML as little as possible. Sure, don’t allow <iframe> or <script>, but if there’s no way a user’s HTML could be harmful (including to layout and design), let them use it.
  • Don’t use weird CSS. If you don’t want users to use <h3> because your <h3> is 72px high, change your CSS. You design a website for its users, and that includes giving them what they expect when they use their own HTML in their posts.
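Taken together, these rules amount to a tag whitelist. As a minimal sketch, assuming Python's standard-library html.parser and a hypothetical list of allowed tags, a whitelisting pass might keep harmless tags, drop script, style and iframe elements along with their contents, strip every style attribute and any link that is not http(s) or mailto:, escape stray angle brackets, and close tags the user left open. A production site would more likely rely on a well-tested sanitizer library than on a hand-rolled parser like this one.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: which tags count as harmless is each site's call.
# No tag is allowed a "style" attribute, so users can't restyle the page.
ALLOWED_TAGS = {"a", "b", "i", "u", "em", "strong", "code", "pre", "p", "br",
                "ul", "ol", "li", "blockquote", "h3", "h4"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
VOID_TAGS = {"br"}  # tags that never take a closing tag
DROPPED_WITH_CONTENT = {"script", "style", "iframe"}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of whitelisted tags currently open
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself but keep its text
        kept = ""
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and similar schemes
            kept += f' {name}="{escape(value)}"'
        self.out.append(f"<{tag}{kept}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed closing tag: ignore it
        # Close anything left open inside this element, keeping nesting valid.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is escaped, so a literal "a < b" survives as typed.
            self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        # Close whatever the user forgot to close.
        closers = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closers


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    return parser.result()


print(sanitize('<p style="color:red">hi <i>there</p>'))
# <p>hi <i>there</i></p>
```

Unknown tags are dropped but their text is kept, and only absolute http(s) and mailto: links survive; relative links would need an extra rule. Whether <h3> belongs on the list is exactly the CSS question in the last bullet.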

And that’s that. By auto-linking URLs and gracefully dealing with triangular brackets, we’re giving users who don’t know the syntax what they expect. Users who know HTML don’t have to learn yet another syntax that offers, at best, a slight improvement. And users who want to learn the syntax so that they can do more complex things will be learning HTML, which opens up far more of the internet to them than knowing BBCode or Markdown syntax.
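For users who never touch HTML, the two behaviours described above, auto-linking URLs and treating angle brackets literally, come down to escaping plain text and wrapping anything that looks like a URL in a link. Here is a minimal Python sketch, assuming a deliberately simple, hypothetical URL pattern: http and https only, ending at whitespace or an angle bracket, with trailing sentence punctuation trimmed.

```python
import html
import re

# Hypothetical, deliberately simple URL pattern: http(s) URLs running up to
# the next whitespace, angle bracket or double quote.
URL_RE = re.compile(r'https?://[^\s<>"]+')
# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?)"


def autolink_text(text: str) -> str:
    """Escape plain text and wrap bare http(s) URLs in <a> tags."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCT)
        # Everything before the URL is ordinary text: escape it so a
        # literal "<" or "&" shows up as typed instead of as markup.
        out.append(html.escape(text[pos:match.start()]))
        safe = html.escape(url)
        out.append(f'<a href="{safe}">{safe}</a>')
        # Any trimmed punctuation is picked up with the following text.
        pos = match.start() + len(url)
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(autolink_text("Is 1 < 2? See http://example.com/maths."))
# Is 1 &lt; 2? See <a href="http://example.com/maths">http://example.com/maths</a>.
```

Trimming a trailing ")" keeps "(see http://example.com)" tidy but breaks URLs that genuinely end in a parenthesis; any auto-linker has to pick a side on cases like that. On a site that also accepts HTML, a function like this would be applied to the text between tags rather than to the raw input.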

Thoughts, as always, appreciated!

5 Comments

  • One thought that does occur to me is that developers would need to be really careful that users can't inject something that breaks the page layout.

    The first example that comes to mind is long words that don't wrap properly. Also, by the nature of CSS, the last write wins, so comments could end up extra colourful and annoying. That said, these issues are not new.

    Apart from uncertainty, I can't actually object to this plan… yet.

    Oh wait, what about white space? My original annoyance earlier came from not being able to work out how to indent something. Would it be OK to allow HTML(-lite) input, but treat white space as significant?

    Also, what if I want users to have access to something not in the HTML spec, e.g. double underline? Do they have to write inline CSS? Also, most formatting-related HTML is deprecated.

    Also, there would need to be some clean-up to make sure any mismatched tags don't cause problems, break validation, etc.

    Oh, and what version of (X)HTML would this be? Would the user need to know what was declared in the header?

  • Long-word wrapping is more or less solvable: I've done it in Successwhale, and it's sometimes ugly but functional. This is going to be a problem in any scheme.

    CSS last-write-wins can be avoided by disabling the style attribute for comments.

    Indenting: pre / ul / blockquote?

    I'd say things outside the HTML spec, or things where it matters which version of HTML you're using, are going to be very rare cases, probably not worth worrying about unless you really love some double-underline action! :D Inline CSS is not going to be a good idea: for every use of double-underline there'll be a <p style="font-size:1000%">FIRST!</p>…

    Mismatched tags are largely a solved problem, AFAICS.

  • Does that make it prototype time, then? :D

  • I imagine I'm as close as you're going to get to a non-knowledgeable user in your particular circle of friends (I've got a passable knowledge of HTML, but that's it). Working on the assumption that the input of laymen is useful when discussing something designed for them, I'd say this: yes, you're right.

    HTML is the answer. It's simple to pick up as you go along, there are masses and masses of easily accessible resources to help you with more fiddly stuff (and help you improve), and as you say it's the building block of the internet itself. I'm sure there are whole swathes of subtle programming nuance I'm missing here, but why would you use anything else for your basic-level user interactivity?

  • Usually because you want to supply a tool to insert the relevant codes for bold/italics etc., or because you want to restrict HTML so heavily that it's worth just specifying the allowed tags in another language.

    But I'd agree that HTML-based comments are usually for the best.
