Class BlueCloth
In: lib/bluecloth.rb
Parent: String

BlueCloth is a Ruby implementation of Markdown, a text-to-HTML conversion tool.

Synopsis

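  require 'bluecloth'

  # Keep the Markdown flush left: lines indented by four or more
  # spaces are not recognized as headers.
  doc = BlueCloth::new "## Test document ##\n\nJust a simple test.\n"

  puts doc.to_html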

Authors

  • Michael Granger <ged@FaerieMUD.org>

Contributors

  • Martin Chase <stillflame@FaerieMUD.org> - Peer review, helpful suggestions
  • Florian Gross <flgr@ccan.de> - Filter options, suggestions

Copyright

Original version:

  Copyright (c) 2003-2004 John Gruber
  <http://daringfireball.net/>
  All rights reserved.

Ruby port:

  Copyright (c) 2004 The FaerieMUD Consortium.

BlueCloth is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

BlueCloth is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

To-do

  • Refactor some of the larger uglier methods that have to do their own brute-force scanning because of lack of Perl features in Ruby's Regexp class. Alternately, could add a dependency on 'pcre' and use most Perl regexps.
  • Put the StringScanner in the render state for thread-safety.

Version

 $Id: bluecloth.rb 130 2009-07-16 00:08:36Z deveiant $

Methods

Classes and Modules

Class BlueCloth::FormatError

Constants

Version = VERSION = '1.0.1'   Release Version
SvnRev = %q$Rev: 130 $   SVN Revision
SvnId = %q$Id: bluecloth.rb 130 2009-07-16 00:08:36Z deveiant $   SVN Id tag
SvnUrl = %q$URL: svn+ssh://deveiate/svn/BlueCloth/releases/1.0.0/lib/bluecloth.rb $   SVN URL
RenderState = Struct::new( "RenderState", :urls, :titles, :html_blocks, :log )   Rendering state struct. Keeps track of URLs, titles, and HTML blocks midway through a render. I prefer this to the globals of the Perl version because globals make me break out in hives. Or something.
TabWidth = 4   Tab width for detab! if none is specified
EmptyElementSuffix = "/>";   The tag-closing string; set to '>' for HTML
EscapeTable = {}   Table of MD5 sums for escaped characters
StrictBlockTags = %w[ p div h[1-6] blockquote pre table dl ol ul script noscript form fieldset iframe math ins del ]   The list of tags which are considered block-level constructs and an alternation pattern suitable for use in regexps made from the list
StrictTagPattern = StrictBlockTags.join('|')
LooseBlockTags = StrictBlockTags - %w[ins del]
LooseTagPattern = LooseBlockTags.join('|')
StrictBlockRegex = %r{
    ^                           # Start of line
    <(#{StrictTagPattern})      # Start tag: \1
    \b                          # word break
    (.*\n)*?                    # Any number of lines, minimal match
    </\1>                       # Matching end tag
    [ ]*                        # trailing spaces
    $                           # End of line or document
  }ix   Nested blocks:
     <div>
         <div>
         tags for inner block must be indented.
         </div>
     </div>
LooseBlockRegex = %r{
    ^                           # Start of line
    <(#{LooseTagPattern})       # start tag: \1
    \b                          # word break
    (.*\n)*?                    # Any number of lines, minimal match
    .*</\1>                     # Anything + Matching end tag
    [ ]*                        # trailing spaces
    $                           # End of line or document
  }ix   More-liberal block-matching
HruleBlockRegex = %r{
    (                           # $1
        \A\n?                   # Start of doc + optional \n
        |                       # or
        .*\n\n                  # anything + blank line
    )
    (                           # save in $2
        [ ]*                    # Any spaces
        <hr                     # Tag open
        \b                      # Word break
        ([^<>])*?               # Attributes
        /?>                     # Tag close
        $                       # followed by a blank line or end of document
    )
  }ix   Special case for <hr />.
LinkRegex = %r{
    ^[ ]*\[(.+)\]:              # id = $1
      [ ]*
      \n?                       # maybe *one* newline
      [ ]*
    <?(\S+?)>?                  # url = $2
      [ ]*
      \n?                       # maybe one newline
      [ ]*
    (?:
        # Titles are delimited by "quotes" or (parens).
        ["(]
        (.+?)                   # title = $3
        [")]                    # Matching ) or "
        [ ]*
    )?                          # title is optional
    (?:\n+|\Z)
  }x   Link defs are in the form: ^[id]: url "optional title"
ListMarkerOl = %r{\d+\.}   Patterns to match and transform lists
ListMarkerUl = %r{[*+-]}
ListMarkerAny = Regexp::union( ListMarkerOl, ListMarkerUl )
ListRegexp = %r{
    (?:
        ^[ ]{0,#{TabWidth - 1}}     # Indent < tab width
        (#{ListMarkerAny})          # unordered or ordered ($1)
        [ ]+                        # At least one space
    )
    (?m:.+?)                        # item content (include newlines)
    (?:
          \z                        # Either EOF
        |                           # or
          \n{2,}                    # Blank line...
          (?=\S)                    # ...followed by non-space
          (?![ ]*                   # ...but not another item
            (#{ListMarkerAny})
           [ ]+)
    )
  }x
ListItemRegexp = %r{
    (\n)?                           # leading line = $1
    (^[ ]*)                         # leading whitespace = $2
    (#{ListMarkerAny}) [ ]+         # list marker = $3
    ((?m:.+?)                       # list item text = $4
    (\n{1,2}))
    (?= \n* (\z | \2 (#{ListMarkerAny}) [ ]+))
  }x   Pattern for transforming list items
CodeBlockRegexp = %r{
    (?:\n\n|\A)
    (                                   # $1 = the code block
      (?:
        (?:[ ]{#{TabWidth}} | \t)       # a tab or tab-width of spaces
        .*\n+
      )+
    )
    (^[ ]{0,#{TabWidth - 1}}\S|\Z)      # Lookahead for non-space at
                                        # line-start, or end of doc
  }x   Pattern for matching codeblocks
BlockQuoteRegexp = %r{
    (?:
        ^[ ]*>[ ]?      # '>' at the start of a line
        .+\n            # rest of the first line
        (?:.+\n)*       # subsequent consecutive lines
        \n*             # blanks
    )+
  }x   Pattern for matching Markdown blockquote blocks
PreChunk = %r{ ( ^ \s* <pre> .+? </pre> ) }xm
AutoAnchorURLRegexp = /<((https?|ftp):[^'">\s]+)>/
AutoAnchorEmailRegexp = %r{ < ( [-.\w]+ \@ [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+ ) > }xi
Encoders = [ lambda {|char| "&#%03d;" % char}, lambda {|char| "&#x%X;" % char}, lambda {|char| char.chr }, ]   Encoder functions to turn characters of an email address into encoded entities.
SetextHeaderRegexp = %r{
    (.+)            # The title text ($1)
    \n
    ([\-=])+        # Match a line of = or -. Save only one in $2.
    [ ]*\n+
  }x   Regex for matching Setext-style headers
AtxHeaderRegexp = %r{
    ^(\#{1,6})      # $1 = string of #'s
    [ ]*
    (.+?)           # $2 = Header text
    [ ]*
    \#*             # optional closing #'s (not counted)
    \n+
  }x   Regexp for matching ATX-style headers
RefLinkIdRegex = %r{
    [ ]?                    # Optional leading space
    (?:\n[ ]*)?             # Optional newline + spaces
    \[
        (.*?)               # Id = $1
    \]
  }x   Pattern to match the linkid part of an anchor tag for reference-style links.
InlineLinkRegex = %r{
    \(                      # Literal paren
        [ ]*                # Zero or more spaces
        <?(.+?)>?           # URI = $1
        [ ]*                # Zero or more spaces
        (?:                 #
            ([\"\'])        # Opening quote char = $2
            (.*?)           # Title = $3
            \2              # Matching quote char
        )?                  # Title is optional
    \)
  }x
BoldRegexp = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x   Pattern to match strong emphasis in Markdown text
ItalicRegexp = %r{ (\*|_) (\S|\S.*?\S) \1 }x   Pattern to match normal emphasis in Markdown text
InlineImageRegexp = %r{
    (                       # Whole match = $1
        !\[ (.*?) \]        # alt text = $2
      \([ ]*
        <?(\S+?)>?          # source url = $3
        [ ]*
        (?:                 #
          (["'])            # quote char = $4
          (.*?)             # title = $5
          \4                # matching quote
          [ ]*
        )?                  # title is optional
      \)
    )
  }xs   Next, handle inline images: ![alt text](url "optional title") Don't forget: encode * and _
ReferenceImageRegexp = %r{
    (                       # Whole match = $1
        !\[ (.*?) \]        # Alt text = $2
        [ ]?                # Optional space
        (?:\n[ ]*)?         # One optional newline + spaces
        \[ (.*?) \]         # id = $3
    )
  }xs   Reference-style images
CodeEscapeRegexp = %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x   Regexp to match special characters in a code block
HTMLCommentRegexp = %r{ <! ( -- .*? -- \s* )+ > }mx   Matching constructs for tokenizing X/HTML
XMLProcInstRegexp = %r{ <\? .*? \?> }mx
MetaTag = Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp )
HTMLTagOpenRegexp = %r{ < [a-z/!$] [^<>]* }imx
HTMLTagCloseRegexp = %r{ > }x
HTMLTagPart = Regexp::union( HTMLTagOpenRegexp, HTMLTagCloseRegexp )

Attributes

filter_html  [RW]  Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?)
filter_styles  [RW]  Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?)
fold_lines  [RW]  RedCloth-compatibility accessor. Line-folding is part of Markdown syntax, so this isn't used by anything.

Public Class methods

Public Instance methods

Do block-level transforms on a copy of str using the specified render state rs and return the results.

Apply Markdown span transforms to a copy of the specified str with the given render state rs and return it.

Convert tabs in str to spaces.

Convert tabs to spaces in place and return self if any were converted.
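
A minimal sketch of tab expansion, assuming tabs advance to the next multiple of the tab width (4, as in TabWidth). The method name detab_sketch is hypothetical and this is not necessarily how BlueCloth implements it.

```ruby
# Hypothetical helper; illustrates column-aware tab expansion only.
def detab_sketch(str, tabwidth = 4)
  str.split("\n", -1).map { |line|
    # Each tab pads the text before it out to the next tab stop.
    line.gsub(/([^\t]*)\t/) { $1 + ' ' * (tabwidth - $1.length % tabwidth) }
  }.join("\n")
end

puts detab_sketch("a\tb")
```

Because every expanded segment ends on a tab stop, the length of the text since the previous tab is enough to compute the padding.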

Return a copy of the given str with any backslashed special character in it replaced with MD5 placeholders.
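
The placeholder idea can be sketched as follows, assuming each special character maps to the MD5 hex digest of itself (in the spirit of EscapeTable). The table contents, SKETCH_ESCAPES, hide_backslash_escapes and restore_escapes are all hypothetical names chosen for illustration.

```ruby
require 'digest/md5'

# Hypothetical table: special character => MD5 hex digest of that character.
SKETCH_ESCAPES = {}
'\\`*_{}[]()#.!'.each_char { |ch| SKETCH_ESCAPES[ch] = Digest::MD5.hexdigest(ch) }

# Replace each backslash-escaped special character with its placeholder.
# Escapes of characters not in the table are left untouched.
def hide_backslash_escapes(str)
  str.gsub(/\\(.)/) { |m| SKETCH_ESCAPES.fetch($1, m) }
end

# Swap the placeholders back for the literal characters.
def restore_escapes(str)
  SKETCH_ESCAPES.inject(str.dup) { |out, (ch, digest)| out.gsub(digest) { ch } }
end

hidden = hide_backslash_escapes('not \*emphasis\*')
puts restore_escapes(hidden)
```

The digests contain only hexadecimal digits, so later transforms that look for Markdown punctuation will not see the hidden characters.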

Escape any characters special to HTML and encode any characters special to Markdown in a copy of the given str and return it.

Transform a copy of the given email addr into an escaped version safer for posting publicly.
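
As an illustration, the three functions in the Encoders constant can be applied to the bytes of an address. This sketch cycles through them in order so the output is reproducible; that selection rule and the name obfuscate_email are assumptions, not the library's documented behavior.

```ruby
# Same three encodings as the Encoders constant, applied to byte values.
SKETCH_ENCODERS = [
  lambda { |code| "&#%03d;" % code },  # decimal entity
  lambda { |code| "&#x%X;" % code },   # hexadecimal entity
  lambda { |code| code.chr },          # literal character
]

# Hypothetical helper: encode byte i with encoder number i modulo 3.
def obfuscate_email(addr)
  addr.each_byte.each_with_index.map { |code, i|
    SKETCH_ENCODERS[i % SKETCH_ENCODERS.length].call(code)
  }.join
end

puts obfuscate_email("ab@c")
```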

Return a copy of str with angle brackets and ampersands HTML-encoded.
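
A simplified sketch of this kind of encoding, assuming every ampersand and angle bracket is replaced unconditionally. The names below are hypothetical, and the real method may be more selective (for example, about existing entities).

```ruby
HTML_ENTITY_SKETCH = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }

# Hypothetical helper: replace each special character via a lookup table.
def encode_html_sketch(str)
  str.gsub(/[&<>]/, HTML_ENTITY_SKETCH)
end

puts encode_html_sketch("a < b && c > d")
```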

Escape any markdown characters in a copy of the given str and return it.

Escape special characters in the given str.

Wrap all remaining paragraph-looking text in a copy of str inside <p> tags and return it.
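
One way to picture the paragraph step, assuming chunks are separated by blank lines and that a chunk already starting with a block-level tag is left alone. The tag list and names here are hypothetical simplifications.

```ruby
# Hypothetical, abbreviated list of block-level start tags.
BLOCK_START_SKETCH = /\A<(?:p|div|h[1-6]|blockquote|pre|table|ol|ul|hr)\b/i

# Wrap every blank-line-separated chunk in <p> unless it is already a block.
def form_paragraphs_sketch(str)
  str.strip.split(/\n{2,}/).map { |chunk|
    chunk =~ BLOCK_START_SKETCH ? chunk : "<p>#{chunk.strip}</p>"
  }.join("\n\n")
end

puts form_paragraphs_sketch("<h1>T</h1>\n\nfirst line\nsecond\n\n\nlast")
```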

Replace all blocks of HTML in str that start in the left margin with tokens.

Remove one level of line-leading tabs or spaces from a copy of str and return it.
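
A short sketch of removing one indentation level, assuming a level is either one tab or up to tabwidth spaces at the start of a line. The name outdent_sketch is hypothetical.

```ruby
# Hypothetical helper: strip one leading tab, or 1..tabwidth leading spaces,
# from every line.
def outdent_sketch(str, tabwidth = 4)
  str.gsub(/^(\t|[ ]{1,#{tabwidth}})/, '')
end

puts outdent_sketch("    code\n\tmore\n        deep")
```

Deeper indentation survives with one level removed, which is what nested lists and code blocks inside blockquotes rely on.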

Strip link definitions from str, storing them in the given RenderState rs.
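
The following sketch shows the general shape of this step using a copy of the LinkRegex pattern. LinkState is a hypothetical stand-in for RenderState with only the two fields needed here, and lower-casing the id is an assumption.

```ruby
# Hypothetical, reduced stand-in for the render state.
LinkState = Struct.new(:urls, :titles)

# Copy of the link-definition pattern: [id]: url "optional title"
LINK_DEF_SKETCH = %r{
    ^[ ]*\[(.+)\]:      # id = $1
      [ ]*
      \n?               # at most one newline
      [ ]*
    <?(\S+?)>?          # url = $2
      [ ]*
      \n?               # at most one newline
      [ ]*
    (?:
        ["(]
        (.+?)           # title = $3
        [")]
        [ ]*
    )?                  # title is optional
    (?:\n+|\Z)
  }x

# Remove definitions from the text and record them in the state.
def strip_link_definitions_sketch(str, state)
  str.gsub(LINK_DEF_SKETCH) do
    id, url, title = $1, $2, $3
    state.urls[id.downcase] = url
    state.titles[id.downcase] = title if title
    ''
  end
end

state = LinkState.new({}, {})
text  = "See [the site][home].\n\n[home]: http://example.com/ \"Example\"\n"
puts strip_link_definitions_sketch(text, state)
p state.urls
```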

Render Markdown-formatted text in this string object as HTML and return it. The parameter is for compatibility with RedCloth, and is currently unused, though that may change in the future.

Break the HTML source in str into a series of tokens and return them. The tokens are just 2-element Array tuples with a type and the actual content. If this function is called with a block, the type and text parts of each token will be yielded to it one at a time as they are extracted.
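
A minimal sketch of such a tokenizer built on Ruby's StringScanner. The token types :tag and :text, the simplified tag pattern, and the name tokenize_html_sketch are assumptions made for illustration.

```ruby
require 'strscan'

# Hypothetical tokenizer: returns [type, content] pairs and also yields
# each pair to the block, if one is given, as soon as it is extracted.
def tokenize_html_sketch(str, &block)
  tag = /<!--.*?-->|<\?.*?\?>|<[a-z\/!$][^<>]*>/mi
  scanner = StringScanner.new(str)
  tokens = []
  emit = lambda do |type, content|
    tokens << [type, content]
    block.call(type, content) if block
  end

  until scanner.eos?
    if (chunk = scanner.scan_until(tag))
      # chunk holds any text before the tag plus the tag itself.
      before = chunk[0, chunk.length - scanner.matched.length]
      emit.call(:text, before) unless before.empty?
      emit.call(:tag, scanner.matched)
    else
      emit.call(:text, scanner.rest)
      scanner.terminate
    end
  end
  tokens
end

tokenize_html_sketch("a <b>c</b> d") { |type, text| puts "#{type}: #{text.inspect}" }
```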

Apply Markdown anchor transforms to a copy of the specified str with the given render state rs and return it.

Transform URLs in a copy of the specified str into links and return it.
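
For URLs in angle brackets, the transform can be sketched with the AutoAnchorURLRegexp pattern. The helper name is hypothetical, and email auto-links are left out of this sketch.

```ruby
# Same pattern as AutoAnchorURLRegexp.
AUTO_URL_SKETCH = /<((https?|ftp):[^'">\s]+)>/

# Hypothetical helper: turn <url> into an anchor whose text is the URL.
def auto_link_sketch(str)
  str.gsub(AUTO_URL_SKETCH) { %(<a href="#{$1}">#{$1}</a>) }
end

puts auto_link_sketch("Visit <http://example.com/> now")
```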

Transform Markdown-style blockquotes in a copy of the specified str and return it.

Transform Markdown-style codeblocks in a copy of the specified str and return it.

Transform backticked spans into <code> spans.
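
A regex-based sketch of the idea, assuming a span opens with a run of backticks and closes with a run of the same length, and that HTML-special characters inside are encoded. The name code_spans_sketch is hypothetical and the library may scan the text differently.

```ruby
# Hypothetical helper: `code` and ``code with ` inside`` become <code> spans.
def code_spans_sketch(str)
  str.gsub(/(`+)(.+?)\1(?!`)/m) do
    body = $2.strip.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
    "<code>#{body}</code>"
  end
end

puts code_spans_sketch("Use `a < b` here")
```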

Apply Markdown header transforms to a copy of the given str and render state rs and return the result.
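
A sketch of both header styles using copies of the SetextHeaderRegexp and AtxHeaderRegexp patterns. The helper name, and the omission of span transforms on the header text, are simplifying assumptions.

```ruby
SETEXT_SKETCH = %r{
    (.+)          # title text = $1
    \n
    ([\-=])+      # underline of = or -; $2 keeps one character
    [ ]*\n+
  }x

ATX_SKETCH = %r{
    ^(\#{1,6})    # $1 = leading hash marks
    [ ]*
    (.+?)         # $2 = header text
    [ ]*
    \#*           # optional closing hash marks
    \n+
  }x

# Hypothetical helper: '=' underlines give h1, '-' gives h2, and the
# number of leading hash marks gives the ATX level.
def headers_sketch(str)
  str.gsub(SETEXT_SKETCH) {
    level = ($2 == '=') ? 1 : 2
    "<h#{level}>#{$1}</h#{level}>\n\n"
  }.gsub(ATX_SKETCH) {
    level = $1.length
    "<h#{level}>#{$2}</h#{level}>\n\n"
  }
end

puts headers_sketch("Title\n=====\n\n## Sub ##\n\ntext\n")
```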

Transform any Markdown-style horizontal rules in a copy of the specified str and return it.
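
A compact sketch, assuming a rule is a line of three or more hyphens, asterisks, or underscores, optionally separated by single spaces, and that the empty-element suffix is "/>". The helper name is hypothetical.

```ruby
# Hypothetical helper: replace rule lines with an <hr/> tag.
def hrules_sketch(str)
  str.gsub(/^( ?[\-*_] ?){3,}$/, "\n<hr/>\n")
end

puts hrules_sketch("above\n\n* * *\n\nbelow")
```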

Turn image markup into image tags.

Transform italic- and bold-encoded text in a copy of the specified str and return it.
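
The two emphasis patterns can be applied in sequence, as in this sketch built from copies of BoldRegexp and ItalicRegexp. Running the bold pattern first is an assumption of the sketch (it keeps `**` from being read as two italic markers), and the helper name is hypothetical.

```ruby
BOLD_SKETCH   = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
ITALIC_SKETCH = %r{ (\*|_) (\S|\S.*?\S) \1 }x

# Hypothetical helper: strong emphasis first, then normal emphasis.
def emphasis_sketch(str)
  str.gsub(BOLD_SKETCH, '<strong>\2</strong>')
     .gsub(ITALIC_SKETCH, '<em>\2</em>')
end

puts emphasis_sketch("a **bold** and *it* b")
```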

Transform list items in a copy of the given str and return it.

Transform Markdown-style lists in a copy of the specified str and return it.

Swap escaped special characters in a copy of the given str and return it.
