Class | BlueCloth |
In: | lib/bluecloth.rb |
Parent: | String |
BlueCloth is a Ruby implementation of Markdown, a text-to-HTML conversion tool.
    doc = BlueCloth::new "
    ## Test document ##

    Just a simple test.
    "

    puts doc.to_html
Original version:
Copyright (c) 2003-2004 John Gruber <http://daringfireball.net/> All rights reserved.
Ruby port:
Copyright (c) 2004 The FaerieMUD Consortium.
BlueCloth is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
BlueCloth is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
$Id: bluecloth.rb 130 2009-07-16 00:08:36Z deveiant $
Version | = | VERSION = '1.0.1' | Release Version | |
SvnRev | = | %q$Rev: 130 $ | SVN Revision | |
SvnId | = | %q$Id: bluecloth.rb 130 2009-07-16 00:08:36Z deveiant $ | SVN Id tag | |
SvnUrl | = | %q$URL: svn+ssh://deveiate/svn/BlueCloth/releases/1.0.0/lib/bluecloth.rb $ | SVN URL | |
RenderState | = | Struct::new( "RenderState", :urls, :titles, :html_blocks, :log ) | Rendering state struct. Keeps track of URLs, titles, and HTML blocks midway through a render. I prefer this to the globals of the Perl version because globals make me break out in hives. Or something. | |
TabWidth | = | 4 | Tab width for detab! if none is specified | |
EmptyElementSuffix | = | "/>"; | The tag-closing string; set to '>' for HTML | |
EscapeTable | = | {} | Table of MD5 sums for escaped characters | |
StrictBlockTags | = | %w[ p div h[1-6] blockquote pre table dl ol ul script noscript form fieldset iframe math ins del ] | The list of tags which are considered block-level constructs and an alternation pattern suitable for use in regexps made from the list | |
StrictTagPattern | = | StrictBlockTags.join('|') | ||
LooseBlockTags | = | StrictBlockTags - %w[ins del] | ||
LooseTagPattern | = | LooseBlockTags.join('|') | ||
StrictBlockRegex | = | %r{
    ^                       # Start of line
    <(#{StrictTagPattern})  # Start tag: \1
    \b                      # word break
    (.*\n)*?                # Any number of lines, minimal match
    </\1>                   # Matching end tag
    [ ]*                    # trailing spaces
    $                       # End of line or document
  }ix | Nested blocks:
    <div>
        <div>
        tags for inner block must be indented.
        </div>
    </div>
| |
LooseBlockRegex | = | %r{
    ^                       # Start of line
    <(#{LooseTagPattern})   # Start tag: \1
    \b                      # word break
    (.*\n)*?                # Any number of lines, minimal match
    .*</\1>                 # Anything + Matching end tag
    [ ]*                    # trailing spaces
    $                       # End of line or document
  }ix | More-liberal block-matching | |
HruleBlockRegex | = | %r{
    (                       # $1
      \A\n?                 # Start of doc + optional \n
      |                     # or
      .*\n\n                # anything + blank line
    )
    (                       # save in $2
      [ ]*                  # Any spaces
      <hr                   # Tag open
      \b                    # Word break
      ([^<>])*?             # Attributes
      /?>                   # Tag close
      $                     # followed by a blank line or end of document
    )
  }ix | Special case for <hr />. | |
LinkRegex | = | %r{
    ^[ ]*\[(.+)\]:          # id = $1
    [ ]*
    \n?                     # maybe *one* newline
    [ ]*
    <?(\S+?)>?              # url = $2
    [ ]*
    \n?                     # maybe one newline
    [ ]*
    (?:
      # Titles are delimited by "quotes" or (parens).
      ["(]
      (.+?)                 # title = $3
      [")]                  # Matching ) or "
      [ ]*
    )?                      # title is optional
    (?:\n+|\Z)
  }x | Link defs are in the form: ^[id]: url "optional title" | |
ListMarkerOl | = | %r{\d+\.} | Patterns to match and transform lists | |
ListMarkerUl | = | %r{[*+-]} | ||
ListMarkerAny | = | Regexp::union( ListMarkerOl, ListMarkerUl ) | ||
ListRegexp | = | %r{
    (?:
      ^[ ]{0,#{TabWidth - 1}}   # Indent < tab width
      (#{ListMarkerAny})        # unordered or ordered ($1)
      [ ]+                      # At least one space
    )
    (?m:.+?)                    # item content (include newlines)
    (?:
      \z                        # Either EOF
      |                         # or
      \n{2,}                    # Blank line...
      (?=\S)                    # ...followed by non-space
      (?![ ]*                   # ...but not another item
        (#{ListMarkerAny})
        [ ]+)
    )
  }x | ||
ListItemRegexp | = | %r{
    (\n)?                       # leading line = $1
    (^[ ]*)                     # leading whitespace = $2
    (#{ListMarkerAny}) [ ]+     # list marker = $3
    ((?m:.+?)                   # list item text = $4
    (\n{1,2}))
    (?= \n* (\z | \2 (#{ListMarkerAny}) [ ]+))
  }x | Pattern for transforming list items | |
CodeBlockRegexp | = | %r{
    (?:\n\n|\A)
    (                           # $1 = the code block
      (?:
        (?:[ ]{#{TabWidth}} | \t)   # a tab or tab-width of spaces
        .*\n+
      )+
    )
    (^[ ]{0,#{TabWidth - 1}}\S|\Z)  # Lookahead for non-space at
                                    # line-start, or end of doc
  }x | Pattern for matching codeblocks | |
BlockQuoteRegexp | = | %r{
    (?:
      ^[ ]*>[ ]?                # '>' at the start of a line
      .+\n                      # rest of the first line
      (?:.+\n)*                 # subsequent consecutive lines
      \n*                       # blanks
    )+
  }x | Pattern for matching Markdown blockquote blocks | |
PreChunk | = | %r{ ( ^ \s* <pre> .+? </pre> ) }xm | ||
AutoAnchorURLRegexp | = | /<((https?|ftp):[^'">\s]+)>/ | ||
AutoAnchorEmailRegexp | = | %r{ < ( [-.\w]+ \@ [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+ ) > }xi | ||
Encoders | = | [
    lambda {|char| "&#%03d;" % char},
    lambda {|char| "&#x%X;" % char},
    lambda {|char| char.chr },
  ] | Encoder functions to turn characters of an email address into encoded entities. | |
SetextHeaderRegexp | = | %r{
    (.+)                        # The title text ($1)
    \n
    ([\-=])+                    # Match a line of = or -. Save only one in $2.
    [ ]*\n+
  }x | Regex for matching Setext-style headers | |
AtxHeaderRegexp | = | %r{
    ^(\#{1,6})                  # $1 = string of #'s
    [ ]*
    (.+?)                       # $2 = Header text
    [ ]*
    \#*                         # optional closing #'s (not counted)
    \n+
  }x | Regexp for matching ATX-style headers | |
RefLinkIdRegex | = | %r{
    [ ]?                        # Optional leading space
    (?:\n[ ]*)?                 # Optional newline + spaces
    \[
      (.*?)                     # Id = $1
    \]
  }x | Pattern to match the linkid part of an anchor tag for reference-style links. | |
InlineLinkRegex | = | %r{
    \(                          # Literal paren
    [ ]*                        # Zero or more spaces
    <?(.+?)>?                   # URI = $1
    [ ]*                        # Zero or more spaces
    (?:                         #
      ([\"\'])                  # Opening quote char = $2
      (.*?)                     # Title = $3
      \2                        # Matching quote char
    )?                          # Title is optional
    \)
  }x | ||
BoldRegexp | = | %r{ (\*\*|__) (\S|\S.*?\S) \1 }x | Pattern to match strong emphasis in Markdown text | |
ItalicRegexp | = | %r{ (\*|_) (\S|\S.*?\S) \1 }x | Pattern to match normal emphasis in Markdown text | |
InlineImageRegexp | = | %r{
    (                           # Whole match = $1
      !\[ (.*?) \]              # alt text = $2
      \([ ]*
      <?(\S+?)>?                # source url = $3
      [ ]*
      (?:                       #
        (["'])                  # quote char = $4
        (.*?)                   # title = $5
        \4                      # matching quote
        [ ]*
      )?                        # title is optional
      \)
    )
  }xs | Next, handle inline images: ![alt text](url "optional title"). Don't forget: encode * and _ | |
ReferenceImageRegexp | = | %r{
    (                           # Whole match = $1
      !\[ (.*?) \]              # Alt text = $2
      [ ]?                      # Optional space
      (?:\n[ ]*)?               # One optional newline + spaces
      \[ (.*?) \]               # id = $3
    )
  }xs | Reference-style images | |
CodeEscapeRegexp | = | %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x | Regexp to match special characters in a code block | |
HTMLCommentRegexp | = | %r{ <! ( -- .*? -- \s* )+ > }mx | Matching constructs for tokenizing X/HTML | |
XMLProcInstRegexp | = | %r{ <\? .*? \?> }mx | ||
MetaTag | = | Regexp::union( HTMLCommentRegexp, XMLProcInstRegexp ) | ||
HTMLTagOpenRegexp | = | %r{ < [a-z/!$] [^<>]* }imx | ||
HTMLTagCloseRegexp | = | %r{ > }x | ||
HTMLTagPart | = | Regexp::union( HTMLTagOpenRegexp, HTMLTagCloseRegexp ) |
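The span-level patterns in the table above can be tried in isolation. The following is a minimal sketch, not BlueCloth's own transform code: the three patterns are copied from the table, while the module `SpanSketch`, the method `render_spans`, and the exact output markup are assumptions made for illustration. Strong emphasis is handled before normal emphasis so that `**` is not consumed as two single `*` markers.

```ruby
module SpanSketch
  # Patterns copied verbatim from the constants table.
  BoldRegexp          = %r{ (\*\*|__) (\S|\S.*?\S) \1 }x
  ItalicRegexp        = %r{ (\*|_) (\S|\S.*?\S) \1 }x
  AutoAnchorURLRegexp = /<((https?|ftp):[^'">\s]+)>/

  # Hypothetical helper: apply the three patterns in a fixed order.
  def self.render_spans(text)
    text.
      gsub(AutoAnchorURLRegexp) { %{<a href="#{$1}">#{$1}</a>} }.
      gsub(BoldRegexp)          { "<strong>#{$2}</strong>" }.
      gsub(ItalicRegexp)        { "<em>#{$2}</em>" }
  end
end

puts SpanSketch.render_spans("**bold** and _em_")
puts SpanSketch.render_spans("<http://example.com/>")
```

A URL that itself contains `*` or `_` would be picked up by the emphasis patterns in this sketch; a real renderer has to protect such characters first.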
filter_html | [RW] | Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?) |
filter_styles | [RW] | Filters for controlling what gets output for untrusted input. (But really, you're filtering bad stuff out of untrusted input at submission-time via untainting, aren't you?) |
fold_lines | [RW] | RedCloth-compatibility accessor. Line-folding is part of Markdown syntax, so this isn't used by anything. |
Do block-level transforms on a copy of str using the specified render state rs and return the results.
Apply Markdown span transforms to a copy of the specified str with the given render state rs and return it.
Return a copy of the given str with any backslashed special character in it replaced with MD5 placeholders.
Escape any characters special to HTML and encode any characters special to Markdown in a copy of the given str and return it.
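A rough sketch of this two-step encoding, assuming the HTML-special characters are `&`, `<`, and `>` and that Markdown-special characters are replaced by their MD5 digests. `CodeEscapeRegexp` is copied from the constants table; the module and method names are hypothetical.

```ruby
require 'digest/md5'

module CodeEncodeSketch
  # Copied from the constants table.
  CodeEscapeRegexp = %r{( \* | _ | \{ | \} | \[ | \] | \\ )}x

  # Hypothetical helper: works on a copy, so the argument is not modified.
  def self.encode_code_sketch(str)
    str.
      gsub(/&/, '&amp;').     # ampersand first, so entities are not double-encoded
      gsub(/</, '&lt;').
      gsub(/>/, '&gt;').
      gsub(CodeEscapeRegexp) { Digest::MD5.hexdigest($1) }
  end
end

puts CodeEncodeSketch.encode_code_sketch('if a < b && c[0]')
```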
Render Markdown-formatted text in this string object as HTML and return it. The parameter is for compatibility with RedCloth, and is currently unused, though that may change in the future.
Break the HTML source in str into a series of tokens and return them. The tokens are just 2-element Array tuples with a type and the actual content. If this function is called with a block, the type and text parts of each token will be yielded to it one at a time as they are extracted.
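A tokenizer of this shape can be sketched in a few lines. The version below is a simplified stand-in rather than BlueCloth's implementation: it uses a single combined pattern instead of `MetaTag` and `HTMLTagPart`, and the token type symbols `:tag` and `:text` are assumed.

```ruby
module TokenizeSketch
  # Simplified tag pattern (an assumption, not a constant from the table).
  TagPattern = %r{
      <!--.*?-->            # HTML comment
    | <\?.*?\?>             # XML processing instruction
    | <[a-z/!$][^<>]*>      # ordinary open/close tag
  }imx

  # Returns an Array of [type, text] tuples; if a block is given, each
  # tuple is also yielded as soon as it is extracted.
  def self.tokenize(str, &block)
    tokens = []
    emit = lambda do |type, text|
      tokens << [type, text]
      block.call(type, text) if block
    end

    pos = 0
    while (match = TagPattern.match(str, pos))
      emit.call(:text, str[pos...match.begin(0)]) if match.begin(0) > pos
      emit.call(:tag, match[0])
      pos = match.end(0)
    end
    emit.call(:text, str[pos..-1]) if pos < str.length
    tokens
  end
end

TokenizeSketch.tokenize('<p>Hi <!-- c --> there</p>') do |type, text|
  puts "#{type}: #{text.inspect}"
end
```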
Apply Markdown anchor transforms to a copy of the specified str with the given render state rs and return it.
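For the inline form only, an anchor transform might look like the sketch below. It is a deliberately reduced illustration: the pattern is a simplified combination of bracketed link text with the parenthesized part described by `InlineLinkRegex`, reference-style links (which need the render state's URL and title tables) are not handled, nested brackets are not supported, and all names are hypothetical.

```ruby
module AnchorSketch
  # Assumed, simplified pattern for [text](url "title").
  InlineAnchor = %r{
      \[ ([^\]]+) \]          # link text = $1
      \( [ ]*
      <?(\S+?)>?              # URI = $2
      [ ]*
      (?:
        (["'])                # opening quote = $3
        (.*?)                 # title = $4
        \3                    # matching quote
      )?
      [ ]*
      \)
    }x

  # Hypothetical helper: returns a transformed copy of str.
  def self.transform_inline_anchors(str)
    str.gsub(InlineAnchor) do
      text, url, title = $1, $2, $4
      title_attr = title ? %{ title="#{title}"} : ''
      %{<a href="#{url}"#{title_attr}>#{text}</a>}
    end
  end
end

puts AnchorSketch.transform_inline_anchors('See [the site](http://example.com/ "Home").')
```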
Apply Markdown header transforms to a copy of the given str and render state rs and return the result.
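The two header patterns from the constants table are enough to sketch such a transform. In this illustration the regexps are copied from the table, but the module, the method name, the omission of the render state, and the choice to emit bare `<hN>` tags followed by a blank line are assumptions.

```ruby
module HeaderSketch
  # Copied from the constants table.
  SetextHeaderRegexp = %r{
      (.+)            # The title text ($1)
      \n
      ([\-=])+        # Match a line of = or -. Save only one in $2.
      [ ]*\n+
    }x
  AtxHeaderRegexp = %r{
      ^(\#{1,6})      # $1 = string of #'s
      [ ]*
      (.+?)           # $2 = Header text
      [ ]*
      \#*             # optional closing #'s (not counted)
      \n+
    }x

  # Hypothetical helper: Setext headers first, then ATX headers.
  def self.transform_headers_sketch(str)
    str.
      gsub(SetextHeaderRegexp) {
        level = ($2 == '=') ? 1 : 2      # '=' underline is h1, '-' is h2
        "<h#{level}>#{$1}</h#{level}>\n\n"
      }.
      gsub(AtxHeaderRegexp) {
        level = $1.length                # number of leading #'s
        "<h#{level}>#{$2}</h#{level}>\n\n"
      }
  end
end

puts HeaderSketch.transform_headers_sketch("Title\n=====\n\n## Section ##\nBody\n")
```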