Class Kramdown::Parser::Kramdown
In: lib/kramdown/parser/kramdown/extensions.rb
lib/kramdown/parser/kramdown/escaped_chars.rb
lib/kramdown/parser/kramdown/smart_quotes.rb
lib/kramdown/parser/kramdown/autolink.rb
lib/kramdown/parser/kramdown/eob.rb
lib/kramdown/parser/kramdown/table.rb
lib/kramdown/parser/kramdown/block_boundary.rb
lib/kramdown/parser/kramdown/codeblock.rb
lib/kramdown/parser/kramdown/list.rb
lib/kramdown/parser/kramdown/html_entity.rb
lib/kramdown/parser/kramdown/blank_line.rb
lib/kramdown/parser/kramdown/paragraph.rb
lib/kramdown/parser/kramdown/emphasis.rb
lib/kramdown/parser/kramdown/math.rb
lib/kramdown/parser/kramdown/abbreviation.rb
lib/kramdown/parser/kramdown/footnote.rb
lib/kramdown/parser/kramdown/typographic_symbol.rb
lib/kramdown/parser/kramdown/blockquote.rb
lib/kramdown/parser/kramdown/link.rb
lib/kramdown/parser/kramdown/header.rb
lib/kramdown/parser/kramdown/html.rb
lib/kramdown/parser/kramdown/codespan.rb
lib/kramdown/parser/kramdown/line_break.rb
lib/kramdown/parser/kramdown/horizontal_rule.rb
lib/kramdown/parser/kramdown.rb
Parent: Object

Used for parsing a document in kramdown format.

To extend the functionality of the parser, do the following:

  • create a new subclass,
  • add the needed parser methods,
  • modify the @block_parsers and @span_parsers variables by adding the names of your parser methods.

Here is a small example of an extended parser class that parses ERB-style tags as raw text when they are used as span-level elements (an equivalent block-level parser should probably also be written to handle the block case):

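  require 'kramdown/parser/kramdown'

  class Kramdown::Parser::ERBKramdown < Kramdown::Parser::Kramdown

    def initialize(source, options)
      super
      # Put the new parser at the front of the span-level parser list.
      @span_parsers.unshift(:erb_tags)
    end

    # Start regexp: a complete ERB tag such as <%= value %>.
    ERB_TAGS_START = /<%.*?%>/

    # Skip over the matched tag and add it to the tree as a :raw element.
    def parse_erb_tags
      @src.pos += @src.matched_size
      @tree.children << Element.new(:raw, @src.matched)
    end
    # Register the method under the name :erb_tags; '<%' identifies the
    # starting characters of the span.
    define_parser(:erb_tags, ERB_TAGS_START, '<%')

  end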

The new parser can be used like this:

  require 'kramdown/document'
  # require the file with the above parser class

  Kramdown::Document.new(input_text, :input => 'ERBKramdown').to_html
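
The scanner steps used by parse_erb_tags in the example above can be tried in isolation. The following is a minimal sketch, assuming that @src behaves like a StringScanner from Ruby's standard library and that the start regexp has been matched with a non-advancing check before the parser method runs; neither assumption is stated on this page.

```ruby
require 'strscan'

ERB_TAGS_START = /<%.*?%>/

src = StringScanner.new('<%= name %> says hi')

# check matches at the current position without moving the scan pointer.
src.check(ERB_TAGS_START)
matched    = src.matched   # text of the ERB tag
pos_before = src.pos       # still at the start of the tag

# The step parse_erb_tags performs: skip over the matched text.
src.pos += src.matched_size
rest = src.rest            # whatever follows the tag
```

Because the pointer is not advanced by the check, the parser method has to move it explicitly before adding the element.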

Included Modules

Kramdown::Parser::Html::Parser
::Kramdown

Constants

IAL_CLASS_ATTR = 'class'
ALD_ID_CHARS = /[\w-]/
ALD_ANY_CHARS = /\\\}|[^\}]/
ALD_ID_NAME = /\w#{ALD_ID_CHARS}*/
ALD_TYPE_KEY_VALUE_PAIR = /(#{ALD_ID_NAME})=("|')((?:\\\}|\\\2|[^\}\2])*?)\2/
ALD_TYPE_CLASS_NAME = /\.(#{ALD_ID_NAME})/
ALD_TYPE_ID_NAME = /#(\w[\w:-]*)/
ALD_TYPE_REF = /(#{ALD_ID_NAME})/
ALD_TYPE_ANY = /(?:\A|\s)(?:#{ALD_TYPE_KEY_VALUE_PAIR}|#{ALD_TYPE_ID_NAME}|#{ALD_TYPE_CLASS_NAME}|#{ALD_TYPE_REF})(?=\s|\Z)/
ALD_START = /^#{OPT_SPACE}\{:(#{ALD_ID_NAME}):(#{ALD_ANY_CHARS}+)\}\s*?\n/
EXT_STOP_STR = "\\{:/(%s)?\\}"
EXT_START_STR = "\\{::(\\w+)(?:\\s(#{ALD_ANY_CHARS}*?)|)(\\/)?\\}"
EXT_BLOCK_START = /^#{OPT_SPACE}(?:#{EXT_START_STR}|#{EXT_STOP_STR % ALD_ID_NAME})\s*?\n/
EXT_BLOCK_STOP_STR = "^#{OPT_SPACE}#{EXT_STOP_STR}\s*?\n"
IAL_BLOCK = /\{:(?!:|\/)(#{ALD_ANY_CHARS}+)\}\s*?\n/
IAL_BLOCK_START = /^#{OPT_SPACE}#{IAL_BLOCK}/
BLOCK_EXTENSIONS_START = /^#{OPT_SPACE}\{:/
EXT_SPAN_START = /#{EXT_START_STR}|#{EXT_STOP_STR % ALD_ID_NAME}/
IAL_SPAN_START = /\{:(#{ALD_ANY_CHARS}+)\}/
SPAN_EXTENSIONS_START = /\{:/
ESCAPED_CHARS = /\\([\\.*_+`<>()\[\]{}#!:|"'\$=-])/
SQ_PUNCT = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'
SQ_CLOSE = %![^\ \\\\\t\r\n\\[{(-]!
SQ_RULES = [
  [/("|')(?=#{SQ_PUNCT}\B)/, [:rquote1]],
  # Special case for double sets of quotes, e.g.:
  #   <p>He said, "'Quoted' words in a larger quote."</p>
  [/(\s?)"'(?=\w)/, [1, :ldquo, :lsquo]],
  [/(\s?)'"(?=\w)/, [1, :lsquo, :ldquo]],
  # Special case for decade abbreviations (the '80s):
  [/(\s?)'(?=\d\ds)/, [1, :rsquo]],
  # Get most opening single/double quotes:
  [/(\s)('|")(?=\w)/, [1, :lquote2]],
  # Single/double closing quotes:
  [/(#{SQ_CLOSE})('|")/, [1, :rquote2]],
  # Special case for e.g. "<i>Custer</i>'s Last Stand."
  [/("|')(\s|s\b|$)/, [:rquote1, 2]],
  # Any remaining single quotes should be opening ones:
  [/(.?)'/m, [1, :lsquo]],
  [/(.?)"/m, [1, :ldquo]],
]
SQ_SUBSTS = {
  [:rquote1, '"'] => :rdquo,
  [:rquote1, "'"] => :rsquo,
  [:rquote2, '"'] => :rdquo,
  [:rquote2, "'"] => :rsquo,
  [:lquote1, '"'] => :ldquo,
  [:lquote1, "'"] => :lsquo,
  [:lquote2, '"'] => :ldquo,
  [:lquote2, "'"] => :lsquo,
}
SMART_QUOTES_RE = /[^\\]?["']/
ACHARS = '\w\x80-\xFF'
ACHARS = '\w'
ACHARS = '[[:alnum:]]'
AUTOLINK_START_STR = "<((mailto|https?|ftps?):.+?|[-.#{ACHARS}]+@[-#{ACHARS}]+(?:\.[-#{ACHARS}]+)*\.[a-z]+)>"
AUTOLINK_START = /#{AUTOLINK_START_STR}/u
AUTOLINK_START = /#{AUTOLINK_START_STR}/
EOB_MARKER = /^\^\s*?\n/
TABLE_SEP_LINE = /^([+|: -]*?-[+|: -]*?)[ \t]*\n/
TABLE_HSEP_ALIGN = /[ ]?(:?)-+(:?)[ ]?/
TABLE_FSEP_LINE = /^[+|: =]*?=[+|: =]*?[ \t]*\n/
TABLE_ROW_LINE = /^(.*?)[ \t]*\n/
TABLE_PIPE_CHECK = /(?:\||.*?[^\\\n]\|)/
TABLE_LINE = /#{TABLE_PIPE_CHECK}.*?\n/
TABLE_START = /^#{OPT_SPACE}(?=\S)#{TABLE_LINE}/
BLOCK_BOUNDARY = /#{BLANK_LINE}|#{EOB_MARKER}|#{IAL_BLOCK_START}|\Z/
CODEBLOCK_START = INDENT
CODEBLOCK_MATCH = /(?:#{BLANK_LINE}?(?:#{INDENT}[ \t]*\S.*\n)+(?:(?!#{BLANK_LINE} {0,3}\S|#{IAL_BLOCK_START}|#{EOB_MARKER}|^#{OPT_SPACE}#{LAZY_END_HTML_STOP}|^#{OPT_SPACE}#{LAZY_END_HTML_START})^[ \t]*\S.*\n)*)*/
FENCED_CODEBLOCK_START = /^~{3,}/
FENCED_CODEBLOCK_MATCH = /^(~{3,})\s*?\n(.*?)^\1~*\s*?\n/m
LIST_ITEM_IAL = /^\s*(?:\{:(?!(?:#{ALD_ID_NAME})?:|\/)(#{ALD_ANY_CHARS}+)\})\s*/
LIST_ITEM_IAL_CHECK = /^#{LIST_ITEM_IAL}?\s*\n/
LIST_START_UL = /^(#{OPT_SPACE}[+*-])([\t| ].*?\n)/
LIST_START_OL = /^(#{OPT_SPACE}\d+\.)([\t| ].*?\n)/
LIST_START = /#{LIST_START_UL}|#{LIST_START_OL}/
DEFINITION_LIST_START = /^(#{OPT_SPACE}:)([\t| ].*?\n)/
BLANK_LINE = /(?:^\s*\n)+/
LAZY_END_HTML_SPAN_ELEMENTS = HTML_SPAN_ELEMENTS + %w{script}
LAZY_END_HTML_START = /<(?>(?!(?:#{LAZY_END_HTML_SPAN_ELEMENTS.join('|')})\b)#{REXML::Parsers::BaseParser::UNAME_STR})\s*(?>\s+#{REXML::Parsers::BaseParser::UNAME_STR}\s*=\s*(["']).*?\1)*\s*\/?>/m
LAZY_END_HTML_STOP = /<\/(?!(?:#{LAZY_END_HTML_SPAN_ELEMENTS.join('|')})\b)#{REXML::Parsers::BaseParser::UNAME_STR}\s*>/m
LAZY_END = /#{BLANK_LINE}|#{IAL_BLOCK_START}|#{EOB_MARKER}|^#{OPT_SPACE}#{LAZY_END_HTML_STOP}|^#{OPT_SPACE}#{LAZY_END_HTML_START}|\Z/
PARAGRAPH_START = /^#{OPT_SPACE}[^ \t].*?\n/
PARAGRAPH_MATCH = /^.*?\n/
PARAGRAPH_END = /#{LAZY_END}|#{DEFINITION_LIST_START}/
EMPHASIS_START = /(?:\*\*?|__?)/
BLOCK_MATH_START = /^#{OPT_SPACE}(\\)?\$\$(.*?)\$\$(\s*?\n)?/m
INLINE_MATH_START = /\$\$(.*?)\$\$/
ABBREV_DEFINITION_START = /^#{OPT_SPACE}\*\[(.+?)\]:(.*?)\n/
FOOTNOTE_DEFINITION_START = /^#{OPT_SPACE}\[\^(#{ALD_ID_NAME})\]:\s*?(.*?\n#{CODEBLOCK_MATCH})/
FOOTNOTE_MARKER_START = /\[\^(#{ALD_ID_NAME})\]/
TYPOGRAPHIC_SYMS = [['---', :mdash], ['--', :ndash], ['...', :hellip], ['\\<<', '&lt;&lt;'], ['\\>>', '&gt;&gt;'], ['<< ', :laquo_space], [' >>', :raquo_space], ['<<', :laquo], ['>>', :raquo]]
TYPOGRAPHIC_SYMS_SUBST = Hash[*TYPOGRAPHIC_SYMS.flatten]
TYPOGRAPHIC_SYMS_RE = /#{TYPOGRAPHIC_SYMS.map {|k,v| Regexp.escape(k)}.join('|')}/
BLOCKQUOTE_START = /^#{OPT_SPACE}> ?/
LINK_DEFINITION_START = /^#{OPT_SPACE}\[([^\n\]]+)\]:[ \t]*(?:<(.*?)>|([^'"\n]*?\S[^'"\n]*?))[ \t]*?(?:\n?[ \t]*?(["'])(.+?)\4[ \t]*?)?\n/
LINK_BRACKET_STOP_RE = /(\])|!?\[/
LINK_PAREN_STOP_RE = /(\()|(\))|\s(?=['"])/
LINK_INLINE_ID_RE = /\s*?\[([^\]]+)?\]/
LINK_INLINE_TITLE_RE = /\s*?(["'])(.+?)\1\s*?\)/m
LINK_START = /!?\[(?=[^^])/
HEADER_ID = /(?:[ \t]\{#(\w[\w-]*)\})?/
SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
ATX_HEADER_START = /^\#{1,6}/
ATX_HEADER_MATCH = /^(\#{1,6})(.+?)\s*?#*#{HEADER_ID}\s*?\n/
HTML_MARKDOWN_ATTR_MAP = {"0" => :raw, "1" => :default, "span" => :span, "block" => :block}   Mapping of the markdown attribute value to the content model: :raw for "0", :default for "1" (use the default content model of the HTML element), :span for "span" and :block for "block". For every other value nil is returned.
TRAILING_WHITESPACE = /[ \t]*\n/
HTML_BLOCK_START = /^#{OPT_SPACE}<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--|\/)/
HTML_SPAN_START = /<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--|\/)/
CODESPAN_DELIMITER = /`+/
LINE_BREAK = /( |\\\\)(?=\n)/
HR_START = /^#{OPT_SPACE}(\*|-|_)[ \t]*\1[ \t]*\1(\1|[ \t])*\n/
Data = Struct.new(:name, :start_re, :span_start, :method)   Struct class holding all the needed data for one block/span-level parser method.
INDENT = /^(?:\t| {4})/   Regexp for matching indentation (one tab or four spaces)
OPT_SPACE = / {0,3}/   Regexp for matching the optional space (zero or up to three spaces)
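
A few of the constants above can be exercised on their own. The sketch below copies OPT_SPACE, INDENT, HEADER_ID, ATX_HEADER_MATCH and the TYPOGRAPHIC_SYMS constants verbatim from the list so that it runs without kramdown; the helper regexp block_start and the sample strings are made up for illustration and are not part of the library.

```ruby
# Local copies of constants from the list above.
OPT_SPACE = / {0,3}/
INDENT = /^(?:\t| {4})/
HEADER_ID = /(?:[ \t]\{#(\w[\w-]*)\})?/
ATX_HEADER_MATCH = /^(\#{1,6})(.+?)\s*?#*#{HEADER_ID}\s*?\n/
TYPOGRAPHIC_SYMS = [['---', :mdash], ['--', :ndash], ['...', :hellip],
                    ['\\<<', '&lt;&lt;'], ['\\>>', '&gt;&gt;'],
                    ['<< ', :laquo_space], [' >>', :raquo_space],
                    ['<<', :laquo], ['>>', :raquo]]
TYPOGRAPHIC_SYMS_SUBST = Hash[*TYPOGRAPHIC_SYMS.flatten]
TYPOGRAPHIC_SYMS_RE = /#{TYPOGRAPHIC_SYMS.map {|k,v| Regexp.escape(k)}.join('|')}/

# Hypothetical helper: a line that starts with at most three spaces.
block_start = /^#{OPT_SPACE}\S/
three_spaces_ok    = !!(block_start =~ "   text\n")
four_spaces_ok     = !!(block_start =~ "    text\n")
four_spaces_indent = !!(INDENT =~ "    text\n")

# An ATX header with an optional {#id} suffix.
m = ATX_HEADER_MATCH.match("## Title {#intro}\n")
level = m[1].length
text  = m[2].strip
id    = m[3]

# Longer symbols come first in the alternation, so '---' wins over '--'.
symbols = "a -- b --- c...".scan(TYPOGRAPHIC_SYMS_RE).map { |s| TYPOGRAPHIC_SYMS_SUBST[s] }
```

The order of the entries in TYPOGRAPHIC_SYMS matters here: if '--' were listed before '---', the alternation would never produce an :mdash.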

Protected Class methods

Add a parser method

  • with the given name,
  • using start_re as start regexp
  • and, for span parsers, span_start as a String that can be used in a regexp and which identifies the starting character(s)

to the registry. The method name is automatically derived from the name or can explicitly be set by using the meth_name parameter.

Return true if there is a parser called name.

Return the Data structure for the parser name.
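
The registry described by these three class methods can be modelled in a few lines. This is only a sketch under stated assumptions: MiniParser is a hypothetical stand-in class, the method names has_parser? and parser are guesses (this page does not show them), and the duplicate-name check is an illustrative choice. Only define_parser, its parameters and the Data struct fields are taken from this page.

```ruby
# Hypothetical stand-in for the parser class; only the registry is modelled.
class MiniParser
  # Same fields as the Data struct listed under Constants.
  Data = Struct.new(:name, :start_re, :span_start, :method)

  @parsers = {}

  class << self
    # Register a parser method; the method name defaults to "parse_<name>".
    def define_parser(name, start_re, span_start = nil, meth_name = "parse_#{name}")
      raise "A parser named #{name} already exists" if @parsers.key?(name)
      @parsers[name] = Data.new(name, start_re, span_start, meth_name)
    end

    # Assumed name: true if a parser called name is registered.
    def has_parser?(name)
      @parsers.key?(name)
    end

    # Assumed name: the Data structure for the parser name.
    def parser(name)
      @parsers[name]
    end
  end
end

MiniParser.define_parser(:erb_tags, /<%.*?%>/, '<%')
erb = MiniParser.parser(:erb_tags)
```

Keeping the start regexp, the starting characters and the method name together in one struct lets the parsing loop look up everything it needs from the parser name alone.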

Public Instance methods

This helper method adds the appropriate attributes to the element el of type a or img and adds the element itself to the @tree.

Return true if we are after a block boundary.

Return true if we are before a block boundary.

Normalize the link identifier.

The source string provided on initialization is parsed into the @root element.

Parse the abbreviation definition at the current location.

Parse the string str and extract all attributes and add all found attributes to the hash opts.
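
How the ALD_TYPE_* constants fit together can be sketched as follows. The regexps are copied from the Constants list; the method name parse_attribute_list_sketch, the 'id' key, the :refs key and the way class names are joined are assumptions made for illustration, not the library's documented behaviour.

```ruby
# Local copies of constants from the Constants list.
IAL_CLASS_ATTR = 'class'
ALD_ID_CHARS = /[\w-]/
ALD_ID_NAME = /\w#{ALD_ID_CHARS}*/
ALD_TYPE_KEY_VALUE_PAIR = /(#{ALD_ID_NAME})=("|')((?:\\\}|\\\2|[^\}\2])*?)\2/
ALD_TYPE_CLASS_NAME = /\.(#{ALD_ID_NAME})/
ALD_TYPE_ID_NAME = /#(\w[\w:-]*)/
ALD_TYPE_REF = /(#{ALD_ID_NAME})/
ALD_TYPE_ANY = /(?:\A|\s)(?:#{ALD_TYPE_KEY_VALUE_PAIR}|#{ALD_TYPE_ID_NAME}|#{ALD_TYPE_CLASS_NAME}|#{ALD_TYPE_REF})(?=\s|\Z)/

# Hypothetical sketch: collect every attribute found in str into opts.
def parse_attribute_list_sketch(str, opts)
  # ALD_TYPE_ANY has six groups: key, quote, value, id, class name, reference.
  str.scan(ALD_TYPE_ANY).each do |key, sep, val, id_attr, class_attr, ref|
    if ref
      (opts[:refs] ||= []) << ref
    elsif class_attr
      opts[IAL_CLASS_ATTR] = [opts[IAL_CLASS_ATTR], class_attr].compact.join(' ')
    elsif id_attr
      opts['id'] = id_attr
    else
      # Drop the backslash in front of an escaped } or quote character.
      opts[key] = val.gsub(/\\(\}|#{sep})/, '\1')
    end
  end
  opts
end

opts = parse_attribute_list_sketch('#intro .note .wide lang="ruby" defaults', {})
```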

Parse the Atx header at the current location.

Parse the autolink at the current location.

Parse the blank line at the current position.

Parse one of the block extensions (ALD, block IAL or generic extension) at the current location.

Parse the HTML at the current position as block-level HTML.

Parse the math block at the current location.

Parse the blockquote at the current location.

Parse the indented codeblock at the current location.

Parse the fenced codeblock at the current location.

Parse the codespan at the current scanner location.

Parse the definition list at the current location.

Parse the emphasis at the current location.

Parse the EOB marker at the current location.

Parse the backslash-escaped character at the current location.

Parse the generic extension at the current point. The parameter type can be either :block or :span, depending on whether we parse a block or span extension tag.

Used for parsing the first line of a list item or a definition, i.e. the line with list item marker or the definition marker.

Parse the footnote definition at the current location.

Parse the footnote marker at the current location.

Parse the horizontal rule at the current location.

Parse the HTML entity at the current location.

Parse the inline math at the current location.

Parse the line break at the current location.

Parse the link at the current scanner position. This method is used to parse normal links as well as image links.

Parse the link definition at the current location.

Parse the ordered or unordered list at the current location.

Parse the paragraph at the current location.

Parse the Setext header at the current location.

Parse the smart quotes at the current location.

Parse the extension span at the current location.

Parse the HTML at the current position as span-level HTML.

Parse the table at the current location.

Parse the typographic symbols at the current location.

Replace the abbreviation text with elements.

Update the ial with the information from the inline attribute list opts.

Protected Instance methods

Adapt the object to allow parsing like specified in the options.

Create a new block-level element, taking care of applying a preceding block IAL if it exists. This method should always be used for creating a block-level element!

Parse all block-level elements in text into the element el.

Parse all span-level elements in the source string of @src into el.

If the parameter stop_re (a regexp) is given, parsing stops immediately when the regexp matches, provided that either no block is given or the given block returns true.

The parameter parsers can be used to specify the (span-level) parsing methods that should be used for parsing.

The parameter text_type specifies the type which should be used for created text nodes.
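
The interplay of stop_re, parsers and text_type can be illustrated with a simplified stand-alone loop. This is a sketch, not the library's implementation: parse_spans_sketch, the hash keys :span_start, :start_re and :handler, and the array-based nodes are all hypothetical, and the real method works on @src and an element el rather than on a string.

```ruby
require 'strscan'

# Hypothetical, simplified span loop. Text up to the next start marker becomes
# a node of type text_type; then the first parser whose start regexp matches
# at the current position is run.
def parse_spans_sketch(source, parsers, stop_re = nil, text_type = :text)
  src = StringScanner.new(source)
  nodes = []
  starts = parsers.map { |p| Regexp.new(Regexp.escape(p[:span_start])) }
  starts << stop_re if stop_re
  used_re = /(?=#{Regexp.union(*starts)})/

  add_text = lambda do |text|
    next if text.empty?
    if nodes.last && nodes.last[0] == text_type
      nodes.last[1] << text          # merge with the preceding text node
    else
      nodes << [text_type, text.dup]
    end
  end

  until src.eos?
    text = src.scan_until(used_re)
    if text.nil?                     # no marker left: the rest is plain text
      add_text.call(src.rest)
      src.terminate
      break
    end
    add_text.call(text)
    if stop_re && src.check(stop_re)
      # Stop if no block is given, or if the given block returns true.
      break unless block_given? && !yield
    end
    parser = parsers.find { |p| src.check(p[:start_re]) }
    if parser
      nodes << parser[:handler].call(src)
    else
      add_text.call(src.getch)       # marker without a match: keep it as text
    end
  end
  [nodes, src.pos]
end

erb = {
  span_start: '<%',
  start_re: /<%.*?%>/,
  handler: lambda do |src|
    tag = src.matched
    src.pos += src.matched_size
    [:raw, tag]
  end
}

nodes, = parse_spans_sketch('Hi <%= name %>!', [erb])
stopped_nodes, stopped_at = parse_spans_sketch('a*b', [], /\*/)
```

In the second call no block is given, so the loop stops at the first match of the stop regexp and leaves the scanner positioned in front of it.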

Reset the current parsing environment. The parameter env can be used to set initial values for one or more environment variables.

Restore the current parsing environment.

Return the current parsing environment.

Update the given attributes hash attr with the information from the inline attribute list ial and all referenced ALDs.

Update the tree by parsing all :raw_text elements with the span-level parser (resets the environment) and by updating the attributes from the IALs.
