2.3 / 2012-02-20
- Minor enhancement
- Bug fixes
- Applied Mechanize#max_file_buffer to
the Content-Encoding handlers as well to prevent extra Tempfiles for small
gzip or deflate responses.
- Increased the default Mechanize#max_file_buffer to
100,000 bytes. This gives ~5MB of response bodies in memory with the
default history setting of 50 pages (depending on GC behavior).
- Ignore empty path/domain attributes.
- Cookies with an empty Expires attribute value were stored as session
cookies but cookies without the Expires attribute were not. Issue 78
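The ~5MB estimate in the max_file_buffer entry above follows from simple arithmetic. The sketch below spells it out. The constant and method names are illustrative only and are not part of Mechanize; real memory use also depends on GC behavior, as noted.

```ruby
# Illustrative constants; the values are taken from the changelog entry.
MAX_FILE_BUFFER = 100_000 # bytes a response body may occupy before spilling to a Tempfile
MAX_HISTORY     = 50      # default number of pages kept in history

# Upper bound on response-body bytes held in memory by the page history.
def buffered_bytes(max_file_buffer, max_history)
  max_file_buffer * max_history
end

puts buffered_bytes(MAX_FILE_BUFFER, MAX_HISTORY)
```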
2.2.1 / 2012-02-13
- Bug fixes
- Add missing file to the gem, ensure that missing files won't cause
failures again. Issue 201 by Alex
- Fix minor grammar issue in README. Issue 200 by Shane Becker.
2.2 / 2012-02-12
- API changes
- MetaRefresh#href is not normalized to an absolute URL, but set to the
original value and resolved later. It is even set to nil when the Refresh
URL is unspecified or empty.
- Minor enhancements
- Expose ssl_version from net-http-persistent. Patch by astera.
- SSL parameters and proxy may now be set at any time. Issue 194 by dsisnero.
- Improved Mechanize::Page with
image_with and images_with, and Mechanize::Page::Image with
various img element attribute accessors, caption, extname, mime_type and
fetch. Pull request 173 by kitamomonga
- Added MIME type parsing for content-types in Mechanize::PluggableParser
for fine-grained parser choices. Parsers will be chosen based on exact
match, simplified type or media type in that order. See Mechanize::PluggableParser#[]=.
- Added Mechanize#download
which downloads a response body to an IO-like or filename.
- Added Mechanize::DirectorySaver
which saves responses in a single directory. Issue 187 by yoshie902a.
- Added Mechanize::Page::Link#noreferrer?
- The documentation for Mechanize::Page#search and at now shows that both
XPath and CSS expressions are allowed. Issue 199 by Shane Becker.
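The parser lookup order described for Mechanize::PluggableParser (exact match, then simplified type, then media type) can be sketched in plain Ruby. This is not the library's implementation: the registry, the parser symbols, and the `simplify` rule (lowercasing and dropping "x-" prefixes) are assumptions made for illustration.

```ruby
# Hypothetical registry. Keys may be a full content type, a simplified
# type, or a bare media type such as "image".
PARSERS = {
  'text/html'       => :page_parser,
  'application/pdf' => :pdf_parser,
  'image'           => :image_parser
}.freeze

DEFAULT_PARSER = :file_parser

# Assumed simplification rule: lowercase and strip "x-" prefixes.
def simplify(content_type)
  media, sub = content_type.downcase.split('/', 2)
  "#{media.sub(/\Ax-/, '')}/#{sub.to_s.sub(/\Ax-/, '')}"
end

# The part before the slash, e.g. "image" for "image/png".
def media_type(content_type)
  content_type.downcase.split('/', 2).first
end

# Try the exact type first, then the simplified type, then the media type.
def choose_parser(content_type)
  PARSERS[content_type] ||
    PARSERS[simplify(content_type)] ||
    PARSERS[media_type(content_type)] ||
    DEFAULT_PARSER
end
```

With this registry, `choose_parser('image/png')` falls through to the media-type entry, while an unregistered type such as `video/mp4` gets the default.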
- Bug fixes
- Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue 198 by
Oleg Dashevskii
- Use resolve for resolving a Location header value. Fixes Issue 197
- A Refresh value can have whitespace around the semicolon and equal sign.
- MetaRefresh#click no longer sends out Referer.
- A link with an empty href is now resolved correctly where previously the
query part was dropped.
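The Refresh handling in this release (whitespace tolerated around the semicolon and equal sign, and a nil href when the URL is unspecified or empty) can be illustrated with a small parser. This is a sketch under stated assumptions, not Mechanize's own code: the `REFRESH_PATTERN` constant and `parse_refresh` method are hypothetical, and quoted URLs are not handled.

```ruby
# Hypothetical pattern: a delay, optionally followed by "; url = target",
# with optional whitespace around the semicolon and the equal sign.
REFRESH_PATTERN = /\A\s*(\d+(?:\.\d+)?)\s*(?:;\s*url\s*=\s*(.*?)\s*)?\z/i

# Returns [delay_in_seconds, url_or_nil], or nil if the value is malformed.
def parse_refresh(content)
  match = REFRESH_PATTERN.match(content)
  return nil unless match

  url = match[2]
  url = nil if url.nil? || url.empty? # unspecified or empty URL becomes nil
  [match[1].to_f, url]
end

p parse_refresh('5 ; url = http://example.com/')
p parse_refresh('0')
```

Returning the URL unresolved mirrors the API change above, where the value is kept as written and resolved against the page later.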
2.1.1 / 2012-02-03
- Bug fixes
- Set missing idle_timeout default. Issue 196
- Meta refresh URIs are now escaped (excluding %). Issue 177
- Fix charset name extraction. Issue 180
- A Referer URI sent on request no longer includes user information or
fragment part.
- Tempfiles for storing response bodies are unlinked upon creation to avoid
possible lack of finalization. Issue 183
- The default maximum history size is now 50 pages to avoid filling up a disk
with tempfiles accidentally. Related to Issue 183
- Errors in bodies with deflate and gzip responses now result in a Mechanize::Error instead of
silently being ignored and causing future errors. Issue 185
- Mechanize now raises an
UnauthorizedError instead of crashing when a 403 response does not contain
a www-authenticate header. Issue 181
- Mechanize gives a useful exception
when attempting to click buttons across pages. Issue 186
- Added note to Mechanize#cert_store
describing how to add certificates in case your system does not come with a
default set. Issue 179
- Invalid content-disposition headers are now ignored. Issue 191
- Fix NTLM by recognizing the "Negotiation" challenge instead of
endlessly looping. Issue 192
- Allow specification of the NTLM domain through Mechanize#auth. Issue 193
- Documented how to convert a Mechanize::ResponseReadError
into a File or Page, along with a new method force_parse. Issue 176
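The Referer fix in this release (no user information or fragment in the header) can be illustrated with Ruby's standard URI library. The `sanitized_referer` helper below is hypothetical and only sketches the idea; it is not how Mechanize implements it.

```ruby
require 'uri'

# Rebuild a URL for use as a Referer value, keeping the scheme, host,
# non-default port, path and query, and dropping user info and fragment.
def sanitized_referer(url)
  uri = URI.parse(url)
  referer = "#{uri.scheme}://#{uri.host}"
  referer += ":#{uri.port}" unless uri.port == uri.default_port
  referer += uri.path
  referer += "?#{uri.query}" if uri.query
  referer
end

puts sanitized_referer('http://user:secret@example.com/a/b?x=1#frag')
```

Rebuilding the string from its parts, rather than clearing fields on the parsed URI, keeps the sketch independent of how different Ruby versions treat user-info setters.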
2.1 / 2011-12-20
- Deprecations
- Mechanize#get no longer
accepts an options hash.
- Mechanize::Util::to_native_charset has been removed.
- Minor enhancements
- Mechanize now depends on
net-http-persistent 2.3+. This new version brings idle timeouts to help
with the dreaded "too many connection resets" issue when POSTing
to a closed connection. Issue 123
- SSL connections will be verified against the system certificate store by
default.
- Added Mechanize#retry_change_requests
to allow mechanize to retry POST and other non-idempotent requests when you
know it is safe to do so. Issue 123
- Mechanize can now stream files
directly to disk without loading them into memory first through Mechanize::Download, a
pluggable parser for downloading files.
All responses larger than Mechanize#max_file_buffer are
downloaded to a Tempfile. For backwards compatibility Mechanize::File subclasses still
load the response body into memory.
To force all unknown content types to download to disk instead of memory
set:
agent.pluggable_parser.default = Mechanize::Download
- Added Mechanize#content_encoding_hooks
which allow handling of non-standard content encodings like
"agzip". Patch 125 by kitamomonga
- Added dom_class to elements and the element matcher like dom_id. Patch 156
by Dan Hansen.
- Added support for the HTML5 keygen form element. See dev.w3.org/html5/spec/Overview.html#the-keygen-element
Patch 157 by Victor Costan.
- Mechanize no longer follows meta
refreshes that have no "url=" in the content attribute to avoid
infinite loops. To follow a meta refresh to the same page set Mechanize#follow_meta_refresh_self
to true. Issue 134 by Jo Hund.
- Updated ‘Mac Safari’ User-Agent alias to Safari 5.1.1.
‘Mac Safari 4’ can be used for the old ‘Mac Safari’
alias.
- When given multiple HTTP authentication options mechanize now picks the
strongest method.
- Improvements to HTTP authorization:
- mechanize raises Mechanize::UnauthorizedError for 401 responses which is a
subclass of Mechanize::ResponseCodeError.
- Added support for NTLM authentication, but this has not been tested.
- Mechanize::Cookie.new accepts attributes in a hash.
- Mechanize::CookieJar#<<(cookie)
(alias: add!) is added. Issue 139
- Different mechanize instances may now have different loggers. Issue 122
- Mechanize now accepts a proxy port
as a service name or number string. Issue 167
- Bug fixes
- Mechanize now handles cookies just
as most modern browsers do, roughly based on RFC 6265.
- domain=.example.com (which is invalid) is considered identical to
domain=example.com.
- A cookie with domain=example.com is sent to host.sub.example.com as well as
host.example.com and example.com.
- A cookie with domain=TLD (no dots) is accepted and sent if the host name is
TLD, and rejected otherwise. To retain compatibility and convention,
host/domain names starting with "local" are exempt from this
rule.
- A cookie with no domain attribute is only sent to the original host.
- A cookie with an Effective TLD is rejected based on the public suffix list.
(cf. publicsuffix.org/)
- "Secure" cookies are not sent via non-https connection.
- Subdomain match is not performed against an IP address.
- It is recommended that you clear out existing cookie jars for regeneration
because previously saved cookies may not have been parsed correctly.
- Mechanize takes more care to avoid
saving files with certain unsafe names. You should still take care not to
use mechanize to save files directly into your home directory ($HOME).
Issue 163.
- Mechanize#cookie_jar= works
again. Issue 126
- The original Referer value persists on redirection. Issue 150
- Do not send a referer on a Refresh header based redirection.
- Fixed encoding error in tests when LANG=C. Patch 142 by jinschoi.
- The order of items in a form submission now matches the DOM order. Patch 129
by kitamomonga
- Fixed proxy example in EXAMPLE. Issue 146 by NielsKSchjoedt
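Several of the cookie rules listed in this release can be sketched as a single domain-matching predicate: a leading dot is ignored, a cookie domain matches the host itself and any subdomain, and no subdomain matching is done for IP addresses. The helper names below are hypothetical, and the sketch deliberately omits the TLD and public suffix rules; it is not Mechanize's implementation.

```ruby
# Rough IP check, assumed sufficient for illustration: dotted-quad IPv4,
# or anything containing a colon (IPv6).
def ip_address?(host)
  host.match?(/\A\d{1,3}(?:\.\d{1,3}){3}\z/) || host.include?(':')
end

# True if a cookie with the given domain attribute should be sent to host.
def domain_match?(cookie_domain, host)
  domain = cookie_domain.downcase.sub(/\A\./, '') # ".example.com" == "example.com"
  host = host.downcase
  return true if host == domain
  return false if ip_address?(host)               # no subdomain match for IPs

  host.end_with?(".#{domain}")
end

p domain_match?('example.com', 'host.sub.example.com')
p domain_match?('example.com', 'badexample.com')
```

Requiring a literal "." before the domain is what keeps a cookie for `example.com` from being sent to an unrelated host that merely ends in the same characters.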
2.0.1 / 2011-06-28
Mechanize now uses minitest to
avoid differences in assertion availability between test/unit on Ruby 1.9 and 1.8
- Bug Fixes
- Restored Mechanize#set_proxy. Issues
117, 118, 119
- Mechanize::CookieJar#load now lazy-loads YAML. Issue 118
- Mechanize#keep_alive_time no longer crashes but does nothing as
net-http-persistent does not support HTTP/1.0 keep-alive extensions.
2.0 / 2011-06-27
Mechanize is now under the MIT
license
- API changes
- WWW::Mechanize has been removed. Use Mechanize.
- Pre connect hooks are now called with the agent and the request. See Mechanize#pre_connect_hooks.
- Post connect hooks are now called with the agent and the response. See Mechanize#post_connect_hooks.
- Mechanize::Chain is gone, as an internal API this should cause no problems.
- Mechanize#fetch_page no longer accepts an options Hash.
- Mechanize#put now accepts
headers instead of an options Hash as the last argument
- Mechanize#delete now
accepts headers instead of an options Hash as the last argument
- Mechanize#request_with_entity
now accepts headers instead of an options Hash as the last argument
- Mechanize no longer raises
RuntimeError directly, Mechanize::Error or
ArgumentError are raised instead.
- The User-Agent header has changed. It no longer includes the WWW- prefix
and now includes the ruby version. The URL has been updated as well.
- Mechanize now requires ruby 1.8.7
or newer.
- Hpricot support has been removed as webrobots requires nokogiri.
- Mechanize#get no longer
accepts the referer as the second argument.
- Mechanize#get no longer
allows the HTTP method to be changed (:verb option).
- Mechanize::Page::Meta is now Mechanize::Page::MetaRefresh
to accurately depict its responsibilities.
- Mechanize::Page#meta is now Mechanize::Page#meta_refresh as it only
contains meta elements with http-equiv of "refresh"
- Mechanize::Page#charset is now Mechanize::Page::charset. GH 112, patch by
Godfrey Chan.
- Deprecations
- Mechanize#get with an
options hash is deprecated and will be removed after October, 2011.
- Mechanize::Util::to_native_charset is deprecated as it is no longer used by
Mechanize.
- New Features
- Add header reference methods to Mechanize::File so that a response
object is compatible with Net::HTTPResponse.
- Mechanize#click accepts a
regexp or string to click a button/link in the current page. It works as
expected when not passed a string or regexp.
- Provide a way to only follow permanent redirects (301) automatically:
agent.redirect_ok = :permanent GH 73
- Mechanize now supports HTML5 meta
charset. GH 113
- Documented various Mechanize
accessors. GH 66
- Mechanize now uses
net-http-digest_auth. GH 31
- Mechanize now implements session
cookies. GH 78
- Mechanize now implements deflate
decoding. GH 40
- Mechanize now allows a certificate
and key to be passed directly. GH 71
- Mechanize::Form::MultiSelectList
now implements option_with and options_with. GH 42
- Add Mechanize::Page::Link#rel and rel?(kind) to read and test the rel
attribute.
- Add Mechanize::Page#canonical_uri to read a <link
rel="canonical"> tag.
- Add support for Robots Exclusion Protocol (i.e. robots.txt) and
nofollow/noindex in meta tags and the rel attribute. Automatic exclusion
can be turned on by setting:
agent.robots = true
- Manual robots.txt test can be performed with Mechanize#robots_allowed? and
robots_disallowed?.
- Mechanize::Form now supports
the accept-charset attribute. GH 96
- Mechanize::ResponseReadError
is raised if there is an exception while reading the response body. This
allows recovery from broken HTTP servers (or connections). GH 90
- Mechanize#follow_meta_refresh
set to :anywhere will follow meta refresh found outside of a
document's head. GH 99
- Add support for HTML5's rel="noreferrer" attribute which
indicates no "Referer" information should be sent when following
the link.
- A frame will now load its content when content is called. GH 111
- Added Mechanize#default_encoding to provide a default for pages with no
encoding specified. GH 104
- Added Mechanize#force_default_encoding which only uses
Mechanize#default_encoding for parsing HTML. GH 104
- Bug Fixes:
- Fixed a bug where Referer is not sent when accessing a relative URI
starting with "http".
- Fix handling of Meta Refresh with relative paths. GH 39
- Mechanize::CookieJar now
supports RFC 2109 correctly. GH 85
- Fixed typo in EXAMPLES.rdoc. GH 74
- The base element is now handled correctly for images. GH 72
- Image buttons with no name attribute are now included in the form's
button list. GH 56
- Improved handling of non ASCII-7bit compatible characters in links (only an
issue on ruby 1.8). GH 36, GH 75
- Loading cookies.txt is faster. GH 38
- Mechanize no longer sends cookies
for a.b.example to axb.example. GH 41
- Mechanize no longer sends the
button name as a form field for image buttons. GH 45
- Blank cookie values are now skipped. GH 80
- Mechanize now adds a
'.' to cookie domains if no '.' was sent. This is
not allowed by RFC 2109 but does appear in RFC 2965. GH 86
- file URIs are now read in binary mode. GH 83
- Content-Encoding: x-gzip is now treated like gzip per RFC 2616.
- Mechanize now unescapes URIs for
meta refresh. GH 68
- Mechanize now has more robust HTML
charset detection. GH 43
- Mechanize::Form::Textarea
is now created from a textarea element. GH 94
- A meta content-type now overrides the HTTP content type. GH 114
- Mechanize::Page::Link#uri now handles both escaped and unescaped hrefs. GH
107
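The Content-Encoding behavior added in this release (deflate decoding, and x-gzip treated like gzip) can be sketched with Ruby's standard Zlib library. The `decode_body` helper is hypothetical and is not Mechanize's code. It assumes zlib-wrapped deflate data, so raw deflate streams that some servers send are not handled here.

```ruby
require 'zlib'
require 'stringio'

# Decode a response body according to its Content-Encoding header value.
def decode_body(body, content_encoding)
  case content_encoding.to_s.downcase
  when '', 'identity'
    body                                          # nothing to decode
  when 'gzip', 'x-gzip'                           # x-gzip is an alias for gzip
    Zlib::GzipReader.new(StringIO.new(body)).read
  when 'deflate'
    Zlib::Inflate.inflate(body)                   # expects a zlib header
  else
    raise ArgumentError, "unsupported content encoding: #{content_encoding}"
  end
end

compressed = Zlib.gzip('hello')
puts decode_body(compressed, 'x-gzip')
```

Raising on an unknown encoding, rather than returning the body untouched, makes a misconfigured server visible instead of handing compressed bytes to a parser.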
1.0.0
- New Features:
- An optional verb may be passed to Mechanize#get GH 26
- The WWW constant is deprecated. Switch to the top level constant Mechanize
- SelectList#option_with and options_with for finding options
- Bug Fixes:
- Rescue errors from bogus encodings
- 7bit content-encoding support. Thanks sporkmonger! GH 2
- Fixed a bug with iconv conversion. Thanks awesomeman! GH 9
- meta redirects outside the head are not followed. GH 13
- Form submissions work with nil page encodings. GH 25
- Fixing default values with serialized cookies. GH 3
- Checkboxes and fields are sorted by page appearance before submitting. 11
0.9.3
- Bug Fixes:
- Do not apply encoding if encoding equals 'none'. Thanks Akinori
MUSHA!
- Fixed Page#encoding= when changing the value from or to nil. Made it return
the assigned value while at it. (Akinori MUSHA)
- Custom request headers may be supplied WWW::Mechanize#request_headers
RF 24516
- HTML Parser may be set on a per instance level WWW::Mechanize#html_parser
RF 24693
- Fixed string encoding in ruby 1.9. RF 2433
- Rescuing Zlib::DataErrors (Thanks Kelley Reynolds)
- Fixing a problem with frozen SSL objects. RF 24950
- Do not send a referer on meta refresh. RF 24945
- Fixed a bug with double semi-colons in Content-Disposition headers
- Properly handling cookies that specify a path. RF 25259
0.9.2 / 2009/03/05
- New Features:
- Bug Fixes:
- Fixed a bug with bad cookie parsing
- Form::RadioButton#click unchecks other buttons (RF 24159)
- Fixed problems with Iconv (RF 24190, RF 24192, RF 24043)
- POST parameters should be CGI escaped
- Made Content-Type match case insensitive (Thanks Kelley Reynolds)
- Non-string form parameters work
0.9.1 2009/02/23
- New Features:
- Encoding may be specified for a page: Page#encoding=
- Bug Fixes:
- m17n fixes. ありがとう konn!
- Fixed a problem with base tags. ありがとう Keisuke
- HEAD requests do not record in the history
- Default encoding to ISO-8859-1 instead of ASCII
- Requests with URI instances should not be polluted RF 23472
- Nonce count fixed for digest auth requests. Thanks Adrian Slapa!
- Fixed a referer issue with requests using a uri. RF 23472
- WAP content types will now be parsed
- Rescued poorly formatted cookies. Thanks Kelley Reynolds!
0.9.0
- Deprecations
- WWW::Mechanize::List is gone!
- Mechanize uses Nokogiri as
its HTML parser but you may switch to Hpricot by using
WWW::Mechanize.html_parser = Hpricot
- Bug Fixes:
- Nil check on page when base tag is used 23021
0.8.5
- Deprecations
- WWW::Mechanize::List will be deprecated in 0.9.0, and warnings have been
added to help you upgrade.
- Bug Fixes:
0.8.4
- Bug Fixes:
- Setting the port number on the host header.
- Fixing Authorization headers for picky servers
0.8.3
- Bug Fixes:
- Making sure logger is set during SSL connections.
0.8.2
- Bug Fixes:
- Doh! I was accidentally setting headers twice.
0.8.1
- Bug Fixes:
- Fixed problem with nil pointer when logger is set
0.8.0
- New Features:
- Bug Fixes:
- Fixed an infinite loop when content-length and body length don't
match.
- Only setting headers once
- Adding IIS authentication support
0.7.8
- Bug Fixes:
- Fixed bug when receiving a 304 response (HTTPNotModified) on a page not
cached in history.
- 21428 Default to HTML parser for ‘application/xhtml+xml’
content-type.
- Fixed an issue where redirects were resending posted data
0.7.7
- New Features:
- Page#form_with takes a criteria hash.
- Page#form is changed to Page#form_with
- Mechanize#get takes custom
http headers. Thanks Mike Dalessio!
- Form#click_button submits a form defaulting to the current button.
- Form#set_fields now takes a hash. Thanks Tobi!
- Mechanize#redirection_limit=
for setting the max number of redirects.
- Bug Fixes:
- Added more examples. Thanks Robert Jackson.
- 20480 Making sure the Host header is set.
- 20672 Making sure cookies with weird semicolons work.
- Fixed bug with percent signs in urls. d.hatena.ne.jp/kitamomonga/20080410/ruby_mechanize_percent_url_bug
- 21132 Not checking for EOF errors on redirect
- Fixed a weird gzipping error.
- 21233 Smarter multipart boundary. Thanks Todd Willey!
- 20097 Supporting meta tag cookies.
0.7.6
- New Features:
- Added support for reading Mozilla cookie jars. Thanks Chris Riddoch!
- Moving text, password, hidden, int to default. Thanks Tim Harper!
- Mechanize#history_added callback for page loads. Thanks Tobi Reif!
- Mechanize#scheme_handlers
callbacks for handling unsupported schemes on links.
- Bug Fixes:
0.7.5
- Fixed a bug when fetching files and not pages. Thanks Mat Schaffer!
0.7.4
0.7.3
0.7.2
- Handling gzipped responses with no Content-Length header
0.7.1
- Added iPhone to the user agent aliases. [17572]
- Fixed a bug with EOF errors in net/http. [17570]
- Handling 0 length gzipped responses. [17471]
0.7.0
- Removed Ruby 1.8.2 support
- Changed parser to lazily parse links
- Lazily parsing document
- Adding verify_callback for SSL requests. Thanks Mike Dalessio!
- Fixed a bug with Accept-Language header. Thanks Bill Siggelkow.
0.6.11
0.6.10
0.6.9
0.6.8
- Keep alive can be shut off now with WWW::Mechanize#keep_alive
- Conditional requests can be shut off with WWW::Mechanize#conditional_requests
- Monkey patched Net::HTTP#keep_alive?
- [9877] Moved last request time. Thanks Max Stepanov
- Added WWW::Mechanize::File#save
- Defaulting file name to URI or Content-Disposition
- Updating compatibility with hpricot
- Added more unit tests
0.6.7
- Fixed a bug with keep-alive requests
- [9549] fixed problem with cookie paths
0.6.6
- Removing hpricot overrides
- Fixed a bug where alt text can be nil. Thanks Yannick!
- Unparseable expiration dates in cookies are now treated as session cookies
- Caching connections
- Requests now default to keep alive
- [9434] Fixed bug where html entities weren't decoded
- [9150] Updated mechanize history to deal with redirects
0.6.5
0.6.4
0.6.3
0.6.2
0.6.1
0.6.0
- Changed main parser to use hpricot
- Made WWW::Mechanize::Page class searchable like hpricot
- Updated WWW::Mechanize#click to
support hpricot links like this: @agent.click (page/"a").first
- Clicking a Frame is now possible: @agent.click
(page/"frame").first
- Removed deprecated attr_finder
- Removed REXML helper methods since the main parser is now hpricot
- Overhauled cookie parser to use WEBrick::Cookie
0.5.4
0.5.3
0.5.2
- Fixed a bug with input names that are nil
- Added a warning when using attr_finder because attr_finder will be
deprecated in 0.6.0 in favor of method calls. So this syntax:
@agent.links(:text => 'foo')
should be changed to this:
@agent.links.text('foo')
- Added support for selecting multiple options in select tags that support
multiple options. See WWW::Mechanize::MultiSelectList.
- New select list methods have been added, select_all, select_none.
- Options for select lists can now be "clicked" which toggles their
selection, they can be "selected" and "unselected". See
WWW::Mechanize::Option
- Added a method to set multiple fields at the same time,
WWW::Mechanize::Form#set_fields. Which can be used like so:
form.set_fields( :foo => 'bar', :name => 'Aaron' )
0.5.1
- Fixed bug with file uploads
- Added performance tweaks to the cookie class
0.5.0
- Added pluggable parsers. (Thanks to Eric Kolve for the idea)
- Changed namespace so all classes are under WWW::Mechanize.
- Updating Forms so that fields can be used as accessors (Thanks Gregory
Brown)
- Added WWW::Mechanize::File as default object used for unknown content
types.
- Added ‘save_as’ method to Mechanize::File, so any page can
be saved.
- Adding ‘save_as’ and ‘load’ to CookieJar so that
cookies can be saved between sessions.
- Added WWW::Mechanize::FileSaver pluggable parser to automatically save
files.
- Added WWW::Mechanize::Page#title for page titles
- Added OpenSSL certificate support (Thanks Mike Dalessio)
- Removed support for body filters in favor of pluggable parsers.
- Fixed cookie bug adding a '/' when the url is missing one
(Thanks Nick Dainty)
0.4.7
- Fixed bug with no action in forms. Thanks to Adam Wiggins
- Setting a default user-agent string
- Added house cleaning to the cookie jar so expired cookies don't stick
around.
- Added new method WWW::Form#field to find the first field with a given name.
(thanks to Gregory Brown)
- Added WWW::Mechanize#get_file for
fetching non text/html files
0.4.6
- Added support for proxies
- Added a uri field to WWW::Link
- Added an error class WWW::Mechanize::ContentTypeError
- Added image alt text to link text
- Added a visited? method to WWW::Mechanize
- Added Array#value= which will set the first value to the argument. That
allows syntax such as: form.fields.name('q').value = 'xyz'
Before it was like this:
form.fields.name('q').first.value = 'xyz'
0.4.5
- Added support for multiple values of the same name
- Updated build_query_string to take an array of arrays (Thanks Michal
Janeczek)
- Added WWW::Mechanize#body_filter= so that response bodies can be
preprocessed
- Added WWW::Page#body_filter= so that response bodies can be preprocessed
- Added support for more date formats in the cookie parser
- Fixed a bug with empty select lists
- Fixing a problem with cookies not handling no spaces after semicolons
0.4.4
- Fixed error in method signature, basic_authetication is now basic_auth
- Fixed bug with encoding names in file uploads (Big thanks to Alex Young)
- Added options to the select list
0.4.3
- Added syntactic sugar for finding things
- Fixed bug with HttpOnly option in cookies
- Fixed a bug with cookie date parsing
- Defaulted dropdown lists to the first element
- Added unit tests
0.4.2
- Added support for iframes
- Made mechanize dependent on ruby-web rather than narf
- Added unit tests
- Fixed a bunch of warnings
0.4.1
- Added support for file uploading
- Added support for frames (Thanks Gabriel)
- Added more unit tests
- Fixed some bugs
0.4.0
- Added more unit tests
- Added a cookie jar with better cookie support, including expiration of
cookies and general cookie security.
- Updated mechanize to use built in net/http if ruby version is new enough.
- Added support for meta refresh tags
- Defaulted form actions to 'GET'
- Fixed various bugs
- Added more unit tests
- Added a response code exception
- Thanks to Brian Ellin (brianellin@gmail.com) for: Added support for CA
files, and support for 301 response codes