Thursday, October 25, 2012

Sanitizing evil HTML tags with Python: Bleach is the best library!

Google search: "html sanitizer for python"

>>> import bleach
>>> print(bleach.linkify('an http://example.com url'))
an <a href="http://example.com" rel="nofollow">http://example.com</a> url
>>> print(bleach.linkify('<a href="http://example.com" rel="nofollow">http://example.com</a>'))
<a href="http://example.com" rel="nofollow">http://example.com</a>
>>> print(bleach.linkify('an <a href="http://example.com" rel="nofollow">http://example.com</a> url'))
an <a href="http://example.com" rel="nofollow">http://example.com</a> url


Bleach is smart enough not to "double-linkify" a URL that is already inside a link.
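One way to get that behavior, sketched here in Python 3 with only the standard library, is to parse the markup, track whether the parser is currently inside an `<a>` element, and linkify only text outside one. This is a minimal sketch, not Bleach's implementation: `NaiveLinkifier`, `naive_linkify` and `URL_RE` are hypothetical names, and the pattern is an assumption that only catches `http://` and `https://` URLs.

```python
import re
from html.parser import HTMLParser

# Deliberately naive URL pattern (an assumption for this sketch):
# only http:// and https:// URLs, ending at whitespace, '<', '>' or '"'.
URL_RE = re.compile(r'https?://[^\s<>"]+')


class NaiveLinkifier(HTMLParser):
    """Wrap bare URLs in <a> tags, but leave text inside an existing <a> alone."""

    def __init__(self):
        # Keep entity references as written so escaped text stays escaped.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.anchor_depth = 0  # greater than 0 while inside <a>...</a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # re-emit the tag verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'a' and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if self.anchor_depth:
            self.out.append(data)  # already a link: no "double linkify"
            return
        self.out.append(URL_RE.sub(
            lambda m: f'<a href="{m.group(0)}" rel="nofollow">{m.group(0)}</a>',
            data))

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')


def naive_linkify(markup):
    parser = NaiveLinkifier()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.out)
```

Fed the three inputs from the session above, it returns the same strings Bleach printed. It ignores much of what a real linkifier has to handle, such as trailing punctuation after a URL or an `&amp;` inside one.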

https://github.com/jsocol/bleach
http://bleach.readthedocs.org/en/latest/goals.html
http://coffeeonthekeyboard.com/bleach-html-sanitizer-and-auto-linker-for-django-344/

Basic Use

The simplest way to use Bleach is:
>>> import bleach

>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'

>>> bleach.linkify('an http://example.com url')
u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'

NB: Bleach always returns a unicode object, whether you give it a bytestring or a unicode object. It does not attempt to detect incoming character encodings and assumes UTF-8, so if you are using a different character encoding, convert the bytestring to unicode before passing the text to Bleach.
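The `clean()` call above escapes the `<script>` tag instead of deleting it, and the NB asks for decoded text rather than bytes. The Python 3 sketch below combines both ideas using only the standard library: decode bytes with an explicit encoding, keep a small allowlist of tags, and escape every other tag. `ALLOWED_TAGS`, `EscapingSanitizer` and `sanitize` are hypothetical names for illustration; Bleach itself is built on html5lib, not `html.parser`, and this sketch drops all attributes to stay short.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for this sketch only.
ALLOWED_TAGS = {'b', 'em', 'i', 'strong'}


class EscapingSanitizer(HTMLParser):
    """Keep allowlisted tags (minus their attributes); escape every other tag.

    HTML comments are dropped, because HTMLParser's default
    handle_comment does nothing.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # text arrives with entities decoded
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f'<{tag}>')  # attributes are dropped, not allowlisted
        else:
            self.out.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # No self-closing tag is allowlisted here, so escape it as text.
        self.out.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f'</{tag}>')
        else:
            self.out.append(escape(f'</{tag}>'))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape decoded text


def sanitize(markup, encoding='utf-8'):
    """Decode bytes with an explicit encoding (no guessing), then sanitize."""
    if isinstance(markup, bytes):
        markup = markup.decode(encoding)
    parser = EscapingSanitizer()
    parser.feed(markup)
    parser.close()
    return ''.join(parser.out)
```

On the `clean()` input above it produces the same escaped string Bleach printed. Dropping every attribute is a shortcut; Bleach's `clean()` instead lets you allowlist tags and attributes through its `tags` and `attributes` arguments.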