html - Why can Haskell not handle characters from a specific website?

I was wondering whether I could write a Haskell program that checks for updates of some novels on demand, using a particular novel website as an example. However, I ran into a problem when displaying its contents (on Mac OS X El Capitan). The simple code is as follows:

    import Network.HTTP

    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    display :: String -> IO ()
    display = (>>= putStrLn) . openURL

Then, when I run `display "http://www.piaotian.net/html/7/7430/"` in GHCi, strange characters appear. The first lines are these:

    <title>×ß½øÐÞÏÉ×îÐÂÕÂ½Ú,×ß½øÐÞÏÉÎÞµ¯´°È«ÎÄÔÄ¶Á_Æ®ÌìÎÄÑ§</title>
    <meta http-equiv="content-type" content="text/html; charset=gbk" />
    <meta name="keywords" content="×ß½øÐÞÏÉ,×ß½øÐÞÏÉ×îÐÂÕÂ½Ú,×ß½øÐÞÏÉÎÞµ¯´° Æ®ÌìÎÄÑ§" />
    <meta name="description" content="Æ®ÌìÎÄÑ§ÍøÌá¹©×ß½øÐÞÏÉ×îÐÂÕÂ½ÚÃâ·ÑÔÄ¶Á£¬Çë½«×ß½øÐÞÏÉÕÂ½ÚÄ¿Â¼¼ÓÈëÊÕ²Ø·½±ãÏÂ´ÎÔÄ¶Á,Æ®ÌìÎÄÑ§Ð¡ËµÔÄ¶ÁÍø¾¡Á¦ÔÚµÚÒ»Ê±¼ä¸üÐÂÐ¡Ëµ×ß½øÐÞÏÉ£¬Èç·¢ÏÖÎ´¼°Ê±¸üÐÂ£¬ÇëÁªÏµÎÒÃÇ¡£" />
    <meta name="copyright" content="×ß½øÐÞÏÉ°æÈ¨ÊôÓÚ×÷ÕßÎáµÀ³¤²»¹Â" />
    <meta name="author" content="ÎáµÀ³¤²»¹Â" />
    <link rel="stylesheet" href="/scripts/read/list.css" type="text/css" media="all" />
    <script type="text/javascript">
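The page declares `charset=gbk` in its own meta tag, and every garbled character in the title lies in the Latin-1 range (code point below 256). That pattern fits GBK bytes being shown one byte per `Char`. A small base-only check of that interpretation, where `charsToBytes` and `garbled` are hypothetical names used only for illustration:

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Hypothetical helper: read a String whose Chars are all below 256
-- (one raw byte per Char) back as the list of bytes it came from.
charsToBytes :: String -> [Word8]
charsToBytes = map (fromIntegral . ord)

-- The first eight characters of the garbled <title> shown above.
garbled :: String
garbled = "×ß½øÐÞÏÉ"

main :: IO ()
main = putStrLn (unwords [showHex b "" | b <- charsToBytes garbled])
-- prints: d7 df bd f8 d0 de cf c9
```

Under that reading nothing has been lost: a byte pair such as 0xD7 0xDF is a single GBK-encoded Chinese character (走), so the text needs decoding rather than re-downloading.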

I tried to download the file as follows:

    import Network.HTTP

    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    downloading :: String -> IO ()
    downloading = (>>= writeFile filename) . openURL

But after downloading, the file looks like the one in this photo:
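One plausible reason the saved file is also broken, assuming `openURL` hands back one raw byte per `Char` (as the garbled output suggests): `writeFile` encodes every `Char` with the locale encoding, which is UTF-8 on a default macOS setup, so the byte 0xD7 held as `'\xD7'` is written as the two bytes 0xC3 0x97 and the file stops being valid GBK. A minimal sketch of a byte-preserving alternative, using the `bytestring` package that ships with GHC; `toRawBytes` and `writeRawFile` are hypothetical names:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: keep the low 8 bits of each Char. For a String that
-- holds one raw byte per Char this is lossless; any Char above '\255'
-- would be silently truncated.
toRawBytes :: String -> B.ByteString
toRawBytes = BC.pack

-- Hypothetical drop-in for writeFile: writes the bytes unchanged instead of
-- re-encoding each Char with the locale encoding.
writeRawFile :: FilePath -> String -> IO ()
writeRawFile path = B.writeFile path . toRawBytes

main :: IO ()
main =
  -- '\xD7' and '\xDF' stand for the first two GBK bytes of the page title.
  print (B.unpack (toRawBytes "\xD7\xDF"))  -- prints [215,223]
```

With such a helper, the downloader above would read `downloading = (>>= writeRawFile filename) . openURL`, and the saved file should then hold the same bytes the server sent.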

If I download the page with Python (using urllib, for example), the characters are displayed normally. Also, if I write a Chinese HTML file myself and parse it, there seems to be no problem. So the problem seems to be specific to this website. However, I don't see any difference between the characters on the site and the ones I write.
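Python's `urlretrieve` does no decoding; it saves the bytes as they arrive, and whatever opens the file can then honour the page's `charset=gbk`. To display the text from Haskell, the GBK bytes would have to be decoded first. One option, sketched here under assumptions, is GHC's own `mkTextEncoding`, which on POSIX systems delegates to iconv; whether a `"GBK"` encoding is available depends on the system's iconv. `decodeGBK` is a hypothetical helper:

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as F
import System.IO (hSetEncoding, mkTextEncoding, stdout, utf8)

-- Hypothetical helper: decode raw GBK bytes into a Unicode String.
-- Assumes the platform's iconv knows an encoding named "GBK".
decodeGBK :: B.ByteString -> IO String
decodeGBK bytes = do
  gbk <- mkTextEncoding "GBK"
  B.useAsCStringLen bytes (F.peekCStringLen gbk)

main :: IO ()
main = do
  -- Print as UTF-8 regardless of the terminal's locale setting.
  hSetEncoding stdout utf8
  -- The first eight bytes of the page's <title>, taken from the garbled output.
  title <- decodeGBK (B.pack [0xD7, 0xDF, 0xBD, 0xF8, 0xD0, 0xDE, 0xCF, 0xC9])
  putStrLn title
```

The design choice here is to keep the network code byte-oriented and decode only at the point where text is shown; files written to disk can stay in the site's original encoding.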

Any help understanding the reason behind this would be appreciated.

P.S.
The Python code is as follows:

    import urllib

    thefic = file_path  # local path to save the page to
    urllib.urlretrieve('http://www.piaotian.net/html/7/7430/', thefic)

and the resulting file is fine.

Since you said you are interested in the links, there is no need to convert the GBK encoding to Unicode.

Here is a version that prints out all of the links like "123456.html" in the document:

    #!/usr/bin/env stack
    {- stack
      --resolver lts-6.0 --install-ghc runghc
      --package wreq --package lens
      --package tagsoup
    -}

    {-# LANGUAGE OverloadedStrings #-}

    import Network.Wreq
    import qualified Data.ByteString.Lazy.Char8 as LBS
    import Control.Lens
    import Text.HTML.TagSoup
    import Data.Char
    import Control.Monad

    -- match \d+\.html
    isNumberHtml lbs = (LBS.dropWhile isDigit lbs) == ".html"

    -- an <a> tag whose href looks like "123456.html"
    wanted t = isTagOpenName "a" t && isNumberHtml (fromAttrib "href" t)

    main = do
      r <- get "http://www.piaotian.net/html/7/7430/"
      -- keep the body as raw bytes; the hrefs are plain ASCII
      let body = r ^. responseBody :: LBS.ByteString
          tags = parseTags body
          links = filter wanted tags
          hrefs = map (fromAttrib "href") links
      forM_ hrefs LBS.putStrLn
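The comment in the script says `\d+\.html`, but `LBS.dropWhile isDigit` also lets a bare ".html" (no digits at all) through. If that matters, a stricter predicate could replace `isNumberHtml`; this is a sketch, and `isStrictNumberHtml` is a hypothetical name:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as LBS
import Data.Char (isDigit)

-- Hypothetical stricter variant of isNumberHtml: requires at least one
-- leading digit before ".html", matching \d+\.html exactly.
isStrictNumberHtml :: LBS.ByteString -> Bool
isStrictNumberHtml s =
  let (digits, rest) = LBS.span isDigit s
  in not (LBS.null digits) && rest == ".html"

main :: IO ()
main = mapM_ (print . isStrictNumberHtml) ["123456.html", ".html", "index.html"]
-- prints True, False, False (one per line)
```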
