Why can Haskell not handle characters from a specific website?


I was wondering if I can write a Haskell program to check for updates of some novels on demand, and the website I am using as an example is this one. I ran into a problem when displaying its contents (on Mac OS X El Capitan). The simple code is as follows:

    import Network.HTTP

    -- Fetch the page body as a String
    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    -- Print the page body to the terminal
    display :: String -> IO ()
    display = (>>= putStrLn) . openURL

Then, when I run display "http://www.piaotian.net/html/7/7430/" in GHCi, some strange characters appear; the first lines look like this:

<title>×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´°È«ÎÄÔĶÁ_Æ®ÌìÎÄѧ</title> <meta http-equiv="content-type" content="text/html; charset=gbk" /> <meta name="keywords" content="×ß½øÐÞÏÉ,×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´° Æ®ÌìÎÄѧ" /> <meta name="description" content="Æ®ÌìÎÄÑ§ÍøÌṩ×ß½øÐÞÏÉ×îÐÂÕ½ÚÃâ·ÑÔĶÁ£¬Ç뽫×ß½øÐÞÏÉÕ½ÚĿ¼¼ÓÈëÊղط½±ãÏ´ÎÔĶÁ,Æ®ÌìÎÄѧС˵ÔĶÁÍø¾¡Á¦ÔÚµÚһʱ¼ä¸üÐÂС˵×ß½øÐÞÏÉ£¬Èç·¢ÏÖδ¼°Ê±¸üУ¬ÇëÁªÏµÎÒÃÇ¡£" /> <meta name="copyright" content="×ß½øÐÞÏɰæÈ¨ÊôÓÚ×÷ÕßÎáµÀ³¤²»¹Â" /> <meta name="author" content="ÎáµÀ³¤²»¹Â" /> <link rel="stylesheet" href="/scripts/read/list.css" type="text/css" media="all" /> <script type="text/javascript"> 

I also tried to download the page to a file, as follows:

    import Network.HTTP

    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    -- fileName is the path of the output file (defined elsewhere)
    downloading :: String -> IO ()
    downloading = (>>= writeFile fileName) . openURL

But after downloading, the file shows the same kind of garbled characters (the original post included a screenshot of it).

If I download the page with Python (using urllib, for example), the characters are displayed normally. Also, if I write a Chinese HTML file myself and parse it, there seems to be no problem. So the problem seems to be specific to this website. However, I don't see any difference between the characters on the site and the ones I write.

Any help on the reason behind this is appreciated.

P.S.
The Python code is as follows:

    import urllib  # Python 2 API

    thefic = file_path  # file_path: where to save the page
    urllib.urlretrieve('http://www.piaotian.net/html/7/7430/', thefic)

And the resulting file is all fine and good.

Since you said you are only interested in the links, there is no need to convert the GBK encoding to Unicode.

Here is a version which prints out all links of the form "123456.html" in the document:

    #!/usr/bin/env stack
    {- stack
      --resolver lts-6.0 --install-ghc runghc
      --package wreq --package lens
      --package tagsoup
    -}

    {-# LANGUAGE OverloadedStrings #-}

    import Network.Wreq
    import qualified Data.ByteString.Lazy.Char8 as LBS
    import Control.Lens
    import Text.HTML.TagSoup
    import Data.Char
    import Control.Monad

    -- match \d+\.html
    isNumberHtml lbs = (LBS.dropWhile isDigit lbs) == ".html"

    -- keep only <a> tags whose href looks like "123456.html"
    wanted t = isTagOpenName "a" t && isNumberHtml (fromAttrib "href" t)

    main = do
      r <- get "http://www.piaotian.net/html/7/7430/"
      -- the body stays a ByteString, so the GBK bytes are never decoded
      let body = r ^. responseBody :: LBS.ByteString
          tags = parseTags body
          links = filter wanted tags
          hrefs = map (fromAttrib "href") links
      forM_ hrefs LBS.putStrLn
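As for why the characters looked garbled in the first place: the page declares charset=gbk, and as far as I can tell the String body returned by Network.HTTP is not decoded at all, so each byte of the response simply becomes one Char. Here is a minimal sketch of that effect. It does no networking; titleBytes is a hypothetical name holding what I assume are the GBK bytes of the first four characters of the page title, and asLatin1 is a helper name made up for this illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

-- Assumed GBK bytes for the first four characters of the page title
-- (hypothetical sample data, not fetched from the site).
titleBytes :: BS.ByteString
titleBytes = BS.pack [0xD7, 0xDF, 0xBD, 0xF8, 0xD0, 0xDE, 0xCF, 0xC9]

-- Turn every byte into one Char (code points 0..255), with no GBK
-- decoding. This mimics reading a GBK page into a plain String.
asLatin1 :: BS.ByteString -> String
asLatin1 = BC.unpack

main :: IO ()
main = do
  -- On a UTF-8 terminal this should show the same kind of characters
  -- as the start of the garbled <title> in the question.
  putStrLn (asLatin1 titleBytes)
  -- Eight bytes became eight Chars instead of four Chinese characters.
  print (BS.length titleBytes, length (asLatin1 titleBytes))
```

Python's urlretrieve probably worked because it saves the raw bytes untouched, leaving the decoding to whatever program opens the file. Keeping the body as a ByteString, as the wreq version does, has the same effect.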
