I am wondering if I can write a Haskell program to check for updates of some novels on demand, and the website I am using as an example is this one. I ran into a problem when displaying its contents (on Mac OS X El Capitan). The simple code follows:

    import Network.HTTP

    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    display :: String -> IO ()
    display = (>>= putStrLn) . openURL
Then, when I run

    display "http://www.piaotian.net/html/7/7430/"

in GHCi, some strange characters appear; the first lines look like this:
<title>×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´°È«ÎÄÔĶÁ_Æ®ÌìÎÄѧ</title> <meta http-equiv="content-type" content="text/html; charset=gbk" /> <meta name="keywords" content="×ß½øÐÞÏÉ,×ß½øÐÞÏÉ×îÐÂÕ½Ú,×ß½øÐÞÏÉÎÞµ¯´° Æ®ÌìÎÄѧ" /> <meta name="description" content="Æ®ÌìÎÄÑ§ÍøÌṩ×ß½øÐÞÏÉ×îÐÂÕ½ÚÃâ·ÑÔĶÁ£¬Ç뽫×ß½øÐÞÏÉÕ½ÚĿ¼¼ÓÈëÊղط½±ãÏ´ÎÔĶÁ,Æ®ÌìÎÄѧС˵ÔĶÁÍø¾¡Á¦ÔÚµÚһʱ¼ä¸üÐÂС˵×ß½øÐÞÏÉ£¬Èç·¢ÏÖδ¼°Ê±¸üУ¬ÇëÁªÏµÎÒÃÇ¡£" /> <meta name="copyright" content="×ß½øÐÞÏɰæÈ¨ÊôÓÚ×÷ÕßÎáµÀ³¤²»¹Â" /> <meta name="author" content="ÎáµÀ³¤²»¹Â" /> <link rel="stylesheet" href="/scripts/read/list.css" type="text/css" media="all" /> <script type="text/javascript">
I also tried to download the file as follows:

    import Network.HTTP

    openURL :: String -> IO String
    openURL = (>>= getResponseBody) . simpleHTTP . getRequest

    -- filename is the path to save to (defined elsewhere)
    downloading :: String -> IO ()
    downloading = (>>= writeFile filename) . openURL
But after downloading the file, it looks like the one in the photo:

If I download the page with Python (using urllib, for example), the characters are displayed normally. Also, if I write a Chinese HTML file myself and parse it, there seems to be no problem. So the problem seems to be with the website. However, I don't see any difference between the characters on the site and the ones I write.

Any help on the reason behind this is appreciated.
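P.S.

The Python code is as follows:

    # Python 2; in Python 3 the function is urllib.request.urlretrieve
    import urllib

    # file_path is a placeholder for wherever the page should be saved
    thefic = file_path
    urllib.urlretrieve('http://www.piaotian.net/html/7/7430/', thefic)

And the resulting file is fine.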
Since you said you are interested only in the links, there is no need to convert the GBK encoding to Unicode.
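As a rough illustration of where the strange characters probably come from: the page declares `charset=gbk`, and a `String` obtained from raw bytes without decoding holds one `Char` per byte. The sketch below uses only `base`. The helper names `bytesAsChars` and `charsAsBytes` are made up for this example, and the four bytes are assumed to be the GBK encoding of the first two characters of the title.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper: view each raw byte as the Char with the same
-- code point, which is what an undecoded byte-per-Char String looks like.
bytesAsChars :: [Word8] -> String
bytesAsChars = map (chr . fromIntegral)

-- Hypothetical helper: recover the bytes from such a String.
-- Only lossless while every Char is below 256.
charsAsBytes :: String -> [Word8]
charsAsBytes = map (fromIntegral . ord)

-- Assumption: the first four bytes of the <title>, as served in GBK.
titleBytes :: [Word8]
titleBytes = [0xD7, 0xDF, 0xBD, 0xF8]
```

In GHCi, `putStrLn (bytesAsChars titleBytes)` shows the same four characters that start the garbled title, and `charsAsBytes (bytesAsChars titleBytes) == titleBytes` holds, so the original bytes are still recoverable from such a `String`.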
Here is a version which prints out all links like "123456.html" in the document:
    #!/usr/bin/env stack
    {- stack
      --resolver lts-6.0 --install-ghc runghc
      --package wreq --package lens
      --package tagsoup
    -}

    {-# LANGUAGE OverloadedStrings #-}

    import Network.Wreq
    import qualified Data.ByteString.Lazy.Char8 as LBS
    import Control.Lens
    import Text.HTML.TagSoup
    import Data.Char
    import Control.Monad

    -- match \d+\.html
    isNumberHtml lbs = (LBS.dropWhile isDigit lbs) == ".html"

    -- keep only <a> tags whose href looks like "123456.html"
    wanted t = isTagOpenName "a" t && isNumberHtml (fromAttrib "href" t)

    main = do
      r <- get "http://www.piaotian.net/html/7/7430/"
      let body = r ^. responseBody :: LBS.ByteString
          tags = parseTags body
          links = filter wanted tags
          hrefs = map (fromAttrib "href") links
      forM_ hrefs LBS.putStrLn
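One small detail of the check above: `dropWhile isDigit` also accepts a bare ".html", because dropping zero digits still leaves ".html". If at least one digit should be required, a stricter predicate could look like the sketch below. It works on plain `String` and uses only `base` so that it can be tried without extra packages. The name `isNumberHtmlStrict` and the sample hrefs are made up for illustration.

```haskell
import Data.Char (isDigit)

-- True for names such as "123456.html": one or more digits, then ".html".
isNumberHtmlStrict :: String -> Bool
isNumberHtmlStrict s =
  let (digits, rest) = span isDigit s   -- split off the leading digits
  in not (null digits) && rest == ".html"

-- Hypothetical hrefs, only to exercise the predicate.
sampleHrefs :: [String]
sampleHrefs = ["4292346.html", ".html", "index.html", "/html/7/7430/"]
```

In GHCi, `filter isNumberHtmlStrict sampleHrefs` keeps only `"4292346.html"`. The same `span`-based idea should carry over to the lazy `ByteString` version by using `LBS.span` in place of `span`.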