I'm trying to scrape a site that generates its data via JavaScript. I've done enough reading on here to know that the way to scrape these is to:
- watch the Network tab in Firebug to see what happens when I make a request
- isolate the XHR requests and recreate them in a script.
So, when I do step 1, I see a POST request sent to the link visible in the screenshot, and I can see the response it gets. Looks great, right?
But when I try to recreate the request and response in Python, using the payload I see under the POST tab in Firebug, like so:
    import requests
    from bs4 import BeautifulSoup

    payload = {"max": 999, "rectcoord": "89,-179,-89,179", "source": "",
               "sortfield": "newid()", "officename": "", "firstname": "",
               "lastname": "da", "cityname": "", "zipcode": "", "category": "s",
               "seclanguagereq": "", "officecode": ""}

    # Passing a dict to data= sends it form-encoded, not as JSON.
    r = requests.post('http://search.cnyrealtor.com/myajaxservice.asmx/membersearch', data=payload)
    print(r.content)
I get a page that displays this error message: Request format is unrecognized for URL unexpectedly ending in '/membersearch'
So, my question is: why am I getting this response when the request in Firebug works fine? What am I missing in the requests.post(url) line in my Python script?
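You need to dump the dictionary to JSON and send that as the payload. When you pass a dict to data=, requests form-encodes it, which this endpoint apparently does not accept. It is also important to set the Content-Type request header: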
    import json
    import requests

    payload = {"max": 999, "rectcoord": "89,-179,-89,179", "source": "",
               "sortfield": "newid()", "officename": "", "firstname": "",
               "lastname": "", "cityname": "", "zipcode": "", "category": "s",
               "seclanguagereq": "", "officecode": ""}

    with requests.Session() as session:
        # Load the search page first, presumably so the session picks up its cookies.
        session.get("http://search.cnyrealtor.com/sitecontent/syr/membersearchsyr.aspx")
        # Send the payload as a JSON string and declare it as JSON.
        r = session.post('http://search.cnyrealtor.com/myajaxservice.asmx/membersearch',
                         data=json.dumps(payload),
                         headers={"Content-Type": "application/json; charset=utf-8"})
        print(r.content)
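To see the difference between the two request bodies without hitting the site, here is a small standard-library sketch. It uses a trimmed-down, two-field version of the payload (an assumption for brevity), and `urlencode` only approximates what requests does with a dict passed to `data=`:

```python
import json
from urllib.parse import urlencode

# Trimmed-down version of the payload from the question (two fields only).
payload = {"max": 999, "lastname": "da"}

# Roughly what requests sends for data=payload: a form-encoded string,
# labelled application/x-www-form-urlencoded.
form_body = urlencode(payload)

# What the code above sends instead: the same dict serialized as JSON text.
json_body = json.dumps(payload)

print(form_body)  # max=999&lastname=da
print(json_body)  # {"max": 999, "lastname": "da"}
```

Recent versions of requests also accept `json=payload` in place of `data=json.dumps(payload)`. That serializes the dict and sets a JSON Content-Type header for you, though I have not checked whether this particular service needs the explicit charset.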