I have been making a web scraper for a website and want to extract the node numbers from an HTML table using .findall or whatever works, but I am struggling with it and getting errors from not putting in the right tags.
Can anyone help? The HTML code is as follows:
</div> <table class="datatable" cellpadding="5" cellspacing="0" rules="all" border="1" id="ctl00_contentplaceholder1_dgnodes" style="border-collapse:collapse;"> <tr class="header nobreak"> <td> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl00','')">node name</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl01','')">description</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl02','')">mac address</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl03','')"></a> <a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$linoderoleheader','')" id="ctl00_contentplaceholder1_dgnodes_ctl00_linoderoleheader">node role</a> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl04','')">firmware</a></td><td> <a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$lbuptimeheader','')" id="ctl00_contentplaceholder1_dgnodes_ctl00_lbuptimeheader">uptime</a> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl05','')">users</a></td> </tr><tr onmouseover="this.classname = 'highlightedrow';" onmouseout="this.classname = 'normalrow';" onclick="gotonodepage('522');" style="height:18px;">
I need to extract the number 522 on the last line of that code, along with the other gotonodepage numbers, and I can't figure out how. Any help is appreciated. I want to put the extracted numbers in a list for later use.
r2 = s2.get(webpage)
bsobjswap = BeautifulSoup(r2.content)
listy = []
for link in bsobjswap.findall('tr'):
    if 'onclick' in link.attrs:
        listy.append(link)
print(listy)
Error:
    for link in bsobjswap.findall('tr'):
TypeError: 'NoneType' object is not callable
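If your real code calls lowercase findall, that is the likely cause of the TypeError: bs4 treats an unknown attribute such as soup.findall as a search for a <findall> tag, finds nothing and returns None, which you then try to call. The method is find_all (findAll is the older alias). Try this: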
from bs4 import BeautifulSoup

xml = """<table class="datatable" cellpadding="5" cellspacing="0" rules="all" border="1" id="ctl00_contentplaceholder1_dgnodes" style="border-collapse:collapse;"> <tr class="header nobreak"> <td> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl00','')">node name</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl01','')">description</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl02','')">mac address</a></td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl03','')"></a> <a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$linoderoleheader','')" id="ctl00_contentplaceholder1_dgnodes_ctl00_linoderoleheader">node role</a> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl04','')">firmware</a></td><td> <a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$lbuptimeheader','')" id="ctl00_contentplaceholder1_dgnodes_ctl00_lbuptimeheader">uptime</a> </td><td><a href="javascript:__dopostback('ctl00$contentplaceholder1$dgnodes$ctl00$ctl05','')">users</a></td> </tr><tr onmouseover="this.classname = 'highlightedrow';" onmouseout="this.classname = 'normalrow';" onclick="gotonodepage('522');" style="height:18px;">"""

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(xml, "html.parser")

# attrs={'onclick': True} keeps only the <tr> rows that have an onclick attribute.
print([i.get('onclick') for i in soup.find_all('tr', attrs={'onclick': True})])
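This returns ["gotonodepage('522');"]
From here you can extract the number with a regex, for example:
import re

# \d+ matches each run of digits inside the onclick string.
print([re.findall(r"\d+", i.get('onclick')) for i in soup.find_all('tr', attrs={'onclick': True})])
This returns [['522']]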
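Since you want the numbers in a plain list for later use, here is a rough sketch of flattening the result into integers. It only assumes you already have the onclick strings from the step above. The helper name node_ids and the second sample value ('7') are made up for illustration, and the pattern assumes every call looks like gotonodepage('<digits>').

```python
import re

# Hypothetical helper: pulls the numeric argument out of each
# gotonodepage('...') call and returns the numbers as a flat list of ints.
def node_ids(onclick_values):
    # IGNORECASE because the real page may spell the function name in mixed case.
    pattern = re.compile(r"gotonodepage\('(\d+)'\)", re.IGNORECASE)
    ids = []
    for value in onclick_values:
        match = pattern.search(value)
        if match:  # skip rows whose onclick does something else
            ids.append(int(match.group(1)))
    return ids

# The first value is from the question; the second is an invented example.
onclick_values = ["gotonodepage('522');", "gotonodepage('7');"]
print(node_ids(onclick_values))
```

Matching the function name instead of a bare \d+ avoids picking up unrelated digits if some other onclick handler happens to contain a number.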