I am using bs4 to scrape text. The current text output has 7 different fields that I want to put into 7 different lists. My code follows:
    from bs4 import BeautifulSoup
    import requests

    urlyears = ['2012']
    for year in urlyears:
        # Fetch and parse the draft page for this year
        soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/" + year + "_NFL_Draft").content, "html.parser")
        table = soup.select_one("table.wikitable.sortable")
        # "tr + tr" matches every row that follows another row, i.e. it skips the header row
        for row in table.select("tr + tr"):
            tds = row.text
            print(tds)
The printed output looks like this:
    7^ 252 st. louis rams richardson, daryldaryl richardson rb abilene christian lone star
    7^ 253 indianapolis colts harnish, chandlerchandler harnish qb niu mac
How can I create a list for each of these? The ultimate goal is to export to CSV.
A trivial approach: split() the text at newlines?
    import os
    from bs4 import BeautifulSoup
    import requests

    soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/2012_NFL_Draft").content, "html.parser")
    table = soup.select_one("table.wikitable.sortable")
    for row in table.select("tr + tr"):
        # Split each row's text into its fields at line breaks
        tds = row.text.split(os.linesep)
        print(tds)
yields
    [u'', u'', u'1', u'1', u'indianapolis colts', u'luck, andrewandrew luck\xa0\u2020', u'qb', u'stanford', u'pac-12', u'', u'']
    [u'', u'', u'1', u'2', u'washington redskins', u'griffin iii, robertrobert griffin iii\xa0\u2020', u'qb', u'baylor', u'big 12', u'from st. louis\xa0[r1 - 1];', u'2011 heisman trophy winner\xa0[n 2]', u'']
    [u'', u'', u'1', u'3', u'cleveland browns', u'richardson, trenttrent richardson\xa0', u'rb', u'alabama', u'sec', u'from minnesota\xa0[r1 - 2]', u'']
    [u'', u'', u'1', u'4', u'minnesota vikings', u'kalil, mattmatt kalil\xa0\u2020', u'ot', u'usc', u'pac-12', u'from cleveland\xa0[r1 - 3]', u'']
    [u'', u'', u'1', u'5', u'jacksonville jaguars', u'blackmon, justinjustin blackmon\xa0', u'wr', u'oklahoma state', u'big 12', u'from tampa bay\xa0[r1 - 4]', u'']
    ...
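The lists above still carry empty strings at both ends and a varying number of trailing note fields, so they need some cleanup before they can become seven lists. Here is a minimal sketch of one way to do that. The sample rows are hand-typed stand-ins shaped like the output above (nothing is fetched), and `clean_row`, `N_FIELDS`, and the assumed field order (round, pick, team, player, position, college, conference) are my own illustrative choices, not part of the original code.

```python
# Hypothetical sample rows, shaped like the split output above (not fetched live).
raw_rows = [
    ['', '', '1', '1', 'Indianapolis Colts', 'Luck, AndrewAndrew Luck',
     'QB', 'Stanford', 'Pac-12', '', ''],
    ['', '', '1', '2', 'Washington Redskins', 'Griffin III, RobertRobert Griffin III',
     'QB', 'Baylor', 'Big 12', 'from St. Louis', ''],
]

# Assumed field order: round, pick, team, player, position, college, conference.
N_FIELDS = 7


def clean_row(parts, n_fields=N_FIELDS):
    """Drop empty strings, keep the first n_fields values, pad short rows."""
    values = [p.strip() for p in parts if p.strip()]
    values = values[:n_fields]  # discard trailing notes
    return values + [''] * (n_fields - len(values))


rows = [clean_row(r) for r in raw_rows]

# Transpose the rows so that each field gets its own list.
columns = [list(col) for col in zip(*rows)]

print(columns[2])  # the team column
```

One caveat: because empty strings are dropped, a row with a genuinely empty middle field would shift its later values to the left. If that can happen in your data, reading the individual cells of each row is likely safer than splitting the row text.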
HTH, dtk
Edit: you can also use .splitlines() to have Python handle newlines correctly. That saves the os import as well.
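To illustrate the .splitlines() variant together with the CSV export the question asks about, here is a small sketch. The `row_text` string is a hypothetical stand-in for one `row.text` value, and the header names are assumptions on my part. It writes to an in-memory buffer so it runs without touching the disk; for a real file you would pass something like `open("draft.csv", "w", newline="")` to `csv.writer` instead.

```python
import csv
import io

# Hypothetical stand-in for row.text from one table row (not fetched live).
row_text = ("\n\n7\n252\nSt. Louis Rams\nRichardson, DarylDaryl Richardson\n"
            "RB\nAbilene Christian\nLone Star\n\n")

# splitlines() treats \n, \r\n and \r alike, so no os import is needed.
# The filter drops the empty strings left by blank lines.
fields = [part for part in row_text.splitlines() if part]

# Assumed column names; adjust them to match the real table header.
header = ["round", "pick", "team", "player", "position", "college", "conference"]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(fields)  # the csv module quotes the player field because it contains a comma

csv_text = buffer.getvalue()
print(csv_text)
```

A side benefit: `os.linesep` is `\r\n` on Windows, while the text in a parsed page usually contains plain `\n`, so `split(os.linesep)` may not split at all there. `.splitlines()` avoids that problem.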