I'm trying to achieve the following logic:
If a URL in the text is surrounded by paragraph tags (example: <p>url</p>), replace it in place so that it becomes a link instead: <a href="url">click here</a>
The original file is a database dump (SQL, UTF-8). Some URLs already exist in the desired format. I need to fix the missing links.
I'm working on a script that uses BeautifulSoup. If other solutions make more sense (regex, etc.), I'm open to suggestions.
You can search for all p elements whose text starts with http. Then, replace each one with a link:

    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        elm.replace_with(soup.new_tag("a", href=elm.get_text()))

Example working code:

    from bs4 import BeautifulSoup

    data = """
    <div>
        <p>http://google.com</p>
        <p>https://stackoverflow.com</p>
    </div>
    """

    soup = BeautifulSoup(data, "html.parser")
    # Match only <p> tags whose text exists and starts with "http"
    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        # Swap the whole <p> element for an <a> pointing at the URL
        elm.replace_with(soup.new_tag("a", href=elm.get_text()))

    print(soup.prettify())

Prints:

    <div>
     <a href="http://google.com"></a>
     <a href="https://stackoverflow.com"></a>
    </div>

I can imagine ways this approach could break, but it should be a good start for you.
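One way the `startswith("http")` check could break is on paragraphs with whitespace around the URL, or on ordinary sentences that happen to begin with "http". A minimal sketch of a stricter predicate follows, assuming only `http` and `https` URLs should be linked. The name `is_bare_url` is a hypothetical helper, not part of BeautifulSoup.

```python
from urllib.parse import urlparse


def is_bare_url(text):
    """Return True if text is exactly one http(s) URL, ignoring outer whitespace."""
    if not text:
        # The filter may receive None for tags that do not hold a single string
        return False
    candidate = text.strip()
    if len(candidate.split()) != 1:
        # Empty after stripping, or more words than just the URL
        return False
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Quick check against a few representative paragraph texts
for sample in ["http://google.com", "  https://stackoverflow.com\n",
               "http is a protocol", "httpfoo", None]:
    print(repr(sample), is_bare_url(sample))
```

With BeautifulSoup, a function like this could be passed as the filter, for example `soup.find_all("p", string=is_bare_url)` (`string` is the newer name for the `text` argument), with the URL then taken from `elm.get_text().strip()`.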
If you additionally want to add text to the links, set the .string property:

    soup = BeautifulSoup(data, "html.parser")
    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        a = soup.new_tag("a", href=elm.get_text())
        a.string = "link"  # visible link text
        elm.replace_with(a)
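Since the input is a SQL dump rather than an HTML document, a regex substitution over the raw text may also be worth considering, because it leaves everything outside the matched paragraphs untouched. Here is a rough sketch, assuming the URL sits alone inside a `<p>` tag that has no attributes, and using the "click here" label from the question. `P_URL` and `link_bare_urls` are hypothetical names, and the sample `INSERT` rows are made up for illustration.

```python
import re

# <p>, optional whitespace, one http(s) URL with no spaces, quotes or
# angle brackets, optional whitespace, </p>
P_URL = re.compile(r"<p>\s*(https?://[^\s<>\"']+)\s*</p>", re.IGNORECASE)


def link_bare_urls(text, label="click here"):
    """Replace <p>URL</p> with <a href="URL">label</a>; leave other text as-is."""
    return P_URL.sub(
        lambda m: '<a href="{}">{}</a>'.format(m.group(1), label),
        text,
    )


# Illustrative rows only; a real dump will look different
dump = (
    "INSERT INTO posts VALUES (1, '<p>http://google.com</p>');\n"
    "INSERT INTO posts VALUES (2, '<p>See <a href=\"https://stackoverflow.com\">"
    "click here</a></p>');\n"
)
print(link_bare_urls(dump))
```

The second row is left alone because its paragraph does not consist of a bare URL. Before running something like this on the real dump, it is worth checking how the existing links are quoted there (for example escaped quotes inside SQL string literals) and matching that style in the replacement.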