trying achieve following logic:
if url in text surrounded paragraph tags (example: <p>url</p>
), replace in place become link instead: <a href="url">click here</a>
the original file database dump (sql, utf-8). urls exist in desired format. need fix missing links.
i working on script, uses beautifulsoup. if other solutions make more sense (regex, etc.), open suggestions.
you can search p
elements has text starting http
. then, replace with link:
for elm in soup.find_all("p", text=lambda text: text , text.startswith("http")): elm.replace_with(soup.new_tag("a", href=elm.get_text()))
example working code:
from bs4 import beautifulsoup data = """ <div> <p>http://google.com</p> <p>https://stackoverflow.com</p> </div> """ soup = beautifulsoup(data, "html.parser") elm in soup.find_all("p", text=lambda text: text , text.startswith("http")): elm.replace_with(soup.new_tag("a", href=elm.get_text())) print(soup.prettify())
prints:
<div> <a href="http://google.com"></a> <a href="https://stackoverflow.com"></a> </div>
i can imagine approach break, should start you.
if additionally want add texts links, set .string
property:
soup = beautifulsoup(data, "html.parser") elm in soup.find_all("p", text=lambda text: text , text.startswith("http")): = soup.new_tag("a", href=elm.get_text()) a.string = "link" elm.replace_with(a)
Comments
Post a Comment