python - Anchors (<a href="URL">URL</a>) instead of text (<p>URL</p>)


I am trying to achieve the following logic:

If a URL in the text is surrounded by paragraph tags (example: <p>URL</p>), replace it in place so that it becomes a link instead: <a href="URL">click here</a>

The original file is a database dump (SQL, UTF-8). Some URLs already exist in the desired format. I need to fix the missing links.

I am working on a script that uses BeautifulSoup. If other solutions make more sense (regex, etc.), I am open to suggestions.

You can search for all p elements whose text starts with http. Then, replace each of them with a link:

    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        elm.replace_with(soup.new_tag("a", href=elm.get_text()))

Example working code:

    from bs4 import BeautifulSoup

    data = """
    <div>
        <p>http://google.com</p>
        <p>https://stackoverflow.com</p>
    </div>
    """

    soup = BeautifulSoup(data, "html.parser")

    # Match only <p> elements whose single text child starts with "http"
    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        # Swap the paragraph for an (empty) anchor pointing at the URL
        elm.replace_with(soup.new_tag("a", href=elm.get_text()))

    print(soup.prettify())

Prints (whitespace condensed):

    <div>
      <a href="http://google.com"></a>
      <a href="https://stackoverflow.com"></a>
    </div>

I can imagine cases where this approach breaks, but it should be a good start for you.
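One case where the filter above can break is whitespace around the URL, for example `<p> http://google.com </p>`, because the text then no longer starts with "http". A minimal sketch of a more tolerant variant follows; the helper names `is_bare_url` and `replace_url_paragraphs` are made up for illustration, and `string=` is the newer spelling of the `text=` argument (available in BeautifulSoup 4.4.0 and later).

```python
from bs4 import BeautifulSoup


def is_bare_url(text):
    # `text` can be None when the <p> contains nested tags or several strings
    return bool(text) and text.strip().startswith(("http://", "https://"))


def replace_url_paragraphs(html):
    soup = BeautifulSoup(html, "html.parser")
    for elm in soup.find_all("p", string=is_bare_url):
        # Strip surrounding whitespace so it does not end up in the href
        url = elm.get_text().strip()
        elm.replace_with(soup.new_tag("a", href=url))
    return str(soup)


print(replace_url_paragraphs("<div><p> http://google.com </p><p>plain text</p></div>"))
# <div><a href="http://google.com"></a><p>plain text</p></div>
```

Paragraphs that mix a URL with other markup (such as `<p>see <b>http://x.org</b></p>`) are left untouched, since they have no single text child to match.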


If you additionally want to add text to the links, set the .string property:

    soup = BeautifulSoup(data, "html.parser")

    for elm in soup.find_all("p", text=lambda text: text and text.startswith("http")):
        a = soup.new_tag("a", href=elm.get_text())
        a.string = "link"  # visible link text
        elm.replace_with(a)
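Since the question mentions regex and the input is an SQL dump rather than a pure HTML document, a plain-text substitution may be worth considering, because it leaves everything outside the matched paragraphs byte-for-byte unchanged. The following is only a sketch under assumptions: the pattern `URL_IN_P` and the function `linkify_paragraph_urls` are hypothetical names, the paragraph tags are assumed to carry no attributes, and any quote escaping required by the dump format is not handled.

```python
import re

# A <p> whose only content is an http(s) URL, optionally padded with whitespace
URL_IN_P = re.compile(r"""<p>\s*(https?://[^\s<>"']+)\s*</p>""", re.IGNORECASE)


def linkify_paragraph_urls(text, label="click here"):
    def repl(match):
        # Build the anchor from the captured URL; other text is not touched
        return '<a href="{}">{}</a>'.format(match.group(1), label)

    return URL_IN_P.sub(repl, text)


sample = "<div><p>http://google.com</p><p>Some text</p></div>"
print(linkify_paragraph_urls(sample))
# <div><a href="http://google.com">click here</a><p>Some text</p></div>
```

URLs that are already wrapped in an anchor are not matched, because the pattern requires the URL to sit directly between `<p>` and `</p>`.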
