BeautifulSoup extract certain information after located text?

  • razodiac
    New Member
    • Aug 2021
    • 1

    Given this HTML page:

    Code:
    <div class="peoples-info">
    <ul>
    <li><strong>Gender:</strong> F</li>
    <li><strong>Birthdate:</strong> 00/00/2000</li>
    <li><strong>Family Phone:</strong> 000-000-0000</li>
    <li><strong>Personal Phone:</strong> 000-000-0000</li>
    </ul>
    </div>
    I wanted to extract the values using BeautifulSoup's find_next function, but I could only get it to work with tables, such as:

    Code:
    for gender in soup.find('td', text='gender:'):
        print(gender.find_next("td").text)
    This does not work for the div when I replace "td" with "li". Also, the label and its value are on the same line, with only the formatting differing. Is there a way to extract only the values, such as phone numbers and birthdates (e.g. "000-000-0000"), without their labels? Thanks!
  • SioSio
    Contributor
    • Dec 2019
    • 272

    #2
    This is a brute-force way:
    Code:
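    # Assumes `soup` is the parsed page; remove each <strong> label from its <li> text.
    peoplesinfo = soup.find('div', class_='peoples-info')
    for element in peoplesinfo.find_all("li"):
        label = element.find("strong")
        # Replace only the first occurrence and trim the leading space.
        print(element.text.replace(label.text, '', 1).strip())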
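
A possible alternative, as a rough sketch rather than a definitive answer: since each value is the text node directly after its `<strong>` label, `next_sibling` can pick it up without any string replacement. The helper name `extract_info` and the inline `html` sample are assumptions made for illustration, and the `string=` argument assumes a reasonably recent Beautiful Soup 4 release (older versions used `text=`).

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (illustrative stand-in for the real page).
html = """
<div class="peoples-info">
<ul>
<li><strong>Gender:</strong> F</li>
<li><strong>Birthdate:</strong> 00/00/2000</li>
<li><strong>Family Phone:</strong> 000-000-0000</li>
<li><strong>Personal Phone:</strong> 000-000-0000</li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_info(soup):
    """Hypothetical helper: map each <strong> label to the text after it."""
    info = {}
    container = soup.find("div", class_="peoples-info")
    for label in container.find_all("strong"):
        # "Gender:" -> "Gender"
        key = label.get_text(strip=True).rstrip(":")
        # The node right after </strong> is the value text, e.g. " F".
        value = label.next_sibling
        info[key] = str(value).strip() if value is not None else ""
    return info


info = extract_info(soup)
print(info["Personal Phone"])  # 000-000-0000

# Looking up a single field by its exact label text.
label = soup.find("strong", string="Birthdate:")
birthdate = label.next_sibling.strip()
print(birthdate)  # 00/00/2000
```

Collecting the values into a dict keyed by label makes it easy to pull out only the fields of interest, such as the phone numbers and the birthdate.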
