splitting a long string into a list

  • ronrsr

    splitting a long string into a list

    I have a single long string - I'd like to split it into a list of
    unique keywords. Sadly, the database wasn't designed to do this, so I
    must do this in Python - I'm having some trouble using the .split()
    function, it doesn't seem to do what I want it to - any ideas?

    thanks very much for your help.

    r-sr-


    longstring = 'Agricultural subsidies; Foreign aidAgriculture;
    Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
    Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics,
    AnimalsAgricultural Subsidies, Global TradeAgricultural
    SubsidiesBiodiversityCitizen ActivismCommunity
    GardensCooperativesDietingAgriculture, CottonAgriculture, Global
    TradePesticides, MonsantoAgriculture, SeedCoffee, HungerPollution,
    Water, FeedlotsFood PricesAgriculture, WorkersAnimal Feed, Corn,
    PesticidesAquacultureChemical
    WarfareCompostDebtConsumerismFearPesticides, US, Childhood Development,
    Birth DefectsCorporate Reform, Personhood (Dem. Book)Corporate Reform,
    Personhood, Farming (Dem. Book)Crime Rates, Legislation,
    EducationDebt, Credit CardsDemocracyPopulation, WorldIncomeDemocracy,
    Corporate Personhood, Porter Township (Dem. Book)Disaster
    ReliefDwellings, SlumsEconomics, MexicoEconomy, LocalEducation,
    ProtestsEndangered Habitat, RainforestEndangered SpeciesEndangered
    Species, Extinctionantibiotics, livestockAgricultural subsidies;
    Foreign aid;Agriculture; Sustainable Agriculture - Support; Organic
    Agriculture; Pesticides, US, Childhood Development, Birth Defects;
    Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
    Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
    Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
    Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
    Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
    Pesticides;Aquaculture;Chemical
    Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
    Development, Birth Defects;Corporate Reform, Personhood (Dem.
    Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
    Legislation, Education;Debt, Credit Cards;Democracy;Population,
    World;Income;Democracy, Corporate Personhood, Porter Township (Dem.
    Book);Disaster Relief;Dwellings, Slums;Economics, Mexico;Economy,
    Local;Education, Protests;Endangered Habitat, Rainforest;Endangered
    Species;Endangered Species, Extinction;antibiotics,
    livestock;Pesticides, Water;Environment, Environmentalist;Food, Hunger,
    Agriculture, Aid, World, Development;Agriculture, Cotton
    Trade;Agriculture, Cotton, Africa;Environment, Energy;Fair Trade (Dem.
    Book);Farmland, Sprawl;Fast Food, Globalization, Mapping;depression,
    mental illness, mood disorders;Economic Democracy, Corporate
    Personhood;Brazil, citizen activism, hope, inspiration, labor
    issues;citizen activism, advice, hope;Pharmaceuticals, Medicine,
    Drugs;Community Investing;Environment, Consumer Waste Reduction,
    Consumer Behavior and Taxes;Hunger, US, Poverty;FERTILITY,
    Women;Agricultural subsidies; Foreign aid;Agriculture; Sustainable
    Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
    Development, Birth Defects; Toxic Chemicals;Antibiotics,
    Animals;Agricultural Subsidies, Global Trade;Agricultural
    Subsidies;Biodiversity;Citizen Activism;Community
    Gardens;Cooperatives;Dieting;Agricultural subsidies; Foreign
    aid;Agriculture; Sustainable Agriculture - Support; Organic
    Agriculture; Pesticides, US, Childhood Development, Birth Defects;
    Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
    Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
    Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
    Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
    Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
    Pesticides;Aquaculture;Chemical
    Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
    Development, Birth Defects;Corporate Reform, Personhood (Dem.
    Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
    Legislation, Education;Debt, Credit Cards;'

  • Raphael

    #2
    Re: splitting a long string into a list

    What exactly seems to be the problem?


    "ronrsr" <ronrsr@gmail.com> wrote in message
    news:1164692082.272978.66720@14g2000cws.googlegroups.com...
    >I have a single long string - I'd like to split it into a list of
    >unique keywords. Sadly, the database wasn't designed to do this, so I
    >must do this in Python - I'm having some trouble using the .split()
    >function, it doesn't seem to do what I want it to - any ideas?
    >
    >thanks very much for your help.
    >
    >r-sr-


    • Robert Kern

      #3
      Re: splitting a long string into a list

      ronrsr wrote:
      >I have a single long string - I'd like to split it into a list of
      >unique keywords. Sadly, the database wasn't designed to do this, so I
      >must do this in Python - I'm having some trouble using the .split()
      >function, it doesn't seem to do what I want it to - any ideas?

      Did you follow the recommendations given to you the last time you asked this
      question? What did you try? What results do you want to get?

      --
      Robert Kern

      "I have come to believe that the whole world is an enigma, a harmless enigma
      that is made terrible by our own mad attempt to interpret it as though it had
      an underlying truth."
      -- Umberto Eco


      • ronrsr

        #4
        Re: splitting a long string into a list

        still having a heckuva time with this.

        here's where it stands - the split function doesn't seem to work the way
        I expect it to.


        longkw1,type(longkw): Agricultural subsidies; Foreign
        aid;Agriculture; Sustainable Agriculture - Support; Organic
        Agriculture; Pesticides, US, Childhood Development, Birth Defects;
        <type 'list'> 1

        longkw.replace(',',';')

        Agricultural subsidies; Foreign aid;Agriculture; Sustainable
        Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
        Development


        kw = longkw.split("; ,") #kw is now a list of len 1

        kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
        Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
        Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
        Animals;Agricultural Subsidies


        what I would like is to break the string into a list of the delimited
        words, but have had no luck doing that - I thought split would do that,
        but it doesn't.

        bests,

        -rsr-


        Robert Kern wrote:
        >ronrsr wrote:
        >>I have a single long string - I'd like to split it into a list of
        >>unique keywords. Sadly, the database wasn't designed to do this, so I
        >>must do this in Python - I'm having some trouble using the .split()
        >>function, it doesn't seem to do what I want it to - any ideas?
        >
        >Did you follow the recommendations given to you the last time you asked this
        >question? What did you try? What results do you want to get?
        >
        >--
        >Robert Kern
        >
        >"I have come to believe that the whole world is an enigma, a harmless enigma
        >that is made terrible by our own mad attempt to interpret it as though it had
        >an underlying truth."
        >-- Umberto Eco


        • Tim Roberts

          #5
          Re: splitting a long string into a list

          "ronrsr" <ronrsr@gmail.com> wrote:
          >I have a single long string - I'd like to split it into a list of
          >unique keywords. Sadly, the database wasn't designed to do this, so I
          >must do this in Python - I'm having some trouble using the .split()
          >function, it doesn't seem to do what I want it to - any ideas?
          >
          >thanks very much for your help.
          >
          >r-sr-
          >
          >
          >longstring = 'Agricultural subsidies; Foreign aidAgriculture;
          >Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
          >Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics,
          >AnimalsAgricultural Subsidies, Global TradeAgricultural
          >SubsidiesBiodiversityCitizen ActivismCommunity...
          What do you want out of this? It looks like there are several levels
          crammed together here. At first blush, it looks like topics separated by
          "; ", so this should get you started:

          topics = longstring.split("; ")
          --
          Tim Roberts, timr@probo.com
          Providenza & Boekelheide, Inc.
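
A minimal sketch of that starting point, assuming a shortened sample string made up for illustration (it is not the real database contents). It shows that splitting on "; " only breaks where a semicolon is followed by a space, so entries joined by a bare ";" stay stuck together:

```python
# Shortened sample in the style of the posted data (hypothetical).
sample = "Agricultural subsidies; Foreign aid;Agriculture; Organic Agriculture"

# str.split() treats the whole argument "; " as a single delimiter.
topics = sample.split("; ")
print(topics)
# ['Agricultural subsidies', 'Foreign aid;Agriculture', 'Organic Agriculture']
```

The middle item still contains a ";", which hints that the separators in the data are not uniform and may need normalizing before the split.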


          • John Machin

            #6
            Re: splitting a long string into a list


            ronrsr wrote:
            >I have a single long string - I'd like to split it into a list of
            >unique keywords. Sadly, the database wasn't designed to do this, so I
            >must do this in Python - I'm having some trouble using the .split()
            >function, it doesn't seem to do what I want it to - any ideas?
            >
            >thanks very much for your help.
            >
            >r-sr-
            >
            >
            >longstring = 'Agricultural subsidies; Foreign aidAgriculture;
            >Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
            [snip most of VERY long string]
            >Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
            >Legislation, Education;Debt, Credit Cards;'

            Hi ronster,

            As far as I recall, without digging in the archives:

            We would probably agree (if shown the schema) that the database wasn't
            designed. However it seems to have changed. Last time you asked, it
            was at least queryable and producing rows, each containing one column
            (a string of structure unknown to us and not divulged by you). You were
            given extensive advice: how to use split(), plus some questions to
            answer about the data e.g. the significance (if any) of semicolon
            versus comma. You were also asked about the SQL that was used. You were
            asked to explain what you meant by "keywords". All of those questions
            were asked so that we could understand your problem, and help you.
            Since then, nothing.

            Now you have what appears to be something like your previous results
            stripped of newlines and smashed together (are the newlines of no
            significance at all?), and you appear to be presenting it as a new
            problem.

            What's going on?

            Regards,
            John


            • Peter Otten

              #7
              Re: splitting a long string into a list

              ronrsr wrote:
              >still having a heckuva time with this.

              You don't seem to get it.

              >here's where it stands - the split function doesn't seem to work the way
              >I expect it to.
              >
              >
              >longkw1,type(longkw): Agricultural subsidies; Foreign
              >aid;Agriculture; Sustainable Agriculture - Support; Organic
              >Agriculture; Pesticides, US, Childhood Development, Birth Defects;
              ><type 'list'> 1
              >
              >longkw.replace(',',';')

              >>> sample = "eat, drink; man, woman"
              >>> sample.replace(";", ",")
              'eat, drink, man, woman'
              >>> sample
              'eat, drink; man, woman'

              Aha, Python doesn't replace in place, it creates a new string instead.

              >Agricultural subsidies; Foreign aid;Agriculture; Sustainable
              >Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
              >Development
              >
              >
              >kw = longkw.split("; ,") #kw is now a list of len 1

              >>> sample = "eat+-drink+man-woman"
              >>> sample.split("+-")
              ['eat', 'drink+man-woman']
              >>> sample.split("+")
              ['eat', '-drink', 'man-woman']

              Aha, Python interprets the complete split() argument as the delimiter, not
              each of its characters.

              Do you think you can combine these two findings to make your code work? You
              will have to replace() first and then split().

              Peter
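
One way the two findings might be combined, as a sketch only: it reuses the sample string from the session above and assumes that commas and semicolons are meant to be equivalent separators, which the original poster never confirmed. The names `unified` and `words` are made up for the example.

```python
sample = "eat, drink; man, woman"

# replace() returns a new string, so the result has to be kept.
unified = sample.replace(";", ",")

# With a single delimiter left, one split() covers every boundary;
# strip() removes the space that followed each separator.
words = [word.strip() for word in unified.split(",")]
print(words)
# ['eat', 'drink', 'man', 'woman']
```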


              • Cameron Walsh

                #8
                Re: splitting a long string into a list

                ronrsr wrote:
                >still having a heckuva time with this.
                >
                >here's where it stands - the split function doesn't seem to work the way
                >I expect it to.
                >
                >
                >longkw1,type(longkw): Agricultural subsidies; Foreign
                >aid;Agriculture; Sustainable Agriculture - Support; Organic
                >Agriculture; Pesticides, US, Childhood Development, Birth Defects;
                ><type 'list'> 1
                >
                >longkw.replace(',',';')
                >
                >Agricultural subsidies; Foreign aid;Agriculture; Sustainable
                >Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
                >Development

                Here you have discovered that string.replace() returns a string and does
                NOT modify the original string. Try this for clarification:

                >>> a = "DAWWIJFWA,,,,,,dwadw;djwkajdw"
                >>> a
                'DAWWIJFWA,,,,,,dwadw;djwkajdw'
                >>> a.replace(",", ";")
                'DAWWIJFWA;;;;;;dwadw;djwkajdw'
                >>> a
                'DAWWIJFWA,,,,,,dwadw;djwkajdw'
                >>> b = a.replace(',', ';')
                >>> b
                'DAWWIJFWA;;;;;;dwadw;djwkajdw'


                >kw = longkw.split("; ,") #kw is now a list of len 1

                Yes, because it is trying to split longkw wherever it finds the whole
                string "; ," and NOT wherever it finds ";" or " " or ",". This has been
                stated before by NickV, Duncan Booth, Fredrik Lundh and Paul McGuire
                amongst others. You will need to do either:

                a.)

                # First split on every semicolon
                a = longkw.split(";")
                b = []
                # Then split those results on whitespace
                # (the default action for string.split())
                for item in a:
                    # extend() keeps b a flat list of strings
                    b.extend(item.split())
                # Then split on commas
                kw = []
                for item in b:
                    kw.extend(item.split(","))

                or b.)

                # First replace commas with spaces
                longkw = longkw.replace(",", " ")
                # Then replace semicolons with spaces
                longkw = longkw.replace(";", " ")
                # Then split on white space, (default args)
                kw = longkw.split()


                Note that we did:
                longkw = longkw.replace(",", " ")
                and not just:
                longkw.replace(",", " ")


                You will find that method A may give empty strings as some elements of
                kw. If so, use method b.
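
Both methods above also split on whitespace, so a phrase like "Foreign aid" ends up as two separate words. If multi-word keywords should stay intact and duplicates should be dropped (the first post asks for "unique keywords", but the thread never pins down what counts as one), something along these lines might be closer. This is a sketch using `re.split` from the standard library; the function name `unique_keywords` and the sample string are made up for illustration.

```python
import re


def unique_keywords(text):
    """Split on ';' or ',' and return unique, non-empty keywords in first-seen order."""
    # Split on either delimiter, swallowing any whitespace around it.
    parts = re.split(r"\s*[;,]\s*", text.strip())
    seen = set()
    result = []
    for part in parts:
        # Skip empty pieces (e.g. from a trailing ';') and repeats.
        if part and part not in seen:
            seen.add(part)
            result.append(part)
    return result


sample = "Pesticides, US; Foreign aid;Pesticides, Water;"
print(unique_keywords(sample))
# ['Pesticides', 'US', 'Foreign aid', 'Water']
```

The comparison is case-sensitive, so "antibiotics" and "Antibiotics" would count as two different keywords unless the text is normalized first.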


                Finally, if you have further problems, please please do the following:

                1.) Provide your input data clearly, exactly as you have it.
                2.) Show exactly what you want the output to be, including any special
                cases.
                3.) If something doesn't work the way you expect it to, tell us how you
                expect it to work so we know what you mean by "doesn't work how I expect
                it to"
                4.) Read all the replies carefully and if you don't understand the
                reply, ask for clarification.
                5.) Read the help functions carefully - what the input parameters have
                to be and what the return value will be, and whether or not it changes
                the parameters or original object. Python strings are immutable, so
                string methods return the result as a new string and leave the
                original string intact.

                I really hope this helps,

                Cameron.


                • Frederic Rentsch

                  #9
                  Re: splitting a long string into a list

                  ronrsr wrote:
                  >still having a heckuva time with this.
                  >
                  >here's where it stands - the split function doesn't seem to work the way
                  >I expect it to.
                  >
                  >
                  >longkw1,type(longkw): Agricultural subsidies; Foreign
                  >aid;Agriculture; Sustainable Agriculture - Support; Organic
                  >Agriculture; Pesticides, US, Childhood Development, Birth Defects;
                  ><type 'list'> 1
                  >
                  >longkw.replace(',',';')
                  >
                  >Agricultural subsidies; Foreign aid;Agriculture; Sustainable
                  >Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
                  >Development
                  >
                  >
                  >kw = longkw.split("; ,") #kw is now a list of len 1
                  >
                  >kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
                  >Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
                  >Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
                  >Animals;Agricultural Subsidies
                  >
                  >
                  >what I would like is to break the string into a list of the delimited
                  >words, but have had no luck doing that - I thought split would do that,
                  >but it doesn't.
                  >
                  >bests,
                  >
                  >-rsr-

                  >>> Split_Marker = SE.SE (' ,=| ;=| ') # Translates both ',' and ';' into an arbitrary split mark ('|')
                  >>> for item in Split_Marker (longstring).split ('|'): print item
                  Agricultural subsidies
                  Foreign aidAgriculture
                  Sustainable Agriculture - Support
                  Organic Agriculture

                  .... etc.

                  To get rid of the leading space on some lines simply add
                  corresponding replacements. SE does any number of substitutions in one
                  pass. Defining them is a simple matter of writing them up in one single
                  string from which the translator object is made:
                  >>> Split_Marker = SE.SE (' ,=| ;=| ", =|" "; =|" ')
                  >>> for item in Split_Marker (longstring).split ('|'): print item
                  Agricultural subsidies
                  Foreign aidAgriculture
                  Sustainable Agriculture - Support
                  Organic Agriculture


                  Regards

                  Frederic
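
For readers without the SE module, the same idea (translate both delimiters into one split mark in a single pass, then split) can be roughly approximated with the standard library. This is a sketch in Python 3, not a description of how SE works internally; the sample string is shortened and made up, and it assumes '|' never occurs in the data.

```python
# Map both delimiters to one arbitrary split mark, as in the SE example.
to_mark = str.maketrans({",": "|", ";": "|"})

sample = "Agricultural subsidies; Foreign aid;Pesticides, US"
marked = sample.translate(to_mark)

# Split on the mark and strip the leading space some items carry.
items = [item.strip() for item in marked.split("|")]
for item in items:
    print(item)
# Agricultural subsidies
# Foreign aid
# Pesticides
# US
```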

