multiple conditional substring function

  • ck25python
    New Member
    • Jan 2020
    • 20

    multiple conditional substring function

    Hi There,

    I am new to Python, so please be kind to me.
    I am learning the substring function, so I am just trying different cases to get familiar.

    I managed to get the output for a single substring condition. However, when I apply multiple conditions, I get the following error. I would appreciate it if you could explain what is going wrong.

    Code:
    data = {'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'], 
            'Gender': ['00M00','00M00','00F00','00F00','00x00'], 
            'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']}
    df = pd.DataFrame(data, columns = ['name', 'Gender', 'Dept'])
    df
    
    
    var=[]
    
    for i in df["Gender"]:
    for x in df["Dept"]:
        
        if i[2].lower()=='m' & x[2:4].lower()=='mk':
            var.append('Male in Marketing')
            
        elif i[2].lower()=='f' & x[2:4].lower()=='fn':
            var.append('Female in Finance')
            
        else:
            var.append('Others')
    
    Error message below:
      File "<ipython-input-79-ff06a7e562be>", line 4
        for x in df["Dept"]:
          ^
    IndentationError: expected an indented block
    Regards,
    CK
  • SioSio
    Contributor
    • Dec 2019
    • 272

    #2
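    If you just want to fix the error in this code, indent the inner for loop under the outer one and combine the two conditions with and instead of &: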
    Code:
    import pandas as pd
    
    data = {'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
            'Gender': ['00M00','00M00','00F00','00F00','00x00'],
            'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']}
    df = pd.DataFrame(data, columns = ['name', 'Gender', 'Dept'])
    df
    
    
    var=[]
    
    for i in df["Gender"]:
        for x in df["Dept"]:
    
            if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
                var.append('Male in Marketing')
            elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
                var.append('Female in Finance')
            else:
                var.append('Others')
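
A minimal sketch of why the original condition with & could not work even after fixing the indentation. The variable names gender, dept, both_match and ampersand_error are made up for this illustration. In Python, & binds more tightly than ==, so the comparison is grouped around 'm' & ..., and & is not defined for two strings.

```python
# Illustrative values only: one Gender code and one Dept code from the thread.
gender, dept = '00M00', '01MK00'

# `and` combines the results of two separate comparisons.
both_match = gender[2].lower() == 'm' and dept[2:4].lower() == 'mk'

# `&` binds tighter than `==`, so the line below is parsed as
#   gender[2].lower() == ('m' & dept[2:4].lower()) == 'mk'
# and 'm' & 'mk' is not a valid operation on strings.
try:
    gender[2].lower() == 'm' & dept[2:4].lower() == 'mk'
    ampersand_error = None
except TypeError as exc:
    ampersand_error = type(exc).__name__

print(both_match, ampersand_error)
```

With plain Python strings, and is the operator to use. The & operator is meant for element-wise logic on pandas Series or NumPy arrays, and there each comparison has to be wrapped in parentheses.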

    • ck25python
      New Member
      • Jan 2020
      • 20

      #3
      Hi There,

      Thanks for this.

      However, I still get the following error after running the above code.

      Is there a better way to write the code so that it produces the right output?

      Code:
      
      var=[]
       
      for i in df["Gender"]:
          for x in df["Dept"]:
       
              if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
                  var.append('Male in Marketing')
              elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
                  var.append('Female in Finance')
              else:
                  var.append('Others')
      
      df["new_col"]=var
      df.head()
      
      Error message below:
      
      
      ValueError                                Traceback (most recent call last)
      <ipython-input-93-dd3e254bfbaf> in <module>
      ----> 1 df["new_col"]=var
            2 df.head(5)
      
      H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
         2936         else:
         2937             # set column
      -> 2938             self._set_item(key, value)
         2939 
         2940     def _setitem_slice(self, key, value):
      
      H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
         2998 
         2999         self._ensure_valid_index(value)
      -> 3000         value = self._sanitize_column(key, value)
         3001         NDFrame._set_item(self, key, value)
         3002 
      
      H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
         3634 
         3635             # turn me into an ndarray
      -> 3636             value = sanitize_index(value, self.index, copy=False)
         3637             if not isinstance(value, (np.ndarray, Index)):
         3638                 if isinstance(value, list) and len(value) > 0:
      
      H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
          609 
          610     if len(data) != len(index):
      --> 611         raise ValueError("Length of values does not match length of index")
          612 
          613     if isinstance(data, ABCIndexClass) and not copy:
      
      ValueError: Length of values does not match length of index

      • SioSio
        Contributor
        • Dec 2019
        • 272

        #4
        Error message: "Length of values does not match length of index"

        df has 5 rows, but the nested loops append one label for every Gender/Dept combination, so var ends up with 5 x 5 = 25 elements.
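
The size mismatch can be checked without pandas. This is only a sketch that uses plain lists with the same values as the two columns; the names gender, dept and nested are chosen for the illustration.

```python
# Same values as the Gender and Dept columns in the question.
gender = ['00M00', '00M00', '00F00', '00F00', '00x00']
dept = ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']

# A loop inside a loop visits every (gender, dept) combination,
# not just the two values that sit on the same row.
nested = [(g, d) for g in gender for d in dept]

print(len(gender), len(nested))
```

The first number is the row count the new column must have; the second is how many labels the nested loops actually produce.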

        • ck25python
          New Member
          • Jan 2020
          • 20

          #5
          Hi There,

          Is there any workaround to satisfy the above condition?

          • SioSio
            Contributor
            • Dec 2019
            • 272

            #6
            You can use the built-in function zip() to read the values of multiple columns row by row, so each Gender is paired only with the Dept on the same row.

            Code:
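            var = []  # start from an empty list so the result has one label per row
            for Gender, Dept in zip(df['Gender'], df['Dept']):
                # == is an exact match; `in` would also accept partial strings such as 'm' in 'mk'
                if Gender[2].lower() == 'm' and Dept[2:4].lower() == 'mk':
                    var.append('Male in Marketing')
                elif Gender[2].lower() == 'f' and Dept[2:4].lower() == 'fn':
                    var.append('Female in Finance')
                else:
                    var.append('Others')
            df['new_col'] = var  # len(var) now equals len(df)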
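
For the earlier question about a better way to write this, one possible alternative is to skip the Python loop and let pandas slice the strings column-wise. This is a sketch, not something proposed in the thread: it assumes NumPy is available alongside pandas, and the names gender_code, dept_code, conditions and labels are made up here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
    'Gender': ['00M00', '00M00', '00F00', '00F00', '00x00'],
    'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00'],
})

# .str[...] slices every string in the column at once.
gender_code = df['Gender'].str[2].str.lower()
dept_code = df['Dept'].str[2:4].str.lower()

# On Series, & is the element-wise "and"; each comparison needs parentheses.
conditions = [
    (gender_code == 'm') & (dept_code == 'mk'),
    (gender_code == 'f') & (dept_code == 'fn'),
]
labels = ['Male in Marketing', 'Female in Finance']

# np.select picks the label of the first matching condition per row.
df['new_col'] = np.select(conditions, labels, default='Others')
print(df)
```

Because every step works on whole columns, the result always has one value per row, so the length error cannot occur.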

            • ck25python
              New Member
              • Jan 2020
              • 20

              #7
              Hi SioSio,

              Thanks for the advice and help with this.

              Kind regards,
              CK

              • markelvy
                New Member
                • Jul 2021
                • 1

                #8
                The ValueError "Length of values does not match length of index" is raised because the list you are trying to add as a new column does not have the same length as the existing columns of the DataFrame. Make sure the length of the array you assign to a new column equals the number of rows in the DataFrame.

                A simple workaround is to convert the list/array to a pandas Series first. On assignment the Series is aligned on the index, and any missing index positions are filled with NaN values.

                Code:
                import pandas as pd

                df = pd.DataFrame({'X': [1,2,3,4]})
                df['Y'] = pd.Series([3,4])  # rows 2 and 3 of Y become NaN
                Last edited by Niheel; Jul 5 '21, 07:28 AM. Reason: link marketing
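
One caveat with the Series approach, shown as a small sketch (the column names short and long are made up). Index alignment pads a Series that is too short, but it also silently drops the extra values of a Series that is too long. Applied to the 25-element list from the nested loops, it would hide the error without producing the intended labels.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4]})

# Shorter than the frame: the missing index positions 2 and 3 become NaN.
df['short'] = pd.Series([3, 4])

# Longer than the frame: only index positions 0..3 are kept, the rest is dropped.
df['long'] = pd.Series(range(10))

print(df)
```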
