multiple conditional substring function

**SioSio** · Jul 8 '20, 02:34 AM

If you just want to fix the error in this code:

Code:

import pandas as pd

data = {'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
        'Gender': ['00M00','00M00','00F00','00F00','00x00'],
        'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00']}
df = pd.DataFrame(data, columns = ['name', 'Gender', 'Dept'])
df


var=[]

for i in df["Gender"]:
    for x in df["Dept"]:

        if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
            var.append('Male in Marketing')
        elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
            var.append('Female in Finance')
        else:
            var.append('Others')

**ck25python** · Jul 8 '20, 03:47 AM

Hi There,

Thanks for this,

However, I still get the error shown below after running the code above.

Is there a better way to write the code so that it produces the right output?

Code:


var=[]
 
for i in df["Gender"]:
    for x in df["Dept"]:
 
        if i[2].lower() in 'm' and x[2:4].lower() in 'mk':
            var.append('Male in Marketing')
        elif i[2].lower() in 'f' and x[2:4].lower() in 'fn':
            var.append('Female in Finance')
        else:
            var.append('Others')

df["new_col"]=var
df.head()

**Error message below**


ValueError                                Traceback (most recent call last)
<ipython-input-93-dd3e254bfbaf> in <module>
----> 1 df["new_col"]=var
      2 df.head(5)

H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   2936         else:
   2937             # set column
-> 2938             self._set_item(key, value)
   2939 
   2940     def _setitem_slice(self, key, value):

H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2998 
   2999         self._ensure_valid_index(value)
-> 3000         value = self._sanitize_column(key, value)
   3001         NDFrame._set_item(self, key, value)
   3002 

H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3634 
   3635             # turn me into an ndarray
-> 3636             value = sanitize_index(value, self.index, copy=False)
   3637             if not isinstance(value, (np.ndarray, Index)):
   3638                 if isinstance(value, list) and len(value) > 0:

H:\Softwares\PythonSoftware\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index, copy)
    609 
    610     if len(data) != len(index):
--> 611         raise ValueError("Length of values does not match length of index")
    612 
    613     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

**SioSio** · Jul 8 '20, 05:32 AM

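Error message: "Length of values does not match length of index"

df has 5 rows, but the nested loops append one value for every combination of a Gender value and a Dept value, so var ends up with 5 x 5 = 25 items.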
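To make the mismatch concrete, here is a minimal sketch that rebuilds the two columns from the thread and counts what the nested loops produce. The joined string appended to `var` and the `raised` flag are only illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['00M00', '00M00', '00F00', '00F00', '00x00'],
    'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00'],
})

var = []
for i in df['Gender']:
    for x in df['Dept']:
        # The inner loop runs in full for every Gender value,
        # so one item is appended per (Gender, Dept) combination.
        var.append(i[2] + '/' + x[2:4])

print(len(df))   # number of rows in the DataFrame
print(len(var))  # number of items collected by the nested loops

# Assigning a plain list whose length differs from the index fails.
try:
    df['new_col'] = var
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The two lengths differ, which is exactly what the ValueError is complaining about.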

**ck25python** · Jul 8 '20, 06:11 AM

Hi There,

Is there a workaround that satisfies the above condition?

**SioSio** · Jul 8 '20, 07:17 AM

You can use the built-in zip() function to read the values of several columns at once, one row at a time.

Code:

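var = []

# zip() pairs the two columns row by row, so var gets one entry per row
for Gender, Dept in zip(df['Gender'], df['Dept']):
    # == compares the exact code; `in` would also match shorter substrings
    if Gender[2].lower() == 'm' and Dept[2:4].lower() == 'mk':
        var.append('Male in Marketing')
    elif Gender[2].lower() == 'f' and Dept[2:4].lower() == 'fn':
        var.append('Female in Finance')
    else:
        var.append('Others')

df["new_col"] = var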
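If you would rather avoid the explicit loop, one possible alternative (not what was posted above, just a sketch) is to slice the columns with the pandas `.str` accessor and pick the label with `numpy.select`. The names `gender_code`, `dept_code`, `conditions` and `labels` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Aaron', 'Anie', 'Nancy', 'Steve'],
    'Gender': ['00M00', '00M00', '00F00', '00F00', '00x00'],
    'Dept': ['01MK00', '02FN00', '03LG00', '04HR00', '05DR00'],
})

# Character 2 of Gender and characters 2-3 of Dept, lowercased.
gender_code = df['Gender'].str[2].str.lower()
dept_code = df['Dept'].str[2:4].str.lower()

# Each condition is a boolean Series with one value per row.
conditions = [
    (gender_code == 'm') & (dept_code == 'mk'),
    (gender_code == 'f') & (dept_code == 'fn'),
]
labels = ['Male in Marketing', 'Female in Finance']

# Rows that match no condition fall back to the default label.
df['new_col'] = np.select(conditions, labels, default='Others')
print(df)
```

Because every intermediate result has one value per row, the new column always has the same length as the DataFrame.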

**ck25python** · Jul 8 '20, 07:28 AM

Hi SioSio,

Thanks for the advice and help with this.

Kind regards,
CK

**markelvy** · Jul 5 '21, 06:22 AM

The "ValueError: Length of values does not match length of index" is raised because the list or array you are trying to add as a new column does not have the same length as the columns already in the DataFrame. So you need to make sure that the length of the array you assign to a new column is equal to the length of the DataFrame.

A simple workaround is to convert the list/array to a pandas Series first. When you then do the assignment, the values are matched by index, and any index missing from the Series is filled with NaN.

Code:

import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3, 4]})
df['Y'] = pd.Series([3, 4])  # rows 2 and 3 of Y become NaN
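It may help to see both directions of that index matching. The sketch below adds a hypothetical column `Z` from a Series that is longer than the DataFrame; the extra values are dropped without any error.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4]})

# Shorter Series: index labels 2 and 3 are missing, so Y is NaN there.
df['Y'] = pd.Series([3, 4])

# Longer Series: labels 4 and 5 do not exist in df, so 50 and 60 are dropped.
df['Z'] = pd.Series([10, 20, 30, 40, 50, 60])

print(df)
```

So for the 25-item list in this thread, wrapping it in a Series would make the error go away but would keep only the first five values, which are not the intended per-row labels. Fixing the loop is still needed.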
