LinuxQuestions.org - python merge lines

python merge lines

I have to use Python because it will be part of existing Python code.

Example input

Quote:

[
['dev1', 'devType', 'x1'],
['dev1', 'devType', 'x2'],
['dev1', 'devType', 'x3'],
['dev2', 'devType', 'y1'],
['dev2', 'devType', 'y2'],
['dev2', 'devType', 'y3'],
['dev2', 'devType', 'y4'],
['dev2', 'devType', 'y5'],
['dev3'],
['dev4'],
['dev5', 'devType', 'z1'],
['dev5', 'devType', 'z2'],
]

Desired output

Quote:

[
['dev1', 'devType x1:x2:x3'],
['dev2', 'devType y1:y2:y3:y4:y5'],
['dev3'],
['dev4'],
['dev5', 'devType z1:z2'],
]

The concept seems simple, just combine dev#s that have devTypes, but the solution eludes me.

Below is the monstrosity that I tried, and it fails miserably. In the example below, 'regressionList' corresponds to the sample input above.

Code:

def writeTest(regressionList):
  # extract all cellNames
  cellsToPlace = []
  for test in regressionList:
    if test[0] not in cellsToPlace:
      cellsToPlace.append(test[0])

  for cell in cellsToPlace:
    devTypes = []
    for test in regressionList:
      if test[0] == cell and len(test) > 1:
        devTypes.append(test[2])
        print(cell,devTypes)

Use a defaultdict where the key is the first column and the default value is an empty list.

Quote:

Originally Posted by dugan (Post 6440629)

Use a defaultdict where the key is the first column and the default value is an empty list.

I'm sure there's something obvious to you in this advice but I have no idea how this moves me toward my goal. I do not know how to build a defaultdict from the input.
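For anyone else stuck at the same point, here is a minimal sketch of what that advice could mean in practice, assuming the rows look like the example input. The names `rows` and `grouped` are made up for illustration. `defaultdict(list)` creates an empty list the first time a key is touched, so there is no need to check whether the key already exists.

```python
from collections import defaultdict

# Sample rows in the same shape as the example input (abbreviated).
rows = [
    ["dev1", "devType", "x1"],
    ["dev1", "devType", "x2"],
    ["dev3"],
    ["dev5", "devType", "z1"],
]

# Key: first column. Value: list of third-column entries seen for that key.
grouped = defaultdict(list)
for row in rows:
    values = grouped[row[0]]  # touching the key creates an empty list
    if len(row) == 3:
        values.append(row[2])

print(dict(grouped))
# {'dev1': ['x1', 'x2'], 'dev3': [], 'dev5': ['z1']}
```

Devices without a devType still get an entry (with an empty list), so they are not lost when the result is written out.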

This looks suspiciously like a job interview question. But whatever:

Code:

import collections
import pprint

original_rows = [
    ["dev1", "devType", "x1"],
    ["dev1", "devType", "x2"],
    ["dev1", "devType", "x3"],
    ["dev2", "devType", "y1"],
    ["dev2", "devType", "y2"],
    ["dev2", "devType", "y3"],
    ["dev2", "devType", "y4"],
    ["dev2", "devType", "y5"],
    ["dev3"],
    ["dev4"],
    ["dev5", "devType", "z1"],
    ["dev5", "devType", "z2"],
]

new_rows = collections.defaultdict(list)

# column 1
for original_row in original_rows:
    if original_row[0] not in new_rows:
        new_rows[original_row[0]].append(original_row[0])

# It looks like this now:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1'],
             'dev2': ['dev2'],
             'dev3': ['dev3'],
             'dev4': ['dev4'],
             'dev5': ['dev5']})
"""

# column 2
for original_row in original_rows:
    if len(original_row) > 1 and original_row[0] in new_rows:
        if len(new_rows[original_row[0]]) == 1:
            new_rows[original_row[0]].append(original_row[1])

# Now it looks like this:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1', 'devType'],
             'dev2': ['dev2', 'devType'],
             'dev3': ['dev3'],
             'dev4': ['dev4'],
             'dev5': ['dev5', 'devType']})
"""

# And column 3 now
for original_row in original_rows:
    if original_row[0] in new_rows and len(original_row) == 3 and len(new_rows[original_row[0]]) == 2:
        new_rows[original_row[0]][1] += ":" + original_row[2]

# And now it's like this:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1', 'devType:x1:x2:x3'],
             'dev2': ['dev2', 'devType:y1:y2:y3:y4:y5'],
             'dev3': ['dev3'],
             'dev4': ['dev4'],
             'dev5': ['dev5', 'devType:z1:z2']})
"""

# So, for the final result:
pprint.pprint(list(new_rows.values()))

# That prints:
"""
[['dev1', 'devType:x1:x2:x3'],
 ['dev2', 'devType:y1:y2:y3:y4:y5'],
 ['dev3'],
 ['dev4'],
 ['dev5', 'devType:z1:z2']]
"""

That works just as advertised.

Not an interview question - but the concern is understood. It is probably clear that I would never qualify for even an entry-level Python programming position. I work for a semiconductor design company and 95% of my work consists of SKILL programming for pcell development.

Python was needed for this particular project to extract data from an Excel file. SKILL has no built-in functions for handling Excel spreadsheets, so I decided to use Python to extract the required data from the Excel file and write the result to a CSV file.

The final CSV file would look like this.

Quote:

dev1, devType x1:x2:x3
dev2, devType y1:y2:y3:y4:y5
dev3
dev4
dev5, devType z1:z2
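Producing a file in that shape takes only a few lines. This is a minimal sketch, assuming the merged rows are already a list of lists; `rows_to_text` and `merged` are hypothetical names, and the ", " separator is chosen only to match the sample above (Python's `csv.writer` would emit a bare comma and quote fields when needed).

```python
def rows_to_text(rows):
    """Join each row's fields with ', ' and put one row per line."""
    return "\n".join(", ".join(row) for row in rows) + "\n"


merged = [
    ["dev1", "devType x1:x2:x3"],
    ["dev3"],
    ["dev5", "devType z1:z2"],
]

text = rows_to_text(merged)
print(text, end="")

# To write it to disk instead (the file name is just an example):
# with open("devices.csv", "w") as handle:
#     handle.write(text)
```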

Thanks for the help and my apologies for basically coaxing you into doing a bit of my job for me. :)

I did notice one small glitch.

There should be a space between 'devType' and the first x1, y1, or z1, not a colon.

I will attempt to modify the code to meet that spec. Will let you know how it goes.

Modified # column 2 to add a space after the input.

original: new_rows[original_row[0]].append(original_row[1])
change: new_rows[original_row[0]].append(original_row[1]+" ")

Modified # column 3 to place the colon after the input.

original: new_rows[original_row[0]][1] += ":" + original_row[2]
change: new_rows[original_row[0]][1] += original_row[2] + ":"

The final result still has a trailing ":" that is not needed.

Quote:

[['dev1', 'devType x1:x2:x3:'],
['dev2', 'devType y1:y2:y3:y4:y5:'],
['dev3'],
['dev4'],
['dev5', 'devType z1:z2:']]
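The trailing colon comes from appending ":" after every value. One way around it, sketched here under the assumption that every three-column row for a device shares the same second column, is to collect the third-column values first and let `str.join` place the separators. `merge_rows` is a hypothetical name, not something from the code posted earlier.

```python
def merge_rows(rows):
    """Merge rows sharing a first column; join third-column values with ':'."""
    merged = {}  # first column -> [second column or None, list of values]
    for row in rows:
        entry = merged.setdefault(row[0], [None, []])
        if len(row) == 3:
            entry[0] = row[1]
            entry[1].append(row[2])

    result = []
    for name, (kind, values) in merged.items():
        if values:
            # join() only puts ':' between items, so nothing trails.
            result.append([name, kind + " " + ":".join(values)])
        else:
            result.append([name])
    return result


rows = [
    ["dev1", "devType", "x1"],
    ["dev1", "devType", "x2"],
    ["dev1", "devType", "x3"],
    ["dev3"],
    ["dev5", "devType", "z1"],
    ["dev5", "devType", "z2"],
]
print(merge_rows(rows))
# [['dev1', 'devType x1:x2:x3'], ['dev3'], ['dev5', 'devType z1:z2']]
```

Because the separator is only inserted between items, the space after 'devType' and the colons between values need no special-casing for the first or last element.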

I can't believe you manually tried to write a diff.