LinuxQuestions.org


sharky 07-06-2023 08:14 PM

python merge lines
 
I have to use python because it will be part of existing python code.

Example input
Quote:

[
['dev1', 'devType', 'x1'],
['dev1', 'devType', 'x2'],
['dev1', 'devType', 'x3'],
['dev2', 'devType', 'y1'],
['dev2', 'devType', 'y2'],
['dev2', 'devType', 'y3'],
['dev2', 'devType', 'y4'],
['dev2', 'devType', 'y5'],
['dev3'],
['dev4'],
['dev5', 'devType', 'z1'],
['dev5', 'devType', 'z2'],
]
Desired output
Quote:

[
['dev1', 'devType x1:x2:x3'],
['dev2', 'devType y1:y2:y3:y4:y5'],
['dev3'],
['dev4'],
['dev5', 'devType z1:z2'],
]
The concept seems simple, just combine dev#s that have devTypes, but the solution eludes me.

Below is the monstrosity that I tried and it fails miserably. In the below example 'regressionList' corresponds to the above sample input.

Code:

def writeTest(regressionList):
  # extract all cellNames
  cellsToPlace = []
  for test in regressionList:
    if test[0] not in cellsToPlace:
      cellsToPlace.append(test[0])

  for cell in cellsToPlace:
    devTypes = []
    for test in regressionList:
      if test[0] == cell and len(test) > 1:
        devTypes.append(test[2])
        print(cell,devTypes)


dugan 07-06-2023 09:12 PM

Use a defaultdict where the key is the first column and the default value is an empty list.
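
As a rough illustration of that suggestion (a minimal sketch, not necessarily what dugan had in mind): the names `rows` and `grouped` below are made up for the example, and the three-row input is a shortened stand-in for the sample data. With `collections.defaultdict(list)`, looking up a key that does not exist yet creates an empty list for it, so each row can append to its device's list without a membership check first.

```python
import collections

# Shortened stand-in for the sample input (assumption: rows have 1 or 3 items).
rows = [
    ["dev1", "devType", "x1"],
    ["dev1", "devType", "x2"],
    ["dev3"],
]

# Key = first column; a missing key starts out as an empty list.
grouped = collections.defaultdict(list)
for row in rows:
    key = row[0]
    grouped[key]  # merely looking the key up registers devices like ['dev3']
    if len(row) == 3:
        grouped[key].append(row[2])

print(dict(grouped))  # {'dev1': ['x1', 'x2'], 'dev3': []}
```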

sharky 07-06-2023 10:08 PM

Quote:

Originally Posted by dugan (Post 6440629)
Use a defaultdict where the key is the first column and the default value is an empty list.

I'm sure there's something obvious to you in this advice but I have no idea how this moves me toward my goal. I do not know how to build a defaultdict from the input.

dugan 07-06-2023 10:47 PM

This looks suspiciously like a job interview question. But whatever:

Code:

import collections
import pprint

original_rows = [
    ["dev1", "devType", "x1"],
    ["dev1", "devType", "x2"],
    ["dev1", "devType", "x3"],
    ["dev2", "devType", "y1"],
    ["dev2", "devType", "y2"],
    ["dev2", "devType", "y3"],
    ["dev2", "devType", "y4"],
    ["dev2", "devType", "y5"],
    ["dev3"],
    ["dev4"],
    ["dev5", "devType", "z1"],
    ["dev5", "devType", "z2"],
]

new_rows = collections.defaultdict(list)

# column 1
for original_row in original_rows:
    if original_row[0] not in new_rows:
        new_rows[original_row[0]].append(original_row[0])

# It looks like this now:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1'],
            'dev2': ['dev2'],
            'dev3': ['dev3'],
            'dev4': ['dev4'],
            'dev5': ['dev5']})
"""
# column 2
for original_row in original_rows:
    if len(original_row) > 1 and original_row[0] in new_rows:
        if len(new_rows[original_row[0]]) == 1:
            new_rows[original_row[0]].append(original_row[1])


# Now it looks like this:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1', 'devType'],
            'dev2': ['dev2', 'devType'],
            'dev3': ['dev3'],
            'dev4': ['dev4'],
            'dev5': ['dev5', 'devType']})
"""

# And column 3 now
for original_row in original_rows:
    if original_row[0] in new_rows and len(original_row) == 3 and len(new_rows[original_row[0]]) == 2:
        new_rows[original_row[0]][1] += ":" + original_row[2]
   
# And now it's like this:
"""
defaultdict(<class 'list'>,
            {'dev1': ['dev1', 'devType:x1:x2:x3'],
            'dev2': ['dev2', 'devType:y1:y2:y3:y4:y5'],
            'dev3': ['dev3'],
            'dev4': ['dev4'],
            'dev5': ['dev5', 'devType:z1:z2']})
"""

# So, for the final result:

pprint.pprint(list(new_rows.values()))

# That prints:
"""
[['dev1', 'devType:x1:x2:x3'],
 ['dev2', 'devType:y1:y2:y3:y4:y5'],
 ['dev3'],
 ['dev4'],
 ['dev5', 'devType:z1:z2']]
"""


sharky 07-07-2023 07:17 AM

That works just as advertised.

Not an interview question - but the concern is understood. It is probably clear that I would never qualify for even an entry-level Python programming position. I work for a semiconductor design company and 95% of my work consists of SKILL programming for pcell development.

Python was needed for this particular project to extract data from an Excel file. SKILL has no built-in functions for handling Excel spreadsheets, so I decided to use Python to extract the required data from the Excel file and write the result to a CSV file.

The final csv file would look like this.

Quote:

dev1, devType x1:x2:x3
dev2, devType y1:y2:y3:y4:y5
dev3
dev4
dev5, devType z1:z2
Thanks for the help and my apologies for basically coaxing you into doing a bit of my job for me. :)
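
For the CSV layout quoted above, one possible way to write the merged rows is sketched here. The helper name `write_rows` and the in-memory `io.StringIO` buffer are assumptions made for the example; in real use the second argument would be an open file. Plain string joining is used because the quoted layout has a space after the comma, which `csv.writer` does not emit by default, and this assumes no field contains a comma or a quote character.

```python
import io

def write_rows(rows, out):
    """Write each row as one line, fields separated by ', ' (hypothetical helper)."""
    for row in rows:
        out.write(", ".join(row) + "\n")

# Two rows in the merged format, used only as sample data.
merged_rows = [["dev1", "devType x1:x2:x3"], ["dev3"]]

buffer = io.StringIO()  # stands in for open("some_file.csv", "w")
write_rows(merged_rows, buffer)
print(buffer.getvalue(), end="")
```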

sharky 07-07-2023 07:40 AM

I did notice one small glitch.

There should be a space between 'devType' and the first x1, y1, or z1, not a colon.

I will attempt to modify the code to meet that spec. Will let you know how it goes.

sharky 07-07-2023 10:03 AM

Modified # column 2 to add a space after the input.

original: new_rows[original_row[0]].append(original_row[1])
change: new_rows[original_row[0]].append(original_row[1]+" ")

Modified # column 3 to place the colon after the input.

original: new_rows[original_row[0]][1] += ":" + original_row[2]
change: new_rows[original_row[0]][1] += original_row[2] + ":"

The final result still has a trailing ":" that is not needed.

Quote:

[['dev1', 'devType x1:x2:x3:'],
['dev2', 'devType y1:y2:y3:y4:y5:'],
['dev3'],
['dev4'],
['dev5', 'devType z1:z2:']]

dugan 07-07-2023 10:05 AM

I can't believe you manually tried to write a diff.
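
One way to avoid the trailing ":" reported above, offered as a sketch rather than a fix anyone in the thread posted: collect the third-column values in a list per device and only build the string at the end with `":".join(...)`, which puts the separator between items and never after the last one. The function name `merge_rows` and the `types` dict are invented for this example, and it assumes every row has either one or three items.

```python
import collections

def merge_rows(rows):
    """Merge rows sharing a first column (hypothetical helper, not from the thread)."""
    values = collections.defaultdict(list)  # device -> list of third-column values
    types = {}                              # device -> second-column value
    for row in rows:
        values[row[0]]  # register the device even when it has no type
        if len(row) == 3:
            types[row[0]] = row[1]
            values[row[0]].append(row[2])

    merged = []
    for dev, vals in values.items():
        if vals:
            # A space after the type, colons only between the values.
            merged.append([dev, types[dev] + " " + ":".join(vals)])
        else:
            merged.append([dev])
    return merged

original_rows = [
    ["dev1", "devType", "x1"], ["dev1", "devType", "x2"], ["dev1", "devType", "x3"],
    ["dev2", "devType", "y1"], ["dev2", "devType", "y2"], ["dev2", "devType", "y3"],
    ["dev2", "devType", "y4"], ["dev2", "devType", "y5"],
    ["dev3"], ["dev4"],
    ["dev5", "devType", "z1"], ["dev5", "devType", "z2"],
]

for merged_row in merge_rows(original_rows):
    print(merged_row)
```

Because dicts keep insertion order in Python 3.7 and later, the devices come out in the order they first appear in the input.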

