LinuxQuestions.org
Help answer threads with 0 replies.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 05-25-2024, 11:12 AM   #16
michaelk
Moderator
 
Registered: Aug 2002
Posts: 26,112

Rep: Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062Reputation: 6062

Code:
#!/usr/bin/python3

import openpyxl
from array import *

wb = openpyxl.load_workbook("test.xlsx")  
sheet = wb.active  
 # find where the data starts and delete the random text
for i, row in enumerate(sheet): 
    name = row[0].value 
    if name == "Action":
       sheet.delete_rows(1, i)
       break
col1=[]
col2=[]
for column in sheet.iter_cols(): 
    column_name = column[0].value 
    if column_name == "Dpt": 
       for i, cell in enumerate(column): 
           col1.append(cell.value) 
    if column_name == "Total":
       for i, cell in enumerate(column): 
           col2.append(cell.value) 
print(col1)
print(col2)
I was trying to come up with something more clever but I failed. This collects the data in two arrays and I will leave it up to you to do something with it.
 
Old 05-26-2024, 03:31 AM   #17
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,706

Rep: Reputation: Disabled
I'd try DataFrame.transpose() (or just DataFrame.T) then grepping for the lines that contain required data then transpose again.

An equivalent of grep in pandas could be achieved via list comprehension:
Code:
df.loc[ : , [(df[col].str.contains('Total|Team', case=False)).any() for col in df.columns]]
OTOH, you probably can directly do something like this on rows as well (i.e. df.index rather then df.columns) without transposition.

Last edited by shruggy; 05-26-2024 at 04:06 AM.
 
Old 05-29-2024, 03:46 PM   #18
lxs602
Member
 
Registered: Oct 2018
Distribution: Ubuntu 24.04
Posts: 50

Original Poster
Rep: Reputation: 1
I came up with the following after some hours reading bash tutorials:

Code:
#!/bin/env bash
# One-liner (without mkdir and file rename)
# for FILE in *.csv; do for i in $(seq $(csvcut -n "$FILE" | wc -l)); do csvcut -H -c $i "$FILE" > columns/$i; done && grep -H -m 1 -i "Total\|Team" columns/* | awk 'length < 50' | cut -d "/" -f 2 | cut -d ":" -f 1 | while read line; do csvcut -c $line "$FILE" > ./sheets/$line; done && for n in $(seq 9); do if [ -f $i ]; then mv $i 0$i; fi; done && paste sheets/* > "$FILE.temp"; done

mkdir -p columns
mkdir -p sheets
mkdir -p newcsv

# For each csv file in current directory
for FILE in *.csv; do
  # Find number of columns in current file using csvcut and wc
  for i in $(seq $(csvcut -n "$FILE" | wc -l)); do
    # Separate all columns into files into the column directory
    csvcut -H -c $i "$FILE" > columns/$i
  done

  ## Should probably use variables and separate the next line of code
  # a. For each column, search for terms 'Total' and 'Team', but exclude results > 50 characters as likely irrelevant comments
  # b. Cut the output to make a list of column numbers
  # c. For each line of the list, use csvcut to write matching columns to sheets directory
  grep -H -m 1 -i "Total\|Team" columns/* | awk 'length < 50' | cut -d "/" -f 2 | cut -d ":" -f 1 | while read line; do csvcut -c $line "$FILE" > ./sheets/$line; done

  # Prepend a zero to files named as integers <= 9; e.g. 3 -> 03; so that paste joins columns in order
  # (Assume there must be a better way than having to do this)
  for n in $(seq 9); do if [ -f $n ]; then mv $n 0$n; fi; done

  # Join columns together
  paste sheets/* > "$FILE.temp"

done

# Rename files and move to new directory
for file in *.csv.temp; do
  mv -- "$file" "${file%.csv.temp}_filtered.csv"
done

mv *_filtered.csv newcsv


Quote:
Originally Posted by michaelk View Post
Code:
#!/usr/bin/python3

import openpyxl
from array import *

wb = openpyxl.load_workbook("test.xlsx")  
sheet = wb.active  
 # find where the data starts and delete the random text
for i, row in enumerate(sheet): 
    name = row[0].value 
    if name == "Action":
       sheet.delete_rows(1, i)
       break
col1=[]
col2=[]
for column in sheet.iter_cols(): 
    column_name = column[0].value 
    if column_name == "Dpt": 
       for i, cell in enumerate(column): 
           col1.append(cell.value) 
    if column_name == "Total":
       for i, cell in enumerate(column): 
           col2.append(cell.value) 
print(col1)
print(col2)
I was trying to come up with something more clever but I failed. This collects the data in two arrays and I will leave it up to you to do something with it.
Thanks. I will study this tomorrow.

Last edited by lxs602; 05-31-2024 at 09:03 AM.
 
  


Reply

Tags
beginner, python


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
transpose selected groups of rows of columns into multiple columns (arrays) with other columns retained to single row-group TempleStone4510! Linux - General 15 08-22-2022 06:52 PM
LXer: Select Pandas Dataframe Rows And Columns Using iloc loc and ix LXer Syndicated Linux News 0 11-28-2019 12:55 PM
LXer: Python explosion blamed on pandas LXer Syndicated Linux News 0 09-16-2017 07:07 PM
[SOLVED] sum a value of column based on the value of other columns umix Linux - Newbie 5 04-11-2013 03:15 PM
difference between value *value and value * value PoleStar Linux - Newbie 1 11-26-2010 03:37 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 02:21 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration