LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

07-12-2021, 02:39 PM   #16
computersavvy
Senior Member

Registered: Aug 2016
Posts: 3,345

Reputation: 1484

Quote:
Originally Posted by MadeInGermany
Have a loop and a "found" variable
Code:
wordfile=wordcount_file
txtfound=
for i in *.txt
do
  [ -f "$i" ] || continue
  wc -w "$i"
  txtfound=1
done > $wordfile
if [ -z "$txtfound" ]
then
  for filename in *.*
  do
    ext=${filename##*.}
    case $ext in
    docx)
      docx2txt "$filename"
    ;;
    odt)
      odt2txt "$filename" --output="$filename".txt
    ;;
    pdf)
      pdf2txt -o "$i".txt "$filename"
    esac
  done
  for i in *.txt
  do
    [ -f "$i" ] || continue
    wc -w "$i"
  done > $wordfile
fi
Redirecting the output of the whole loop overwrites the wordcount file (>> would append).
Ummm, this has the same problem as the original script posted by the OP. The third for loop counts the original *.txt files that were already counted in the first loop, so it double counts.

It also seems to enter the second for loop only if no text files were found and counted in the first for loop. My understanding is that there may be both text files and other documents, and the OP wants to count both types.
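To illustrate the quoted point about redirecting the whole loop, here is a minimal sketch (the function name `count_txt` is hypothetical). A single `>` after `done` truncates the output file once per run and then collects everything the loop prints, so running it again replaces the old counts instead of appending to them as `>>` would.

```shell
# Hypothetical helper: write "COUNT FILENAME" for every .txt file into the file named by $1.
count_txt() {
    local f
    for f in *.txt; do
        [ -f "$f" ] || continue   # with no match the glob stays literally "*.txt", so skip it
        wc -w -- "$f"
    done > "$1"                   # one ">" for the whole loop: truncated once, then filled
}
```

Usage: `count_txt wordcount_file`. Since the redirection is set up before the loop starts, it is probably wise to give the output file a name that does not itself match `*.txt`, or it may end up being counted too.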
 
1 member found this post helpful.
07-12-2021, 02:50 PM   #17
computersavvy
Senior Member

Registered: Aug 2016
Posts: 3,345

Reputation: 1484
I rewrote my proposed script, made only one for loop, and simplified the processing. It also writes out both the name of each processed file and its word count. If the filenames are not needed, simply remove the echo statements.
Code:
#!/usr/bin/env bash

wordfile=wordcount_file
# Start every run with an empty output file (">" truncates it, or creates it if missing)
: > "$wordfile"

for filename in *.*
do
    ext=${filename##*.}
    case "$ext" in
        docx)
            echo "$filename" >> "$wordfile"
            # assumes upstream docx2txt.pl: "-" as the output name means stdout;
            # with no output name it writes FILE.txt instead and wc would see nothing
            docx2txt "$filename" - | wc -w >> "$wordfile"
        ;;
        odt)
            echo "$filename" >> "$wordfile"
            odt2txt "$filename" | wc -w >> "$wordfile"
        ;;
        pdf)
            echo "$filename" >> "$wordfile"
            pdf2txt "$filename" | wc -w >> "$wordfile"
        ;;
        txt)
            echo "$filename" >> "$wordfile"
            wc -w < "$filename" >> "$wordfile"
        ;;
        *)
            continue
        ;;
    esac
done
I tested it with txt, odt, and pdf files. Note that $filename is enclosed in quotes, as this allows it to process even filenames that contain spaces.

There are no extra .txt files created, simply counting the words in the existing docs.
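If one line per document plus a grand total is preferred (the layout `wc -w` uses for several files), the same single-loop idea could be sketched as below. The helper names `count_words` and `count_all` are hypothetical, and the `docx2txt "$1" -` call assumes the upstream docx2txt.pl behaviour where `-` means standard output.

```shell
# Hypothetical helper: print the word count of one document, chosen by extension.
count_words() {
    case "${1##*.}" in
        txt)  wc -w < "$1" ;;
        odt)  odt2txt "$1" | wc -w ;;
        pdf)  pdf2txt "$1" | wc -w ;;
        docx) docx2txt "$1" - | wc -w ;;   # assumes "-" sends the text to stdout
        *)    return 1 ;;                  # not a supported document type
    esac
}

# Hypothetical driver: "COUNT FILENAME" per document, then "TOTAL total".
count_all() {
    local total=0 n f
    for f in *.*; do
        [ -f "$f" ] || continue            # skip directories and an unmatched "*.*"
        n=$(count_words "$f") || continue  # skip unsupported extensions
        printf '%s %s\n' "$n" "$f"
        total=$((total + n))
    done
    printf '%s total\n' "$total"
}
```

Usage: `count_all > wordcount_file`. Keeping the per-type logic in one function means adding another format only touches the `case`.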

Last edited by computersavvy; 07-12-2021 at 02:52 PM.
 
1 member found this post helpful.
07-12-2021, 06:13 PM   #18
igadoter
Senior Member

Registered: Sep 2006
Location: Wroclaw, Poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Reputation: 625
I think you need the find utility to perform some action on the .txt files. Globbing inside a script may yield strange behavior. The star * in a find command is a pattern matched by find, not a shell glob of file names (more or less). For example:
Code:
$ find ./ -name '*.txt' -exec foo '{}' \;
foo is a custom script that performs the action on each file found. Just read the manual for find; it has many useful options. Just don't get used to writing poorly designed scripts.
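A minimal sketch of the find suggestion, with `wc -w` standing in for the placeholder `foo` (the wrapper name `count_txt_words` is hypothetical). Ending `-exec` with `{} +` instead of `{} \;` hands many filenames to a single wc run, so wc also prints a total line when it receives more than one file. `-maxdepth 1` keeps find inside the given directory; GNU and BSD find support it, but it is not in POSIX.

```shell
# Hypothetical wrapper: word counts for the .txt files directly inside directory $1 (default: .)
count_txt_words() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.txt' -exec wc -w {} +
}
```

With a very large number of files, find may split them across several wc runs, each printing its own total line.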

Edit: I think you don't need a case statement at all. The conversion programs should detect the file format, so this should work:
Code:
$ pdf2txt "$file" || docx2txt "$file" || odt2txt "$file"
The order depends on which kind of file is most frequent.
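Each command in the `||` chain needs the filename as an argument, and a converter that fails half-way could still push partial output into a pipe. One way to sketch the idea is to capture each attempt and print only the first one that succeeds. This assumes every converter exits with a non-zero status when it cannot read the file, which is not verified here for each tool; the function name `try_converters` is hypothetical.

```shell
# Hypothetical helper: try_converters FILE CMD...
# Runs "CMD FILE" for each CMD in turn and prints the output of the first that succeeds.
try_converters() {
    local file=$1 cmd out
    shift
    for cmd in "$@"; do
        if out=$("$cmd" "$file" 2>/dev/null); then
            printf '%s\n' "$out"
            return 0
        fi
    done
    return 1   # no converter accepted the file
}
```

Usage: `try_converters "$file" pdf2txt odt2txt | wc -w`. A converter that writes to a file instead of stdout would need a small wrapper function before it fits this pattern.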

Last edited by igadoter; 07-12-2021 at 06:31 PM.
 
1 member found this post helpful.
07-13-2021, 04:34 AM   #19
MadeInGermany
Senior Member

Registered: Dec 2011
Location: Simplicity
Posts: 2,794

Reputation: 1201
Quote:
Originally Posted by salmanahmed
Please see the red-quoted text. In the beginning the variable "txtfound" is declared without any value, then later the value "1" is assigned to it. I am not sure, but does "1" here mean that the file is present?
How does "txtfound" work here?
Thanks
Yes, a value of 1 (non-empty) means that a .txt file was found.
[ -z "$txtfound" ]
is true if the variable is empty (zero length).
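The flag idea can be packaged as a small function; this is only a sketch, and the name `has_txt_files` is hypothetical. `[ -n ... ]` is the opposite of `[ -z ... ]`: it is true when the string is non-empty.

```shell
# Hypothetical helper: succeeds if directory $1 (default: .) holds at least one regular .txt file.
has_txt_files() {
    local found= f                # "found" starts out empty
    for f in "${1:-.}"/*.txt; do
        [ -f "$f" ] || continue   # an unmatched glob stays literal ("./*.txt"), so skip it
        found=1                   # any non-empty value works as the flag
    done
    [ -n "$found" ]               # true only if the loop saw a real file
}
```

Usage: `if ! has_txt_files; then ...convert the other documents...; fi`, which means the same as `if [ -z "$txtfound" ]`.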

A correction:
Code:
    pdf2txt -o "$filename".txt "$filename"
I kept the intent of post #1; perhaps that needs a correction as well.

Last edited by MadeInGermany; 07-13-2021 at 04:47 AM.
 
1 member found this post helpful.
07-13-2021, 12:00 PM   #20
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by igadoter
I think you need the find utility to perform some action on the .txt files. Globbing inside a script may yield strange behavior. The star * in a find command is a pattern matched by find, not a shell glob of file names (more or less). For example:
Code:
$ find ./ -name '*.txt' -exec foo '{}' \;
foo is a custom script that performs the action on each file found. Just read the manual for find; it has many useful options. Just don't get used to writing poorly designed scripts.

Edit: I think you don't need a case statement at all. The conversion programs should detect the file format, so this should work:
Code:
$ pdf2txt "$file" || docx2txt "$file" || odt2txt "$file"
The order depends on which kind of file is most frequent.
Maybe 'find' would also work in this situation (I am not sure), but on a lighter note, I will reply by quoting a lyric from Daft Punk's song "Get Lucky":
Quote:
we've come too far to give up who we are
As a newbie in bash scripting, I have put so much effort into this script that even the thought of rewriting it makes me tired. I will definitely rest for a few days after completing it.

Last edited by salmanahmed; 07-13-2021 at 12:10 PM.
 
07-13-2021, 12:05 PM   #21
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by computersavvy
I rewrote my proposed script, made only one for loop, and simplified the processing. It also writes out both the name of each processed file and its word count. If the filenames are not needed, simply remove the echo statements.
No, it's not calculating the word count of all the files. The suggestions made by "MadeInGermany" worked perfectly. However, I must say that you also helped me a lot. I really appreciate that you spared some of your precious time to look into my problem.
Thanks a lot, buddy.
 
07-13-2021, 12:07 PM   #22
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by MadeInGermany
Yes, a value of 1 (non-empty) means that a .txt file was found.
[ -z "$txtfound" ]
is true if the variable is empty (zero length).

A correction:
Code:
    pdf2txt -o "$filename".txt "$filename"
I kept the intent of post #1; perhaps that needs a correction as well.
Initially I was confused, but then "man test" explained the "-z" option to me, and after that everything was clear. Your suggestions solved my problem. Thanks a lot for sparing your precious time for me.
 
07-13-2021, 12:08 PM   #23
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
One last thing before closing the topic: can you please recommend some good books on bash programming for the following levels:
1. Beginner level
2. Intermediate level
3. Advanced level

Thanks
 
07-13-2021, 12:22 PM   #24
computersavvy
Senior Member

Registered: Aug 2016
Posts: 3,345

Reputation: 1484
This and this are very good tutorials, among many others found with a simple online search for "bash tutorial" or similar.

Last edited by computersavvy; 07-13-2021 at 12:32 PM.
 
1 member found this post helpful.
07-13-2021, 12:55 PM   #25
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by computersavvy
This and this are very good tutorials, among many others found with a simple online search for "bash tutorial" or similar.
Thanks a lot
 
07-13-2021, 05:45 PM   #26
boughtonp
Senior Member

Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Reputation: 2546
Quote:
Originally Posted by salmanahmed
Maybe 'find' would also work in this situation (I am not sure), but on a lighter note, I will reply by quoting a lyric from Daft Punk's song "Get Lucky":
Quote:
we've come too far to give up who we are
...
That's not always a good approach (and not quite what they are advocating), so maybe you should take your inspiration from track twelve instead.


Anyway, not a book/tutorial, but ShellCheck is a really useful tool which can highlight (some) bugs and warn against potential issues.


Last edited by boughtonp; 07-13-2021 at 05:49 PM.
 
1 member found this post helpful.
07-14-2021, 07:00 AM   #27
salmanahmed
Member
 
Registered: Jun 2020
Posts: 158

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by boughtonp
Anyway, not a book/tutorial, but ShellCheck is a really useful tool which can highlight (some) bugs and warn against potential issues.
Great utility. Thanks a lot
 
07-14-2021, 09:37 AM   #28
dugan
LQ Guru

Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,226

Reputation: 5320
This is one of the newer BASH books. It has a good reputation.

https://linuxcommand.org/tlcl.php
 
1 member found this post helpful.
07-14-2021, 11:10 AM   #29
igadoter
Senior Member

Registered: Sep 2006
Location: Wroclaw, Poland
Distribution: many, primary Slackware
Posts: 2,717
Blog Entries: 1

Reputation: 625
OK, what about this:
Code:
DOCX=(*.docx)
ODT=(*.odt)
PDF=(*.pdf)

# adjust each conversion command so that it produces a file with a .txt suffix
for i in ${DOCX[@]} ; do docx2txt "$i" ; done
for i in ${ODT[@]} ; do odt2txt "$i" ; done
for i in ${PDF[@]} ; do pdf2txt "$i" ; done

# finally, count the words in every .txt file
wc -w *.txt > total_word_count
For example:
Code:
$ wc -w *.info > /tmp/total_word_info
$ cat /tmp/total_word_info 
  11 Jinja2.info
  11 MarkupSafe.info
  21 Sphinx.info
  11 alabaster.info
  11 imagesize.info
  11 mando.info
  15 python3-babel.info
  11 pytz.info
  11 snowballstemmer.info
  11 sphinxcontrib-applehelp.info
  11 sphinxcontrib-devhelp.info
  11 sphinxcontrib-htmlhelp.info
  11 sphinxcontrib-jsmath.info
  11 sphinxcontrib-qthelp.info
  11 sphinxcontrib-serializinghtml.info
 179 total
So you really don't want to go back? At least run what I posted.
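One caveat with `DOCX=(*.docx)`: when nothing matches, bash keeps the pattern itself, so the array holds the literal string `*.docx`. A hedged sketch of just the final counting step, using bash's `nullglob` option (an unmatched pattern then expands to nothing) and quoted array expansion, could look like this; the name `count_txt_total` is hypothetical.

```shell
# Hypothetical helper: wc -w over all .txt files in the current directory.
# Returns 1 (and prints nothing) when there are no .txt files.
count_txt_total() {
    local -a txt
    local had_nullglob=0
    shopt -q nullglob && had_nullglob=1   # remember the caller's setting
    shopt -s nullglob                     # unmatched patterns now expand to nothing
    txt=(*.txt)
    [ "$had_nullglob" -eq 1 ] || shopt -u nullglob
    [ "${#txt[@]}" -gt 0 ] || return 1
    wc -w -- "${txt[@]}"                  # quoted: names with spaces stay intact
}
```

The same `nullglob` guard could be placed around the `DOCX`, `ODT` and `PDF` assignments so that the conversion loops simply do nothing when a type is absent.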
 
1 member found this post helpful.
07-14-2021, 12:45 PM   #30
MadeInGermany
Senior Member

Registered: Dec 2011
Location: Simplicity
Posts: 2,794

Reputation: 1201
You can and should quote the @ references in order to protect them from expansions (field splitting and filename generation).
Code:
DOCX=(*.docx)
for i in "${DOCX[@]}" ; do docx2txt "$i" ; done
Or directly feed the loop:
Code:
for i in *.docx; do docx2txt "$i" ; done
Field splitting: split at $IFS (normally whitespace).
Filename generation: expand wildcards like * with matching filenames.
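A small sketch of both expansions (the array contents and the helper `count_args` are made up for illustration): an unquoted `${files[@]}` is split at whitespace, and an unquoted element containing a wildcard is replaced by matching filenames, while the quoted form passes every element through unchanged.

```shell
# count_args is a hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

files=("my report.txt")          # ONE element whose name contains a space
count_args ${files[@]}           # prints 2: field splitting produced "my" and "report.txt"
count_args "${files[@]}"         # prints 1: the element is passed through intact

patterns=("*.txt")               # ONE element that happens to contain a wildcard
printf '[%s]\n' ${patterns[@]}   # filename generation: replaced by matching names, if any
printf '[%s]\n' "${patterns[@]}" # prints [*.txt]: the quoted element stays literal
```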
 
1 member found this post helpful.
  

