LinuxQuestions.org
Old 06-07-2012, 10:33 AM   #1
Jajamd
Member
 
Registered: Aug 2004
Posts: 37

Rep: Reputation: 0
Read variable from wordlist within a script


Hello everyone,

A newbie requests your help.
I'm working on the creation of authentic wordlists as part of my linguistic studies. I chanced upon a script the other day to make wordlists from twitter:

#!/bin/bash

wget -q "http://search.twitter.com/search.json?q=$1&rpp=1000"

cat search.json* | tr "," \\n | grep "^\"text" | cut -d"\"" -f4- | tr " " \\n | sed s/\"//g | sed s/\^\#//g | sed s/\^\@//g | grep -v "^http:" | grep -v "\\\\" | sort | uniq > $1.txt

rm -f search.json*


The thing is that I would like the script to read $1 from a text file containing a list of predefined words, rather than having to specify each word manually.

Any idea how I could do that?

Thank you very much !
 
Old 06-07-2012, 10:37 AM   #2
Ian John Locke II
Member
 
Registered: Mar 2008
Location: /dev/null
Distribution: Slackware, Android, Slackware64
Posts: 130

Rep: Reputation: 17
I saw the same script and it wasn't for linguistic studies. If I remember correctly, it was meant to build a wordlist for trying to crack the passwords of Twitter users with specified interests. As such, I'm pretty sure that kind of content is not welcome here.
 
Old 06-07-2012, 10:41 AM   #3
Jajamd
Member
 
Registered: Aug 2004
Posts: 37

Original Poster
Rep: Reputation: 0
Well, I guess linguists and crackers use the same tools. That makes sense. In all honesty, I'm not trying to crack Twitter accounts. I'm trying to establish a list of authentic words related to a specific field of interest and analyze them. I've done it for Wikipedia, and since I saw it was possible for Twitter, I thought, why not give it a try... That's all.
 
Old 06-07-2012, 11:03 AM   #4
Ian John Locke II
Member
 
Registered: Mar 2008
Location: /dev/null
Distribution: Slackware, Android, Slackware64
Posts: 130

Rep: Reputation: 17
Well, to automate reading in the lines, you would do:

Code:
while IFS= read -r line; do
    # Do whatever with "$line", e.g.,
    echo "$line"
done < filename
If of course you're not a linguist, I disclaim any responsibility, since a simple Google search would have sufficed.
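
A minimal sketch of how the loop above might be wrapped around the script from post #1, so that each line of a wordlist file takes the place of $1. The names process_word and process_wordlist are hypothetical, and process_word is only a placeholder for the wget-and-filter pipeline; it makes no network request here.

```shell
#!/bin/bash

# Hypothetical placeholder for the per-word work (the wget call and
# the filtering pipeline from post #1 would go here, using "$1").
process_word() {
    printf 'would fetch and process: %s\n' "$1"
}

# Read one search term per line from the file named in the first
# argument and hand each non-empty line to process_word.
process_wordlist() {
    local wordlist=$1 word
    # "|| [[ -n $word ]]" also handles a last line without a trailing newline.
    while IFS= read -r word || [[ -n $word ]]; do
        if [[ -z $word ]]; then
            continue    # skip blank lines
        fi
        process_word "$word"
    done < "$wordlist"
}

# Example call (wordlist.txt is an assumed file name):
#   process_wordlist wordlist.txt
```

Keeping the per-word work in its own function means the loop itself stays the same whatever is done with each word.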
 
Old 06-07-2012, 11:08 AM   #5
Jajamd
Member
 
Registered: Aug 2004
Posts: 37

Original Poster
Rep: Reputation: 0
Thank you Ian. Rest assured, I'm not trying to crack accounts of any kind.
 
Old 06-07-2012, 12:41 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
That's a very ugly bit of code, by the way. It could certainly be replaced by a single, and much more efficient, awk command. If we had an example of the input text to work with (and what needs to be extracted from it), we might give it a try.
 
Old 06-07-2012, 01:54 PM   #7
Ian John Locke II
Member
 
Registered: Mar 2008
Location: /dev/null
Distribution: Slackware, Android, Slackware64
Posts: 130

Rep: Reputation: 17
Quote:
Originally Posted by David the H.
That's a very ugly bit of code, by the way. It could certainly be replaced by a single, and much more efficient, awk command. If we had an example of the input text to work with (and what needs to be extracted from it), we might give it a try.
Not to be inflammatory, but how difficult would it be to run:

Code:
wget -q "http://search.twitter.com/search.json?q=$1&rpp=1000"
with any random search term you might think of instead of $1?

For example, I used food to get this: http://sprunge.us/DZLL
 
Old 06-07-2012, 02:50 PM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Perhaps something like
Code:
wget -q -O - 'http://search.twitter.com/search.json?q='"search"'&rpp=1000' | awk '#
    BEGIN {
        RS = "\"text\":\""
        FS = "\""
    }

    (NF > 1 && length($1) > 0) {
        n = split($1, temp, /[\t\n\v\f\r ]+/)
        for (i = 1; i <= n; i++) {
            w = tolower(temp[i]);

            gsub(/[-!?.,_:;$\047()<>+]+/, "", w)

            # Skip tokens that were nothing but punctuation.
            if (w == "") continue

            if (w ~ /^[@#]/) continue
            if (w ~ /[0-9]/) continue
            if (w ~ /^https*:/) continue
            if (w ~ /^ftps*:/) continue
            if (w ~ /^www\./) continue
            if (w ~ /[.\/].*[.\/]/) continue

            word[w]++
        }
    }

    END {
        for (w in word)
            printf("%s %d\n", w, word[w])
    }' | sort
Each output line contains one word, followed by its frequency (count) as an integer.

The middle of the snippet filters out unwanted words: URLs, hashtags, and @-targets. The gsub() removes typical punctuation.
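
To try the counting and filtering without a network request, a cut-down sketch along the same lines can read plain text from standard input. The function name count_words is hypothetical, the filter list is shorter than in the snippet above, and the output is sorted by count (highest first) instead of alphabetically.

```shell
#!/bin/bash

# Hypothetical helper: count the words read from standard input and
# print "word count" lines, most frequent first.
count_words() {
    awk '{
        for (i = 1; i <= NF; i++) {
            w = tolower($i)

            # Skip hashtags, @-targets and URLs.
            if (w ~ /^[@#]/ || w ~ /^https?:/) continue

            # Strip typical punctuation, then ignore empty leftovers.
            gsub(/[-!?.,_:;()<>+]+/, "", w)
            if (w != "") word[w]++
        }
    }
    END {
        for (w in word)
            printf("%s %d\n", w, word[w])
    }' | sort -k2,2nr -k1,1
}

# Example call:
#   printf "Good food, good mood! #yum @chef\n" | count_words
```

The second sort key (-k1,1) only breaks ties, so words with the same count come out in alphabetical order.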

Last edited by Nominal Animal; 06-07-2012 at 02:51 PM.
 
Old 06-07-2012, 03:17 PM   #9
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by Ian John Locke II
Not to be inflammatory, but how difficult would it be to run: [wget]
Yeah, I could do that. I wasn't paying enough attention.

Anyway, it's late... <looks out at bright yellow thing shining through window> err... early, and I'm too tired right now. I'm getting some sleep.
 
  

