LinuxQuestions.org


mehreen124 02-28-2014 05:11 AM

Query based Search Engine..Help required.
 
It's very urgent and I have no idea how to even get started with it. Could anyone explain how to start and what I am supposed to do in this question?
Design a query-based search engine which can categorize email addresses on the basis of domain name. Assume that the recognized domains are @gmail.com, @hotmail.com, @yahoo.com and @nu.edu.pk.
You must use functions for each requirement mentioned below. In total there should be at least 4 functions.
A text-file based report must be generated reflecting:
- Percentage of email occurrences of each unique domain (a duplicate email entry must be counted once).
- Create a filter to remove duplicate entries from the data set and store the result in a file named “filtered_data.txt”.
- Separate files for each unique domain name, consisting of the emails that belong to it.
- Set the permissions of the files so that only the owner can read and write them, and also make these files hidden through the script.

TenTenths 02-28-2014 06:36 AM

-Homework Assignment Alert-

Why not ask your tutor for clarification on the requirements?

Other than that, yes, it all looks like it can be done. However, as you don't tell us what you can and can't use, what platform it is, or whether you have access to a database or have to do this all with "flat" files and Bash shell scripts, we can't really help.

As with your other thread, post what you've done and we might be able to point you in the right direction.

Oh, and "It's very urgent", it may be for you, but as we all offer our time and experience for free it's not particularly urgent for us.

mehreen124 02-28-2014 06:52 AM

Using flat files and shell scripting. I just need some direction as to how I am supposed to do this task.
Thank you in advance for any help offered.

TenTenths 02-28-2014 07:02 AM

Thing - Command
Searching - grep
Sorting - sort
Counting - wc -l
Splitting line into fields - awk or cut if they are fixed width
Filtering - uniq
Piping output through different commands - |

A lot depends on the source data file format, the above commands should give you a start.
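
As a rough sketch of how those commands chain together, assuming a file with one e-mail address per line (the sample addresses and variable names below are made up for illustration):

```shell
# Hypothetical sample data: one e-mail address per line.
emails=$(mktemp)
printf '%s\n' a@gmail.com b@yahoo.com a@gmail.com c@gmail.com > "$emails"

# grep searches, sort orders, uniq drops adjacent duplicates, wc -l counts.
gmail_unique=$(grep '@gmail\.com$' "$emails" | sort | uniq | wc -l)
echo "unique gmail addresses: $gmail_unique"

# cut splits each line on a delimiter; the field after "@" is the domain.
domains=$(cut -d@ -f2 "$emails" | sort | uniq)
echo "$domains"

rm -f "$emails"
```

Note that uniq only removes duplicates that sit next to each other, which is why sort comes first in both pipelines.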

szboardstretcher 02-28-2014 07:18 AM

Quote:

Originally Posted by TenTenths (Post 5126424)
Thing - Command
Searching - grep
Sorting - sort
Counting - wc -l
Splitting line into fields - awk or cut if they are fixed width
Filtering - uniq
Piping output through different commands - |

A lot depends on the source data file format, the above commands should give you a start.

Looks to me like they are looking for a pure bash script that does this via functions.

Habitual 02-28-2014 07:35 AM

Here's something that can perhaps be modified,

Good luck.

mehreen124 02-28-2014 08:19 AM

If I have a file which is full of email IDs of various domains and I want to use that file to calculate the percentage of email occurrences, how am I supposed to link that file to the function where I actually calculate the percentage of unique occurrences? How can I make checks on the data in that .txt file?

szboardstretcher 02-28-2014 08:27 AM

Have you been paying attention in class? Doing the labs and such?

Have you even tried googling for this information?

We are not here to do your homework step by step for you. If you have a problem, either research it yourself, or post the code you have tried writing that is giving you the problem.

mehreen124 02-28-2014 08:34 AM

I have never used any such forums before and have always done all my work by myself. I never asked for the solution to my question, I only asked for a little guidance. We were not taught such things, that's why I had to ask and learn by myself.
Anyway, thank you for all the help.

szboardstretcher 02-28-2014 08:47 AM

It's righteous that you are doing the work by yourself to learn something. Everyone will agree.

But, you aren't trying the researching bit very hard, evidenced by this: I went to google and typed in part of your question, and the second result is an entire conversation about this exact thing. It took me less than 3 seconds.

http://lmgtfy.com/?q=calculate+the+p...nces+in+a+file

Again, if you need help with part of your code, post it and ask a question. Because it's easier to understand your question when you post the code related to the question.

TenTenths 02-28-2014 09:03 AM

Let's have a little play......

data.txt
Code:

abc@example.com,Mr,Alf,Bravo,"Ordered Cake"
abc@example2.com,Mrs,Anne,Bravo,"Ordered Coffee"
abc@example.com,Mr,Alf,Bravo,"Ordered Coffee"
aaa@example.com,Mrs,Aurora,Ardvark,"Ordered Coffee"
abc@example.com,Mr,Alf,Bravo,"Ordered Cake"

Count the number of unique e-mail addresses in the file
Code:

awk -F, '{print $1}' data.txt | sort | uniq | wc -l
Returns 3, as expected.
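
On the earlier question of how to "link" a file to a function: one common approach is to pass the file name as an argument, which the function then sees as "$1". The following is only a sketch under that assumption; the function name domain_percentages is made up, and it expects the comma-separated layout of data.txt above with the address in the first field.

```shell
# domain_percentages FILE
# Prints each domain's share of the unique e-mail addresses in FILE.
# Hypothetical helper; assumes the address is the first comma-separated field.
domain_percentages() {
    cut -d, -f1 "$1" | sort -u | awk -F@ '
        { count[$2]++; total++ }
        END {
            for (d in count)
                printf "%s %.1f%%\n", d, 100 * count[d] / total
        }' | sort
}

# Recreate the sample data in a temporary file so the sketch runs on its own.
sample=$(mktemp)
cat > "$sample" <<'EOF'
abc@example.com,Mr,Alf,Bravo,"Ordered Cake"
abc@example2.com,Mrs,Anne,Bravo,"Ordered Coffee"
abc@example.com,Mr,Alf,Bravo,"Ordered Coffee"
aaa@example.com,Mrs,Aurora,Ardvark,"Ordered Coffee"
abc@example.com,Mr,Alf,Bravo,"Ordered Cake"
EOF

domain_percentages "$sample"
rm -f "$sample"
```

Because sort -u runs before the counting, each address is counted once no matter how often it appears, so the percentages are taken over the 3 unique addresses rather than the 5 lines.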

Remove complete duplicates from the file
Code:

sort data.txt | uniq
Result:
Code:

aaa@example.com,Mrs,Aurora,Ardvark,"Ordered Coffee"
abc@example2.com,Mrs,Anne,Bravo,"Ordered Coffee"
abc@example.com,Mr,Alf,Bravo,"Ordered Cake"
abc@example.com,Mr,Alf,Bravo,"Ordered Coffee"

So, as you can see with the commands I gave you to look at you can process a data file in a number of ways.
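
To illustrate how the same commands could be wrapped in functions that write files, here is a minimal sketch. The function names and the ".<domain>.txt" naming scheme are assumptions made for this example; the only fixed points are that a leading dot hides a file from a plain ls and that chmod 600 leaves read and write access to the owner only.

```shell
# remove_duplicates IN OUT: copy IN to OUT with repeated lines dropped.
remove_duplicates() {
    sort -u "$1" > "$2"
}

# split_by_domain FILE DIR: one hidden file per domain inside DIR,
# readable and writable by the owner only.
# The ".<domain>.txt" file names are an assumption for illustration.
split_by_domain() {
    cut -d, -f1 "$1" | sort -u | while IFS= read -r address; do
        domain=${address#*@}            # strip everything up to the "@"
        out="$2/.$domain.txt"
        printf '%s\n' "$address" >> "$out"
        chmod 600 "$out"
    done
}

# Run both steps in a scratch directory so nothing real is overwritten.
work=$(mktemp -d)
printf '%s\n' \
    'abc@example.com,Mr,Alf,Bravo,"Ordered Cake"' \
    'abc@example2.com,Mrs,Anne,Bravo,"Ordered Coffee"' \
    'abc@example.com,Mr,Alf,Bravo,"Ordered Cake"' > "$work/data.txt"

remove_duplicates "$work/data.txt" "$work/filtered_data.txt"
split_by_domain "$work/filtered_data.txt" "$work"
ls -la "$work"
rm -rf "$work"
```

Since split_by_domain appends with >>, running it twice against the same directory would repeat the addresses, so a real script would probably clear the old per-domain files first.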

Good luck and come back with what you're trying.

chrism01 03-01-2014 11:48 PM

If this is part of a class (which it sounds like), then your class notes/book should give you all the pieces you need. If they are not clear ask your instructor.

As above, show us your code; we are here to help, but not to give you the solution.

You should read this http://rute.2038bug.com/index.html.gz and see also http://www.grymoire.com/Unix/Awk.html, http://www.grymoire.com/Unix/Sed.html.

grail 03-02-2014 01:26 AM

I am with chrism01, you say you are doing a course but it is expecting you to learn the course material on your own???

Either you have not been paying attention or you have been ripped off.

