Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I have a text file (CSV) that contains missing ranges and overlapping ranges. I am looking for the easiest way to find these gaps and overlaps from the command line so I can use the results in another script.
Below is an example of a missing range:
0,3
4,10
11,14
17,23
As you can see, I am missing 15,16.
Below is an example of an overlapping range:
0,3
4,10
8,10
11,14
As you can see, 8,10 overlaps 4,10 (or, equally, 4,10 overlaps 8,10).
What I would like is one script that outputs the missing ranges to STDOUT and another that outputs the overlapping ranges to STDOUT.
Has anyone done anything similar, or can anyone point me along the right path?
For homework questions, please make a real try at it yourself and ask specific questions about specific details on which you get stuck. Don't ask us to do your homework for you.
Meanwhile: You showed the input sorted. Is the actual input sorted? In that case the problem seems trivial.
If the actual input isn't sorted, it would be easiest to first sort it, then the rest is trivial. I suggest sorting primarily by the first number of the range and in case of a tie by the second number.
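As a rough illustration of that ordering, here is a minimal Python sketch. The helper names `parse_range` and `sort_ranges` are made up for this example, and it assumes each line holds exactly two integers separated by a comma. It keeps every range in memory, so it is only meant to show the sort key, not to handle very large files.

```python
def parse_range(line):
    """Turn a CSV line such as '4,10' into the tuple (4, 10)."""
    start, end = line.strip().split(",")
    return int(start), int(end)


def sort_ranges(lines):
    """Sort by the first number; ties are broken by the second number.

    Python compares tuples element by element, which gives exactly
    that ordering without a custom key function.
    """
    return sorted(parse_range(line) for line in lines if line.strip())


ranges = sort_ranges(["11,14", "4,10", "0,3", "8,10", "4,6"])
```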
Hmm, thanks for that. It's not a homework question. I don't know the best way to tackle this. I have written a PHP script to look for the missing ranges; it works, but it is slow. I was hoping not to load all the data into an array and then process it, as this could and will lead to memory issues. I was hoping someone could point me toward a better solution using sed or awk or something along those lines.
You still didn't say whether it starts out sorted.
If you want to get the job done (rather than turn in a homework answer) and you don't want to read all the data into an array, then your shell script should call a sorting program before it calls your processing program, so the data arrives pre-sorted.
As you read the sorted data, remember the largest value seen so far.
When you read a pair, if the small value of the pair is less than or equal to the largest value already seen, then you can report the overlap.
If the small value of the pair is more than one greater than the largest value seen, you can report the gap.
If the large value of the pair is more than the largest value seen, you must adjust the largest value seen.
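The steps above can be sketched in a few lines of Python. This is only one possible reading of them, under some assumptions: the input is already sorted, the ranges are inclusive integers, and nothing is reported before the first range. The function name `scan` and the tuple format it yields are invented for this example.

```python
def scan(lines):
    """Yield ('gap', lo, hi) or ('overlap', lo, hi) from sorted 'start,end' lines."""
    largest = None  # largest end value seen so far; None until the first pair
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        start, end = (int(field) for field in line.split(","))
        if largest is not None:
            if start <= largest:
                # The pair begins inside territory that is already covered.
                yield ("overlap", start, min(end, largest))
            elif start > largest + 1:
                # At least one value between the two ranges is missing.
                yield ("gap", largest + 1, start - 1)
        if largest is None or end > largest:
            largest = end


for kind, lo, hi in scan(["0,3", "4,10", "11,14", "17,23"]):
    print(f"{kind} {lo},{hi}")
```

Because `scan` is a generator that remembers only one number, it can be given `sys.stdin` directly, for example fed by `sort -t, -k1,1n -k2,2n` on the CSV file, so the whole file never has to sit in memory.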
I was hoping something was already available, either as an option to sort or as something within awk. Like I said before, I already have a PHP script that I wrote to do this, but it is rather clunky. My examples gave a simple view of the data, but in reality I have millions of lines to go through, hence the search for a faster or better method.
So it looks like I will need to rewrite the PHP in bash to get the job done.
Oh, and on re-reading my first post, it does look somewhat like a homework question.
I put together a quick one just for practice. It only checks for missing ranges, but you should be able to modify it for the second test.
- return code is the error count
Gaps are easy.
How much info do you need on overlaps, especially multiple overlaps (or are those impossible)?
1,9
3,7
4,12
It's easy to say the 3,7 overlaps something, and a bit harder to say it overlaps the 1,9.
It's easy to say the 4,9 portion of 4,12 overlaps something, harder to say it overlaps the 1,9, and much harder to say the 4,7 portion of it also overlaps the 3,7.
If you need complicated reporting of unlimited multiple overlaps, maybe you need a program in a more powerful language, such as C++.
If you need fewer details reported, the task is so simple that language shouldn't matter. Reading the file takes the time and processing it isn't significant.
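For what the more detailed kind of report might look like, here is a hedged Python sketch (Python is used only for illustration). It assumes the pairs are already sorted by start and then end, with inclusive integer bounds, and it keeps a list of the earlier ranges that can still overlap something. The name `detailed_overlaps` and the shape of its output are invented for this example.

```python
def detailed_overlaps(pairs):
    """Yield (current, earlier, shared) for every pair of overlapping ranges.

    `pairs` must be (start, end) tuples sorted by start, then end.
    """
    active = []  # earlier ranges that may still overlap a later one
    for start, end in pairs:
        # A range that ends before this start cannot overlap this pair
        # or any later pair, because later starts are at least as large.
        active = [(s, e) for s, e in active if e >= start]
        for s, e in active:
            # Sorted input means s <= start, so the shared part begins at start.
            yield ((start, end), (s, e), (start, min(end, e)))
        active.append((start, end))


for current, earlier, shared in detailed_overlaps([(1, 9), (3, 7), (4, 12)]):
    print(current, "overlaps", earlier, "on", shared)
```

The `active` list only holds ranges that are still open at the current position, so memory use depends on how many ranges overlap at once rather than on the size of the file. In the worst case, where every range overlaps every other, it would still grow to hold them all.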
I don't know from the data whether multiple overlaps are possible. I know I have overlaps, though.
Assuming I read the file sequentially and look back at the previous values, I think it is easy to say that 3,7 overlaps 1,9 but harder to say that 3,7 overlaps 4,12.
I think I will have to analyze the data further to see how many overlaps I have before scripting something to report on them. I have a feeling that my overlaps will be as simple as your 1,9 and 3,7 example, without the 4,12. I will analyze the data and see what I have.