LinuxQuestions.org


saurabhmehan 01-18-2011 11:44 PM

Getting Segmentation fault error while searching string in csv file
 
I am using the grep command to search a file that is 11 GB in size, and I am getting a Segmentation fault error as output.

My command and its output are as follows:
Code:

[sdpuser@gnnsdp40 test]$ cat new* | grep 8858406465
Segmentation fault

My Linux version is as follows:
Code:

[sdpuser@gnnsdp40 test]$ uname -a
Linux gnnsdp40 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Please guide me on how I can search the complete file for the string.
I have also used the Linux split command, which split the 11 GB file into 11 files of up to 1 GB each.
But I still get the same "Segmentation fault" error when using grep.
Please help me with the above.

Thanks in advance.

grail 01-18-2011 11:54 PM

What about if you use grep as intended:
Code:

grep 8858406465 new*
Are there potentially more files starting with new? Maybe try inputting just the filename of an individual file and see what happens?
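
The one-file-at-a-time test suggested above can be scripted. This is only a minimal sketch: `search_each` is a hypothetical helper name, the sample files are throwaway stand-ins for the real newaa, newab, ... pieces, and `LC_ALL=C` with `-F` (fixed-string search) is an assumption about what may sidestep the crash, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run grep on each file separately and report its exit status.
# grep exits 0 on a match, 1 on no match, 2 on error; a status above 128 means
# the process was killed by a signal (139 = 128 + 11, i.e. SIGSEGV).
search_each() {
    pattern=$1
    shift
    for f in "$@"; do
        # -F: fixed string, -c: count only; output is discarded, only the status matters.
        LC_ALL=C grep -F -c -- "$pattern" "$f" >/dev/null 2>&1
        rc=$?
        echo "$f: grep exit status $rc"
    done
}

# Demo on small sample files (assumed names, modelled on the split output).
tmpdir=$(mktemp -d)
printf 'MSISDN is : 8858406465\n' > "$tmpdir/newaa"
printf 'MSISDN is : 8436108539\n' > "$tmpdir/newab"
search_each 8858406465 "$tmpdir"/new*
rm -rf "$tmpdir"
```

A file that reports status 139 would be the one worth inspecting more closely.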

saurabhmehan 01-19-2011 12:09 AM

Reply to Grail
 
Quote:

Originally Posted by grail (Post 4229656)
What about if you use grep as intended:
Code:

grep 8858406465 new*
Are there potentially more files starting with new? Maybe try inputting just the filename of an individual file and see what happens?

It gives no output, although the string is definitely there.
The following is the output shown on the server:
Code:

[sdpuser@gnnsdp40 test]$ grep 8858406465 new*
[sdpuser@gnnsdp40 test]$


matthewg42 01-19-2011 12:17 AM

what is the output of:
Code:

ls -l new*

saurabhmehan 01-19-2011 12:28 AM

Reply to matthewg42
 
Quote:

Originally Posted by matthewg42 (Post 4229673)
what is the output of:
Code:

ls -l new*

The following is the output of that command:
Code:

[sdpuser@gnnsdp40 test]$ ls -l new*
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newaa
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newab
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newac
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newad
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newae
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newaf
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newag
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newah
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newai
-rw-r--r-- 1 sdpuser sdpadmin 1073741824 Jan 19 11:37 newaj
-rw-r--r-- 1 sdpuser sdpadmin  401405086 Jan 19 11:37 newak


matthewg42 01-19-2011 01:26 AM

What's in the files? Are there any really long lines?
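
One way to answer the long-line question without reading the files by eye is to measure the longest line in each piece. A rough sketch, assuming an awk that copes with the line lengths involved; `longest_line` is a hypothetical helper name and the sample file is made up for the demo.

```shell
#!/bin/sh
# Hypothetical helper: print the length in bytes of the longest line in each file.
longest_line() {
    for f in "$@"; do
        # LC_ALL=C makes length() count bytes rather than multibyte characters.
        LC_ALL=C awk -v name="$f" '
            length($0) > max { max = length($0) }
            END { print name ": " (max + 0) }
        ' "$f"
    done
}

# Demo on a small sample file.
tmpfile=$(mktemp)
printf 'short\na much longer line than the first\n' > "$tmpfile"
longest_line "$tmpfile"
rm -f "$tmpfile"
```

A result in the hundreds of megabytes or more would point to a missing newline somewhere in the data.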

saurabhmehan 01-19-2011 01:33 AM

Reply to matthewg42
 
Quote:

Originally Posted by matthewg42 (Post 4229730)
What's in the files? Are there any really long lines?

Yes, some lines are quite long. Sample content is as follows:
Code:

21 Dec 2010 23:59:59,016 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the request soap envelope is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>917871321424</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,017 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the soap envelope object is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>917871321424</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,327 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] AmountChargingPluginImpl::ChargeAmount: Entry
21 Dec 2010 23:59:59,327 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] reading chargeAmount request parameters
21 Dec 2010 23:59:59,327 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] validating MSISDN
21 Dec 2010 23:59:59,327 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] stripped MSISDN is : 8436108539
21 Dec 2010 23:59:59,327 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] The request parameters received from the application:
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE CHARGEABLE AMOUNT =0.0
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE CURRENCY  =INR
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE CODE =
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE DESCRIPTION  = Retrieve-Balance
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE MSISDN  =8436108539
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  THE REFERENCE CODE =balanceEnquiry:true;subsChannel:OTA;channelType:Subscription;transactionId:-50225716%3A12d0a2a4801%3A-7196;pricePtAvl:true;eventType:subscription;contentId:5048;serviceId:PT_LocalNews01;Circle_Name:WB;Circle_ID:1;isRated:Yes;productName:NWLOCAL001;basePrice:0.0;subsType:RECURRING;Sub_Profile:Pre-Paid
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Calling the getBalanceEnquiry now
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] balance enquiry is : true
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] AmountChargingPluginImpl::calling getBalance:
21 Dec 2010 23:59:59,328 DEBUG -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Initialize chgAmtResponse appLog in get Balance Wrapper..
21 Dec 2010 23:59:59,328 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] CommunicationImplDCI:getBalance():Entry
21 Dec 2010 23:59:59,337 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] using url: http://10.16.6.50:8180/axis2/services/SAM_BR_Service_v1_0 username and password : unitech1 unitecha
21 Dec 2010 23:59:59,337 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] constructing get balance request to be sent to IN
21 Dec 2010 23:59:59,337 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  balance retrieve all request sent to IN
21 Dec 2010 23:59:59,337 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the request soap envelope is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>918436108539</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,338 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the soap envelope object is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>918436108539</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,490 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the response soap envelope object is <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><ns1:BalanceRetrAllResp xmlns:ns1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0">

          <StatusCode>0</StatusCode>

          <BalanceBucketTable>1;20091003;20110222;3.00;|</BalanceBucketTable>

          <BalanceVersion>1</BalanceVersion>

          <BalanceBucketMaxRows>20</BalanceBucketMaxRows>

        </ns1:BalanceRetrAllResp></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,491 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the request is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>917871321424</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,491 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,491 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,491 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the response is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><ns1:BalanceRetrAllResp xmlns:ns1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0">

          <StatusCode>0</StatusCode>

          <BalanceBucketTable>1;20091003;20110222;3.00;|</BalanceBucketTable>

          <BalanceVersion>1</BalanceVersion>

          <BalanceBucketMaxRows>20</BalanceBucketMaxRows>

        </ns1:BalanceRetrAllResp></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0]  the get balance response is : {StatusCode=0, BalanceBucketTable=1;20091003;20110222;3.00;|, BalanceVersion=1, BalanceBucketMaxRows=20}
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] ****** Response From IN********
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] Mobile No      :7871321424
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] status code          :0
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] ******
21 Dec 2010 23:59:59,492 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] getBalanceWrapper::getBalance:
21 Dec 2010 23:59:59,493 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] Inside parsing Balance Buckets-[2;, 1;, 27;]
21 Dec 2010 23:59:59,493 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] Current Bucket Lookedup  :1;
21 Dec 2010 23:59:59,493 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] Parse Bucket Details-->{1;=3.00, STR_BUCKET_TOTAL=3.00}
21 Dec 2010 23:59:59,493 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] Bucket details after summation : 1;20091003;20110222;3.00;|
21 Dec 2010 23:59:59,493 INFO  -  [dci_plugin_InstanceTwo#wlng_nt_payment_dci#1.0] the balance bucket is : 1;20091003;20110222;3.00;|
21 Dec 2010 23:59:59,757 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the response soap envelope object is <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><ns1:BalanceRetrAllResp xmlns:ns1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0">

          <StatusCode>0</StatusCode>

          <BalanceBucketTable>1;20091003;20110619;20.50;|13;20101210;20110109;102400.00;|20;20101210;20110109;99854.00;|24;20101208;20110107;334.00;|</BalanceBucketTable>

          <BalanceVersion>1</BalanceVersion>

          <BalanceBucketMaxRows>20</BalanceBucketMaxRows>

        </ns1:BalanceRetrAllResp></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,759 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the request is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:a1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0"><soapenv:Body><a1:BalanceRetrAll><a1:SubscriberID>918436108539</a1:SubscriberID><a1:AgentID>UW_SDP_AGENT</a1:AgentID><a1:ClientID>UW_SDP</a1:ClientID></a1:BalanceRetrAll></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,759 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,759 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,759 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]
21 Dec 2010 23:59:59,759 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the response is : <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><ns1:BalanceRetrAllResp xmlns:ns1="http://telcordia.com/cvas/rcs/SAM/BalanceRetrieve/schemas/BalanceRetrAll/v1_0">

          <StatusCode>0</StatusCode>

          <BalanceBucketTable>1;20091003;20110619;20.50;|13;20101210;20110109;102400.00;|20;20101210;20110109;99854.00;|24;20101208;20110107;334.00;|</BalanceBucketTable>

          <BalanceVersion>1</BalanceVersion>

          <BalanceBucketMaxRows>20</BalanceBucketMaxRows>

        </ns1:BalanceRetrAllResp></soapenv:Body></soapenv:Envelope>
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0]  the get balance response is : {StatusCode=0, BalanceBucketTable=1;20091003;20110619;20.50;|13;20101210;20110109;102400.00;|20;20101210;20110109;99854.00;|24;20101208;20110107;334.00;|, BalanceVersion=1, BalanceBucketMaxRows=20}
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] ****** Response From IN********
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Mobile No      :8436108539
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] status code          :0
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] ******
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] getBalanceWrapper::getBalance:
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Inside parsing Balance Buckets-[2;, 1;, 27;]
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Current Bucket Lookedup  :1;
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Current Bucket Lookedup  :13;
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Current Bucket Lookedup  :20;
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Current Bucket Lookedup  :24;
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Parse Bucket Details-->{1;=20.50, STR_BUCKET_TOTAL=20.50}
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] Bucket details after summation : 1;20091003;20110619;20.50;|13;20101210;20110109;102400.00;|20;20101210;20110109;99854.00;|24;20101208;20110107;334.00;|
21 Dec 2010 23:59:59,760 INFO  -  [dci_plugin_InstanceOne#wlng_nt_payment_dci#1.0] the balance bucket is : 1;20091003;20110619;20.50;|13;20101210;20110109;102400.00;|20;20101210;20110109;99854.00;|24;20101208;20110107;334.00;|


matthewg42 01-19-2011 02:47 AM

By long, I was thinking like over a gigabyte of data without a newline character. Hmm, this looks grepable. Can't see anything here which would cause grep to behave badly.

Perhaps there are some embedded NUL characters. I've found that non-GNU implementations of grep sometimes freak out with embedded NULs.

Perl handles them well, so you might have some joy by emulating grep with Perl:

Code:

perl -n -e 'print if (/8858406465/);' new*
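
Whether embedded NULs are present at all can be checked directly, and they can be filtered out before grep sees the data. A minimal sketch, assuming NULs really are the cause (which is not established here); `count_nuls` and `grep_without_nuls` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: count NUL bytes by deleting every byte except NUL
# and counting what is left.
count_nuls() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: strip NUL bytes, then search the cleaned stream
# for a fixed string.
grep_without_nuls() {
    pattern=$1
    file=$2
    tr -d '\000' < "$file" | LC_ALL=C grep -F -- "$pattern"
}

# Demo on a small sample file containing two NUL bytes.
tmpfile=$(mktemp)
printf 'abc\000def 8858406465\n\000xyz\n' > "$tmpfile"
count_nuls "$tmpfile"
grep_without_nuls 8858406465 "$tmpfile"
rm -f "$tmpfile"
```

If the count is zero for every file, NULs can be ruled out and the long-line theory becomes more likely.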

saurabhmehan 01-19-2011 03:06 AM

Reply to matthewg42
 
Quote:

Originally Posted by matthewg42 (Post 4229796)
By long, I was thinking like over a gigabyte of data without a newline character. Hmm, this looks grepable. Can't see anything here which would cause grep to behave badly.

Perhaps there are some embedded NUL characters. I've found that non-GNU implementations of grep sometimes freak out with embedded NULs.

Perl handles them well, so you might have some joy by emulating grep with Perl:

Code:

perl -n -e 'print if (/8858406465/);' new*

Hi Matthewg42
It's working, but it only finds the string near the end of the file, not throughout the complete file. I know this because the data in the output starts at 18 Jan 2011 23:39:20,246 and ends at 18 Jan 2011 23:59:57,520.
Command used is as follows:
Code:

perl -n -e 'print if (/18 Jan 2011 22/);' DCILog_DEBUG.csv.2011-01-18
If possible, please also help me split the file into pieces that are readable. I tried the following command, and only the last file it produced is readable:
Code:

split --bytes=1024m DCILog_DEBUG.csv.2011-01-18

matthewg42 01-19-2011 03:39 AM

Quote:

Originally Posted by saurabhmehan (Post 4229811)
It's working, but it only finds the string near the end of the file, not throughout the complete file.

I don't understand what you are saying. I also don't see how you can be sure the pattern occurs anywhere in the file other than where the search command finds it... If you already know what is in the file and where, why are you trying to work out a command to find it?

Quote:

Originally Posted by saurabhmehan (Post 4229811)
I know this because the data in the output starts at 18 Jan 2011 23:39:20,246 and ends at 18 Jan 2011 23:59:57,520.
Command used is as follows:
Code:

perl -n -e 'print if (/18 Jan 2011 22/);' DCILog_DEBUG.csv.2011-01-18

I don't know the structure of your data, so everything after the "because" in your statement is meaningless to me.
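
A quick way to see which time ranges the file actually covers is to count timestamped lines per hour. This is only a sketch: it assumes every log record starts with a timestamp in the `18 Jan 2011 23:39:20,246` form shown in the sample, and `lines_per_hour` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count lines per hour, keyed on the first 14 characters
# of the timestamp (e.g. "18 Jan 2011 23"). Continuation lines that do not
# start with a timestamp, such as the indented XML, are skipped.
lines_per_hour() {
    LC_ALL=C awk '
        /^[0-9][0-9] [A-Z][a-z][a-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:/ {
            count[substr($0, 1, 14)]++
        }
        END { for (hour in count) print hour, count[hour] }
    ' "$1" | sort
}

# Demo on a small sample file.
tmpfile=$(mktemp)
printf '18 Jan 2011 22:10:00,001 INFO  -  first\n' > "$tmpfile"
printf '          <StatusCode>0</StatusCode>\n' >> "$tmpfile"
printf '18 Jan 2011 23:39:20,246 INFO  -  second\n' >> "$tmpfile"
printf '18 Jan 2011 23:59:57,520 INFO  -  third\n' >> "$tmpfile"
lines_per_hour "$tmpfile"
rm -f "$tmpfile"
```

If hour 22 shows up in this count but a search for it prints nothing, the search itself is the problem; if it is absent, the file simply does not contain that hour.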

Quote:

Originally Posted by saurabhmehan (Post 4229811)
If possible, please also help me split the file into pieces that are readable. I tried the following command, and only the last file it produced is readable:
Code:

split --bytes=1024m DCILog_DEBUG.csv.2011-01-18

You can use the split command to do this. GNU's implementation of split can split by size in bytes, number of lines and various other criteria. I would probably go for lines, since this is log-like data. For example, to split into chunks of 100,000 lines each:
Code:

split --lines=100000 inputfile
This will create files called xaa, xab, xac....
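
Splitting by lines rather than bytes means no log line is cut in half at a chunk boundary, so each piece can be searched on its own. A minimal sketch combining the split with a search that lists only the chunks containing the string; `split_and_search` and the `chunk_` prefix are hypothetical names, and the tiny chunk size is just for the demo.

```shell
#!/bin/sh
# Hypothetical helper: split a file into chunks of N lines each, then list
# the chunks that contain a fixed string.
split_and_search() {
    input=$1
    pattern=$2
    lines=$3
    prefix=$4
    # -l is the short form of --lines; the last argument sets the output
    # prefix, so chunks are named <prefix>aa, <prefix>ab, ...
    split -l "$lines" "$input" "$prefix"
    # -l here makes grep print only the names of matching files.
    LC_ALL=C grep -F -l -- "$pattern" "$prefix"*
}

# Demo: five lines split into chunks of two; the match is on line 4.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n8858406465\nfive\n' > "$tmpdir/inputfile"
split_and_search "$tmpdir/inputfile" 8858406465 2 "$tmpdir/chunk_"
rm -rf "$tmpdir"
```

For the real log, the chunk size would be 100000 as in the command above, and the matching chunks could then be searched individually.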


All times are GMT -5.