Small C++ project

Rawful · 06-02-2011, 05:39 PM

I am new to Linux and fairly new to C++, and I have a small project that I am trying to do for my job to help us out. Basically, I need to make a tool that lets me view all the sectors of a hard drive in hexadecimal format to make sure they are all zero after a low-level format. I need it to be very minimal, and display the data in a way that I can scroll down and skim through the sectors. Doesn't have to be pretty, just functional.

I need it to do more things down the road, but this is the first hurdle I need to overcome. I would like to create a GUI interface so it looks nice, but first I am only concerned with the sector viewing function. I am not entirely sure where I should start.

I see there is a tool called dd I could use to read the hard drive and I am wondering if I need to use that, or if I can just open /dev/hda as a file and be able to view all the sectors that way.

I am really hoping someone has a bit of spare time and can get in touch with me and help me along.

Also, just to clarify, I want to write this tool for Linux, specifically DSL (Damn Small Linux). I need a very small distribution that can be loaded quickly from a USB drive, a CD, or over the network with PXE.

Sergei Steshenko · 06-02-2011, 06:09 PM

Quote:

Originally Posted by Rawful

... I need to make a tool that lets me view all the sectors of a hard drive in hexadecimal format to make sure they are all zero after a low-level format. ...

And why not use, say, 'dd' for that? That is, use 'dd' to copy the partition into a file, and then verify that the file contains no non-zero data.

And this verification can be done simply by comparing it with a file of the same size that is known to contain only zeros; this reference all-zeros file can also be created with 'dd'.

And for comparison you can use 'diff' or 'cmp'.

So why write anything in C++?

Rawful · 06-02-2011, 06:37 PM

I want to write a C++ program to do it because of what I will be doing with it later. Instead of saving the data to a file, I would like the program to display it. As I said, this is just the first hurdle for me. Perhaps, instead of recreating the functionality of dd, it would be easier to execute the shell command from within the C++ program and retrieve its output.
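If you do go the route of running `dd` from inside the program, one way is POSIX `popen()`, which starts a shell command and hands you its standard output as a `FILE *`. A minimal sketch, assuming a POSIX system; `run_command` is a hypothetical helper name, and it reads with `fread` rather than `fgets` because `dd` output is binary and may contain zero bytes and newline bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: run `command` through the shell with popen() and
// return everything it writes to stdout. Throws if the command can't start.
std::string run_command(const std::string& command)
{
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr)
        throw std::runtime_error("popen failed: " + command);

    std::string output;
    char chunk[4096];
    std::size_t n;
    // fread, not fgets: binary data (zero bytes, 0x0a bytes) is kept intact
    while ((n = std::fread(chunk, 1, sizeof chunk, pipe)) > 0)
        output.append(chunk, n);

    pclose(pipe);   // returns the command's exit status; ignored in this sketch
    return output;
}
```

For example, `run_command("dd if=/dev/hda bs=512 skip=100 count=1 2>/dev/null")` would return sector 100 as a 512-byte string when run with enough privileges to read the device. Pushing a whole 40 GB drive through a `std::string` is not practical, though; reading the device directly, as suggested later in the thread, avoids the extra process and the copy.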

Sergei Steshenko · 06-02-2011, 06:40 PM

Quote:

Originally Posted by Rawful

I want to write a C++ program to do it because of what I will be doing with it later. Instead of saving the data to a file, I would like the program to display it. As I said, this is just the first hurdle for me. Perhaps, instead of recreating the functionality of dd, it would be easier to execute the shell command from within the C++ program and retrieve its output.

Again, why a C++ program and not a script using the suggested commands?

Rawful · 06-02-2011, 06:52 PM

I do not understand; could you please elaborate? Would the shell script just run dd and have it write to a file? Would there be a way to send the output to a C++ program instead? I cannot sidestep the need for the C++ program because of what it will also be used for in the future, but I may just not be understanding what you are suggesting.

To elaborate a little further, I will be using this tool on IDE/SATA drives of any size over 40GB in lots of different computers (all x86 based, though). I will not be installing any OS on these machines after wiping, but I need to create this tool to check the drives after their wipe.

SigTerm · 06-02-2011, 11:03 PM

Quote:

Originally Posted by Rawful

... I need to make a tool that lets me view all the sectors of a hard drive in hexadecimal format to make sure they are all zero after a low-level format. ...

You're reinventing the wheel - this has already been done. Linux has multiple hex viewers (hexdump, for example), and you can open any "disk" as a file. Just open the disk device file (say /dev/sda for the whole drive, or /dev/sda1 for its first partition) with any hex viewer and you'll be able to see the sectors. There's absolutely no need to write another C++ program for that.

Sergei Steshenko · 06-03-2011, 02:47 AM

Quote:

Originally Posted by Rawful

I do not understand; could you please elaborate? Would the shell script just run dd and have it write to a file? Would there be a way to send the output to a C++ program instead? I cannot sidestep the need for the C++ program because of what it will also be used for in the future, but I may just not be understanding what you are suggesting.

To elaborate a little further, I will be using this tool on IDE/SATA drives of any size over 40GB in lots of different computers (all x86 based, though). I will not be installing any OS on these machines after wiping, but I need to create this tool to check the drives after their wipe.

You can boot Linux from USB/CD/DVD. What I am saying is that you most likely do not need a viewer; it appears all you need to know is whether all the sectors are 0. As another poster has already pointed out, there is 'hexdump' to view the contents. If you want to browse text files (for example, hexdump output of the non-zero sectors from your wiped drive), there is the 'less' utility.

The "output to a C++ program" statement is vague. But your C++ program (unneeded, IMO) can read from stdin, from a file, or from a named pipe.
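To illustrate the stdin/file/named pipe point: a check written against `std::istream` works unchanged whether it is fed `std::cin` (for example `dd if=/dev/sda bs=1M | ./zerocheck`, where `zerocheck` is a hypothetical program name), an `std::ifstream` opened on the device, or a FIFO created with `mkfifo`. A minimal sketch; both function names are hypothetical, and like `cmp` against a file of zeros, it reports the offset of the first non-zero byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: scan a binary stream (std::cin, a file, a named pipe)
// and return the offset of the first non-zero byte, or std::nullopt if every
// byte is zero. In the all-zero case `total` receives the number of bytes read,
// which can be compared with the drive size to confirm nothing was cut short.
std::optional<std::uint64_t> first_nonzero_offset(std::istream& in,
                                                  std::uint64_t& total)
{
    std::vector<char> buf(1 << 20);                // 1 MiB per read
    std::uint64_t pos = 0;                         // stream offset of buf[0]
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();   // less than asked on the last read
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[static_cast<std::size_t>(i)] != 0)
                return pos + static_cast<std::uint64_t>(i);
        pos += static_cast<std::uint64_t>(got);
    }
    total = pos;
    return std::nullopt;
}

// Hypothetical convenience wrapper for a path: a device node, a regular file,
// or a FIFO. Throws if the path cannot be opened.
std::optional<std::uint64_t> first_nonzero_offset_in_file(const std::string& path,
                                                          std::uint64_t& total)
{
    std::ifstream in(path, std::ios::binary);      // binary: no newline translation
    if (!in) throw std::runtime_error("cannot open " + path);
    return first_nonzero_offset(in, total);
}
```

In a `main` that reads from a pipe, pass `std::cin` to `first_nonzero_offset`; on Linux there is no newline translation on standard input, so the bytes from `dd` arrive unchanged.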

Rawful · 06-03-2011, 06:58 AM

I realized before making the first post that I can do that part of what I need without writing a C++ program. However, that is not all the program is going to do, which is what I am trying to get across. It will do a lot more than just let you see the sectors of the drive. So, with that in mind, the program is a necessity.

I am not sure how to be less vague. Like any other hex viewer, my program will have a large text display to neatly format and show the sectors on the hard drive. If I can just open the hard drive as a file in C++, what would the syntax be to read each sector? If it has to get the information from, say, 'dd,' what would be the syntax for that?

I'm really not trying to be difficult or make unnecessary work, but this NEEDS to be in a C++ program.
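For the display itself, one common layout is the one `hexdump -C` uses: the byte offset, 16 bytes in hex split into two groups of eight, then the printable ASCII characters. A minimal sketch of formatting one such row; `hex_line` and the exact column layout are assumptions, and the resulting strings could fill a GUI text widget just as well as a terminal.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format up to 16 bytes that start at absolute byte
// `offset` as one hexdump -C style row, e.g.
// "000001f0  55 aa 00 ...  |U...|". Short rows are padded so columns line up.
std::string hex_line(std::uint64_t offset, const unsigned char* data, std::size_t len)
{
    if (len > 16) len = 16;
    char cell[24];
    std::snprintf(cell, sizeof cell, "%08llx ",
                  static_cast<unsigned long long>(offset));
    std::string line = cell;

    for (std::size_t i = 0; i < 16; ++i) {
        if (i == 8) line += ' ';                   // gap between the two 8-byte groups
        if (i < len) {
            std::snprintf(cell, sizeof cell, " %02x", data[i]);
            line += cell;
        } else {
            line += "   ";                         // padding for a short final row
        }
    }

    line += "  |";
    for (std::size_t i = 0; i < len; ++i)
        line += std::isprint(data[i]) ? static_cast<char>(data[i]) : '.';
    line += '|';
    return line;
}
```

A 512-byte sector is 32 rows, so row `r` of sector `s` starts at offset `s * 512 + r * 16`. Using `unsigned char` matters here: a plain `char` holding `0x9c` is negative on x86 and would print as `ffffff9c` with `%x`.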

SigTerm · 06-03-2011, 07:21 AM

Quote:

Originally Posted by Rawful

If I can just open the hard drive as a file in C++, what would the syntax be to read each sector?

open/lseek/read or fopen/fseek/fread, as with any normal file. You may need to get the sector size from somewhere, though - sectors have traditionally been 512 bytes, but there are now drives with larger (4096-byte) physical sectors.
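One catch with `fseek` on a 32-bit system such as DSL: it takes a `long` offset, which cannot go past 2 GB, so most sectors of a 40 GB drive are out of reach unless 64-bit offsets are used (`fseeko` with `_FILE_OFFSET_BITS=64`, or the POSIX `open`/`pread` calls). A minimal sketch of the `pread` route, assuming 512-byte sectors unless told otherwise; `read_sector` is a hypothetical helper, and sector numbers are 0-based as in LBA addressing.

```cpp
#define _FILE_OFFSET_BITS 64   // 64-bit off_t even on 32-bit builds
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>             // open(), O_RDONLY for the caller
#include <sys/types.h>
#include <unistd.h>            // pread()

// Hypothetical helper: read sector `lba` (0-based) from descriptor `fd` into
// `buf`, which must hold `sector_size` bytes. Returns the number of bytes read:
// sector_size normally, fewer for a truncated last sector, 0 past the end,
// or -1 on error (errno is set).
ssize_t read_sector(int fd, std::uint64_t lba, unsigned char* buf,
                    std::size_t sector_size = 512)
{
    const off_t start = static_cast<off_t>(lba * sector_size);
    std::size_t done = 0;
    while (done < sector_size) {
        ssize_t n = pread(fd, buf + done, sector_size - done,
                          start + static_cast<off_t>(done));
        if (n < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal: retry
            return -1;
        }
        if (n == 0) break;                  // end of device or file
        done += static_cast<std::size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```

Typical use is `int fd = open("/dev/hda", O_RDONLY);` followed by `read_sector(fd, n, buf)` for each sector `n`; because `pread` takes the offset explicitly, there is no separate seek step to get wrong.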

Rawful · 06-03-2011, 08:46 AM

We don't encounter many drives with advanced format, as we are only dealing with used machines that are typically at least 3 years old. The majority of drives will be 40GB or 80GB. I was assuming that I would be segregating the sectors myself (starting at sector 1 and incrementing every 512 bytes). I know that it is possible to read from the drive itself what the model, serial number, number of sectors, and number of bytes per sector are, but that is an entirely different bridge that I will have to cross later.
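When you get to that bridge: on Linux, the capacity of a block device and its logical sector size can be queried with the `BLKGETSIZE64` and `BLKSSZGET` ioctls from `<linux/fs.h>`. Model and serial number come from a different interface and are not covered here. A minimal sketch; `DriveGeometry` and `query_geometry` are hypothetical names, and because these ioctls only work on block devices, a regular file or a missing path simply yields `false`.

```cpp
#include <cstdint>
#include <fcntl.h>
#include <linux/fs.h>      // BLKGETSIZE64, BLKSSZGET
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical result type for query_geometry().
struct DriveGeometry {
    std::uint64_t size_bytes = 0;   // total capacity in bytes
    int sector_size = 0;            // logical sector size in bytes (usually 512)
    std::uint64_t sectors() const {
        return sector_size > 0 ? size_bytes / static_cast<std::uint64_t>(sector_size) : 0;
    }
};

// Hypothetical helper: fill `geo` for the block device at `path`.
// Returns false if the path cannot be opened or is not a block device.
bool query_geometry(const char* path, DriveGeometry& geo)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    std::uint64_t size = 0;
    int ssz = 0;
    bool ok = ioctl(fd, BLKGETSIZE64, &size) == 0 &&
              ioctl(fd, BLKSSZGET, &ssz) == 0;
    close(fd);

    if (ok) {
        geo.size_bytes = size;
        geo.sector_size = ssz;
    }
    return ok;
}
```

For a 40 GB drive with 512-byte sectors, `sectors()` would be around 78 million, which is a useful upper bound for the sector loop.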

Would the most efficient way to read and output this be in blocks of 512 bytes or 1 byte at a time? Can I read it like so (for example):

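#include <cstdio>

int main()
{
    FILE * HDD = fopen("/dev/hda", "r");
    if (HDD == NULL) { perror("/dev/hda"); return 1; }

    unsigned char Bytes[512];   // unsigned, so 0x9c prints as 9c rather than ffffff9c
    unsigned long long c = 0;   // sectors output so far (an 80GB drive has ~156 million)

    // fread returns how many bytes it actually got; fgets is for text and stops
    // at newline bytes, and feof() is only set after a read has already failed
    while (fread(Bytes, 1, sizeof Bytes, HDD) == sizeof Bytes)
    {
        printf("Sector %llu:\n", c);
        for (int b = 0; b < 512; b += 16)   // b: offset of the current line within the sector
        {
            for (int a = 0; a < 16; a++)    // a: byte within the line (16 bytes per line)
                printf("%02x ", Bytes[b + a]);
            printf("\n");
        }
        c++;
    }
    fclose(HDD);
    return 0;
}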

Would this be logically correct? Should I open the file as binary? If so, would I have to read it bit by bit instead of a byte at a time?

Sergei Steshenko · 06-03-2011, 11:26 AM

Quote:

Originally Posted by Rawful

... Would the most efficient way to read and output this be in blocks of 512 bytes or 1 byte at a time? Can I read it like so (for example): ... Would this be logically correct? Should I open the file as binary? ...

Here is a quick-and-dirty (QND) Perl script showing the very first sector of my /dev/sda:

Code:

amdam2:~ # cat -n /home/sergei/junk/try_dev_acces/test.pl
     1  #!/usr/bin/perl
     2
     3  use strict;
     4  use warnings;
     5
     6  my $device = '/dev/sda';
     7  my $sector_length = 512;
     8
     9  my $zeros_buffer = chr(0) x $sector_length;
    10
    11  open(my $fh, '<', $device) or die "cannot open '$device' $!";
    12
    13  my $buffer;
    14
    15  my $number_of_bytes_read = sysread($fh, $buffer, $sector_length);
    16
    17  warn "\$number_of_bytes_read=$number_of_bytes_read";
    18
    19  if($buffer ne $zeros_buffer)
    20    {
    21    my @buffer = split('', $buffer);
    22
    23    print STDERR "buffer: "; map {print STDERR sprintf("%02x ", ord($_))} @buffer; print STDERR "\n";
    24    }
amdam2:~ # /home/sergei/junk/try_dev_acces/test.pl
$number_of_bytes_read=512 at /home/sergei/junk/try_dev_acces/test.pl line 17.
buffer: 31 c0 8e d0 bc 00 7c 8e c0 8e d8 bf 1e 06 be 1e 7c 50 57 b9 e2 01 f3 a4 b9 00 02 f3 ab cb 80 fa 8f 7e 02 b2 80 52 52 bb 94 07 8d af 2a 00 8a 46 04 66 8b 7e 08 66 03 3e b3 06 84 c0 74 0b 80 7e 00 80 75 05 66 89 3e 84 0b 83 c5 10 83 c3 09 80 fb b8 75 da b8 e1 00 c1 e0 02 89 c6 66 8b ac 00 08 66 85 ed 75 19 b8 c5 06 be bb 06 e8 a5 00 89 c6 e8 9a 00 5a 31 c0 cd 13 cd 18 fb f4 eb fc 66 89 2e b3 06 be ab 06 b4 42 5a 52 cd 13 b8 d9 06 72 d7 a0 00 7c 84 c0 74 03 a1 fe 7d 3d 55 aa b8 e9 06 75 c5 66 89 ee 5a e9 55 75 10 00 01 00 00 7c 00 00 00 00 00 00 00 00 00 00 45 72 72 6f 72 20 00 0d 0a 00 4e 6f 20 61 63 74 69 76 65 20 70 61 72 74 69 74 69 6f 6e 00 44 69 73 6b 20 72 65 61 64 20 65 72 72 6f 72 00 4e 6f 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 6d 00 49 6e 76 61 6c 69 64 20 43 48 53 20 72 65 61 64 00 e8 03 00 be c2 06 60 ac b4 0e bb 01 00 cd 10 ac 84 c0 75 f4 61 c3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1c 80 b6 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ca 73 03 00 00 00 00 01 01 00 82 fe bf 0a 3f 00 00 00 0c 34 80 00 80 00 81 0b 83 fe ff ff 4b 34 80 00 9a 24 40 01 00 fe ff ff 83 fe ff ff e5 58 c0 01 3f 34 80 0c 00 fe ff ff 0f fe ff ff 24 8d 40 0e 1d bf f7 2b 55 aa
amdam2:~ #


Why C++?

SigTerm · 06-03-2011, 11:41 AM

Quote:

Originally Posted by Rawful

Would the most efficient way to read and output this be in blocks of 512 bytes or 1 byte at a time?

It is normally faster to read data in large blocks. Also, on a 64-bit system you could probably mmap the entire drive into memory instead of reading it with fread.

Quote:

Originally Posted by Rawful

Should I open the file as binary?

Technically, yes, you should, although on Linux it makes no difference: "r" and "rb" behave the same.

Quote:

Originally Posted by Rawful

If so, would I have to read it bit by bit instead of a byte at a time?

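AFAIK there are no C/C++ functions that let you read a file bit by bit. The byte is the smallest unit you can read; to look at individual bits, read bytes and test them with shifts and masks, e.g. (byte >> n) & 1.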

Quote:

Originally Posted by Rawful

Would this be logically correct? Should I open the file as binary? If so, would I have to read it bit by bit instead of a byte at a time?

The majority of those questions are fairly trivial. You could simply test your program instead of asking.

Rawful · 06-03-2011, 11:53 AM

Quote:

Originally Posted by Sergei Steshenko

Why C++ ?

I do greatly appreciate your tips and input, but please, as I have stated many times now, I am using C++ because of what else I will be doing with this program after I accomplish this. It is not going to just display the sectors. That is one of its functions, but there will be more. That is why I am using C++: I can do everything I need this program to do in C++. I am firm on using C++, and I cannot be dissuaded. I know there are ways to do this task outside of C++, but that would not really be very helpful, as I would still need to import the data into my program for display (and for other purposes which I will be implementing later on).

Quote:

Originally Posted by SigTerm

It is normally faster to read data in large blocks. Also, on a 64-bit system you could probably mmap the entire drive into memory instead of reading it with fread.

How would you accomplish this? Wouldn't you fill up the memory long before you mapped the entire drive?

Quote:

Originally Posted by SigTerm

The majority of those questions are fairly trivial. You could simply test your program instead of asking.

I do apologize for that. I am not at home at the moment, so I have no way to test anything.

SigTerm · 06-03-2011, 05:38 PM

Quote:

Originally Posted by Rawful

How would you accomplish this? Wouldn't you fill up the memory long before you mapped the entire drive?

As far as I know, mmap doesn't LOAD the entire file into memory; it maps the file's contents into the process's address space, and pages are read from disk only when they are accessed. So you should be able to access mmapped file contents even if the file is larger than physical memory and swap combined. AFAIK, that's the whole point of memory-mapped files. You will need a 64-bit system to map an entire drive, because on 32-bit you won't be able to mmap everything - there simply isn't enough address space (a 32-bit process can't address more than 2-4 GB of virtual memory at once, no matter what).
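A minimal sketch of the idea, assuming Linux: map the device (or any file) read-only with `mmap` and scan the mapping for a non-zero byte; the kernel pages data in as the loop touches it. `mmap_first_nonzero` is a hypothetical name. The size comes from `lseek(fd, 0, SEEK_END)`, which on Linux also reports the size of a block device. On a 32-bit system a 40 GB mapping is refused (the size does not even fit in `size_t`), so there you would map one window at a time instead, with the `offset` argument of `mmap` a multiple of the page size.

```cpp
#define _FILE_OFFSET_BITS 64   // 64-bit off_t on 32-bit builds too
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <limits>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `path` read-only and return the offset of the first
// non-zero byte, the total size if every byte is zero, or -1 on error
// (including "too large to map in this address space").
std::int64_t mmap_first_nonzero(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    off_t size = lseek(fd, 0, SEEK_END);      // also works for block devices on Linux
    if (size < 0 ||
        static_cast<std::uint64_t>(size) > std::numeric_limits<std::size_t>::max()) {
        close(fd);
        return -1;
    }
    if (size == 0) {                          // mmap rejects a zero length
        close(fd);
        return 0;
    }

    std::size_t len = static_cast<std::size_t>(size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                // the mapping stays valid after close
    if (p == MAP_FAILED) return -1;

    madvise(p, len, MADV_SEQUENTIAL);         // hint: we will read front to back
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::int64_t result = size;               // all zero until proven otherwise
    for (std::size_t i = 0; i < len; ++i) {
        if (bytes[i] != 0) {
            result = static_cast<std::int64_t>(i);
            break;
        }
    }
    munmap(p, len);
    return result;
}
```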

smallpond · 06-07-2011, 11:12 AM

The standard rule is "First make it work, then make it fast", but for a drive of any size you're going to run into performance issues right away. For a drive of 40 GB (you said they were a few years old), reading and comparing a byte at a time will be slooooow, 40 billion reads, even though the OS will buffer up a 1K or 4K block for you. You probably want to read something like 256KB at a time into your own buffer, scan it for anything non-zero, and then on to the next. That means you're only doing 160,000 reads. Make sure you handle the partial read at the end of the disk correctly.

If that's still too slow, you can use direct I/O to avoid copying or async I/O to be reading the next block while scanning the current one, but get the simple case working first.
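The simple case described above can be sketched with POSIX `read()`: fill a 256 KiB buffer, scan it, and stop after the short (possibly empty) read at the end of the device. A minimal sketch, assuming 512-byte sectors; `nonzero_sectors` is a hypothetical name, and collecting every offending sector number in a vector is a choice you might want to cap on a drive that was not wiped at all.

```cpp
#define _FILE_OFFSET_BITS 64   // 64-bit file offsets on 32-bit builds too
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>             // open(), O_RDONLY for the caller
#include <unistd.h>            // read()
#include <vector>

// Hypothetical helper: read `fd` from its current position to the end and
// return the numbers (counted from that position) of the 512-byte sectors that
// contain at least one non-zero byte. `ok` is set to false on a read error.
std::vector<std::uint64_t> nonzero_sectors(int fd, bool& ok)
{
    const std::size_t kSector = 512;
    const std::size_t kChunk = 256 * 1024;         // 512 sectors per chunk
    std::vector<unsigned char> buf(kChunk);
    std::vector<std::uint64_t> bad;
    std::uint64_t base = 0;                        // offset of buf[0], always sector-aligned
    ok = true;

    for (;;) {
        // read() may return less than requested, so keep going until the
        // chunk is full or the end of the device is reached.
        std::size_t got = 0;
        while (got < kChunk) {
            ssize_t n = read(fd, buf.data() + got, kChunk - got);
            if (n < 0 && errno == EINTR) continue;
            if (n < 0) { ok = false; return bad; }
            if (n == 0) break;                     // end of device
            got += static_cast<std::size_t>(n);
        }

        for (std::size_t i = 0; i < got; ++i) {
            if (buf[i] != 0) {
                bad.push_back((base + i) / kSector);
                i = (i / kSector + 1) * kSector - 1;   // jump to the end of this sector
            }
        }
        base += got;
        if (got < kChunk) return bad;              // short final chunk: done
    }
}
```

`read()` is allowed to return fewer bytes than requested even before the end, which is why the inner loop keeps reading until the buffer is full or `read()` returns 0; a final sector shorter than 512 bytes is still checked.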