LinuxQuestions.org
Old 12-07-2012, 07:50 AM   #1
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Rep: Reputation: 16
Dynamically parse BibTeX and create hash of hash


Hello gurus, I am trying to parse the following BibTeX file (bibliography.bib):

Code:
@book{Lee2000a,
abstract = {Abstract goes here},
author = {Lee, Wenke and Stolfo, Salvatore J},
title = {{Data mining approaches for intrusion detection}},
year = {2000}
}
@article{Forrest1996,
abstract = {Abstract goes here},
author = {Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji},
title = {{Computer immunology}},
year = {1996}
}
I am using BibTeX-Parser for this, which works as expected. The problem I have is with creating a hash of hashes. This is my code:

Code:
#!/usr/bin/perl
# http://search.cpan.org/~gerhard/BibTeX-Parser-0.62/lib/BibTeX/Parser.pm
use BibTeX::Parser;
use IO::File;
use Data::Dumper;
use strict;
use warnings;

my $filename="bibliography.bib";
my (%bibliography, %article);
my $i;
my ($entry, @entries, $type, $key);
my (my $hkey, my $hvalue);

# open BibTeX
my $fh = IO::File->new("$filename") or die "could not open $filename: $!\n";

# create parser object ...
my $parser = BibTeX::Parser->new($fh);
    
# ... and iterate over entries
while ($entry = $parser->next ) {
  if ($entry->parse_ok) {
  
    # return BibTeX elements like abstract, author, title ...
    @entries = $entry->fieldlist();
    
    # create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
    foreach (@entries) {
      $article{"$_"} = $entry->field("$_");
    }
    
    # return article's key (Lee2000a, Forrest1996)
    $key = $entry->key;
    
    # append %article into %bibliography with appropriate key
    $bibliography{"$key"} = \%article;
    
    #Debug
    #print $entry->key, "\n";
    #print Dumper (\%article);

    # removes all elements of %article (prepare for next iteration)
    %article = ();
    
    #Debug
    #print "================================\n";
  }
  
  else {
    warn "Error parsing file: " . $entry->error;
 }
}

    #Debug
    #print Dumper (\%bibliography);

CURRENT output of Dumper (\%bibliography);
Code:
$VAR1 = {
          'Lee2000a' => {},
          'Forrest1996' => $VAR1->{'Lee2000a'}
        };

EXPECTED output of Dumper (\%bibliography);
Code:
$VAR1 = {
          'Lee2000a' => {
                          'abstract' => 'Abstract goes here',
                          'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                          'title' => 'Data mining approaches for intrusion detection',
                          'year' => '2000'
                        },
          'Forrest1996' => {
                             'abstract' => 'Abstract goes here',
                             'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                             'title' => 'Computer immunology',
                             'year' => '1996'
                           }
        };

What am I doing wrong? Many thanks.
 
Old 12-07-2012, 09:18 AM   #2
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Hi,

I have no files to test your code, but I suppose you've forgotten to create a new hash %article in every iteration of the loop. When you want a hash of hashes, you actually create a hash of hash references.
Quote:
Code:
# create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
    foreach (@entries) {
      $article{"$_"} = $entry->field("$_");
    }
    
    # return article's key (Lee2000a, Forrest1996)
    $key = $entry->key;
    
    # append %article into %bibliography with appropriate key
    $bibliography{"$key"} = \%article;
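You always store a reference to the same hash %article in %bibliography, so every key ends up pointing at one shared hash, and the final "%article = ();" empties it for all of them. That is why Dumper shows an empty hash and a back-reference. Instead, simply move the "my %article" into the loop so that a new hash is created in every iteration, and drop the "%article = ();" line, which would otherwise empty the hash you have just stored.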
Code:
    # create %article as a hash array e.g. year -> 1996; isbn -> 1581138709 etc.
    my %article;    # one new hash for every entry
    foreach (@entries) {
      $article{$_} = $entry->field($_);
    }
    
    # return article's key (Lee2000a, Forrest1996)
    $key = $entry->key;
    
    # append %article into %bibliography with appropriate key
    $bibliography{$key} = \%article;
    # (no "%article = ();" afterwards: that would empty the hash just stored)
Markus

Last edited by markush; 12-07-2012 at 09:20 AM.
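The difference between the two variants can be reproduced without BibTeX::Parser. The following is only a minimal sketch: the @parsed list is a hypothetical stand-in for the parsed entries, and the variable names %buggy, %shared and %fixed are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parsed entries: [ key, { field => value } ].
my @parsed = (
    [ 'Lee2000a',    { year => 2000, title => 'Data mining approaches for intrusion detection' } ],
    [ 'Forrest1996', { year => 1996, title => 'Computer immunology' } ],
);

# Buggy variant: one %shared hash declared outside the loop.
my (%buggy, %shared);
for my $pair (@parsed) {
    my ($key, $fields) = @$pair;
    %shared = %$fields;
    $buggy{$key} = \%shared;   # every key gets a reference to the same hash
    %shared = ();              # ... which is then emptied for all of them
}

# Fixed variant: a fresh lexical hash per iteration, never cleared.
my %fixed;
for my $pair (@parsed) {
    my ($key, $fields) = @$pair;
    my %article = %$fields;    # new hash each time through the loop
    $fixed{$key} = \%article;
}

# Comparing two references numerically compares their addresses.
print "buggy: same reference\n" if $buggy{Lee2000a} == $buggy{Forrest1996};
print "fixed: $fixed{Forrest1996}{year}, $fixed{Lee2000a}{year}\n";
```

In the buggy variant both keys hold the same reference, which is what Dumper reports as $VAR1->{'Lee2000a'}.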
 
Old 12-07-2012, 12:17 PM   #3
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
Thank you, you're absolutely right. One question: can you please advise how I can sort this structure first by the keys of the outer hash %bibliography (Forrest1996, Lee2000a) and then by the keys of the inner (nested) hash %article (abstract, author, title, year, etc.)? I know that when I print a hash, the order of the keys cannot be guaranteed, so my idea was to iterate over the hashes with sorted keys, but it did not work as expected. The code I have so far:

Code:
for $i (sort keys(%bibliography)){
print "$i", "\n";
  for $j (sort keys ($i)){
  print "$j\n";
  }
}


Desired output (during iteration)
Code:
$VAR1 = {
          'Forrest1996' => {
                             'abstract' => 'Abstract goes here',
                             'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                             'title' => '{Computer immunology}',
                             'year' => '1996'
                           },
          'Lee2000a' => {
                          'abstract' => 'Abstract goes here',
                          'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                          'title' => '{Data mining approaches for intrusion detection}',
                          'year' => '2000'
                        },
};
 
Old 12-07-2012, 12:39 PM   #4
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
This approach is absolutely correct: a hash is never ordered, so you have to sort the keys yourself. On the other hand, using the bibliographic entry key as the hash key is the most efficient choice. I would write a function for this purpose. Here is an example, not tested:
Code:
sub show_biblio {
    my $hash = shift ;
    my %hash = %$hash ;
    my $key ;
    foreach $key ("abstract", "author", "title", "year") {
        print " key: $hash{$key}\n" ;
    }
}

# and for the complete bibliography 
for (sort keys %bibliography) {
    show_biblio $_ ;
}
just try it out, I have no files for testing the code.

Markus
 
Old 12-07-2012, 01:06 PM   #5
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
I added your code at the end of the script, but it exits with the following message: "Can't use string ("Forrest1996") as a HASH ref while "strict refs" in use at ./BibParsV2.pl line 83, <GEN0> line 12."
 
Old 12-07-2012, 01:11 PM   #6
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Hmm, I'm not sure. What happens when you comment out "use strict;"? Does the program run?

Markus
 
Old 12-07-2012, 01:27 PM   #7
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by markush View Post
Hmm, I'm not sure. What happens when you comment out "use strict;"? Does the program run?

Markus
No, it ends with the following message: "Use of uninitialized value within %shash in concatenation (.) or string at ./BibParsV2.pl line 96, <GEN0> line 2628." It seems that this code works:

Code:
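for my $i (sort keys %bibliography) {
  print "$i => ", "\n";
  #print Dumper ($bibliography{$i});
  # dereference explicitly: keys on a bare hash reference was experimental and was removed in Perl 5.24
  for my $j (sort keys %{ $bibliography{$i} }) {
    print "\t $j -> ", $bibliography{$i}{$j}, "\n";
  }
}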
 
Old 12-07-2012, 01:34 PM   #8
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
In my function above, you could
Code:
sub show_biblio {
    my $hash = shift ;
    my %hash = %$hash ;
    my $key ;
    foreach $key ("abstract", "author", "title", "year") {
        # print " key: $hash{$key}\n" ;
        print $key ;
    }
}
put a print statement and check if the keys are correct.

Markus
 
Old 12-07-2012, 01:57 PM   #9
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by markush View Post
You could within my above function
Code:
sub show_biblio {
    my $hash = shift ;
    my %hash = %$hash ;
    my $key ;
    foreach $key ("abstract", "author", "title", "year") {
        # print " key: $hash{$key}\n" ;
        print $key ;
    }
}
put a print statement and check if the keys are correct.

Markus
Nope: "Can't use string ("Albag2001") as a HASH ref while "strict refs" in use at ./BibParsV2.pl line 93, <GEN0> line 2628." (Albag2001 appears because this was tested on another dataset, but that should not matter.)
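The message suggests that the function is being handed the key string ("Forrest1996", "Albag2001") rather than the inner hash reference: "sort keys %bibliography" yields keys, so the value has to be looked up before it is passed on. A minimal sketch of that idea follows; the sample data and the helper name format_entry are hypothetical, not taken from the script above.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %bibliography.
my %bibliography = (
    Lee2000a    => { author => 'Lee, Wenke and Stolfo, Salvatore J', year => 2000 },
    Forrest1996 => { author => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji', year => 1996 },
);

# Takes a hash reference and returns one "field: value" line per known field.
sub format_entry {
    my ($entry) = @_;
    my @lines;
    for my $field (qw(abstract author title year)) {
        next unless defined $entry->{$field};   # skip fields the entry lacks
        push @lines, "  $field: $entry->{$field}";
    }
    return @lines;
}

for my $key (sort keys %bibliography) {
    print "$key\n";
    # Pass the inner hash reference, not the key string.
    print "$_\n" for format_entry($bibliography{$key});
}
```

Skipping undefined fields also avoids the "Use of uninitialized value" warning for entries that lack one of the four fields.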
 
Old 12-10-2012, 07:16 AM   #10
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
Hi, now I have structure which looks like this (hash of hashes):

Code:
$VAR1 = {
          'Lee2000a' => {
                'abstract' => 'Abstract goes here',
                'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                'title' => 'Data mining approaches for intrusion detection',
                'year' => '2000'
              },
          'Forrest1996' => {
                'abstract' => 'Abstract goes here',
                'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                'title' => 'Computer immunology',
                'year' => '1996'
              }
        };

I would like to sort this structure by three criteria (in this order):

1st - by year value (1996, 2000)
2nd - by the keys of the "outer" structure (Forrest1996, Lee2000a)
3rd - by the keys of the "inner" structure (abstract, author, title, year) in alphabetical order.

So far I have two pieces of code which I need to combine somehow:

I. This code meets the 2nd and 3rd criteria
Code:
for my $i (sort keys %bibliography) {
   print "$i => ", "\n";
   # dereference explicitly: keys on a bare hash reference was removed in Perl 5.24
   for my $j (sort keys %{ $bibliography{$i} }) {
      print "\t $j -> ", $bibliography{$i}{$j}, "\n";
   }
}

II. This code meets the 1st criterion
Code:
for my $i (sort { ($bibliography{$a}->{year} || 0) <=> ($bibliography{$b}->{year} || 0) } keys %bibliography) {
  print "$i => ", "\n";
  # dereference explicitly: keys on a bare hash reference was removed in Perl 5.24
  for my $j (sort keys %{ $bibliography{$i} }) {
    print "\t $j -> ", $bibliography{$i}{$j}, "\n";
  }
}

Thank you very much
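One possible way to combine the two is to chain the comparisons with "or" inside a single sort block: the year comparison decides first, and the key comparison only applies when the years are equal. This is a sketch under assumptions, with hypothetical sample data; the entry Adams2000 is invented purely to show the tie-break.

```perl
use strict;
use warnings;

# Hypothetical sample data; Adams2000 is invented to show the tie-break.
my %bibliography = (
    Lee2000a    => { title => 'Data mining approaches for intrusion detection', year => 2000 },
    Forrest1996 => { title => 'Computer immunology', year => 1996 },
    Adams2000   => { title => 'Placeholder title', year => 2000 },
);

# Outer order: year first (numeric), then entry key (string) on ties.
my @ordered = sort {
    ($bibliography{$a}{year} || 0) <=> ($bibliography{$b}{year} || 0)
        or $a cmp $b
} keys %bibliography;

for my $key (@ordered) {
    print "$key =>\n";
    # Inner order: field names alphabetically.
    for my $field (sort keys %{ $bibliography{$key} }) {
        print "\t $field -> ", $bibliography{$key}{$field}, "\n";
    }
}
```

With this data the outer order is Forrest1996, Adams2000, Lee2000a.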
 
Old 12-10-2012, 08:10 AM   #11
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Hi,
Quote:
Originally Posted by wakatana
3rd - according to "inner" structure keys (abstract, author, title, year) in alphabetical order.
This seems odd to me. Wouldn't it be better to sort by "author" and "title"? I mean, when I search in such a table, I can understand that the entries are sorted by year, but then, when there are several entries for one year, I would expect them to be sorted by author.

As for your question, I wonder whether it wouldn't be better to use an array of hashes (or rather hash references) as the data structure, with each hash in this form:
Code:
      # each element of @array is a reference to one hash
      my @array = (
              { 'entry' => 'Lee2000a',
                'abstract' => 'Abstract goes here',
                'author' => 'Lee, Wenke and Stolfo, Salvatore J',
                'title' => 'Data mining approaches for intrusion detection',
                'year' => '2000'
              },
              { 'entry' => 'Forrest1996',
                'abstract' => 'Abstract goes here',
                'author' => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
                'title' => 'Computer immunology',
                'year' => '1996'
              },
      );
Then you do not have to distinguish between sorting the entries and sorting their content.

Code:
# $a and $b are hash references here, so use the arrow to reach the fields
@sorted = sort { $a->{year}   <=> $b->{year}   or
                 $a->{author} cmp $b->{author} or
                 $a->{title}  cmp $b->{title} } @array;
Not tested. You should read the documentation:
Code:
perldoc -f sort
Markus

Last edited by markush; 12-14-2012 at 01:06 AM.
 
Old 12-13-2012, 04:59 PM   #12
wakatana
Member
 
Registered: Jul 2009
Location: Slovakia
Posts: 133

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by markush View Post
Hi,
this seems odd to me. Wouldn't it be better to sort according to "author", "title"?
I think so, yes, it would be possible to do it as you say. I use this code as a little workaround. All I need is to compare two BibTeX files. Imagine that you have two BibTeX files that you want to compare, e.g. with the diff utility. The problem is that the files can contain the same records in a different order, which is why I sorted them this way.
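For a diff-style comparison, one option that needs no hand-written sorting is to let Data::Dumper emit the keys in sorted order at every nesting level via $Data::Dumper::Sortkeys. A minimal sketch, assuming the hash of hashes has already been built; the sample data below is a hypothetical stand-in.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data in the same shape as %bibliography.
my %bibliography = (
    Lee2000a    => { author => 'Lee, Wenke and Stolfo, Salvatore J', year => '2000' },
    Forrest1996 => { author => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji', year => '1996' },
);

# Sort hash keys at every nesting level so the dump is reproducible.
$Data::Dumper::Sortkeys = 1;
# Use a fixed indentation that does not depend on key length.
$Data::Dumper::Indent = 1;

my $canonical = Dumper(\%bibliography);
print $canonical;
```

Writing this dump to a text file for each BibTeX file and running diff on the two text files should then show only real differences. Note that Sortkeys orders keys as strings, so the outer order is by entry key, not by year.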



Quote:
Originally Posted by markush View Post
Hi,
I consider if it wouldn't be better to use an array of hashes (or hash references respectively) as datastructure
Can you please post an example of how to use such a structure in my code?

Many thanks
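A minimal sketch of the array-of-hashes idea, assuming the hash of hashes already exists: each inner hash is copied into a new anonymous hash together with its BibTeX key (stored under the field name "entry", as in the suggestion above), and the resulting list is sorted by year, author and title. The sample data is a hypothetical stand-in.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the parsed %bibliography.
my %bibliography = (
    Lee2000a => {
        author => 'Lee, Wenke and Stolfo, Salvatore J',
        title  => 'Data mining approaches for intrusion detection',
        year   => '2000',
    },
    Forrest1996 => {
        author => 'Forrest, Stephanie and Hofmeyr, Steven A. and Anil, Somayaji',
        title  => 'Computer immunology',
        year   => '1996',
    },
);

# Flatten into an array of hash references, keeping the BibTeX key as 'entry'.
# The leading + tells Perl that the braces build an anonymous hash, not a block.
my @array = map { +{ entry => $_, %{ $bibliography{$_} } } } keys %bibliography;

# Sort by year (numeric), then author, then title (both as strings).
my @sorted = sort {
       $a->{year}   <=> $b->{year}
    or $a->{author} cmp $b->{author}
    or $a->{title}  cmp $b->{title}
} @array;

for my $rec (@sorted) {
    print "$rec->{year}  $rec->{entry}  $rec->{author}\n";
}
```

In the parsing loop itself, the same records could presumably be collected directly with something like push @array, { entry => $entry->key, %article }; in place of the assignment to %bibliography. That variant is untested here.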
 
  

