Old 10-19-2004, 10:29 PM   #1
kuronai
LQ Newbie
 
Registered: Aug 2003
Location: Ballarat, Victoria, Australia
Distribution: Redhat 9.0
Posts: 14

Rep: Reputation: 0
Remove XML style tags using C


Hi all

I'm writing a client/server application in C and want to send a string containing XML-style tags. When the string is received by the server, I need to strip those tags from it so we can use the value inside.

So far I've tried reversing the string and dropping the last n chars from the array, as well as trying to pull the value out with the strtok() function. I've looked through the functions declared in string.h, but there doesn't seem to be anything that does what I want straight off the bat.

The application has to be written in pure C. Is there an existing function I can use to retrieve the content between two string tags? Or if any of you have another idea which may be better, feel free to jump right in.

Here's what I'm trying to achieve:

Code:
char *theString = "<tag>Hello</tag>";
char *theTag = "<tag>";

/*
Do something to get rid of the two tags

Resulting in "Hello"
*/
Thanks in advance
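For the single-tag case in the question, strstr() from string.h is enough to find both tags and copy out what lies between them. The following is only a minimal sketch, assuming the tag has no attributes and is not nested inside itself. The function name extract_tag and its parameters are made up for illustration; it is not an existing library function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the text between <name> and </name> in src
 * into out (at most outsz bytes, terminator included).
 * Returns 0 on success, -1 if a tag is missing or a buffer is too small. */
int extract_tag(const char *src, const char *name, char *out, size_t outsz)
{
    char open_tag[64];
    char close_tag[64];
    const char *start;
    const char *end;
    size_t len;
    int n;

    /* Build "<name>" and "</name>", refusing names that do not fit. */
    n = snprintf(open_tag, sizeof open_tag, "<%s>", name);
    if (n < 0 || (size_t)n >= sizeof open_tag)
        return -1;
    n = snprintf(close_tag, sizeof close_tag, "</%s>", name);
    if (n < 0 || (size_t)n >= sizeof close_tag)
        return -1;

    /* The value starts right after the opening tag... */
    start = strstr(src, open_tag);
    if (start == NULL)
        return -1;
    start += strlen(open_tag);

    /* ...and ends where the closing tag begins. */
    end = strstr(start, close_tag);
    if (end == NULL)
        return -1;

    len = (size_t)(end - start);
    if (len >= outsz)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

With a char buf[32], calling extract_tag("<tag>Hello</tag>", "tag", buf, sizeof buf) returns 0 and leaves "Hello" in buf.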
 
Old 10-19-2004, 10:44 PM   #2
blackzone
Member
 
Registered: Jun 2004
Posts: 256

Rep: Reputation: 30
I suggest you do it from scratch yourself.

And if you finish, feel free to send me a copy, because I have to do it too.

You can try the link below to see if it's useful. It will take some time to read and get through, though.
http://www.easysw.com/~mike/mxml/
 
Old 10-19-2004, 10:44 PM   #3
wapcaplet
LQ Guru
 
Registered: Feb 2003
Location: Colorado Springs, CO
Distribution: Gentoo
Posts: 2,018

Rep: Reputation: 48
Perhaps a ready-made XML-parsing library, such as Mini-xml would work for you. You could probably find others by searching the libraries on freshmeat.

edit: blackzone - Wow, what are the chances we'd both pick the same arbitrary XML library? heh.
 
Old 10-20-2004, 04:17 AM   #4
hack_in_box
LQ Newbie
 
Registered: Sep 2004
Location: India
Distribution: Fedora core1, PCQ Linux2004, SuSE8
Posts: 13

Rep: Reputation: 0
Don't forget libxml if you are working on Linux. It's a good one for XML parsing. Also try out expat (this one is OS independent).
 
Old 10-24-2004, 08:43 PM   #5
kuronai
LQ Newbie
 
Registered: Aug 2003
Location: Ballarat, Victoria, Australia
Distribution: Redhat 9.0
Posts: 14

Original Poster
Rep: Reputation: 0
Hi again,

First off, thanks for your suggestions.

wapcaplet: mini-xml seems like it would be a good choice if I needed a full-blown XML parser, but for what I need it to do it looks pretty complex.

hack_in_box: I haven't heard of libxml, I'll definitely have to look into it.

blackzone: judging by my level of C programming skills, it might be a while before I've got anything working...

I'll keep chipping away at it anyway, thanks again. If you have any further ideas feel free to lend a hand.
 
Old 10-27-2004, 01:28 AM   #6
blackzone
Member
 
Registered: Jun 2004
Posts: 256

Rep: Reputation: 30
If you just want simple XML parsing you can try the function below.
It might have bugs, I don't know.

It doesn't support attributes (e.g. <person born="1978.9.9"></person>),
DTDs and other things.
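One small step towards tolerating attributes is to separate the element name from whatever follows it inside the angle brackets, so that the closing tag can be built from the name alone. This is a sketch only; tagNameLength is a hypothetical helper, not part of the function below, and it does not parse the attributes themselves.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: given the text found between '<' and '>',
 * e.g.  person born="1978.9.9"  , return how many leading characters
 * belong to the element name (everything up to the first whitespace). */
size_t tagNameLength(const char *tagText)
{
    size_t n = 0;

    while (tagText[n] != '\0' && !isspace((unsigned char)tagText[n]))
        n++;
    return n;
}
```

For the text person born="1978.9.9" this returns 6, the length of "person", so a closing tag of </person> could be searched for instead of one that includes the attribute text.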
======================================================
example:
<? version ?>
<tag1>
<innertag1>content1 </innertag1>
<innertag2>content2</innertag2>
</tag1>
<tag2> content2 </tag2>

parse into tree:
[tag="" content="version"]
|
[tag="tag1" content=""] --> [tag="tag2" content="content2"]
|
[tag="innertag1" content="content1"] --> [tag="innertag2" content="content2"]

The first line is optional:
You don't have to have the <?header tag?>
=======================================================
Code:
//#define DEBUG 1
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>   // malloc
#include <string.h>

struct xmlNode{
  char content[BUFSIZ];  
  char tag[BUFSIZ];
  struct xmlNode *next;
  struct xmlNode *child;
};

void parseOne(char *str, struct xmlNode *head);

int32_t parseXML( char* str, struct xmlNode* head ) 
{
  char substr[BUFSIZ];
  char string[BUFSIZ];
  char tag[BUFSIZ];

  char *ptr1;
  char *ptr2;
  int cur1;
  int cur2;
  char needle;

  struct xmlNode *node;

  strcpy(substr, str); 

#ifdef DEBUG
  printf("Original String:%s\n", str);
#endif
  
  // find content of <?content?>
  ptr1 = strstr( (const char *)substr, "<?" );
  ptr2 = strstr( (const char *)substr, "?>" );
  if( ptr1 != NULL && ptr2 != NULL)
  {
    cur1 = ptr1 - substr;
    cur2 = ptr2 - substr;
    strncpy(string, substr+cur1+2, cur2-cur1-2);
    string[cur2-cur1-2] = '\0';
  
    memmove(substr, substr+cur2+2, strlen(substr+cur2+2)+1);  // source and destination overlap, so memmove rather than strcpy
    strcpy(head->content, string);
#ifdef DEBUG
    printf("version tag:%s\n", string);
    printf("substring:%s\n", substr);
#endif
  }
  else
  {
    strcpy(head->content, "");
#ifdef DEBUG
    printf("version tag:is NULL\n");
#endif
  }
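  node = (struct xmlNode *)malloc( sizeof(struct xmlNode) );
  if( node == NULL )
    return -1;
  node->tag[0] = '\0';
  node->content[0] = '\0';
  node->next = NULL;
  node->child = NULL;   // a leaf until parseOne attaches children
  head->tag[0] = '\0';  // the header node has no tag name
  head->next = NULL;
  head->child = node;
  parseOne(substr, node);
  return 0;
}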

void parseOne(char *str, struct xmlNode *head)
{
  char substr[BUFSIZ];
  char string[BUFSIZ];
  char tag[BUFSIZ];

  char *ptr1;
  char *ptr2;
  int cur1;
  int cur2;
  struct xmlNode *node;
  struct xmlNode *cur;

  cur = head; 
  strcpy(substr, str); 

while(1){
  //parse 1
  ptr1 = strstr( (const char *)substr, "<" );
  ptr2 = strstr( (const char *)substr, ">" );
  if( ptr1 != NULL && ptr2 != NULL)
  {
    cur1 = ptr1 - substr;
    cur2 = ptr2 - substr;
    strncpy(string, substr+cur1+1, cur2-cur1-1);
    string[cur2-cur1-1] = '\0';
      
#ifdef DEBUG
    printf("tag:%s\n", string);
#endif

    strcpy(cur->tag, string );
    memmove(substr, substr+cur2+1, strlen(substr+cur2+1)+1);  // overlapping copy, so memmove rather than strcpy
#ifdef DEBUG
    printf("substring:%s\n", substr);
#endif
    strcpy(tag, "</");
    strcat(tag, string);
    strcat(tag, ">");
 
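    ptr1 = strstr( (const char *)substr, tag );
    if( ptr1 == NULL )
    {
      // no matching closing tag: stop here rather than compute an offset from NULL
      strcpy(cur->content, "");
      break;
    }
    cur1 = ptr1 - substr;
    strncpy(string, substr, cur1);
    string[cur1] = '\0';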
#ifdef DEBUG
    printf("content:%s\n", string);
#endif

    ptr1 = strstr( (const char *)string, "<" );
    ptr2 = strstr( (const char *)string, ">" );
    if( ptr1 != NULL && ptr2 != NULL)
    {
      strcpy(cur->content, "");
      node = (struct xmlNode *)malloc(sizeof(struct xmlNode) );
      node->next  = NULL;
      node->child = NULL;
      cur->child = node;
      parseOne(string, node);
    }    
    else
    {
      strcpy(cur->content, string);
    }
    memmove(substr, substr+cur1+strlen(tag), strlen(substr+cur1+strlen(tag))+1);  // overlapping copy, so memmove rather than strcpy
#ifdef DEBUG
    printf("substring:%s\n", substr);
#endif
    ptr1 = strstr( (const char *)substr, "<" );
    ptr2 = strstr( (const char *)substr, ">" );
    if( ptr1 != NULL && ptr2 != NULL)
    {
      node = (struct xmlNode *)malloc(sizeof(struct xmlNode) );
      node->next  = NULL;
      node->child = NULL;
      cur->next = node;
      cur = cur->next;
    }
  }  
  else
    break;
}

}

int main( int argc, char *argv[] )
{
  char myBuf[BUFSIZ]; 
  struct xmlNode *head;
  strcpy(myBuf, "<? version ?><tag1> <innertag1>content1 </innertag1><innertag2>content2</innertag2></tag1> <tag2> content2 </tag2>");

  head = (struct xmlNode *)malloc(sizeof(struct xmlNode));
  parseXML( myBuf, head);
  printf("string|%s|\n", myBuf);
  printf("head->content|%s|\n", head->content); 
  printf("head->child->tag|%s|\n",head->child->tag);
  printf("head->child->child->tag|%s|\n", head->child->child->tag);
  printf("head->child->child->content|%s|\n", head->child->child->content);
  printf("head->child->child->next->tag|%s|\n",  head->child->child->next->tag);
  printf("head->child->child->next->content|%s|\n", head->child->child->next->content);
  printf("head->child->next->tag|%s|\n", head->child->next->tag);
  printf("head->child->next->content|%s|\n", head->child->next->content);
  return 0;
}
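The main() above reaches into the tree by hand and never releases it. Below is a rough sketch of how the child/next links could be walked generically to count, print and free the nodes. The names countNodes, printTree and freeTree are hypothetical. The sketch assumes every unused next and child pointer is NULL (for instance because each node was allocated with calloc); following an uninitialised pointer is undefined behaviour.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same layout as the struct in the post above. */
struct xmlNode {
    char content[BUFSIZ];
    char tag[BUFSIZ];
    struct xmlNode *next;
    struct xmlNode *child;
};

/* Count every node reachable from n through child and next links. */
int countNodes(const struct xmlNode *n)
{
    int total = 0;

    while (n != NULL) {
        total += 1 + countNodes(n->child);
        n = n->next;
    }
    return total;
}

/* Print the tree, indenting each nesting level by two spaces. */
void printTree(const struct xmlNode *n, int depth)
{
    while (n != NULL) {
        printf("%*s[tag=\"%s\" content=\"%s\"]\n",
               depth * 2, "", n->tag, n->content);
        printTree(n->child, depth + 1);
        n = n->next;
    }
}

/* Free a node, all of its children and all of its following siblings. */
void freeTree(struct xmlNode *n)
{
    while (n != NULL) {
        struct xmlNode *next = n->next;  /* save before freeing n */

        freeTree(n->child);
        free(n);
        n = next;
    }
}
```

Under that assumption, printTree(head, 0) followed by freeTree(head) could stand in for the hand-written printf calls.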
 
Old 11-10-2004, 08:45 PM   #7
kuronai
LQ Newbie
 
Registered: Aug 2003
Location: Ballarat, Victoria, Australia
Distribution: Redhat 9.0
Posts: 14

Original Poster
Rep: Reputation: 0
Well, finally got something that works.

Basically I took blackzone's code and hacked it up till it worked. It still doesn't parse attributes, but I don't need it to do that right now.
All I need to remember now is that two little ++'s in the wrong spot can cause disaster. Be careful.
Methinks a bit more learning is required on my behalf.

Thanks heaps guys.


Last edited by kuronai; 11-10-2004 at 08:47 PM.
 
Old 11-11-2004, 12:56 AM   #8
blackzone
Member
 
Registered: Jun 2004
Posts: 256

Rep: Reputation: 30
The algorithm is pretty bad, though: it copies the string into fixed BUFSIZ buffers at every nesting level, so parsing large files would use lots of memory.

If it's for production use try expat.
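For comparison, here is a sketch of how the copying could be avoided by handing back pointers into the original string together with lengths. It is not how expat works, and nextElement and struct span are hypothetical names. It makes the same simplifying assumptions as the code above: no attributes, no comments, no self-closing tags, and no element nested inside one of the same name.

```c
#include <stddef.h>
#include <string.h>

/* A slice of the original string: not NUL-terminated, nothing is copied. */
struct span {
    const char *ptr;
    size_t len;
};

/* Hypothetical helper: find the first <name>...</name> element in s.
 * On success, fill name and body with slices of s, store the position
 * just past the closing tag in *rest, and return 1.
 * Return 0 if no complete element is found. */
int nextElement(const char *s, struct span *name, struct span *body,
                const char **rest)
{
    const char *lt = strchr(s, '<');
    const char *gt;
    const char *p;

    if (lt == NULL)
        return 0;
    gt = strchr(lt, '>');
    if (gt == NULL)
        return 0;

    name->ptr = lt + 1;
    name->len = (size_t)(gt - lt - 1);

    /* Try each "</" after the opening tag until one is followed by
     * the same name and a '>'. */
    for (p = strstr(gt + 1, "</"); p != NULL; p = strstr(p + 2, "</")) {
        if (strncmp(p + 2, name->ptr, name->len) == 0 &&
            p[2 + name->len] == '>') {
            body->ptr = gt + 1;
            body->len = (size_t)(p - (gt + 1));
            *rest = p + 2 + name->len + 1;
            return 1;
        }
    }
    return 0;
}
```

Calling it repeatedly on *rest walks the siblings, and calling it on a body slice (after copying or bounding it) would walk the children, so memory use stays constant no matter how large the input is.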
 
Old 11-12-2004, 12:27 AM   #9
kuronai
LQ Newbie
 
Registered: Aug 2003
Location: Ballarat, Victoria, Australia
Distribution: Redhat 9.0
Posts: 14

Original Poster
Rep: Reputation: 0
The purpose I need it for is to parse a predefined set of XML, so I'm always going to know how big the actual XML string is... it's never actually parsing a file, just a string I send and receive in a client/server type app.
So for what I need it for, it'll do the task.
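Since the parser earlier in the thread copies its input into BUFSIZ-sized arrays with strcpy(), a known maximum size only helps if it is actually enforced on the receiving side. A minimal sketch of such a check follows, assuming the received bytes arrive with an explicit length and are not yet NUL-terminated; copyMessage is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical guard: copy len received bytes into dst (dstsz bytes big)
 * and NUL-terminate them, so the result is safe to hand to string functions.
 * Returns 0 on success, -1 if the message would not fit or contains a NUL. */
int copyMessage(char *dst, size_t dstsz, const char *msg, size_t len)
{
    if (dstsz == 0 || len >= dstsz)
        return -1;                      /* no room for data plus terminator */
    if (memchr(msg, '\0', len) != NULL)
        return -1;                      /* an embedded NUL would cut the XML short */
    memcpy(dst, msg, len);
    dst[len] = '\0';
    return 0;
}
```

With a destination declared as char buf[BUFSIZ], passing sizeof buf as dstsz guarantees that whatever reaches the parser fits its buffers.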

Cheers again
 
  


All times are GMT -5.