GIDForums  

Go Back   GIDForums > Computer Programming Forums > C Programming Language
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read

 
 
Thread Tools Search this Thread Rate Thread
  #1  
Old 18-Apr-2006, 22:51
Heresy Heresy is offline
New Member
 
Join Date: Feb 2006
Posts: 10
Heresy is on a distinguished road

Need help getting started on a database search program


I have an assignment (for an intro C programming class) that I have a few weeks to complete but need a place to start. The assignment is to make a program that takes a user input of a word or characters and finds all the matches from a seperate database file. "Wildcard" characters in the form of periods and astrix are used to further complicate this.

To explain it better, if the user inputs "a.d" then the period signifies that there is a letter missing, but only a single letter. If the user inputs "a*d" then the astrix signifies that the word only needs to begin with an "a" and end with a "d" with any amount of letters in between.These 2 wildcard characters can be used any number of times and in any combination.

My problem is that this seems like such a huge project that I have no idea where to begin. I know to first take the input as a string then break the string down into its individual components. I would do the same thing with the database file then could just compare the individual components to see if a word matched. Now, if I encounter a period then all I need to do is simply move on to the next character, but I am lost on what to do if I have an astrix inputted. Am I moving in the right direction or am I totally lost?
  #2  
Old 19-Apr-2006, 07:20
davis
 
Posts: n/a

Re: Need help getting started on a database search program


Quote:
Originally Posted by Heresy
I have an assignment (for an intro C programming class) that I have a few weeks to complete but need a place to start. The assignment is to make a program that takes a user input of a word or characters and finds all the matches from a seperate database file. "Wildcard" characters in the form of periods and astrix are used to further complicate this.

To explain it better, if the user inputs "a.d" then the period signifies that there is a letter missing, but only a single letter. If the user inputs "a*d" then the astrix signifies that the word only needs to begin with an "a" and end with a "d" with any amount of letters in between.These 2 wildcard characters can be used any number of times and in any combination.

My problem is that this seems like such a huge project that I have no idea where to begin. I know to first take the input as a string then break the string down into its individual components. I would do the same thing with the database file then could just compare the individual components to see if a word matched. Now, if I encounter a period then all I need to do is simply move on to the next character, but I am lost on what to do if I have an astrix inputted. Am I moving in the right direction or am I totally lost?

Can you post the database file or at least demonstrate the organization of its data?

While asterisk is a universal wildcard, it is strange that a period is required instead of a ? for a single character, particularly when it comes to file name pattern matching. Oh well.

The challenge in parsers is to identify various events, states and transitions encountered as you read a character in the stream. Your work flow will change as various events happen that causes states to change. Read up on FSMs or finite state machines in C.

Maybe try taking a look at: autogen.sourceforge.net

Another way to think about the complexity of the task is that if it wasn't difficult, it would be boring! Sometimes when I'm sitting in my office wondering how in the world I'm going to solve a particular problem, I remind myself that if it wasn't difficult, they wouldn't need me!


:davis:
  #3  
Old 19-Apr-2006, 15:25
Heresy Heresy is offline
New Member
 
Join Date: Feb 2006
Posts: 10
Heresy is on a distinguished road

Re: Need help getting started on a database search program


Quote:
Originally Posted by davis
Can you post the database file or at least demonstrate the organization of its data?

The database file is simple a list of words in a text file with one word per line. I have been rethinking my strategy for this problem. So, I may be changing my approach, but the initial idea of over 1200 lines of code for this problem just seems daunting at the moment. Thanks for the reassurance though.
  #4  
Old 22-Apr-2006, 09:26
davis
 
Posts: n/a

Re: Need help getting started on a database search program


Quote:
Originally Posted by Heresy
The database file is simple a list of words in a text file with one word per line. I have been rethinking my strategy for this problem. So, I may be changing my approach, but the initial idea of over 1200 lines of code for this problem just seems daunting at the moment. Thanks for the reassurance though.

Do you know if there are going to be any duplicate words in the list?

My strategy for implementing this was really quite easy.

After thinking about it for a couple of days, I think that the easiest way to go about it is to use a multi-pass parser and don't worry about cycles or performance. Rather, just write something that works.

I started with a linked list that I initialized with a set of words read in from a word list file containing one word per line.

wordlist.txt
Code:
alphabet boxcar carousel dilute elevator fascinate grateful helmsman irradiated janitor kindness leftovers mediocre natural octogon philander quixotic reassuring stationary terse unconditional verbose weird xenoplasmic yesterday zetamorph

...I just used the first word that came to mind for each letter in the alphabet. As you can see from my list, ".e*" as search criteria would/should produce several hits.

I read these into an initial linked list with the plan to parse character by character in the search criteria, each time, refining the contents of the list (using a temporary list, not the original "wordlist" initialized linked list).

I wrote the basic functions necessary to manage the list and search through for either a given character at a given index, a minimum and maximum sized word and so forth.

Each function took a pointer to two WORD_LIST structures, one as the "source" words and the other as the "destination" words. I always used an empty list as the "result" list. I used word comparison functions to identify if character(s) of interest appeared in the word at a given node in the list. On a match, I "added" the word to the destination list and continued marching onward.

By successively calling these operations for all elements of the search criteria, the filtering process eventuated a "result set" in the form of a linked list of words matching the search criteria. I did not write a sorting algorithm to sort the words, but it wouldn't be very difficult to implement if required. I did write a simple "print_word_list" function that spews out the contents of the result set to stdout.

It seemed to me to be a reasonable way to get started and do the job without incurring too much overhead. I have about 70 total lines in my header file and about 300 in my .c implementation file. Over the course of thinking about it and writing a bit of code here and there, it probably took me 45 minutes to write it. I don't think that it is excessively challenging nor should it take 1200 lines of code.

I didn't implement a lot of the list management functions that would be required for sorting, such as inserting a node into the middle of a list. All of my words are always inserted at the end of the list, which dramatically reduces the number of list management functions.

I didn't implement list traversal functions, since I knew that I was always going to be running from first to last nodes in every "parsing" operation, I just stepped through the list from head to tail each time. A better implementation would include those features and use them to gain some flexibility in the design.

I didn't need to implement node freeing functions, as all I really needed was to free the entire list for each of the successively filtered lists that I used. I used one list for the original words and two temps to "catch" and the other to "throw" words from one to the other during the filtration process. At the end of the code, after printing the results, I simply freed all three lists.

I didn't worry about memory usage (other than to insure no leaks), so instead of just moving a node from a temp to the other temp on match, I copied the node instead. A better design would, if appropriate, just move the matched node to new location in the result list.

Undoubtedly there are a lot better choices that are must less processor/memory intensive especially if very long lists are expected.

Let me know if you'd like for me to post some of my code or if you'd rather take it from here on your own.


:davis:
  #5  
Old 23-Apr-2006, 20:41
Heresy Heresy is offline
New Member
 
Join Date: Feb 2006
Posts: 10
Heresy is on a distinguished road

Re: Need help getting started on a database search program


I am thinking along the same lines as you Davis, but I am still not sure how to account for the astrixes. Also, I am not sure of the coding that I would use to add all the matching words to a seperate file then how I would access that file to do the same matching function over again. One of my former partners on this project sent me some code that first checks to see if the input word is all dots, all astrixes, or a mixture of both then implements a recursive function in the latter case to check all the letters. I think it is a little excessive, but I have yet to fully understand everything that he has written. Still, I think your solution is more of what I am looking for but I don't understand those two sections I described in the first 2 sentences.
  #6  
Old 24-Apr-2006, 09:03
davis
 
Posts: n/a

Re: Need help getting started on a database search program


Quote:
Originally Posted by Heresy
I am thinking along the same lines as you Davis, but I am still not sure how to account for the astrixes. Also, I am not sure of the coding that I would use to add all the matching words to a seperate file then how I would access that file to do the same matching function over again. One of my former partners on this project sent me some code that first checks to see if the input word is all dots, all astrixes, or a mixture of both then implements a recursive function in the latter case to check all the letters. I think it is a little excessive, but I have yet to fully understand everything that he has written. Still, I think your solution is more of what I am looking for but I don't understand those two sections I described in the first 2 sentences.

You need to consider what the asterisk (no x) means. It means any letters and any number of letters.

a*d means that you must check all words in the word list of all sizes and return only those that begin with an "a" and end with a "d." It is really very easy.

*g means that you check all words returning all that end with a "g."

a*c*d means that you check all 3+ letter words beginning with an "a," ending with a "d" and containing a "c" anywhere between the first and last characters.

There is nothing difficult about it. Write a function like this:

int index_of( const char letter, const char* word );

...where the return type is the index into the word. It is easy to compare the return type to zero (start of word) versus the strlen -1 of the word to get first or last.

The thing to remember about asterisks is that you will search all words and that the length is no longer important.

Dot words mean that length is important.

a.d means that only three letter words are going to be in the result set.
a.*d is no different than a*d except that the minimum size of the word MUST be three characters and the maximum size is the size of the longest word in the list.

CPP / C++ / C Code:
int find_words_of_minimum_length( const WORD_LIST* wlist_in, WORD_LIST* wlist_out, const int min_length );
int find_words_of_maximum_length( const WORD_LIST* wlist_in, WORD_LIST* wlist_out, const int max_length );

Here are the declarations of the functions that I wrote that find min/max length. During your parsing of the search criteria, you need to decide how long/short the result set word(s) must be.

a.d means exactly three letter words in the result set. I used find_words_of_minimum_length to refine the wlist_in to wlist_out where min_length = 3. Then do the same thing for max_length = 3. The word list will contain all of the 3 letter words in the list. Then further refine it with:

begins_with(...)
contains(...)
ends_with(...)

The wlist_out becomes the wlist_in for every successive refinement operation and the "temporary" list is emptied and fed in as the wlist_out for each operation. Always start with an empty list for the result set.

One of the things that I did was to simplify the search by storing the word length with the word so that I only ever have to call strlen once per word on insertion.

CPP / C++ / C Code:
typedef struct st_word
{
    char*           word;
    int             length;
    struct WORD*    prev;
    struct WORD*    next;
} WORD;

You can easily write a function called find_min_max( ... ) where you pass in the starting word list and the result set word list and a minimum and a maximum so that you can do min and max length in one pass, rather than two passes. As you can see in the structure, I stored the length so that traversing the list means really just looking at the length and not calling strlen on the word member on each pass.

My "add_word" function:

CPP / C++ / C Code:
int 
add_word( WORD_LIST* wlist, const char* word )
{
    int rc = 0;
    WORD* node = (WORD*)malloc( sizeof( WORD ) );
    if( node )
    {
        node->length = strlen( word );
        node->word = (char*)malloc( 1+ node->length );
        strcpy( node->word, word );
        if( wlist->head == 0 )
        {
            wlist->head = node;
            wlist->tail = node;

            node->prev = (struct WORD*)node;
            node->next = 0;
        }
        else
        {
            node->prev = (struct WORD*)wlist->tail;
            wlist->tail->next = (struct WORD*)node;
            node->next = 0;
            wlist->tail = node;
        }
        ++wlist->count;
    }
    else
    {
        rc = ENOMEM;
    }
    return rc;
}

...what would normally be a call to strlen in the (char*)malloc operation is replaced by the node->length. This isn't very exciting, rather, I'm just lazy and didn't want to have to call strlen 900 times in order to do the work, so I just stored the length with the word. I knew that word length was going to be important.

a*d means that word length is unimportant...so the starting list is all words, the next refinement is all words that begin with a and the next refinement is all words that end in d and the work is done.

*d means all words to start, words ending in d to finish.

Any time that you have a dot or another character in the search criteria, word length becomes important again.

.*d means at least two letter words ending in d.
...d means four letter words ending in d.
a.d means three letter words beginning with a and ending with d.

You only need to consider what your first refinement is to get started. Once you return the list that represents that first refinement, the rest gets fairly easy fairly quickly.

In my word list, I wrote function pointers to various compare type functions and assigned their values to std C string comparison functions where appropriate. On initialization of the word list, I assigned the function pointers. When the search criteria contains no wildcards, then the pattern matching must be exact and strcmp is the solution. It may be a bit overkill for this exercise, but the idea was to make it extensible. Too bad this isn't a C++ project, it would be much easier!


:davis:
  #7  
Old 01-May-2006, 11:27
Heresy Heresy is offline
New Member
 
Join Date: Feb 2006
Posts: 10
Heresy is on a distinguished road

Re: Need help getting started on a database search program


The professor said he could implement the program in 75 lines including the header files. My former partner said he did it with only 25 lines of code. So, I got to thinking and brainstormed with a friend in the class, and I came up with a code that is roughly 100 lines with all the comments (since I can not keep anything straight in my head just by looking at code). I hate using recursive functions or sub-functions so everything is just loops in the main function. Currently, it is spitting me a few errors which I believe I can solve really easily. So, can someone see if anything is horribly wrong with it?

Here is the code I made:

CPP / C++ / C Code:
#include<ctype.h>
#include<math.h>
#include<stdio.h>
#include<string.h>

#define word_length 15
                   
int main (void)
{
    // All datatypes used in the function
    int i, j, n, flag, flag2;
    char input_word[word_length], input_word2[word_length];
    char dictionary_word[word_length], dictionary_word2[word_length];
    
    // File pointer to input file
    FILE *inp;      
    
    // Opens and reads database file of words
	inp = fopen("dictionary.txt", "r");
    
    // Takes input from user and puts it into a character array
    printf("Enter a word that is 15 or less characters long to search for below.");
    printf("You may use *'s or .'s as wildcards but no spaces or hyphens./n");
    printf("Search Term: ");
    scanf("%s", input_word);
    
    // Copies input_word to new string which can be manipulated
    strcpy(input_word2, input_word);    
    
    // Converts input word into lowercase letters
    strlwr(input_word2);    
    
    // Takes word from database file and makes it into a new string
    while (fscanf(inp, "%s", dictionary_word) != EOF);
    {
        // Converts database word into new string
        strcpy(dictionary_word2, dictionary_word);
   
        // Breaks down input word into individual letters
        i=0;
        while (input_word2[i] != '\0')
        {
           // Does lots of cool stuff dealing with the asterisk
           if (input_word2[i]== '*')
              j=(i+1);
              // Means that asterisk is at the end of the string
              if (input_word2[j]== '\0')
                 flag2=1;
                 break;
              // Means a letter comes after the asterisk
              else
                 // Breaks down dictionary word into individual letters then looks
                 // for character input_word2[j] in dictionary_word2
                 n=i;
                 while (dictionary_word2[n-1] != '\0')
                 {
                     if (dictionary_word2[n]==input_word2[j])
                       i=i+n;
                       break;
                     if (dictionary_word2[n]== '\0')
                        printf("%s does not match", dictionary_word);
                        flag2=0;
                        break;
                     else
                         n++;
                 }
           // Skips letter by incrementing counter 
           if (input_word2[i]== '.')
              i++;
           // Compares the letter in position "i" in each word and returns value
           else
               flag = strcmp(input_word2[i], dictionary_word2[i]);
               if (flag==0)
                  printf("%s does not match", dictionary_word);
                  flag2=0;
                  break;
               if (flag==1)
                  i++;
        }
        if (flag2 != 0)
           printf("%s does match", dictionary_word);
   }
	// Closes the database file
    fclose(inp);
	
    return(0);
}
Last edited by cable_guy_67 : 01-May-2006 at 15:54. Reason: Please enclose c code in [c] ... [/c] tags
  #8  
Old 01-May-2006, 16:07
cable_guy_67's Avatar
cable_guy_67 cable_guy_67 is offline
Senior Member
 
Join Date: Oct 2004
Location: Nescopeck, PA
Posts: 1,109
cable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the rough

Re: Need help getting started on a database search program


Define horribly...

Sorry, I couldn't help myself. First things, please start using C Code Tags to mark up your posts. It makes it easier to see your formatting and find some problems. See the Guidelines to PostingThread for more information on that.

When I try to compile your code with gcc I get,
Code:
dictionary.c: In function `main': dictionary.c:51: error: parse error before "else" dictionary.c:64: error: parse error before "else" dictionary.c:72: warning: passing arg 1 of `strcmp' makes pointer from integer without a cast dictionary.c:72: warning: passing arg 2 of `strcmp' makes pointer from integer without a cast

The lines 51 and 64 are trying to tell you that you need to take care of some construct problem. The parse error very often means something is forgotten, like a semi-colon. In this case you need some braces { and } to help make sense of it.

For example.
CPP / C++ / C Code:
              if (input_word2[j]== '\0')    // LINE 47
                 flag2=1;
                 break;
              // Means a letter comes after the asterisk
              else
                 // Breaks down dictionary word into individual letters then looks
                 // for character input_word2[j] in dictionary_word2
                 n=i;
                 while (dictionary_word2[n-1] != '\0')

In this section of your code you seem to want the statements on line 48 and 49 to be part of the if on line 47. This problem is throughout your program. Based on your indentation I think you need something like,
CPP / C++ / C Code:
if (input_word2[j]== '\0')
{
  flag2=1;
  break;
}
// Means a letter comes after the asterisk
else
{
  // Breaks down dictionary word into individual letters then looks
  // for character input_word2[j] in dictionary_word2
  n=i;
  while (dictionary_word2[n-1] != '\0')
  {

and so on.

This will help with getting something that you can compile without error. Not without warnings but one thing at a time.

Mark
__________________
"Opportunity is missed by most people because it comes dressed in overalls and looks like work."
--Thomas Alva Edison
"Those who would give up essential liberty to purchase a little temporary safety, deserve neither liberty nor safety."
--Benjamin Franklin
"A happy person is not a person in a certain set of circumstances, but rather a person with a certain set of attitudes."
--Hugh Downs
  #9  
Old 01-May-2006, 19:11
Heresy Heresy is offline
New Member
 
Join Date: Feb 2006
Posts: 10
Heresy is on a distinguished road

Re: Need help getting started on a database search program


I fixed all the mistakes with my brackets and there are no more parse errors. The indenting problem is from me copying it over which screwed up some of the line spacing. Also, I am thinking the error with strcmp is because you can't compare individual characters using this function but only the string as a whole. Am I correct on this?

I will also agree that a good deal of this code is horrible on the eyes with the levels of indententations. The comments are there to help somewhat (help me and whoever else is looking at the code). If and when I post more code, I will be sure to use the proper tags. I couldn't find the right button on the reply screen; so, I chose the next best thing I could find.
  #10  
Old 01-May-2006, 20:25
cable_guy_67's Avatar
cable_guy_67 cable_guy_67 is offline
Senior Member
 
Join Date: Oct 2004
Location: Nescopeck, PA
Posts: 1,109
cable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the roughcable_guy_67 is a jewel in the rough

Re: Need help getting started on a database search program


Quote:
Originally Posted by Heresy
I fixed all the mistakes with my brackets and there are no more parse errors. The indenting problem is from me copying it over which screwed up some of the line spacing.
That's why we want people to use those tags. It deals better with spaces instead of tabs but still makes things more readable.

Quote:
Originally Posted by Heresy
Also, I am thinking the error with strcmp is because you can't compare individual characters using this function but only the string as a whole. Am I correct on this?
Considering that you are only comparing single characters, there must be a simpler way. Hmmm, how to compare 'a' to 'b' and find out if they are equal or not.

Quote:
Originally Posted by Heresy
I will also agree that a good deal of this code is horrible on the eyes with the levels of indententations. The comments are there to help somewhat (help me and whoever else is looking at the code). If and when I post more code, I will be sure to use the proper tags. I couldn't find the right button on the reply screen; so, I chose the next best thing I could find.

No problem. Probably nobody else here but me who looks back at year old code says, "What the heck was I thinking?" The more you do the better you get at things. Just remember making it easy for people to help will increase your odds of getting a reply that helps.

I was segfaulting until I took the strcmp() bit out and just checked char to char. Play with some inputs, see what you find and post back any further questions.

Mark
__________________
"Opportunity is missed by most people because it comes dressed in overalls and looks like work."
--Thomas Alva Edison
"Those who would give up essential liberty to purchase a little temporary safety, deserve neither liberty nor safety."
--Benjamin Franklin
"A happy person is not a person in a certain set of circumstances, but rather a person with a certain set of attitudes."
--Hugh Downs
 
 

Recent GIDBlogWriting a book by crystalattice

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
database search skyloon MySQL / PHP Forum 6 04-Jun-2006 12:09
Type casts ? kai85 C++ Forum 12 23-Jun-2005 12:04
[GIM] gim.h dsmith C Programming Language 0 18-Jan-2005 08:48
Multiple Database search misunderstood MySQL / PHP Forum 11 04-Jun-2004 12:21
Search Engine Positioning 101 and 201 "How To" Tips... 000 Search Engine Optimization Forum 0 29-May-2003 10:34

Network Sites: GIDNetwork · GIDWebHosts · GIDSearch · Learning Journal by J de Silva, The

All times are GMT -6. The time now is 21:08.


vBulletin, Copyright © 2000 - 2008, Jelsoft Enterprises Ltd.