[CONTEST?]Data Structure Test
Data Structure Comparison
I have written a test program to test the viability and performance of different data structures. I have always been intrigued by data structures and a recent post got me to thinking about the performance of distinct type of data structures. I wanted to have some numbers to back statements about different data structures. I have based this upon a quasi-realworld example that actually does something with the numbers. I would like to test several data structures and would like to invite anyone interested in this to submit a data structure to be reviewed.
Real World Example
Let's say that I have gathered a lot of flow data from a drainage stream. This is a long list of unordered data that I want to be able to do some statistics on. The input of this data is to be simulated by the random number generator routine that I have included below. I want to be able to have easy access to the data as well as being completely seperated from the data structure. The required functions are discussed further below. There are four possible sizes of data sets that will be tested. For fairness, these numbers and orders will be identical for each test. There is a small (20,000 numbers), a moderate (100,000 numbers), a large (500,000 numbers), and an extreme (1,000,0000 numbers) test. All structures will be started at the small level and proceed upwards until test results become unfeasible.
Summary of Tests
Needed public functions
For a baseline, I have used my singly linked list. This was not originally created for this type of usage, but I thought it would be a good measuring gauge. I had to use inheritance to define a new class based on this list that would "plug in" to the test program. The results and source are given in the next post.
Dell Inspiron 8500 Laptop
Intel Pentium 4 2.2 Ghz with speedstep
1 Gigabyte of Ram
Linux: Slackware Linux running 2.6.5 kernel & KDE 3.2.1
Windows: Windows XP Pro
All unnecessary programs stopped on both.
This is a good exercise to use, just for testing a data structure. I found a single line that I removed in my list implementation that improved the data insertion on a 20000 item structure from 13.38 seconds to 0.02 seconds. I just have never done any extensive testing on it and never realized that I had such a time-consuming implementation.
I really hope that I get some participation on this. If you have an existing data structure algorithm, you should be able to code the "adapter" header file fairly quickly. If not, it is a really good exercise (IMHO) to write and understand these different structures. And they shouldn't take that long. The longest part of doing this whole exercise for me is writing this post.
I have compiled these using gcc under linux and dev-cpp under windows. They should compile under about anything with a few minor tweaks.
Last edited by dsmith : 04-May-2004 at 10:02.
Single Linked List Results
Submitted by dsmith
Okay, here are the results of my single linked list. I only ran this through the small (20,000 numbers) and the moderate (100,000 numbers) cycles. As can be seen, there is a major difference when going up to 100,000. I ran these both in Linux and Windows with similar results. The Windows version edged out the Linux version in both cases. Here are the listings from the Windows versions for the small and moderate size resepectively.
Small Test Printout
Moderate Test Printout
Two things, just kill the single-linked list. Those are insertion of unsorted numbers into a sorted list and the reverse print-out. This is a true single-linked list, so the implimentation of the call to prev is a killer.
I have made a summary sheet that summarizes all of the tests and results for easy viewing. It is attached below.
My single linked list is done as an in-line class. This is not a requirement for this test.
I have made some slight changes to my original test. The changes that I have made are as follows:
This will cause some changes to the original list of required functions for this test, simply because of the added possible use of the range function. The changed functions are as follows:
Doubly Linked List Results
I have actually tested two more data structures that I made, although both are similar. The first is a routine doubly linked list. I was hoping that it would yeild some better results than the singly linked list due to its ability to search from the current position either forwards or backwards. Unfortunately it gave very similar results (even worse) in most of the search/sort functions. The only place that it pays real dividends is the descending print-out test for obvious reasons. Other than this one fairly useless test, there is not any advantage to a standard doubly-linked list. However, I tried it with a little twist as detailed below and got more favorable results.
The second data structure is a slightly modified doubly linked list that uses the new range function that I have defined above. In the find function, a single call is made to this range function to determine if the search should be started at the first of the list, the end of the list or the current position in the list. Other than that it is identical. This one call almost halfs the time for the insertion of the numbers. In addition, the finding, inserting and deleting of 1000 numbers are all about a quarter of the time of the first two tests. These are the most time consuming items. While this is outstanding, it still does not qualify this modified doubly linked list to take the large test.
Here is the current total time for the moderate sized list run under Linux:
Here are some graphs of the most telling tests:
Future Possible Structures
I am now contemplating a tree structure as well as a dynamic array. The tree structure should work good at insertions, but I am wary about its performance in other tests due to the amount of comparisons that need to be made for trivial tasks (next & prev). The dynamic array is going to be unbeatable in the search test I think, but I am worried about the data insertion and deletion penalties that it will incur.
Last edited by dsmith : 07-Jun-2004 at 08:46.
|Thread Tools||Search this Thread|
|Rate This Thread|
|Thread||Thread Starter||Forum||Replies||Last Post|
|Repetition Structure||ewallo||C++ Forum||4||25-Mar-2004 17:46|
|CSS Layout question||oihjk||Web Design Forum||3||28-May-2003 11:36|
|Review/Beta Test Program - Free Ticketing/Project management software||phantomlight||Websites Reviewed Forum||0||28-Apr-2003 22:52|
|Could you test the speed of this domain for me||jrobbio||Websites Reviewed Forum||8||21-Jan-2003 14:24|