GIDForums

Go Back   GIDForums > Computer Programming Forums > C Programming Language
User Name
Password
Register FAQ Members List Calendar Search Today's Posts Mark Forums Read


 
 
Thread Tools Search this Thread Rate Thread
  #1  
Old 11-Jun-2007, 00:56
lavaka lavaka is offline
New Member
 
Join Date: Jun 2007
Posts: 5
lavaka is on a distinguished road

malloc/free usage, and general good programming behavior


Hi,
First, apologies for a long post. I'm not a very good C programmer, having done it only a bit in the past. But I'm going to have to use it for my new job, and this means I need to make my code somewhat standardized, so others can read it.

Hence, in the code I post below, please comment not only on incorrect usages, but also on correct, but poor usages (or, if I did something well, please point that out too). Any general tips or advice is very welcome, no matter how picky (i.e. if you don't think I use curly braces well, let me know). I'd rather learn these things now.

About the code I'm posting: it's just a quick routine based on the data that netflix provided for their netflix prize. I wanted to see what kind of result you get if you just provide the average ranking of a movie as the prediction for a users's preference for the movie, as a means to give myself practice working with large datasets. For those of you unfamiliar with the data, it comes in 17,770 files, about 2 GB when uncompressed, and each file corresponds to a movie and each line is a triplet: user, their rating, and the date. I processed the files a little bit in a shell to get rid of the date and change the formatting.

With such large amounts of data, I needed to use "short ints" wherever possible so that it fit into my 1 GB of RAM. I use calloc to dynamically allocate the data.

First question: when is it ESSENTIAL to use "free" ? Why do you use it with calloc'ed/malloc'ed variables only? (or, is that true?). I didn't use it, and my program looks OK. I check the RAM with a system monitor, and it goes up to about 800 or 900 MB, and then back down to the baseline once I quit the program, so it doesn't seem like there's a memory leak. But when I use "free", I get this error:

*** glibc detected *** free(): invalid pointer: 0xbfe914d0 ***

Second question: I'd like to get used to using a makeful, but for some reason these things seem like the most complicated electronic entities ever. Can someone post a very simple makefile? I'm using a makefile that I modified from a friend, but I don't understand half of it.

Third problem: the code doesn't work. SOMETIMES I get a seg fault, but usually it works. So I must be messing up. Anyone know why?

Fourth problem: in my loop (I mention it in the comments) from m = 1 to 17770, it seems like "m" should end at the value 17770, but instead it ends at 17771, and if I watch the loop through a debugger (I used ddd), after m reaches 17770, it goes through just a part of the loop once more. So, I'm messing up again. Anyone see the error?

Fifth question: I use this on different computers, and the files are in different locations. What's an easy way to be able to have the program automatically know where to find the data (assuming it's only on a handful of computers, and I can do a "case" or something)? Can you define a string in the preprocessor?

If it's useful, I'm using redhat with gcc 3.4.6 and using "-O3" as the optimization level.

Ok, thanks to anyone who answers. It's much appreciated. The code is in two files: read_data.c and netflix.h (is this good practice to separate the files? I'm assuming that you might want to expand the code later)

Netflix.h
CPP / C++ / C Code:
#include <stdio.h>
#include <math.h>

float findRMSE( float *testPredictions, int *testTrueRatings, int nTest );

load_data.c
CPP / C++ / C Code:
/* Fri June 8th 2007     		*/
/* Posted to gidforums Sun June 10th	*/

#include "netflix.h"

#define NMOVIES 17770
#define MAXUSERS 480189

int main(int argc, char *argv[]) 
{
   /* "input.dat" contains a list of users and movies, but no ratints	*/
   /* the goal is to estimate those ratings as best as possible		*/
   /* For now, the actual ratings are in the data, so I can determine the error	*/

   /* I don't know how long "input.dat" is, so I'll dynamically allocate memory */
   int		*testMovies, *testUsers ; 	/* from input.dat	*/
   float 	*testPredictions;		/* Estimated Ratings 	*/
   int		*testTrueRatings;		/* Actual Ratings 	*/

   /* This is where I store the known data; each movie file contains	*/
   /* a list of user/rating pairs, so this is a giant, 			*/
   /* double, jagged array.						*/
   int		*movieUsers[NMOVIES+1] ;  	/* this cannot be a short */
   short int	*movieRatings[NMOVIES+1] ;
   int		movieLengths[NMOVIES+1];
   float	movieMeans[NMOVIES+1], movieVars[NMOVIES+1];

   /* For my algorithm, I'll want to a list of the movies/ratings, per user	*/
   short int	*userMovies[MAXUSERS];		/* now, sorted by user	 */
   short int	*userRatings[MAXUSERS];  
   int		userIDs[MAXUSERS];
   short int	userLengths[MAXUSERS];
   float	userMeans[MAXUSERS], userVars[MAXUSERS];

   int		i, j, k, n, nTest, m, u, r, s, t, w, v;
   char		fname[100];
   float	mse;
   FILE		*fp, *fp2;

 /* ******  GET PROBE INFO *******		    	*/
 /* Read "input.dat" to get test user and movies  	*/
 if ( (fp = fopen("input.dat","r") )!=NULL )  {
  /* find out how long the file is */
  n = 0;
  while(!feof(fp)) {
      fscanf(fp, "%d %d",&i,&i);
      n++;
  }
  fclose(fp);  nTest = n-1;
 } else   {
    printf("You need to provide 'input.dat' in the current directory\n");
    exit(1);
 }
 printf("The length of the probe file is %d\n", nTest);

 /* Now, allocate the storage for the predictions*/
 testMovies = (int*) calloc( nTest, sizeof(int) );
 testUsers  = (int*) calloc( nTest, sizeof(int) );
 testPredictions = (float*) calloc( nTest, sizeof(float) );
 testTrueRatings = (int*) calloc( nTest, sizeof(int) );

 /* Now, read in the data */
 fp = fopen("input.dat","r");
 for (i=0; i<nTest; i++)
  	fscanf(fp, "%d %d", testMovies + i, &testUsers[i]);
 fclose(fp);
 /*
 printf("The contents of input.dat are:\n");
 for(i=0; i<nTest; i++) 
	printf("%d %d\n", testMovies[i], testUsers[i]);
 */

 /* ******  READ CUSTOMER IDS	*******		    	*/
 /* ID#'s go up to 2,649,429; there are 480,189 unique users */
 /* I found "MAXUSERS" by using "wc" on the following file   */

 if( (fp = fopen("IDs_sorted.txt","r")) != NULL) {
   for (u=0; u< MAXUSERS; u++) {
	fscanf(fp,"%d", userIDs + u);
   }
   fclose(fp);
 } else {
	printf("Error: can't find IDs_sorted.txt\n");
 }

 /* ******  GET TRAINING DATA *******		    	*/
 /* "movie_lengths.txt" contains pairs of movie and #ratings per movie	*/
 fp2 = fopen("movie_lengths.txt","r");
 for (m=1; m<= NMOVIES; m++) {

	/* Something WEIRD happens to m in this loop*/

	/* NOTE: careful with 0- vs 1-indexing conventions !! 		*/
	/* Currently, the 0 spot is not used, i.e. I've overallocated by one */

	sprintf(fname, "~/netflix/mv_%07d.dat",m);

	/* find length of movie file so I can allocate memory */
	fscanf(fp2, "%d %d", &i, &n);  /* n is length of file */
	movieLengths[m] = n;
	if (i != m ) { 
	   printf("error at movie %d\n",m ); 
	}
	movieUsers[m]   = (      int*) calloc( n, sizeof(      int) );
	movieRatings[m] = (short int*) calloc( n, sizeof(short int) );

	if( (fp = fopen(fname, "r"))!= NULL) {
  	  for (i=0,s=0; i< n; i++) {
	    fscanf(fp,"%d %hd",movieUsers[m]+i,movieRatings[m]+i);
	    s += (float)movieRatings[m][i];
	    }
	  fclose(fp);
	} else { printf("File does not exist");  	}

	/* and to find mean and variances: */
	s = s/(float)n;
	for(i=0,v=0; i < n; i++) {
		v += pow( (float)movieRatings[m][i] - s,   2);
	}
	if (n < 2 ) { printf("# of observations at movie %d is too low",m); }
	movieMeans[m] = s;
	movieVars[m]  = v/(float)(n-1); 

	/*	printf("m and i and n are %d %d %d \n ", m, i, movieLengths[m]); */

 }
 fclose(fp2);
 printf("m and i are %d %d \n ", m, i);
 /* the output of the above line is 
  * 17771 17770
  * but it SHOULD be 
  * 17770 17770  i.e. how did "m" get incremented one more time??  */

 /* ******  SIMPLE GUESS: Take Mean Scores		*/
printf("nTest is %d\n", nTest);
 for (n = 0; n < nTest; n++ ){
	/* need to find value of testTrueRatings */
	m = testMovies[n]; u = testUsers[n];
	for(j=0,k=0; j< movieLengths[m]; j++) {
	   if ( movieUsers[m][j] == u ){
		testTrueRatings[n] = (int)movieRatings[m][j];
		k = 1; break;
	    }
	}
	if (k == 0) 
		printf("never found rating at movie %d\n", m);

	/* and the SECRET algorithm: */
	testPredictions[n] = movieMeans[m];
 }
 
 mse = findRMSE( testPredictions, testTrueRatings, nTest );
 printf("Using the user's mean score, RMSE is %f\n", mse );
 
 /* This is to keep the program running so I can see how much memory is being used */
 printf("Hit any key to terminate  ");
 scanf("%s",fname);

 /* Now, free all the calloc'ed memory	*/
 free(testMovies);
 free(testUsers);
 free(testPredictions);
 free(testTrueRatings);
 free(movieUsers);
 free(movieRatings);
 
 return 0;
}


/* Finds the Root Mean Squared Error  */
float findRMSE(float *testPredictions, int *testTrueRatings, int nTest ) {

  int n;
  float mse = 0;

  for (n=0; n < nTest; n++) {
	mse += pow(testPredictions[n] - (float)testTrueRatings[n], 2);
  }
  return sqrt( mse/(float)nTest );
}

Thanks in advance!
  #2  
Old 11-Jun-2007, 02:47
aie0 aie0 is offline
Member
 
Join Date: Dec 2004
Posts: 246
aie0 is a jewel in the roughaie0 is a jewel in the rough

Re: malloc/free usage, and general good programming behavior


Quote:
Hence, in the code I post below, please comment not only on incorrect usages, but also on correct, but poor usages
like this?
CPP / C++ / C Code:
int		i, j, k, n, nTest, m, u, r, s, t, w, v;

freeing movieUsers and movieRatings should be done in a loop freeing each cell.
  #3  
Old 11-Jun-2007, 09:38
davekw7x davekw7x is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Left Coast, USA
Posts: 4,520
davekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to behold

Re: malloc/free usage, and general good programming behavior


Quote:
Originally Posted by lavaka
Hi,

About the code I'm posting: it's just a quick routine...... 17,770 files...

First question: when is it ESSENTIAL to use "free" ?
You use free() to de-allocate memory that was given to you by calls to malloc and its cousins (calloc, realloc, etc.). You always supply free() with the value of a pointer that was given to you by malloc or calloc or...

You use free() whenever you are through with the memory block. If you have a function that calls malloc() and the function is called multiple times in your program and you don't free() stuff when you are finished with it, that is a memory leak.

When your program terminates, any modern operating system (Linux, Windows XP, etc.) knows what resources were allocated to that process and frees them up for you. It's probably a good idea to get into the habit of calling free before leaving a program for a number of reasons. In particular, the things you do as a standalone program for a "quick routine" may end up being a function of a bigger more meaningful program. The habit of cleaning up after yourself is never wrong, and might save you some grief later on.

Also, there are some tools that test programs for memory leaks and they sometimes give messages about memory leaks that are due to leaving the program without deallocating memory. You get into the habit of ignoring the warnings like this, and some day you may find that you have glossed over a real problem due to the clutter of irrelevant warning messages. Try to make things clean and you won't be sorry. Leave them messy, and...

Quote:
Originally Posted by lavaka

But when I use "free", I get this error:

*** glibc detected *** free(): invalid pointer: 0xbfe914d0 ***
That means that you have called free() with a value that is not something that was given to you by malloc or calloc. That is a bug. The right way to handle a bug is to fix it, not hide it. Change the program to use a small number of users. (More about this later. See footnote.) This program only has a few places where it calls calloc. Print out the pointer values that you are given each time. Then, before calling free(), print out the value that you are giving it. If it is not among the printed values of pointers that you got from calloc, then go back and look at places where they could have changed. If you call free with something that was not given you by malloc or calloc (like the name of an array), then: don't do it.

Variables declared like movieUsers are called "automatic" variables:

CPP / C++ / C Code:
    int *movieUsers[NMOVIES + 1];       /* this cannot be a short */
The compiler/linker/loader allocates storage for an array of pointers. The run-time code for your program will automatically allocate and deallocate the required storage as that variable comes into and goes out of scope.

Quote:
Originally Posted by lavaka
Second question: I'd like to get used to using a makeful,
I assume you mean a "make file". This is a file read by the utility program called "make". The default name is usually either "Makefile" or "makefile". Makefiles can be very complicated for complicated tasks and they can be simple for simple tasks. Look for GNU documentation. Show us your makefile and maybe we can help you understand or simplify it. There are a lot of defaults that GNU make can use that can help.
Code:
# a simple makefile for GNU make # The project consists of a single C file and a single header # If either changes, then we want to compile it all over again # # note that the command (the gcc... line) is preceded by # a tab character. This must be a tab, not spaces. A real tab # # Just enter "make" on the command line and the program will be # recompiled either test.c or test.h has a later timestamp # than the target test: test.c test.h gcc -Wall -W -pedantic test.c -o test # # To force a recompile, you can first enter "make clean" on the # command line. Since there are no dependencies, the command will # always be executed. # Note again that the command part (the rule) has a tab at the # beginning. Not spaces: a real tab # clean: rm -f test

When projects have a lot of source files, makefiles are usually created to compile individual source files into object files. A rule is made to combine objects into the target (executable) file. If only one source file is changed, then only that one is recompiled and its new object file is combined with the other, unchanged, ones to make the target.

Quote:
Originally Posted by lavaka

Third problem: the code doesn't work. SOMETIMES I get a seg fault, but usually it works.

My biggest quarrel is that you never test input to see if it's valid. Too many things in your program depend on input being exactly right. Always (yes, always) test for valid input as much as is humanly possible (or programatically possible).

Also you have several places where it depends on the very large number of input files. Why not make it so that you can start and test with smaller files and then go for the big stuff?

For example
CPP / C++ / C Code:
  n = 0;
  while(!feof(fp)) {
      fscanf(fp, "%d %d",&i,&i);
      n++;
  }
  fclose(fp);  nTest = n-1;

Your kludgy correction at the end points out that it always goes through the loop one time extra. (That's because the eof flag is set only after you have tried to read past the end of the file.)

I would recommend something like this:

CPP / C++ / C Code:
        n = 0;
        while ( fscanf(fp, "%d %d", &i, &i) == 2) {
            n++;
        }
        fclose(fp);
        nTest = n;

This gives the correct value for the number of items that can be read correctly from the file.

Then later you can do the same thing again, and even test for consistency:
CPP / C++ / C Code:
    fp = fopen("input.dat", "r");
    for (i = 0; i < nTest; i++) {
        if (fscanf(fp, "%d %d", testMovies + i, &testUsers[i]) != 2) {
            printf("Input error reading file seond time: i = %d\n", i);
            break; /* or maybe exit the program, since something is dreadfully wrong */
        }
    }
    fclose(fp);

The following is particularly onerous to me, since a program manifest constant, MAXUSERS, depends on your entering its value manually after running an external program. To me, that's just wrong-headed.

CPP / C++ / C Code:
 /* ID#'s go up to 2,649,429; there are 480,189 unique users */
 /* I found "MAXUSERS" by using "wc" on the following file   */

 if( (fp = fopen("IDs_sorted.txt","r")) != NULL) {
   for (u=0; u< MAXUSERS; u++) {
	fscanf(fp,"%d", userIDs + u);
   }
   fclose(fp);
 } else {
	printf("Error: can't find IDs_sorted.txt\n");
 }

I suggest that you find MAXUSERS in the program itself, instead of things like
CPP / C++ / C Code:
#define MAXUSERS 480189
.
.
.
   short int	*userMovies[MAXUSERS];		/* now, sorted by user	 */
   short int	*userRatings[MAXUSERS];  
   int		userIDs[MAXUSERS];
   short int	userLengths[MAXUSERS];
.
.
.
You would use malloc to get the arrays that you need instead of declaring them. That also means that you could test the program with shorter files and then run it with larger and larger numbers of users without undergoing the error-prone steps of manually entering a new number each time, based on wc on the input file(s).

Always test the return value of scanf to make sure it was able to find what it was expecting.
Quote:
Originally Posted by lavaka
Fourth problem: in my loop (I mention it in the comments) from m = 1 to 17770, it seems like "m" should end at the value 17770.

How does the loop work? It initializes the value to 1. It sees if the value is less than or equal to 17770. If it isn't then executes the loop and increments the value of m. Then it tests to see if m is less than or equal to 17770. If it isn't, it executes the loop and increments the value of m. It continues this way until m is no longer less than or equal to 17770. At this time (that is when n is equal to 17771) it exits the loop. Since this is a nicely structured loop, there is no other way that it can exit the loop, so what the heck value could it have other than 17771?
Quote:
Originally Posted by lavaka
Fifth question: I use this on different computers, and the files are in different locations. What's an easy way to be able to have the program automatically know where to find the data (assuming it's only on a handful of computers, and I can do a "case" or something)? Can you define a string in the preprocessor?
You mean like a path name?

Yes. You can define it in the program (with a "#define" statement) or you can define it on the compiler command line (in the makefile, for example) or, probably the best choice, you can use a command line argument when you invoke the program.


Ok, thanks to anyone who answers. It's much appreciated. The code is in two files: read_data.c and netflix.h (is this good practice to separate the files? I'm assuming that you might want to expand the code later). In general, most people put various definitions (typedefs, struct definitions, things with #define) into a common file. Function prototypes and external variable definitions are also typically put into header files. It's good to be thinking about future needs when you start, and you can always change the details as the project grows.


Regards,

Dave

Footnote: It really doesn't make sense to me to create a program that can't be tested with a small input data base. Writing a program, no matter how simple, that only works with 17,700 files is a recipe for madness. Writing a program that requires it to be recompiled when the size of the data base changes also doesn't make any sense at all. These comments are only my opinions, of course, but you did ask.
  #4  
Old 11-Jun-2007, 10:19
lavaka lavaka is offline
New Member
 
Join Date: Jun 2007
Posts: 5
lavaka is on a distinguished road

Re: malloc/free usage, and general good programming behavior


Dave, I did ask, so thanks a lot for the lengthy response. I will take most of your suggestions. When I started the code, I did actually test on fewer than 17770 files -- I changed the NMOVIES (by hand) to something like 20.

One of your suggestions is that I find the value of a variable like "NMOVIES" in the program, and then dynamically allocate. The reason I was hesitant is that for in code dealing with large datasets, it might take a while (everytime I run the program) to find the value of the variable. But you're right, for robustness, I should change this.

I did mean "makefile". I think I have a good understanding of their purpose and usefulness, but it's the syntax that confuses me. Here's the one I'm using:

Python Code:
# what do these mean?
.KEEP_STATE:
.SUFFIXES: .c .f .o

##### MACROS #####
# this I understand
LIBS        = -lm

# and this I understand as well
#CFLAGS = -O3
CFLAGS = -g
CC      = gcc

# this?
INCDIR  =

OBJECTS  = read_data.o

##### RULES #####

# what exactly is the syntax format here?
.c.o: netflix.h
        $(CC) -c $(INCDIR) $(CFLAGS)  $*.c
.f.o:  ; f77  -c -O3  $*.f

##### TARGETS #####
# -w inhibits warning messages

all:  $(OBJECTS) makefile netflix.h
        $(CC) $(OBJECTS) $(CFLAGS) $(LIBS) -o netflix

# this?
default:
        @echo "Specify a target configuration"
clean:
        -rm *.o *~ netflix
        #-rm *.o *~ st2
# this?
targets: $(PROGS)
  #5  
Old 11-Jun-2007, 11:29
davekw7x davekw7x is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Left Coast, USA
Posts: 4,520
davekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to beholddavekw7x is a splendid one to behold

Re: malloc/free usage, and general good programming behavior


Quote:
Originally Posted by lavaka
I did mean "makefile". I think I have a good understanding of their purpose and usefulness, but it's the syntax that confuses me.

There are two issues:

1. How can I create and use a helpful Makefile?
Answer: Look for examples

2. How can I understand the examples?
Answer: Look at the GNU documentation. Start here: http://www.gnu.org/software/make/.

Click here: www.gnu.org/software/make/manual/

Follow links to download or browse. It's really good!

Code:
# what do these mean? .KEEP_STATE:
Look it up in the manual

Code:
.SUFFIXES: .c .f .o
That means that source for rules will be "something.c", "something.f", or "something.o". There are defaults that you can read about in the manual.
Code:
# this? INCDIR =
Sinde some rules in this makefile involve $INCDIR, you might want to set this equal to something if it happens that you need #include files on something other than the default paths for your installation
Code:
OBJECTS = read_data.o
This defines a variable named OBJECTS; its value consists of the name of the read_data object file
Code:
# what exactly is the syntax format here? .c.o: netflix.h $(CC) -c $(INCDIR) $(CFLAGS) $*.c
This says that if you have "something.c" and you have a target "something.o" here's the way to get it:

$(CC) is the value of the CC variable set somewhere (either in this makefile or an environment variable named CC, or for GNU, the default is gcc if it is not set anywhere else.
$(INCDIR) is the value of INCDIR set somewhere.
$(CFLAGS) is the value of CFLAGS set somewhere.
$*.c means c source files.

So the rule given by this expression to make "something.o" is
gcc -c -g something.c

The rule will be invoked whenever "something.c" is changed or whenever one of the dependencies is changed (in this case the dependency is netflix.h).

As a side note: I usually turn on compiler warnings so that the compiler can not only check for fatal errors, but can find certain "common programmer missteps" that might signify something amiss. So I would set cflags to -Wall -W -pedantic. the -g means to put debugging information into the object file also.

Since you have a target that depents on OBJECTS and OBJECTS is set to read_data.o, then every time that netflix.h is changed or read_data.c is changed, this rule will be executed.
.
Code:
all: $(OBJECTS) makefile netflix.h $(CC) $(OBJECTS) $(CFLAGS) $(LIBS) -o netflix
This is silly (not wrong; just silly).

The first target will be processed is something named "all". But it doesn't create anything named "all", so its rule will be executed each and every time you enter "make" from the command line (assuming that this is called "makefile"). Even if nothing is changed, it will execute this rule (thus kind of missing the point of having a makefile in the first place.).

I would expect something like the following, since the target is "netflix"

Code:
all: netflix netflix: $(OBJECTS) makefile netflix.h $(CC) $(OBJECTS) $(CFLAGS) $(LIBS) -o netflix

So "make" will be the same as "make all" from the command line, but this will be the same as if you had entered "make netflix", since netflix is a dependancy. (You don't really need the all:, but I like it as a matter of style. Also, it is possible to have more than one target that you want to make by default, and it's handy to have a default target named all)

Now the rule (the thing that links all of the object files together to create the executable) will be executed only of one of the dependencies changes. That is, if you change the makefile iteslf, or if you change netflix.h, this rule will be invoked. If something happend to make any of the OBJECTS change (like a change in its source file), then the OBJECT will be made first and then this rule will be invoked.

I'll leave the rest for your document-browsing pleasure. If you have any questions, then ask.

Now, there are a few verrrry important things about makefiles:

1. Rules are always introduced with a tab character, not spaces. You can't always tell in your editor, but GNU make will complain if you forget. (vim and gvim users can enter ":set list" and they will see the difference beteween spaces and tabs. Then ":set nolist" returns to the "normal" view).

2. Unlike C and C++ The makefile is processed as a whole, not sequentially. That is, you can define something anywhere in the file and it will be defined everywhere. The only thing that depends on its order of apperance is that the first target will be the default. (That is, just entering "make" on the command line will make the first target that appears in the file.)

3. GNU make has a lot of defaults that may make it unnecessary to have everything that you see in every make file, but these defaults have changed from version to version over the years, and they may change again, so many people put a lot of stuff that you may not really need. It's good to try to understand what is there, and I appreciate your inquisitiveness.

4. If you have a file named "something.c" and you don't have a makefile, then you can enter "make something" and GNU make will execute the following rule:

$(CC) $(CFLAGS) something.c -o something

If you have environment variables CC and CFLAGS set, it will use their values. If you con't have CC set, it will use the default cc (which is probably a symbolic link to gcc; at least it is on my various linux systems).

Note that using the "no makefile" make doesn't allow you to combine more than one source file and it doesn't allow for dependencies on header files, but it's handy.

In my .bashrc, I have an alias for setcflags that looks like this:

Code:
alias setcflags='export CXX=g++;export CXXFLAGS="-Wall -W -pedantic";export CC=gcc;export CFLAGS="-Wall -W -pedantic"'

Then if I first execute "setcflags", I can make C or C++ programs just by typing "make hello.c" or "make hello.cpp" or whatever.

Since GNU make, by default, echos the rules it is executing, I can tell what is being done, whether I use a makefile or I use the "no makefile" make.

Regards,

Dave
  #6  
Old 14-Jun-2007, 09:14
WaltP's Avatar
WaltP WaltP is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Midwest US
Posts: 3,230
WaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to all

Re: malloc/free usage, and general good programming behavior


Since Dave helped on the code very well, I'll mention a few things about formatting (see this link) Since formatting is a very personal style, take what you like, ignore the rest...


My preference is 4 space indent. I set up my editor to tab 4 spaces and convert all TABs into SPACEs. This makes the code easiest for me to follow, and we keep the multiples of 4 for all indentation.


CPP / C++ / C Code:
/* Fri June 8th 2007     		*/
/* Posted to gidforums Sun June 10th	*/

#include "netflix.h"

#define NMOVIES 17770
#define MAXUSERS 480189

int main(int argc, char *argv[]) 
{
   /* "input.dat" contains a list of users and movies, but no ratints	*/
   /* the goal is to estimate those ratings as best as possible		*/
   /* For now, the actual ratings are in the data, so I can determine the error	*/

   /* I don't know how long "input.dat" is, so I'll dynamically allocate memory */
   int		*testMovies, *testUsers ; 	/* from input.dat	*/
   float 	*testPredictions;		/* Estimated Ratings 	*/
   int		*testTrueRatings;		/* Actual Ratings 	*/

   /* This is where I store the known data; each movie file contains	*/
   /* a list of user/rating pairs, so this is a giant, 			*/
   /* double, jagged array.						*/
   int		*movieUsers[NMOVIES+1] ;  	/* this cannot be a short */
   short int	*movieRatings[NMOVIES+1] ;
   int		movieLengths[NMOVIES+1];
   float	movieMeans[NMOVIES+1], movieVars[NMOVIES+1];
I like to line up the comments to make them easier to follow:
CPP / C++ / C Code:
int main(int argc, char *argv[]) 
{
    /* "input.dat" contains a list of users and movies, but no ratints  */
    /* the goal is to estimate those ratings as best as possible        */
    /* For now, the actual ratings are in the data, so I can determine  */ 
    /*    the error  */

    /* I don't know how long "input.dat" is, so I'll dynamically        */
    /*     allocatememory */
    int      *testMovies, *testUsers ;      /* from input.dat           */
    float    *testPredictions;              /* Estimated Ratings        */
    int      *testTrueRatings;              /* Actual Ratings           */

    /* This is where I store the known data; each movie file contains   */
    /* a list of user/rating pairs, so this is a giant, double, jagged  */
    /* array.                                                           */
    int      *movieUsers[NMOVIES+1] ;       /* this cannot be a short   */
[EDIT] I see you did line them up, but they got lost in the flurry of TABs since my editor is set differently than yours. A good reason to switch to SPACEs. All editors handle SPACEs the same, but not TABs.
[/EDIT]


CPP / C++ / C Code:
 /* ******  GET PROBE INFO *******		    	*/
 /* Read "input.dat" to get test user and movies  	*/
 if ( (fp = fopen("input.dat","r") )!=NULL )  {
  /* find out how long the file is */
  n = 0;
  while(!feof(fp)) {
      fscanf(fp, "%d %d",&i,&i);
      n++;
  }
  fclose(fp);  nTest = n-1;
 } else   {
    printf("You need to provide 'input.dat' in the current directory\n");
    exit(1);
 }
 printf("The length of the probe file is %d\n", nTest);
Here you lost your indentation... dropped to 1 space. Be consistent! Always add indentation after each '{', then remove the indentation before each '}'
CPP / C++ / C Code:
    /* ******  GET PROBE INFO *******              */
    /* Read "input.dat" to get test user and movies    */
    if ( (fp = fopen("input.dat","r") )!=NULL )  
    {
        /* find out how long the file is */
        n = 0;
        while(!feof(fp)) 
        {
            fscanf(fp, "%d %d",&i,&i);
            n++;
        }
        fclose(fp);  nTest = n-1;
    } 
    else   
    {
        printf("You need to provide 'input.dat' in the current directory\n");
        exit(1);
    }
    printf("The length of the probe file is %d\n", nTest);
Also note I prefer the braces on separate lines. They are easier to find as you scan thru the code looking for the beginning/end of a code block. Especially with the else clauses.


CPP / C++ / C Code:
 fp = fopen("input.dat","r");
 for (i=0; i<nTest; i++)
  	fscanf(fp, "%d %d", testMovies + i, &testUsers[i]);
 fclose(fp);
 if( (fp = fopen("IDs_sorted.txt","r")) != NULL) 
Spacing in a line -- use whitespace where it makes the line more readable, but not where it's harder to see:
CPP / C++ / C Code:
    // add more SPACEs
    for (i = 0; i < nTest; i++)
    {
        // Remove some SPACEs
        fscanf(fp, "%d %d", testMovies+i, &testUsers[i]);  
    }
    fclose (fp);
    // moved the space between the ('s
    if ((fp = fopen("IDs_sorted.txt","r")) != NULL)     
Space after each command and before the (. And always put {/}s around ifs and loop bodies. It helps if you need to add a couple lines of code later, especially debug code.


CPP / C++ / C Code:
     movieUsers[m]   = (      int*) calloc( n, sizeof(      int) );
     movieRatings[m] = (short int*) calloc( n, sizeof(short int) );
I like the way you lined up the ints here.


CPP / C++ / C Code:
  /* the output of the above line is 
   * 17771 17770
   * but it SHOULD be 
   * 17770 17770  i.e. how did "m" get incremented one more time??  */
I also like this form of commenting -- the only difference is I put the */ on the next line...


CPP / C++ / C Code:
printf("Hit any key to terminate  ");
scanf("%s",fname);
Never, never use scanf() to read string values!!! Here's why! The rest of this scanf() series is also worth knowing.

Plus, you can't hit any key using scanf() -- you must hit ENTER.

Hope this helps...
__________________

Cow: You're a lawyer too?
Mooseblood (mosquito): Ma'am, I was already a bloodsucking parasite. All I needed was a briefcase!
  #7  
Old 14-Jun-2007, 17:27
dlp dlp is offline
Junior Member
 
Join Date: May 2006
Posts: 82
dlp will become famous soon enough

Re: malloc/free usage, and general good programming behavior


Quote:
Originally Posted by WaltP
Plus, you can't hit any key using scanf() -- you must hit ENTER.
This is pretty much true of any ANSI C function that pulls input, right?
  #8  
Old 14-Jun-2007, 19:50
WaltP's Avatar
WaltP WaltP is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Midwest US
Posts: 3,230
WaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to allWaltP is a name known to all

Re: malloc/free usage, and general good programming behavior


Quote:
Originally Posted by dlp
This is pretty much true of any ANSI C function that pulls input, right?
Right......
__________________

Cow: You're a lawyer too?
Mooseblood (mosquito): Ma'am, I was already a bloodsucking parasite. All I needed was a briefcase!
 

Recent GIDBlogUpdates On The All New Toyota VIOS - Part III by Nihal

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Network Sites: GIDNetwork · GIDWebHosts · GIDSearch · Learning Journal by J de Silva, The

All times are GMT -6. The time now is 19:01.


vBulletin, Copyright © 2000 - 2008, Jelsoft Enterprises Ltd.