#1
Convert Data Types
Hi all,
I am a new C++ user. I need to convert some data from a file in .c02 format. The data is stored in this file in sequential order: for example, the first 4 bytes are an int, the next 8 bytes are chars, the next 4 bytes are a float, and so on. If I open the .c02 file in WordPad, it shows a huge number of ASCII characters, but if I open it in a hex editor, I can see the hexadecimal numbers. Please help me convert this data. What I need specifically is:
1. How do I read the bytes from the .c02 file? Or can I read from the hex editor output?
2. Using C++, how can I convert the first 4 bytes to an unsigned int, then 2 bytes to a signed int, then 4 bytes to a float, then 8 bytes to chars, and so on?
3. Lastly, I need to write the output (i.e., the converted uint, int, float, and char values) to a text file or, if possible, directly to an Excel worksheet.
This is a very small part of my research study in construction engineering, but unfortunately I have very little knowledge of programming. Please help me...
#2
|
|||||
|
|||||
Re: Convert Data Types
Quote:
Quote:
Quote:
int - 4 byte integer
short - 2 byte integer
float - 4 byte float
char - define an array of char. Do not use it as a string, just a series of characters, because you will not have an ending '\0'.
To test your compiler, write a small test program that displays the size of each of the above types using sizeof(type), where type is int, short, etc., to make sure of the actual size of each. This all assumes that each and every record in the file is exactly the same size.
Quote:
Quote:
__________________
Age is unimportant -- except in cheese
#3
|
|||
|
|||
Re: Convert Data Types
Quote:
Then you have to learn something about computer representation of numbers and how the individual bytes can be written to (and read from) files. It is not sufficient to say that "the first four bytes of a file represent an int".

If the file format is one that is widely used (.wav, .bmp, .jpg, etc.), you can find the format specification in various places on the web. I usually start with wotsit. If it is not so widely used, then you must get the file specification from whoever (or whatever program) wrote the file. If you can't do that, then you might have to inspect the file a byte at a time to deduce its structure.

I'll tell you why. Suppose that an int is a two's complement 32-bit (four-byte) quantity for my compiler on my computer (it is for all of the compilers that I have on my computers, but that's not universally true). The address of the int refers to a four-byte block of sequential memory locations. "Normal" programs (and their programmers) that use integers to calculate don't have to worry about any of the details, but since functions that write to and read from disk files have to address the individual bytes, there's more that we need to know.

Suppose the memory address of an int is 1000 (assigned by the compiler, and normally of no interest to the programmer). Suppose the integer has a decimal value of 305419896 (0x12345678 hex). When a program stores this int in memory, there are two ways that are in common use.

1. Big-endian:
address 1000 holds 0x12
address 1001 holds 0x34
address 1002 holds 0x56
address 1003 holds 0x78

2. Little-endian:
address 1000 holds 0x78
address 1001 holds 0x56
address 1002 holds 0x34
address 1003 holds 0x12

(Be patient; I will get to the file stuff in a minute.)

A given C compiler will have its own way of doing things, and for desktop, laptop, and workstation systems in common use, the compiler will undoubtedly use whatever endianness is defined for that CPU's architecture.

I repeat that if all you are going to do is use ints for your calculations, you never (ever) have to be concerned. C source code for a program written on a big-endian machine has nothing in it that depends on endianness, and the same code can be compiled and executed on a little-endian machine.

When we access the individual bytes of any multi-byte data item, then we very definitely must be concerned with the endianness of the machine. When we read the four bytes of an int from a file, we have to know whether the bytes in the file were stored in big-endian order or little-endian order. If we created a file on our machine and we are absolutely certain that we will always read the file from a machine with the same endianness as ours, then we may be able to perform the task with less effort, but in general we will need to know the endianness of the multi-byte items in the file.

Here is a program that writes an int to three different files. I am hoping that you can use a hex editor to see what I am trying to say.
CPP / C++ / C Code:
When I ran this, I got three files, each of which had a length of (exactly) four bytes. Here are the dumps:
The big-endian file
Code:
The little-endian file
Code:
The default-endian file
Code:
As you can see, my machine is little-endian. Now do you see why, in general, if someone tells me that the first four bytes of a file make up an int, I need a little more information?

Summary notes:
1. If I read from the file a byte at a time, it is possible to make a program that doesn't depend on the endianness of the implementation (compiler and machine) that is running the program, but I must know the endianness of the file. I also must make sure that the sizes of the data types are consistent. Many programs assume that ints are always four bytes, short ints are always two bytes, etc., but different implementations may not all have the same lengths for corresponding data types.
2. If I know that the endianness of the file is the same as the endianness of the machine that is reading the file, it is possible to write a program without knowing what the endianness is. I also would have to know (or assume) that the lengths of the data types are consistent.
3. I have to know "something" more than just which bytes of the file represent what integers.
4. It is important to realize that a file specification may be made independently of machine endianness. For example, when I read a .wav file on my embedded system with a big-endian embedded processor, I have to realize that the file itself was written with the integer data type fields in little-endian order. As a convenient way of working, I can write a C program that can be compiled and tested on my (little-endian) workstation and then take the exact same C source code and use the cross-compiler for my embedded system to create the application.

It may not be important to you for a particular application, but you should definitely be aware of the issue(s).
Regards,
Dave
#4
|
|||
|
|||
Re: Convert Data Types
Hi all,
I have checked the sizes of the data types with my compiler. As expected, the size of int is 4 bytes, but the data structure of the file to be converted contains some 8-byte uint fields. Would you please tell me how to read these 4-byte and 8-byte integers from the file? Are there any functions in C++ to read a 4-byte int, an 8-byte int, a 4-byte float, etc. from a binary file?
Moreover, I checked the endianness of my machine and of the file. In the data structure of the file, I found that the byte order is Intel 0x0101 (little-endian), and my machine is also little-endian. Hence, do I need to consider the endianness of multi-byte values in my coding?
Thanks all for your kind attention.
Ashim
#5
|
||||
|
||||
Re: Convert Data Types
Quote:
Quote:
__________________
Age is unimportant -- except in cheese
#6
|
|||
|
|||
Re: Convert Data Types
Quote:
What compiler? What operating system?
Quote:
All compilers to which I have access (Borland, Microsoft, GNU) have a 64-bit integer data type, but the name of the type differs from compiler to compiler, and so does the way that C programs read it. That's why I suggest that people who get into things like this should always tell us what compiler and what operating system they are using (though I understand that you didn't think to do so).

For example, GNU gcc for Windows or Linux supports the C99 Standard, which defines the names of the data types and where to find them. The C++ standard doesn't have the same data types (and headers), but the GNU g++ compiler supports them.

To write something in "machine-endianness" (little-endian machines write little-endian files), you can either write a byte at a time, or you can write the datum with a single statement:
CPP / C++ / C Code:
After running the program, the file length was 8, and I looked at the bytes with the "od" program:
Code:
Quote:
The code that I showed creates a file that has the same endianness as the platform that generated the file. Similar code (using read() instead of write()) will read such a file without any further effort concerning endianness.

Since your project is apparently about reading a particular file format on a particular machine, you can try anything you want and see if it works. Then draw your conclusion, attach the results to your thesis, and get on with your life.

Usually I try to get people into the habit of writing programs that are as portable as possible. Since programs that require specific lengths of data types are only portable if the program is in C and the compiler supports C99 data types, and many of the people here use compilers that don't do this, whatever advice I can give you may not be as "portable" as I would like. We here can address your specific problem if you give us a little more information (mainly: tell us what compiler you are using).
Regards,
Dave
#7
|
|||
|
|||
Re: Convert Data TypesHi,
I am using Microsoft Visual C++6.0, and the operating system is MS Windows Xp. Actually I have tried to read from the file using 'fread', and now I am able to read. But, the main problem is that the program is reading as chars, although I am declearing the variables as unsigned int (See NOTE 1 in Coading given below). When I use 'fwrite' to give output into another file, it just gives the output as chars only i.e., same as input file. Now, I need to know how to give the output as my decired tyes, i.e. unsigned int. The program is attached below. CPP / C++ / C Code:
Would you please tell me why this output is as char? Why not as unisgned int, and how I can get it as unsigned int? Thank you. Ashim Last edited by cable_guy_67 : 10-Aug-2006 at 06:24.
Reason: Please surround your code with [cpp] ... [/cpp]
|
|
#8
|
|||
|
|||
Re: Convert Data Types
Quote:
I'm not sure what you expect. I ran the program with the following input file (I use the "octal dump" program, od, to display the hexadecimal bytes of the entire file):
Code:
0x01020304 0x05060708 0x090a0b0c 0x0d0e0f00 0x11223344
The output file looks like this:
Code:
This is a binary file, right? In general I wouldn't know what to make of it, but by looking at the code that created it, I see that it appears to be:
a binary little-endian integer 0x01020304
a char 0x2c (the ASCII representation of ',')
a binary little-endian integer 0x05060708
a char 0x2c
a binary little-endian integer 0x090a0b0c
a char 0x2c
a binary little-endian integer 0x0d0e0f00
a char 0x2c
a binary little-endian integer 0x11223344
a char 0x2c

Isn't this exactly what you told it to do? (Normally we don't use delimiters, such as ',', in binary files, but that is what your program created.)

If you want to write integers (or anything else) as human-readable chars (ASCII) instead of binary values, then I suggest formatted output. So if you want to see decimal values in the output file, you could use fprintf(fp2, "%u%c", out_temp, space);, for example, instead of write(..). With this change, you can open the output file with a text editor (or read it as a CSV file into an Excel spreadsheet, or whatever...).

If you make the change that I suggested (printing formatted decimal values, separated by commas), you might see something like
Code:
Is this what you had in mind?
Regards,
Dave