GIDForums  

  #1  
Old 07-Aug-2006, 01:14
ashimce ashimce is offline
New Member
 
Join Date: Aug 2006
Posts: 3

Convert Data Types


Hi all,

I am a new C++ user. I need to convert some data from a file with the extension .c02. The data is stored in this file in sequential order: for example, the first 4 bytes are an int, the next 8 bytes are chars, the next 4 bytes are a float, and so on.

If I open the .c02 file using WordPad, it shows a huge number of ASCII characters, but if I open it using a hex editor, then I can see the hexadecimal numbers. Please help me convert this data.

What I need specifically is:

1. How do I read the bytes from the .c02 file? Or can I read from the hex editor output?
2. Using C++, how can I convert the first 4 bytes to unsigned int, then 2 bytes to signed int, then 4 bytes to float, then 8 bytes to char, and so on?
3. Lastly, I need to write the output, i.e., the converted uint, int, float, and char values, to a text file or, if possible, directly to an Excel worksheet.

Actually this is a very little part of my research study in construction engineering, but unfortunately I have very little knowledge in programming. Please help me...
  #2  
Old 07-Aug-2006, 02:09
WaltP WaltP is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Midwest US
Posts: 3,243

Re: Convert Data Types


Quote:
Originally Posted by ashimce
If I open the .c02 file using WordPad, it shows a huge number of ASCII characters, but if I open it using a hex editor, then I can see the hexadecimal numbers. Please help me convert this data.
This is obviously a binary file -- a text editor such as WordPad or Notepad will not work at all.

Quote:
Originally Posted by ashimce
1. How to read the bytes from the .c02 file? Or can I read from the hex editor output?
Open the file in binary mode.

Quote:
Originally Posted by ashimce
2. Using C++, how can I convert the first 4 bytes to unsigned int, then 2 bytes to signed int, then 4 bytes to float, then 8 bytes to char, and so on?
Read each record into a structure or class that exactly defines your record.
int - 4 byte integer
short - 2 byte integer
float - 4 byte float
char - define an array of char. Do not use it as a string, just a series of characters because you will not have an ending '\0'.

To test your compiler, write a small test program that will display the size of each of the above types to make sure of the actual size of each value using sizeof(type) -- type of course is int, short, etc.
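
A minimal sketch of such a test program follows. The function name `type_sizes_report` is made up for illustration; the only language feature it relies on is the `sizeof` operator.

```cpp
#include <sstream>
#include <string>

// Build a short report of the sizes (in bytes) that this compiler
// uses for the types mentioned above.
std::string type_sizes_report()
{
    std::ostringstream out;
    out << "sizeof(int)   = " << sizeof(int)   << '\n'
        << "sizeof(short) = " << sizeof(short) << '\n'
        << "sizeof(float) = " << sizeof(float) << '\n'
        << "sizeof(char)  = " << sizeof(char)  << '\n';
    return out.str();
}
```

Printing the result from `main` (for example `std::cout << type_sizes_report();` with `<iostream>` included) shows what your compiler actually uses. Only `sizeof(char)` is guaranteed to be 1; the other values depend on the compiler and platform.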

This is all assuming each and every record in the file is exactly the same size.
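
One caveat when reading a whole record into a structure in a single call: the compiler may insert padding between members (for example after a 2-byte field), so the in-memory layout need not match the file layout. The sketch below sidesteps that by copying each field out of a raw byte buffer. The record layout (4-byte unsigned, 2-byte signed, 4-byte float, 8 chars) is taken from the question; the names `Record`, `kRecordBytes` and `parse_record` are hypothetical, the code assumes a C++11 compiler, and it assumes the file uses the same byte order and float format as the machine running it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

// Hypothetical record: field names are illustrative only.
struct Record {
    std::uint32_t id;       // 4-byte unsigned integer
    std::int16_t  code;     // 2-byte signed integer
    float         value;    // 4-byte float
    char          name[8];  // 8 raw characters, NOT '\0'-terminated
};

// Size of one record as stored in the file (no padding).
const std::size_t kRecordBytes = 4 + 2 + 4 + 8;

// Copy each field from the raw bytes, advancing by the field's file size.
// Assumes the file's byte order matches the host's.
Record parse_record(const unsigned char* bytes)
{
    Record r;
    std::size_t pos = 0;
    std::memcpy(&r.id,    bytes + pos, sizeof r.id);    pos += sizeof r.id;
    std::memcpy(&r.code,  bytes + pos, sizeof r.code);  pos += sizeof r.code;
    std::memcpy(&r.value, bytes + pos, sizeof r.value); pos += sizeof r.value;
    std::memcpy(r.name,   bytes + pos, sizeof r.name);
    return r;
}
```

To use it, open the file with `std::ifstream in(path, std::ios::binary)`, read `kRecordBytes` bytes at a time into an `unsigned char` buffer with `in.read(...)`, and pass the buffer to `parse_record`.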

Quote:
Originally Posted by ashimce
3. Lastly, I need to write the output, i.e., the converted uint, int, float, and char values, to a text file or, if possible, directly to an Excel worksheet.
For this, write the data out to another file in text mode, one record per line, with the fields separated by commas. The file type should be .csv. Surround text fields with double quotes. The file can then be read into Excel as a comma-delimited text file.
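
As a rough illustration of the CSV idea, the sketch below formats one record as a line of comma-separated fields, with the character field wrapped in double quotes. The function name `csv_line` and its parameters are invented for this example; doubling an embedded double quote is the usual CSV convention and is an addition not mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Format one record as a CSV line: numbers as text, characters in quotes.
// text is a raw character array of text_len bytes (no terminating '\0').
std::string csv_line(std::uint32_t id, std::int16_t code, float value,
                     const char* text, std::size_t text_len)
{
    std::ostringstream out;
    out << id << ',' << code << ',' << value << ",\"";
    for (std::size_t i = 0; i < text_len; ++i) {
        if (text[i] == '"') {
            out << '"';          // double any embedded quote character
        }
        out << text[i];
    }
    out << "\"\n";
    return out.str();
}
```

Writing each returned line to a `std::ofstream` opened in text mode on a file such as `output.csv` gives something Excel can open directly.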


Quote:
Originally Posted by ashimce
Actually this is a very little part of my research study in construction engineering, but unfortunately I have very little knowledge in programming. Please help me...
__________________

Age is unimportant -- except in cheese
  #3  
Old 07-Aug-2006, 09:17
davekw7x davekw7x is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Left Coast, USA
Posts: 4,712

Re: Convert Data Types


Quote:
Originally Posted by ashimce
Hi all,

I am a new C++ user. I need to convert some data from a file with the extension .c02. The data is stored in this file in sequential order: for example, the first 4 bytes are an int, the next 8 bytes are chars, the next 4 bytes are a float, and so on.

.
.

Actually this is a very little part of my research study in construction engineering, but unfortunately I have very little knowledge in programming. Please help me...

Then you have to learn something about computer representation of numbers and how the individual bytes can be written to (and read from) files.

It is not sufficient to say that "the first four bytes of a file represent an int".

If the file format is one that is widely used (.wav, .bmp, .jpg, etc) you can find the format specification in various places on the web. I usually start with wotsit.

If it is not so widely used, then you must get the file specification from whoever (or whatever program) it was that wrote the file. If you can't do that, then you might have to inspect the file a byte at a time to deduce its structure.

I'll tell you why. Suppose that an int is a two's complement 32-bit (four byte) quantity for my compiler on my computer (and, it is for all of the compilers that I have on my computers, but that's not universally true).

The address of the int refers to a four-byte block of sequential memory locations. "Normal" programs (and their programmers) that use integers to calculate don't have to worry about any of the details, but, since functions that write to and read from disk files have to address the individual bytes, there's more that we need to know.


Suppose the memory address of an int is 1000 (assigned by the compiler, and normally of no interest to the programmer). Suppose the integer has a decimal value of 305419896 (0x12345678 hex).

Now when a program stores this int in memory there are two ways that are in common use.

1. Big-endian:
address 1000 holds 0x12
address 1001 holds 0x34
address 1002 holds 0x56
address 1003 holds 0x78

2. Little-endian:
address 1000 holds 0x78
address 1001 holds 0x56
address 1002 holds 0x34
address 1003 holds 0x12
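
The two layouts above can be told apart at run time by looking at the first byte of a known value. This is only a sketch under the assumption of a C++11 compiler; the function name `host_is_little_endian` is made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the lowest-addressed byte of a 32-bit integer
// holds its least significant byte (0x78 for 0x12345678).
bool host_is_little_endian()
{
    const std::uint32_t x = 0x12345678u;
    unsigned char bytes[sizeof x];
    std::memcpy(bytes, &x, sizeof x);   // copy the in-memory representation
    return bytes[0] == 0x78;
}
```

On a little-endian machine `bytes[0]` is 0x78, and on a big-endian machine it is 0x12, matching the two address tables above.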

(Be patient; I will get to the file stuff in a minute.)

A given C compiler will have its own way of doing things, and for desktop and laptop and workstation systems in common use, the compiler will undoubtedly use whatever endianness is defined for that CPU's architecture.

I repeat that if all you are going to do is use ints for your calculations, you never (ever) have to be concerned. C source code for a program written on a big-endian machine contains nothing that depends on endianness, and the same code can be compiled and executed on a little-endian machine.

When we access the individual bytes of any multi-byte data item, then we very definitely must be concerned with the endianness of the machine.

When we read the four bytes of an int from a file, we have to know whether the bytes in the file were stored in big-endian order or little-endian order. If we created a file on our machine and we are absolutely certain that we will always read the file from a machine with the same endianness as ours, then we may be able to perform the task with less effort, but in general we will need to know the endianness of the multi-byte items on the file.
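
When the byte order of the file is known, the value can be rebuilt from its individual bytes with shifts, and the result does not depend on the endianness of the machine running the code. The helper names `load_le32` and `load_be32` below are hypothetical, and the sketch assumes a C++11 compiler for `<cstdint>`.

```cpp
#include <cstdint>

// Rebuild a 32-bit value from 4 bytes stored least significant byte first.
std::uint32_t load_le32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Rebuild a 32-bit value from 4 bytes stored most significant byte first.
std::uint32_t load_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}
```

Each byte is widened to `std::uint32_t` before shifting so that a byte of 0x80 or above is never shifted into the sign bit of a plain `int`.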

Here is a program that writes an int to three different files. I am hoping that you can use a hex editor to see what I am trying to say.
CPP / C++ / C Code:
#include <iostream>
#include <fstream>

using namespace std;

int main()
{
    ofstream le_out("le_integer", ios::binary); 
    ofstream be_out("be_integer", ios::binary);
    ofstream def_out("default_integer", ios::binary);

    int x = 0x12345678;
    char *xp = (char *) &x; // set up to write the individual bytes of x
    char y[4];

    int i;

    cout << "sizeof(int) = " << sizeof(int) << endl; // probably four, right?

    cout << "x = " << hex << x 
         << " hex, " << dec << x << " decimal" 
         << endl << endl;

    
    def_out.write(xp, sizeof(int));

    //
    // set up the bytes of y in big-endian order
    //
    y[0] = (x >> 24) & 0xff;
    y[1] = (x >> 16) & 0xff;
    y[2] = (x >> 8)  & 0xff;
    y[3] =  x        & 0xff;

    cout << "Here are the bytes in big-endian order   :  ";
    for (i = 0; i < 4; i++) {
        cout << hex << (unsigned int)(unsigned char)y[i] << " "; // via unsigned char: no sign extension
    }
    cout << endl << endl;

    be_out.write(y, 4);
    //
    // set up the bytes of y in little-endian order
    //
    y[0] =  x        & 0xff;
    y[1] = (x >> 8)  & 0xff;
    y[2] = (x >> 16) & 0xff;
    y[3] = (x >> 24) & 0xff;

    cout << "Here are the bytes in little-endian order:  ";
    for (i = 0; i < 4; i++) {
        cout << hex << (unsigned int)(unsigned char)y[i] << " "; // via unsigned char: no sign extension
    }
    cout << endl << endl;

    le_out.write(y, 4);

    def_out.close();
    be_out.close();
    le_out.close();

    return 0;
}

When I ran this, I got three files, each of which had a length of (exactly) four bytes.

Here are the dumps:

The big-endian file
Code:
$ od -tx1 be_integer
0000000 12 34 56 78

The little-endian file
Code:
$ od -tx1 le_integer
0000000 78 56 34 12

The default-endian file
Code:
$ od -tx1 default_integer
0000000 78 56 34 12

As you can see, my machine is little-endian.

Now do you see why, in general, if someone tells me that the first four bytes of a file make up an int, I need a little more information?

Summary notes:

1. If I read from the file a byte at a time, it is possible to make a program that doesn't depend on the endianness of the implementation (compiler and machine) that is running the program, but I must know the endianness of the file. I also must make sure that the sizes of the data types are consistent. Many programs assume that ints are always four bytes, short ints are always 2 bytes, etc., but different implementations may not all have the same lengths for corresponding data types.

2. If I know that the endianness of the file is the same as the endianness of the machine that is reading the file, it is possible to write a program without knowing what the endianness is. I also would have to know (or assume) that the lengths of the data types are consistent.

3. I have to know "something" more than just which bytes of the file represent what integers.

4. It is important to realize that a file specification may be made independently of machine endianness.

For example when I read a .wav file on my embedded system with a big-endian embedded processor, I have to realize that the file itself was written with the integer data type fields written in little-endian order.

As a convenient way of working, I can write a C program that can be compiled and tested on my (little-endian) workstation and then take the exact same C source code and use the cross-compiler for my embedded system to create the application. It may not be important to you for a particular application, but you should definitely be aware of the issue(s).

Regards,

Dave
  #4  
Old 07-Aug-2006, 22:20
ashimce ashimce is offline
New Member
 
Join Date: Aug 2006
Posts: 3

Re: Convert Data Types


Hi all,

I have checked my compiler for the sizes of the data types. As expected, the size of int is 4 bytes, but in the data structure of the file that is to be converted, there are some 8-byte uint fields. Would you please tell me how to read these 4-byte and 8-byte integers from the file? Are there any functions in C++ to read a 4-byte int, an 8-byte int, a 4-byte float, etc. from a binary file?

Moreover, I checked the endianness of my machine and the file. In the data structure of the file, I found that the byte order is Intel 0x0101 (little endian), and my machine is also little endian. Hence, do I need to consider the endianness of multi-byte values in my coding?

Thanks all for your kind attention.

Ashim
  #5  
Old 07-Aug-2006, 23:35
WaltP WaltP is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Midwest US
Posts: 3,243

Re: Convert Data Types


Quote:
Originally Posted by ashimce
I have checked my compiler for the sizes of the data types. As expected, the size of int is 4 bytes, but in the data structure of the file that is to be converted, there are some 8-byte uint fields. Would you please tell me how to read these 4-byte and 8-byte integers from the file? Are there any functions in C++ to read a 4-byte int, an 8-byte int, a 4-byte float, etc. from a binary file?
Are you sure they are 8-byte integers? I would guess that's not quite true. But I've been wrong before. If so, you'll have to check your compiler specs for an 8-byte type -- like long long -- because it's not standard.
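
For readers on newer compilers: since C++11, `long long` and the fixed-width types in `<cstdint>` are part of the standard, while Visual C++ 6.0 offers the compiler-specific `__int64` instead. The sketch below assumes a C++11 compiler and a file that stores the value least significant byte first; the helper name `load_le64` is hypothetical.

```cpp
#include <cstdint>

static_assert(sizeof(std::uint64_t) == 8, "expected an 8-byte integer type");

// Rebuild a 64-bit unsigned value from 8 bytes stored
// least significant byte first (little-endian file order).
std::uint64_t load_le64(const unsigned char* p)
{
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];   // start with the most significant byte, p[7]
    }
    return v;
}
```

Because the value is assembled with shifts rather than copied from memory, the result does not depend on the byte order of the machine running the code.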


Quote:
Originally Posted by ashimce
Moreover, I checked the endianness of my machine and the file. In the data structure of the file, I found that the byte order is Intel 0x0101 (little endian), and my machine is also little endian. Hence, do I need to consider the endianness of multi-byte values in my coding?
No need to worry. They are the same. The easiest way to check endianness, IMO, rather than looking around for specs that are probably hard to find and understand, is simply to read the first 2 or 4 bytes of the file (assuming they aren't all the same value) and see what comes out. If the number displayed makes sense, your endianness is not a problem. Especially if you know what the first value in the file actually is...
  #6  
Old 08-Aug-2006, 08:15
davekw7x davekw7x is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Left Coast, USA
Posts: 4,712

Re: Convert Data Types


Quote:
Originally Posted by ashimce

I have checked with my compiler for the sizes of data types.

What compiler? What operating system?
Quote:
Originally Posted by ashimce
8-byte uint fields. Would you please tell me how to read these 4-byte and 8-byte integers from the file? Are there any functions in C++ to read a 4-byte int, an 8-byte int, a 4-byte float, etc. from a binary file?
It's not a matter of which function; it's a matter of which data type.

All compilers to which I have access (Borland, Microsoft, GNU) have a 64-bit int data type, but the name of the type and the way that C programs read it differ from compiler to compiler. That's why I suggest that people who get into things like this should always tell us what compiler and what operating system they are using (but I understand that you didn't think to do so).

For example, GNU gcc for Windows or Linux supports the C99 Standard, which defines the names of these data types and the headers where they are found. The C++ standard doesn't have the same data types (and headers), but the GNU g++ compiler supports them. To write something in "machine-endianness" (little-endian machines write little-endian files) you can either write a byte at a time, or you can write the datum with a single statement:

CPP / C++ / C Code:
#include <iostream>
#include <fstream>
#include <inttypes.h>

using namespace std;

int main()
{
    uint64_t x;
    char *xp = (char *)&x; // address of x, but it's used as a pointer to char
    const char *outname = "data.txt"; // string literals are const in C++
    ofstream outfile(outname, ios::binary);

    if (!outfile) {
        cerr << "There was a problem opening " 
             << outname << " for writing." << endl;
        return 1; // nonzero exit status signals the failure
    }

    x = 0x12345678890abcdeULL; // unsigned suffix to match uint64_t

    outfile.write(xp, sizeof(uint64_t)); // first argument is pointer to char
    outfile.close();

    return 0;
}

After running the program, the file length was 8, and I looked at the bytes with the "od" program:

Code:
$od -t x1 data.txt
0000000 de bc 0a 89 78 56 34 12
Quote:
Originally Posted by ashimce
Hence, do I need to consider the endianness of multi-byte values in my coding?
It is possible to write code that does depend on endianness and it is possible to write code that does not depend on endianness. Some people write code with no consideration at all for endianness and they just get lucky. Some are not so lucky.

The code that I showed works to create a file that has the same endianness as the platform that generated the file. Similar code (using read() instead of write()) will read such a file without any further effort concerning endianness.
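
A sketch of that read() counterpart is shown here. It assumes a C++11 compiler (for `<cstdint>`) and, as stated above, a file written with the same endianness as the machine reading it; the function name `read_u64_host_order` is invented for illustration.

```cpp
#include <cstdint>
#include <sstream>   // also provides std::istream
#include <string>

// Read 8 raw bytes from a binary stream straight into a 64-bit integer.
// Only correct when the stream's byte order matches the host's.
// Returns false if fewer than 8 bytes were available.
bool read_u64_host_order(std::istream& in, std::uint64_t& value)
{
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return in.gcount() == static_cast<std::streamsize>(sizeof value);
}
```

The same function works with a `std::ifstream` opened with `std::ios::binary`, since it only requires a `std::istream`.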

Since your project is apparently about reading a particular file format on a particular machine, you can try anything you want and see if it works. Then draw your conclusion, attach the results to your thesis, and get on with your life. Usually I try to get people into the habit of writing programs that are as portable as possible. But programs that require specific lengths of data types are only portable if the program is in C and the compiler supports the C99 data types, and many of the people here use compilers that don't, so whatever advice I can give you may not be as "portable" as I would like.

We here can address your specific problem if you give us a little more information (mainly: tell us what compiler you are using).

Regards,

Dave
  #7  
Old 10-Aug-2006, 00:32
ashimce ashimce is offline
New Member
 
Join Date: Aug 2006
Posts: 3

Re: Convert Data Types


Hi,

I am using Microsoft Visual C++ 6.0, and the operating system is MS Windows XP.

Actually I have tried to read from the file using 'fread', and now I am able to read. But the main problem is that the program is reading chars, although I am declaring the variables as unsigned int (see NOTE 1 in the code given below). When I use 'fwrite' to write the output into another file, it gives the output as chars only, i.e., the same as the input file. Now, I need to know how to write the output as my desired types, i.e., unsigned int.

The program is attached below.

CPP / C++ / C Code:
#include <stdio.h>

int main()
{
	unsigned int in_temp, out_temp;   // NOTE 1
	unsigned int *in;
	int i=0;
	char space = ',';
	
	FILE *fp1;
	FILE *fp2;

	fp1=fopen("input.txt","rb");
	fp2=fopen("output.txt","wb");

	for(i=0;i<5;i++)
	{
		fread (&in_temp,4,1,fp1); //sizeof(unsigned int) = 4,tested
		in = &in_temp;	

		out_temp = *in;	
		fwrite (&out_temp,4,1,fp2);
		fwrite (&space,sizeof(char),1,fp2);
	}

	fclose (fp1);
	fclose (fp2);
	return 0;

}


Would you please tell me why this output is char? Why not unsigned int, and how can I get it as unsigned int?

Thank you.

Ashim
Last edited by cable_guy_67 : 10-Aug-2006 at 06:24. Reason: Please surround your code with [cpp] ... [/cpp]
  #8  
Old 10-Aug-2006, 08:38
davekw7x davekw7x is offline
Outstanding Member
 
Join Date: Feb 2004
Location: Left Coast, USA
Posts: 4,712

Re: Convert Data Types


Quote:
Originally Posted by ashimce

Would you please tell me why this output is char? Why not unsigned int, and how can I get it as unsigned int?

Thank you.

Ashim

I'm not sure what you expect. I ran the program with the following input file. (I use the "octal dump" program, od, to display the hexadecimal bytes of the entire file.)

Code:
$od -t x1 input.txt
0000000 04 03 02 01 08 07 06 05 0c 0b 0a 09 00 0f 0e 0d
0000020 44 33 22 11
In other words, the file contained the little-endian 4-byte ints:

0x01020304
0x05060708
0x090a0b0c
0x0d0e0f00
0x11223344

The output file looks like this:

Code:
$od -t x1 output.txt
0000000 04 03 02 01 2c 08 07 06 05 2c 0c 0b 0a 09 2c 00
0000020 0f 0e 0d 2c 44 33 22 11 2c

This is a binary file, right? In general I wouldn't know what to make of it, but by looking at the code that created it, I see that it appears to be:

a binary little-endian integer 0x01020304
a char 0x2c (the ascii representation of ',')
a binary little-endian integer 0x05060708
a char 0x2c
a binary little-endian integer 0x090a0b0c
a char 0x2c
a binary little-endian integer 0x0d0e0f00
a char 0x2c
a binary little-endian integer 0x11223344
a char 0x2c


Isn't this exactly what you told it to do? (Normally we don't use delimiters, such as ',' for binary files, but that is what your program created.)

If you want to write integers (or anything else) in human-readable chars (ascii) instead of binary values, then I suggest formatted output.

So if you want to see decimal values in the output file, you could use fprintf(fp2, "%u%c", out_temp, space); for example, instead of the two fwrite(...) calls.

By changing this, you can open the output file with a text editor (or read it as a CSV file into an Excel spreadsheet, or whatever...)

If you make the change that I suggested (printing formatted decimal values, separated by commas), you might see something like
Code:
16909060,84281096,151653132,219025152,287454020,

Is this what you had in mind?
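
Putting the suggestion together, here is one possible sketch of a converter that reads binary unsigned ints and writes them as comma-separated decimal text. The function name `binary_to_csv` and its parameters are hypothetical, and it assumes the input file has the same byte order and int size as the machine running it. Unlike the sample output above, it writes commas only between values and ends the line with a newline.

```cpp
#include <cstdio>

// Convert up to max_values binary unsigned ints from in_path into one
// line of comma-separated decimal text in out_path.
// Returns the number of values converted, or -1 if a file cannot be opened.
int binary_to_csv(const char* in_path, const char* out_path, int max_values)
{
    std::FILE* in = std::fopen(in_path, "rb");    // binary input
    if (!in) {
        return -1;
    }
    std::FILE* out = std::fopen(out_path, "w");   // text output
    if (!out) {
        std::fclose(in);
        return -1;
    }

    int count = 0;
    unsigned int value;
    // Stop at max_values or as soon as a full value can no longer be read.
    while (count < max_values &&
           std::fread(&value, sizeof value, 1, in) == 1) {
        if (count > 0) {
            std::fputc(',', out);                 // separator between values
        }
        std::fprintf(out, "%u", value);           // formatted decimal text
        ++count;
    }
    std::fputc('\n', out);

    std::fclose(in);
    std::fclose(out);
    return count;
}
```

Checking the return value of `fread` also guards against input files that are shorter than expected, which the original loop did not do.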

Regards,

Dave
 
 
