How to read a file into a char array in C

When working in C, you may run into problems that require reading a file into a character array, such as analyzing the frequency of each character or converting the first word of every sentence from lower case to upper case, or vice versa. The solution is simple, but it may not be obvious to people who do not know much about reading or writing files. In this article, you will learn step by step how to read a file into a character array in C.

Open a file in C

The easiest and most popular way to open a file in C is using the 'fopen' function in the format below:

file = fopen(file_name, "open_mode"); // open the file named by 'file_name' and store the file pointer (a FILE *) in the variable 'file'

The mode parameter of the 'fopen' function specifies the mode in which the file is to be opened. The mode can be one of the following:

"r": Open file for reading.
"w": Truncate file to zero length or create a file for writing.
"a": Append to file or create a file for writing if it does not exist.
"r+": Open file for reading and writing.
"w+": Truncate file to zero length or create a file for reading and writing.
"a+": Append to file or create a file for reading and writing.

A file may fail to open, for example because it does not exist or because you lack permission to read it. Always check the return value of 'fopen' to make sure the file was opened successfully before attempting to read from or write to it, like this:

// If 'fopen' returns NULL,  print an error message and exit the program 
if (file == NULL) {
      printf("Error: Failed to open file '%s'.\n", file_name);
      return 1;
}
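
The check above can also report why the file failed to open. Here is a minimal sketch, assuming a platform where 'fopen' sets 'errno' on failure (POSIX systems do; the C standard itself does not require it). The helper name 'open_for_reading' is made up for this illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open 'path' for reading and, on failure,
   print the reason to stderr. Returns NULL if the file could not be opened. */
FILE *open_for_reading(const char *path)
{
    FILE *file = fopen(path, "r");
    if (file == NULL) {
        /* Assumption: fopen set errno (true on POSIX and most other systems). */
        fprintf(stderr, "Error: Failed to open file '%s': %s\n",
                path, strerror(errno));
    }
    return file;
}
```

The caller still has to test the returned pointer against NULL before using it.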

Read file contents character by character

Before reading the file, you need a character array to store its contents. Let's declare one:

char buffer[1000]; // Declare a char array named 'buffer' with room for 1000 characters

Now it's time to read the file using 'fgetc'. Each call reads one character from the file, and repeated calls read each subsequent character until the end of the file, where 'fgetc' returns EOF. A while loop makes this easy:

int i = 0, c; // c holds each character read (an int, so it can also hold EOF); i is the index into buffer
while ((c = fgetc(file)) != EOF) { // Read until the end of the file is reached
      buffer[i] = c;
      i++;
}

The example above assumes that the file contains only ASCII characters and that the file size is less than 1000 characters. A larger file would write past the end of 'buffer'.
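
If you cannot be sure the file fits, the loop can stop once the array is full. The following is a minimal sketch under that assumption; the helper name 'read_into_fixed' and its parameters are hypothetical, not part of the standard library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read at most cap - 1 characters from 'file' into
   'buffer', then null-terminate it. Returns the number of characters stored. */
size_t read_into_fixed(FILE *file, char *buffer, size_t cap)
{
    size_t i = 0;
    int c;

    if (cap == 0) {
        return 0; /* no room even for the terminator */
    }
    /* The bounds check comes first, so no character is consumed when full. */
    while (i < cap - 1 && (c = fgetc(file)) != EOF) {
        buffer[i++] = (char)c;
    }
    buffer[i] = '\0';
    return i;
}
```

Calling it as read_into_fixed(file, buffer, sizeof buffer) with the 1000-element array stores at most 999 characters and leaves the rest of the file unread.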

Resizable buffer

The buffer array we defined earlier holds at most 1000 characters, but many files are much larger than that. We can solve this problem by making the buffer resizable, using dynamic memory allocation with the 'malloc', 'calloc' and 'realloc' functions provided by the C standard library.

char *buffer = NULL; // initialize buffer to NULL
int buffer_size = 0;
/*Open the file here*/
// Read file character by character
int c, i = 0;
while ((c = fgetc(file)) != EOF) {
    // If buffer is full, resize it
    if (i >= buffer_size) {
       buffer_size += 1000; // increase buffer size by 1000 bytes
       char *tmp = realloc(buffer, buffer_size); // resize buffer
       if (tmp == NULL) {
          printf("Error: Memory allocation failed.\n");
          free(buffer); // realloc leaves the old block untouched on failure
          return 1;
       }
       buffer = tmp;
    }
    buffer[i] = c;
    i++;
}

The snippet above uses 'realloc', which is useful because the file size is usually not known in advance. When 'realloc' is passed a NULL pointer it behaves like 'malloc', which is why the first pass through the loop works. The 'malloc' and 'calloc' functions allocate a block of memory of a specified size. In this example, you could use 'malloc' like this:

buffer = (char*)malloc(1000); // room for 1000 chars, like char buffer[1000], but allocated on the heap

You probably won't need to use 'malloc' and 'calloc' in this example. We will meet them again later.
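
A common variation on the resizable buffer is to double the capacity instead of adding a fixed 1000 bytes, which keeps the number of 'realloc' calls small for large files. This is only a sketch of that idea: the helper name 'read_all_growing', the starting capacity of 16 and the doubling factor are all assumptions made for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the rest of 'file' into a heap buffer that
   doubles in size when full. Returns a null-terminated buffer that the
   caller must free, or NULL if allocation fails. The number of characters
   read is stored in *out_len. */
char *read_all_growing(FILE *file, size_t *out_len)
{
    size_t cap = 16; /* assumed starting capacity */
    size_t len = 0;
    int c;
    char *buffer = malloc(cap);

    if (buffer == NULL) {
        return NULL;
    }
    while ((c = fgetc(file)) != EOF) {
        if (len + 1 >= cap) { /* keep one slot free for '\0' */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buffer, new_cap);
            if (tmp == NULL) {
                free(buffer); /* the old block is still valid here */
                return NULL;
            }
            buffer = tmp;
            cap = new_cap;
        }
        buffer[len++] = (char)c;
    }
    buffer[len] = '\0';
    *out_len = len;
    return buffer;
}
```

Assigning the result of 'realloc' to a temporary pointer first means the original block can still be freed if the call fails.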

File contains non-ASCII characters

In C, a string is represented as a sequence of bytes, and the interpretation of those bytes depends on the character encoding. If the file contains non-ASCII characters, you need to use a character encoding that supports those characters, such as UTF-8 or UTF-16.
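
To see the difference between bytes and characters, here is a small sketch that counts the characters (code points) in a UTF-8 string stored in an ordinary char array. It assumes the input is valid UTF-8, and the helper name 'utf8_code_points' is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a null-terminated string,
   assuming it holds valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the UTF-8 text "Xin chào!", 'strlen' reports 10 bytes, while this function counts 9 characters, because "à" is stored as two bytes.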

For this problem, you should use functions that can handle multibyte characters, such as 'fgetwc' and 'fgetws'. These functions read one wide character (wchar_t) or one wide character string (wchar_t*) at a time, respectively.

Here are some modifications to the code to make it work when the file contains non-ASCII characters:

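// Requires #include <wchar.h>
wchar_t buffer[100];
// Open file for reading ("ccs=UTF-8" is a non-standard extension, supported by the Microsoft C runtime and glibc)
file = fopen(filename, "r,ccs=UTF-8");
// Read file contents
wint_t c; // wint_t rather than wchar_t, so that it can hold WEOF
int i = 0;
while (i < 99 && (c = fgetwc(file)) != WEOF) { // stop before the buffer overflows
   buffer[i] = (wchar_t)c;
   i++;
}
buffer[i] = L'\0'; // terminate the wide string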

Also, make sure that the input and output streams are set to the correct encoding so that the characters are displayed and handled properly. On Unix-like operating systems such as macOS and Linux, you can use the 'setlocale' function to select a UTF-8 locale:

#include <locale.h> // setlocale()
int main()
{
    setlocale(LC_ALL, "en_US.utf8"); // locale names vary by system, e.g. "en_US.UTF-8" on macOS

    // your code here

    return 0;
}

On Windows, you can call the '_setmode' function with the '_O_U8TEXT' flag to set the output encoding to UTF-8:

#include <fcntl.h> //_O_U8TEXT
#include <io.h> //_setmode()
#include <stdio.h> //_fileno(), stdout
int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);

    // your code here

    return 0;
}

Here's an example of a file containing the Vietnamese phrase "Xin chào!" (Hello!), which includes an accented, non-ASCII character, saved in UTF-8 encoding:

Xin chào!

And here is the output of our program when I ran it on an online C compiler:

Xin chào!

...Program finished with exit code 0
Press ENTER to exit console.

Read file contents as a whole

If you are not familiar with C, you can skip this section, but I still recommend reading it as an advanced exercise. There is another way to read a file into a char array: instead of reading the file character by character, read it as a whole by determining the file size first. This solution is more complicated, but also more efficient.

First, define the usual variables: a file pointer for the open file and a buffer to hold the character array. You also need a variable for the file size:

FILE *fp;
long file_size;
char *buffer;

Then you can open the file to read:

fp = fopen("example.txt", "rb"); // binary mode, so the size from 'ftell' matches the bytes that 'fread' returns

To find the size of the file, you can use the 'ftell' function, which returns the current position in the file as a byte offset from the beginning:

current_byte = ftell(fp);

But wait, reading always starts at the beginning of the file, so 'ftell' would return 0 right now. No problem: the 'fseek' function moves the position to a different place in the file, here the end:

fseek(fp, 0, SEEK_END);

With the position at the end, 'ftell' now returns the file size. After that, move the position back to the beginning to start reading the file contents:

file_size = ftell(fp);
rewind(fp); // move the position back to the file's beginning

// Allocate memory for the char array
buffer = (char*) malloc(file_size + 1);

The use of 'malloc' here is straightforward: it allocates memory for an uninitialized char array of (file_size + 1) bytes, since a char is 1 byte.

If you want to use the 'calloc' function, here is how:

buffer = (char*) calloc(file_size + 1, sizeof(char));

The main difference between 'malloc' and 'calloc' is that 'malloc' only allocates memory without initializing its contents, while 'calloc' both allocates and initializes memory to zero. The main advantage of using 'calloc' is that the allocated memory will already be zeroed out, which can be helpful if you plan to use the char array as a string later.
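
As a small illustration of that advantage, the sketch below allocates the buffer with 'calloc', so it is a valid empty string from the start and stays terminated however many bytes (up to file_size) are copied into it. The helper name 'alloc_zeroed_buffer' is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate room for 'file_size' characters plus a
   terminator, with every byte set to zero. Returns NULL on failure;
   the caller must free the result. */
char *alloc_zeroed_buffer(size_t file_size)
{
    /* calloc(count, size) zero-fills the block, so no explicit
       buffer[file_size] = '\0' is needed afterwards. */
    return calloc(file_size + 1, sizeof(char));
}
```

With 'malloc', the same buffer would hold indeterminate bytes until you write to it, so the terminator has to be added by hand.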

// Read the file into the char array
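fread(buffer, 1, file_size, fp);

After creating the buffer, you can read the entire file using the 'fread' function, which takes the destination array, the size of each element to read, the number of elements to read, and the file pointer. It returns the number of elements actually read, so with an element size of 1 the return value is the number of bytes read.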

// Add a null terminator at the end of the char array
buffer[file_size] = '\0';

You might have wondered why we allocate an extra byte for the buffer: why (file_size + 1) and not just (file_size)? The extra byte holds the null terminator, which marks the end of the string. If your only goal is to read a file into an array, this step is unnecessary. But if you later want to print the array as a string, it is required, because a string in C is defined as a sequence of characters ending with the null terminator '\0'.
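
Put together, the steps in this section can be sketched as one function. This is an illustration under a few assumptions rather than a definitive implementation: the helper name 'read_whole_file' is made up, the file is opened in binary mode so that the size reported by 'ftell' matches what 'fread' returns, and the file is assumed to be a regular file small enough to fit in memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole file at 'path' into a null-terminated
   heap buffer. Returns NULL on failure; the caller must free the result.
   If out_len is not NULL, it receives the number of bytes read. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    char *buffer;
    long file_size;
    size_t n;

    if (fp == NULL) {
        return NULL;
    }
    /* Jump to the end and ask for the position to learn the size. */
    if (fseek(fp, 0, SEEK_END) != 0 || (file_size = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    buffer = malloc((size_t)file_size + 1); /* + 1 for the terminator */
    if (buffer == NULL) {
        fclose(fp);
        return NULL;
    }
    n = fread(buffer, 1, (size_t)file_size, fp);
    buffer[n] = '\0'; /* terminate after the bytes actually read */
    fclose(fp);

    if (out_len != NULL) {
        *out_len = n;
    }
    return buffer;
}
```

Placing the terminator at index n, the value returned by 'fread', keeps the string valid even if fewer bytes were read than 'ftell' reported.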

Clean up your code

You have opened and used the file, so remember to close it afterward. Simply call the 'fclose' function on the "file" pointer that you assigned earlier.

fclose(file);

Talking about freeing pointers, remember the "buffer" array that you used to store characters? If you allocated it dynamically (with 'malloc', 'calloc' or 'realloc'), free it now to avoid memory leaks.

free(buffer);
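
When a function can fail at several points, one way to make sure both cleanup calls always run is to send every exit through a single cleanup label. The sketch below is one possible arrangement, not the only one; the helper name 'count_lines' and the 4096-byte chunk size are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the newline characters in the file at 'path'.
   Returns -1 on error. Every exit goes through the same cleanup code. */
long count_lines(const char *path)
{
    long result = -1;
    char *buffer = NULL;
    size_t n, i;
    FILE *file = fopen(path, "r");

    if (file == NULL) {
        goto cleanup;
    }
    buffer = malloc(4096);
    if (buffer == NULL) {
        goto cleanup;
    }

    result = 0;
    while ((n = fread(buffer, 1, 4096, file)) > 0) {
        for (i = 0; i < n; i++) {
            if (buffer[i] == '\n') {
                result++;
            }
        }
    }
    if (ferror(file)) {
        result = -1; /* a read error, not just end of file */
    }

cleanup:
    free(buffer); /* free(NULL) does nothing, so this is always safe */
    if (file != NULL) {
        fclose(file);
    }
    return result;
}
```

Because 'buffer' and 'file' start out as NULL, the cleanup code works no matter which step failed.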

Here is an overview of what your solution should look like:

#include <stdio.h>
#include <stdlib.h>

int main() {
   FILE *file;
   char filename[] = "example.txt";
   char *buffer = NULL; // initialize buffer to NULL
   int buffer_size = 0;
   int i = 0;

   //Open file for reading
   file = fopen(filename, "r");

   //Check if file opened successfully
   if (file  == NULL) {
      printf("Error: Failed to open file '%s'.\n", filename);
      return 1;
   }

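   // Read file character by character
   int c;
   while ((c = fgetc(file)) != EOF) {
      // If buffer is full (keeping one slot for the null terminator), resize it
      if (i + 1 >= buffer_size) {
         buffer_size += 1000; // increase buffer size by 1000 bytes
         char *tmp = realloc(buffer, buffer_size); // resize buffer
         if (tmp == NULL) {
            printf("Error: Memory allocation failed.\n");
            free(buffer);
            fclose(file);
            return 1;
         }
         buffer = tmp;
      }
      buffer[i] = c;
      i++;
   }

   // Close file
   fclose(file);

   // Print the character array (buffer is still NULL if the file was empty)
   if (buffer != NULL) {
      buffer[i] = '\0'; // terminate the string so that printing with %s is safe
      printf("%s", buffer);
   }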

   // Free the dynamically allocated buffer
   free(buffer);

   return 0;
}

Conclusion

In this guide, we covered step-by-step solutions to read a file into a character array in C: opening the file, creating a buffer variable, storing the entire contents from the file to the buffer, and finally closing the file and freeing all memory.

We also discussed how to handle files that are too large for the initial buffer and files that contain non-ASCII characters. Working through these cases should help you recognize and deal with similar situations in your own programs.