Generate or Create a Large text file in Linux

This tutorial explains how to create files of different sizes in Linux for testing and debugging purposes, such as a large text file, an empty file, a 100 MB file, a 1 GB file, a file of a specific size, and a file with custom data.

Linux offers several commands for creating, manipulating, and managing text files. Broadly, we can divide these commands into two types: regular text manipulating commands and system text manipulating commands.

Regular text manipulating commands

These commands are used to create and maintain regular text files. They are optimized for easy text manipulation and include many features that make editing easier, which is why they are also known as text editors. Popular text editors include vi, Vim, Emacs, Nano, Gedit, and Pico.

System text manipulating commands

These commands are used to create or generate dummy or sample files. They are optimized for creating files of any size quickly, but they don't include any features for editing text. Popular system text manipulating commands include dd, yes, fallocate, truncate, and touch.

Which type of command should I use to create a text file?

It depends on the requirement. If you want to store meaningful data in the text file, use a regular text editor. If you want to create a file for testing and debugging purposes, use a system text manipulating command.

Let's take a simple example. Suppose you want to store some product information in a text file. In this case, you should use a regular text editor. But if you want to create a 100MB or 1GB text file for testing, a text editor is the wrong tool.

Creating a large test file with a text editor is simply a waste of time. For example, suppose you need a 1GB text file to test an archiving utility. With a text editor, you might spend more than an hour typing, copying, and pasting filler text into the file. With a system text manipulating command, you can create the same file in less than a minute.

In the following sections, we will see how to use the system text manipulating commands to create files of any size for testing and debugging purposes.

Creating a large text file of a specific size

To create a large text file of a specific size, we can use the dd command.
The dd command takes four arguments: source (if), destination (of), block size (bs), and counter (count).
It reads data from the source and copies it to the destination.

It uses the block size and the counter to control the copy operation. In other words,
the block size and the counter let us specify the size of the destination file.

It uses the following syntax.

#dd if=[source] of=[destination] bs=[block-size] count=[counter]

For the source, we can use the /dev/zero file. /dev/zero is a special file in Linux.
Whenever it is read, it returns as many null characters (zero bytes) as the reader asks for; it never runs out.
The dd command reads these null characters from /dev/zero and writes them to the specified output file.
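
To see this behavior for yourself, the following sketch reads a few bytes from /dev/zero and checks that every one of them is a null character. It assumes the GNU coreutils versions of head, od, tr, and wc; the byte counts (16 and 1000) are arbitrary choices.

```shell
# Show the first 16 bytes of /dev/zero in hexadecimal: every byte is 00.
head -c 16 /dev/zero | od -An -tx1

# Read 1000 bytes, delete every null character with tr, and count what is left.
# If /dev/zero returns only null characters, nothing is left.
remaining=$(head -c 1000 /dev/zero | tr -d '\000' | wc -c | tr -d ' ')
echo "non-null bytes: $remaining"
```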

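Through the block size and the counter, we specify how many bytes the dd command reads and writes in each step and how many times it repeats that step.
The size of the output file equals the block size multiplied by the counter. A block size given as a plain number is interpreted as bytes.

To use any other unit for the block size, convert that unit into bytes.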

The following table lists two popular systems of units for measuring data.

| Units in the binary system | Units in the decimal system |
|---|---|
| 1KiB = 1024 bytes | 1KB = 1000 bytes |
| 1MiB = 1024KiB or 1048576 bytes | 1MB = 1000KB or 1000000 bytes |
| 1GiB = 1024MiB or 1073741824 bytes | 1GB = 1000MB or 1000000000 bytes |
| 1TiB = 1024GiB or 1099511627776 bytes | 1TB = 1000GB or 1000000000000 bytes |
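
If you would rather not convert units by hand, GNU coreutils includes numfmt, which can do the conversion for you. In the sketch below, --from=iec treats suffixes as binary units and --from=si treats them as decimal units; the variable names and sizes are only examples.

```shell
# Binary (IEC) units: K, M, G are powers of 1024.
kib=$(numfmt --from=iec 1K)
mib=$(numfmt --from=iec 1M)
gib=$(numfmt --from=iec 1G)

# Decimal (SI) units: K, M, G are powers of 1000.
kb=$(numfmt --from=si 1K)
mb=$(numfmt --from=si 1M)
gb=$(numfmt --from=si 1G)

echo "1KiB=$kib 1MiB=$mib 1GiB=$gib"
echo "1KB=$kb 1MB=$mb 1GB=$gb"
```

The printed values can be pasted directly into the bs argument of dd.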

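By default, many Linux tools that display file sizes, such as ls -lh and du -h, use units of the binary system (powers of 1024), even though they label them K, M, and G rather than KiB, MiB, and GiB.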

Once you have converted the unit into bytes, use the counter to multiply it up to the file size you need.
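
The arithmetic can also be left to the shell. The following sketch uses POSIX shell arithmetic to work out the counter for a target size and block size, then checks the resulting file with stat. The variable names, the temporary directory, and the 5MiB target are illustrative assumptions, kept small so the example runs quickly; stat -c is GNU coreutils syntax. It also assumes the target size is an exact multiple of the block size, because integer division rounds down.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

bs=1048576                      # block size: 1MiB in bytes
target=$((5 * 1024 * 1024))     # desired file size: 5MiB in bytes
count=$((target / bs))          # number of blocks (rounds down if not a multiple)

# Write $count blocks of $bs null characters each; hide dd's statistics.
dd if=/dev/zero of="$workdir/5mib-file" bs="$bs" count="$count" 2>/dev/null

# stat -c %s prints the exact file size in bytes.
actual=$(stat -c %s "$workdir/5mib-file")
echo "count=$count size=$actual bytes"
```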

The following table lists some examples of how to specify the file size accurately.

| File size | bs (block size, one unit in bytes) | count (counter) | Description |
|---|---|---|---|
| 100KiB | 1024 | 100 | 1024 bytes (1KiB) x 100 = 100KiB |
| 100KB | 1000 | 100 | 1000 bytes (1KB) x 100 = 100KB |
| 100MiB | 1048576 | 100 | 1048576 bytes (1MiB) x 100 = 100MiB |
| 100MB | 1000000 | 100 | 1000000 bytes (1MB) x 100 = 100MB |
| 1GiB | 1073741824 | 1 | 1073741824 bytes (1GiB) x 1 = 1GiB |
| 1GB | 1000000000 | 1 | 1000000000 bytes (1GB) x 1 = 1GB |
| 500GiB | 1073741824 | 500 | 1073741824 bytes (1GiB) x 500 = 500GiB |
| 500GB | 1000000000 | 500 | 1000000000 bytes (1GB) x 500 = 500GB |

Let's take some practical examples to understand it more clearly.

Open a root shell and run the following commands. Together they write about 6.7 GB of data, so make sure the file system has enough free space.

#mkdir test
#cd test
#dd if=/dev/zero of=500kib-file bs=1024 count=500
#dd if=/dev/zero of=800mib-file bs=1048576 count=800
#dd if=/dev/zero of=3gib-file bs=1073741824 count=3
#dd if=/dev/zero of=100kb-file bs=1000 count=100
#dd if=/dev/zero of=600mb-file bs=1000000 count=600
#dd if=/dev/zero of=2gb-file bs=1000000000 count=2
#ls -lh
#cd ..
#rm -rf test
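
Because ls -lh rounds sizes and reports them in binary units, a 100KB file appears there as roughly 98K. To check exact byte counts, you can use GNU stat instead. The sketch below repeats two of the examples above in a temporary directory; the temporary directory is an assumption added for the sketch, and the sizes match the commands above.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

dd if=/dev/zero of=500kib-file bs=1024 count=500 2>/dev/null
dd if=/dev/zero of=100kb-file bs=1000 count=100 2>/dev/null

# Print the exact size in bytes followed by the file name.
stat -c '%s %n' 500kib-file 100kb-file

kib_size=$(stat -c %s 500kib-file)
kb_size=$(stat -c %s 100kb-file)
```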

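You can also specify the block size in a smaller unit, but then you have to increase the counter accordingly. A smaller block size also reduces the memory dd needs, because dd holds one block in memory at a time.
For example, the following command creates a 100GB file using a block size of 1MB (1000000 bytes).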

#dd if=/dev/zero of=100gb-file bs=1000000 count=100000
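
To confirm that only the product of the block size and the counter matters, the following sketch writes the same 1MB of null characters twice, once as a single large block and once as many small blocks, and compares the two files with cmp. The sizes are scaled down from the 100GB example so it runs instantly; the file names are illustrative.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# One block of 1000000 bytes ...
dd if=/dev/zero of="$workdir/one-block" bs=1000000 count=1 2>/dev/null
# ... versus 1000 blocks of 1000 bytes each.
dd if=/dev/zero of="$workdir/many-blocks" bs=1000 count=1000 2>/dev/null

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s "$workdir/one-block" "$workdir/many-blocks"; then
    same=yes
else
    same=no
fi
echo "identical: $same"
```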

The following image shows a few more examples of the dd command with output.

examples of the dd command

If you don't want to do the calculation, or you want a file that contains custom characters
and lines instead of null characters, you can use the yes command.
The yes command repeatedly prints the supplied string, one copy per line, until it is stopped.
We can redirect the output of the yes command to a file. To control the size of the file and stop the yes command,
we can pipe its output through the head command.

To use this approach, use the following syntax.

$yes [text or string] | head -c [size of file] > [name of file]

For example, the following command repeatedly writes the line "this is a test line" to the file named test-file until
the file reaches 100KB (the last line may be cut off at that point).

$yes this is a test line | head -c 100KB > test-file

The head command accepts the file size in both binary and decimal units, so you can use whichever you prefer.
To use a decimal unit, add the B suffix, for example KB, MB, GB, or TB. Without the B suffix (K, M, G, T), head uses binary units; for example, 100K means 102400 bytes.
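
The following sketch checks the difference between the decimal and binary suffixes by creating two files with yes and head and measuring them with GNU stat. The file names and the temporary directory are illustrative assumptions; the string matches the example above.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Decimal suffix: 100KB = 100 * 1000 bytes.
yes this is a test line | head -c 100KB > "$workdir/decimal-file"
# Binary suffix: 100K = 100 * 1024 bytes.
yes this is a test line | head -c 100K > "$workdir/binary-file"

decimal_size=$(stat -c %s "$workdir/decimal-file")
binary_size=$(stat -c %s "$workdir/binary-file")
first_line=$(head -n 1 "$workdir/decimal-file")
echo "100KB -> $decimal_size bytes, 100K -> $binary_size bytes"
```

Note that yes joins multiple arguments with single spaces, so the line does not need to be quoted.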

The following image shows a few more examples of this approach with output.

creating a large file from the yes command

Generating large files

If you do not care about the contents of the file, you can also use the fallocate and truncate commands.
Instead of writing data to the file, these commands only change the file's size and the disk space allocated to it.
Since they do not write any characters into the file, you can generate a file of any size in a few seconds.

The two commands differ in the following ways.

The fallocate command works only on file systems that support space preallocation, such as btrfs, ext4, ocfs2, and xfs. The truncate command works on all modern file systems.

The fallocate command allocates all of the requested space to the file without writing a single byte of data into it.
For example, if you use fallocate to create a 20GB file, you get a file that occupies 20GB of actual disk space but contains no data (reading it returns only null characters).

The truncate command creates a sparse file instead of a fully allocated file.
The difference is that a sparse file doesn't occupy all of its apparent size on disk;
it only occupies the space needed for the data actually written to it.

For example, suppose you create two 50GB files, one with the fallocate command and another with the truncate command.
The first file immediately occupies the full 50GB of disk space, while the second file occupies only the space required by its actual data.
Since the truncate command does not write any data into the file, the consumed disk space stays essentially unchanged.

Let's take one more example. Suppose you have 5GB of free disk space and you want to create a 10GB file for testing.
Since the fallocate command allocates all of the requested space immediately, you can't create a 10GB file with it if you only have 5GB of free space.

In this case, you can use the truncate command. Since truncate creates a sparse file, and a sparse file does not
consume disk space until data is written to it, you can create a 10GB test file even if you only have 5GB of free disk space.
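
The following sketch makes the difference visible. It creates a 100MiB file with each command and uses GNU stat to report the apparent size (%s) and the space actually allocated (%b blocks of %B bytes). The size and file names are arbitrary, deliberately small choices. On a file system without preallocation support the fallocate line fails, and on one without sparse-file support the truncate file may end up fully allocated.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

fallocate -l 100M "$workdir/allocated-file"   # space reserved immediately
truncate -s 100M "$workdir/sparse-file"       # sparse: no data blocks yet

# %s = apparent size in bytes, %b = allocated blocks, %B = bytes per such block.
for f in "$workdir/allocated-file" "$workdir/sparse-file"; do
    stat -c '%n: size=%s bytes, allocated=%b blocks of %B bytes' "$f"
done

alloc_size=$(stat -c %s "$workdir/allocated-file")
sparse_size=$(stat -c %s "$workdir/sparse-file")
alloc_disk=$(( $(stat -c %b "$workdir/allocated-file") * $(stat -c %B "$workdir/allocated-file") ))
sparse_disk=$(( $(stat -c %b "$workdir/sparse-file") * $(stat -c %B "$workdir/sparse-file") ))
echo "disk used: fallocate=$alloc_disk bytes, truncate=$sparse_disk bytes"
```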

Generating files with the fallocate and truncate commands

To generate a file from the fallocate command, use the following syntax.

#fallocate -l [size of file] [name of file]

For example, the following commands generate 5GB (named 5-gb-file) and 80GiB (named 80-gib-file) files respectively.

#fallocate -l 5GB 5-gb-file
#fallocate -l 80G 80-gib-file
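
The following sketch checks how fallocate interprets the two kinds of suffix, using small sizes so it runs quickly. It assumes the util-linux fallocate, where KB, MB, GB are powers of 1000 and K, M, G (or KiB, MiB, GiB) are powers of 1024, and a file system that supports preallocation; the file names are illustrative.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

fallocate -l 5MB "$workdir/5-mb-file"    # 5 * 1000 * 1000 bytes
fallocate -l 8M "$workdir/8-mib-file"    # 8 * 1024 * 1024 bytes

mb_size=$(stat -c %s "$workdir/5-mb-file")
mib_size=$(stat -c %s "$workdir/8-mib-file")
echo "5MB -> $mb_size bytes, 8M -> $mib_size bytes"
```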

The following image shows a few more examples of the fallocate command.

examples of the fallocate command

To generate a sparse file from the truncate command, use the following syntax.

#truncate -s [file-size] [name of the file]

For example, the following commands generate 100GB (named 100-gb-file) and 10GiB (named 10-gib-file) files respectively.

#truncate -s 100GB 100-gb-file
#truncate -s 10G 10-gib-file
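
GNU truncate treats KB, MB, GB as powers of 1000 and K, M, G as powers of 1024. The sketch below checks this with scaled-down sizes and then uses GNU du to compare a file's apparent size with the disk space it actually uses; the file names and temporary directory are illustrative assumptions.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

truncate -s 100MB "$workdir/100-mb-file"   # 100 * 1000 * 1000 bytes
truncate -s 10M "$workdir/10-mib-file"     # 10 * 1024 * 1024 bytes

mb_size=$(stat -c %s "$workdir/100-mb-file")
mib_size=$(stat -c %s "$workdir/10-mib-file")
echo "100MB -> $mb_size bytes, 10M -> $mib_size bytes"

# Apparent size versus disk space actually used by the sparse file.
du -h --apparent-size "$workdir/100-mb-file"
du -h "$workdir/100-mb-file"
```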

The following image shows a few more examples of the truncate command.

generating large file from the truncate command

Generating an empty file

If you want to create an empty (zero-size) file, you can use the touch command with the following syntax. If the file already exists, touch leaves its contents unchanged and only updates its timestamps.

#touch [file name]

For example, the following command generates a zero byte file named zero-size-file.

#touch zero-size-file
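
The following sketch shows that touch creates an empty file only when the file does not exist yet. For a file that already has data, it uses truncate -s 0 to empty it; that use of truncate is an addition not covered elsewhere in this tutorial. The file names and temporary directory are illustrative.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

touch "$workdir/zero-size-file"              # new file: 0 bytes
new_size=$(stat -c %s "$workdir/zero-size-file")

echo "some data" > "$workdir/existing-file"  # 10 bytes, including the newline
touch "$workdir/existing-file"               # only the timestamps change
kept_size=$(stat -c %s "$workdir/existing-file")

truncate -s 0 "$workdir/existing-file"       # this actually empties the file
emptied_size=$(stat -c %s "$workdir/existing-file")
echo "new=$new_size kept=$kept_size emptied=$emptied_size"
```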

The following image shows an example of the touch command with the output.

creating empty file from the touch command

That's all for this tutorial. If you like this tutorial, please don't forget to share it with your friends through your favorite social channel.