Counting files in subdirectories.

OK, it sounds simple, and it probably is if you’re sitting at your desktop with GNOME or KDE fired up. However, if you’re looking at a server halfway across the world from the command line, it’s not so easy.

There are a number of tools which are useful in finding out things about your filesystem. ls, du and df are three of them, but sometimes they just don’t give you the information you need. In my case I’m backing up a server to a remote location. The script was timing out because I was trying to back up too many files at once, so I needed to find the number of files in each subdirectory.

Sounds easy at first, and there are numerous attempts at finding that information around the internet. But none of them did exactly what I wanted.

There were Perl scripts and Python scripts, and the minimal

ls -alR | wc -l

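which gives roughly the total number of files under a directory, but not quite: the count also includes directories, the . and .. entries, and the heading, total and blank lines that ls prints for each directory. Anyway, after experimenting for a long time, I finally put together the following command in all its glory.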

find . -type f | awk -F/ '{ print $2 }' | sort | uniq -c
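To see how far the ls count can drift from the real number of files, here is a small sketch that compares the two approaches on a throwaway tree. The directory and file names are made up for the demonstration, and the exact ls line count may vary a little between systems.

```shell
# Build a small throwaway tree: two files in one subdirectory.
demo=$(mktemp -d)
mkdir "$demo/dir1"
touch "$demo/dir1/file1" "$demo/dir1/file2"

# ls -alR also prints ".", "..", directory headings, "total" lines and blank lines.
ls_lines=$(cd "$demo" && ls -alR | wc -l)

# find with -type f prints exactly one line per regular file.
file_count=$(cd "$demo" && find . -type f | wc -l)

echo "ls -alR lines: $ls_lines"
echo "regular files: $file_count"

rm -rf "$demo"
```

The second number is the one that matters for the backup problem; the first is inflated by everything else ls prints.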

Let’s just go through this, so you can tailor it to your own needs. Each part of the command passes its output to the next part of the command through the pipe sign ( | ), so we can consider each part separately.

find . -type f This provides a listing of all the regular files (i.e. not directories, symlinks etc.) underneath the current directory. You can modify the options to the find command to find other items as well, but I wanted files. This pumps out a list like this:

./dir1/file1

./dir1/file2

./dir2/file1

./dir2/subdir1/file1

./dir2/subdir1/file2
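As a rough illustration of changing the find options, the sketch below rebuilds the example tree in a temporary directory, adds one symlink (link1, a made-up name) and counts three kinds of item with -type f, -type d and -type l.

```shell
# Recreate the example tree, plus one symbolic link for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2/subdir1"
touch "$demo/dir1/file1" "$demo/dir1/file2" \
      "$demo/dir2/file1" "$demo/dir2/subdir1/file1" "$demo/dir2/subdir1/file2"
ln -s dir1 "$demo/link1"

files=$(cd "$demo" && find . -type f | wc -l)   # regular files only
dirs=$(cd "$demo" && find . -type d | wc -l)    # directories, including "." itself
links=$(cd "$demo" && find . -type l | wc -l)   # symbolic links (not followed)

echo "files: $files, directories: $dirs, symlinks: $links"

rm -rf "$demo"
```

Note that find does not follow the symlink by default, so the files under dir1 are not counted twice.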

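awk -F/ '{ print $2 }' This takes the previous list and splits each line into fields using the / sign as the separator (-F/). The first field is always the leading dot, so the second field ($2) is the name of the top-level subdirectory, and that is what gets printed for each file. Continuing the example above: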

dir1

dir1

dir2

dir2

dir2
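One thing to watch for: a file sitting directly in the current directory has no subdirectory, so its own name ends up in the second field. The sketch below shows this with a hypothetical ./notes.txt, along with one possible variation (not part of the original command) that uses awk's NF field count to skip such files.

```shell
# Sample paths as find would print them; ./notes.txt is a hypothetical
# file that lives directly in the current directory.
paths='./dir1/file1
./dir2/subdir1/file1
./notes.txt'

# Field 1 is ".", field 2 is the top-level name.
second_field=$(printf '%s\n' "$paths" | awk -F/ '{ print $2 }')

# Variation: only print when there are at least three fields,
# i.e. when the file is inside some subdirectory.
dirs_only=$(printf '%s\n' "$paths" | awk -F/ 'NF > 2 { print $2 }')

printf 'plain:\n%s\n' "$second_field"
printf 'NF > 2:\n%s\n' "$dirs_only"
```

Whether you want the filter depends on whether top-level files should show up in the final counts.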

sort Sorts the list, as you might expect. There are two reasons for this. First is that find doesn’t always find files in alphabetical order. Second is that the next command needs like terms to be grouped together to count them properly. Anyway, this leaves the list above unchanged in this case.
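The grouping requirement is easy to see with a tiny made-up list run through uniq -c (the counting step that comes next), once without sort and once with it. The trailing awk is only there to tidy the spacing of the count column.

```shell
# An unsorted list, as find might produce it.
names='dir2
dir1
dir2'

# Without sort, the two dir2 lines are not adjacent and are counted separately.
unsorted=$(printf '%s\n' "$names" | uniq -c | awk '{ print $1, $2 }')

# With sort, identical lines become adjacent and are counted together.
sorted=$(printf '%s\n' "$names" | sort | uniq -c | awk '{ print $1, $2 }')

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```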

uniq -c Takes the list above and returns only unique values, with a count in the left margin, i.e.

2 dir1

3 dir2

… which is exactly what I wanted.
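If you need this more than once, the pipeline can be wrapped in a small shell function. The sketch below is one way to do it: count_files_per_subdir is a made-up name, and the final sort -rn is my own addition so that the busiest directories come first. It assumes no file names contain newlines, since each output line of find is treated as one file.

```shell
# count_files_per_subdir is a hypothetical helper name, not a standard tool.
# Usage: count_files_per_subdir [directory]   (defaults to the current one)
count_files_per_subdir() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "${1:-.}" || exit 1
        find . -type f | awk -F/ '{ print $2 }' | sort | uniq -c | sort -rn
    )
}

# Rebuild the example tree in a temporary directory and try it out.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2/subdir1"
touch "$demo/dir1/file1" "$demo/dir1/file2" \
      "$demo/dir2/file1" "$demo/dir2/subdir1/file1" "$demo/dir2/subdir1/file2"

result=$(count_files_per_subdir "$demo")
printf '%s\n' "$result"

rm -rf "$demo"
```

On the example tree this lists dir2 with 3 files ahead of dir1 with 2.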

I also used

du -h --max-depth=1

to find out the sizes of these directories.
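The --max-depth option is the GNU spelling; BSD-style du uses -d 1 for the same thing. If you want the file count and the size side by side, a loop over the subdirectories is one option. This is only a sketch on a throwaway tree, and the -h flag of du, while very widely available, is not in POSIX.

```shell
# Throwaway example tree.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2/subdir1"
touch "$demo/dir1/file1" "$demo/dir1/file2" \
      "$demo/dir2/file1" "$demo/dir2/subdir1/file1" "$demo/dir2/subdir1/file2"

# For each top-level subdirectory: name, number of regular files, total size.
report=$(
    for d in "$demo"/*/; do
        count=$(find "$d" -type f | wc -l | tr -d ' ')
        size=$(du -sh "$d" | cut -f1)
        printf '%s\t%s files\t%s\n' "$(basename "$d")" "$count" "$size"
    done
)
printf '%s\n' "$report"

rm -rf "$demo"
```

The sizes reported for empty test files will depend on the filesystem, so only the counts are predictable here.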
