Everything is Broken

Counting files in subdirectories.

OK, it sounds simple, and it probably is if you’re sitting at your desktop with Gnome or KDE fired up. However, if you’re looking at a server halfway across the world from the command line, it’s not so easy.

There are a number of tools which are useful for finding out things about your filesystem. ls, du and df are three of them, but sometimes they just don’t give you the information you need. In my case I’m backing up a server to a remote location. The script was timing out because I was trying to back up too many files at once, so I needed to find the number of files in each subdirectory.

Sounds easy at first, and there are numerous attempts at finding that information around the internet. But none of them did exactly what I wanted.

There were Perl scripts and Python scripts, and the minimal

ls -alR | wc -l

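which gives roughly the total number of files under a directory, but not quite: it also counts the directories themselves, the . and .. entries, the "total" lines, and the headings and blank lines that ls prints between directories. It also gives one grand total rather than a count per subdirectory. Anyway, after experimenting for a long time, I finally put together the following command in all its glory.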

find . -type f | awk -F/ '{ print $2 }' | sort | uniq -c

Let’s just go through this, so you can tailor it to your own needs. Each part of the command passes its output to the next part through the pipe sign ( | ), so we can consider each part separately.

find . -type f This provides a listing of all the regular files (i.e. not directories, symlinks etc.) underneath the current directory. You can modify the options to the find command to find other items as well, but I wanted files. This pumps out a list like this.

./dir1/file1

./dir1/file2

./dir2/file1

./dir2/subdir1/file1

./dir2/subdir1/file2
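As a rough sketch of what tailoring the find options can look like, the snippet below builds a throwaway tree that mirrors the listing above and runs find with three different -type tests. The temporary directory, the variable names and the extra symlink link1 are my own additions for illustration, not part of the original setup.

```shell
# Throwaway tree mirroring the example listing; link1 is a made-up extra.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2/subdir1"
touch "$demo/dir1/file1" "$demo/dir1/file2" "$demo/dir2/file1" \
      "$demo/dir2/subdir1/file1" "$demo/dir2/subdir1/file2"
ln -s file1 "$demo/dir1/link1"

# -type f: regular files only (what the counting pipeline uses).
files=$(cd "$demo" && find . -type f | sort)
# -type d: directories only, including "." itself.
dirs=$(cd "$demo" && find . -type d | sort)
# -type l: symbolic links only.
links=$(cd "$demo" && find . -type l)

printf 'files:\n%s\n\ndirs:\n%s\n\nlinks:\n%s\n' "$files" "$dirs" "$links"

rm -rf "$demo"
```

For the per-directory count, though, -type f is the one that matters, and it gives the list of paths shown above.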

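awk -F/ '{ print $2 }' This takes the list from find and splits each line into fields, using the / sign as the separator (-F/). The first field is the leading . and the second is the top-level directory name, so printing $2 outputs a list of those names. Continuing the example above: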

dir1

dir1

dir2

dir2

dir2
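To tailor this step, it helps to see which field holds what. Here is a small sketch that feeds awk the example paths by hand rather than from find. The variable names are just for illustration.

```shell
paths='./dir1/file1
./dir1/file2
./dir2/file1
./dir2/subdir1/file1
./dir2/subdir1/file2'

# With / as the separator, $1 is ".", $2 is the top-level directory
# and $NF is the last field, i.e. the file name.
top=$(printf '%s\n' "$paths" | awk -F/ '{ print $2 }')
names=$(printf '%s\n' "$paths" | awk -F/ '{ print $NF }')

printf 'top-level directories:\n%s\n\nfile names:\n%s\n' "$top" "$names"
```

Printing $3 instead would give the second level down, but for a file sitting directly in dir1 that field is the file name itself, so it only makes sense when everything lives at least two levels deep.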

sort Sorts the list, as you might expect. There are two reasons for this. First is that find doesn’t always find files in alphabetical order. Second is that the next command needs like terms to be grouped together to count them properly. Anyway, this leaves the list above unchanged in this case.
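To see why the grouping matters, here is a quick sketch with hand-typed input in which the two dir2 lines are not next to each other. uniq -c only merges adjacent identical lines, so without sort the dir2 count is split in two.

```shell
# Hand-typed input where the dir2 lines are not adjacent.
unsorted=$(printf 'dir2\ndir1\ndir2\n' | uniq -c)
sorted=$(printf 'dir2\ndir1\ndir2\n' | sort | uniq -c)

echo "without sort:"
echo "$unsorted"
echo "with sort:"
echo "$sorted"
```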

uniq -c Takes the list above and collapses each run of identical adjacent lines into one, with a count in the left margin, i.e.

2 dir1

3 dir2

… which is exactly what I wanted.
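One wrinkle with the pipeline as written: a file sitting directly in the current directory shows up as its own entry with a count of 1, because its name lands in $2. A possible variant, sketched below, skips those files with -mindepth 2. That option is supported by GNU and BSD find but is not in POSIX. The function name count_files_per_dir and the demo tree (including stray.txt) are hypothetical.

```shell
# Hypothetical wrapper: count regular files under each top-level subdirectory.
count_files_per_dir() {
    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "${1:-.}" || exit 1
        # -mindepth 2 skips files that sit directly in the starting directory.
        find . -mindepth 2 -type f | awk -F/ '{ print $2 }' | sort | uniq -c
    )
}

# Demo tree: the example layout plus one stray top-level file.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2/subdir1"
touch "$demo/dir1/file1" "$demo/dir1/file2" "$demo/dir2/file1" \
      "$demo/dir2/subdir1/file1" "$demo/dir2/subdir1/file2" \
      "$demo/stray.txt"

result=$(count_files_per_dir "$demo")
echo "$result"

rm -rf "$demo"
```

File names containing newlines would still throw the count off, since everything here is line-based.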

I also used

du -h --max-depth=1

to find out the sizes of these directories.
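If your sort is GNU sort, its -h flag understands the human-readable sizes that du -h prints, so the two can be combined to order the directories by size. A sketch, assuming GNU du and sort and a made-up demo tree:

```shell
# Throwaway tree with different amounts of data in each directory.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2"
head -c 200000 /dev/zero > "$demo/dir1/file1"
head -c 2000000 /dev/zero > "$demo/dir2/file1"

# du prints one line per immediate subdirectory plus one for "." itself;
# sort -h orders those lines by their human-readable size, smallest first.
sizes=$(cd "$demo" && du -h --max-depth=1 | sort -h)
echo "$sizes"

rm -rf "$demo"
```

The exact sizes reported depend on the filesystem, so treat the numbers as indicative only.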

This entry was posted on Monday, February 23rd, 2009 at 6:00 pm and is filed under General IT, Linux.

Everything is Broken runs on WordPress. Theme by Bob. All content Copyright © Datalude 2008+.