A Python script to traverse a directory tree and ZIP all files with a given extension

Back in my days of heavy data generation, I had trouble keeping hard drive use under control. Manually making giant ZIP files of entire trees was not a good solution: the files were huge, and extracting small portions became too slow. Worse, some of the more popular compression programs could generate ZIP files that were too large to open, leaving the data unrecoverable.

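So I wrote this script to traverse a tree and, in each directory, create a separate ZIP file for each extension (txt.zip for the .txt files, for example). Once the ZIP file is confirmed to exist and to be non-empty, the originals are deleted. The extensions to process are set in the fils list near the top of the script. Run the script with the root directory as its only argument: full path recommended, enclosed in quotes, trailing slash omitted.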
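The reasons for these argument rules are not spelled out, so here is a guess. A full path matters because the script calls os.chdir while os.walk is still running, which breaks a relative root. The trailing slash is probably a problem because, on Windows, a backslash just before the closing quote is usually taken as escaping the quote, so the script receives a path ending in a stray `"`. That second reason is an assumption on my part. A minimal sketch of a helper that tolerates both cases (normalize_root is a hypothetical name, not part of the script):

```python
import os


def normalize_root(arg):
    """Return an absolute root path with no trailing separator or stray quote."""
    # Drop a trailing double quote, which may be left behind when a quoted
    # Windows path ends in a backslash (assumed cause, see the text above).
    cleaned = arg.rstrip('"')
    # abspath makes the path absolute and normalizes it, which also removes
    # a trailing separator (except on a bare root such as "/").
    return os.path.abspath(cleaned)
```

With this in place, the line that reads the argument could become `dirroot = normalize_root(sys.argv[1])`.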

Compression is courtesy of 7-Zip. I wrote this for Windows (the path to 7z.exe and the del command are hard-coded), so it will require modification for use elsewhere.

#Superzip.
#Walks a directory tree and zips given extensions, then deletes the originals.
#WARNING
#This version checks after zipping if the ZIP file exists and has a size larger
#than 0. This is the only way it checks to see if there may have been an error
#with the zipping.

import glob
import os
import sys

if len(sys.argv) != 2:
    sys.exit('First argument must be root directory to be processed.\n\n\n')

dirroot = sys.argv[1]

#Extensions to process; uncomment the ones you need.
fils = [
##    'psd',
##    'bmp',
##    'par',
##    'dat',
##    'raw',
##    'tec',
##    'mpk',
##    'pdf',
##    'avi',
##    'pts',
##    'dwp',
##    'cpt',
##    'trk',
    'txt',
##    'vox',
##    'tif',
##    'lat',
    ]

for dirpath, dirnames, filenames in os.walk(dirroot):
    #For each directory, we look for the files.
    #We never use dirnames or filenames
    for fil in fils:
        filext = '%s\\*.%s' % (dirpath, fil)
        if len(glob.glob(filext)) > 0:
            #There are such files in this directory.
            os.chdir(dirpath)
            #The whole command line, program name included, is passed as one string.
            args = '7z a -tzip %s.zip *.%s' % (fil, fil)
            print(dirpath)
            print(args)
            os.spawnl(os.P_WAIT, 'C:\\Program Files\\7-Zip\\7z.exe', args)
            if os.path.exists('%s.zip' % fil):
                statinfo = os.stat('%s.zip' % fil)
                if statinfo.st_size > 0:
                    #The ZIP file exists and is not empty, so delete the originals.
                    cmd = 'del /Q "%s"' % (filext)
                    os.system(cmd)
                else:
                    print('File %s.zip has size of 0.' % fil)
            else:
                print('ZIP file does not exist: %s.zip' % fil)
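
For use outside Windows, one possible modification is to swap 7-Zip and the del command for Python's standard zipfile module. The sketch below is that swapped-in approach, not the script above. The function names zip_extension and superzip are made up for illustration. It also goes a step beyond the size check: it reopens the archive, runs testzip() to check the stored CRCs, and only then deletes the originals.

```python
import os
import zipfile


def zip_extension(dirpath, ext):
    """Add every *.ext file in dirpath to ext.zip, verify it, delete the originals.

    Returns the list of archived file names (empty if nothing matched).
    """
    zip_name = '%s.zip' % ext
    wanted = '.' + ext.lower()
    # Case-insensitive extension match; skip the archive itself (ext == 'zip').
    names = [n for n in sorted(os.listdir(dirpath))
             if os.path.splitext(n)[1].lower() == wanted
             and n != zip_name
             and os.path.isfile(os.path.join(dirpath, n))]
    if not names:
        return []

    zip_path = os.path.join(dirpath, zip_name)
    # Mode 'a' appends to an existing archive (or creates it), so a rerun
    # does not wipe out files archived earlier. allowZip64 permits archives
    # larger than the classic ZIP size limits.
    with zipfile.ZipFile(zip_path, 'a', zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for name in names:
            zf.write(os.path.join(dirpath, name), arcname=name)

    # Reopen and verify: testzip() returns None when every member's CRC is good.
    with zipfile.ZipFile(zip_path, 'r') as zf:
        bad = zf.testzip()
        archived = set(zf.namelist())
    if bad is not None or not set(names) <= archived:
        raise RuntimeError('Verification failed for %s; originals kept.' % zip_path)

    for name in names:
        os.remove(os.path.join(dirpath, name))
    return names


def superzip(dirroot, extensions):
    """Walk dirroot and archive each listed extension in every directory."""
    for dirpath, dirnames, filenames in os.walk(dirroot):
        for ext in extensions:
            zip_extension(dirpath, ext)
```

Calling `superzip(root, ['txt'])` would leave a txt.zip in every directory that held .txt files. Because nothing changes the working directory here, a relative root is fine in this version.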
