Tuesday, July 21, 2009

Getting row counts in all excel file by sheet names

The scenario:
You have a number of excel files (xls). In each excel file there various sheets. Each sheet has a header and a number of lines following it. You want an output of the format (csv):
filename1, sheet name 1, row count , sheet name 2, row count, sheet name 3, row count
filename2, sheet name 1, row count , sheet name 2, row count, sheet name 3, row count

Well here is a script that does just that!
You will need python and dexutils both downloadable from : google code
make a folder (i will call it SCRIPT) and make a file getsheetrowcounts.py containing the following data :

import dex
from dex import xlrd

= True

files = dex.glob('./input/*.xls')

= {} # book['filename']=[]

for file in files:
wb = xlrd.open_workbook(file)
file = dex.GetFilenameWithoutExt(file)
sheets = wb.sheet_names()

for sheetname in sheets:
sh = wb.sheet_by_name(sheetname)
rowcount = sh.nrows

if (decrementoneforheader):
rowcount = rowcount-1

# now we have the data
outlines = []
for bookname in book.keys():
line = bookname
rows = book[bookname]
for column in rows:
line = line + "," + column



Next make a folder "input" inside of SCRIPT and place all the excel files in this folder. Finally run getsheetrowcounts.py and you will get the desired output.csv