Tuesday, July 21, 2009

Getting row counts in all excel file by sheet names

The scenario:
You have a number of excel files (xls). In each excel file there various sheets. Each sheet has a header and a number of lines following it. You want an output of the format (csv):
filename1, sheet name 1, row count , sheet name 2, row count, sheet name 3, row count
filename2, sheet name 1, row count , sheet name 2, row count, sheet name 3, row count

Well here is a script that does just that!
You will need python and dexutils both downloadable from : google code
make a folder (i will call it SCRIPT) and make a file getsheetrowcounts.py containing the following data :


import dex
from dex import xlrd

decrementoneforheader
= True

files = dex.glob('./input/*.xls')

book
= {} # book['filename']=[]

for file in files:
wb = xlrd.open_workbook(file)
file = dex.GetFilenameWithoutExt(file)
sheets = wb.sheet_names()

for sheetname in sheets:
sh = wb.sheet_by_name(sheetname)
rowcount = sh.nrows

try:
book[file]
except:
book[file]=[]
book[file].append(sheetname)
if (decrementoneforheader):
rowcount = rowcount-1
book[file].append(str(rowcount))


# now we have the data
outlines = []
for bookname in book.keys():
line = bookname
rows = book[bookname]
for column in rows:
line = line + "," + column

outlines.append(line)

dex.writelines('output.csv',outlines)




Next make a folder "input" inside of SCRIPT and place all the excel files in this folder. Finally run getsheetrowcounts.py and you will get the desired output.csv