Showing posts with label python. Show all posts

Monday, February 16, 2015

Get the metadata of pdf files in python

Often we want to retrieve the metadata stored in a PDF file. Here I show two ways to do that: using pyPdf and using pdfminer.

1) pyPdf
Install pyPdf using pip

then use this code:
from pyPdf import PdfFileReader
pdf_toread = PdfFileReader(open("doc2.pdf", "rb"))
pdf_info = pdf_toread.getDocumentInfo()
print(str(pdf_info))
You might not get all the metadata you want this way. In my case I was looking for the number of pages, which is not part of the document info. If you check the methods of pdf_toread you can find the right one, for example:
print(pdf_toread.getNumPages())


2) pdfminer 
install pdfminer using pip (on Python 3, install the maintained fork pdfminer.six, which keeps the same pdfminer module names) and then:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

with open('diveintopython.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    parser.set_document(doc)
    print(doc.info)  # The "Info" metadata



Sunday, February 15, 2015

List assignment in python

Normally, assignment with = does not make a copy of a list. Instead, both variables end up pointing to the same list object in memory.

colors = ['red', 'blue', 'green']   # a list of the strings 'red', 'blue', 'green'
print(colors[0])    ## red
print(colors[2])    ## green
print(len(colors))  ## 3
b = colors   ## Does not copy the list: both colors and b point to the one list
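A quick way to see that b is only a second name for the same list: change the list through b and look at colors, or compare the two names with the is operator (a small sketch using the same colors list).

```python
colors = ['red', 'blue', 'green']
b = colors             # no copy: b refers to the same list object

b.append('yellow')     # modify the list through b ...
print(colors)          # ... and colors shows the change: ['red', 'blue', 'green', 'yellow']
print(b is colors)     # True: one object with two names
```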

---------------------------------------------------
So, to copy the list you have several options:

you can slice it:
new_list = old_list[:]
OR

use the generic copy.copy():
import copy
new_list = copy.copy(old_list)
OR

use the built-in function list():
new_list = list(old_list)

OR, if the list contains mutable objects (nested lists, dictionaries, class instances) that must be copied as well, use the generic copy.deepcopy():
import copy
new_list = copy.deepcopy(old_list)

This is the slowest and most memory-hungry method, but sometimes it is unavoidable.

!!!! NOTE :
Be aware that copy.copy(), list[:] and list(list), unlike copy.deepcopy(), only make shallow copies: they don't copy the lists, dictionaries and class instances stored inside the list. So if those inner objects change, the change shows up in the copied list too, and vice versa.
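Here is a small sketch of that difference, using a list that holds another list and a dictionary:

```python
import copy

old_list = [[1, 2], {'a': 1}]

shallow = old_list[:]            # same kind of copy as copy.copy(old_list) or list(old_list)
deep = copy.deepcopy(old_list)   # also copies the inner list and dict

old_list[0].append(3)            # change the inner objects in place
old_list[1]['b'] = 2

print(shallow)  # [[1, 2, 3], {'a': 1, 'b': 2}]  shares the inner objects, so it sees the change
print(deep)     # [[1, 2], {'a': 1}]              has its own inner objects, so it is unaffected

old_list.append('new')           # changing the outer list itself
print(len(shallow))              # 2: neither copy is affected by this
```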

Wednesday, January 21, 2015

How to Install Python 3 and PyDev

A nice way of installing Python 3 together with pip3 is illustrated here:

how-to-install-python-3-and-pydev-on-osx

The link uses Homebrew, which takes care of the installation.
If you install python3 some other way, you also need pip3 to download the Python 3 versions of packages.
For example, on Ubuntu install it with:
 sudo apt-get install python3-pip  

and to get the Python 3 version of easy_install use:

sudo apt-get install python3-setuptools

Tuesday, January 6, 2015

Some Useful Libraries to Play with Excel

Here are some options to choose from:
  • xlwt (writing xls files)
  • xlrd (reading xls files; xlsx reading was dropped in xlrd 2.0)
  • openpyxl (reading/writing xlsx files)
  • xlsxwriter (writing xlsx files)
If you only need to copy data (without formatting information), you can use any combination of these tools for reading and writing. If you have an xls file, go with the xlrd + xlwt option.
Here's a simple example of copying the first row from an existing Excel file to a new one:
import xlrd
import xlwt

# read the values of the first row of the first sheet
in_book = xlrd.open_workbook('input.xls')
in_sheet = in_book.sheet_by_index(0)
data = [in_sheet.cell_value(0, col) for col in range(in_sheet.ncols)]

# write them into the first row of a sheet called 'test' in a new workbook
out_book = xlwt.Workbook()
out_sheet = out_book.add_sheet('test')
for index, value in enumerate(data):
    out_sheet.write(0, index, value)

out_book.save('output.xls')
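For an xlsx file, openpyxl can do both the reading and the writing. Below is a minimal sketch of the same first-row copy, assuming a reasonably recent openpyxl (the values_only argument of iter_rows is not available in very old releases). copy_first_row is a hypothetical helper name, and the input workbook is created in a temporary folder only so the example runs on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def copy_first_row(src_path, dst_path, sheet_title='test'):
    """Hypothetical helper: copy the values (not formatting) of the first row of the active sheet."""
    src_sheet = load_workbook(src_path).active
    first_row = next(src_sheet.iter_rows(min_row=1, max_row=1, values_only=True), ())

    dst_book = Workbook()
    dst_sheet = dst_book.active
    dst_sheet.title = sheet_title
    dst_sheet.append(list(first_row))  # append() writes to the next empty row, here row 1
    dst_book.save(dst_path)
    return list(first_row)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.xlsx')
    dst = os.path.join(tmp, 'output.xlsx')

    # create a small input file with two rows
    book = Workbook()
    book.active.append(['name', 'value', 3])
    book.active.append(['ignored', 'row', 4])
    book.save(src)

    copied = copy_first_row(src, dst)
    written = list(load_workbook(dst).active.iter_rows(values_only=True))
    print(copied)  # ['name', 'value', 3]
```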

Sunday, December 21, 2014

Installed package information in python

After you install a package with pip or easy_install, the pip show command tells you where the package was installed. Normally it goes into the site-packages folder under Python's lib directory.

pip show [options] <package>

Example:
pip show pymongo
Name: pymongo
Version: 2.7.2
Location: /Users/anaconda/lib/python2.7/site-packages

-----------

The list command shows all installed packages:
pip list
or
pip freeze   (same list, in requirements format, e.g. pymongo==2.7.2)
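The same information is also available from inside Python through the standard-library importlib.metadata module (Python 3.8 or newer). This is a rough sketch, not pip's own code: package_info and installed_packages are hypothetical helper names, and the Location value is usually, but not necessarily, the same site-packages folder that pip show reports.

```python
from importlib import metadata


def package_info(name):
    """Rough equivalent of the Name/Version/Location lines of `pip show <name>`."""
    dist = metadata.distribution(name)
    return {
        'Name': dist.metadata['Name'],
        'Version': dist.version,
        # locate_file('') resolves to the folder the distribution was installed into
        'Location': str(dist.locate_file('')),
    }


def installed_packages():
    """Rough equivalent of `pip list`: sorted (name, version) pairs."""
    pairs = {(d.metadata.get('Name'), d.version) for d in metadata.distributions()}
    return sorted(p for p in pairs if p[0])  # skip entries with broken metadata


for name, version in installed_packages():
    print(name, version)
```

For example, package_info('pymongo') would return a dictionary with the same fields as the pip show output above.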



Wednesday, June 4, 2014

different signs (marker types) to draw points and lines with the plot function in python

Here is the list of characters you can pass to the plot function to choose the line style and the marker:

example: plt.plot(range(10), linestyle='--', marker='o', color='b')

================    ===============================
character           description
================    ===============================
``'-'``             solid line style
``'--'``            dashed line style
``'-.'``            dash-dot line style
``':'``             dotted line style
``'.'``             point marker
``','``             pixel marker
``'o'``             circle marker
``'v'``             triangle_down marker
``'^'``             triangle_up marker
``'<'``             triangle_left marker
``'>'``             triangle_right marker
``'1'``             tri_down marker
``'2'``             tri_up marker
``'3'``             tri_left marker
``'4'``             tri_right marker
``'s'``             square marker
``'p'``             pentagon marker
``'*'``             star marker
``'h'``             hexagon1 marker
``'H'``             hexagon2 marker
``'+'``             plus marker
``'x'``             x marker
``'D'``             diamond marker
``'d'``             thin_diamond marker
``'|'``             vline marker
``'_'``             hline marker
================    ===============================
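These characters can also be combined in a single format string, together with a color letter such as 'r', 'g' or 'b'. A small sketch comparing the format-string form with the keyword form of the example above; the Agg backend and the in-memory PNG are only there so it runs without a display (call plt.show() instead in an interactive session).

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()

# format string = color + marker + line style: red circles joined by a dashed line
ax.plot(x, x, 'ro--', label="'ro--'")
# the same kind of styling spelled out with keyword arguments
ax.plot(x, x + 2, color='b', marker='^', linestyle=':', label="marker='^', linestyle=':'")
# a marker with no line-style character draws only the points
ax.plot(x, x + 4, 'gs', label="'gs'")
ax.legend(loc='upper left')

buf = io.BytesIO()
fig.savefig(buf, format='png')
```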

Tuesday, April 29, 2014

Plot CDF in Python

Perhaps the easiest way of plotting the cumulative distribution function (CDF) in Python is with the ECDF class from statsmodels:


import numpy as np
import statsmodels.api as sm # recommended import according to the docs
import matplotlib.pyplot as plt

# sample data; for random data use e.g. sample = np.random.uniform(0, 1, 50)
sample = np.array([1,2,2,3,2,3,3,3,3,3,3,4,4,4,4,4,60,3,3,3,10])
ecdf = sm.distributions.ECDF(sample)

# evaluate the empirical CDF on a fine grid between the smallest and largest value
x = np.linspace(min(sample), max(sample), 10000)
y = ecdf(x)
plt.step(x, y)

plt.show()


Here is the plot:


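If you don't want the statsmodels dependency, the empirical CDF is easy to compute with numpy alone: sort the sample, and for each value count the fraction of the sample that is less than or equal to it. A minimal sketch (ecdf here is a hypothetical helper, not the statsmodels class):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # remove this line and call plt.show() to see the plot
import matplotlib.pyplot as plt


def ecdf(sample):
    """Return sorted values xs and ys[i] = fraction of the sample <= xs[i]."""
    xs = np.sort(np.asarray(sample, dtype=float))
    # searchsorted(..., side='right') counts the values <= each xs[i], which also handles ties
    ys = np.searchsorted(xs, xs, side='right') / len(xs)
    return xs, ys


sample = [1, 2, 2, 3, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 60, 3, 3, 3, 10]
xs, ys = ecdf(sample)

fig, ax = plt.subplots()
# where='post' keeps each step flat until the next observed value, as a CDF should
ax.step(xs, ys, where='post')
```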

Sunday, April 27, 2014

Fitting data with SciPy using curve_fit

Suppose that you have a data set of Throughput vs Latency measurements from your experiment. We'll start by importing the needed libraries and defining a fitting function. In this example we define a quadratic function, but you may define your own:

from scipy.optimize import curve_fit
def fitFunc(x, a, b, c):
    return a*(x**2)+b*x + c

Now we create some data points:
y=[25.6, 31.21, 36.82, 42.43, 44.67, 46.91, 48.04, 49.15, 51.2] 
x=[26.0, 41.0, 50.0, 62.5, 69.0, 74.0, 75.050000000000182, 109.0, 98.5]

The scipy.optimize module contains a least-squares curve fitting routine, curve_fit, that takes as input a user-defined fitting function (in our case fitFunc), the x-axis data (in our case, x) and the y-axis data (in our case, y). curve_fit returns an array of fit parameters and the covariance matrix of those parameters.

import numpy as np
import matplotlib.pyplot as plt

x = np.asarray(x)
y = np.asarray(y)
fitParams, fitCovariances = curve_fit(fitFunc, x, y)
print(fitParams)
print(fitCovariances)
tt = np.linspace(min(x), max(x), 100)  # smooth grid for drawing the fitted curve
plt.ylabel('Throughput (%)', fontsize = 16)
plt.xlabel('Latency', fontsize = 16)
# plot the real data
plt.plot(x, y, 'b+', markersize=19)
# plot the best-fitting curve
plt.plot(tt, fitFunc(tt, fitParams[0], fitParams[1], fitParams[2]))
plt.show()

output:
[-0.00420386  0.88363413  3.96364659]
[[  4.78028309e-07  -6.52166801e-05   1.92964390e-03]
 [ -6.52166801e-05   9.24484129e-03  -2.86615224e-01]
 [  1.92964390e-03  -2.86615224e-01   9.57342339e+00]]

result:


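The covariance matrix is mostly useful for its diagonal: the square roots of the diagonal entries are the one-standard-deviation uncertainties of the fitted parameters. Here is a sketch on synthetic data generated from known coefficients (a = 0.5, b = -2, c = 1, plus a little Gaussian noise; these numbers are made up for illustration), so you can check that curve_fit recovers them.

```python
import numpy as np
from scipy.optimize import curve_fit


def fitFunc(x, a, b, c):
    return a * x**2 + b * x + c


# synthetic data: a known quadratic plus noise with standard deviation 0.1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = fitFunc(x, 0.5, -2.0, 1.0) + rng.normal(0, 0.1, x.size)

params, cov = curve_fit(fitFunc, x, y)
errors = np.sqrt(np.diag(cov))  # one-sigma uncertainty of each parameter

for name, value, err in zip('abc', params, errors):
    print('%s = %.4f +/- %.4f' % (name, value, err))
```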
Measure the Running Time of a Piece of Code in Python (Timeit)

The timeit module lets you measure the running time of a piece of code or a function. Here is how to use it:

import timeit
start_time = timeit.default_timer()
# code you want to evaluate
print('Hello world!')
elapsed = timeit.default_timer() - start_time
print('Time:', elapsed)


Here is the result:
Hello world!
Time: 4.19853863036e-05
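A single measurement of something this short is dominated by noise. timeit.timeit runs a statement many times and returns the total time in seconds, and timeit.repeat does several such rounds; the timeit documentation suggests looking at the minimum of the rounds rather than the mean. A small sketch (the statement being timed is just an example):

```python
import timeit

# run the statement 1000 times; the result is the total time in seconds
total = timeit.timeit("sorted(range(1000))", number=1000)
print('Total for 1000 runs:', total)

# five independent rounds of 1000 runs each
rounds = timeit.repeat("sorted(range(1000))", number=1000, repeat=5)
print('Best of 5:', min(rounds) / 1000, 'seconds per run')


# a function can be passed instead of a string
def work():
    return sum(i * i for i in range(1000))


print('Function:', timeit.timeit(work, number=100))
```

In Python 3, timeit.default_timer is time.perf_counter, so the manual start/stop approach above uses the same clock.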

Linear Regression in Python using SciPy with plot

Here is a linear regression example using the scipy.stats library.

from scipy import stats
import numpy
import matplotlib.pyplot as plt

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
plt.plot(x, y, 'b+', markersize=19, label='Real Data')
gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Gradient and intercept", gradient, intercept)
xx = numpy.linspace(-5, 10, 100)
z = gradient*xx + intercept
plt.plot(xx, z, 'r', label='Linear Regression Function')
plt.legend(loc=2)
plt.show()


Here is the result of running the above code:




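To see what linregress computes, here is a small check on the same four data points: the closed-form ordinary least-squares slope and intercept, and the Pearson correlation r, computed with plain numpy, should match the slope, intercept and rvalue attributes of the linregress result.

```python
import numpy as np
from scipy import stats

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

# ordinary least squares in closed form:
# slope = sum(dx*dy) / sum(dx**2), intercept = mean(y) - slope * mean(x)
dx = x - x.mean()
dy = y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Pearson correlation; r**2 is the fraction of the variance of y explained by the line
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

res = stats.linregress(x, y)
print(slope, res.slope)
print(intercept, res.intercept)
print(r ** 2, res.rvalue ** 2)
```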

How to plot a function using matplotlib

In this example I show how to evaluate a function using numpy and how to plot the result:

import pylab
import numpy

x = numpy.linspace(-15,15,100) # 100 linearly spaced numbers
y = numpy.sin(x)/x # computing the values of sin(x)/x

# compose plot
pylab.plot(x,y) # sin(x)/x
pylab.plot(x,y,'co') # same function with cyan dots
pylab.plot(x,2*y,x,3*y) # 2*sin(x)/x and 3*sin(x)/x
pylab.show() # show the plot


Here is the plot:


Give it a try :)
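One caveat: numpy.sin(x)/x gives nan (with a warning) if the grid happens to contain x = 0, for example with 101 points instead of 100. numpy's np.sinc(t) is the normalized sinc, sin(pi*t)/(pi*t), so np.sinc(x/np.pi) equals sin(x)/x and returns the limit value 1 at x = 0. A sketch of the same plot in matplotlib's object-oriented style (the Agg line only lets it run without a display; drop it and call plt.show() to see the window):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = np.linspace(-15, 15, 101)   # odd number of points, so the grid includes x = 0
y = np.sinc(x / np.pi)          # sin(x)/x, equal to 1 at x = 0

fig, ax = plt.subplots()
ax.plot(x, y, label='sin(x)/x')
ax.plot(x, y, 'co')                       # same function with cyan dots
ax.plot(x, 2 * y, label='2 sin(x)/x')
ax.plot(x, 3 * y, label='3 sin(x)/x')
ax.legend()
```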

Thursday, April 24, 2014

Subplot in python using matplotlib

Here is an example of how to create subplots in python:

import matplotlib.pyplot as plt
import numpy as np

fig=plt.figure()
data=np.arange(900).reshape((30,30))
for i in range(1,5):
    ax=fig.add_subplot(2,2,i)        
    ax.imshow(data)

plt.suptitle('Main title')
plt.show() 

Note that the first two arguments of add_subplot(2,2,i) give the number of rows and columns of the grid; in this example the figure has 2 rows and 2 columns. The third argument, i, is the position of the subplot, counted from 1, left to right and top to bottom.
Also, the title of the whole figure is set with the suptitle function.



Here is the result:

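Newer matplotlib code often creates the figure and the whole grid of axes in one call with plt.subplots, which returns a 2x2 array of Axes. Below is a sketch of the same example in that style; the per-panel titles are added only to show the numbering order, and the Agg line lets it run without a display (drop it and call plt.show() to see the figure).

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(900).reshape((30, 30))

fig, axes = plt.subplots(2, 2)   # 2 rows, 2 columns
# axes.flat walks the grid row by row, matching positions 1..4 of add_subplot
for position, ax in enumerate(axes.flat, start=1):
    ax.imshow(data)
    ax.set_title('subplot %d' % position)

fig.suptitle('Main title')
```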