
import pandas as pd
import numpy as np

# Read a SAS dataset straight into a DataFrame
df = pd.read_sas('sample.sas7bdat')
df.head()
df.describe()

       SPOUSARR      SPOUSE        NRESPOUS      HHSPOUSE      SPOUSEFW      SPOUSENF      HHKID         HHSIB         HHPARENT
count  22318.000000  71311.000000  43267.000000  71311.000000  62171.000000  62171.000000  71266.000000  62171.000000  71311.000000
mean   2030.854288   0.606737      0.788014      0.443929      0.152611      0.113719      0.891238      0.022985      0.042798
std    570.785415    0.488478      1.304314      0.496850      0.359615      0.317472      1.378112      0.229908      0.270853
min    1901.000000   0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000
25%    1983.000000   0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000
50%    1991.000000   1.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000      0.000000
75%    1999.000000   1.000000      3.000000      1.000000      0.000000      0.000000      2.000000      0.000000      0.000000
max    9996.000000   1.000000      3.000000      1.000000      1.000000      1.000000      12.000000     9.000000      6.000000

8 rows × 355 columns
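
The SAS file above is fairly wide (355 columns), so it is worth noting that read_sas can also stream the file in chunks. This is a small sketch of my own, not part of the original run; the chunk size of 10,000 rows is an arbitrary choice.

# Iterate over the .sas7bdat file in chunks instead of loading it all at once
reader = pd.read_sas('sample.sas7bdat', chunksize=10_000)
for chunk in reader:
    # each chunk is an ordinary DataFrame with up to 10,000 rows
    print(chunk.shape)
reader.close()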

df = pd.read_csv('sample.csv')
df.head()
df.describe()

       age         Medu        Fedu        traveltime  studytime   failures    famrel      freetime    goout       Dalc        ...
count  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000  395.000000  ...
mean   16.696203   2.749367    2.521519    1.448101    2.035443    0.334177    3.944304    3.235443    3.108861    1.481013    ...
std    1.276043    1.094735    1.088201    0.697505    0.839240    0.743651    0.896659    0.998862    1.113278    0.890741    ...
min    15.000000   0.000000    0.000000    1.000000    1.000000    0.000000    1.000000    1.000000    1.000000    1.000000    ...
25%    16.000000   2.000000    2.000000    1.000000    1.000000    0.000000    4.000000    3.000000    2.000000    1.000000    ...
50%    17.000000   3.000000    2.000000    1.000000    2.000000    0.000000    4.000000    3.000000    3.000000    1.000000    ...
75%    18.000000   4.000000    3.000000    2.000000    2.000000    0.000000    5.000000    4.000000    4.000000    2.000000    ...
max    22.000000   4.000000    4.000000    4.000000    4.000000    3.000000    5.000000    5.000000    5.000000    5.000000    ...
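
As a small aside (my own sketch, not from the original notebook), read_csv can be limited to specific columns and dtypes, which keeps memory use down on wider files. The column names below are taken from the describe() output above; the int16 dtype assumes 'age' has no missing values, which the count of 395 suggests.

# Load only a few of the numeric columns from the same file
df_small = pd.read_csv('sample.csv', usecols=['age', 'Medu', 'Fedu', 'studytime'], dtype={'age': 'int16'})
print(df_small.dtypes)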

df = pd.read_xml('sample.xml')
df.head()
df.describe()

       number_of_groups  enrollment  minimum_age
count  1.0               1.0         0.0
mean   1.0               901.0       NaN
std    NaN               NaN         NaN
min    1.0               901.0       NaN
25%    1.0               901.0       NaN
50%    1.0               901.0       NaN
75%    1.0               901.0       NaN
max    1.0               901.0       NaN
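
The XML file only produced a single row because read_xml flattens whichever elements its xpath expression matches (by default, the children of the root). The sketch below is an assumption on my part: './/group' is a placeholder tag name, to be replaced with whatever element actually repeats in sample.xml.

# Hypothetical xpath; adjust './/group' to the repeating element in sample.xml
df_xml = pd.read_xml('sample.xml', xpath='.//group')
print(df_xml.head())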

df = pd.read_json('sample.json')
df.head()
df.describe()
       sepalLength  sepalWidth  petalLength  petalWidth
count  150.000000   150.000000  150.000000   150.000000
mean   5.843333     3.057333    3.758000     1.199333
std    0.828066     0.435866    1.765298     0.762238
min    4.300000     2.000000    1.000000     0.100000
25%    5.100000     2.800000    1.600000     0.300000
50%    5.800000     3.000000    4.350000     1.300000
75%    6.400000     3.300000    5.100000     1.800000
max    7.900000     4.400000    6.900000     2.500000
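
read_json handled this file because the iris records are flat. For nested JSON, a common alternative (shown here as a sketch, not something the original run needed) is to load the file with the json module and flatten it with pd.json_normalize.

import json

# Flatten nested fields such as {"a": {"b": 1}} into columns like "a.b"
with open('sample.json') as fh:
    records = json.load(fh)
df_flat = pd.json_normalize(records)
print(df_flat.head())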

url = 'Financial Sample.xlsx'

# read_excel accepts a local path (as here) or a URL
df = pd.read_excel(url)
print(df)
df.describe()

              Segment                   Country    Product Discount Band  \
0          Government                    Canada  Carretera           NaN
1          Government                   Germany  Carretera           NaN
2           Midmarket                    France  Carretera           NaN
3           Midmarket                   Germany  Carretera           NaN
4           Midmarket                    Mexico  Carretera           NaN
..                ...                       ...        ...           ...
695    Small Business                    France   Amarilla          High
696    Small Business                    Mexico   Amarilla          High
697        Government                    Mexico    Montana          High
698        Government                    Canada      Paseo          High
699  Channel Partners  United States of America        VTT          High

     Units Sold  Manufacturing Price  Sale Price  Gross Sales  Discounts  \
0        1618.5                    3          20      32370.0       0.00
1        1321.0                    3          20      26420.0       0.00
2        2178.0                    3          15      32670.0       0.00
3         888.0                    3          15      13320.0       0.00
4        2470.0                    3          15      37050.0       0.00
..          ...                  ...         ...          ...        ...
695      2475.0                  260         300     742500.0  111375.00
696       546.0                  260         300     163800.0   24570.00
697      1368.0                    5           7       9576.0    1436.40
698       723.0                   10           7       5061.0     759.15
699      1806.0                  250          12      21672.0    3250.80

         Sales      COGS    Profit       Date  Month Number Month Name  Year
0     32370.00   16185.0  16185.00 2014-01-01             1    January  2014
1     26420.00   13210.0  13210.00 2014-01-01             1    January  2014
2     32670.00   21780.0  10890.00 2014-06-01             6       June  2014
3     13320.00    8880.0   4440.00 2014-06-01             6       June  2014
4     37050.00   24700.0  12350.00 2014-06-01             6       June  2014
..         ...       ...       ...        ...           ...        ...   ...
695  631125.00  618750.0  12375.00 2014-03-01             3      March  2014
696  139230.00  136500.0   2730.00 2014-10-01            10    October  2014
697    8139.60    6840.0   1299.60 2014-02-01             2   February  2014
698    4301.85    3615.0    686.85 2014-04-01             4      April  2014
699   18421.20    5418.0  13003.20 2014-05-01             5        May  2014

[700 rows x 16 columns]


        Units Sold  Manufacturing Price  Sale Price   Gross Sales      Discounts         Sales           COGS         Profit                 Date
count   700.000000           700.000000  700.000000  7.000000e+02     700.000000  7.000000e+02     700.000000     700.000000                  700
mean   1608.294286            96.477143  118.428571  1.827594e+05   13150.354629  1.696091e+05  145475.211429   24133.860371  2014-04-28 21:36:00
min     200.000000             3.000000    7.000000  1.799000e+03       0.000000  1.655080e+03     918.000000  -40617.500000  2013-09-01 00:00:00
25%     905.000000             5.000000   12.000000  1.739175e+04     800.320000  1.592800e+04    7490.000000    2805.960000  2013-12-24 06:00:00
50%    1542.500000            10.000000   20.000000  3.798000e+04    2585.250000  3.554020e+04   22506.250000    9242.200000  2014-05-16 12:00:00
75%    2229.125000           250.000000  300.000000  2.790250e+05   15956.343750  2.610775e+05  245607.500000   22662.000000  2014-09-08 12:00:00
max    4492.500000           260.000000  350.000000  1.207500e+06  149677.500000  1.159200e+06  950625.000000  262200.000000  2014-12-01 00:00:00
std     867.427859           108.602612  136.775515  2.542623e+05   22962.928775  2.367263e+05  203865.506118   42760.626563                  NaN
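
describe() only summarizes the numeric and datetime columns by default. A quick follow-up I would add here (not in the original output) is include='all', which also reports counts, unique values and top categories for text columns such as Segment, Country and Product.

# Transposed so the 16 columns read as rows
print(df.describe(include='all').T)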

import pickle

df = pd.read_pickle('data.pkl')
df
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\io\pickle.py:206, in read_pickle(filepath_or_buffer, compression, storage_options)
    205     warnings.simplefilter("ignore", Warning)
--> 206     return pickle.load(handles.handle)
    207 except excs_to_catch:
    208     # e.g.
    209     #  "No module named 'pandas.core.sparse.series'"
    210     #  "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"

ModuleNotFoundError: No module named 'pandas.core.index'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[23], line 2
      1 import pickle
----> 2 df = pd.read_pickle('data.pkl')
      3 df

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\io\pickle.py:211, in read_pickle(filepath_or_buffer, compression, storage_options)
    206     return pickle.load(handles.handle)
    207 except excs_to_catch:
    208     # e.g.
    209     #  "No module named 'pandas.core.sparse.series'"
    210     #  "Can't get attribute '__nat_unpickle' on <module 'pandas._libs.tslib"
--> 211     return pc.load(handles.handle, encoding=None)
    212 except UnicodeDecodeError:
    213     # e.g. can occur for files written in py27; see GH#28645 and GH#31988
    214     return pc.load(handles.handle, encoding="latin-1")

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\compat\pickle_compat.py:225, in load(fh, encoding, is_verbose)
    222     # "Unpickler" has no attribute "is_verbose"  [attr-defined]
    223     up.is_verbose = is_verbose  # type: ignore[attr-defined]
--> 225     return up.load()
    226 except (ValueError, TypeError):
    227     raise

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\pickle.py:1213, in _Unpickler.load(self)
   1211     raise EOFError
   1212 assert isinstance(key, bytes_types)
-> 1213 dispatch[key[0]](self)
   1214 except _Stop as stopinst:
   1215     return stopinst.value

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\pickle.py:1538, in _Unpickler.load_stack_global(self)
   1536 if type(name) is not str or type(module) is not str:
   1537     raise UnpicklingError("STACK_GLOBAL requires str")
-> 1538 self.append(self.find_class(module, name))

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\compat\pickle_compat.py:156, in Unpickler.find_class(self, module, name)
    154 key = (module, name)
    155 module, name = _class_locations_map.get(key, key)
--> 156 return super().find_class(module, name)

File C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\pickle.py:1580, in _Unpickler.find_class(self, module, name)
   1578 elif module in _compat_pickle.IMPORT_MAPPING:
   1579     module = _compat_pickle.IMPORT_MAPPING[module]
-> 1580 __import__(module, level=0)
   1581 if self.proto >= 4:
   1582     return _getattribute(sys.modules[module], name)[0]

ModuleNotFoundError: No module named 'pandas.core.index'
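
The traceback above appears to be a version mismatch rather than a pandas limitation: 'data.pkl' references the internal module 'pandas.core.index', which no longer exists in the installed pandas, so the old pickle cannot be un-pickled here. As a sanity check (my own sketch, using a hypothetical file name), a pickle written and read with the same pandas version round-trips cleanly.

# Write a fresh pickle with the current pandas and read it back
df_new = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
df_new.to_pickle('data_roundtrip.pkl')
print(pd.read_pickle('data_roundtrip.pkl'))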

url = 'sample2.txt'

# Read a plain-text file with Python's built-in open()
txt = open(url, mode="r")
text = txt.read()
print(text)
txt.close()
print(len(text))
Aeque enim contingit omnibus fidibus, ut incontentae sint.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quae cum ita sint, effectum est nihil esse malum, quod turpe non sit. Itaque nostrum est-quod nostrum dico, artis est-ad ea principia, quae accepimus. Quod totum contra est. Duo Reges: constructio interrete. Atqui iste locus est, Piso, tibi etiam atque etiam confirmandus, inquam; Quamvis enim depravatae non sint, pravae tamen esse possunt. Duarum enim vitarum nobis erunt instituta capienda.

Non igitur de improbo, sed de callido improbo quaerimus, qualis Q. Audio equidem philosophi vocem, Epicure, sed quid tibi dicendum sit oblitus es. Ex ea difficultate illae fallaciloquae, ut ait Accius, malitiae natae sunt. At multis malis affectus. Nam quibus rebus efficiuntur voluptates, eae non sunt in potestate sapientis. Quis est tam dissimile homini. Ut proverbia non nulla veriora sint quam vestra dogmata. Si quicquam extra virtutem habeatur in bonis. Sed plane dicit quod intellegit. Paulum, cum regem Persem captum adduceret, eodem flumine invectio?

Qui ita affectus, beatum esse numquam probabis; Sed nimis multa. Nam prius a se poterit quisque discedere quam appetitum earum rerum, quae sibi conducant, amittere. Familiares nostros, credo, Sironem dicis et Philodemum, cum optimos viros, tum homines doctissimos. Quod iam a me expectare noli. Quid ergo?

Eademne, quae restincta siti? Ita relinquet duas, de quibus etiam atque etiam consideret. Illa videamus, quae a te de amicitia dicta sunt. Eaedem res maneant alio modo. Quid ergo attinet gloriose loqui, nisi constanter loquare? Prioris generis est docilitas, memoria; Portenta haec esse dicit, neque ea ratione ullo modo posse vivi; Beatum, inquit. Bestiarum vero nullum iudicium puto.

Quem Tiberina descensio festo illo die tanto gaudio affecit, quanto L. Quorum sine causa fieri nihil putandum est. Tria genera bonorum; Nunc dicam de voluptate, nihil scilicet novi, ea tamen, quae te ipsum probaturum esse confidam. Illud dico, ea, quae dicat, praeclare inter se cohaerere. Fortemne possumus dicere eundem illum Torquatum? Hoc tu nunc in illo probas. Cur post Tarentum ad Archytam?

Indicant pueri, in quibus ut in speculis natura cernitur.

Sed tamen est aliquid, quod nobis non liceat, liceat illis. Virtutis, magnitudinis animi, patientiae, fortitudinis fomentis dolor mitigari solet. Piso igitur hoc modo, vir optimus tuique, ut scis, amantissimus. Non prorsus, inquit, omnisque, qui sine dolore sint, in voluptate, et ea quidem summa, esse dico. Potius inflammat, ut coercendi magis quam dedocendi esse videantur. Virtutis, magnitudinis animi, patientiae, fortitudinis fomentis dolor mitigari solet. Quae fere omnia appellantur uno ingenii nomine, easque virtutes qui habent, ingeniosi vocantur. Nec enim, dum metuit, iustus est, et certe, si metuere destiterit, non erit;
2859
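
A slightly tidier variant of the same read (my own suggestion, not from the original run) uses a context manager, so the file is closed automatically even if something fails part-way.

# The with-block closes the file handle on exit
with open('sample2.txt', mode='r') as fh:
    text = fh.read()
print(len(text))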

import PyPDF2

# Open the PDF in binary mode and extract the text of the first page
pdf_file = open('file-sample_150kB.pdf', 'rb')
pdf = PyPDF2.PdfReader(pdf_file)
page = pdf.pages[0]
print(page.extract_text())

Lorem ipsum
Lorem ipsum dolor sit amet, consectetur adipiscing
elit. Nunc ac faucibus odio.
Vestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut
varius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum
condimentum. Vivamus dapibus sodales ex, vitae malesuada ipsum cursus
convallis. Maecenas sed egestas nulla, ac condimentum orci. Mauris diam felis,
vulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus
nisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum,
ac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet
tortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet
mauris tempus fringilla.
Maecenas mauris lectus, lobortis et purus mattis, blandit dictum tellus.
·Maecenas non lorem quis tellus placerat varius.
·Nulla facilisi.
·Aenean congue fringilla justo ut aliquam.
·Mauris id ex erat. Nunc vulputate neque vitae justo facilisis, non condimentum ante
sagittis.
·Morbi viverra semper lorem nec molestie.
·Maecenas tincidunt est efficitur ligula euismod, sit amet ornare est vulputate.
Row 1Row 2Row 3Row 4024681012
Column 1
Column 2
Column 3

url = "Financial Sample.xlsx"


xl = pd.ExcelFile(url)

# Get the sheet names


sheet_names = xl.sheet_names
for sheet_name in sheet_names:
df = pd.read_excel(url, sheet_name=sheet_name)
print(sheet_name)
print(df.head(2))
Sheet1
      Segment  Country    Product Discount Band  Units Sold  \
0  Government   Canada  Carretera           NaN      1618.5
1  Government  Germany  Carretera           NaN      1321.0

   Manufacturing Price  Sale Price  Gross Sales  Discounts    Sales     COGS  \
0                    3          20      32370.0        0.0  32370.0  16185.0
1                    3          20      26420.0        0.0  26420.0  13210.0

    Profit       Date  Month Number Month Name  Year
0  16185.0 2014-01-01             1    January  2014
1  13210.0 2014-01-01             1    January  2014
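
This workbook only has Sheet1, but the same idea scales: passing sheet_name=None makes read_excel return a dict with one DataFrame per sheet, which can then be concatenated. A sketch, assuming the sheets share the same columns:

# dict of {sheet name: DataFrame}; concat stacks them into a single frame
all_sheets = pd.read_excel(url, sheet_name=None)
combined = pd.concat(all_sheets.values(), ignore_index=True)
print(combined.shape)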

df = pd.read_excel('Financial Sample.xlsx')
af = df[['Segment', 'Product']]
print(af)

Segment Product
0 Government Carretera
1 Government Carretera
2 Midmarket Carretera
3 Midmarket Carretera
4 Midmarket Carretera
.. ... ...
695 Small Business Amarilla
696 Small Business Amarilla
697 Government Montana
698 Government Paseo
699 Channel Partners VTT

[700 rows x 2 columns]
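
Once the columns are selected, the usual DataFrame operations apply. As a small extension of my own (not part of the original notebook), the two columns can be grouped to count how many rows each segment contributes:

# Number of product rows per segment
counts = df.groupby('Segment')['Product'].count()
print(counts)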

from PyPDF2 import PdfReader

with open('file-sample_150kB.pdf', 'rb') as file:
    pdf = PdfReader(file)
    pages_text = []
    for page in range(len(pdf.pages)):
        # extract_text() returns the text of one page ('' if the page has no text layer)
        text = pdf.pages[page].extract_text()
        pages_text.append(text)
    df = pd.DataFrame(pages_text, columns=['Text'])
print(df)

Text
0 Lorem ipsum \nLorem ipsum dolor sit amet, cons...
1 In non mauris justo. Duis vehicula mi vel mi p...
2 Lorem ipsum dolor sit amet, consectetur adipis...
3
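
To keep the extracted page text around for later analysis, the DataFrame can be written back out; 'pdf_text.csv' is just a hypothetical output name I chose for this sketch.

# Persist the per-page text so it can be reloaded with read_csv later
df.to_csv('pdf_text.csv', index=False)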


I used pandas because it gives me the ability to read data of almost any type.
