/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283:
DeprecationWarning: `should_run_async` will not call `transform_cell`
automatically in the future. Please pass the result to `transformed_cell`
argument and any exception that happen during thetransform in
`preprocessing_exc_tuple` in IPython 7.17 and above.
and should_run_async(code)
Drive already mounted at /content/drive; to attempt to forcibly remount, call
drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Interview-AI Engineer-VNPay/dataset
[4]: !ls
import seaborn as sns
from scipy.stats import chi2_contingency
[8]: order_df
[8]: Row ID Order Priority Discount Unit Price Shipping Cost Customer ID \
0 20847 High 0.01 2.84 0.93 3
1 20228 Not Specified 0.02 500.98 26.00 5
2 21776 Critical 0.06 9.48 7.29 11
3 24844 Medium 0.09 78.69 19.99 14
4 24846 Medium 0.08 3.28 2.31 14
… … … … … … …
1947 19842 High 0.01 10.90 7.46 3397
1948 19843 High 0.10 7.99 5.03 3397
1949 26208 Not Specified 0.08 11.97 5.81 3399
1950 24911 Medium 0.10 9.38 4.93 3400
1951 25914 High 0.10 105.98 13.99 3403
3 United States Central Minnesota Prior Lake 55372
4 United States Central Minnesota Prior Lake 55372
… … … … … …
1947 United States Central Illinois Danville 61832
1948 United States Central Illinois Danville 61832
1949 United States Central Illinois Des Plaines 60016
1950 United States East West Virginia Fairmont 26554
1951 United States West Wyoming Cheyenne 82001
Order Date Ship Date Profit Quantity ordered new Sales Order ID
0 2015-01-07 2015-01-08 4.5600 4 13.01 88522
1 2015-06-13 2015-06-15 4390.3665 12 6362.85 90193
2 2015-02-15 2015-02-17 -53.8096 22 211.15 90192
3 2015-05-12 2015-05-14 803.4705 16 1164.45 86838
4 2015-05-12 2015-05-13 -24.0300 7 22.23 86838
… … … … … … …
1947 2015-03-11 2015-03-12 -116.7600 18 207.31 87536
1948 2015-03-11 2015-03-12 -160.9520 22 143.12 87536
1949 2015-03-29 2015-03-31 -41.8700 5 59.98 87534
1950 2015-04-04 2015-04-04 -24.7104 15 135.78 87537
1951 2015-02-08 2015-02-11 349.4850 5 506.50 87530
[9]: return_df
[10]: user_df
[11]: False
[12]: 4 86838
5 86838
6 86838
10 86836
15 42949
…
1935 88838
1936 88838
1940 88745
1942 88746
1948 87536
Name: Order ID, Length: 587, dtype: int64
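The output of cell [12] is consistent with listing the `Order ID` values that repeat an earlier row: index 4 is listed while index 3, the first row of order 86838, is not. The cell source is not shown in the export, so the following is only a minimal sketch of that kind of check on a small made-up frame. The column name comes from the output above; `toy_orders` and its values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for order_df; the values are illustrative only.
toy_orders = pd.DataFrame({
    "Row ID": [1, 2, 3, 4],
    "Order ID": [86838, 86838, 86838, 88522],
})

# True only when every Order ID appears exactly once.
ids_are_unique = toy_orders["Order ID"].is_unique

# duplicated() flags every repeat after the first occurrence,
# so the first row of each order is not part of the result.
repeated_ids = toy_orders.loc[toy_orders["Order ID"].duplicated(), "Order ID"]

print(ids_are_unique)
print(repeated_ids)
```

Passing `keep=False` to `duplicated()` would instead list every row of a multi-line order, including the first one.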
[13]: order_df[order_df['Order ID'] == 182683]
[14]: set(return_df['Status'].values)
[14]: {'Returned'}
[17]: order_df
[17]: Row ID Order Priority Discount Unit Price Shipping Cost Customer ID \
0 20847 High 0.01 2.84 0.93 3
1 20228 Not Specified 0.02 500.98 26.00 5
2 21776 Critical 0.06 9.48 7.29 11
3 24844 Medium 0.09 78.69 19.99 14
4 24846 Medium 0.08 3.28 2.31 14
… … … … … … …
1947 19842 High 0.01 10.90 7.46 3397
1948 19843 High 0.10 7.99 5.03 3397
1949 26208 Not Specified 0.08 11.97 5.81 3399
1950 24911 Medium 0.10 9.38 4.93 3400
1951 25914 High 0.10 105.98 13.99 3403
… … …
1947 Storage & Organization Small Box
1948 Telephones and Communication Medium Box
1949 Pens & Art Supplies Small Pack
1950 Office Furnishings Small Box
1951 Office Furnishings Medium Box
Order ID Manager
0 88522 William
1 90193 William
2 90192 Erin
3 86838 Chris
4 86838 Chris
… … …
1947 87536 Chris
1948 87536 Chris
1949 87534 Chris
1950 87537 Erin
1951 87530 William
order_df.drop(columns=['Status'], inplace=True)
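The cells that create the `Returned` column are not visible in this export. One plausible route from `return_df['Status']` to a 0/1 `Returned` flag, followed by dropping `Status` as above, is a left merge on `Order ID`. The sketch below uses toy data; that `return_df` exposes `Order ID` and `Status` columns is an assumption, and `toy_orders`, `toy_returns` and the `Sales` values are hypothetical.

```python
import pandas as pd

# Toy stand-ins for order_df and return_df; values are illustrative only.
toy_orders = pd.DataFrame({
    "Order ID": [37760, 86838, 86838, 88522],
    "Sales": [10.0, 1164.45, 22.23, 13.01],
})
toy_returns = pd.DataFrame({
    "Order ID": [37760, 182683],
    "Status": ["Returned", "Returned"],
})

# Keep one Status per order so the merge cannot multiply order rows.
status_per_order = toy_returns[["Order ID", "Status"]].drop_duplicates("Order ID")

# A left merge keeps every order row; orders without a return get NaN in Status.
merged = toy_orders.merge(status_per_order, on="Order ID", how="left")

# Encode the presence of a Status as a 0/1 flag, then drop the text column.
merged["Returned"] = merged["Status"].notna().astype(int)
merged = merged.drop(columns=["Status"])

print(merged)
```

Assigning the result of `drop` back to the variable avoids `inplace=True`, which gives the same result here but makes chained operations harder to follow.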
[19]: order_df
[19]: Row ID Order Priority Discount Unit Price Shipping Cost Customer ID \
0 20847 High 0.01 2.84 0.93 3
1 20228 Not Specified 0.02 500.98 26.00 5
2 21776 Critical 0.06 9.48 7.29 11
3 24844 Medium 0.09 78.69 19.99 14
4 24846 Medium 0.08 3.28 2.31 14
… … … … … … …
1947 19842 High 0.01 10.90 7.46 3397
1948 19843 High 0.10 7.99 5.03 3397
1949 26208 Not Specified 0.08 11.97 5.81 3399
1950 24911 Medium 0.10 9.38 4.93 3400
1951 25914 High 0.10 105.98 13.99 3403
… … … … … …
1947 United States Central Illinois Danville 61832
1948 United States Central Illinois Danville 61832
1949 United States Central Illinois Des Plaines 60016
1950 United States East West Virginia Fairmont 26554
1951 United States West Wyoming Cheyenne 82001
[20]: order_df[order_df['Returned'] == 1]
[20]: Row ID Order Priority Discount Unit Price Shipping Cost Customer ID \
68 1950 Medium 0.01 4.91 0.50 117
69 1951 Medium 0.09 4.00 1.30 117
171 5302 High 0.01 8.33 1.99 308
256 1147 Medium 0.08 2.94 0.96 491
294 2368 Medium 0.00 6.88 2.00 553
346 7893 Not Specified 0.00 236.97 59.24 640
588 6711 High 0.00 6.68 5.66 1044
689 7632 Medium 0.09 130.98 30.00 1217
692 7810 Medium 0.00 7.10 6.05 1228
693 7811 Medium 0.01 4.98 4.62 1228
694 7812 Medium 0.06 5.68 1.39 1228
968 8389 High 0.02 30.98 17.08 1733
1205 1008 High 0.09 16.98 12.39 2189
1513 5338 High 0.05 165.20 19.99 2670
1514 5339 High 0.09 17.99 8.65 2670
Product Name Product Base Margin \
68 Avery 493 0.36
69 EcoTones® Memo Sheets 0.37
171 80 Minute Slim Jewel Case CD-R , 10/Pack - Sta… 0.52
256 Newell 343 0.58
294 Adams Phone Message Book, 200 Message Capacity… 0.39
346 Chromcraft Rectangular Conference Tables 0.61
588 Xerox 1923 0.37
689 Office Star - Contemporary Task Swivel chair w… 0.78
692 Wilson Jones Hanging View Binder, White, 1" 0.39
693 Imation 3.5", DISKETTE 44766 HGHLD3.52HD/FM, 1… 0.64
694 Staples Standard Envelopes 0.38
968 Xerox 197 0.40
1205 Brown Kraft Recycled Envelopes 0.35
1513 Economy Rollaway Files 0.59
1514 Model L Table or Wall-Mount Pencil Sharpener 0.57
968 2015-06-28 2015-06-29 -32.280 13 438.25
1205 2015-05-08 2015-05-10 -48.570 22 381.91
1513 2015-05-29 2015-05-29 2008.710 167 27587.55
1514 2015-05-29 2015-05-29 -80.530 71 1191.58
common_order_ids = order_ids_in_order_df.intersection(order_ids_in_return_df)
unique_order_ids_in_order_df = order_ids_in_order_df - common_order_ids
unique_order_ids_in_return_df = order_ids_in_return_df - common_order_ids
1354 unique order IDs in order_df but not in return_df: {90114, 90115, 90120,
90121, 86041, 90145, 90146, 90147, 90148, 40997, 86050, 86051, 86052, 86053,
90154, 86054, 86063, 90160, 86064, 90166, 90167, 86075, 86076, 86077, 8257,
90178, 86085, 86086, 90185, 90186, 90187, 86092, 90189, 90190, 90192, 90193,
86101, 86102, 86103, 86104, 90201, 32869, 86118, 86119, 90218, 86122, 86123,
86124, 90236, 90237, 90238, 90239, 86144, 86145, 90244, 90248, 86153, 90258,
86163, 86164, 86165, 86166, 90264, 90265, 86173, 90270, 90271, 53410, 16547,
86181, 86184, 86189, 86190, 86191, 86192, 90291, 90292, 90296, 90301, 90303,
12480, 90309, 90314, 86220, 86221, 86222, 90322, 86227, 90327, 86233, 86234,
90333, 90334, 90335, 90337, 90338, 90339, 53476, 86250, 90353, 90354, 86258,
90359, 90360, 90361, 90362, 86263, 86264, 86267, 86268, 86279, 90378, 86283,
86284, 90385, 90386, 90387, 86297, 86307, 90404, 90405, 24869, 41253, 90408,
16676, 86308, 86309, 86310, 86311, 90414, 90415, 86327, 86331, 90430, 90431,
90432, 86338, 45380, 90437, 90438, 90439, 86346, 90449, 86356, 86357, 90460,
90461, 90462, 86368, 86369, 90469, 86373, 359, 90473, 86382, 90479, 90480,
86383, 86384, 90488, 90491, 90492, 90493, 86397, 90500, 90501, 90502, 86409,
86410, 86411, 90513, 90514, 86422, 86427, 90524, 90525, 86432, 90530, 90531,
90532, 90533, 90538, 90539, 90540, 86447, 86448, 86454, 90551, 86459, 86460,
90557, 86465, 57794, 86466, 90568, 90577, 90578, 86486, 90583, 86489, 86490,
86491, 90588, 90589, 90593, 90594, 45539, 90596, 90597, 86500, 90600, 90601,
90602, 86507, 86508, 86509, 86514, 90612, 90613, 86520, 90621, 86527, 90624,
86528, 86529, 90630, 90631, 86534, 86535, 86536, 86544, 90641, 86545, 86546,
86547, 86548, 90646, 86555, 86556, 90653, 548, 86565, 90662, 86566, 86567,
90669, 86573, 86574, 86575, 90674, 90675, 90678, 90685, 86591, 86592, 90695,
86599, 86600, 90706, 86610, 86611, 86612, 90710, 90714, 86621, 90724, 90725,
86629, 86633, 90731, 90735, 86639, 90739, 86645, 86646, 90750, 90751, 90752,
90753, 86654, 86655, 646, 86662, 29319, 86668, 90766, 90767, 90771, 86686,
86687, 86688, 37537, 90786, 90787, 41636, 86693, 86694, 29350, 86699, 90796,
90800, 90806, 90814, 90815, 53953, 90818, 90819, 90820, 90821, 86722, 86723,
86724, 86725, 86734, 86735, 90832, 90833, 90837, 90844, 86750, 86751, 86752,
86753, 90850, 86754, 90853, 90854, 90855, 4839, 90859, 90860, 90861, 86767,
86768, 90867, 90871, 90880, 90881, 45824, 86789, 86790, 86791, 90888, 86792,
86793, 90891, 86794, 86795, 86796, 90899, 90905, 90908, 90909, 90910, 86812,
86813, 86814, 8994, 86815, 90917, 90922, 86826, 86827, 86828, 90927, 90932,
86836, 90934, 86837, 86838, 86839, 86846, 86847, 41793, 90951, 90952, 86860,
90961, 90962, 86867, 90964, 86868, 86869, 86870, 86874, 90973, 90977, 37729,
33635, 86883, 86884, 86885, 86886, 86887, 90985, 90986, 90987, 86898, 86899,
86900, 86901, 86902, 91000, 86913, 86914, 91017, 86925, 86926, 86927, 91025,
86933, 91030, 91036, 91041, 91042, 91043, 86949, 86950, 86951, 86952, 91049,
86956, 91053, 91054, 86957, 86958, 91057, 86959, 91059, 91060, 86960, 91062,
91063, 86966, 86973, 962, 91076, 91077, 91078, 86989, 91086, 91087, 91088,
91089, 91090, 87002, 87003, 87004, 87005, 91108, 91109, 91110, 87015, 87016,
91115, 91116, 87020, 91122, 91123, 87029, 87030, 91127, 87031, 87032, 91130,
91131, 87033, 87041, 87042, 87043, 91144, 87057, 91166, 91167, 87071, 87072,
87076, 87077, 17446, 91174, 91175, 87078, 87079, 91180, 87086, 87087, 91194,
91195, 91200, 91201, 21572, 9285, 87109, 87110, 91209, 91212, 91213, 87117,
91219, 91228, 91229, 87134, 87135, 13408, 54369, 91235, 91236, 37987, 87146,
87147, 91244, 91245, 87148, 87160, 87161, 91258, 87162, 91261, 91262, 91263,
21636, 87175, 87176, 87177, 87178, 91277, 87186, 87187, 91285, 91286, 87193,
87194, 87195, 91296, 91297, 91298, 91304, 91305, 91306, 87208, 91310, 87214,
91316, 87221, 87222, 91321, 91328, 38080, 29889, 87234, 38087, 87240, 87243,
87244, 87245, 91344, 91354, 91355, 87258, 87259, 87260, 34017, 91362, 91363,
17636, 91365, 91366, 87272, 91371, 87277, 91376, 87285, 87286, 87287, 91386,
91388, 91389, 87296, 87297, 87298, 87299, 58628, 91397, 91398, 87306, 91407,
91408, 87316, 87317, 91414, 91415, 91416, 91417, 91424, 13606, 54567, 91432,
91433, 91435, 91436, 91437, 91438, 87342, 87345, 87347, 91447, 91451, 87356,
87357, 91454, 87364, 87365, 87366, 91466, 87374, 87378, 87382, 87383, 91480,
91481, 91482, 91488, 91492, 46436, 87396, 91495, 91496, 91502, 87406, 87407,
87408, 91513, 87424, 87425, 91522, 87426, 5509, 9606, 87435, 87436, 91543,
87451, 87452, 91550, 91555, 87463, 87464, 13735, 87473, 87474, 91571, 91575,
91576, 87484, 91581, 87485, 91583, 91584, 87486, 91586, 87487, 87488, 21958,
87511, 50656, 87520, 87525, 87530, 87534, 87535, 87536, 87537, 87552, 87553,
87554, 87555, 87556, 87569, 87570, 87579, 87583, 87584, 87585, 58914, 87586,
87587, 87602, 87603, 87611, 87617, 87618, 87619, 87620, 87630, 87631, 87632,
87633, 87634, 87651, 87652, 42599, 87671, 87672, 87676, 87677, 87678, 87679,
38529, 34435, 22147, 87695, 87696, 87700, 54949, 87720, 87721, 87725, 87726,
87727, 87747, 87748, 87749, 87757, 87765, 87772, 87773, 50917, 26342, 87790,
87795, 87804, 87811, 87812, 87813, 46853, 87823, 87824, 87830, 87831, 87832,
5920, 14115, 46884, 87846, 87847, 87853, 87862, 87877, 87884, 87885, 87888,
87889, 87899, 87900, 5984, 87905, 42852, 87908, 87909, 87915, 87916, 87917,
87933, 87934, 87935, 51072, 87940, 87946, 87947, 87952, 87953, 87954, 87962,
87963, 87964, 87965, 87977, 87978, 87979, 87980, 87993, 87994, 87995, 38852,
42949, 88004, 88014, 88015, 88016, 88017, 88023, 88028, 88029, 88030, 59365,
88039, 88040, 88041, 88048, 88060, 88061, 47108, 55300, 88075, 88083, 88084,
88085, 88093, 88094, 10277, 88101, 88102, 88103, 88104, 88105, 88114, 30785,
34882, 43079, 88135, 88136, 88137, 88151, 88152, 88156, 88157, 55392, 88163,
88164, 88165, 39015, 88173, 88174, 88184, 88185, 88191, 88192, 18561, 88196,
88197, 88198, 88204, 88205, 88212, 88213, 88219, 88220, 39076, 88232, 88233,
88234, 88239, 88240, 88241, 88256, 88265, 88266, 88267, 88268, 88278, 88279,
88280, 88281, 88282, 10464, 22755, 88296, 88297, 88298, 88319, 88320, 14596,
88329, 88330, 88348, 88360, 88361, 88367, 88368, 88371, 88372, 88380, 88387,
88388, 88389, 88390, 88391, 88403, 88404, 88405, 88406, 88410, 88411, 88418,
88425, 88426, 88443, 88444, 88447, 35200, 2433, 88448, 88449, 27013, 47493,
88460, 88461, 88474, 88475, 88479, 88480, 55713, 6562, 14756, 88487, 88502,
88503, 88504, 88511, 14785, 88522, 88527, 88534, 88543, 88544, 88545, 88546,
88547, 88548, 88554, 88555, 88556, 88557, 88558, 88568, 88569, 88570, 88571,
23042, 88579, 88580, 39430, 88587, 88588, 88589, 88590, 88598, 88599, 88600,
88610, 88611, 88612, 88626, 88627, 88632, 88633, 88634, 88644, 88645, 88646,
88656, 88657, 88658, 88666, 88667, 88668, 19042, 88677, 88678, 88679, 88685,
88686, 88692, 88701, 88702, 88713, 88714, 88721, 88722, 88726, 88727, 88728,
88729, 88730, 88731, 88745, 88746, 88753, 88758, 88766, 88781, 88782, 88783,
88784, 88794, 88798, 88814, 88815, 90109, 88819, 90110, 88824, 88825, 88826,
88836, 11013, 88837, 88838, 88839, 88840, 88852, 88857, 88870, 88871, 88879,
88880, 88881, 88882, 88889, 88890, 27456, 88899, 11077, 88905, 88906, 88907,
88908, 88921, 88928, 88929, 88940, 88941, 88942, 88958, 88959, 88971, 88972,
88974, 88975, 88998, 89004, 89005, 89006, 89007, 89008, 89017, 89018, 89019,
89025, 11206, 89039, 89040, 89041, 89047, 89053, 89054, 89055, 3042, 44002,
89059, 89071, 89076, 89077, 89083, 89084, 89092, 89093, 89095, 89096, 89097,
89102, 89106, 89112, 89128, 89129, 89130, 89139, 89140, 89146, 89147, 89148,
3138, 89166, 89174, 89175, 89176, 89184, 89193, 89194, 89199, 89200, 89201,
89202, 89203, 89209, 89211, 48257, 89218, 89219, 89240, 89251, 40101, 56486,
89257, 89258, 89259, 89278, 89279, 89284, 44231, 23751, 89291, 89292, 89293,
89299, 89300, 89301, 89314, 89315, 89316, 36069, 89319, 89320, 89327, 89333,
89334, 89344, 3332, 11527, 89355, 89356, 89360, 89361, 89375, 89376, 40224,
32037, 89389, 89394, 89401, 89402, 89406, 89407, 89408, 3397, 23877, 89414,
89415, 89426, 89431, 89432, 89433, 89434, 89440, 28001, 48483, 89448, 89449,
89450, 89456, 89465, 89481, 89497, 89503, 89504, 89505, 89514, 89515, 89520,
89521, 89522, 89523, 89524, 89525, 11712, 89536, 89537, 7623, 89564, 89571,
89572, 44517, 89579, 89583, 89584, 89585, 89595, 89596, 3585, 89601, 89602,
89608, 89609, 89610, 89611, 89631, 20007, 89639, 89647, 89657, 89658, 89664,
28225, 89665, 89666, 89679, 89680, 89686, 89697, 40547, 36452, 89704, 89705,
89706, 89716, 89726, 24193, 89730, 89729, 89743, 89761, 89762, 32420, 89770,
89775, 89776, 89777, 89787, 89789, 48836, 89801, 89805, 89810, 89818, 89819,
89820, 7909, 57061, 24294, 89835, 89836, 89847, 89848, 89849, 89856, 3841,
89857, 89858, 89869, 89872, 89873, 89874, 89879, 89880, 89885, 20261, 36647,
89897, 89909, 89910, 89915, 85826, 85827, 85828, 89928, 85833, 85834, 85835,
89939, 89940, 89941, 89942, 89943, 89944, 85850, 85857, 85858, 89957, 85865,
85866, 85867, 85868, 89961, 89970, 85880, 89981, 89982, 89983, 89984, 89988,
85893, 85894, 85895, 85896, 85897, 85898, 24455, 89993, 89994, 89999, 90000,
90001, 90002, 90003, 85914, 85915, 85916, 90011, 53153, 85928, 85929, 90026,
90027, 90031, 90032, 85938, 85939, 85940, 90040, 85947, 85948, 85949, 85950,
90043, 12224, 90044, 90048, 32710, 90058, 90059, 85964, 85965, 85966, 90069,
85979, 85980, 85981, 90078, 90079, 8165, 85990, 28647, 85991, 49125, 86002,
86003, 90099, 90103, 90104, 86010, 86011, 86012, 86013, 86014}
1623 unique order IDs in return_df but not in order_df: {20480, 20486, 143384,
135194, 139291, 155675, 131101, 147486, 167965, 53285, 135224, 131130, 151611,
131133, 139326, 163902, 180287, 65, 176190, 36932, 36934, 45127, 151641, 147546,
143450, 155741, 163933, 159838, 57440, 41059, 8292, 8293, 12389, 49255, 176248,
159866, 36992, 36994, 24707, 32901, 4230, 36999, 36998, 155800, 135320, 155804,
180381, 143519, 41120, 32931, 12451, 4261, 57510, 176698, 155834, 139451,
147643, 180412, 147646, 151738, 135356, 12483, 49349, 16582, 32966, 131288,
147672, 143577, 155868, 135389, 131294, 20704, 41186, 32996, 32998, 159992,
151802, 172283, 123132, 176378, 135420, 159997, 41216, 16641, 57600, 176382,
49412, 28928, 20743, 155929, 160025, 180507, 176410, 123166, 147743, 53536,
177534, 12580, 57638, 16679, 4391, 164152, 172344, 123194, 135480, 176443,
172349, 147775, 143679, 12613, 24902, 164184, 143706, 164187, 147804, 127324,
151901, 155999, 160095, 53600, 49510, 178362, 164216, 123258, 131450, 139643,
151933, 156031, 160127, 20864, 176511, 37250, 176536, 180633, 172442, 147869,
164254, 147871, 180638, 180639, 143773, 12704, 12706, 20899, 12710, 29095,
156090, 123323, 164282, 135612, 135613, 160188, 176572, 20934, 172504, 172505,
180698, 160218, 180700, 135643, 156126, 123359, 16864, 152028, 168413, 143838,
164345, 139775, 156159, 4610, 33283, 49668, 37380, 12806, 25095, 53767, 156184,
147993, 139802, 156186, 147996, 156189, 131614, 139807, 180760, 180767, 143898,
135707, 41508, 33317, 45605, 37414, 127545, 131642, 156219, 156220, 164412,
172602, 148031, 25152, 16961, 180798, 160315, 45632, 25157, 139864, 123481,
148056, 123483, 172634, 168539, 160348, 123487, 49762, 612, 12900, 614, 12903,
123512, 143995, 148092, 180861, 127612, 148095, 156287, 160380, 57986, 135806,
168571, 49797, 4738, 45698, 29318, 164504, 152219, 152221, 180894, 160414,
17058, 678, 49830, 152249, 180922, 160441, 135867, 131773, 168635, 176825,
135871, 29376, 160447, 29380, 33477, 710, 37572, 45767, 172761, 131802, 172762,
172764, 135897, 123614, 152282, 168668, 152286, 152287, 29410, 740, 45794,
33510, 21222, 140024, 127737, 144121, 131835, 155647, 172797, 164606, 172799,
144125, 8961, 152317, 49924, 33541, 4864, 775, 127769, 152346, 127773, 148254,
127774, 41760, 144159, 160543, 168733, 168734, 13091, 21286, 45863, 148280,
176955, 131901, 156477, 156478, 181054, 833, 144190, 9027, 49988, 152382,
168766, 29505, 29506, 54086, 181084, 4960, 21346, 33637, 13158, 17255, 54119,
131960, 123769, 164728, 148347, 172921, 156541, 127870, 50048, 17282, 9093, 902,
25478, 25479, 41861, 21383, 54151, 177656, 156568, 123801, 123802, 140187,
172952, 181147, 131998, 123807, 160667, 17313, 50081, 50083, 160668, 144286,
177053, 50087, 13218, 5028, 132024, 152504, 136121, 177081, 156604, 123837,
156605, 156606, 9152, 181181, 144318, 152510, 5059, 5061, 54215, 173016, 123870,
132062, 140255, 136158, 160734, 50147, 54243, 13284, 37860, 37862, 46052, 54245,
169533, 169534, 173048, 160762, 132091, 163839, 132093, 123902, 132095, 58368,
140286, 144382, 9219, 58372, 168958, 177151, 123928, 140312, 164888, 168984,
128028, 168988, 181278, 132127, 164895, 177180, 132152, 156729, 148536, 160825,
144442, 136252, 164926, 173118, 128061, 144444, 144445, 169023, 177214, 54339,
50246, 5189, 138619, 140376, 123994, 173147, 152667, 140381, 181342, 160860,
160861, 169052, 144479, 54368, 17508, 169053, 58470, 13410, 54371, 140409,
177274, 160891, 181372, 132221, 132222, 124031, 177279, 33921, 50307, 58500,
13444, 25735, 132248, 173208, 148634, 124059, 124060, 160924, 38050, 29861,
177336, 132281, 181434, 152761, 136378, 148669, 156862, 144573, 144574, 46276,
50374, 25799, 58566, 148697, 140506, 140507, 169177, 177371, 140510, 144605,
144606, 21729, 25828, 46311, 172029, 136440, 177401, 124154, 132347, 181500,
152826, 132350, 173310, 9472, 50432, 144634, 136444, 17668, 144636, 177407,
5381, 46341, 140568, 181528, 173338, 140571, 128284, 136476, 144670, 148767,
165151, 169244, 54563, 5414, 29991, 46375, 161080, 177464, 156987, 156988,
173373, 181566, 136507, 58688, 128316, 144702, 152895, 21824, 34117, 50501,
161087, 38210, 13638, 140632, 165209, 176120, 136537, 140636, 177501, 146812,
25952, 58720, 38240, 58725, 9574, 42342, 132472, 124282, 165243, 136572, 128381,
140670, 124286, 124287, 144766, 152957, 38272, 50564, 152958, 50566, 42375,
21890, 5511, 140698, 173466, 136606, 157087, 136607, 34209, 169374, 13729,
46497, 181688, 124345, 144824, 153016, 161210, 144827, 161212, 54721, 17858,
58818, 42436, 13765, 148952, 180216, 169434, 144859, 132573, 144862, 9696,
30176, 54755, 9701, 50663, 128504, 153081, 144890, 173563, 157180, 173565,
181755, 140799, 173567, 181759, 128509, 128510, 144894, 38400, 169469, 54787,
181784, 161304, 173594, 165403, 124444, 161310, 157215, 169503, 50721, 9762,
34338, 153144, 149050, 169531, 165436, 149053, 149054, 132669, 140862, 17985,
165437, 42563, 17988, 58949, 153149, 153151, 5699, 46662, 162331, 165465,
161369, 124507, 149084, 181851, 136795, 169565, 9829, 50789, 136825, 124538,
177786, 132733, 26240, 59009, 50818, 38530, 42628, 54914, 50823, 149144, 157337,
157338, 173721, 181914, 136861, 173726, 161437, 169631, 13984, 50850, 13986,
22181, 9895, 59047, 149176, 140985, 132794, 136889, 161466, 177850, 136894,
169662, 59072, 9923, 30403, 38596, 9927, 18119, 157400, 157402, 173786, 145117,
157406, 153310, 128735, 177886, 50914, 34532, 132857, 141049, 157435, 161531,
128765, 157438, 132863, 173822, 173823, 182015, 59139, 26372, 169726, 128767,
46852, 30469, 38661, 178648, 178650, 178651, 149272, 173849, 136988, 124701,
132894, 145181, 169756, 145183, 59171, 42788, 18215, 182072, 157497, 145208,
157499, 165691, 165693, 182075, 153403, 145212, 137021, 128830, 161599, 10054,
42823, 141144, 124761, 153433, 141147, 178011, 173917, 14176, 34658, 42850,
34661, 149368, 178040, 157562, 141178, 141179, 157565, 161657, 124799, 161661,
34689, 145279, 51075, 169855, 22402, 38787, 55172, 173977, 182170, 137113,
169881, 157597, 169884, 149407, 18336, 42912, 173983, 128925, 177051, 161693,
14242, 55203, 6054, 145338, 133052, 141244, 165820, 165823, 145342, 42945,
55235, 10183, 165848, 133081, 182233, 157659, 124892, 149469, 165852, 174047,
128984, 145371, 153564, 153567, 145375, 47078, 47079, 157689, 124921, 169977,
129018, 161786, 178170, 169981, 161791, 47109, 133144, 170011, 170012, 129053,
170015, 47138, 55330, 51239, 165944, 124985, 157753, 137272, 124988, 161848,
178234, 124991, 18496, 141375, 165951, 137275, 178235, 14406, 51271, 47174,
174169, 174171, 182365, 157791, 170079, 6241, 22627, 34916, 18533, 51302,
141432, 174201, 182392, 149627, 170106, 161917, 157822, 133247, 153726, 6272,
43138, 22656, 43140, 39043, 22661, 161944, 141465, 149657, 149658, 149660,
133277, 149661, 141471, 26784, 18593, 157853, 174239, 145563, 161948, 153757,
129182, 153759, 14497, 47265, 39075, 47271, 153784, 133305, 133306, 149691,
141500, 157882, 133310, 174269, 170169, 59585, 153786, 43203, 137406, 170174,
14528, 14534, 178392, 174297, 129241, 141531, 174300, 125149, 125150, 141533,
182492, 145626, 170201, 129246, 26852, 18661, 55526, 35047, 129272, 133369,
166137, 166138, 178425, 153851, 162043, 170236, 129279, 26881, 10498, 18689,
59652, 43269, 137471, 39169, 178431, 22787, 137497, 166170, 157979, 141595,
170265, 141598, 182559, 59680, 137500, 129310, 59683, 162078, 22820, 35110,
35111, 125240, 166200, 182586, 166203, 174395, 129336, 137530, 153915, 129339,
35137, 18753, 170301, 137534, 137535, 55616, 55618, 55623, 129368, 129369,
178520, 149852, 141661, 158047, 51553, 51554, 31073, 6498, 47457, 26982, 51559,
6500, 6502, 170360, 137596, 59776, 18822, 47494, 166296, 182680, 182681, 125339,
182683, 129433, 158110, 166302, 145818, 137630, 170398, 22947, 39333, 10662,
22950, 158136, 166329, 166330, 149947, 158139, 125373, 125374, 141758, 166332,
154040, 162235, 178621, 162238, 55747, 149976, 166360, 174553, 166363, 154072,
149981, 158173, 174559, 43488, 182750, 162265, 162266, 129501, 55776, 43494,
18919, 59879, 14820, 158200, 125433, 166392, 133627, 174584, 182781, 141822,
166398, 145916, 27137, 170494, 55808, 31232, 47620, 6661, 47621, 162330, 125467,
141852, 127516, 133662, 141855, 137755, 137756, 145947, 145950, 162335, 35366,
23076, 6695, 174648, 129592, 137785, 154171, 141884, 178749, 125503, 43585,
19010, 39490, 55877, 31303, 178776, 150105, 133722, 178777, 158300, 141917,
150109, 158302, 166493, 137819, 6757, 14951, 125560, 174713, 125562, 141946,
150138, 150141, 182906, 154233, 154237, 170621, 23168, 39555, 19078, 178840,
150169, 146076, 178846, 55968, 15009, 35492, 10917, 51876, 51879, 178874,
133819, 178875, 150205, 125631, 129727, 43713, 19138, 39619, 150232, 129753,
125659, 133851, 174813, 158430, 142047, 35554, 51940, 150264, 150265, 142073,
146170, 133884, 170747, 137981, 129791, 15106, 35588, 47876, 174872, 138009,
162586, 150299, 158492, 133917, 158494, 133919, 142111, 166686, 174876, 174879,
162590, 56101, 47910, 138040, 146232, 125754, 142139, 129850, 129854, 133951,
170815, 56128, 179006, 52035, 6978, 6979, 174937, 174938, 133979, 150364,
174940, 162650, 142175, 138075, 138078, 27490, 146271, 52068, 162654, 162655,
35687, 15202, 15206, 177243, 158584, 146297, 125818, 134011, 166779, 174971,
158590, 174975, 162683, 129918, 146328, 129947, 179101, 138142, 35744, 7079,
154552, 162744, 125882, 125883, 158652, 125885, 175035, 162745, 129979, 23488,
39872, 31682, 7107, 56257, 15303, 158684, 134111, 130015, 39904, 158713, 166906,
125947, 130042, 138234, 146426, 150527, 154619, 154620, 162812, 23557, 11271,
23559, 39943, 125976, 175128, 175130, 154650, 166940, 142365, 150557, 130077,
130079, 52258, 7203, 35877, 146489, 134202, 166972, 154684, 130110, 154687,
27712, 52288, 44098, 19523, 138303, 23616, 35910, 23619, 56387, 167000, 175195,
179291, 150622, 126046, 27744, 35936, 31844, 7269, 27750, 52327, 162937, 167036,
130174, 134271, 154751, 162943, 11396, 150680, 171161, 158875, 126108, 158878,
19616, 11425, 11426, 40097, 31907, 48293, 48295, 142521, 142523, 158908, 175292,
138428, 163005, 163006, 48321, 56514, 23748, 36038, 40132, 40134, 126169,
175321, 179418, 175324, 142557, 175326, 130267, 179420, 154845, 40160, 36067,
3300, 48353, 142584, 158968, 150780, 154876, 134398, 146685, 154878, 179452,
44292, 19718, 56582, 48391, 171288, 134425, 142617, 126235, 146713, 167197,
163101, 134431, 163102, 32036, 56612, 52518, 175416, 146745, 159034, 150843,
150844, 171322, 163131, 179514, 36160, 48448, 167256, 126297, 175448, 171353,
126300, 159068, 146778, 171357, 146782, 15712, 7521, 28003, 15718, 48486, 48487,
150904, 138616, 163193, 159099, 142716, 150909, 142718, 159103, 52608, 3456,
11648, 52611, 11652, 28037, 130429, 130430, 142744, 134553, 150936, 150938,
167322, 167325, 175512, 171420, 11682, 15778, 40354, 36262, 152121, 167352,
126393, 130489, 163259, 155069, 163261, 138687, 56768, 56769, 3525, 44486,
52678, 175576, 138712, 175578, 138714, 159196, 175580, 146906, 167391, 171483,
138719, 11748, 48615, 126456, 134648, 142840, 126459, 134651, 138744, 171512,
155131, 171515, 138751, 15872, 24066, 3589, 126488, 146971, 167452, 163357,
15904, 44579, 56868, 44583, 175672, 126521, 142905, 175673, 159292, 175676,
167486, 171576, 179768, 130623, 7744, 20036, 52805, 56901, 48710, 134745,
147033, 167516, 151135, 36449, 56930, 56931, 3687, 138872, 126585, 147066,
134779, 179837, 134782, 28291, 7812, 11909, 48773, 11911, 7815, 48775, 134808,
143003, 159387, 147099, 147102, 155295, 7841, 7845, 20134, 143032, 130746,
155323, 151228, 143036, 143038, 126654, 134846, 3777, 179900, 163518, 163519,
171710, 3783, 126680, 130776, 163544, 161950, 179931, 167646, 28387, 12005,
155641, 161340, 151290, 159484, 134908, 171772, 147197, 147199, 36609, 28419,
16134, 139165, 130840, 163608, 155418, 155420, 130845, 134943, 147231, 12067,
48931, 28455, 126777, 151354, 139065, 180025, 155451, 147260, 126783, 12096,
155455, 163647, 171839, 36676, 44869, 57157, 36679, 32582, 143193, 130905,
151387, 163673, 135005, 126814, 180058, 139100, 36705, 180059, 36707, 180061,
8034, 40802, 40806, 57190, 130936, 139128, 151420, 135037, 139132, 175999,
28544, 155519, 49026, 49027, 36743, 159640, 176024, 176026, 143259, 139160,
143261, 159646, 151455, 143263, 126877, 44962, 176029, 36772, 36773, 4006,
20389, 171935, 57248, 57253, 147384, 126905, 126907, 4037, 8133, 24519, 171994,
143325, 171998, 151519, 131039, 139231, 136765, 49123, 20453, 12262, 12263,
135160, 163833, 172026, 147452, 126973, 167935}
11 common order IDs: {37760, 8353, 59937, 17155, 37924, 54595, 55874, 13959,
47813, 56452, 7364}
[22]: order_df[order_df["Returned"] == 1]
[22]: Row ID Order Priority Discount Unit Price Shipping Cost Customer ID \
68 1950 Medium 0.01 4.91 0.50 117
69 1951 Medium 0.09 4.00 1.30 117
171 5302 High 0.01 8.33 1.99 308
256 1147 Medium 0.08 2.94 0.96 491
294 2368 Medium 0.00 6.88 2.00 553
346 7893 Not Specified 0.00 236.97 59.24 640
588 6711 High 0.00 6.68 5.66 1044
689 7632 Medium 0.09 130.98 30.00 1217
692 7810 Medium 0.00 7.10 6.05 1228
693 7811 Medium 0.01 4.98 4.62 1228
694 7812 Medium 0.06 5.68 1.39 1228
968 8389 High 0.02 30.98 17.08 1733
1205 1008 High 0.09 16.98 12.39 2189
1513 5338 High 0.05 165.20 19.99 2670
1514 5339 High 0.09 17.99 8.65 2670
692 Binders and Binder Accessories Small Box
693 Computer Peripherals Small Pack
694 Envelopes Small Box
968 Paper Small Box
1205 Envelopes Small Box
1513 Storage & Organization Small Box
1514 Pens & Art Supplies Small Box
294 2015-01-28 2015-01-29 34.068 36 267.53
346 2015-02-14 2015-02-15 1192.040 34 6686.34
588 2015-02-27 2015-02-28 -76.940 90 617.40
689 2015-04-28 2015-05-01 -421.760 41 5258.94
692 2015-02-16 2015-02-17 -60.145 28 208.83
693 2015-02-16 2015-02-18 -111.720 41 228.30
694 2015-02-16 2015-02-16 33.010 24 129.53
968 2015-06-28 2015-06-29 -32.280 13 438.25
1205 2015-05-08 2015-05-10 -48.570 22 381.91
1513 2015-05-29 2015-05-29 2008.710 167 27587.55
1514 2015-05-29 2015-05-29 -80.530 71 1191.58
[24]: order_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1952 entries, 0 to 1951
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Row ID 1952 non-null int64
1 Order Priority 1952 non-null object
2 Discount 1952 non-null float64
3 Unit Price 1952 non-null float64
4 Shipping Cost 1952 non-null float64
5 Customer ID 1952 non-null int64
6 Customer Name 1952 non-null object
7 Ship Mode 1952 non-null object
8 Customer Segment 1952 non-null object
9 Product Category 1952 non-null object
10 Product Sub-Category 1952 non-null object
11 Product Container 1952 non-null object
12 Product Name 1952 non-null object
13 Product Base Margin 1936 non-null float64
14 Country 1952 non-null object
15 Region 1952 non-null object
16 State or Province 1952 non-null object
17 City 1952 non-null object
18 Postal Code 1952 non-null int64
19 Order Date 1952 non-null datetime64[ns]
20 Ship Date 1952 non-null datetime64[ns]
21 Profit 1952 non-null float64
22 Quantity ordered new 1952 non-null int64
23 Sales 1952 non-null float64
24 Order ID 1952 non-null int64
25 Manager 1952 non-null object
26 Returned 1952 non-null int64
dtypes: datetime64[ns](2), float64(6), int64(6), object(13)
memory usage: 427.0+ KB
[25]: order_df['Country'].unique()
[26]: order_df['Delivery Time'] = pd.to_datetime(order_df['Ship Date']) - pd.to_datetime(order_df['Order Date'])
[27]: order_df
2 53 370.91 -50.64000 1 days Regular Air
3 47 4976.92 510.48900 2 days Regular Air
4 61 586.96 -10.90000 1 days Regular Air
… … … … … …
1947 8 6901.25 4233.25880 4 days Delivery Truck
1948 8 1214.03 837.68070 1 days Delivery Truck
1949 21 556.61 196.52328 1 days Regular Air
1950 2 10.96 -29.00300 2 days Regular Air
1951 20 1503.05 1037.10450 1 days Regular Air
1951 0.55 West Idaho Coeur D Alene
[31]: order_df['Sales'].corr(order_df['Profit'], method='spearman')
[31]: 0.27474678109322304
threshold = 1.96
filtered_order_df = order_df[~outliers_mask]
filtered_correlation = filtered_order_df['Sales'].corr(filtered_order_df['Profit'], method='spearman')
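The cell above uses `outliers_mask` without its definition appearing in this export. The following is a minimal sketch of one way such a mask could be built, assuming a z-score rule applied to Sales and Profit with the same 1.96 threshold. The toy data, the choice of columns, and the "either column" rule are assumptions for illustration, not the notebook's confirmed method. Spearman is computed here as the Pearson correlation of the ranks, which is the definition of the Spearman coefficient.

```python
import pandas as pd

# Toy stand-in for order_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "Sales":  [100.0, 120.0, 90.0, 110.0, 105.0, 95.0, 115.0, 5000.0],
    "Profit": [10.0, 15.0, 5.0, 12.0, 11.0, 8.0, 14.0, -900.0],
})

threshold = 1.96  # same cut-off as in the cell above

# Z-score of each value within its own column (population std, ddof=0).
z_scores = (toy_df - toy_df.mean()) / toy_df.std(ddof=0)

# Assumption: a row is an outlier if either column exceeds the threshold.
outliers_mask = (z_scores.abs() > threshold).any(axis=1)

filtered_df = toy_df[~outliers_mask]

# Spearman correlation = Pearson correlation computed on the ranks.
filtered_correlation = filtered_df["Sales"].rank().corr(filtered_df["Profit"].rank())
print(int(outliers_mask.sum()), filtered_correlation)
```

Comparing the correlation before and after filtering shows how much a few extreme rows drive the result.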
[33]: order_df
1 Small Box Avery 481
2 Small Box Xerox 1976
3 Small Box V3682
4 Small Pack Acme® Forged Steel Scissors with Black Enamel …
… … …
1947 Jumbo Box Riverside Palais Royal Lawyers Bookcase, Royal…
1948 Jumbo Drum Panasonic KX-P1150 Dot Matrix Printer
1949 Medium Box SouthWestern Bell FA970 Digital Answering Mach…
1950 Small Box Wilson Jones Impact Binders
1951 Wrap Bag Accessory34
for i in range(len(sales)):
    plt.text(i, sales[i], f'{sales[i]:,.0f}', ha='center', va='top', color='black')
Calculate the correlation among profit, delivery time, and order priority for each region
[37]: regions = list(order_df["Region"].unique())
avg_profits_by_category = region_df.groupby('Product Category')['Profit'].mean()
print("Average Profits for each Product Category:")
print(avg_profits_by_category)
print("################")
returned_count = (region_df['Returned'] == 1).sum()
total_orders = len(region_df["Order ID"].unique())
print(f"Number of rows where 'Returned' column has a value of 1: {returned_count}/{total_orders}")
print()
print()
Region: West
################
Top 3 occurrences of each Product Category:
Office Supplies 253
Technology 129
Furniture 88
Name: Product Category, dtype: int64
################
Average Profits for each Product Category:
Product Category
Furniture 608.474841
Office Supplies 48.236487
Technology 78.257155
Name: Profit, dtype: float64
################
Average Discount for each Product Category:
Product Category
Furniture 0.049659
Office Supplies 0.048577
Technology 0.044264
Name: Discount, dtype: float64
################
Average Product Base Margin for each Product Category:
Product Category
Furniture 0.602706
Office Supplies 0.468571
Technology 0.559457
Name: Product Base Margin, dtype: float64
################
Occurrence of each Order Priority:
Low 112
Not Specified 105
High 93
Medium 89
Critical 71
Name: Order Priority, dtype: int64
################
Average Delivery Time for each Order Priority:
Order Priority
Critical 1 days 06:45:38.028169014
High 1 days 10:19:21.290322580
Low 4 days 06:12:51.428571428
Medium 1 days 09:42:28.314606741
Not Specified 1 days 06:51:25.714285714
Name: Delivery Time, dtype: timedelta64[ns]
################
Average Shipping Cost for the region: 12.733872340425533
################
Average Profit per Order: 225.7285419702381
################
Number of rows where 'Returned' column has a value of 1: 8/336
#####################################
Region: Central
################
Top 3 occurrences of each Product Category:
Office Supplies 315
Technology 131
Furniture 120
Name: Product Category, dtype: int64
################
Average Profits for each Product Category:
Product Category
Furniture 41.845431
Office Supplies 89.946268
Technology 335.961425
Name: Profit, dtype: float64
################
Average Discount for each Product Category:
Product Category
Furniture 0.050583
Office Supplies 0.047968
Technology 0.049466
Name: Discount, dtype: float64
################
Average Product Base Margin for each Product Category:
Product Category
Furniture 0.583590
Office Supplies 0.457238
Technology 0.576489
Name: Product Base Margin, dtype: float64
################
Occurrence of each Order Priority:
Critical 118
High 116
Not Specified 114
Low 114
Medium 104
Name: Order Priority, dtype: int64
################
Average Delivery Time for each Order Priority:
Order Priority
Critical 1 days 10:34:34.576271186
High 1 days 09:55:51.724137931
Low 3 days 20:25:15.789473684
Medium 1 days 11:46:09.230769230
Not Specified 1 days 10:56:50.526315789
Name: Delivery Time, dtype: timedelta64[ns]
################
Average Shipping Cost for the region: 12.575618374558303
################
Average Profit per Order: 193.41368167150003
################
Number of rows where 'Returned' column has a value of 1: 0/400
#####################################
Region: East
################
Top 3 occurrences of each Product Category:
Office Supplies 265
Technology 110
Furniture 99
Name: Product Category, dtype: int64
################
Average Profits for each Product Category:
Product Category
Furniture -4.676507
Office Supplies 192.858727
Technology 314.971045
Name: Profit, dtype: float64
################
Average Discount for each Product Category:
Product Category
Furniture 0.048889
Office Supplies 0.051170
Technology 0.046818
Name: Discount, dtype: float64
################
Average Product Base Margin for each Product Category:
Product Category
Furniture 0.606224
Office Supplies 0.468365
Technology 0.557636
Name: Product Base Margin, dtype: float64
################
Occurrence of each Order Priority:
Critical 106
Medium 101
Not Specified 94
High 89
Low 84
Name: Order Priority, dtype: int64
################
Average Delivery Time for each Order Priority:
Order Priority
Critical 1 days 12:00:00
High 1 days 08:37:45.168539325
Low 3 days 16:51:25.714285714
Medium 1 days 09:44:33.267326732
Not Specified 1 days 13:01:16.595744680
Name: Delivery Time, dtype: timedelta64[ns]
################
Average Shipping Cost for the region: 13.79957805907173
################
Average Profit per Order: 264.0600725882353
################
Number of rows where 'Returned' column has a value of 1: 7/323
#####################################
Region: South
################
Top 3 occurrences of each Product Category:
Office Supplies 238
Technology 111
Furniture 93
Name: Product Category, dtype: int64
################
Average Profits for each Product Category:
Product Category
Furniture 12.313790
Office Supplies -8.905288
Technology -121.169173
Name: Profit, dtype: float64
################
Average Discount for each Product Category:
Product Category
Furniture 0.049892
Office Supplies 0.049370
Technology 0.050721
Name: Discount, dtype: float64
################
Average Product Base Margin for each Product Category:
Product Category
Furniture 0.592299
Office Supplies 0.465294
Technology 0.557568
Name: Product Base Margin, dtype: float64
################
Occurrence of each Order Priority:
Critical 96
High 93
Low 88
Not Specified 83
Medium 82
Name: Order Priority, dtype: int64
################
Average Delivery Time for each Order Priority:
Order Priority
Critical 1 days 08:45:00
High 1 days 04:38:42.580645161
Low 4 days 09:00:00
Medium 1 days 08:11:42.439024390
Not Specified 1 days 10:07:13.734939759
Name: Delivery Time, dtype: timedelta64[ns]
################
Average Shipping Cost for the region: 12.828303167420815
################
Average Profit per Order: -44.24556558113497
################
Number of rows where 'Returned' column has a value of 1: 0/326
#####################################
• The number of orders in the South region is not much lower than in the other regions.
• Low-priority orders have the longest delivery time (about 4 days on average); the other priorities do not differ much from one another.
• The South region has a large average loss in the Technology category, but no returned orders.
• The South region has a slightly higher average discount for Technology than the other regions.
• On average, the South region lost about $44 per order.
• There are cases where sales are significantly smaller than profit.
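The per-region figures above can also be produced without a loop. Below is a minimal sketch using `groupby` on a toy frame; the data is invented, and the names `toy_df`, `avg_profit`, `profit_per_order`, `returned_rows` and `distinct_orders` are illustrative. "Profit per order" is assumed here to mean total profit divided by the number of distinct Order IDs, which is consistent with the printed regional values.

```python
import pandas as pd

# Toy stand-in for order_df; one row per order line, values invented.
toy_df = pd.DataFrame({
    "Region": ["South", "South", "South", "West", "West", "West"],
    "Order ID": [1, 1, 2, 3, 4, 4],
    "Product Category": ["Technology", "Furniture", "Technology",
                         "Furniture", "Technology", "Office Supplies"],
    "Profit": [-200.0, 20.0, -40.0, 600.0, 80.0, 50.0],
    "Returned": [0, 0, 0, 1, 0, 0],
})

# Mean row-level profit per region and category (one column per category).
avg_profit = (toy_df.groupby(["Region", "Product Category"])["Profit"]
              .mean().unstack())

# Sum profit within each order first, then average across orders per region.
profit_per_order = (toy_df.groupby(["Region", "Order ID"])["Profit"].sum()
                    .groupby(level="Region").mean())

# Returned rows and distinct orders per region.
returned_rows = toy_df.groupby("Region")["Returned"].sum()
distinct_orders = toy_df.groupby("Region")["Order ID"].nunique()

print(avg_profit)
print(profit_per_order)
```

Note that `returned_rows` counts rows while `distinct_orders` counts orders, so a ratio of the two mixes units when an order spans several rows.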
Report 2: Best ship mode
[38]: order_df['Delivery Time Seconds'] = order_df['Delivery Time'].dt.total_seconds()
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
average_shipping_cost.plot(kind='bar', color='skyblue')
plt.title('Average Shipping Cost by Shipping Mode')
plt.xlabel('Shipping Mode')
plt.ylabel('Average Shipping Cost')
plt.subplot(1, 2, 2)
average_delivery_time.plot(kind='bar', color='lightgreen')
plt.title('Average Delivery Time by Shipping Mode')
plt.xlabel('Shipping Mode')
plt.ylabel('Average Delivery Time (seconds)')
plt.tight_layout()
plt.show()
• Delivery Truck is by far the most expensive shipping mode.
• Regular Air has the longest average delivery time.
• Express Air is the best shipping mode available when both cost and delivery time are considered.
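The plotting cell above uses `average_shipping_cost` and `average_delivery_time`, whose definitions are not visible in this export. A minimal sketch of how they could be computed, assuming a plain mean per Ship Mode, is shown below. The toy data is invented and `average_delivery_days` is an added convenience name.

```python
import pandas as pd

# Toy stand-in for order_df; values invented for illustration.
toy_df = pd.DataFrame({
    "Ship Mode": ["Regular Air", "Regular Air", "Express Air", "Delivery Truck"],
    "Shipping Cost": [5.0, 7.0, 8.0, 45.0],
    "Order Date": pd.to_datetime(["2015-01-01", "2015-01-03", "2015-01-05", "2015-01-07"]),
    "Ship Date": pd.to_datetime(["2015-01-03", "2015-01-06", "2015-01-06", "2015-01-08"]),
})

# Same definition as in the notebook: Ship Date minus Order Date, in seconds.
toy_df["Delivery Time Seconds"] = (
    toy_df["Ship Date"] - toy_df["Order Date"]
).dt.total_seconds()

# Assumption: the plotted series are simple per-mode means.
average_shipping_cost = toy_df.groupby("Ship Mode")["Shipping Cost"].mean()
average_delivery_time = toy_df.groupby("Ship Mode")["Delivery Time Seconds"].mean()

# Seconds are hard to read on an axis; days are easier to compare.
average_delivery_days = average_delivery_time / 86_400

print(average_shipping_cost)
print(average_delivery_days)
```

Plotting days instead of seconds would make the second chart easier to compare against the per-priority delivery times reported earlier.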
Report 3: Association Rule in Purchased Items
[40]: order_df
1951 91586 Medium 0.03 85.99 0.99
… … … … …
1947 0.62 West Nevada Las Vegas
1948 0.56 Central Texas Burleson
1949 0.56 West New Mexico Clovis
1950 0.36 West California Dublin
1951 0.55 West Idaho Coeur D Alene
'Product Category': lambda x: set(x),
'Sales': 'sum',
'Profit': 'sum'}).reset_index()
[42]: order_products_df
[42]: Order ID Product Name \
0 359 [Bevis 36 x 72 Conference Tables]
1 548 [Avery 481, Xerox 1976, V3682]
2 646 [Acme® Forged Steel Scissors with Black Enamel…
3 962 [Holmes Replacement Filter for HEPA Air Cleane…
4 2433 [Lexmark 4227 Plus Dot Matrix Printer]
… … …
1360 91576 [Electrix 20W Halogen Replacement Bulb for Zoo…
1361 91581 [Panasonic KX-P1150 Dot Matrix Printer]
1362 91583 [SouthWestern Bell FA970 Digital Answering Mac…
1363 91584 [Wilson Jones Impact Binders]
1364 91586 [Accessory34]
[44]: apriori_df
[44]: Order ID Product Name \
1 548 [Avery 481, Xerox 1976, V3682]
3 962 [Holmes Replacement Filter for HEPA Air Cleane…
8 3397 [Belkin 105-Key Black Keyboard, Avery Durable …
10 3841 [Fellowes PB500 Electric Punch Plastic Comb Bi…
12 5509 [Staples Brown Kraft Recycled Clasp Envelopes,…
… … …
1344 91466 [Xerox 1928, Tripp Lite Isotel 8 Ultra 8 Outle…
1354 91522 [Accessory37, Self-Adhesive Address Labels for…
1356 91550 [Global Leather Executive Chair, Peel & Seel® …
1357 91555 [Avery Flip-Chart Easel Binder, Black, G.E. Ha…
1360 91576 [Electrix 20W Halogen Replacement Bulb for Zoo…
Num Items
1 3
3 2
8 2
10 2
12 2
… …
1344 2
1354 2
1356 3
1357 2
1360 3
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
rules.sort_values(by='confidence', ascending=False, inplace=True)
top_10_rules = rules.head(10)
<ipython-input-45-3c6157fc9c80>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
[46]: top_10_rules
   antecedent support  consequent support   support  confidence       lift
0            0.008869            0.031042  0.008869    1.000000  32.214286
2            0.008869            0.031042  0.008869    1.000000  32.214286
5            0.008869            0.044346  0.008869    1.000000  22.550000
7            0.008869            0.013304  0.008869    1.000000  75.166667
8            0.011086            0.017738  0.008869    0.800000  45.100000
6            0.013304            0.008869  0.008869    0.666667  75.166667
9            0.017738            0.011086  0.008869    0.500000  45.100000
1            0.031042            0.008869  0.008869    0.285714  32.214286
3            0.031042            0.008869  0.008869    0.285714  32.214286
4            0.044346            0.008869  0.008869    0.200000  22.550000
<ipython-input-47-61326dcd0cd2>:2: UserWarning: This pattern is interpreted as a
regular expression, and has match groups. To actually get the groups, use
str.extract.
filtered_orders = order_df[order_df['Product Name'].str.contains("(Black)") &
order_df['Product Name'].str.contains("(Avery Flip-Chart Easel Binder)")]
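The UserWarning above is raised because the parentheses in the patterns are read as regex capture groups. A minimal sketch of one way to avoid it, assuming a plain substring match is what is intended, is to pass `regex=False` and drop the parentheses. The `names` series is invented toy data.

```python
import pandas as pd

# Toy product names, invented for illustration.
names = pd.Series([
    "Avery Flip-Chart Easel Binder, Black",
    "G.E. Halogen Desk Lamp Bulbs",
    "Avery 481",
])

# regex=False treats each pattern as a literal substring,
# so no capture groups are involved and no warning is raised.
mask = (names.str.contains("Black", regex=False)
        & names.str.contains("Avery Flip-Chart Easel Binder", regex=False))

matches = names[mask]
print(matches.tolist())
```

Literal matching also avoids surprises from characters such as `.` or `&` in product names, which have special meaning in a regular expression.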
/usr/local/lib/python3.10/dist-packages/pandas/core/algorithms.py:522:
DeprecationWarning: np.find_common_type is deprecated. Please use
`np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more
information. (Deprecated NumPy 1.25)
common = np.find_common_type([values.dtype, comps_array.dtype], [])
[48]: filtered_df
1888 Small Box Avery Flip-Chart Easel Binder, Black
1941 Small Box Avery Flip-Chart Easel Binder, Black
1942 Small Pack G.E. Halogen Desk Lamp Bulbs
The minimum support is set to 0.8%, meaning that we keep only itemsets that occur in at least 0.8% of all transactions. This threshold filters out rare combinations, so the remaining itemsets reveal items that are frequently bought together or that share a particular characteristic. We can then use this information to promote items that normally go together to customers, which can help boost profits.
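To make the rule metrics concrete, here is a minimal hand computation of support, confidence and lift for a single rule in plain Python. The transactions and item names are invented, and `rule_metrics` is a hypothetical helper written for this sketch, not part of the library used in the notebook.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (invented): each set holds the products of one order.
transactions = [
    {"Binder", "Bulbs"},
    {"Binder", "Bulbs", "Paper"},
    {"Paper", "Pens"},
    {"Binder", "Paper"},
    {"Pens"},
]
n = len(transactions)

# How many orders contain each item, and each unordered pair of items.
item_counts = Counter(item for t in transactions for item in t)
pair_counts = Counter(
    frozenset(pair) for t in transactions for pair in combinations(sorted(t), 2)
)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    support = pair_counts[frozenset((antecedent, consequent))] / n
    confidence = support / (item_counts[antecedent] / n)
    lift = confidence / (item_counts[consequent] / n)
    return support, confidence, lift


support, confidence, lift = rule_metrics("Bulbs", "Binder")
print(support, confidence, lift)
```

The same relations hold in the rules table of this report: confidence equals support divided by antecedent support, and lift equals confidence divided by consequent support (for example, 1.0 / 0.031042 gives about 32.2).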
Report 4: Product Category Focus
[49]: order_df
[49]: Order ID Order Priority Discount Unit Price Shipping Cost \
0 359 Medium 0.08 124.49 51.94
1 548 Critical 0.04 3.08 0.99
2 548 Critical 0.02 6.48 5.90
3 548 Critical 0.04 125.99 4.20
4 646 High 0.01 9.31 3.98
… … … … … …
1947 91576 Low 0.04 880.98 44.55
1948 91581 Not Specified 0.01 145.45 17.85
1949 91583 High 0.01 28.99 8.59
1950 91584 Not Specified 0.10 5.18 5.74
1951 91586 Medium 0.03 85.99 0.99
1948 Jumbo Drum Panasonic KX-P1150 Dot Matrix Printer
1949 Medium Box SouthWestern Bell FA970 Digital Answering Mach…
1950 Small Box Wilson Jones Impact Binders
1951 Wrap Bag Accessory34
margin_by_category.plot(kind='bar', ax=axs[0, 1], color='lightgreen')
axs[0, 1].set_title('Average Product Base Margin by Product Category')
axs[0, 1].set_ylabel('Average Product Base Margin')
plt.tight_layout()
plt.show()
• Office Supplies: This category generates the highest profit overall. Although its profit margin is smaller than that of the other categories, Office Supplies leads in number of orders. This indicates that Office Supplies are popular among customers and contribute significantly to the company’s revenue stream.
• Furniture: In contrast, Furniture generates the lowest profit and incurs very high shipping costs, even though its average Product Base Margin is the highest in the regional summaries above (about 0.58 to 0.61). This suggests that while Furniture contributes to sales volume, its profitability is eroded by the associated expenses. The high shipping costs may deter customers or may reflect the logistical challenges of transporting bulky items.
• Recommendations:
Focus on maintaining or increasing sales volume in Office Supplies while optimizing costs. This could involve negotiating better deals with suppliers or streamlining operations.
For Furniture, consider strategies to lower shipping costs (for example, opening storage locations) and to attract more buyers, since the category has a large base margin.
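The category comparison behind these points can be summarized in one table. Below is a minimal sketch using named aggregation on a toy frame; the data is invented, and `summary` and its column names are illustrative rather than the notebook's own variables.

```python
import pandas as pd

# Toy stand-in for order_df; values invented for illustration.
toy_df = pd.DataFrame({
    "Product Category": ["Office Supplies", "Office Supplies", "Office Supplies",
                         "Furniture", "Technology"],
    "Profit": [30.0, 40.0, 50.0, 60.0, 90.0],
    "Shipping Cost": [2.0, 3.0, 4.0, 40.0, 10.0],
    "Product Base Margin": [0.45, 0.47, 0.46, 0.60, 0.56],
})

# One row per category: total profit, row count, mean shipping cost, mean margin.
summary = toy_df.groupby("Product Category").agg(
    total_profit=("Profit", "sum"),
    row_count=("Profit", "size"),
    avg_shipping_cost=("Shipping Cost", "mean"),
    avg_base_margin=("Product Base Margin", "mean"),
)

top_category = summary["total_profit"].idxmax()
print(summary)
print(top_category)
```

Keeping total profit and average margin side by side makes it easy to see when a category earns the most overall despite a lower margin per item.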
Report 5: Correlation test
[52]: order_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1952 entries, 0 to 1951
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 1952 non-null int64
1 Order Priority 1952 non-null object
2 Discount 1952 non-null float64
3 Unit Price 1952 non-null float64
4 Shipping Cost 1952 non-null float64
5 Quantity ordered new 1952 non-null int64
6 Sales 1952 non-null float64
7 Profit 1952 non-null float64
8 Delivery Time 1952 non-null timedelta64[ns]
9 Ship Mode 1952 non-null object
10 Customer Segment 1952 non-null object
11 Product Category 1952 non-null object
12 Product Sub-Category 1952 non-null object
13 Product Container 1952 non-null object
14 Product Name 1952 non-null object
15 Product Base Margin 1936 non-null float64
16 Region 1952 non-null object
17 State or Province 1952 non-null object
18 City 1952 non-null object
19 Postal Code 1952 non-null int64
20 Manager 1952 non-null object
21 Returned 1952 non-null int64
22 Delivery Time Seconds 1952 non-null float64
dtypes: float64(7), int64(4), object(11), timedelta64[ns](1)
memory usage: 350.9+ KB
[54]: num_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1952 entries, 0 to 1951
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order ID 1952 non-null int64
1 Discount 1952 non-null float64
2 Unit Price 1952 non-null float64
3 Shipping Cost 1952 non-null float64
4 Quantity ordered new 1952 non-null int64
5 Sales 1952 non-null float64
6 Profit 1952 non-null float64
7 Product Base Margin 1936 non-null float64
8 Postal Code 1952 non-null int64
9 Returned 1952 non-null int64
10 Delivery Time Seconds 1952 non-null float64
dtypes: float64(7), int64(4)
memory usage: 167.9 KB
• Sales is strongly positively correlated with Unit Price and Shipping Cost. This is plausible:
Furniture has a high Unit Price, which leads to high Sales as well as a high Shipping Cost.
• Unit Price and Shipping Cost are also correlated with each other.
• Sales and Profit are not noticeably correlated.
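The observations above are readings of a pairwise correlation matrix over the numeric columns. A minimal sketch of how such pairs could be listed programmatically is given here, assuming Pearson correlation; `toy_df`, its values, the helper `correlated_pairs`, and the 0.7 threshold are all hypothetical stand-ins and do not come from the notebook.

```python
import pandas as pd

# Toy stand-in for num_df; the values are made up for illustration only.
toy_df = pd.DataFrame({
    "Unit Price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Shipping Cost": [1.0, 2.5, 2.9, 4.2, 5.1],
    "Sales": [100.0, 210.0, 290.0, 405.0, 500.0],
    "Profit": [5.0, -3.0, 8.0, -1.0, 2.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = toy_df.corr(method="pearson")


def correlated_pairs(corr_matrix, threshold=0.7):
    """Return (col_a, col_b, r) for each column pair with |r| >= threshold."""
    cols = list(corr_matrix.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr_matrix.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


strong = correlated_pairs(corr)
for a, b, r in strong:
    print(f"{a} ~ {b}: r = {r}")
```

On the real data, `num_df.corr()` would take the place of `toy_df.corr()`; identifier-like columns such as Order ID and Postal Code would probably need to be dropped first, since their correlations carry no meaning.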
[77]: cat_df = order_df.select_dtypes(include=['object'])
[78]: cat_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1952 entries, 0 to 1951
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order Priority 1952 non-null object
1 Ship Mode 1952 non-null object
2 Customer Segment 1952 non-null object
3 Product Category 1952 non-null object
4 Product Sub-Category 1952 non-null object
5 Product Container 1952 non-null object
6 Product Name 1952 non-null object
7 Region 1952 non-null object
8 State or Province 1952 non-null object
9 City 1952 non-null object
10 Manager 1952 non-null object
dtypes: object(11)
memory usage: 167.9+ KB
[80]: cat_df
[82]: label_encoded_df
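# Pairwise chi-square test of independence between the label-encoded categorical columns.
# chi2_matrix is not defined in the visible part of the cell; it is assumed to mirror p_values.
chi2_matrix = pd.DataFrame(index=label_encoded_df.columns, columns=label_encoded_df.columns)
p_values = pd.DataFrame(index=label_encoded_df.columns, columns=label_encoded_df.columns)
for col1 in label_encoded_df.columns:
    for col2 in label_encoded_df.columns:
        if col1 != col2:
            # Contingency table of observed counts for this pair of columns
            contingency_table = pd.crosstab(label_encoded_df[col1], label_encoded_df[col2])
            chi2, p, _, _ = chi2_contingency(contingency_table)
            chi2_matrix.loc[col1, col2] = chi2
            p_values.loc[col1, col2] = p
# Heatmap of the chi-square statistics (the diagonal stays empty)
plt.figure(figsize=(12, 10))
sns.heatmap(chi2_matrix.astype(float), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)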
• Product Sub-Category is strongly associated with City according to the chi-square statistic
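A raw chi-square statistic grows with the sample size and with the number of categories in each column, so pairs involving a high-cardinality column such as City tend to score high regardless of how strong the association really is. One common way to put the pairs on a comparable 0 to 1 scale is Cramér's V. The sketch below is an illustration of that normalization and was not part of the original analysis; `cramers_v` and the toy series are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical series (0 = no association, 1 = perfect)."""
    table = pd.crosstab(x, y)
    # correction=False disables the Yates continuity correction used for 2x2 tables.
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    if min_dim == 0:
        # A column with a single category carries no information.
        return float("nan")
    return float(np.sqrt(chi2 / (n * min_dim)))


# Toy data, made up for illustration only.
a = pd.Series(["x", "x", "y", "y"] * 5)
b = pd.Series(["x", "x", "y", "y"] * 5)  # identical to a: perfect association
c = pd.Series(["p", "q", "p", "q"] * 5)  # balanced across a: no association

v_perfect = cramers_v(a, b)
v_none = cramers_v(a, c)
print(v_perfect, v_none)
```

Applied to the notebook's data, `cramers_v(label_encoded_df[col1], label_encoded_df[col2])` could replace the raw `chi2` value in the loop, which would make the heatmap cells directly comparable across column pairs.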