10/15/2019 150+ Business Data Science Application in Python - Towards Data Science

150+ Business Data Science Application in Python
FirmAI
Jul 25 · 10 min read

This article can help companies understand not just what data science can do for them,
but what they can do for data science.

There is a fun game I recommend you adopt when you find corporate-speak insufferable.
You take every hackneyed question, turn it on its head and throw it back at whatever suit
might be addressing you. It is not essential that you know the topic at hand — so I
thought.

A question that popped up at our company from around January was, what can data
science do for us? Apart from smelling like suits and slides, I thought that a more
https://towardsdatascience.com/150-business-data-science-application-in-python-72597d90f928 1/13

addressable question lies within its inverse. Turning the question around I asked, what
can our company do for data science? One might think that questioning the questioner is
good intellectual fun, but I have come to see more hotheadedness than one might
experience in a Tarantino movie. I do, for the most part, believe I can abdicate
responsibility for this hotheadedness; any reasonable observer can find the cause in
flamboyantly decorated nooses tightly strapped around blood-restricted necks. Plausible
deniability aside, once these corporate emperors and empresses settle down, they repeat
in unison, “what even is data science?” At this moment, I stalled; they caught up to my
rhetoric; they found a way to go even deeper than me. I guess I would have to answer.
This forced me to put my poetic senior data science title aside and slide down from my
unmitigatedly arrogant horse.

At this point, everyone is miserable.


We were nearing the end of the meeting, and as it generally goes, nothing had been
achieved; no stones had been turned and no feathers had been left unruffled. This
normally meant it was time for the sacrificial silence. And yup, this time it would be
directed at me. A silence broke out (a well-positioned pause designed to send shivers
down the spine of the bravest among us). It lasted about half a minute with nothing but
the smell of cortisol to keep me company. As I confessed to my misgivings by bowing my
head for the allotted time, an idea suddenly came to me. Good! Just at the right time too,
as I was seriously considering sliding under the Olympic-pool-sized boardroom table and
out of the room. I grabbed Greg the half-paid intern by the shoulder and asked nicely with
newfound confidence, “Greg, can you please plug this computer into the HDMI port and
give me the sticky thingy”. I held my head high as I strutted towards the over-sized screen.
At each step, I tried to recall the page I had bookmarked a few months ago, the page that I
thought could save the moment, and a reputation.

As I walked towards the screen, I got distracted by the disturbing reflection of all the
predatory eyes fixed on my back, silently waiting to pounce on me if I showed any sign of
weakness. I turned to face Greg, and I could see some serious sweat dripping down his
nose. All I could think was “keep your back straight, don’t show your frailty, Gregory, I
trust you”. After multiple attempts at connecting the laptop to the screen, I could see
mister laissez-faire’s eye twitching with indelible delight, sneering at the failings of poor
Gregory and me. All of this excited him a bit too much. One can’t blame him, being used
to presentations larded with needless persuasive adjectives and all of that. He couldn’t
contain his smile and his smile couldn’t contain his thoughts. For mister laissez-faire there is
nothing better than a bit of corporate theater. I felt the need to get the grimace off his
face: “hey mister laissez-faire, do you perhaps know how to, or can you help Gregory, plug
in the HDMI port”. As if she had been waiting for the question, missus HR took on a strange
confirmatory pose. She seemed to be agreeing with herself with ever-increasing nods. You
could almost see her holding back a whisper, “it was Allison that hired the intern, I had
nothing to do with it”.

Eventually, as a team, Mr laissez-faire and Gregory got the screen working, and all
predatory eyes quickly faded away into millions of pixels. Finally, the link hit me like a
hurricane. I pulled my shirt down and straightened my noose before I presented them
with a GitHub link to more than 150 data science applications to help run a business’s
administrative processes.

And I started: “This link can help companies not just understand what data science can
do for our company but also how our company can contribute to the data science
community.” In this article, I will present a curated list of these applied business machine
learning (BML) and business data science (BDS) examples and libraries that I delivered
in that presentation. The code is in Python (primarily using Jupyter Notebooks) unless
otherwise stated.

GitHub: github.com/firmai/business-machine-learning

Accounting
Machine Learning

Chart of Account Prediction — Using labeled data to suggest the account name for
every transaction.

Accounting Anomalies — Using deep-learning frameworks to identify accounting
anomalies.


Financial Statement Anomalies — Detecting anomalies before filing, using R.

Useful Life Prediction (FirmAI) — Predict the useful life of assets using sensor
observations and feature engineering.

AI Applied to XBRL — Standardized representation of XBRL for AI and machine
learning.

Analytics

Forensic Accounting — Collection of case studies on forensic accounting using data
analysis. I am on the lookout for more data to practise forensic accounting; please get in
touch.

General Ledger (FirmAI) — Data processing over a general ledger as exported
through an accounting system.

Bullet Graph (FirmAI) — Bullet graph visualisation helpful for tracking sales,
commission and other performance.

Aged Debtors (FirmAI) — Example analysis to investigate aged debtors.

Automated FS XBRL — Written in XML; the analysis could possibly be ported to Python.

Textual Analysis

Financial Sentiment Analysis — Sentiment, distance and proportion analysis for
trading signals.

Extensive NLP — Comprehensive NLP techniques for accounting research.

Data, Parsing and APIs

EDGAR — A walk-through in how to obtain EDGAR data.

IRS — Accessing and parsing IRS filings.

Financial Corporate — Rutgers corporate financial datasets.

Non-financial Corporate — Rutgers non-financial corporate dataset.


PDF Parsing — Extracting useful data from PDF documents.

PDF Table to Excel — How to output an Excel file from a PDF.

Research And Articles

Understanding Accounting Analytics — An article that tackles the importance of
accounting analytics.

VLFeat — VLFeat is an open and portable library of computer vision algorithms,
which has a MATLAB toolbox.

Websites

Rutgers Raw — Good digital accounting research from Rutgers.

Courses

Computer Augmented Accounting — A video series from Rutgers University looking
at the use of computation to improve accounting.

Accounting in a Digital Era — Another series by Rutgers investigating the effects the
digital age will have on accounting.

Customer
Lifetime Value

Pareto/NBD Model — Calculate the CLV using a Pareto/NBD model.

Gamma-Gamma Model — Estimate the CLV using a Gamma-Gamma model.

Cohort Analysis — Cohort analysis to group customers into mutually exclusive
cohorts measured over time.

Segmentation

E-commerce — E-commerce customer segmentation.


Groceries — Segmentation for grocery customers.

Online Retailer — Online retailer segmentation.

Bank — Bank customer segmentation.

Wholesale — Clustering of wholesale customers.

Various — Multiple types of segmentation and clustering techniques.

Behaviour

RNN — Investigating customer behaviour over time with sequential analysis using
an RNN model.

Neural Net — Demand forecasting using artificial neural networks.

Temporal Analytics — Investigating customer temporal regularities.

POS Analytics — Analytics driven customer behaviour ranking for retail promotions
using POS data.

Wholesale Customer — Wholesale customer exploratory data analysis.

RFM — Doing an RFM (recency, frequency, monetary) analysis.

Returns Behaviour — Predicting total returns and fraudulent returns.

Visits — Predicting which day of week a customer will visit.

Bank: Next Purchase — A project to predict bank customers’ most probable next
purchase.

Bank: Customer Prediction — Predicting target customers who will subscribe to the
new policy of the bank.

Next Purchase — Predict a customer’s next purchase, also using feature engineering.

Customer Purchase Repeats — Using the lifetimes Python library and real jewellery
retailer data to analyse customer repeat purchases.

AB Testing — Find the best KPI and do A/B testing.


Customer Survey (FirmAI) — Example of parsing and analysing a customer survey.

Happiness — Analysing customer happiness from hotel stays using reviews.

Miscellaneous Customer Analytics — Various tools and techniques for customer
analysis.
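
To give a feel for what an entry like RFM boils down to, here is a minimal sketch in plain Python. The transactions, the customer names and the `rfm_table` helper are all hypothetical and are not taken from any of the linked notebooks. It only computes recency (days since the last purchase), frequency (number of purchases) and monetary value (total spend) per customer.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("alice", date(2019, 1, 5), 120.0),
    ("alice", date(2019, 6, 20), 80.0),
    ("bob", date(2019, 2, 14), 40.0),
    ("carol", date(2019, 7, 1), 300.0),
    ("carol", date(2019, 7, 15), 150.0),
    ("carol", date(2019, 7, 20), 50.0),
]


def rfm_table(rows, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    summary = {}
    for customer, day, amount in rows:
        last, freq, total = summary.get(customer, (None, 0, 0.0))
        # Keep the most recent purchase date seen so far.
        last = day if last is None or day > last else last
        summary[customer] = (last, freq + 1, total + amount)
    return {
        customer: ((as_of - last).days, freq, total)
        for customer, (last, freq, total) in summary.items()
    }


rfm = rfm_table(transactions, as_of=date(2019, 7, 25))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

In practice the three numbers are usually binned into scores (for example quintiles) before customers are ranked; that step is left out here to keep the sketch short.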

Recommender

Recommendation — Recommend the songs that a customer on a music app would
prefer listening to.

General Recommender — Identifying which products to recommend to which
customers.

Collaborative Filtering — Customer recommendation using collaborative filtering.

Up-selling (FirmAI) — Analysis to identify up-selling opportunities.

Churn Prediction

Ride Sharing — Identify customer churn rates in order to target customers for
retention campaigns.

KKDBox I — Variational deep autoencoder to predict customer churn.

KKDBox II — A three step customer churn prediction framework using feature
engineering.

Personal Finance — Predict customer subscription churn for a personal finance
business.

ANN — Churn analysis using artificial neural networks.

Bike — Customer bike churn analysis.

Cost Sensitive — Cost sensitive churn analysis driven by economic performance.
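
The cost-sensitive idea in the last entry can be illustrated with a small sketch. Everything here is an assumption made for illustration: the churn probabilities, the costs, the candidate thresholds, and the simplification that a retention offer always keeps a customer who would otherwise have left. None of it comes from the linked project. The point is only that the decision threshold is chosen by money lost, not by accuracy.

```python
def campaign_cost(probs, churned, threshold, offer_cost, churn_cost):
    """Total cost of contacting every customer scored at or above threshold."""
    cost = 0.0
    for p, left in zip(probs, churned):
        if p >= threshold:
            # Assumption: a contacted customer is always retained.
            cost += offer_cost
        elif left:
            # A churner we did not contact is lost revenue.
            cost += churn_cost
    return cost


def best_threshold(probs, churned, candidates, offer_cost, churn_cost):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: campaign_cost(probs, churned, t, offer_cost, churn_cost),
    )


# Hypothetical model scores and what actually happened to each customer.
probs = [0.9, 0.7, 0.4, 0.2, 0.1]
churned = [True, True, False, True, False]
candidates = [0.05, 0.15, 0.3, 0.5, 0.95]

threshold = best_threshold(probs, churned, candidates, offer_cost=10, churn_cost=100)
lowest_cost = campaign_cost(probs, churned, threshold, offer_cost=10, churn_cost=100)
print(threshold, lowest_cost)
```

With a lost customer costing ten times an offer, the cheapest policy here contacts far more customers than the usual 0.5 cut-off would.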

Sentiment


Topic Modelling — Topic modelling on a corpus of customer surveys from the VR
industry.

Customer Satisfaction — Predict customer satisfaction using Kaggle data.

Employee
Management

Personality Prediction — Predict Big 5 Personality from text.

Salary Prediction Resume — Textual analysis of resumes to predict an appropriate
salary [project disappeared, still a cool idea].

Employee Review Analysis — Review analytics for top 50 retail companies on
Indeed.

Diversity Analysis — A simple analysis of gender and race disparity in the tech
industry.

Occupation Prediction — Predict the likelihood that an occupation is analytical.

Performance

Training Hours Performance — The impact of training hours on employee
performance.

Promotion Prediction — Analysing promotion patterns.

Employee Attendance prediction — Various tools to predict employee attendance.

Turnover

Early Leaving Employees — Identifying why the best and most experienced
employees are leaving prematurely.

Employee Turnover — Identifying factors associated with employee turnover.

Conversations


Slack Communication Analysis — Producing meaningful visualisations from Slack
conversations.

Employee Relationships from Conversations — Identifying employee relationships
from emails for improved HR analytics.

Categorise Employee Requests — Classifying employee requests via a TF-IDF vectorizer
and RandomForestClassifier.

Physical

Employee Face Recognition — A face recognition implementation.

Attendance Management System — An attendance management system using face
recognition.

Legal
Tools

LexPredict — Software package and library.

AI Para-legal — Lobe is the world’s first AI paralegal.

Legal Entity Detection — NER For Legal Documents.

Legal Case Summarisation — Implementation of different summarisation
algorithms applied to legal case judgements.

Legal Documents Google Scholar — Using Google Scholar to extract cases
programmatically.

Chat Bot — Chat-bot and email notifications.

Policy and Regulatory

GDPR scores — Predicting GDPR Scores for Legal Documents.

Driving Factors FINRA — Identify the driving factors that influence the FINRA
arbitration decisions.


Securities Bias Correction — Bias-Corrected Estimation of Price Impact in Securities
Litigation.

Public Firm to Legal Decision — Embed public firms based on their reaction to legal
decisions.

Judicial Applied

Supreme Court Prediction — Predicting the ideological direction of Supreme Court
decisions: ensemble vs. unified case-based model.

Supreme Court Topic Modeling — Multiple steps necessary to implement topic
modeling on Supreme Court decisions.

Judge Opinion — Using text mining and machine learning to analyze judges’
opinions for a particular concern.

ML Law Matching — A machine learning law match maker.

Bert Multi-label Classification — Fine Grained Sentiment Analysis from AI.

Some Computational AI Course — Video series on law from MIT.

Management
Strategy

Topic Model Reviews — Amazon reviews for product development.

Patents — Forecasting strategy using patents.

Networks — Business categories from Yelp reviews using networks can help to
identify pockets of demand.

Company Clustering — Hierarchical clusters and topics from companies by
extracting information from their descriptions on their websites.

Marketing Management — Programmatic marketing management.

Decision Optimisation


Constraint Learning — Machine learning that takes into account constraints.

Fairlearn — I think it is called cost-sensitive machine learning.

Multi-label Classification — Cost-Sensitive Multi-Label Classification

Multi-class Classification — Cost-sensitive multi-class classification
(Weighted-All-Pairs, Filter-Tree & others).

CostCla — Costcla is a Python module for cost-sensitive machine learning
(classification) built on top of Scikit-Learn.

DEA Software — pyDEA is a software package developed in Python for conducting
data envelopment analysis (DEA).

Covering Set (FirmAI) — Constraint programming analysis.

Insurance (FirmAI) — CP Insurance analysis.

Machine Learning + CP (FirmAI) — Machine Learning + Optimisation.

Post Office (FirmAI) — Post Office optimisation.

Soda — CP (FirmAI) — Constraint Programming + ML.

Soda — Knapsack (FirmAI) — Knapsack algorithm + ML.

Soda — MLP (FirmAI) — MLP analysis + ML.
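
For readers who have not met the knapsack problem that one of the Soda entries refers to, a minimal 0/1 knapsack sketch follows. The products, their shelf-space "weights" and profit "values" are invented for illustration, and the `knapsack` function is my own textbook dynamic-programming version, not the code in the FirmAI notebook (which also combines it with machine learning).

```python
def knapsack(items, capacity):
    """0/1 knapsack by dynamic programming.

    items: list of (name, weight, value) with integer weights.
    Returns (best_value, names_of_chosen_items).
    """
    # best[c] holds the best (value, chosen names) using at most c capacity.
    best = [(0, ())] * (capacity + 1)
    for name, weight, value in items:
        # Go downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - weight][1] + (name,))
    return best[capacity]


# Hypothetical drinks: (name, shelf units needed, expected profit).
products = [
    ("cola", 3, 40),
    ("lemonade", 4, 50),
    ("tonic", 2, 30),
    ("water", 5, 55),
]

profit, chosen = knapsack(products, capacity=9)
print(profit, chosen)
```

A constraint-programming solver would reach the same answer on a problem this small; the dynamic-programming form is shown because it fits in a dozen lines.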

Causal Inference

Marketing AB Testing — A/B Testing Experiment.

Legal Studies — Instrumental and discontinuity causal approach.

A-B Test Result (FirmAI) — Initial A-B Results.

Causal Regression (FirmAI) — Regression technique for causal estimate.

Frequentist vs Bayesian A-B Test (FirmAI) — Comparison between frequentist and
Bayesian A-B testing.


A-B Test Power Analysis (FirmAI) — Sample size estimation to match testing power.

Variance Reduction A-B test (FirmAI) — Techniques to reduce variance in A-B tests.
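
As a rough illustration of the power-analysis entry, the sketch below estimates how many users each arm of an A-B test needs. It uses the common normal-approximation formula for comparing two proportions, which is my choice for the example and not necessarily what the FirmAI notebook does. The conversion rates and the `sample_size_per_group` name are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical scenario: lifting conversion from 10% to 12%.
small_lift = sample_size_per_group(0.10, 0.12)
# A bigger lift (10% to 15%) is easier to detect and needs fewer users.
large_lift = sample_size_per_group(0.10, 0.15)
print(small_lift, large_lift)
```

The takeaway for a boardroom is that small lifts need thousands of users per arm, so the test duration should be agreed before anyone peeks at the results.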

Statistics

Various — Various applied statistical solutions.

Quantitative

Applied RL — Reinforcement Learning and Decision Making tutorials explained at
an intuitive level and with Jupyter Notebooks.

Process Mining — Leveraging A-priori Knowledge in Predictive Business Process
Monitoring.

TS Forecasting — Time series forecasting for important business applications.

Data

Web Scraping (FirmAI) — Web scraping solutions for Facebook, Glassdoor,
Instagram, Morningstar, Similarweb, Yelp, Spyfu, Linkedin, Angellist.

Operations
Failure and Anomalies

Anomalies — Anomaly detection resources.

Intrusion Detection — Detecting network intrusions.

APS Failure, Data — Investigating APS failures in Scania trucks.

Hardware Failure — Using different machine learning techniques in detecting
anomalies.

Anomaly KPIs, Paper — Anomaly detection algorithm for seasonal KPIs.
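
Why seasonality matters for KPI anomalies is easiest to see in code. The sketch below is a deliberately simple stand-in, not the algorithm from the linked paper: it compares a new value with the mean and standard deviation of past values from the same seasonal slot (here, day of week). The KPI history and the `is_anomaly` helper are hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(history, slot, value, k=3.0):
    """Flag value if it lies more than k standard deviations from the
    historical mean of the same seasonal slot."""
    past = history[slot]
    centre = mean(past)
    spread = stdev(past)
    return abs(value - centre) > k * spread


# Hypothetical daily order counts, grouped by day of week.
history = {
    "Mon": [100, 104, 98, 102],
    "Sat": [40, 38, 43, 41],
}

# 42 orders is a normal Saturday but an alarming Monday.
print(is_anomaly(history, "Sat", 42))
print(is_anomaly(history, "Mon", 42))
```

A single global threshold would either miss the quiet Monday or flag every Saturday, which is the problem seasonal methods are designed to avoid.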

Load and Capacity Management


House Load Energy — Linear, SVR and Random Forest models to predict a house’s
appliance energy load.

Uber Load Management — Uber predictive load management.

Capacity Management — Investigating whether IT stability issues are caused by capacity
constraints.

Bike Sharing — XGBRegressor, RandomForestRegressor, GradientBoostingRegressor
combined with feature selection.

Airline Fleet Segmentation — Analysis of Delta airlines.

Airbnb — Airbnb Booking Analysis.

Prediction Management

Dispute Prediction — Financial service complaint management.

Flight Delay Prediction — Transfer learning for flight-delay prediction via variational
autoencoders in Keras.

Electric Fault Prediction — Predict tripping at grid stations by applying simple
machine learning algorithms.

Popularity Prediction in R — Marked Hawkes Point Process.

By Derek Snow | FirmAI

https://www.linkedin.com/company/firmai

Please feel free to reach out if you want to collaborate.

