Discovering Numpy, Pandas and SciKit Learn.

NR
Personal Project

--

When starting with Machine Learning, you see the terms Numpy, Pandas and SciKit Learn mentioned everywhere, whether relevant or not. This article will help you understand each term and will definitely come in handy when you’re busy mastering Machine Learning.

Numpy

Numpy stands for Numerical Python. As the name gives away, it’s an open-source library for the Python programming language. I hear you thinking: “another library…”, but it’s not just another one! Numpy is one of the most useful libraries, especially if you’re crunching numbers.

Purpose

Numpy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on them. Its objective is to make it easier for you to apply complex mathematical operations or run data analysis calculations. Numpy’s biggest advantage is its speed: operations on whole arrays are typically much faster than the equivalent loops over built-in Python lists.
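To make the idea of operating on whole arrays concrete, here is a minimal sketch. The values in matrix are invented for illustration; the point is only that arithmetic and sums apply to every element without writing a loop.

```python
import numpy as np

# Hypothetical data: a small 2 x 3 array (2 rows, 3 columns)
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

doubled = matrix * 2              # multiplies every element, no explicit loop
column_sums = matrix.sum(axis=0)  # adds up the values in each column
transposed = matrix.T             # swaps rows and columns

print(matrix.shape)      # (2, 3)
print(doubled)
print(column_sums)       # [5. 7. 9.]
print(transposed.shape)  # (3, 2)
```

The same expressions work unchanged on arrays with far more elements, which is where the speed advantage over plain Python loops shows up.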

For example, it lets you calculate the median and mean of an array with a single line of code each:

np.median(ages)
np.mean(ages)
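The two lines above assume that ages already exists. A self-contained version could look like this, with a made-up list of ages standing in for real data:

```python
import numpy as np

# Hypothetical ages, invented purely for illustration
ages = np.array([23, 35, 31, 42, 29])

median_age = np.median(ages)  # middle value once the ages are sorted
mean_age = np.mean(ages)      # sum of the ages divided by their count

print(median_age)  # 31.0
print(mean_age)    # 32.0
```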

How to import

First off, you need to install Numpy, but only if you’re not using Anaconda, which already includes it. To do so:

pip install numpy

By convention, Numpy is imported as np. It’s not required, just tacitly agreed on.

import numpy as np

Pandas

Pandas is an open-source Python library that gives you a highly useful set of tools for data analysis. Learning Pandas is a must for stepping up your Machine Learning game. It is used not only for data analysis but also for data science, Machine Learning and more. To put it simply: if it uses data, you’re going to need Pandas. It can help you load, prepare, merge, join, reshape, analyze, process and adjust data in the blink of an eye.

Purpose

As mentioned above, Pandas is an open-source library that gives you easy-to-use data structures and data analysis tools for the Python programming language. Pandas is structured around DataFrame objects. Your data is loaded into a DataFrame, from which you can select samples or apply other data manipulations as needed.

Some other fancy things Pandas lets you do are:

  • Reading and writing data between in-memory data structures and different formats such as CSV, text files, Microsoft Excel files, SQL databases, …
  • High-performance merging and joining of data sets
  • Data alignment and integrated handling of missing data
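The three points above can be sketched in a few lines. Everything in this example is hypothetical: the names, ages and cities are invented, and the CSV is read from an in-memory string rather than a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; Bob's age is deliberately left empty
csv_text = """name,age
Ann,34
Bob,
Cleo,29
"""

# Reading: load the CSV text into a DataFrame
people = pd.read_csv(io.StringIO(csv_text))

# Missing data: count the gaps, then fill them with the column mean
missing = people["age"].isna().sum()
people["age"] = people["age"].fillna(people["age"].mean())

# Merging: join a second (hypothetical) table on the shared "name" column
cities = pd.DataFrame({
    "name": ["Ann", "Bob", "Cleo"],
    "city": ["Ghent", "Antwerp", "Bruges"],
})
merged = people.merge(cities, on="name", how="left")

# Writing: turn the result back into CSV text
csv_out = merged.to_csv(index=False)
print(merged)
```

Filling the gap with the mean is just one possible choice here; depending on the data, dropping the row or using another value may make more sense.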

How to import

First off, you need to install Pandas, but only if you’re not using Anaconda, which already includes it. To do so:

pip install pandas

By convention, Pandas is imported as pd. It’s not required, just tacitly agreed on.

import pandas as pd

SciKit Learn

SciKit Learn is the go-to library for Machine Learning. It started in 2007 as a Google Summer of Code project by David Cournapeau. The name came from:

SciPy Toolkit

Purpose

Just like Pandas and Numpy, it’s a Python library, but SciKit Learn is more specific to Machine Learning. SciKit Learn includes everything from dataset manipulation to computing metrics. One of the best things about SciKit Learn is its built-in Machine Learning algorithms, which you can try out with minimal adjustments. Tools for classification, regression, clustering, model selection and more are built in.

How to import

SciKit Learn requires Python, Numpy and SciPy. For plotting (functions that start with “plot_”) you’ll also need to install Matplotlib. Installing with pip pulls in the required dependencies automatically.

As with the other libraries, you only need to install SciKit Learn yourself if you’re not using Anaconda. To do so:

pip install scikit-learn

As you usually don’t need the whole library, you can easily import just a fraction of it:

from sklearn import tree
# imports the tree module, which contains the decision tree models
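As a rough idea of what you can do once the tree module is imported, here is a minimal sketch that trains a decision tree classifier. The toy data (heights, weights and labels) is invented for illustration and far too small for a real model.

```python
from sklearn import tree

# Hypothetical toy data: [height_cm, weight_kg] with a made-up label 0 or 1
X = [[150, 50], [160, 60], [180, 80], [190, 90]]
y = [0, 0, 1, 1]

# Create the classifier and train it on the toy data
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predict the label for two new, unseen samples
prediction = clf.predict([[155, 55], [185, 85]])
print(prediction)  # [0 1]
```

Most SciKit Learn models follow this same pattern: create the model, call fit on the training data, then call predict on new data.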

The great thing about Numpy, Pandas and Scikit Learn is that they all work together. A common workflow is to load, clean and manipulate your data using Pandas, then convert your Pandas DataFrame into a Numpy array and feed it to Scikit Learn function(s). Often this conversion happens automatically, so you won’t need to worry about it.
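Here is a minimal sketch of that workflow from start to finish. The hours and score columns are hypothetical, and linear regression is only chosen because it keeps the example short; any other Scikit Learn model could take its place.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied and the resulting score, with one gap
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, None],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Pandas: clean the data by dropping the row with the missing value
df = df.dropna()

# Pandas to Numpy: convert the columns into arrays
X = df[["hours"]].to_numpy()  # 2D array with one column per feature
y = df["score"].to_numpy()    # 1D array with the target values

# Scikit Learn: fit a model on the arrays and predict a new value
model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[5.0]]))
print(pred)  # approximately [10.]
```

The explicit to_numpy() calls are there to show the conversion step. Passing the DataFrame columns straight to fit also works, which is the automatic conversion mentioned above.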

So that’s it! I hope this helps anyone who struggles with understanding Numpy, Pandas and Scikit Learn. Make sure to follow me for more problems and solutions I come across within React Native and Machine Learning!


Trying to figure things out while writing about it. Pixel-perfect friend, front-end developer and anything data-related geek