Visualizing NBA Player Statistics

I haven’t written a post in a while. I had a lot to do for university, and my hobbies, like recreational programming and blogging, tend to suffer during those times. But now I have found some time, and I’ll be adding smaller posts every now and then.

In the Machine Learning course I am taking at university, I could use matplotlib to plot my functions for the homework submissions, so I have become more familiar with coding plots and graphs in Python since my last post about matplotlib. Now I wanted to prepare some interactive plots for my blog and show you what I have been able to create so far.

Web Scraping for Dummies

First, I wanted to find some interesting data to display. I decided to collect my own data instead of taking an already publicly available data set, since this can easily be done in Python. For a step-by-step guide on how to scrape data from a web page that is generated on the server side, where you can find your data directly in the HTML code, I recommend chapter 11 of Automate the Boring Stuff with Python by Al Sweigart. The go-to module for this type of web scraping is BeautifulSoup, which makes it easy to find certain tags in an HTML file.
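For server-rendered pages the data is already in the HTML source, so scraping boils down to walking the tags. The sketch below shows the idea on a made-up table snippet. To keep it dependency-free, it swaps BeautifulSoup for Python's built-in html.parser module; the SAMPLE_HTML string and the CellCollector class are hypothetical and only for illustration. With BeautifulSoup installed, soup.find_all("td") would do the same job with less code.

```python
from html.parser import HTMLParser

# Hypothetical server-rendered HTML: the data is already in the page source.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Player</th><th>PTS</th></tr>
  <tr><td>Player A</td><td>25.1</td></tr>
  <tr><td>Player B</td><td>19.4</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # only keep text that sits inside a <td> cell
        if self._in_cell:
            self.rows[-1].append(data.strip())


parser = CellCollector()
parser.feed(SAMPLE_HTML)
# the header row only has <th> cells, so it stays empty and is dropped here
table = [row for row in parser.rows if row]
print(table)  # [['Player A', '25.1'], ['Player B', '19.4']]
```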

So I decided that I wanted to collect the stats of all currently active NBA players. Luckily, there is a blog post by Greg Reda that explains exactly how this can be done in Python. This approach to web scraping is different, since a lot of newer sites build the web page on the client side. So you have to find the URL of the request the page sends to the server. The response you then get is often a JSON object, which you can parse for the information you want.

The stats.nba.com web page is generated on the client side, so the latter approach was necessary. I first collected the person_ids of every NBA player in the database and then checked their roster status to see whether the player is still actively playing in the NBA (here is the URL for the player list). This is what my code for this task looks like:

import requests
import csv
import sys

# get all players in the database (IsOnlyCurrentSeason=0), active or not
url_allPlayers = ("http://stats.nba.com/stats/commonallplayers?IsOnlyCurrentSeason"
                  "=0&LeagueID=00&Season=2015-16")

# request the URL and parse the JSON
response = requests.get(url_allPlayers)
response.raise_for_status()
players = response.json()['resultSets'][0]['rowSet']

# use the roster status flag (column 2) to check if a player is still active
active_players = [player for player in players if player[2] == 1]

ids = [player[0] for player in active_players]

print("Number of Active Players: " + str(len(ids)))
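Because the response is plain JSON, the filtering step can be tried out offline. The payload below is made up and only mimics the nesting that the code above indexes into: a list under "resultSets" whose first entry holds a "rowSet" of rows, with the person ID in column 0 and the roster status flag in column 2. The header names and the helper active_player_ids are hypothetical placeholders, not the real API's.

```python
import json

# Made-up response body; only the nesting mirrors the real request.
SAMPLE_RESPONSE = """
{"resultSets": [{"headers": ["ID", "NAME", "STATUS"],
                 "rowSet": [[101, "Player, One", 1],
                            [102, "Player, Two", 0],
                            [103, "Player, Three", 1]]}]}
"""


def active_player_ids(payload):
    """Return the IDs (column 0) of rows whose status flag (column 2) is 1."""
    players = payload["resultSets"][0]["rowSet"]
    return [player[0] for player in players if player[2] == 1]


ids = active_player_ids(json.loads(SAMPLE_RESPONSE))
print("Number of Active Players: " + str(len(ids)))  # Number of Active Players: 2
```

With the real request, response.json() already returns the parsed dictionary, so json.loads() is only needed here because the sample is a string.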


Legofy and Sliding Images

I subscribe to the PyCoder’s Weekly newsletter, where I stumbled upon JuanPotato’s Python program Legofy. Legofy is a program that takes an image and converts it so that it looks like it was made from Lego blocks. This intrigued me, because I couldn’t picture what that meant until I looked at his example and finally tried the program out myself.

I also learned a new way of downloading projects from GitHub and installing them directly with pip: you just add “git+” in front of the URL of the Git repository. That means the command for downloading and installing Legofy looks as follows:

 pip install git+https://github.com/JuanPotato/Legofy.git

This works like every other pip install, which is really helpful to know for the future and a great time-saver.

I then wrote a quick and dirty script that takes the file path to an image as a command line argument and converts the image into a Lego image. I only check whether the number of command line arguments matches and didn’t bother checking whether the image exists before calling legofy.main(), partly because legofy.main() has its own error message when the given image doesn’t exist and partly because this is only meant to be a short example script:

import legofy
import os
import sys

# Check the number of command line arguments
if len(sys.argv) != 2:
    print("Wrong Arguments --- Your command should have the following structure: python test.py <file path to image>")
    sys.exit(1)  # non-zero exit status signals the usage error

# resolve both paths relative to the directory this script lives in
here = os.path.abspath(os.path.dirname(__file__))
legofy.main(os.path.join(here, sys.argv[1]), os.path.join(here, "brick.png"))

You also need “brick.png”, which can be found in JuanPotato’s repository at legofy/bricks/brick.png, in the same directory as the Python script above for the program to work (or you can change the code to match your file path to brick.png).


Down the rabbit hole: Gambling and Online Poker

In the segment “Down the rabbit hole”, I’ll tell you about different subjects that have piqued my interest over the last few days and led me to consume all kinds of media about them. I will also share links to my favorite articles about the given subject.

Movie: Mississippi Grind

Mississippi Grind Movie Poster

Mississippi Grind, a movie starring Ryan Reynolds and Ben Mendelsohn, was just released. Mendelsohn plays a gambling addict in his mid-forties who spends most of his time at the poker tables of the local casinos in Iowa. He owes a lot of money to a lot of people and generally doesn’t enjoy his current lifestyle. Then he meets Reynolds, who is pretty much on a lifelong road trip, never quite sure where he is supposed to be. Reynolds and Mendelsohn hit it off at their first meeting and quickly decide to take a road trip together, riding along the Mississippi River and hitting the local casinos on the way.

I found the relationship between the two gambling buddies very interesting. Gambling is the only thing the two really have in common. It is also easier to convince yourself to take a stupid bet when you have a sidekick cheering you on, and at that point it’s hard to know when to stop. Mississippi Grind also addresses the bad habits that can come with gambling addiction, from emotional outbursts after a hard loss to lying and stealing just to get enough money for the minimum bet at the table.


My first time using matplotlib

I wanted to learn a little bit more about data science and machine learning algorithms, and one of the most commonly used data sets for introducing the topic is the iris data set. The iris data set contains 150 instances, each labeled with the kind of iris plant it belongs to: iris setosa, iris versicolor or iris virginica. There are 50 instances of each class in the data set. Each instance describes the plant’s sepal length, sepal width, petal length and petal width. Given only these 4 values, a classifier should be able to accurately predict which kind of plant an instance is.

Here is the iris data set I used for the following plot. It differs from the data set in the UC Irvine Machine Learning Repository only by an additional first line in the CSV file describing what each column signifies.

This is what the beginning of the CSV file looks like when you open it in a text editor:

sepal-length,sepal-width,petal-length,petal-width,classification
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
⋮ , ⋮ , ⋮ , ⋮ , ⋮

Matplotlib

I decided to use the iris data set to learn more about how to use matplotlib. Matplotlib is an open source project that lets you create 2D plots in Python. You can create many different kinds of charts and graphs, and even animations. This is very useful, since a good visualization of a data set can help you recognize, with your “naked eye”, which features are most relevant for determining the classification.

Luckily, matplotlib offers an extensive gallery of examples on its website, showing the image or animation each piece of code produces, which made it easy for me to find what I was looking for. This is what I was able to produce in my first hour of using matplotlib:

Scatter Graph of the Iris data set

Here’s the complete code:

import numpy as np
import matplotlib.pyplot as plt

# Columns of iris.csv:
#   1. sepal-length
#   2. sepal-width
#   3. petal-length
#   4. petal-width
#   5. classification (Iris-setosa, Iris-versicolor, Iris-virginica)

# skip_header=1 skips the first line, which only describes the columns
lengths = np.genfromtxt('../iris.csv', delimiter=',', skip_header=1,
                        usecols=(0, 1, 2, 3))
# dtype=None infers a string type; encoding='utf-8' returns str instead of bytes
flower_type = np.genfromtxt('../iris.csv', delimiter=',', skip_header=1,
                            usecols=4, dtype=None, encoding='utf-8')

# one color per class
colors = {"Iris-setosa": "red",
          "Iris-versicolor": "green",
          "Iris-virginica": "blue"}

# plot all instances of a class at once: sepal length (x) vs. sepal width (y)
for flower, color in colors.items():
    mask = flower_type == flower
    plt.scatter(lengths[mask, 0], lengths[mask, 1], s=100.0, c=color,
                alpha=1, edgecolor="none", label=flower.replace("-", " ").lower())

plt.title("The Iris Data Set", fontsize=18)
plt.xlabel('sepal length', fontsize=15)
plt.ylabel('sepal width', fontsize=15)

# the legend entries come from the label= of each scatter call
plt.legend()
plt.grid(True)

plt.show()

I also used NumPy for its genfromtxt() function, which offers an easy way to read data from a CSV file. Notice that I skipped the first line of the CSV file, since that line only describes the columns. The rest of the code is pretty self-explanatory and not too complicated.
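To see what skipping the header line does without the full data file, here is a small sketch that reads three sample rows in the same format as iris.csv from an in-memory string via io.StringIO. It also shows one way to get str labels instead of the bytes values that genfromtxt(dtype=None) returns by default for a text column: passing encoding="utf-8". The SAMPLE_CSV string is only illustrative.

```python
import io

import numpy as np

# A few rows in the same format as iris.csv, including the header line.
SAMPLE_CSV = """sepal-length,sepal-width,petal-length,petal-width,classification
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# skip_header=1 drops the description line; usecols keeps the numeric columns
features = np.genfromtxt(io.StringIO(SAMPLE_CSV), delimiter=",",
                         skip_header=1, usecols=(0, 1, 2, 3), encoding="utf-8")
# dtype=None lets NumPy infer a string type for the class column
labels = np.genfromtxt(io.StringIO(SAMPLE_CSV), delimiter=",",
                       skip_header=1, usecols=4, dtype=None, encoding="utf-8")

print(features.shape)  # (3, 4)
print(labels)
```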

We can see that there are two clusters in the scatter plot. Apart from a few outliers, the iris setosa instances cluster in the top left of the graph. The iris versicolor and iris virginica instances (green and blue, respectively) are mixed together in one bigger cluster. This shows us that sepal width and sepal length are good features for determining whether an instance is an iris setosa, but they wouldn’t perform very well at differentiating between iris versicolor and iris virginica.

This was a short introduction to how a good visualization of a data set can help you decide how to approach a problem. It also helps you better understand what the data looks like, and matplotlib is a helpful module for producing good-looking graphs.

Project Euler 21: Amicable Numbers

Amicable Numbers

Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

I’m finally back to writing for my blog after this long hiatus. My absence was due to deadlines for school projects as well as studying for my exams. Now that those are over and summer break has started, I can focus more on polishing my blog and posting more regularly again.

I’ve decided to continue with the Project Euler problems, and this post shows my approach to Problem 21. Since my last post, it seems that someone from the outside gained access to the Project Euler user database, which led the developers to temporarily disable user accounts. You can still check whether your solution is correct, so this doesn’t stop us from continuing with our work.

Problem 21 revolves around amicable numbers. These are pairs of numbers where the sum of the proper divisors of each number (the divisors that divide it evenly and are smaller than the number itself) equals the other number. This has to hold for both numbers in the pair, with the extra condition that the pair has to consist of two different numbers. Here’s an example: the sum of the proper divisors of 220 is 1 + 2 + 4 + 5 + 10 + 11 + 20 + 22 + 44 + 55 + 110 = 284. Now let’s look at the sum of the proper divisors of 284: 1 + 2 + 4 + 71 + 142 = 220. This makes {220, 284} an amicable pair, and both 220 and 284 are amicable numbers.
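The example can be checked with a short helper. This sketch (the function name sum_proper_divisors is my own) pairs each divisor d up to the square root of n with its partner n // d, which is a common shortcut and not the approach used in the code of this post.

```python
def sum_proper_divisors(n):
    """Sum of the divisors of n that are smaller than n (assumes n >= 2)."""
    total = 1  # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            partner = n // d
            if partner != d:  # don't count a square root twice
                total += partner
        d += 1
    return total


# The pair from the example above
print(sum_proper_divisors(220))  # 284
print(sum_proper_divisors(284))  # 220
```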

The solution is then the sum of all amicable numbers under 10000. My approach is straightforward: I calculate the sum of the proper divisors of every number under 10000 and then search through this list of numbers and divisor sums for amicable pairs. This is not the most clever way to find the solution, but relying on formulas that generate amicable pairs isn’t very reliable. For example, there is Euler’s rule, which produces amicable pairs from a certain set of inputs, but it isn’t clear how to produce specific pairs within a certain range. It is also not clear whether Euler’s rule can produce all amicable pairs. This is why I decided to take a more standard approach.

I also read in a text file containing all prime numbers below 10000 to save some time when searching for proper divisors, since the only proper divisor of a prime number is 1.
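The primes1.txt file isn’t included in this post. If you don’t have it, a Sieve of Eratosthenes can build the same set of primes in memory. This is a swapped-in sketch, and the function name primes_below is hypothetical; its result could be passed to findAmicables in place of readNums("primes1.txt").

```python
def primes_below(limit):
    """Return the set of all primes smaller than limit (Sieve of Eratosthenes)."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # every multiple of n from n*n upward is composite
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return {n for n, prime in enumerate(is_prime) if prime}


primes = primes_below(10000)
print(len(primes))  # 1229
```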

So here’s the code:

"""Sum of all amicable numbers under 10000"""
import math
import time as tm


def readNums(doc, limit=10000):
    """
    Read the whitespace-separated numbers of a txt file
        args:
            doc - name of the txt file
            limit - numbers above this value are ignored
        Return:
            set of the numbers as integers
    """
    with open(doc) as f:
        nums = [int(s) for s in f.read().split()]
    return {n for n in nums if n <= limit}


def findAmicables(primes, limit):
    """
    Find and sum up all proper divisors of a certain range of numbers
        args:
            primes - set of prime numbers (their only proper divisor is 1)
            limit - upper limit for integer range to analyze
        Return:
            list with all amicable numbers
    """
    # list of [number, sum of proper divisors]; start with divisor 1
    nums = [[i, 1] for i in range(1, limit)]

    # list of found amicable numbers
    amicables = []

    # find divisors
    for pair in nums:
        number = pair[0]
        # a prime's only proper divisor is 1, so skip the search
        if number in primes:
            continue
        for i in range(2, math.ceil(number / 2) + 1):
            if number % i == 0:
                pair[1] += i

    # check for amicable pairs
    for i in range(len(nums)):
        for j in range(len(nums)):
            if nums[i][1] == nums[j][0] and nums[i][0] == nums[j][1]:
                amicable_1 = nums[i][0]
                amicable_2 = nums[i][1]

                # a number paired with itself is not amicable
                if amicable_1 == amicable_2:
                    break

                if amicable_1 not in amicables:
                    amicables.append(amicable_1)
                    print("New Amicable: " + str(amicable_1))
                if amicable_2 not in amicables:
                    amicables.append(amicable_2)
                    print("New Amicable: " + str(amicable_2))

    return amicables

#-----------------------------------------------------#

# start process time measure (time.clock() was removed in Python 3.8)
t0 = tm.process_time()
# start wall time measure
t1 = tm.time()

primes = readNums("primes1.txt")

amicables = findAmicables(primes, 10000)

print("Result: " + str(sum(amicables)))
print("Seconds process time " + str(tm.process_time() - t0))
print("Seconds wall clock time " + str(tm.time() - t1))

#-----------------------------------------------------#

#-----------------------------------------------------#

My program produces the following output:

Problem 21 Output

My solution is a little slow at 26 seconds, but in my opinion it suffices for this problem. I’ll try to post more frequently in the future and also vary the types of posts, since I think my Project Euler solutions are getting a little boring. I will also be going on holiday soon, so there will be another short hiatus.
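One way to speed this up, sketched here as an alternative rather than as the approach above: compute every divisor sum at once with a sieve-style loop (add each d to all of its multiples), then look up each number’s partner directly instead of scanning the whole list a second time in a nested loop. The function name amicable_sum is hypothetical. Like the list-based version, it skips partners at or above the limit.

```python
def amicable_sum(limit):
    """Sum of all amicable numbers below limit."""
    # divisor_sum[n] ends up as the sum of the proper divisors of n
    divisor_sum = [0] * limit
    for d in range(1, limit // 2 + 1):
        for multiple in range(2 * d, limit, d):
            divisor_sum[multiple] += d

    total = 0
    for n in range(2, limit):
        partner = divisor_sum[n]
        # n is amicable if its partner maps back to n and differs from n
        if partner != n and partner < limit and divisor_sum[partner] == n:
            total += n
    return total


print(amicable_sum(10000))
```

The divisor loop does roughly limit × ln(limit) additions instead of trial-dividing every number, and the second pass is a single lookup per number.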

Have a nice Sunday!

Project Euler 20: Factorial Digit Sum

Factorial Digit Sum

Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

Problem 20 asks you to calculate the digit sum of 100!, the factorial of 100, which equals 1 * 2 * 3 * … * 99 * 100. We already calculated the digit sum of another large integer in a previous Project Euler problem (Problem 16), so I’m going to keep this post rather short.

Since Python integers have arbitrary precision, we don’t have to worry about 100! overflowing, so we can simply calculate 100! and store it in a regular variable. We are going to use the same trick as in Problem 16 to calculate the digit sum. We use the built-in function sum(), which returns the sum of the items of an iterable, and the iterable we pass in consists of the individual digits of 100!. These are produced by another built-in function, map(), which applies a given function, in our case int(), to every item of an iterable. That iterable is str() of 100!, the decimal string of the number, which can be treated as a sequence of single characters. This chain of function calls ultimately gives us the digit sum.
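The standard library’s math.factorial() computes the same product, so the whole solution also fits in one expression. This sketch uses a generator expression instead of map(), which is purely a style choice; the function name factorial_digit_sum is my own.

```python
import math


def factorial_digit_sum(n):
    """Digit sum of n!: turn the number into a string and add up its digits."""
    return sum(int(digit) for digit in str(math.factorial(n)))


print(factorial_digit_sum(10))  # 27, since 10! = 3628800 and 3+6+2+8+8+0+0 = 27
print(factorial_digit_sum(100))
```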

So here’s the code:

"""Digit sum of 100!"""

def euler20():
    """Calculates solution of Problem 20"""
    counter = 1
    for i in range(2, 101):
        counter *= i
    counter = sum(map(int,str(counter)))
    print("Result: " + str(counter))

euler20()

Problem 20 was really easy to solve, especially since we already knew how to quickly calculate the digit sum of a number and didn’t have to worry about an overflow when storing 100!. The processing time was sufficient and not really worth discussing, considering how simple the solution is.

Have a nice evening!

Project Euler 19: Counting Sundays

Counting Sundays

First Sundays

Modern Gregorian Calendar

Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

Problem 19 asks you to count the number of Sundays that fell on the first day of a month during the twentieth century. This problem is rather easy to solve when using libraries that support date objects.

Solving this problem the complicated way, by counting up the total number of days in the century in an integer and manually figuring out whether a given Sunday is also the first day of a month, can take a lot of effort, because there are so many exceptions you have to pay attention to (the number of days in each month, leap years, …).
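To show what that manual route involves, here is a sketch of it (my own, not the solution of this post). It relies on two facts not spelled out above: the Gregorian leap-year rule, and that 1 January 1900 was a Monday, which the Project Euler problem statement provides as a starting point. The helper names are hypothetical.

```python
def is_leap(year):
    # Gregorian rule: divisible by 4, except centuries not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year, month):
    if month == 2:
        return 29 if is_leap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def count_first_sundays(first_year=1901, last_year=2000):
    # 1 Jan 1900 was a Monday; weekdays are numbered 0 = Monday ... 6 = Sunday
    weekday = 0
    count = 0
    for year in range(1900, last_year + 1):
        for month in range(1, 13):
            if year >= first_year and weekday == 6:
                count += 1
            # advance to the first day of the next month
            weekday = (weekday + days_in_month(year, month)) % 7
    return count


print(count_first_sundays())
```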

Date

Python offers a datetime module containing a date class, which represents a date in the Gregorian calendar. A date object also has a weekday() method that returns the day of the week that date falls on, and that is everything we need to solve this problem.
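A quick check of how weekday() numbers the days, using two dates whose weekdays are known: 1 January 1900 was a Monday and 31 December 2000 was a Sunday. The related isoweekday() method numbers the same days from 1 to 7 instead.

```python
from datetime import date

# weekday(): Monday == 0 ... Sunday == 6
print(date(1900, 1, 1).weekday())    # 0 (Monday)
print(date(2000, 12, 31).weekday())  # 6 (Sunday)

# isoweekday(): Monday == 1 ... Sunday == 7
print(date(2000, 12, 31).isoweekday())  # 7
```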

Code

My program starts at January 1st 1901 and from there checks the first day of every month up to December 2000. The method weekday() returns 6 if the given date is a Sunday (Monday is 0), so that’s when we increment our solution counter.

Here’s the code:

"""How many Sundays fell on the first of the month
during the twentieth century (1 Jan 1901 to 31 Dec 2000)?"""

from datetime import date

counter = 0
year = 1901
month = 1

curr_day = date(year, month, 1)

while curr_day.year < 2001:
    # weekday() returns 0 for Monday ... 6 for Sunday
    if curr_day.weekday() == 6:
        counter += 1
    # move on to the first day of the next month
    if month == 12:
        month = 1
        year += 1
    else:
        month += 1
    curr_day = date(year, month, 1)

print("Counter: " + str(counter))

The program needs 0.0149 seconds of processing time to calculate the solution.
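The same loop can also be written as a single generator expression over every (year, month) pair of the century. This is a stylistic alternative with the same logic, not a faster algorithm.

```python
from datetime import date

# Count the months between 1901 and 2000 whose first day is a Sunday (weekday 6)
first_sundays = sum(
    1
    for year in range(1901, 2001)
    for month in range(1, 13)
    if date(year, month, 1).weekday() == 6
)
print("Counter: " + str(first_sundays))
```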

This problem was really easy and short, but it showed that it is often smarter and faster to use existing libraries instead of reinventing the wheel by creating your own date class.

I hope it was a fun read. Don’t hesitate to send me suggestions for improving the algorithms via email at ratherreadblog@gmail.com or by leaving a comment on the post. You can also subscribe to my mailing list here if you want to stay up to date with my newest posts and changes to the website.


Project Euler 18: Maximum Path Sum I

sketch of my solution algorithm

Maximum Path Sum I

Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

This Project Euler problem asks you to find the largest sum that can be produced by starting at the top of a number triangle and repeatedly moving to one of the two adjacent numbers in the row below, adding up the numbers along the way. The triangle we are supposed to solve for this problem looks like this:

Number Triangle from Problem 18

There is a hint in the problem description mentioning that this number triangle (with 15 rows) has only 16384 possible routes from top to bottom. This number is small enough that the problem can be solved quickly by simply calculating and comparing the sum of each path, but later on, in Project Euler Problem 67, we get a triangle with one hundred rows, giving it 2^99 possible paths, which is far too many to check in any reasonable amount of time. So we are going to solve Problem 18 in a smart way right away, to make our lives easier when we arrive at Problem 67.

The Algorithm

There is an easy way to find the maximum path. We know that the maximum path has to go through the very top value, 75, since every possible path starts there. From the 75, every path continues through one of the two numbers in the second row. So the maximum path sum of the whole triangle is 75 plus the larger of two values: the maximum path sum starting at 95 (the first value in the second row) and the maximum path sum starting at 64 (the second value in the second row). The same reasoning applies to every number in every row until you reach the last row of the triangle, and our algorithm takes advantage of this.

Our algorithm starts in the second-to-last row of the triangle and replaces each number with the larger of two sums: itself plus the value below it on the left, or itself plus the value below it on the right (the sketch at the top of this post may help in understanding this step). The new values now represent the maximum path sums from each of these positions down to the last row.

We repeat this step row by row until we reach the top of the triangle, and, as the sketch shows, the top value is then replaced by the maximum path sum of the complete triangle. By following the larger of the two adjacent values from the top to the bottom of the triangle, you can also backtrack the path that yields this maximum.
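Here is a minimal sketch of this bottom-up step, assuming the triangle is given as a list of rows (lists of ints). The function name max_path is hypothetical, and the demo uses the small four-row example triangle from the Project Euler problem statement rather than the full 15-row triangle. Unlike the description above, it works on a copy so the input triangle is left unchanged, and it also backtracks the path by always stepping to the neighbour below with the larger sum.

```python
def max_path(triangle):
    """Return (maximum path sum, numbers on that path) for a number triangle."""
    # sums[r][i] = best sum from triangle[r][i] down to the last row
    sums = [row[:] for row in triangle]
    for r in range(len(sums) - 2, -1, -1):  # second-to-last row up to the top
        for i in range(len(sums[r])):
            sums[r][i] += max(sums[r + 1][i], sums[r + 1][i + 1])

    # backtrack: from the top, step to whichever neighbour below has the larger sum
    path, i = [triangle[0][0]], 0
    for r in range(1, len(triangle)):
        if sums[r][i + 1] > sums[r][i]:
            i += 1
        path.append(triangle[r][i])
    return sums[0][0], path


example = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]
print(max_path(example))  # (23, [3, 7, 4, 9])
```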


Project Euler 17: Number letter counts

Number letter counts


Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

To solve Project Euler Problem 17, you have to add up the number of letters needed to write out all natural numbers from 1 to 1000 in words. So you count the letters in “one”, “two”, “three” … “one-thousand”. Spaces and hyphens are not counted, meaning that “one-thousand” contains only eleven letters.
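A sketch of the counting in Python. One detail not mentioned above: the problem statement asks for British usage, so a number like 342 is written “three hundred and forty-two” (23 letters), including the “and”. The spell helper is hypothetical and leaves out spaces and hyphens from the start, so that len() counts only letters.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell(n):
    """Spell out 1 <= n <= 1000 in British English, without spaces or hyphens."""
    if n == 1000:
        return "onethousand"
    words = ""
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += ONES[hundreds] + "hundred"
        if rest:
            words += "and"  # British usage: "three hundred and forty-two"
    if rest < 20:
        words += ONES[rest]
    else:
        words += TENS[rest // 10] + ONES[rest % 10]
    return words


def letter_count(limit=1000):
    """Total number of letters used to write out 1..limit."""
    return sum(len(spell(n)) for n in range(1, limit + 1))


print(len(spell(342)))  # 23
print(letter_count(5))  # 19 ("one" through "five")
```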


Project Euler 16: Power digit sum

Power digit sum

Warning: Please only read this post when you are absolutely sure that you don’t want to take part in the challenge that is Project Euler. My posts will spoil the solution for you. I’m posting my solutions to show one possible way to solve the problems, in case you want to compare your solutions with mine or are stuck and have given up on solving the problem.

Problem 16 asks you to find the digit sum of 2^1000. This is really straightforward to solve and can be done in under 10 lines of code.

Digit sum

Digit Sum Calculation

The digit sum of a number is the sum of its individual digits. For example, the digit sum of 1284 is 1 + 2 + 8 + 4 = 15. The digit sum can be useful in some cases: if you want to know whether a large number is divisible by 3, you can simply calculate its digit sum, and the number is divisible by 3 exactly when its digit sum is.
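The digit sum and the divisibility-by-3 check can be sketched like this (the helper names are my own). The rule works because 10 leaves remainder 1 when divided by 3, so every power of 10 does too, and a number and its digit sum therefore leave the same remainder modulo 3.

```python
def digit_sum(n):
    """Add up the decimal digits of a non-negative integer."""
    return sum(int(digit) for digit in str(n))


def divisible_by_3(n):
    # n and its digit sum leave the same remainder mod 3, since 10 % 3 == 1
    return digit_sum(n) % 3 == 0


print(digit_sum(1284))       # 15
print(divisible_by_3(1284))  # True (1284 = 3 * 428)
print(digit_sum(2 ** 1000))
```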

Side note: here’s a Wikipedia article showing the different divisibility rules for divisors up to 20; some are really interesting, like the alternating sum rule for 7 and 13.
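One version of the alternating-sum rule for 7 and 13 works on blocks of three digits. Because 1001 = 7 * 11 * 13 and 1000 = 1001 - 1, a number and the alternating sum of its three-digit blocks (taken from the right) leave the same remainder modulo 7, 11 and 13. A short sketch, with hypothetical helper names:

```python
def alternating_block_sum(n):
    """Alternating sum of n's three-digit blocks, starting from the right."""
    total, sign = 0, 1
    while n > 0:
        n, block = divmod(n, 1000)
        total += sign * block
        sign = -sign
    return total


def divisible_by(n, d):
    # valid for d in (7, 11, 13): each divides 1001, and 1000 == 1001 - 1,
    # so n and its alternating block sum are congruent modulo d
    return alternating_block_sum(n) % d == 0


print(alternating_block_sum(1234567))  # 334, i.e. 567 - 234 + 1
print(divisible_by(1234567, 7))        # False
```

The sum may come out negative; Python’s % still returns a non-negative remainder for a positive divisor, so the check works either way.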
