Train Test Split

import pandas as pd
import matplotlib.pyplot as plt

Veri dosyasını indirmek için tıklayınız.

df = pd.read_csv("03a_carprices.csv")
df.head()

	Mileage	Age(yrs)	Sell Price($)
0	69000	6	18000
1	35000	3	34000
2	57000	5	26100
3	22500	2	40000
4	46000	4	31500

data=df.to_numpy()

data[:3]

array([[69000,     6, 18000],
       [35000,     3, 34000],
       [57000,     5, 26100]], dtype=int64)

milage=data[:,0]
age=data[:,1]
price=data[:,2]

plt.scatter(milage,price)
plt.show()

png

plt.scatter(age,price)
plt.show()

png

X = data[:,0:2]
y = data[:,2]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3) 

X_train.shape, y_train.shape

((14, 2), (14,))

X_test.shape, y_test.shape

((6, 2), (6,))

Linear regression

from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)

X_test

array([[82450,     7],
       [25400,     3],
       [69000,     5],
       [87600,     8],
       [46000,     4],
       [91000,     8]], dtype=int64)

reg.predict(X_test)

array([17224.19150145, 38357.0955021 , 21908.68515977, 15496.47763339,
       30601.10592035, 14169.86679672])

y_test

array([19400, 35000, 19700, 12800, 31500, 12000], dtype=int64)

# R2 score
reg.score(X_test, y_test)

0.9260837288108493

random_state argümanı ile her defasında aynı bölme işlemi sağlanabilir.

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=10)
X_test

array([[72000,     6],
       [83000,     7],
       [59000,     5],
       [52000,     5],
       [22500,     2],
       [87600,     8]], dtype=int64)

Kaynak:

https://www.youtube.com/watch?v=gmvvaobm7eQ&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
https://github.com/codebasics/py/tree/master/ML