Daily Post #3 6/10/2024
- Varun Vuppaladadiyam
- Jun 10, 2024
Howdy!
Had to start out a daily blog post with a howdy as a TAMU student... anyway, I've learned a lot more about Python today while studying ML! I had a lot of fun writing code that puts values into lists, like the following:
```python
y = [5, 50, 500, 5000]
l = []  # tripled values
k = []  # original values
for x in y:
    z = x * 3
    print("x: %d z: %d" % (x, z))
    l.append(z)
    k.append(x)
```
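```python
# get_mae, train_X, val_X, train_y and val_y are assumed to be defined earlier (not shown in this post)
MAE = []
MaxLeaf = []
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" % (max_leaf_nodes, my_mae))
    MAE.append(my_mae)
    MaxLeaf.append(max_leaf_nodes)
# A list can't be concatenated with a string, so print the two lists side by side instead
print(MaxLeaf, MAE)
```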
```python
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
mae = []  # the list has to exist before we can append to it
for max_leaf_nodes in candidate_max_leaf_nodes:
    mae.append(get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y))
```
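For comprehension practice, here is a minimal sketch of the same kind of loops written with list and dictionary comprehensions. It is my own practice version, not code from the course: the `tripled` and `smallest_key` names are made up for illustration, and the commented-out lines at the end assume the `get_mae` helper and the training/validation data from earlier, which are not defined here.

```python
y = [5, 50, 500, 5000]

# List comprehensions: build the same two lists the first loop built with append()
l = [x * 3 for x in y]
k = [x for x in y]  # list(y) would also work

# Dictionary comprehension: map each input value to its result
tripled = {x: x * 3 for x in y}

# min() over a dict iterates its keys; key=tripled.get ranks each key by its value
smallest_key = min(tripled, key=tripled.get)

print(l)             # [15, 150, 1500, 15000]
print(smallest_key)  # 5

# With the helper and data from earlier (assumed, not defined here), the same pattern
# would pick the leaf count with the lowest validation MAE:
# scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidate_max_leaf_nodes}
# best_tree_size = min(scores, key=scores.get)
```

The `min(d, key=d.get)` idiom is handy because it returns the key (the setting you tried), not the smallest value itself.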
Using for loops to iterate through values and build up a list of results to test was fun and served as a good check of my current skills. Unfortunately, the last two bits of code were for finding the number of leaf nodes that gives the lowest error, which called for dictionary comprehension, a category I'm lacking in. I need to get better at both dictionary and list comprehension, since they make code more concise and easier to read.
I also learned more about random states: random_state is a seed for the random number generator, so using the same value gives the same result every time the code runs. I learned more about splitting up data as well, which uses the following import and function call:
```python
from sklearn.model_selection import train_test_split
# X and y are assumed defined; random_state seeds the shuffle, while test_size (default 0.25) sets the split size
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
```
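To see why a fixed seed makes a split reproducible, here is a hand-rolled sketch using Python's standard `random` module. It only mimics the idea (shuffle the rows with a seeded generator, then cut off a validation fraction); it is not how scikit-learn implements `train_test_split`, and the `seeded_split` function is a hypothetical name of my own.

```python
import random

def seeded_split(rows, test_fraction=0.25, seed=1):
    """Shuffle a copy of rows with a seeded RNG, then split off a validation part."""
    rng = random.Random(seed)  # same seed -> same sequence of random choices
    shuffled = list(rows)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * test_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

rows = list(range(20))
train_a, val_a = seeded_split(rows, seed=1)
train_b, val_b = seeded_split(rows, seed=1)
print(val_a == val_b)            # True: same seed, same split
print(len(train_a), len(val_a))  # 15 5
```

Changing the seed will usually give a different split, and changing `test_fraction` is what changes how many rows end up in validation.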
I also learned about model validation, which needs only one function call:
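```python
from sklearn.metrics import mean_absolute_error
# val_predictions would come from model.predict(val_X); scikit-learn expects the true values first
mean_absolute_error(val_y, val_predictions)
```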
which finds the mean absolute error between your predictions and the actual values in the validation data. I also learned how to check for overfitting vs. underfitting, which, from what I'm seeing, calls for a deeper understanding of dictionary comprehension. From my understanding, the number of leaves with the lowest MAE is the number you should use, and you can set it in the decision tree constructor like this:
```python
from sklearn.tree import DecisionTreeRegressor
final_model = DecisionTreeRegressor(max_leaf_nodes=best_tree_size)  # best_tree_size: leaf count with the lowest MAE
```
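To make the metric concrete: MAE is the average of |actual − predicted| over the validation rows. Below is a minimal pure-Python sketch of that calculation. The `mae` function and the numbers are made-up illustration, not course data; for the same two lists it should agree with what `mean_absolute_error` returns.

```python
def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [200, 150, 300]
predicted = [210, 140, 330]
# Errors are 10, 10 and 30, so the MAE is 50 / 3 (about 16.67)
print(mae(actual, predicted))
```

Because every error counts by its plain size, a lower MAE on the validation set means the tree's predictions are, on average, closer to the real values, which is why the leaf count with the lowest MAE is the one to keep.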
Good day for Python skills, not a great day for everything else. I didn't have enough time today to learn more SAS, and I didn't get a chance to look through more of the ML and statistics textbooks. I need to start working earlier in the day and make better use of my time.
Goals for tomorrow:
Learn more SAS and get past lessons 4 and 5
Continue to learn more Python and ML
Read more about schools, since I didn't get a chance to continue my project using DiD
These goals really cover the whole week, but I want to make sure I make progress on each of them every day, no matter how little. I'm not disappointed in today, though! I've been working and studying more than I used to, and it'll take time to be as productive as I want to be. All I need to focus on is getting a bit more done each day. As long as I build a habit of getting work done every day, I'm happy!