Ever since I was a child, I have loved playing video games. I never needed anything crazy; a simple Mario platformer was enough to keep me engaged for weeks. As time went on and I learned how to program, I was immediately drawn to the idea of making a game. From creating my first simple platformers to building games in Unity, I began to develop my idea of a dream job as a game developer. However, as I grew older and did more research on the topic, I found a common debate among programmers about the "dream". Many claimed that was all it was: a dream, unlikely to become a reality. Others claimed that anyone could do it with some hard work. In this project, I aim to answer this question, or at least gain more insight into the game development dream. Is it realistic, or just the equivalent of a kid saying they want to fly to the moon?
To understand the market, we first need data. In this case, I am interested in two things: the game market as a whole, and which games were succeeding when. For this, I found a data set containing 16,598 observations, each describing a game, when it was released, its genre, and its sales. The file was downloaded and imported into the project, and the data is stored in a pandas dataframe.
# Importing Libraries
import requests
import pandas as pd
import numpy as np
# Creating Dataframe (filtering out incomplete year)
all_time = pd.read_csv("vgsales.csv")
all_time = all_time[all_time['Year'] <= 2015]
Once the data was gathered, it was important to check for any missing data.
print(f"Total Observations: {all_time.count().max()}")
print(f"Total Missing Observations: {all_time.count().max()-all_time.dropna().count().max()}")
p = ((all_time.count().max()-all_time.dropna().count().max())/all_time.count().max())*100
print(f"Percentage of missing Observations: {p}")
Total Observations: 15979
Total Missing Observations: 34
Percentage of missing Observations: 0.21277927279554415
In this case, as can be seen above, only 0.21 percent of data is missing, which is below the threshold of 5 percent. The missing data can then be ignored as it will not have a significant impact on the results.
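The same check can also be written by counting rows that contain at least one missing value. The following is a minimal sketch on a small made-up frame; the toy values and the names `toy`, `incomplete_rows`, and `percent_missing` are assumptions for illustration and are not taken from the sales data.

```python
import pandas as pd

# Toy frame standing in for the sales data (values are made up)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2001.0, None, 2003.0, 2004.0],
    "Publisher": ["X", "Y", None, "Z"],
})

# A row counts as incomplete if any of its columns is missing
incomplete_rows = int(toy.isna().any(axis=1).sum())
percent_missing = 100 * incomplete_rows / len(toy)

print(f"Incomplete rows: {incomplete_rows} of {len(toy)} ({percent_missing:.2f}%)")
```

Counting incomplete rows directly avoids relying on at least one column being fully populated, which the count-based version implicitly assumes.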
# Dropping missing data
all_time = all_time.dropna();
all_time.head()
(index) | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
To understand game development, it is important to first understand the market as a whole. Once the market is understood, we can begin to look at which games succeed and which don't. A useful first step is to study the trend of total game sales over time to see whether the market is thriving. This is done by first summing the data over each year.
# Summing all sales for each year (numeric columns only)
all_time.groupby('Year').sum(numeric_only=True).head()
Year | Rank | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|
1980.0 | 29826 | 10.59 | 0.67 | 0.00 | 0.12 | 11.38 |
1981.0 | 190488 | 33.40 | 1.96 | 0.00 | 0.32 | 35.77 |
1982.0 | 149186 | 26.92 | 1.65 | 0.00 | 0.31 | 28.86 |
1983.0 | 56759 | 7.76 | 0.80 | 8.10 | 0.14 | 16.79 |
1984.0 | 22911 | 33.28 | 2.10 | 14.27 | 0.70 | 50.36 |
# Importing matplotlib and plotting data
from matplotlib import pyplot as plt
# Summing numeric columns per year before plotting
yearly = all_time.groupby('Year').sum(numeric_only=True).reset_index()
plot = yearly.plot(x='Year', y='Global_Sales',
                   title = "Total Video Game Sales Over Time",
                   ylabel = "Total Sales",
                   figsize=(15, 5)
                   );
First, let's talk about total video game sales over time. The graph rises steadily and then falls sharply. The rise can be explained by the emergence of video games as a market. The upward trend peaks around 2008, a period known to many as the golden age of video games, when some of the best-selling consoles of all time, such as the Xbox 360, the Wii, and the PS3, were on the market. After 2008, however, there is a sharp decline. Some argue this was a result of the financial crisis that occurred at the time, but oversaturation of the gaming market seems a more likely cause: the decline lasts for almost a full decade, longer than the financial crisis alone would explain.
For further analysis, it is helpful to look at the sales graph for each platform, which could give some insight into which platforms would be best to release on. Only platforms relevant in recent years are explored.
# Displaying unique platforms
all_time["Platform"].unique()
array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA', '3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC', 'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16', '3DO', 'GG', 'PCFX'], dtype=object)
relevant = ['PC', 'PS4', 'XOne', 'WiiU', 'X360', 'Wii', 'DS', 'PS2', 'PS3', 'XB', 'GC']
plt.figure(figsize=(15, 5))
# Only plotting relevant platforms
for platform in relevant:
    # Summing numeric columns per year for this platform
    temp = all_time[all_time["Platform"] == platform].groupby('Year').sum(numeric_only=True).reset_index()
    temp = temp[temp['Year'] > 2000]
    plt.plot(temp['Year'], temp['Global_Sales'], label = platform)
plt.legend(bbox_to_anchor=(1.1, 1.05))
plt.xlabel("Year")
plt.ylabel("Sales")
plt.title("Game Sales per Year for Different Platforms");
plt.show()
A few important conclusions can be reached from this graph. First, consoles of the previous generation (Wii and Xbox 360) are on a downward trend, so creating a game for them would likely be a waste of time and money. Second, the Wii U has struggled: although it is a new console, it clearly does not match the popularity of the Xbox One or the PS4. Finally, PC sales in this data are very low. Although the PC has been around longer than many consoles and is the only platform that will remain relevant across generations, it does not come close to the game consoles.
Beyond the platforms it is released on, the game itself must be appealing. Although this data cannot identify the exact game that would do best, it can at least point us in a direction through the game's genre. For this next portion, it is helpful to view the trend of each genre's total game sales over time.
# Displaying unique genres
all_time["Genre"].unique()
array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc', 'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure', 'Strategy'], dtype=object)
plt.figure(figsize=(15, 5))
# Plotting all genres
for genre in all_time["Genre"].unique():
    # Summing numeric columns per year for this genre
    temp = all_time[all_time["Genre"] == genre].groupby('Year').sum(numeric_only=True).reset_index()
    plt.plot(temp['Year'], temp['Global_Sales'], label = genre)
plt.legend(bbox_to_anchor=(1.1, 1.05))
plt.xlabel("Year")
plt.ylabel("Sales")
plt.title("Game Sales per Year for Different Genres");
plt.show()
This graph once again offers a lot of insight. Most importantly, notice the recent domination of the market by action and shooter games. While the sports genre was clearly a top seller in the past, its popularity has dipped significantly. We can also see genres that sell consistently poorly, such as platformers, fighting games, strategy games, and adventure games. In recent years especially, these genres do extraordinarily badly.
Now that we understand the game market and its important variables, it is time to begin some exploratory analysis.
Taking a minute to return to the idea of the game development dream, I would like to mention that I have already released my first large-scale game. Sadly, after three years of development, the game did not do as well as I had hoped. When I began playtesting, I was told that although the game was fun, the genre and platform it was released on sealed its fate as being unsuccessful. This made me wonder: is there a way to predict a game's success purely based on its genre and platforms? If so, could I have saved myself three years of development by predicting the low success of my game? Can I predict the success of my next game idea before beginning development? Answering these questions is the goal of this portion of the project.
To begin, it would be helpful to find a more relevant and exhaustive dataset. For this, the RAWG API and the Python requests library were used to grab the 10,000 most popular games released between the start of 2015 and the end of 2019. This data is much more representative of current market trends and includes more information about game genres and platforms. The RAWG API offers a lot of data for each game, but only each game's title, release date, platforms, downloads (taken from the API's "added" field), rating, tags, and genres are collected. This will prove useful when creating our model.
# URL for request
URL = "https://api.rawg.io/api/games?page_size=20000&dates=2015-01-01,2019-12-31&ordering=-added"
# Setting api key for request
PARAMS = {'key' : "b084e7f4218f49f291800dfcca6a229c"}
r = requests.get(url = URL, params = PARAMS)
# Creating new dataframe and converting data to JSON
df = pd.DataFrame();
data = r.json();
try:
    # Loop to continue grabbing data until error is thrown
    while data['next']:
        # Gathering only relevant data and placing in dataframe
        for game in data['results']:
            row = pd.DataFrame({
                'Title': [game['name']],
                'Released': [game['released']],
                'Platforms': [None] if game['platforms'] is None else
                    [[platform['platform']['slug'] for platform in game['platforms']]],
                'Downloads': [game['added']],
                'Rating': [game['rating']],
                'Tags': [None] if game['tags'] is None else
                    [[tag['slug'] for tag in game['tags'] if tag['language'] == 'eng']],
                'Genres': [None] if game['genres'] is None else
                    [[genre['slug'] for genre in game['genres']]]
            })
            # The first row starts the dataframe; later rows are placed in front of it
            df = row if len(df.index) == 0 else pd.concat([row, df], ignore_index = True)
        data = requests.get(url = data['next']).json()
except KeyError:
    print("Done")
df.tail()
Done
(index) | Title | Released | Platforms | Downloads | Rating | Tags | Genres |
---|---|---|---|---|---|---|---|
9995 | Fallout 4 | 2015-11-09 | [pc, xbox-one, playstation4] | 10785 | 3.79 | [singleplayer, steam-achievements, atmospheric... | [action, role-playing-games-rpg] |
9996 | DOOM (2016) | 2016-05-13 | [pc, xbox-one, playstation4, nintendo-switch] | 10906 | 4.39 | [singleplayer, multiplayer, atmospheric, great... | [shooter, action] |
9997 | Red Dead Redemption 2 | 2018-10-26 | [pc, playstation4, xbox-one] | 12006 | 4.58 | [singleplayer, multiplayer, atmospheric, great... | [adventure, action] |
9998 | Life is Strange | 2015-01-29 | [pc, xbox-one, playstation4, ios, android, mac... | 12544 | 4.11 | [singleplayer, steam-trading-cards, atmospheri... | [adventure] |
9999 | The Witcher 3: Wild Hunt | 2015-05-18 | [pc, playstation5, xbox-one, playstation4, xbo... | 16021 | 4.67 | [singleplayer, atmospheric, full-controller-su... | [adventure, action, role-playing-games-rpg] |
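The collection loop above ends when a response no longer carries a 'next' key. As a hedged alternative, the sketch below follows 'next' links until they run out and still yields the items on the final page. The names `iter_results` and `fetch_page`, and the fake pages, are hypothetical and only illustrate the pattern; they are not part of the RAWG API or of the original code.

```python
def iter_results(first_page, fetch_page):
    """Yield every item from a chain of pages linked by a 'next' URL.

    first_page: dict with 'results' (list) and 'next' (URL or None)
    fetch_page: callable that takes a URL and returns the next page dict
    """
    page = first_page
    while True:
        # .get() avoids a KeyError if a response lacks the expected keys
        yield from page.get("results", [])
        next_url = page.get("next")
        if not next_url:
            break
        page = fetch_page(next_url)


# Fake three-page API, used only for illustration
fake_pages = {
    "page2": {"results": [{"name": "C"}], "next": "page3"},
    "page3": {"results": [{"name": "D"}], "next": None},
}
first = {"results": [{"name": "A"}, {"name": "B"}], "next": "page2"}

names = [game["name"] for game in iter_results(first, fake_pages.__getitem__)]
print(names)
```

With real requests, `fetch_page` could be something like `lambda url: requests.get(url).json()`, and the rows could be collected in a list and turned into a dataframe once at the end.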
print(f"Total Observations: {df.count().max()}")
print(f"Total Missing Observations: {df.count().max()-df.dropna().count().max()}")
print(f"Percentage of missing Observations: {((df.count().max()-df.dropna().count().max())/df.count().max())*100}")
Total Observations: 10000
Total Missing Observations: 1
Percentage of missing Observations: 0.01
Once the data had been collected, as should always be done, the data was checked for any missing entries. In this case it can be observed that less than 5% of the data was missing, and for this reason the missing data will be ignored.
Now begins the process of preparing the data for our model. With the goal being to classify a game as successful or not based on its genres and platforms, a classification model is the natural fit. One popular classifier is the decision tree, and that is the model used for this portion of the project. Now that a model has been chosen, the data must be altered so it can be used: all inputs must be converted to numeric variables, while the output must be converted to a discrete label.
To begin this process, the list-valued categorical inputs will be converted to numeric 0/1 columns using one-hot encoding. A helper function for this is created below, which returns all unique values found in a given column.
# Dropping missing data and shuffling the rows
df = df.dropna().sample(frac=1);
# Function designed to return all unique elements of a column
def get_unique(s):
    ret = [];
    # Each entry is a list (e.g. of platforms), so flatten them into one list
    for index, value in s.items():
        ret.extend(value);
    return np.unique(np.array(ret));
Using this function, it is easy to visualize the unique possibilities for each category. Below, the unique possibilities of the tag, genre, and platform categories are displayed.
unique_platforms = get_unique(df['Platforms']);
unique_tags = get_unique(df['Tags']);
unique_genres = get_unique(df['Genres']);
print(f"Unique Platforms:\n {unique_platforms}\n");
print(f"Unique Tags:\n {unique_tags}\n");
print(f"Unique Genres:\n {unique_genres}\n");
Unique Platforms:
 ['android' 'gamecube' 'genesis' 'ios' 'linux' 'macintosh' 'macos' 'nes' 'nintendo-3ds' 'nintendo-ds' 'nintendo-switch' 'pc' 'playstation1' 'playstation2' 'playstation3' 'playstation4' 'playstation5' 'ps-vita' 'psp' 'sega-master-system' 'web' 'wii' 'wii-u' 'xbox-old' 'xbox-one' 'xbox-series-x' 'xbox360']

Unique Tags:
 ['1-bit' '16-bit' '1960s' ... 'zelda-like' 'zelda-style' 'zombies']

Unique Genres:
 ['action' 'adventure' 'arcade' 'board-games' 'card' 'casual' 'educational' 'family' 'fighting' 'indie' 'massively-multiplayer' 'platformer' 'puzzle' 'racing' 'role-playing-games-rpg' 'shooter' 'simulation' 'sports' 'strategy']
One can notice from the above that there are far too many tags to be useful during classification, so the tags variable will be ignored to avoid overfitting the model. For the next step, the platform and genre data for each game will be converted to numeric 0/1 columns using one-hot encoding.
# Creating array of empty arrays
X = [[] for _ in range(df.count().max())]
# Looping through each column of interest
for column in ['Platforms', 'Genres']:
    unique_set = get_unique(df[column]);
    # Looping through each unique element of that column
    for unique in unique_set:
        # Adding a one to array if element present otherwise zero
        for idx, elem in enumerate(df[column]):
            if np.isin(unique, elem).any():
                X[idx].append(1)
            else:
                X[idx].append(0)
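The encoding step can be illustrated on a tiny example. The sketch below is a simplified, pure-Python version of the same idea, not the code used in this project; the `one_hot` helper, the four-platform vocabulary, and the three sample games are made up for illustration.

```python
def one_hot(values, vocabulary):
    """Return a list of 0/1 flags, one per vocabulary entry."""
    present = set(values)
    return [1 if item in present else 0 for item in vocabulary]


# Hypothetical vocabulary and games, for illustration only
platform_vocab = ["linux", "macos", "pc", "xbox-one"]
games = [["pc"], ["pc", "macos", "linux"], ["xbox-one", "pc"]]

encoded = [one_hot(platforms, platform_vocab) for platforms in games]
for row in encoded:
    print(row)
```

Each game becomes one row with a column per known platform; in the project, the genre columns are appended after the platform columns in the same way.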
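Once the input data was successfully encoded, it was time to create the output data. For this, rating is used to determine whether or not the game was well received. The ratings are split into two groups, with the higher-rated group labeled "Liked" and the lower-rated group labeled "Disliked". This is done below using the pandas "qcut" method. Note that the call requests quartiles (q=4) and relies on duplicates='drop' to merge bins whose edges coincide, which happens when many games share the same rating. Two bins remain, but the cut point is then not guaranteed to sit at the median, so the two groups may differ in size.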
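To see how "qcut" behaves when many values are tied, here is a small sketch with made-up ratings (these are not the RAWG ratings). With q=4 and duplicates='drop', quartile edges that coincide are merged, so fewer than four bins can remain and the resulting groups need not be equal in size.

```python
import pandas as pd

# Made-up ratings with many ties at zero
ratings = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 3, 4], dtype=float)

# The quartile edges are 0, 0, 0, 1.75, 4; dropping duplicates leaves two bins
labels = pd.qcut(ratings, q=4, duplicates="drop", labels=["Disliked", "Liked"])

counts = labels.value_counts()
print(counts)
```

Running `value_counts()` on the real label column would show how the two classes are actually balanced, which matters when interpreting accuracy later.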
# Creating copy of dataframe and converting to discrete
disc = df.copy();
disc['Rating'] = pd.qcut(disc['Rating'], q=4, duplicates='drop', labels = ['Disliked', 'Liked'])
disc.head()
(index) | Title | Released | Platforms | Downloads | Rating | Tags | Genres |
---|---|---|---|---|---|---|---|
8459 | Scrap Mechanic | 2016-01-19 | [pc] | 514 | Liked | [singleplayer, multiplayer, atmospheric, steam... | [indie, adventure, action, simulation] |
655 | Isbarah | 2015-02-25 | [pc, macos, linux] | 43 | Disliked | [singleplayer, steam-achievements, steam-tradi... | [indie, action] |
1572 | Cursed Town | 2018-05-25 | [pc] | 52 | Disliked | [singleplayer, steam-achievements, rpg, story-... | [adventure, role-playing-games-rpg] |
2229 | Tori | 2018-04-19 | [pc, macos] | 58 | Disliked | [singleplayer] | [casual, indie, adventure] |
2013 | Final Warrior Quest | 2018-04-12 | [pc] | 56 | Disliked | [singleplayer, full-controller-support, rpg, c... | [indie, role-playing-games-rpg] |
Now the data is prepared for training. To test the model's accuracy, K-fold cross-validation with k = 10 will be used, via the KFold class built into sklearn. The decision tree model also comes from the sklearn library.
from sklearn.model_selection import KFold
from sklearn import tree
from sklearn.metrics import accuracy_score
# Gathering data
y = disc.copy()['Rating'].to_numpy();
print(y[1])
X = np.array(X);
# Splitting data
kf = KFold(n_splits=10)
kf.get_n_splits(X)
avg_acc = 0;
# Testing and training each fold
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Creating and training model
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X_train, y_train)
    # Printing and saving accuracy
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(acc, end = " | ")
    avg_acc += acc;
# Printing average accuracy
print(f"\nAverage Accuracy: {100*(avg_acc/10)}%")
Disliked
0.792 | 0.797 | 0.79 | 0.802 | 0.773 | 0.786 | 0.804 | 0.791 | 0.785 | 0.7947947947947948 |
Average Accuracy: 79.14794794794796%
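The last fold's score has a different shape from the others because the folds are not all the same size. As a sketch, assuming 9,999 rows remain after dropping the one incomplete observation, the fold sizes can be worked out with the rule scikit-learn documents for KFold: the first n_samples % n_splits folds get one extra sample. The `fold_sizes` helper is hypothetical and written only for this illustration.

```python
def fold_sizes(n_samples, n_splits):
    """Sizes of the folds: the first n_samples % n_splits folds get one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


sizes = fold_sizes(9999, 10)
print(sizes)

# A score with 999 test rows, e.g. 794 correct predictions
print(794 / 999)
```

Nine folds of 1,000 rows give scores with three decimals, while the final fold of 999 rows gives a repeating decimal.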
After training, the model was 79.15% accurate on average. How strong that is depends on how evenly the two labels are split, so it is best judged against the accuracy of always predicting the more common label. This addresses the first question proposed at the beginning of this section: is there a way to predict a game's success purely based on its genre and platforms? Based on these results, the answer seems to be yes. With the model's accuracy measured, it can now be used to answer the other two questions.
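One way to put an accuracy figure in context is to compare it with a majority-class baseline, that is, the accuracy of a model that always predicts the most common label. The sketch below uses made-up labels; `majority_baseline_accuracy` is a hypothetical helper and the 7-to-3 split is an assumption for illustration, not the split in the RAWG data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up labels: 7 of 10 are "Disliked"
toy_labels = ["Disliked"] * 7 + ["Liked"] * 3
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline: {baseline:.2f}")
```

Applied to the real label array, the same function would give the figure that the cross-validated accuracy needs to beat.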
As mentioned previously, I want to test my game idea on this model. The game I developed is called "Stretchy-Man" and is a puzzle platformer available only on PC. However, I can't help but wonder how the success of the game would change if it had been created by a larger AAA studio. In that case, the game would have been available on more platforms and would no longer be categorized as an indie game.
The below code will convert those two game ideas into input data for the model.
# Original game in form of data
stretch = {'Platforms': ['pc'], 'Genres': ['platformer', 'puzzle', 'indie']}
stretch_cont = [];
# AAA version in form of data
new = {'Platforms': ['pc', 'xbox-one', 'playstation4', 'nintendo-switch'], 'Genres': ['platformer', 'puzzle']}
new_cont = [];
# One-Hot encoding
for column in ['Platforms', 'Genres']:
    unique_set = get_unique(df[column]);
    for unique in unique_set:
        if np.isin(unique, stretch[column]).any():
            stretch_cont.append(1)
        else:
            stretch_cont.append(0)
        if np.isin(unique, new[column]).any():
            new_cont.append(1)
        else:
            new_cont.append(0)
Now that we have the games in the correct format for input, the model will be trained on all available data and then will be used to predict the likability of both games.
# Fitting with all data
clf = clf.fit(X, y)
# Predicting success of both games
a = clf.predict([stretch_cont, new_cont])
print(f"Result of Stretchy-Man (indie): {a[0]}");
print(f"Result of Stretchy-Man (AAA): {a[1]}");
Result of Stretchy-Man (indie): Disliked
Result of Stretchy-Man (AAA): Liked
As expected, the AAA version of the game resulted in a more liked game according to the model. In addition, it turns out I may have been able to save myself the effort of creating my game by testing it on this model.
However, these results raise a question: does a liked game necessarily mean a more successful game? That is, does a game's rating affect the number of downloads the game will receive? To answer this, it will first be helpful to visualize the data.
Now that we are able to predict a game's rating based on its features, another question is raised. Can we predict a game's success based on its rating? In this case we will be quantifying success by looking at the number of downloads a game gets after release. To begin, let's try to visualize the data to see if there are any visible trends between ratings and downloads. For this, the rating data is going to be converted into a discrete variable. Ratings will be binned based on the integers they lie between.
# Creating copy of data and converting the rating to discrete once again
disc = df.copy()
disc['Rating'] = pd.cut(disc['Rating'], [0, 1, 2, 3, 4, 5],
labels=['0-1', '1-2', '2-3', '3-4','4-5'],
include_lowest = True)
disc.head()
(index) | Title | Released | Platforms | Downloads | Rating | Tags | Genres |
---|---|---|---|---|---|---|---|
8459 | Scrap Mechanic | 2016-01-19 | [pc] | 514 | 4-5 | [singleplayer, multiplayer, atmospheric, steam... | [indie, adventure, action, simulation] |
655 | Isbarah | 2015-02-25 | [pc, macos, linux] | 43 | 0-1 | [singleplayer, steam-achievements, steam-tradi... | [indie, action] |
1572 | Cursed Town | 2018-05-25 | [pc] | 52 | 0-1 | [singleplayer, steam-achievements, rpg, story-... | [adventure, role-playing-games-rpg] |
2229 | Tori | 2018-04-19 | [pc, macos] | 58 | 0-1 | [singleplayer] | [casual, indie, adventure] |
2013 | Final Warrior Quest | 2018-04-12 | [pc] | 56 | 0-1 | [singleplayer, full-controller-support, rpg, c... | [indie, role-playing-games-rpg] |
Now, all games in each rating range will have their downloads averaged. A bar plot will be created to visualize the relationship between average downloads and rating.
# Displaying bar graph (averaging only the Downloads column)
temp = disc.groupby('Rating')['Downloads'].mean().reset_index();
plt.figure(figsize=(10, 5.5))
plt.bar(temp['Rating'], temp['Downloads'], color=['indianred', 'orange', 'yellow', 'greenyellow', 'lightgreen'])
plt.xlabel("Rating Range")
plt.ylabel("Average Downloads")
plt.title("Rating Effect on Average Downloads");
plt.show()
Looking at this graph, the relationship between average downloads and rating becomes clear: games with higher ratings tend to do better in terms of downloads. This makes intuitive sense, since people may be more likely to download a game that has positive reviews.
Now that a relationship has been established, let's attempt to fit a model to it. As we hope to predict a continuous variable from a continuous variable, and the data above looks quite linear, a linear regression seems fitting. For this, the sklearn Python library will be used.
# Importing linear regression model from sklearn
from sklearn.linear_model import LinearRegression
# Fitting to original data (ratings reshaped into a single-column 2D array)
reg = LinearRegression().fit(df['Rating'].to_numpy().reshape(-1, 1), df['Downloads'])
print(f"m = {reg.coef_[0]}")
print(f"b = {reg.intercept_}")
m = 201.51801452096663
b = 58.16598706508057
Above are the results of fitting the linear regression model. As was predicted, the linear regression resulted in a line with a positive slope. With this model, a game with a rating of zero is predicted to get about 58 downloads, while a game with a rating of 5 is predicted to get about 1066 downloads (201.518 × 5 + 58.166 ≈ 1065.8). Now that we have a linear model, it may be interesting to plot it alongside the data.
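The predictions quoted for particular ratings come from evaluating the fitted line, downloads = m × rating + b. A short sketch that plugs in the coefficients printed above; `predict_downloads` is a hypothetical helper name, and the rating of 2.5 is just an extra illustrative input.

```python
# Coefficients reported by the fit above
m = 201.51801452096663
b = 58.16598706508057


def predict_downloads(rating):
    """Evaluate the fitted line at a given rating."""
    return m * rating + b


for rating in (0, 2.5, 5):
    print(f"rating {rating}: about {predict_downloads(rating):.0f} downloads")
```

The same values could be obtained from the fitted model with `reg.predict`, passing the ratings as a single-column 2D array.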
plt.figure(figsize=(11, 5))
# Plotting data points
plt.scatter(df['Rating'], df['Downloads'],c="lightcoral", s=30, alpha=0.2)
# Plotting line
x = np.linspace(0,5,100)
plt.plot(x, reg.coef_[0]*x+reg.intercept_, c = "#1f77b4", linewidth=2)
plt.xlabel("Rating")
plt.ylabel("Downloads")
plt.title("Effect of Rating on Downloads");
plt.show()
Here we can see the data and the model. Once again, a positive correlation can be seen. However, there are some interesting traits of this graph that should be noted. Notice how low the linear regression line sits on the graph. This can be explained by the density of points: most games are packed along the bottom of the graph, with far fewer near the top. With this in mind, the points very high up on the graph can be looked at as outliers. However, because the line is so low, it brings into question the accuracy of the linear model. Below, the statsmodels Python library will be used to quantify the model's fit.
import statsmodels.api as sm
# Running another regression
X = df['Rating'].to_numpy().reshape(-1, 1)
est = sm.OLS(df['Downloads'], sm.add_constant(X))
print(est.fit().summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              Downloads   R-squared:                       0.149
Model:                            OLS   Adj. R-squared:                  0.149
Method:                 Least Squares   F-statistic:                     1757.
Date:                Mon, 16 May 2022   Prob (F-statistic):               0.00
Time:                        17:43:53   Log-Likelihood:                -81003.
No. Observations:                9999   AIC:                         1.620e+05
Df Residuals:                    9997   BIC:                         1.620e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         58.1660     10.816      5.378      0.000      36.965      79.367
x1           201.5180      4.807     41.918      0.000     192.094     210.942
==============================================================================
Omnibus:                    11818.495   Durbin-Watson:                   1.999
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          1538866.816
Skew:                           6.265   Prob(JB):                         0.00
Kurtosis:                      62.470   Cond. No.                         3.35
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
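Here, the p-value of each coefficient is displayed, along with the F-statistic. Because both p-values are below the 5% threshold and the F-statistic is significant, the null hypothesis of no relationship between downloads and rating is rejected. The R-squared of 0.149, however, shows that rating alone explains only about 15% of the variation in downloads, so the relationship is statistically significant but weak as a predictor.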
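The R-squared value reported in the summary (0.149) measures the share of variance in downloads that the fitted line accounts for. As a rough sketch of how that number is defined, the code below fits a least-squares line to five made-up points and computes R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The `simple_ols` helper and the data points are assumptions for illustration only.

```python
def simple_ols(xs, ys):
    """Least-squares slope, intercept, and R-squared for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up points, not the RAWG data
slope, intercept, r_squared = simple_ols([0, 1, 2, 3, 4], [1, 3, 2, 5, 4])
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

An R-squared near 1 means the line tracks the data closely, while a value near 0 means most of the variation is left unexplained.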
I would like to take a second to return to the idea of the game development dream. What can be concluded about its practicality? When it comes to a game's success, there are two very important variables to take into account: the popularity of its genre and of its platforms. In this analysis, these factors helped predict a game's rating, and rating in turn was related to downloads. But suppose you do come up with a game concept that receives 5-star ratings. What should you expect in terms of success? According to the above analysis, one can expect only about 1066 downloads.
So what does this say about the dream? Statistically, it is unrealistic. One should not drop out of college in hopes of striking gold in the form of a successful game. If game development is truly one's passion, one should consider joining a larger studio. However, as long as it does not get in the way of other work, and as long as it is still one's passion, it cannot hurt to keep the dream alive a little longer in the hope that your game is an outlier.
First Data Set: https://www.kaggle.com/datasets/gregorut/videogamesales
Second Data Set: https://rawg.io/apidocs