Help needed with sentiment analysis code: unexpected results!

**BarryA** · Jul 25 '23, 11:47 AM

Hello everyone,

I'm hoping you can help me with a problem in my sentiment analysis project. I've been trying to apply a simple sentiment analysis model to a dataset of movie reviews, but I'm getting surprising results. Here's the relevant section of my code:

Code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Load the movie reviews dataset
data = pd.read_csv('movie_reviews.csv')

# Preprocess the data
# ... (code for data preprocessing)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data['review'], data['sentiment'], test_size=0.2, random_state=42)

# Vectorize the text data using CountVectorizer
vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train_vectorized, y_train)

# Evaluate the model
accuracy = model.score(X_test_vectorized, y_test)
print(f"Accuracy: {accuracy}")
```

When I run the code, the accuracy is consistently around 50%, which is essentially no better than random guessing on a two-class problem. I suspect there's a problem with how I'm vectorizing the text or training the model, but I can't figure out what's causing it.
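One way to separate a pipeline bug from a data problem might be to run the same vectorizer and model on a tiny, obviously separable toy set. The sketch below assumes nothing about `movie_reviews.csv`; the toy reviews, the labels, and the `max_iter=1000` setting are made up for illustration. If this toy run scores well while the real data stays near 50%, the labels or the preprocessing step are the more likely suspects.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, not taken from movie_reviews.csv
toy_reviews = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "loved the wonderful soundtrack",
    "terrible movie, hated it",
    "awful acting and a boring story",
    "hated the boring plot",
]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Same two steps as in the post, chained into a single estimator
toy_pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
toy_pipeline.fit(toy_reviews, toy_labels)

# Accuracy on the training data itself; a working pipeline should fit
# such clearly separable examples
toy_train_accuracy = toy_pipeline.score(toy_reviews, toy_labels)
print(f"Toy training accuracy: {toy_train_accuracy}")
```

This only shows that the vectorizer and model can learn from clean input; it says nothing about how well they generalize.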

I verified the dataset and read more about it in this article, and it appears to load successfully with both the 'review' and 'sentiment' columns. I also tried a simple Naive Bayes classifier, but it didn't improve the accuracy much.
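A loaded file can still hide problems such as missing reviews, duplicates, or a class balance that makes 50% the majority-class baseline. Here is a minimal sketch of checks that could surface these, assuming the 'review' and 'sentiment' column names from the code above. The helper name `summarize_dataset` and the four-row example frame are hypothetical and stand in for the real CSV.

```python
import pandas as pd


def summarize_dataset(df, text_col="review", label_col="sentiment"):
    """Return basic sanity statistics for a labeled text dataset."""
    # Share of each label among the non-missing labels
    label_share = df[label_col].value_counts(normalize=True)
    return {
        "n_rows": len(df),
        "missing_text": int(df[text_col].isna().sum()),
        "missing_labels": int(df[label_col].isna().sum()),
        "duplicate_reviews": int(df[text_col].duplicated().sum()),
        # Accuracy obtained by always predicting the most common label
        "majority_baseline": float(label_share.max()),
    }


# Hypothetical stand-in for pd.read_csv('movie_reviews.csv')
example = pd.DataFrame({
    "review": ["good film", "bad film", "good film", None],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

summary = summarize_dataset(example)
print(summary)
```

If `majority_baseline` on the real data is close to 0.5, a model scoring around 50% is doing no better than always predicting one class.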

Could you review the code and let me know if you spot any mistakes or improvements that might raise the accuracy of my sentiment analysis model?

Thank you in advance for your help!