Naive Bayes - Multinomial Classification
['alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc']
From: dmcgee@uluhe.soest.hawaii.edu (Don McGee)
Subject: Federal Hearing
Originator: dmcgee@uluhe
Organization: School of Ocean and Earth Science and Technology
Distribution: usa
Lines: 10
Fact or rumor....? Madalyn Murray O'Hare an atheist who eliminated the
use of the bible reading and prayer in public schools 15 years ago is now
going to appear before the FCC with a petition to stop the reading of the
Gospel on the airways of America. And she is also campaigning to remove
Christmas programs, songs, etc from the public schools. If it is true
then mail to Federal Communications Commission 1919 H Street Washington DC
20054 expressing your opposition to her request. Reference Petition number
2493.
To use this data, we need to convert the content of each string into a vector of numbers using the TF-IDF vectorizer.
Pipeline (fitted)
Parameters
steps | [('tfidfvectorizer', ...), ('multinomialnb', ...)]
transform_input | None
memory | None
verbose | False

TfidfVectorizer
Parameters
input | 'content'
encoding | 'utf-8'
decode_error | 'strict'
strip_accents | None
lowercase | True
preprocessor | None
tokenizer | None
analyzer | 'word'
stop_words | None
token_pattern | '(?u)\b\w\w+\b'
ngram_range | (1, ...)
max_df | 1.0
min_df | 1
max_features | None
vocabulary | None
binary | False
dtype |
norm | 'l2'
use_idf | True
smooth_idf | True
sublinear_tf | False

MultinomialNB
Parameters
alpha | 1.0
force_alpha | True
fit_prior | True
class_prior | None
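The step names `tfidfvectorizer` and `multinomialnb` in the output above are the ones `make_pipeline` generates automatically, so a pipeline like this one could plausibly be built as sketched below. The training sentences and the two labels are hypothetical placeholders, not the 20 Newsgroups data, and both estimators are left at their defaults.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set; the labels are illustrative only.
train_docs = [
    "prayer and bible reading in church",
    "the gospel and christmas songs",
    "graphics card and image rendering",
    "rendering 3d images on the screen",
]
train_labels = ["religion", "religion", "graphics", "graphics"]

# make_pipeline names each step after its lowercased class name.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

# The pipeline vectorizes new text with the fitted vocabulary, then classifies it.
predicted = model.predict(["reading the bible at christmas"])
print(list(model.named_steps))  # step names, matching the display above
print(predicted)
```

Wrapping both steps in one pipeline means raw strings can be passed straight to `fit` and `predict`, and the vectorizer is fitted only on the training text.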
