The next article's about Text Mining 😂
Analysis in text mining proceeds in stages: collect the data, then extract the features to be used.
Applying text mining processes yields data patterns, trends, and potential knowledge extracted from text data. That's just a glimpse of what text mining is about.
Let's discuss an important part of text mining: text pre-processing.
Stages of text pre-processing:
- Case folding converts the entire text of a document into a standard form, usually lowercase.
- Tokenizing cuts the input string into the words that make it up; in other words, it divides sentences into tokens.
- Filtering keeps the important words from the tokenizing results. It can be done by removing the words on a stoplist (stopwords), that is, the less important words.
- Stemming returns each word left after filtering to its base form by removing prefixes and suffixes, so that the root word is obtained.
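The four stages above can be sketched in plain Python. This is only an illustration, not the implementation used later in this article: the stopword set `STOPWORDS`, the suffix list `SUFFIXES`, and the function names are all hypothetical, and the stemmer is a toy that only strips a few English suffixes.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "of", "and", "to", "in"}

# Naive suffix list; a real stemmer (e.g. Porter) applies far more careful rules.
SUFFIXES = ("ing", "ed", "s")


def case_fold(text):
    """Case folding: convert the whole text to lowercase."""
    return text.lower()


def tokenize(text):
    """Tokenizing: strip ASCII punctuation, then split on whitespace."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()


def filter_tokens(tokens):
    """Filtering: drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Stemming (toy version): strip the first matching suffix,
    but only if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the four stages in order and return the stemmed tokens."""
    tokens = filter_tokens(tokenize(case_fold(text)))
    return [naive_stem(t) for t in tokens]


print(preprocess("The miners collected the Documents and extracted features."))
# ['miner', 'collect', 'document', 'extract', 'feature']
```

Keeping each stage as its own small function makes it easy to swap the toy parts for a real stopword list and stemmer later.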
Implementation
1. Import Library
import string
from openpyxl import load_workbook
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from Sastrawi.StopWordRemover.StopWordRemoverFactory import StopWordRemoverFactory
from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
stemmer = StemmerFactory().create_stemmer()
remover = StopWordRemoverFactory().create_stop_word_remover()
translator = str.maketrans('', '', string.punctuation)
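To see what the `translator` table does on its own, here is a small standalone sketch. The sample strings are made up for illustration. Note that `string.punctuation` only covers ASCII punctuation, so typographic characters such as the curly quotes “ ” or the ellipsis … pass through untouched.

```python
import string

# Translation table that deletes every ASCII punctuation character.
translator = str.maketrans('', '', string.punctuation)

sample = "Hello, world! It's 2-for-1."
print(sample.translate(translator))
# Hello world Its 2for1

# Curly quotes and the ellipsis character are not in string.punctuation,
# so they are left in place.
curly = "“Kapan… kamu”"
print(curly.translate(translator))
# “Kapan… kamu”
```

If the corpus contains such characters, they would need to be added to the third argument of `str.maketrans` to be removed as well.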
2. Load the Data
data
3. Build a Custom Library
porter = PorterStemmer()
stop = set(stopwords.words('english'))

def stemming(text):
    # Lowercase, drop English stopwords, strip punctuation, then Porter-stem each remaining word
    words = [w for w in text.lower().split() if w not in stop]
    cleaned = ' '.join(words).translate(translator)
    return ' '.join(porter.stem(w) for w in cleaned.split())

def preprocessing(text):
    # Indonesian pass (Sastrawi stopword removal and stemming), then the English pass above
    text = text.lower()
    text_clean = remover.remove(text)
    text_stem = stemmer.stem(text_clean)
    return stemming(text_stem)
4. Preprocessing
preprocessed = []
for dt in data['Corpus']:
    preprocessed.append(preprocessing(dt))
preprocessed
Dataset samples:
Suara seorang perempuan terdengar dari arah luar rumah. Dari caranya memanggil, bisa terlihat jelas sifat dan wataknya. “Kapan kamu mau melunasi utangmu? Sudah lebih dari enam bulan kontrakan rumah belum dibayar! Utangmu di warungku juga sudah numpuk! Janjinya bulan depan… bulan depan… bulan depannya lagi! Aku sudah muak dengan janji-janjimu!”
Ketika gagang pintu ditarik dan pintunya bergeser membuka dan membentuk sudut enam puluh derajat, Bu Rumi sudah berdiri tepat di tengah pintu. Seperti ratu kuntilanak menyeramkan. Mirip setan keorangan_bukan orang kesetanan_. Mukanya merah marah. Dua orang berkaos hitam ketat di samping kirinya. Badan kekar. Lengan penuh tato. Yang satu plontos, yang satunya rambut cepak mirip AKABRI masuk desa.
“Ummi, uang ini akan lebih bermanfaat untuk keluarga Bu Fatimah. Mungkin tidak akan ada lagi bulan depan untukku. Mohon pengertiannya, Mi. Ummi juga sudah dengar kata dokter dua minggu lalu.’’ Fatih menatap lekat-lekat Umminya. Mencoba memastikan. Memberikan pengertian sekaligus agar diberi izin memberikan uang yang sudah dipegangnya.
Text preprocessing results: