I'm using NLTK to perform k-means clustering on my text file, in which each line is considered a document. So, for example, my text file is something like this:

    belong finger death punch
    rules bands follow performing jägermeister stage

Now the demo code I'm trying to run is this:

    import sys
    from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
    ...
    word in title_components and not word in stopwords
    ...
    # cluster = KMeansClusterer(5, euclidean_distance)
    cluster.cluster()
    # NOTE: This is inefficient, cluster.classify should really just be
    # called when you are classifying previously unseen examples!
    cluster.classify(vectorspaced(title)) for title in job_titles
    for cluster_id, title in sorted(zip(classified_examples, job_titles)):

The error I receive is this:

    Traceback (most recent call last):
    File "cluster_example.py", line 20, in get_words
    File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
    File "cluster_example.py", line 14, in normalize_word
    File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.

Indeed, this has not been working for a while. There are a few tricky unicode places to handle correctly. I faced a few of those while porting to Python 3 (in the fetch result call) and will see if it is possible to fix at some point.

Any unicode value, whether in the method, the url, or the keys/values of the headers dict, will cause the entire body of the message to be 'promoted' to a unicode string.

The "UnicodeDecodeError: 'ascii' codec can't decode byte" error occurs when the ascii codec is used to decode bytes that were encoded using a different codec, that is, bytes outside the ASCII range. To fix this, specify the correct codec when opening the file or when calling the decode method.
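The decoding failure and both fixes can be reproduced in a few lines. This is only a minimal sketch: it assumes the text file is UTF-8 encoded (the original post does not say which encoding was used), and the temporary file stands in for the real input file. It uses `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Bytes as they would sit on disk in a UTF-8 encoded text file (assumption).
raw = u"rules bands follow performing jägermeister stage".encode("utf-8")

# Decoding with the ascii codec fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Fix 1: call decode with the codec the bytes were actually written in.
text = raw.decode("utf-8")

# Fix 2: pass the encoding explicitly when opening the file.
fd, path = tempfile.mkstemp(suffix=".txt")  # hypothetical stand-in file
os.close(fd)
try:
    with io.open(path, "wb") as handle:
        handle.write(raw + b"\n")
    with io.open(path, "r", encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]
finally:
    os.remove(path)
```

With the lines read as unicode text up front, later steps such as stemming receive text rather than raw bytes, so no implicit ascii decoding needs to happen.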
Subject: Re: Bug859294: hunspell-dict-ko: FTBFS: UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 38: ordinal not in range(128)
Date: Mon, 04:13:29 +0900

OK, another encoding parameter is needed.