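How can TensorFlow be used to vectorize the text data associated with the Stack Overflow question dataset using Python?
TensorFlow is a machine learning framework provided by Google. It is an open-source framework used together with Python to implement algorithms, deep learning applications, and more. It is used in both research and production. It includes optimization techniques that help perform complex mathematical operations quickly.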
This is because it uses NumPy and multi-dimensional arrays. These multi-dimensional arrays are also known as 'tensors'. The framework supports working with deep neural networks. It is highly scalable and comes with many popular datasets. It uses GPU computation and automates the management of resources. It comes with a multitude of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and create applications that predict relevant characteristics of the respective datasets.
The 'tensorflow' package can be installed on Windows using the line of code below:
pip install tensorflow
A tensor is a data structure used in TensorFlow. It helps connect edges in a flow diagram. This flow diagram is known as the 'data flow graph'. Tensors are nothing but multidimensional arrays or lists.
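To make the "multidimensional array or list" description concrete, here is a minimal pure-Python sketch that treats a nested list as a tensor and computes its shape. The helper name shape_of is hypothetical and is not part of TensorFlow; it also assumes the nested list is rectangular. In TensorFlow itself, a tensor exposes this information through its shape attribute.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper for illustration only; not a TensorFlow function.
    """
    shape = []
    current = nested
    # Walk down the first element of each level, recording its length.
    while isinstance(current, list):
        shape.append(len(current))
        if not current:
            break
        current = current[0]
    return tuple(shape)


scalar = 7                        # rank 0: a single value
vector = [1, 2, 3]                # rank 1: a list of values
matrix = [[1, 2, 3], [4, 5, 6]]   # rank 2: a list of lists

print(shape_of(scalar))  # ()
print(shape_of(vector))  # (3,)
print(shape_of(matrix))  # (2, 3)
```

The same idea explains the shapes printed later in this article: shape=() is a single string or label, while shape=(1, 250) is one row of 250 integers.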
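We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.
Example
Following is the code snippet that vectorizes the text data: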
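# Assumes `import tensorflow as tf`, and that raw_train_ds, int_vectorize_layer
# and binary_vectorize_text were created in the earlier steps of the tutorial.
print("The vectorize function is defined")
def int_vectorize_text(text, label):
    # Add a trailing dimension before passing the text to the layer
    text = tf.expand_dims(text, -1)
    return int_vectorize_layer(text), label

print("A batch of the dataset is retrieved")
text_batch, label_batch = next(iter(raw_train_ds))
# Take the first question and its label from the batch
first_question, first_label = text_batch[0], label_batch[0]
print("Question is:", first_question)
print("Label is:", first_label)
print("'binary' vectorized question is:", binary_vectorize_text(first_question, first_label)[0])
print("'int' vectorized question is:", int_vectorize_text(first_question, first_label)[0])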
Code credit: https://www.tensorflow.org/tutorials/load_data/text
Output
The vectorize function is defined
A batch of the dataset is retrieved
Question is: tf.Tensor(b'"function expected error in blank for dynamically created check box when it is clicked i want to grab the attribute value.it is working in ie 8,9,10 but not working in ie 11,chrome shows function expected error..<input type=checkbox checked=\'checked\' id=\'symptomfailurecodeid\' tabindex=\'54\' style=\'cursor:pointer;\' onclick=chkclickevt(this); failurecodeid=""1"" >...function chkclickevt(obj) { . alert(obj.attributes(""failurecodeid""));.}"\n', shape=(), dtype=string)
Label is: tf.Tensor(2, shape=(), dtype=int32)
'binary' vectorized question is: tf.Tensor([[1. 1. 1. ... 0. 0. 0.]], shape=(1, 10000), dtype=float32)
'int' vectorized question is: tf.Tensor(
[[  37  464   65    7   16   12  879  262  181  448   44   10    6  700
     3   46    4 2085    2  473    1    6  156    7  478    1   25   20
   156    7  478    1  499   37  464    1 1846 1666    1    1    1    1
     1    1    1    1    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0]], shape=(1, 250), dtype=int64)
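Explanation
The binary mode returns an array that indicates which tokens are present in the text.
In the int mode, each token is replaced by an integer.
This way, the order of the tokens is preserved.
The vectorize function is defined.
A sample of the data is vectorized, and the 'binary' and 'int' vectorized versions are displayed on the console.
The string that an integer corresponds to can be looked up by calling the 'get_vocabulary' method on that particular layer.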
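The difference between the two modes, and the vocabulary lookup, can be illustrated with a minimal pure-Python sketch. This is not how TensorFlow implements its vectorization layers. The toy vocabulary, the whitespace tokenizer, and the names binary_vectorize, int_vectorize and lookup are all assumptions made for illustration. The sketch follows the convention visible in the output above, where 0 is used for padding and 1 for tokens outside the vocabulary.

```python
# Hypothetical toy vocabulary: index 0 is padding, index 1 is out-of-vocabulary.
VOCAB = ["", "[UNK]", "error", "function", "in", "checkbox"]
TOKEN_TO_INDEX = {token: index for index, token in enumerate(VOCAB)}


def tokenize(text):
    """Lowercase the text and split it on whitespace (a simplification)."""
    return text.lower().split()


def binary_vectorize(text):
    """Return one slot per vocabulary entry: 1.0 if the token occurs, else 0.0."""
    vector = [0.0] * len(VOCAB)
    for token in tokenize(text):
        vector[TOKEN_TO_INDEX.get(token, 1)] = 1.0
    return vector


def int_vectorize(text, sequence_length):
    """Replace each token by its index, keeping order, then pad or truncate."""
    indices = [TOKEN_TO_INDEX.get(token, 1) for token in tokenize(text)]
    indices = indices[:sequence_length]
    return indices + [0] * (sequence_length - len(indices))


def lookup(index):
    """Map an integer back to its string, as a vocabulary list allows."""
    return VOCAB[index]


question = "function expected error checkbox function"
print(binary_vectorize(question))   # [0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
print(int_vectorize(question, 8))   # [3, 1, 2, 5, 3, 0, 0, 0]
print(lookup(3))                    # function
```

The binary vector has a fixed length equal to the vocabulary size and discards word order and repetition, while the int vector keeps the tokens in sequence and pads the remainder with zeros, which matches the long run of zeros in the 'int' output shown earlier.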