How can TensorFlow and Python be used to build a ragged tensor from a list of words?

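A RaggedTensor of words can be built from the start offsets of the words in a sentence. The offsets (word_starts) mark where each word begins in the flat list of character code points. First, the code points are regrouped into one row per word and the result is displayed on the console. Then the number of words in each sentence is computed from the word-start flags.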
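
As a minimal sketch of the idea, the snippet below builds a RaggedTensor from a flat list of values and a list of start offsets. The sample values and the names `values`, `row_starts`, and `words` are illustrative assumptions and are not part of the original code.

```python
import tensorflow as tf

# Illustrative input: code points of "Hi" followed by "yo", stored as one flat list.
values = tf.constant([72, 105, 121, 111])
# Offsets into `values` where each word begins: "Hi" at 0, "yo" at 2.
row_starts = tf.constant([0, 2], dtype=tf.int64)

# Each row runs from its start offset up to the next start offset (or the end).
words = tf.RaggedTensor.from_row_starts(values=values, row_starts=row_starts)
print(words.to_list())  # [[72, 105], [121, 111]]
```

Because only the start offsets are stored, rows can have different lengths without any padding.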

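This article shows how to represent Unicode strings in Python and manipulate them with the Unicode equivalents of the standard string ops. The Unicode strings are first split into tokens based on script detection, and the start offsets of those tokens are what the RaggedTensor is built from.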
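
The script-detection step could look roughly like the sketch below, which follows the approach in the TensorFlow Unicode tutorial credited later in this article. The input sentences are an assumption chosen to match the output shown further down.

```python
import tensorflow as tf

# Assumed sample input, chosen to match the output shown in this article.
sentence_texts = [u'Hello, there.', u'世界こんにちは']

# Decode each sentence into Unicode code points: shape [num_sentences, (num_chars)].
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, 'UTF-8')
# Look up the script identifier of every character.
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)

# A character starts a word if it is the first character of its sentence
# or if its script differs from the script of the previous character.
sentence_char_starts_word = tf.concat(
    [tf.fill([sentence_char_script.nrows(), 1], True),
     tf.not_equal(sentence_char_script[:, 1:], sentence_char_script[:, :-1])],
    axis=1)

# Offsets of the word starts in the flattened list of characters.
word_starts = tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)
print(word_starts.numpy().tolist())  # [0, 5, 7, 12, 13, 15]
```

Punctuation and spaces share one script category, which is why ", " comes out as a single token.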

We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.

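import tensorflow as tf

# Assumes sentence_char_codepoint, sentence_char_starts_word and word_starts
# were already computed in the script-detection step.
print("Get the code point of every character in every word")
# Split the flat list of code points into one row per word.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=sentence_char_codepoint.values,
    row_starts=word_starts)
print(word_char_codepoint)
print("Get the number of words in the specific sentence")
# Every True flag marks a word start, so summing the flags per row counts the words.
sentence_num_words = tf.reduce_sum(tf.cast(sentence_char_starts_word, tf.int64), axis=1)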

Code credit: https://www.tensorflow.org/tutorials/load_data/unicode

Output

Get the code point of every character in every word
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
Get the number of words in the specific sentence

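Explanation

The code points of every character are grouped into one row per word.
The resulting RaggedTensor is displayed on the console.
The number of words in each sentence is computed and stored in sentence_num_words; the code above does not print it, so nothing follows the second message in the output.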
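
To make the word count visible, the sketch below recomputes it from hard-coded word-start flags and then uses it to group the words back into sentences. The flags are an assumption derived from the output above (sentences 'Hello, there.' and '世界こんにちは'), and the regrouping with `from_row_lengths` is a possible next step rather than part of the code shown here.

```python
import tensorflow as tf

# Words as shown in the output above, one row of code points per word.
word_char_codepoint = tf.ragged.constant(
    [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46],
     [19990, 30028], [12371, 12435, 12395, 12385, 12399]])

# Assumed word-start flags, one row per sentence and one flag per character.
sentence_char_starts_word = tf.ragged.constant(
    [[True, False, False, False, False, True, False,
      True, False, False, False, False, True],
     [True, False, True, False, False, False, False]])

# Summing the flags per sentence gives the number of words in each sentence.
sentence_num_words = tf.reduce_sum(
    tf.cast(sentence_char_starts_word, tf.int64), axis=1)
print(sentence_num_words.numpy().tolist())  # [4, 2]

# Group the words by sentence: [num_sentences, (num_words), (num_chars)].
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint, row_lengths=sentence_num_words)
print(sentence_word_char_codepoint.to_list())
```

The first sentence ends up with four words and the second with two, which matches the six rows in the output above.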