Prevent Data Leaks with Vercel Edge Functions and Upstash Redis for Secure, Low-Latency Filtering

Data leaks are a major problem on the internet. According to Statista, over 400 million people were affected by a data leak in 2022 alone. Nobody wants their data leaked all over the internet, so building secure applications that respect user privacy is crucial in many industries.

One way to fight leaks is to filter problematic data before delivering it to other services or users. This approach requires up-to-date filters to ensure no data slips through, and low-latency infrastructure to keep the performance impact of the filtering as small as possible.

Upstash for Redis® and Vercel Edge Functions form a powerful team that can tackle the problem while meeting both requirements. Both of these low-latency serverless solutions can be deployed close to our users. With Vercel's new cron feature, we can ensure the filter data is updated regularly.

To show how such a filter works, we will build a frontend and a backend that use this serverless edge technology to filter profanity.

Features

The app will use Vercel's cron feature to update an Upstash for Redis database with the current words from a remote API.

We will have three ways to receive the data:

Rendering a website with filtered text.
Returning JSON with filtered text from a data store.
Accepting text and returning JSON with the filtered text.

Technologies

We will build the app with Next.js and deploy it on Vercel; this way, we get a seamless serverless development experience when using Edge Functions.

We will use Upstash for Redis® as our data storage because of its low latency and ease of use.

Both services come with a free tier and on-demand pricing.

Prerequisites

We need accounts for these services:

GitHub to upload our code so Vercel can download and deploy it
Vercel to host our application, with its home page and Edge Functions
Upstash to store the list of words we want to filter

Implementation

To get started, we create a new GitHub repository and make sure to check "Add a README file" so the repository isn't empty. Because the repository isn't empty, GitHub allows us to start a Codespace for it, which comes preconfigured with Node.js and a Git-to-GitHub connection.

Figure 1: Start a Codespace

First, we need to create a new Next.js project and install the Upstash Redis client with the following commands:

$ npx create-next-app@latest --typescript
$ npm i @upstash/redis

Implementing the Refresh Function

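The first feature we will implement is the function that refreshes our list of bad words. Create a new file for the API route, which the cron configuration references as /api/refresh-list, with the following content:

File (the /api/refresh-list route):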

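import { Redis } from "@upstash/redis";

// Run this function on Vercel's Edge runtime.
export const config = { runtime: "edge" };

// The credentials come from environment variables set in the Vercel project.
const redisClient = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

export default async function handler() {
  const wordResponse = await fetch(
    "https://raw.githubusercontent.com/kay-is/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/master/en",
  );
  const words = await wordResponse.text();

  // Queue one SADD per word and send them all in a single request.
  const redisCommands = redisClient.pipeline();
  words
    .trim()
    .split("\n")
    .forEach((word) => redisCommands.sadd("words", word));
  await redisCommands.exec();

  // Return a Response so the invocation completes with a success status.
  return new Response("OK", { status: 200 });
}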

We start by configuring the function to run on the edge. This isn't strictly necessary since the function runs in the background, but Vercel Edge Functions are powered by Cloudflare Workers, which lets us use the fetch method, which Node.js versions before 18 don't provide natively.

The handler loads a text file with unwanted words into Upstash Redis. The pipeline feature ensures we only send one request with all Redis commands.

We use a set to store the words, so we don't get duplicates. It also allows us to load all the words as an array of strings with one command later.
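
The parsing step above can be sketched as a small pure function. This is a minimal sketch, not the project's code: the name parseWordList is hypothetical, and skipping blank lines is our own assumption (an empty entry would later turn into a pattern that matches everywhere). It removes duplicates in memory the same way the Redis set does on the server.

```typescript
// Hypothetical helper (not part of the original project): turns the raw
// word-list text into unique, non-empty entries before they are sent to Redis.
function parseWordList(rawText: string): string[] {
  const unique = new Set<string>();
  for (const line of rawText.split("\n")) {
    // trim() also removes a trailing "\r" from Windows-style line endings.
    const word = line.trim();
    // Skip blank lines so we never store an empty string as a filter word.
    if (word.length > 0) unique.add(word);
  }
  // A Set keeps insertion order, so the result follows the order of the file.
  return Array.from(unique);
}

const sampleList = "foo\nbar\n\nfoo\n  baz  \n";
console.log(parseWordList(sampleList)); // [ 'foo', 'bar', 'baz' ]
```

The resulting array could then be passed to the pipeline loop in place of the raw split result.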

If we wanted to filter personal data, we could use our account database to get the emails, phone numbers, names, and addresses of our users as the basis for the filter.
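
As a rough illustration of that idea, the sketch below collects personal values from account records into a list of filter terms. Everything in it is an assumption made for the example: the UserRecord shape, its field names, and the collectSensitiveTerms helper do not come from the project.

```typescript
// Hypothetical shape of an account record; the field names are assumptions.
interface UserRecord {
  email: string;
  phone: string;
  name: string;
  address: string;
}

// Collects every non-empty personal value into a de-duplicated list of terms
// that could be stored in the Redis set instead of the bad-word list.
function collectSensitiveTerms(users: UserRecord[]): string[] {
  const terms = new Set<string>();
  for (const user of users) {
    for (const value of [user.email, user.phone, user.name, user.address]) {
      const term = value.trim();
      if (term.length > 0) terms.add(term);
    }
  }
  return Array.from(terms);
}

const exampleUsers: UserRecord[] = [
  { email: "ann@example.com", phone: "555-0100", name: "Ann", address: "1 Main St" },
  { email: "bob@example.com", phone: "", name: "Bob", address: "1 Main St" },
];
console.log(collectSensitiveTerms(exampleUsers));
```

Short values such as first names would match inside many unrelated words, so a real filter of this kind would probably need stricter matching than plain substring replacement.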

To tell Vercel that we want this function to be a cron function, we need to create a vercel.json file in the project root with the following content:

File vercel.json:

{
  "crons": [
    {
      "path": "/api/refresh-list",
      "schedule": "0 10 * * *"
    }
  ]
}

This config will cause Vercel to execute the refresh-list function daily at 10:00 UTC.

Free Vercel accounts only support one automatic execution per day. For our example, that's enough, but if we have data that changes more often, we should increase the update rate.
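
To make the schedule string easier to read, here is a small sketch that splits a five-field cron expression into named parts. The describeCron helper is hypothetical and only labels the fields; it does not validate their values or compute run times.

```typescript
// Names of the five fields of a standard cron expression, in order.
const CRON_FIELDS = ["minute", "hour", "dayOfMonth", "month", "dayOfWeek"] as const;
type CronField = (typeof CRON_FIELDS)[number];

// Hypothetical helper: labels each field of a cron expression.
function describeCron(expression: string): Record<CronField, string> {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const result = {} as Record<CronField, string>;
  CRON_FIELDS.forEach((field, index) => {
    result[field] = parts[index];
  });
  return result;
}

// The schedule from vercel.json: minute 0 of hour 10, every day.
console.log(describeCron("0 10 * * *"));
```

Read this way, a schedule such as "0 * * * *" would run at minute 0 of every hour, which goes beyond the one daily execution mentioned above for free accounts.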

Implementing the Filter Utility

The next feature is a utility function that masks words in a text when they match words in our database. Create a new file at utils/word-filter.ts and add the following code.

File utils/word-filter.ts:

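import { Redis } from "@upstash/redis";

// The credentials come from environment variables set in the Vercel project.
const redisClient = new Redis({
  url: process.env.UPSTASH_REDIS_URL!,
  token: process.env.UPSTASH_REDIS_TOKEN!,
});

export async function filter(text: string) {
  // Load all stored words as an array of strings with one command.
  const filteredWords = await redisClient.smembers("words");

  let maskedText = text;
  for (const word of filteredWords) {
    // Escape regex metacharacters so every entry is matched literally.
    const escapedWord = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    maskedText = maskedText.replaceAll(
      new RegExp(escapedWord, "gi"),
      "[REDACTED]",
    );
  }

  return maskedText;
}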

Again, this function uses the Upstash Redis client, but this time it loads the data we saved previously.

Since we get an array of strings, we can iterate over it and call the replace function, which replaces every unwanted word in the text with "[REDACTED]".

This function doesn't care what kind of words it filters, in this case "not safe for work" words; the filtering depends only on the data we stored previously.
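
The masking logic can also be tried without a database. The following is a minimal sketch under our own assumptions, not the project's implementation: maskWords and escapeRegExp are hypothetical names, and instead of looping over the words it builds a single alternation pattern, so the text is scanned once and an already inserted "[REDACTED]" marker is never matched again by a later word.

```typescript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical, Redis-free variant of the filter: masks every listed word
// (case-insensitively, as a substring) in a single pass over the text.
function maskWords(text: string, words: string[]): string {
  const patterns = words
    .filter((word) => word.length > 0)
    // Longest first, so "assassin" wins over "ass" at the same position.
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  if (patterns.length === 0) return text;
  const matcher = new RegExp(patterns.join("|"), "gi");
  return text.replace(matcher, "[REDACTED]");
}

console.log(maskWords("He fell on his butt.", ["butt", "c++"]));
// He fell on his [REDACTED].
```

In the utility above, the words array would come from the smembers call; here it is passed in directly so the function can be tested in isolation.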

Implementing the Home Page

To see the filter in action, replace the content of the home page file with the following:

File (the home page):

import Head from "next/head";
import { filter } from "@/utils/word-filter";

export const config = { runtime: "experimental-edge" };

interface HomeProps {
  maskedText: string;
}

export default function Home(props: HomeProps) {
  return (
    <>
      <Head>
        <title>Text with Filtered Words</title>
      </Head>
      <div>
        <h1>Text with Filtered Words</h1>
        <p>{props.maskedText}</p>
      </div>
    </>
  );
}

export async function getServerSideProps(): Promise<{ props: HomeProps }> {
  const maskedText = await filter(
    "He slipped and fell on his butt. Well, that wasn't very sexy."
  );
  return { props: { maskedText } };
}

The config ensures everything is executed on the edge, even the server-side rendering. This Vercel feature is still experimental.

The interesting part is the getServerSideProps function, which uses our filter function from before on a static text. It is only called on the server, so unfiltered data can't reach the client.

In a real application, this text might come from a database with personal data that needs to be cleaned up before it's displayed.

Implementing the First API Route

The first API route works like the home page, but it returns JSON instead of HTML. Create a file for this route with this code:

File (the first API route):

import { filter } from "@/utils/word-filter";

export const config = { runtime: "edge" };

export default async function handler() {
  const maskedText = await filter(
    "He slipped and fell on his butt. Well, that wasn't very sexy.",
  );

  return new Response(JSON.stringify({ text: maskedText }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}

Again, the runtime is edge, and just like in the getServerSideProps function, we use a static text.

Implementing the Second API Route

This route accepts text via a request and returns the filtered version. Create a new file for the /api/filter route and add the following code:

File (the /api/filter route):

import type { NextRequest } from "next/server";

import { filter } from "@/utils/word-filter";

export const config = { runtime: "edge" };

// Edge API routes receive a standard Request (NextRequest), not a NextApiRequest.
export default async function handler(request: NextRequest) {
  // The body is a stream; wrapping it in a Response lets us parse it as JSON.
  const { text } = await new Response(request.body).json();
  const maskedText = await filter(text);

  return new Response(JSON.stringify({ text: maskedText }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}

This time, we must parse the body to get to the text we want to filter. In Vercel's Edge Functions, the request body is a ReadableStream; if we wrap it in a Response, we can use the native JSON parser to extract the data.

After we get the data from the request, everything works as before.
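
Since the text now comes from the outside, it may be worth checking the parsed body before filtering it. The sketch below shows one way to do that; the extractText helper is hypothetical and not part of the project.

```typescript
// Hypothetical helper: returns the text to filter, or null when the parsed
// JSON payload does not have the expected { text: string } shape.
function extractText(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const candidate = (payload as { text?: unknown }).text;
  return typeof candidate === "string" ? candidate : null;
}

console.log(extractText(JSON.parse('{"text":"He fell on his butt."}')));
// He fell on his butt.
console.log(extractText(JSON.parse('{"message":"no text field"}')));
// null
```

A handler could return a response with status 400 when the helper yields null, instead of passing an undefined value on to the filter.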

Pushing the Code Changes

Now that everything is implemented, we need to push the code to our GitHub repository with these commands:

$ git add -A
$ git commit -m "Init"
$ git push

After this, the code is available online for Vercel to download and deploy.

Deployment

We need to create an Upstash Redis database to get the credentials for the environment variables, and then a Vercel project.

Creating the Redis Database

We can create a new Redis database in the Upstash Console by clicking the "Create Database" button. Figure 2 shows the configuration. For this example, a regional database is sufficient, but if you have globally distributed users and want to keep latency low, you can also choose the global type.

Figure 2: Create a new database

After the creation, we can find the URL and token required for our environment variables under the REST API category. It looks like Figure 3.

Figure 3: Database credentials

Creating the Vercel Project

To create a new Vercel project, open the Vercel Dashboard in a browser and click "Create a New Project" in the center. After connecting Vercel with your GitHub account, you can choose a repository to import.

We can keep the default configuration and add our environment variables with the Upstash Redis credentials from above. Figure 4 shows Vercel's creation UI for reference.

Figure 4: Vercel project creation

The names of the environment variables are UPSTASH_REDIS_URL and UPSTASH_REDIS_TOKEN. We use the values from the previous step to create them.

After clicking "Deploy", Vercel downloads and deploys the code from our GitHub repository.

Testing the App

After the deployment, the app will still show unfiltered words because the cron job hasn't run yet. But we can trigger the first execution manually. Click the "Continue to Dashboard" button and choose the "Cron Jobs" tab.

Here we see our refresh-list function with a "Run" button to click.

When the function finishes, we navigate to the "Project" tab and click on one of the URLs under "Domain". This will open the website with filtered text in the browser; it should look like Figure 5.

Figure 5: The filtered website

If we append the path of the first API route to our URL, we see that this works for API responses, too. The response should look like the following example:

{
  "text": "He slipped and fell on his [REDACTED]. Well, that wasn't very [REDACTED]y."
}
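
The "[REDACTED]y" in this response suggests that the list contains "sex" and that the filter matches substrings rather than whole words. The sketch below contrasts both behaviors; the maskWord helper and its wholeWordsOnly flag are hypothetical, and the assumption about the list entry is ours.

```typescript
// Hypothetical helper: masks one word either as a substring (the behavior
// seen in the response above) or only where it appears as a whole word.
function maskWord(text: string, word: string, wholeWordsOnly: boolean): string {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \b marks a boundary between a word character and a non-word character.
  const source = wholeWordsOnly ? `\\b${escaped}\\b` : escaped;
  return text.replace(new RegExp(source, "gi"), "[REDACTED]");
}

const sentence = "Well, that wasn't very sexy.";
console.log(maskWord(sentence, "sex", false));
// Well, that wasn't very [REDACTED]y.
console.log(maskWord(sentence, "sex", true));
// Well, that wasn't very sexy.
```

Substring matching is the safer default for a leak filter because it also catches inflected forms, at the cost of occasionally masking part of a harmless word.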
 
Finally, if we send a request via cURL to the /api/filter endpoint, we get our custom text filtered. Make sure to replace <PROJECT> with your Vercel project.

$ curl -X POST https://<PROJECT>.vercel.app/api/filter \
  -H "Content-Type: application/json" \
  -d '{"text":"He fell on his butt."}'

The response:

{
  "text": "He fell on his [REDACTED]."
}

What's Next

After this tutorial, you might ask yourself what happens if the data is updated but the cron job hasn't updated the database yet. Good catch!

The cron job triggers our refresh function, but it is still a regular API function, so we can call it whenever we need to.

In the case of a real data filter, we might want to trigger the function in response to data changes, but the implementation details depend heavily on the data storage we use as the basis for our filter. So, keep this in mind when building!

Additional Resources

You can find the complete project on GitHub.