PII Detector for LLMs — Governance
Jul 30, 2024
After “LLM Part 7 | Governance”, I got some questions about how to build a PII/SPI (personally identifiable / sensitive personal information) detector for LLM outputs. In the following video, I use Presidio, spaCy, and Guardrails (although, IMHO, NeMo Guardrails' sensitive_data_detection is a better option).
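Conceptually, a detector like this does two things: it finds spans that look like a given entity type, then replaces each span with a placeholder. Presidio's default anonymizer operator uses placeholders of the form `<ENTITY_TYPE>`, such as `<EMAIL_ADDRESS>`. The sketch below is a deliberately simplified, dependency-free illustration of that detect-then-replace loop. It assumes plain regular expressions stand in for Presidio's recognizers, which in reality also use checksums, context words, and spaCy NER. `PATTERNS` and `redact` are hypothetical names, not part of Presidio or Guardrails.

```python
import re

# Hypothetical, simplified patterns; real recognizers are far more robust.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{10}\b"),  # only bare 10-digit numbers
}


def redact(text: str, entities: list[str]) -> str:
    """Replace every match of the requested entity types with <ENTITY_TYPE>."""
    for entity in entities:
        text = PATTERNS[entity].sub(f"<{entity}>", text)
    return text


sample = "My email address is me@chrisshayan.com and my phone number is 1234567890"
print(redact(sample, ["EMAIL_ADDRESS", "PHONE_NUMBER"]))
# My email address is <EMAIL_ADDRESS> and my phone number is <PHONE_NUMBER>
```

Typed placeholders (rather than a generic `***`) keep the redacted output readable for downstream consumers, which can still tell what kind of value was removed.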
```python
# Setup (run once, e.g. in a notebook cell or shell):
# pip install presidio-analyzer presidio-anonymizer -q
# python -m spacy download en_core_web_lg -q
# pip install guardrails-ai
# guardrails hub install hub://guardrails/detect_pii --quiet
from guardrails.hub import DetectPII
from guardrails.types import OnFailAction
import guardrails as gr
from rich import print  # pretty-prints the validation outcome

# Create a Guard object with the DetectPII validator. One can specify either the
# pre-defined set of PII or SPI (Sensitive Personal Information) entities by passing
# "pii" or "spi" to the `pii_entities` argument. The value can be given at
# initialization or later through the `metadata` argument of `parse`. One can also
# pass a list of entities supported by Presidio to the `pii_entities` argument.
# With OnFailAction.FIX, the guard returns the anonymized text instead of raising.
pii_guard = gr.Guard().use(DetectPII(pii_entities="pii", on_fail=OnFailAction.FIX))

# Validate (and fix) a sample LLM output
pii_text = "My email address is me@chrisshayan.com and my phone number is 1234567890"
pii_output = pii_guard.parse(llm_output=pii_text)
print(pii_output)
```
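The comment in the code above describes two ways to choose what to detect: a named preset (`"pii"` or `"spi"`) or an explicit list of Presidio entity names, supplied either at initialization or at parse time. Here is a minimal sketch of how such an argument could be resolved. It assumes a hypothetical, abbreviated `PRESETS` table (the entity names are real Presidio entity types, but DetectPII's actual preset lists are not reproduced here) and assumes that a parse-time value overrides the initialization value. `resolve_entities` is a hypothetical helper, not a Guardrails API.

```python
from __future__ import annotations

# Hypothetical, abbreviated presets; the real DetectPII preset lists are longer.
PRESETS: dict[str, list[str]] = {
    "pii": ["EMAIL_ADDRESS", "PHONE_NUMBER", "PERSON", "LOCATION"],
    "spi": ["CREDIT_CARD", "IBAN_CODE", "US_SSN", "US_PASSPORT"],
}


def resolve_entities(
    init_value: str | list[str] | None,
    metadata: dict | None = None,
) -> list[str]:
    """Return the entity types to detect.

    Assumption: a value passed at parse time via metadata["pii_entities"]
    takes precedence over the value given at initialization.
    """
    chosen = (metadata or {}).get("pii_entities", init_value)
    if chosen is None:
        raise ValueError("No entities given at initialization or in metadata")
    if isinstance(chosen, str):
        if chosen not in PRESETS:
            raise ValueError(f"Unknown preset {chosen!r}; expected one of {sorted(PRESETS)}")
        return list(PRESETS[chosen])  # copy so callers cannot mutate the preset
    return list(chosen)  # explicit list of Presidio entity names


print(resolve_entities("pii"))
print(resolve_entities("pii", metadata={"pii_entities": "spi"}))
print(resolve_entities(["EMAIL_ADDRESS"]))
```

Resolving everything to a plain list early means the detection step only ever deals with one shape of input, whichever way the caller configured it.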
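```python
# SPI example. 4111111111111111 is a standard Luhn-valid test card number, so
# Presidio's CREDIT_CARD recognizer can match it.
spi_text = "My email address is me@chrisshayan.com, my credit card is 4111111111111111"
spi_guard = gr.Guard().use(DetectPII(pii_entities="spi", on_fail=OnFailAction.FIX))
spi_output = spi_guard.parse(llm_output=spi_text)
print(spi_output)
```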
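Credit-card detection shows why pattern matching alone is not enough. Presidio's CREDIT_CARD recognizer checks regex candidates with the Luhn checksum to cut false positives, so an arbitrary digit string that fails the checksum may not be reported as CREDIT_CARD (other recognizers may still match it). A minimal, standalone sketch of the checksum itself, not Presidio's code; `luhn_valid` is a hypothetical name:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdecimal()]  # ignore spaces/dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True: a standard test card number
print(luhn_valid("012345678912"))  # False: fails the checksum
```

When testing an SPI detector, it is worth using known-valid test numbers; otherwise a "missed" card may simply be a number no real card could have.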