PII Detector for LLMs — Governance

Chris Shayan
1 min read · Jul 30, 2024


After “LLM Part 7 | Governance”, I got some questions about how to build a PII/SPI detector for LLM outputs. In the following video, I use Presidio, spaCy, and Guardrails (although, IMHO, NeMo’s sensitive_data_detection is a better option).

# pip install presidio-analyzer presidio-anonymizer -q
# python -m spacy download en_core_web_lg -q
# pip install guardrails-ai
# guardrails hub install hub://guardrails/detect_pii --quiet

from guardrails.hub import DetectPII
from guardrails.types import OnFailAction
import guardrails as gr
from rich import print

# Create a Guard object with the DetectPII validator. You can select a predefined set of PII or SPI (Sensitive Personal
# Information) entities by passing "pii" or "spi" to the `pii_entities` argument, either at initialization or later
# through the metadata argument of the parse method. You can also pass a list of entities supported by Presidio
# to the `pii_entities` argument.
pii_guard = gr.Guard().use(DetectPII(pii_entities="pii", on_fail=OnFailAction.FIX))

# Parse the text
pii_text = "My email address is me@chrisshayan.com and my phone number is 1234567890"
pii_output = pii_guard.parse(llm_output=pii_text)
print(pii_output)

# Same flow with the predefined SPI entity set
spi_text = "My email address is me@chrisshayan.com, my credit card is 012345678912"
spi_guard = gr.Guard().use(DetectPII(pii_entities="spi", on_fail=OnFailAction.FIX))
spi_output = spi_guard.parse(llm_output=spi_text)
print(spi_output)
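
To make the detect-then-fix idea concrete without installing anything, here is a minimal standard-library sketch. It is not how Presidio or the DetectPII validator work internally: the two regex patterns, the function names `detect` and `redact`, and the `<ENTITY_TYPE>` placeholder format are all assumptions chosen for illustration. Real recognizers add checksums, context words, and NER on top of pattern matching.

```python
import re

# Hypothetical, deliberately naive patterns (assumptions for illustration only).
# Presidio's real recognizers are far more robust and are NOT reproduced here.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\b\d{10}\b"),
}


def detect(text):
    """Return (entity_type, start, end) tuples sorted by start offset."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((entity_type, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


def redact(text):
    """Replace every detected span with an <ENTITY_TYPE> placeholder.

    Overlapping spans are not handled; this sketch assumes they do not occur.
    """
    # Go right to left so earlier offsets stay valid after each replacement.
    for entity_type, start, end in reversed(detect(text)):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


if __name__ == "__main__":
    sample = "My email address is me@chrisshayan.com and my phone number is 1234567890"
    print(redact(sample))
```

Conceptually, a guard configured with a "fix" action plays the role of `redact` here: the text still reaches the caller, but with the sensitive spans masked rather than the whole output being rejected.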


Written by Chris Shayan

Head of AI at Backbase. The postings on this site are my own and do not necessarily represent the postings, strategies or opinions of my employer.