Regular Expression - How to Remove PII Data from a File | How to Mask Personal Information in JSON Logs | Regex JSON Value and Remove #LinuxTopic

Linuxtopic
Tags: regular expression, regex, regex and remove, regex and replace, regex exact value and replace, regular expression to print all value,  sed search and replace, sed remove, sed replace regex, sed replace
Introduction:

This tutorial shows how to remove personal information from JSON log output with the sed command, so that the logs can be shared with a third-party vendor for troubleshooting.

In this tutorial we will cover the following points:

1 - How to remove customers' personal information from a log file
2 - How to mask PII data in JSON log output
3 - How to mask personal information in a file and share it with a vendor
4 - How to match a JSON value with a regex and replace it
5 - How to grep a value and remove it from a file
6 - How to use regular expressions in Linux

Why do we need to remove personal data?

Suppose we are working on an application and hit an error that we cannot resolve ourselves, so we need to involve the application vendor.

The vendor needs the logs to understand the issue. Sharing the logs directly is a challenge, because many organizations do not allow customer information to be shared with third parties or vendors.

So we mask the personal data in the logs before sharing them with the vendor. For one or two entries this can be done manually, but for many entries or large log files manual editing is too time-consuming.

OS / Tools / Command

1 - Linux
2 - Bash
3 - grep
4 - sed
5 - Log file (JSON logs)

Step 1:

Copy the log file that contains the personal information to be removed. In this tutorial the log file is at the path below:

log file path :  /var/log/event.log

cp -rv /var/log/event.log /tmp/



Review the log file and note down which values you want to remove:

cat /var/log/event.log


The log contains several PII fields that cannot be shared outside the organization, such as:
  • accountNumber
  • operator 
  • custName 
  • custID
  • cardHolder 
  • card 
  • Phone 
  • Address
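To follow the steps without access to a real log, a small test file can be built first. This is only a sketch: the field names come from the list above, but the line layout and every value except the accountNumber and Address values used later in this tutorial are made up for illustration.

```shell
# Hypothetical sample entry; real log lines will look different.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
{"event":"payment","accountNumber":"350211212093","operator":"op01","custName":"Jane Doe","custID":"C1001","cardHolder":"Jane Doe","card":"4111111111111111","Phone":"5550100","Address":"235/2 Street ll PBL"}
EOF

# Count how many of the PII field names occur as keys in the file.
pii_fields="accountNumber operator custName custID cardHolder card Phone Address"
found=0
for f in $pii_fields; do
    if grep -q "\"$f\":" "$sample_log"; then
        found=$((found + 1))
    fi
done
echo "PII fields present: $found"
```

Running the same loop against the real log is a quick way to confirm which of the fields actually appear before writing any patterns.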
We will pick one field, accountNumber, and print it with the grep command. This field holds only digits.

"accountNumber":"350211212093"

First we grep only the field name accountNumber in the log. With -o, grep prints only the matched text:
 
grep -o "accountNumber" /tmp/event.log


Now we add the ":" that follows the field name to the pattern:

grep -o "accountNumber\":\"" /tmp/event.log


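In the example above, each double quote inside the pattern is escaped with a backslash, because the pattern itself is enclosed in double quotes: the field name is followed by a double quote, a colon, and another double quote.

Note: The backslash here stops the shell from treating an inner double quote as the end of the string; grep itself receives plain double quotes.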
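As a small illustration of the quoting, the same pattern can also be written inside single quotes, where the double quotes need no backslash. The sketch below feeds one sample line to grep through a pipe instead of reading /tmp/event.log, so it can be tried anywhere.

```shell
line='{"accountNumber":"350211212093"}'

# Pattern in double quotes: inner double quotes must be escaped for the shell.
escaped=$(printf '%s\n' "$line" | grep -o "accountNumber\":\"")

# Pattern in single quotes: the shell passes the double quotes through as-is.
single=$(printf '%s\n' "$line" | grep -o 'accountNumber":"')

echo "$escaped"
echo "$single"
```

Both commands hand grep the same pattern, so they print the same match. Single quotes are often easier to read, while double quotes are needed when the pattern contains a shell variable.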

In the next step we use a regular expression to match the digits. The command below adds the [0-9] bracket expression to the grep pattern:

grep -o "accountNumber\":\"[0-9]" /tmp/event.log


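The example above prints only one digit, because [0-9] matches a single character. Adding a plus ( \+ ) after the bracket expression matches one or more digits, so all the digits between the double quotes are printed.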
 
grep -o "accountNumber\":\"[0-9]\+" /tmp/event.log


In the final step we add the closing double quote to the pattern:

grep -o "accountNumber\":\"[0-9]\+\"" /tmp/event.log
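The three stages of the pattern can be compared side by side. This sketch assumes GNU grep (for -o and \+) and uses a single sample line; the custID value is hypothetical.

```shell
line='{"accountNumber":"350211212093","custID":"C1001"}'

# [0-9] matches exactly one digit.
one_digit=$(printf '%s\n' "$line" | grep -o 'accountNumber":"[0-9]')

# \+ repeats the bracket expression one or more times.
all_digits=$(printf '%s\n' "$line" | grep -o 'accountNumber":"[0-9]\+')

# The closing quote confirms that the whole value consists of digits.
full_match=$(printf '%s\n' "$line" | grep -o 'accountNumber":"[0-9]\+"')

printf '%s\n' "$one_digit" "$all_digits" "$full_match"
```

Building the pattern up one piece at a time like this makes it easy to see which part fails when a field does not match.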


We have now printed the actual value from the log file. Next we use the sed command to replace (remove) the account number.
 
sed 's/accountNumber\":\"[0-9]\+/accountNumber\":\"REMOVED/g' /tmp/event.log


The command above is a dry run: sed prints the masked output and leaves the file unchanged. To write the change to the file, add the -i option:
 
sed -i 's/accountNumber\":\"[0-9]\+/accountNumber\":\"REMOVED/g' /tmp/event.log
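A cautious way to apply the in-place edit is to let sed keep a backup and then check that the number is gone. The following is a minimal sketch on a temporary file; the "status" field is a made-up extra, and the -i.bak form is an addition here, not something the steps above require.

```shell
work=$(mktemp)
printf '%s\n' '{"accountNumber":"350211212093","status":"failed"}' > "$work"

# Dry run: print the result without touching the file.
preview=$(sed 's/accountNumber":"[0-9]\+/accountNumber":"REMOVED/g' "$work")
echo "$preview"

# In-place edit; the .bak suffix makes sed keep the original as "$work.bak".
sed -i.bak 's/accountNumber":"[0-9]\+/accountNumber":"REMOVED/g' "$work"

# Count lines that still contain the original number (grep -c prints 0 if none).
leftover=$(grep -c '350211212093' "$work" || true)
echo "lines still containing the number: $leftover"
```

The backup still contains the PII, so it should be deleted or kept inside the organization once the masked file has been checked.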

Value B - alphanumeric and special characters

Next we choose an address from the log file. The address below contains letters, numbers, and special characters.

"Address":"235/2 Street ll PBL"


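We will use the following ranges and characters inside square brackets:

a-z   -  lowercase letters
A-Z   -  uppercase letters
0-9   -  digits
space / _ @ , -   -  space and special characters

Inside square brackets \s is not a whitespace shorthand, so a plain space is used. The hyphen goes last so that it is a literal character instead of forming a range such as _-, which is an invalid range.

[a-zA-Z0-9 /_@,-]\+

grep -o "Address\":\"[a-zA-Z0-9 /_@,-]\+\"" /tmp/event.log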
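The bracket expression can be tried on a single line before running it against the whole log. In this sketch the hyphen is placed last in the brackets so that it is literal, and a plain space stands in for whitespace; the Phone value is hypothetical.

```shell
line='{"Address":"235/2 Street ll PBL","Phone":"5550100"}'

# Letters, digits, space, slash, underscore, at sign, comma, and a literal hyphen.
addr=$(printf '%s\n' "$line" | grep -o 'Address":"[a-zA-Z0-9 /_@,-]\+"')
echo "$addr"
```

Because the double quote is not part of the bracket expression, the match stops at the end of the address value and does not run into the next field.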


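We got the address value. Now we use the sed command again. Because the bracket expression contains a slash, the command uses | instead of / as the sed delimiter.

sed -e 's|Address\":\"[a-zA-Z0-9 /_@,-]\+|Address\":\"REMOVED|g' /tmp/event.log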


Replace the address value in the file:

sed -i 's|Address\":\"[a-zA-Z0-9 /_@,-]\+|Address\":\"REMOVED|g' /tmp/event.log
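When the pattern itself contains a slash, as the address pattern does, sed can use another delimiter after the s so that the slash cannot be confused with the separators of the command. The sketch below uses | as the delimiter and runs on one sample line; the Phone value is hypothetical.

```shell
line='{"Address":"235/2 Street ll PBL","Phone":"5550100"}'

# "s|pattern|replacement|g": the "/" in the brackets is now an ordinary character.
masked=$(printf '%s\n' "$line" | sed 's|Address":"[a-zA-Z0-9 /_@,-]\+|Address":"REMOVED|g')
echo "$masked"
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter; | and # are common choices.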

Note: The [a-zA-Z0-9 /_@,-]\+ regular expression can be used for all of the fields above, as long as their values contain only letters, digits, spaces, and the listed special characters.
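If a value may contain characters that are not listed in the brackets (a dot or parentheses, for example), one possible variant is to match everything up to the next double quote with [^"]*. This is a sketch of a different technique from the one used in this tutorial, and it assumes the values never contain an escaped double quote; the sample values are made up.

```shell
line='{"custName":"Jane Doe","Address":"Apt. 4, 235/2 Street (PBL)","custID":"C1001"}'

# [^"]* matches any run of characters that are not a double quote.
masked_any=$(printf '%s\n' "$line" | sed 's|"Address":"[^"]*"|"Address":"REMOVED"|g')
echo "$masked_any"
```

The trade-off is that this form masks whatever is inside the quotes, so it cannot be used to check that a field has the expected shape, such as digits only.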

Script :

1 - Create a script file with any name and the content below:

vi /tmp/mask.sh

#!/bin/bash
# Copy the log to /tmp first, then mask each PII field in the copy with sed in a for loop.

cp -rv /var/log/event.log /tmp/event.log

for i in accountNumber custID operator custName cardHolder card Phone Address
do
    # "|" is the sed delimiter because the bracket expression contains "/";
    # the hyphen is last in the brackets so that it is a literal character.
    sed -i "s|$i\":\"[a-zA-Z0-9 /_,@*-]\+|$i\":\"REMOVED|g" /tmp/event.log
done
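
The loop can also be wrapped in a small function so that the file and the field names are passed as arguments, which makes it easy to test on a throwaway file first. This is a sketch only: mask_fields is a hypothetical helper name, the sample line is made up apart from the accountNumber and Address values, and sed -i without a suffix assumes GNU sed.

```shell
# mask_fields FILE FIELD... : hypothetical helper that masks each named field in FILE.
mask_fields() {
    file=$1
    shift
    for field in "$@"; do
        # Same substitution as in the script, with "|" as the sed delimiter.
        sed -i "s|$field\":\"[a-zA-Z0-9 /_,@*-]\+|$field\":\"REMOVED|g" "$file"
    done
}

demo=$(mktemp)
printf '%s\n' '{"accountNumber":"350211212093","custName":"Jane Doe","Address":"235/2 Street ll PBL","status":"failed"}' > "$demo"

mask_fields "$demo" accountNumber custName Address
result=$(cat "$demo")
echo "$result"
```

Fields that are not named on the command line, such as "status" here, are left untouched, so the vendor still sees the non-personal parts of each entry.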

For Method 2, see:
https://www.linuxtopic.com/2021/05/02-regular-expression-regex-replace-all.html




Thank you!

I hope this topic gave you all the information you needed. If you have any further questions or would like more detailed directions feel free to contact us using any of the following sources. We look forward to talking to you.
