Sensitive information in source code - Git history
Need
Protection of sensitive information in source code history
Context
• Usage of Python 3 for developing applications and scripts
• Usage of Django for building web applications in Python
• Usage of psycopg2 for connecting to and interacting with PostgreSQL databases
Description
1. Non-compliant code
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydatabase',
        'USER': 'mydatabaseuser',
        'HOST': 'localhost',...
The Python code above is a Django settings file in which sensitive information, namely the database credentials and the application's secret key, is hardcoded in the source. The `DATABASES` dictionary holds the configuration for the database connection, including the username (`mydatabaseuser`) and the password (`mypassword`). The `SECRET_KEY` is a unique key that Django uses for various cryptographic signing tasks.
The vulnerability is that these values are stored directly in the source code. Anyone who can read this file, for example because it was pushed to a public Git repository, obtains them. An attacker could then connect to the database and act with the same permissions as the `mydatabaseuser` user.
Furthermore, because Git tracks the history of all changes, removing these values in a later commit does not remove them from the Git history. An attacker who clones the repository can check out, or simply inspect, an earlier commit in which the sensitive details were still present.
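To illustrate the point about history, the following minimal sketch lists the commits whose changes added or removed a given string, using Git's pickaxe option (`git log -S`). The function names `pickaxe_command` and `commits_touching` are hypothetical, and the sketch assumes that the `git` executable is on the PATH and that `repo_dir` points at a clone of the repository.

```python
import subprocess


def pickaxe_command(needle):
    """Build a `git log` command that finds commits adding or removing `needle`."""
    # --all searches every ref, not only the current branch.
    # -S<string> (the "pickaxe") selects commits that change the number of
    # occurrences of the string, i.e. where it was introduced or deleted.
    return ["git", "log", "--all", "--oneline", f"-S{needle}"]


def commits_touching(needle, repo_dir="."):
    """Return one summary line per commit in which `needle` appeared or disappeared."""
    result = subprocess.run(
        pickaxe_command(needle),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when repo_dir is not a repository
    )
    return result.stdout.splitlines()
```

Calling `commits_touching("mypassword")` in a repository where the password was committed and later deleted should still report the commits involved, which is why deleting the value from the current file is not enough.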
2. Steps
• Remove the sensitive information from the code. This includes the database credentials and the secret key.
• Use environment variables to store sensitive information. This way, the sensitive data is not hard-coded into the application.
• Use a .env file to store these variables, and add it to .gitignore so that Git does not track it.
• Use a package like python-decouple to read the environment variables in your settings.py file.
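• Rewrite the Git history to remove traces of the sensitive data. You can use a tool like BFG Repo-Cleaner or the 'git filter-branch' command. Note that the Git documentation discourages 'git filter-branch' and recommends 'git filter-repo' instead.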
• After cleaning the Git history, force push the changes to the remote repository. Be aware that this will rewrite history for any collaborators as well.
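• Rotate any credentials that were exposed, including the database password and the secret key. Rewriting history does not recall copies that were already cloned or fetched, so the old values must be treated as compromised.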
• Finally, ensure that all team members are aware of the change and the new way of handling sensitive data.
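The environment-variable steps above can be sketched with the standard library alone. This is only an illustration of what a .env loader does, not a replacement for python-decouple; the helper name `load_env_file` and the simple KEY=VALUE format it accepts are assumptions made for this example.

```python
import os


def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from `path` into os.environ (hypothetical helper).

    Blank lines, lines starting with '#', and lines without '=' are skipped.
    Variables already present in the environment are left unchanged, so real
    environment settings take precedence over the file.
    """
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            # Drop optional surrounding quotes: DB_PASSWORD='s3cret' -> s3cret
            value = value.strip().strip("'\"")
            os.environ.setdefault(key, value)
            # Report the effective value, which may come from the environment.
            loaded[key] = os.environ[key]
    return loaded
```

The file passed to `load_env_file` is the one that must be listed in .gitignore; only the variable names, never the values, belong in tracked files.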
3. Secure code example
# settings.py
import os  # required for the os.environ.get() calls below

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydatabase',
        'USER': os.environ.get('DB_USER'),
        'HOST': 'localhost',...
The sensitive information (database credentials and secret key) has been removed from the settings.py file. These values now come from environment variables, which can be kept in a .env file. This file is not tracked by Git, so the sensitive data never enters the Git history.
To access these environment variables, we use the os.environ.get() function. It returns the value of the specified environment variable, or None if the variable is not set. Note that os.environ.get() only reads the process environment and does not load a .env file by itself; the file must be loaded by a tool such as python-decouple, or its variables must be exported before the application starts.
After making these changes, you should rewrite the Git history to remove any traces of the sensitive data. You can use a tool like BFG Repo-Cleaner or the 'git filter-branch' command for this. After cleaning the Git history, force push the changes to the remote repository. Be aware that this will rewrite history for any collaborators as well.
Finally, any credentials that were exposed should be rotated. This includes the database password and the secret key. All team members should be made aware of the change and the new way of handling sensitive data.
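Because os.environ.get() returns None for an unset variable, a missing credential may only surface later, for instance as a failed database connection. One possible way to fail fast is sketched below; `require_env` is a hypothetical helper, and `DB_USER` is the variable name used in the example above.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Fail at startup with a clear message instead of passing None to Django.
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value
```

In settings.py this could replace the direct lookup, for example 'USER': require_env('DB_USER'), so that a misconfigured deployment stops immediately with an explicit error.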