GitHub & Open Source Presence

GitHub Best Practices for Job Seekers

Commit Activity Strategy

Green Squares Matter

Recruiters look at contribution graphs.

Optimal Cadence:

  • 3-5 commits per week (shows consistency)
  • Avoid long gaps (> 2 weeks)
  • Quality > quantity (don't spam empty commits)
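To check your own cadence against these guidelines, a minimal sketch along the following lines may help. It assumes you already have your commit dates as a list (for example, parsed from your git history); the function name `cadence_report` and the 14-day threshold parameter are illustrative, not part of any GitHub tooling.

```python
from collections import Counter
from datetime import date


def cadence_report(commit_dates: list[date], max_gap_days: int = 14) -> dict:
    """Summarize commits per ISO week and the longest gap between commit days."""
    if not commit_dates:
        return {"weekly_counts": {}, "longest_gap_days": None, "gap_ok": False}

    # Count commits per (ISO year, ISO week number).
    weekly = Counter(tuple(d.isocalendar())[:2] for d in commit_dates)

    # Measure gaps between consecutive distinct commit days.
    days = sorted(set(commit_dates))
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    longest = max(gaps, default=0)

    return {
        "weekly_counts": dict(weekly),
        "longest_gap_days": longest,
        "gap_ok": longest <= max_gap_days,
    }


if __name__ == "__main__":
    sample = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 25)]
    print(cadence_report(sample))
```

A week with 3-5 entries in `weekly_counts` matches the suggested cadence, and `gap_ok` turns false once any gap exceeds two weeks.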

What to Commit:

Good:
✓ Project improvements
✓ New features
✓ Bug fixes
✓ Documentation updates
✓ Open source contributions

Bad:
✗ Daily "updated README" commits
✗ Empty commits just for green squares
✗ Committing homework assignments

Repository Organization

Use Descriptive Names

Good:
✓ customer-churn-prediction-api
✓ llm-financial-qa-chatbot
✓ image-classification-cnn

Bad:
✗ project1
✗ ml_stuff
✗ final_version_2_actual_final
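One rough way to sanity-check names is a small heuristic like the sketch below. The rules here (lowercase kebab-case, at least three words, no placeholder words) are assumptions inferred from the examples above, not an official GitHub convention, and `is_descriptive` is a hypothetical helper.

```python
import re

# Assumed rule: lowercase words joined by hyphens, at least three words.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+){2,}$")

# Assumed list of placeholder words that say nothing about the project.
VAGUE = re.compile(r"(^|-)(final|stuff|misc|test|project\d*)(-|$)")


def is_descriptive(name: str) -> bool:
    """Return True if the repo name looks descriptive under the assumed rules."""
    return bool(KEBAB.match(name)) and not VAGUE.search(name)


if __name__ == "__main__":
    for repo in ["customer-churn-prediction-api", "project1", "ml_stuff"]:
        print(repo, is_descriptive(repo))
```

A short but clear name can fail this check, so treat the result as a prompt to reconsider the name rather than a verdict.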

Archive Old/Unfinished Projects

  • Don't delete (shows growth)
  • Archive to mark them read-only (archived repos stay visible; make a repo private to hide it)
  • Keep only your best 10-15 repos visible

README Best Practices

Every ML Project Must Have:

  1. One-liner Description

    Customer churn prediction API using XGBoost, deployed on AWS Lambda
    
  2. Demo/Screenshots

    • GIF of live demo (< 5MB)
    • Or screenshot of UI
    • Or code snippet showing usage

  3. Quick Start

    pip install -r requirements.txt
    python train.py
    python app.py
    
  4. Tech Stack

    **ML:** scikit-learn, XGBoost
    **API:** FastAPI
    **Deployment:** Docker, AWS Lambda
    **Monitoring:** MLflow
    
  5. Key Results

    - Achieved 94% accuracy (baseline: 78%)
    - Reduced prediction latency to 50ms
    - Handles 1000 requests/second
    

Code Quality Signals

Recruiters Look For:

Type hints (shows modern Python knowledge)

def predict(features: np.ndarray) -> List[float]:
    return model.predict(features).tolist()

Docstrings (shows documentation skills)

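import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def train_model(X: pd.DataFrame, y: pd.Series) -> RandomForestClassifier:
    """
    Train random forest classifier.

    Args:
        X: Feature matrix
        y: Target labels

    Returns:
        Trained classifier
    """
    # Default hyperparameters; tune them for your data
    model = RandomForestClassifier()
    model.fit(X, y)
    return model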

Tests (shows production mindset)

tests/
  test_model.py
  test_preprocessing.py
  test_api.py
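As an illustration of what one of these files might contain, here is a minimal pytest-style sketch for `test_preprocessing.py`. The `scale_features` helper is hypothetical and defined inline so the example is self-contained; in a real project it would be imported from your own package.

```python
# tests/test_preprocessing.py (sketch)


def scale_features(values: list[float]) -> list[float]:
    """Hypothetical helper: min-max scale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input has no spread, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_scale_features_range():
    # Smallest value maps to 0.0, largest to 1.0, midpoint to 0.5.
    assert scale_features([10.0, 20.0, 30.0]) == [0.0, 0.5, 1.0]


def test_scale_features_constant_input():
    # Guards against a division-by-zero regression.
    assert scale_features([5.0, 5.0]) == [0.0, 0.0]
```

Running `pytest` from the project root discovers functions whose names start with `test_` in files named `test_*.py`.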

Config files

  • requirements.txt or pyproject.toml
  • .gitignore (don't commit .env, .DS_Store)
  • Dockerfile
  • CI/CD (.github/workflows/)
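A short script can confirm that a project has the files listed above. This is only a sketch: `missing_project_files` is a hypothetical helper, and it checks for file presence, not for whether the contents are sensible.

```python
from pathlib import Path


def missing_project_files(repo_root: str) -> list[str]:
    """Return the expected config files that are absent from the repo."""
    root = Path(repo_root)
    missing = []

    # Either dependency file satisfies the first bullet.
    if not ((root / "requirements.txt").exists() or (root / "pyproject.toml").exists()):
        missing.append("requirements.txt or pyproject.toml")

    for name in (".gitignore", "Dockerfile"):
        if not (root / name).exists():
            missing.append(name)

    # GitHub Actions workflows are YAML files under .github/workflows/.
    workflows = root / ".github" / "workflows"
    if not (workflows.is_dir() and any(workflows.glob("*.y*ml"))):
        missing.append(".github/workflows/")

    return missing


if __name__ == "__main__":
    print(missing_project_files("."))
```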

What NOT to Commit

Security:

# Add to .gitignore
.env
*.pem
*.key
credentials.json
api_keys.txt
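To find out whether any such file is already tracked, you could match file names against the same patterns. The sketch below assumes you pass in a list of tracked paths (for example, the output lines of `git ls-files`); `find_secret_files` is a hypothetical helper, and it only looks at file names, not at secrets embedded inside source files.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Same patterns as the .gitignore entries above.
SECRET_PATTERNS = [".env", "*.pem", "*.key", "credentials.json", "api_keys.txt"]


def find_secret_files(tracked_paths: list[str]) -> list[str]:
    """Return tracked paths whose file name matches a secret pattern."""
    flagged = []
    for path in tracked_paths:
        filename = PurePosixPath(path).name
        if any(fnmatchcase(filename, pattern) for pattern in SECRET_PATTERNS):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    paths = ["src/app.py", ".env", "deploy/server.pem", "README.md"]
    print(find_secret_files(paths))
```

If a secret has already been committed, adding it to .gitignore does not remove it from history, so rotate the key as well.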

Large Files:

# Don't commit:
*.csv (> 10MB)
*.h5 (model weights)
*.pkl (trained models)

# Use instead:
- Git LFS
- HuggingFace Model Hub
- Google Drive links in README
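Before pushing, a quick scan for oversized files can catch datasets or model weights that slipped in. The following is a sketch under the 10MB threshold mentioned above; `find_large_files` is a hypothetical helper and the threshold is a parameter you can change.

```python
from pathlib import Path


def find_large_files(repo_root: str, limit_mb: float = 10.0) -> list[tuple[str, float]]:
    """Return (relative path, size in MB) for files larger than limit_mb."""
    root = Path(repo_root)
    limit_bytes = limit_mb * 1024 * 1024
    results = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip git's internal object store.
        if ".git" in relative.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                results.append((relative.as_posix(), round(size / (1024 * 1024), 1)))
    return sorted(results)


if __name__ == "__main__":
    for name, size_mb in find_large_files("."):
        print(f"{size_mb:>8.1f} MB  {name}")
```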

Generated Files:

__pycache__/
*.pyc
.ipynb_checkpoints/
.pytest_cache/

Profile Polish

Bio Section:

Good: ML Engineer | PyTorch, LLMs, MLOps | Building AI products
Bad: Coding enthusiast | Learning ML | Check out my repos!

Location & Contact:

  • Add city (shows you're local or willing to relocate)
  • Add email (makes it easy for recruiters)
  • Add LinkedIn and portfolio website links

GitHub Organizations:

  • Create org for larger projects
  • Shows team collaboration experience
  • Example: "MyStartup-AI" org with multiple repos

Consistency Checklist

Before job applications, audit your GitHub:

  • All pinned projects have detailed READMEs
  • No .env files committed
  • Contribution graph shows activity (not blank)
  • Profile picture and bio are professional
  • Top 3 projects are deployed with live demos
  • Code follows PEP 8 style guide
  • All projects have licenses (MIT/Apache 2.0)
