Git Sync


The Git Sync feature allows users to import files directly from their Git repository into the DS Lab project for use in subsequent processes within pipelines and jobs. To use this feature, the user needs to configure their repository in their DS Lab project.

Prerequisites:

  • To configure GitLab/GitHub credentials, follow these steps in the Admin Settings:

    • Navigate to Admin >> Configurations >> Version Control.

    • From the first drop-down menu, select the Version.

    • Choose 'DsLabs' as the module from the drop-down.

    • Select either GitHub or GitLab as the Git type, based on the requirement.

    • Enter the host for the selected Git type.

    • Provide the token key associated with the Git account.

    • Select a Git project.

    • Choose the branch where the files are located.

    • After providing all the details correctly, click 'Test'. If the authentication is successful, a confirmation message will appear. Then click the 'Save' option.

  • To complete the configuration, navigate to My Account >> Configuration. Enter the Git Token and Git Username, then save the changes.

Configuring the Git Sync inside a DSL Project

Follow the steps below to configure Git Sync in a DS Lab project:

  • Navigate to the DS Lab module.

  • Click the Create Project option to initiate a new project.

  • Enter all the required fields for the project.

  • Select the Git Repository and Git Branch.

  • Enable the Sync git repo at project creation option to gain access to all the files in the selected repository.

  • Click the Save option to create the project.

  • After creating the project, expand the Repo option in the Notebook tab to view all files present in the repository.

  • The Git Console option, accessible at the bottom of the page, allows the user to run Git commands directly as per their specific requirements (see the sample commands and the sketch after these steps).

  • After completing this process, users can export their scripts to the Data Pipeline module and register them as jobs according to their specific requirements.

  • For instructions on exporting a script to the pipeline and registering it as a job, refer to the links listed at the end of this page.
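
For example, once the project is created, one might run the following in the Git Console to confirm that the repository has synced correctly (a minimal sketch; no specific repository or branch is assumed):

git status            # confirm the synced working tree has no unexpected changes
git branch            # verify that the branch selected at project creation is checked out
git pull              # fetch and merge any commits pushed to the repository since the sync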

Sample Git commands

git init: #Initializes a new Git repository in the current directory.
git clone <repository-url>: #Clones a remote repository into a new local directory.
git add <file>: #Stages changes for the next commit.
git commit -m "Commit message": #Records staged changes in the repository with a descriptive message.
git status: #Displays the status of changes as untracked, modified, or staged.
git log: #Shows a commit history with commit IDs, authors, dates, and messages.
git branch: #Lists all branches in the repository.
git checkout <branch-name>: #Switches to the specified branch.
git merge <branch-name>: #Combines changes from the specified branch into the current branch.
git pull: #Fetches changes from a remote repository and merges them into the current branch.
git push: #Pushes local changes to a remote repository.
git remote -v: #Lists remote repositories linked to the local repository.
git fetch: #Retrieves changes from a remote repository without merging them.
git diff: #Shows the differences between the working directory and the last commit.
git rm <file>: #Removes a file from the working directory and stages the removal.
git stash: #Temporarily stores changes that are not ready to be committed.
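
Putting a few of these commands together, a typical Git Console sequence for publishing an updated notebook back to the configured repository might look like the following (a minimal sketch; the notebook filename and commit message are placeholders):

git pull                                  # bring the local copy up to date before committing
git add sample_notebook.ipynb             # stage the updated notebook (placeholder filename)
git commit -m "Update sample notebook"    # record the staged change with a descriptive message
git push                                  # publish the commit to the configured repository and branch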

Please Note: Files with the .ipynb extension can be exported for use in pipelines and jobs, while files with the .py extension can only be used as utility files. Additional information on utility files can be found on the Utility page.

Related pages:

  • Exporting a Script from DS Lab
  • Register as Job
  • Utility