2️⃣ CodeGPT Studio

Get to know the tools on the CodeGPT web service.


In this section, you will learn about the details of all the tools available in CodeGPT Studio.

To get started, sign in at this link: https://app.codegpt.co/en

At CodeGPT Studio, you can:

  • Interact with expert agents in our Marketplace

  • Create your own agents by adding context and a knowledge base

  • Learn how conversations are built and evaluate the results

Let's get started!


Expert Agents in the Marketplace

Don't forget that the number of marketplace agents you can use varies depending on your payment plan. For more information, please refer to the Pricing page.

Transcription

At "CodeGPT", you can interact with various expert agents. If you want to use an expert agent created by us, click on the "Explore" tab. There, you will find a variety of agents specialized in models, programming languages, frameworks, and various tools. 

The advantage of using an expert agent is that you can interact and get responses without having to create contexts or specify, for example, language or software versions each time you ask a question. Also, you don't need to worry about writing the context every time the agent forgets it. For instance, if you want to interact with an agent specialized in Python 3.10, just search for it on the panel and start asking questions directly. Rest assured that it will respond within the context of Python 3.10.

You can also share this agent with others by clicking on the "Share" button and rate it with the "Rate me" button. Keep in mind that the message history of these agents is not saved to your account.


Create your own agent

You can create your own agent to interact with specific information, become an expert in information that only you handle, or adapt an expert in some technology to your own documentation.

Your agents are built on an LLM of your choice; pick the one that best suits your needs after reviewing each model's documentation.

Transcription

In this tutorial, we will learn how to create and customize your own agent in "CodeGPT Studio". 

- Go to the "Agents" tab on the platform.
- Click on the button located in the upper right corner of the screen and select "New Agent".
- Here, you can customize your agent, change its name, and start interacting directly with the default model. For example, if you want to write a function to check if a word is a palindrome, the default model will do it in Python.
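The palindrome example mentioned above might come back looking something like this minimal sketch:

```python
def is_palindrome(word: str) -> bool:
    """Return True if the word reads the same forwards and backwards."""
    normalized = word.lower()
    return normalized == normalized[::-1]

print(is_palindrome("Level"))   # True
print(is_palindrome("Python"))  # False
```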

At "CodeGPT", you have the option to choose from different Large Language Models (LLMs). You can perform tests with the same questions to determine which model is most suitable for your needs.

However, the choice of the model should be based on its characteristics and capabilities. To learn about the benefits and features of each model, we recommend reading the documentation provided by the models and their providers.

For example, if we search for the documentation of "DeepSeek R1" online, we will find several important aspects to consider. Let's highlight three of them:
- Number of Input Tokens or Context Window: This indicates how much information the model can receive.
- Number of Output Tokens: This corresponds to the amount of information the model can generate.
- Training Data Cutoff Date: This tells you which recent information the model lacks and which you therefore need to supply as context.
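As a quick sanity check before adding a document, you can estimate whether it fits in a model's context window. The 4-characters-per-token ratio below is a rough rule of thumb, not an exact tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int, reserved_output: int) -> bool:
    # Leave room in the window for the model's own answer.
    return estimate_tokens(text) <= context_window - reserved_output

doc = "word " * 2000  # ~10,000 characters, ~2,500 estimated tokens
print(fits_context(doc, context_window=4096, reserved_output=1024))  # True
print(fits_context(doc, context_window=2048, reserved_output=1024))  # False
```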

With these steps, you can create and customize your own agent on "CodeGPT".

To customize your agents and include specific knowledge, you can set up three options:

  • Conversation style

  • Instructions or prompts

  • Knowledge base, which can include text files or code repositories.

Let's learn how!

Conversation Style

Transcription

The second aspect is adjusting the conversation style. This setting controls how far the model may stray from the specific context. At 0, answers are based entirely on the specific knowledge; at 1, answers rely on the model's own training and ignore the specific context completely. In this case, since we haven't added any files yet, both responses are the same. 

Let's add a specific context file to manage the content of the responses. Although most models are trained with information from the Internet, especially Wikipedia, it is possible to add a document that focuses solely on the information we want it to display and ignore the model's information, or at least take it as a secondary priority.

Let's add the Wikipedia article about France and bias the response generation towards this document. In the next video, we will see how to add specific information and discuss the knowledge base in more detail.

So, with the style set to 1, let's ask about the capital of France, Paris; this response completely ignores the added document. Now we set it to 0 and ask the same question. The answer is different, since it is framed within the information we have just given the model.

By setting the conversation style, we allow the model to consider, to a certain extent, the specific information from the knowledge base versus the default training information that the model possesses.
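One way to picture this setting is as a weight that decides how much retrieved knowledge goes into the prompt. The sketch below is only a mental model, not CodeGPT's actual implementation:

```python
def build_prompt(question: str, chunks: list[str], style: float) -> str:
    """Illustrative only: blend knowledge-base chunks into the prompt.

    style = 0.0 -> use every retrieved chunk (answer grounded in the knowledge base)
    style = 1.0 -> use none of them (answer comes from the model's own training)
    """
    keep = round(len(chunks) * (1.0 - style))
    context = "\n".join(chunks[:keep])
    if context:
        return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return f"Question: {question}"

chunks = ["France is a country in Europe.", "Its capital is Paris."]
print(build_prompt("What is the capital of France?", chunks, style=0.0))
print(build_prompt("What is the capital of France?", chunks, style=1.0))
```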

Prompt - Instructions

Transcription

The third important aspect when defining an AI agent is creating the prompt. We have defined it in "CodeGPT" as instructions. If you don't have experience creating a prompt, click on the "Prompt Documentation" button and you will be able to access articles that can help you. 

To build a good prompt, we will need six important elements: the role of the agent and its characteristics, the context, specific instructions, the expected response format, error handling, and some response examples. It is recommended to format the prompt by wrapping each of these sections in XML tags.

In this way, the model will be able to understand the structure of the instructions defined by the content of each of these sections. We recommend the attached article in the video or in the description so that you can learn more details about prompt engineering.

You can define the context of the agent and its capabilities. For example, in the exercise we are doing with the CodeGPT API, in the context section we can include the main capabilities and functions, such as explaining the API endpoints, providing code examples for integrating the API, and detailing the most common errors that may occur when calls to the API service fail.

You can also define within the context the level or characteristics of communication: for example, speaking clearly and technically, referencing the documentation appropriately, and recommending best practices. And of course, you must define the response format, specifying how the agent should respond each time the user asks a question.

Next, the error-handling section can list the most common failure cases and give the model alternatives for responding appropriately. For example, for any question unrelated to the CodeGPT API, the agent should decline to answer and steer the conversation back to the CodeGPT API.

There are also security considerations: for example, preventing the leakage of data that users do not need, and always keeping responses within the bounds of the internal documentation, among other things.
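Putting the six elements together, a prompt skeleton wrapped in XML sections might look like the sketch below. The tag names and wording are illustrative, not a CodeGPT requirement:

```xml
<role>You are a technical support agent for the CodeGPT API. You communicate clearly and technically.</role>
<context>You help developers use the CodeGPT API: explaining endpoints, providing integration code examples, and detailing common errors.</context>
<instructions>Reference the documentation where appropriate and recommend best practices.</instructions>
<response_format>Answer with a short explanation followed by a code example when relevant.</response_format>
<error_handling>If a question is unrelated to the CodeGPT API, decline and steer the conversation back to the CodeGPT API. Never reveal internal documentation or data the user does not need.</error_handling>
<examples>User: How do I authenticate? Assistant: Send your API key in the request header, as described in the documentation.</examples>
```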

Once we have defined the prompt, we can ask again and expect the response format to be different this time and follow the specifications we have provided.

In the next video, we will learn how to add specific knowledge from the "AI Knowledge" tab.

Knowledge Base

The knowledge base is information that the LLM uses to construct responses. RAG (Retrieval Augmented Generation) is an innovative technique that combines approaches from information retrieval and data generation in natural language processing.

RAG plays a crucial role in the use of language models as it makes them more accurate regarding documented facts. This is achieved by allowing the model to have more up-to-date information, fed by the users. Essentially, it improves the accuracy and relevance of the responses generated by the language models.
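As a rough sketch of the retrieval step in RAG, here is a toy retriever that uses plain lexical overlap as a stand-in for real embedding similarity. The documents and scoring are illustrative only:

```python
import re

def score(query: str, doc: str) -> float:
    """Crude Jaccard word overlap as a stand-in for embedding similarity."""
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", doc.lower()))
    return len(q & d) / len(q | d) if q | d else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

docs = [
    "Paris is the capital of France.",
    "Python 3.10 introduced structural pattern matching.",
]
print(retrieve("What is the capital of France?", docs)[0])
# -> "Paris is the capital of France."
```

In a real RAG pipeline, the retrieved text is then prepended to the user's question so the LLM can ground its answer in it.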

The Potential of RAG in CodeGPT

At CodeGPT, from the start, we have recognized the potential of RAG to develop specialized agents in a straightforward manner. So, let's add files and code as a knowledge base and get started!


Files

Transcription

The fourth aspect is adding specific knowledge; we can include two types of information: files or code repositories. In this section, we will add files by clicking on "Add New File". We can add files in PDF or DOC format, import the content of a web page, or upload other file types such as TXT and CSV, respecting the minimum and maximum file sizes. 

You can also create your own documents or content from "Create New File". We can add a file from the server by selecting the "Upload File" option. Once the document is uploaded, click on "Start Training", and we will have this document added to our agent. Now, let's add the content of a web page. We go back to "Add File" and before that, we delete the example article about France that we used earlier to leave only the API documentation.

We add the content of the web page, in this case, the web page of the API documentation. We enter the URL and click on "Import". The "Import" button performs web scraping of the current page and obtains the content of that web page. As with documents, we can add additional information by clicking on the "Show Advanced Settings" button. We add the document metadata from the "Generate" button, which generates it automatically.
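CodeGPT performs this scraping for you when you click "Import". Purely as an illustration of the idea, here is a minimal Python sketch that extracts the visible text from an HTML page using only the standard library (the sample HTML is invented):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

html = "<html><head><style>p{}</style></head><body><h1>API Docs</h1><p>Create an agent.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.parts))  # API Docs Create an agent.
```

A production scraper would also fetch the URL, follow redirects, and handle encodings, which this sketch leaves out.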

In the "Chunks" tab, we can specify the text segmentation strategy, which reduces the context window when performing a semantic search, where the model compares the user's query with the content of the documents in the base. You can decrease or increase the number of tokens or words within the chunk, and also adjust the "Chunk Overlap", which are the words that provide continuity between each of the chunks. For example, the last 10 words of a segment could be the first 10 of the next text segment. We apply the changes and click on "Start Training".

From now on, the agent will have in its knowledge base those documents related to the CodeGPT API. The number of documents will be displayed in the "Knowledge" tab with a small number.

Now, let's ask again about how to create an agent in the CodeGPT API, and this time the response will be framed within our endpoint. As we can see, the information is specific and refers to the correct CodeGPT endpoint.

In the next video, we will see how to add more data.

Keep in mind that there may be very long paragraphs that exceed the maximum token limit and cause a typical error when trying to use RAG by exceeding the context window. That's why it's advisable to analyze the text and check the size of the paragraphs, as well as the separators.


Code Repositories (represented as graphs)

Transcription

In CodeGPT, you can add code to the knowledge base. To do this, click on "Add Knowledge Base" and select the "Graphs" tab. In this section, you can add a code repository as a knowledge base. Remember that this is only available for projects written in certain programming languages, so it is important to review the documentation. 

It is necessary to add a connection beforehand. Click on "Add Connection". Currently, it is available on platforms such as GitHub, GitLab, Bitbucket, AWS, and Azure. We add the connection, and once the necessary permissions are granted, we proceed to create the graphs.

We wait for confirmation of a successful connection and click on "Create New Graph". Then, we select the repository provider configured previously, in this case, GitHub. We enter the user, the name of the repository, and the branch. If you can't find the repository name on the list, you can enter it manually. Similarly, enter the branch and click on "Create Graph". We wait for confirmation, and now we have the graph. The status of the graph can change from "pending" to "completed" depending on its size.

Let's visualize the graph by clicking on "View Graph". We wait a moment, and the graph of the code repository becomes available. You can navigate through the nodes; each node represents a method, function, instance, or file within the repository, and the layout reflects the files and sub-files that depend on one another. You can move between nodes and query each one. The size of a node carries no particular meaning; it simply depends on the length of the package or file name.

Now we can add the graph to an agent. Click on "Link to Agent" and select the agent to which you want to add this graph. Then, go back to the agent, and you can start asking questions about the code knowledge base you just added. In this example, we have added the repository of the CodeGPT API documentation, and we will ask about some functions. If we want to know how to add documents to an agent, we will now get the answer within the context of CodeGPT. Now let's ask about the SDK, and we will get the response, which also indicates the location of this file within the repository as a reference source.

That's how you can interact with your repository through an AI agent.

Remember to check the size and programming language of your repository to avoid any issues when uploading the knowledge base.


Temporary Documents

Transcription

You can enter specific context-related knowledge on a temporary basis. For example, let's analyze a temporary file by clicking on the "Temporary Context" button on the left of the "Send" button. Then, we make a query about this documentation. 

For instance, we ask how to report a bug in CodeGPT. That information is specific, and the model can recognize it as context because we have added a temporary file. Remember that once the chat is closed, the temporary file will no longer exist in the agent's memory.


Chat History

Transcription

To access the conversation history of the agents, click on "Chat History". This is a way to understand how the responses were constructed from the queries and the knowledge base content. In this menu, you will find four tabs. In "Messages", you will find all the messages; "Message History" allows you to filter by specific dates and to retrieve past messages and bring them back into the chat. 

The "Debugging" tab shows the elements of the context or knowledge base that have been used to generate the responses. In the "Embedding" tab, you can see the vectorized message and the semantic similarity between the message and the knowledge base.

Messages with the symbol or label "RAG" indicate that they have been generated from a knowledge base. This information is useful for evaluating your agents.


Terms

Chunks: segments of text (sets of tokens) within a document, not necessarily equivalent to a paragraph.

Semantic Similarity: Simply put, semantic similarity is a measure of how close two tokens (semantic units such as words or terms) or texts are in meaning. It's not about the words being identical, but about conveying similar ideas. For example, "house" and "home" are semantically similar because they both refer to a place where a person lives.
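In practice, semantic similarity is often computed as the cosine between embedding vectors. A minimal sketch with toy, hand-made vectors (the three dimensions and their values are invented purely for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; the dimensions (dwelling, travel, food)
# are invented for illustration -- real embeddings have hundreds of dimensions.
house = [0.9, 0.1, 0.0]
home  = [0.8, 0.2, 0.1]
train = [0.1, 0.9, 0.0]

print(round(cosine(house, home), 2))   # high score: similar meaning
print(round(cosine(house, train), 2))  # low score: different meaning
```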
