Introduction
Microsoft Power BI is originally a data visualization tool. It cannot function without a data source and therefore it is crucial to think carefully about the type of data that will be fed to Microsoft Power BI.
Microsoft Power BI can handle many types of data sources. This can vary from an Excel file to data on an SQL server to a WEB API. A data source comes with benefits and risks. For example, the Excel data source is a very flexible data source. It’s very easy to make changes. But flexibility also comes with a risk. The risk of someone entering the wrong data or making a mistake. An SQL server is less flexible. It will take longer and more effort to make changes to this data source. This does reduce the risk of errors, because of the effort and controls
To make a good choice for a data source, it is important to have insight into the possibilities that Power BI offers for this. Power BI has over 45 connectors, ranging from folder structures to relational databases, to web resources. The reason there are so many is because a company or user can build his personal connector and have it validated by Microsoft. It will then become an official connector. Connectors that have not yet been validated can also be used, but because they are still unofficial, this is
not recommended.
It’s not possible to discuss each connector separately in this article. This would simply make this article too long. The common thread for connectors is not the connector itself, but the category to which the data source belongs. The question can be asked; Where is the data stored?
The following structure can be used for each data source category.
- Take stock of the Power BI maturity profile of the organization.
- Make an inventory of the size of the data (including future perspective).
- Make an inventory of possibilities such as connecting a gateway or the presence of a SharePoint environment.
- Relationship to Power BI.
Folder structure
In a folder structure, the data is stored by means of various folders containing, for example, excel, csv or Json files. This can be stored locally on a hard drive or in a cloud source such as SharePoint or OneDrive.
The advantage of a folder structure is the flexibility it comes with. The end user can directly influence the data source. Modifying or adding files and/or data in a folder (file) is therefore often
faster than, for example, a relational database.
The disadvantage of this is that the checks and balances are less. An adjustment is easily made, but whether it is correct is another story. In addition, Power BI works in the background via a step-by-step plan. It follows a structured step-by-step plan and repeats this step-by-step plan with each renewal. For example, if a column name changes, this does not correspond to the original step-by-step plan and the renewal results in an error message.
Maturity profile (low)
A folder structure is ideal for organizations with a low maturity profile. With a low maturity profile, there will be a lot of trial and error. The flexibility is high, and the end user has a lot of influence on the data quickly. As a result, the associated risk profile is high.
Large of data (small amount)
A folder structure is not suitable for large amounts of data. By working with a folder structure with different files, the structure can quickly become cluttered. In addition, the flexibility of this structure creates an increased risk. A large amount of data will only increase this risk.
Gateway (local only)
A folder structure can take place locally on a hard disk, or in a cloud source such as SharePoint or one-drive. It is advised to go for a cloud source if possible. In addition to the fact that this often provides an automatic backup, a cloud source also has an automatically built-in gateway. This is not the case with a local folder structure and still must be set up and managed yourself.
Relation to Power BI
Loading (import only) of a folder structure in Power BI will mainly be done via the connector MAP or SharePoint Folder (contents). Power BI then connects to this folder and combines all the files that are in the folder. Power BI will do this with the help of a helper query. This is an example file that Power uses to combine the other files. A helper query consists of four queries. Each folder that is loaded will come with four queries. This can quickly cause confusion. It is recommended to structure the Power Query properly and keep it clean, for example by creating a general helper query folder.
Relational database
In a relational database, the data is stored in different tables with columns that are connected by means of relationships. Power BI can also be compared to a relational database, but in terms of Power BI data sources, we are talking about SQL server, Oracle database, Snowflake, etc.
The advantage of a relational database is that it can often handle large amounts of data. This large amount of data is transformed into the relational database, making it usable as Power BI input. The checks and balances are also present to a greater extent. After all, the data is already transformed and checked in the relational database.
The disadvantage of a relational database is the degree of flexibility. In contrast to a folder structure, a change in the data will not be immediately visible. After all, a relational database will also have to be refreshed before the data is visible. In addition, the creation and management of a relational database is often a specialism, the administrator of the data will not be the same
person as the administrator of the relational database.
Maturity profile (medium/high)
A relational database such as a SQL server or Oracle database requires a higher maturity profile than a folder structure. In this case, the data will already be structured and cleaned. The degree of flexibility is lower, which means that trial and error will also take place less. Often the organization starts with a folder structure and if this becomes too unclear, they switch to a relational database.
Large of data (large amount)
A relational database can handle a large amount of data. This is ideal for companies with large amounts of data, both in the present and in the future.
Gateway (local only)
A relational database can take place locally or as a cloud source. A local database requires a gateway* as a connection between the database and the Power BI service.
*See the definition of a gateway
Relationship to Power BI
In relation to Power BI, making a connection with a relational database can be easier. Often, relational databases already have standard Power BI connectors to connect. The data has already been modified in the database, so the data can often be used one-on-one. It is important to mention that a relational database is often not managed by the end user or the Power BI developer, but by a data scientist. This is another party or specialist that must be considered in the process.
Import vs direct query
A relational database can be loaded by import. The data is imported and stored as a snapshot in the Power BI model. With direct query, the data is queried directly from the source. Because a relational database is separate from Power BI and the queries can’t always work efficiently for Power BI, direct query is often slow. It is therefore recommended to use import. People have more control over this and therefore often have a better performance.
Power Query vs. Native Query
With a relational database as a data source, most data transformation will happen in the data source itself. However, it is possible that several changes still need to be made to get the Power BI working. For example: Combining different views into a table.
Combining views into a table can take place in Power Query (append or merge) where the different views are imported and combined, but this can also take place as a native query. In the case of a native query, the data will be loaded with a custom SQL query. Power BI then loads the data directly from the source. Because the data comes directly from the source, Power BI doesn’t need to take any extra steps. This can have a major positive impact on performance.
ERP package
An ERP package is a form of a relational database. In this case, the customer purchases an ERP package from an ERP supplier. In this ERP package, the customer fills in the data and the data will be often extracted via this ERP package. The customer often has no insight into the underlying structure of the package and therefore has very limited influence on structural changes.
Maturity profile (Low / medium / high )
If the customer uses an ERP package, without the intervention of an additional relational database, it is important to first investigate the connection for Power BI. Is it possible to make a connection, how will the data come out of this, what are the associated risks. Depending on the outcome, it can be decided to load the data directly into Power BI. If this proves difficult, for example in the case of SAP with a cube structure, you can choose to set up a folder structure with data extraction or a relational database as an intermediate station between the ERP package and Power BI.
The maturity profile of the organization is leading here. If an organization is just starting out with Power BI, setting up a folder structure by means of data extractions is a better idea than, for example, having a custom connector built. The question that often needs to be answered: What are the customer’s needs?
Large of data (large amount)
ERP packages are often large relational databases. These can handle a large amount of data. It differs per ERP package how the data can be extracted and how much data can be extracted.
Gateway (local only)
An ERP package can take place locally or as a cloud source. A local database requires a gateway as a connection between the database and the Power BI service. A cloud source often has a built-in connection, but it also differs per ERP package. It is advisable to investigate this prior to the trial.
Relationship to Power BI
Well-known ERP packages, such as Salesforce, have their own Power BI connector. This makes it easy to extract data from the ERP package and load it into Power BI. Packages without a standard connector often work with an API for data extraction. Some ERP packages such as SAP work in the background with a cube structure to organize their data. This does not match well with Power BI principles and therefore it can sometimes be decided to set up a folder structure with data extracts. If that becomes too large, the next step is to place a relational database such as a SQL server between SAP and Power BI.
Web resources
Web resources are cloud-based relational databases. In some cases, the customer uses a cloud-based source. As with an ERP package, this is often not done in-house.
Maturity profile (low / medium / high)
If the customer uses a web resource, without the intervention of an additional relational database, it is important to first investigate the connection with Power BI. Is it possible to make a connection, how will the data come out of this, what are the associated risks. Depending on the outcome, it can be decided to load the data directly into Power BI. If this proves difficult, for example in the case of SAP with a cube structure, you can choose to set up a folder structure with data extraction or a relational database as an intermediate station between the ERP package and Power BI.
Size of data (low amount)
Cloud-based sources often use an API for data extraction. An API is known for having a limited capacity for data extraction. When using a web source, it is therefore very important to examine volume of data before starting the process.
Gateway (local only)
A web resource does not require a gateway. Using the API will retrieve the data. However, it is important to write the API in such a way that the Power BI service can recognize this. This can be done by means of a relative path. Keep in mind that Power BI doesn’t accept dynamic data sources. A dynamic data source is a data source that uses one or more variables in the query instead of hardcoded values.
In relation to Power BI
For a web resource, the Power BI developer needs knowledge of APIs. The Power BI developer must consider the fact that the data is managed by an external party and therefore has less insight into the input. In addition, an API only has a limited possibility of data extraction.

