- Q. What other resources of this sort can I use?
- A. STRING, PubGene, IntNetDB.
- Q. Why should I use FunCoup?
- A. FunCoup derives novel functional links from mostly raw high-throughput data or large-scale database annotations (UniProt, eSLDB, IntAct etc.). FunCoup weighs each piece of information by its relevance and reliability.
Moreover, FunCoup employs carefully tested algorithms for cross-species data transfer via orthologs and is Eukaryota-wide -- networks for multiple organisms are available and comparable.
Hence, you are not supposed to get what has been known before and is widely available at GO, KEGG, BioCarta, Swiss-Prot, iPath and other public resources of biological knowledge. Nor is FunCoup a set of protein profiles (GEO, SMD) or protein-protein links (HPRD, IntAct, BIND) collected from various sources and delivered to the user "as is". Also, FunCoup tries to differentiate between the types of functional coupling (physical interaction, metabolic, signaling etc.), and the respective links can be viewed individually or together. Note, however, that this differentiation is probabilistic and usually stems from the same evidence.
In short: you interact with FunCoup to create new biological knowledge.
- Q. How many links are there for each organism?
- A. Usually (if not specified otherwise) the network is proteome-wide. We process all possible links for a given proteome and present those with at least a weak likelihood of functional coupling. For example, in human, all 181,918,275 potential links between 19,075 ENSEMBL genes have been evaluated, and ~1,800,000 were retained for having a Final Bayesian Score (FBS) >3.00. For each pair, all available evidence from 7 eukaryotes (22 sets of physical interaction data, 19 sets of mRNA expression profiles, 6 sub-cellular localization tables etc.) was employed. Users can select a stricter threshold to see fewer links of higher confidence. Also, a set of query genes can be used to specify the wanted sub-network. One can also exclude/include particular evidence and thus obtain e.g. a pure network of protein-protein interactions, or a mammalian network etc. See help page and "Getting started" for more details.
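The number of potential links quoted above is simply the number of unordered gene pairs, n(n-1)/2. A quick check in Python (variable names are illustrative only):

```python
import math

n_genes = 19_075                  # human ENSEMBL genes, as quoted above
n_pairs = math.comb(n_genes, 2)   # unordered pairs: n * (n - 1) / 2
print(f"{n_pairs:,}")             # 181,918,275

retained = 1_800_000              # approximate number of links with FBS > 3.00
print(f"{retained / n_pairs:.2%}")  # share of all pairs that was retained
```

In other words, roughly one pair in a hundred passes the default threshold.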
- Q. How is the evaluation/prediction of individual gene-gene links done?
- A. Each kind of potential evidence (e.g. Pearson correlation coefficient on mRNA expression from Mouse Tissue Atlas in interval r=[0.75...0.90]) has been pre-evaluated with a set of known examples of functional coupling. The occurrence of this high co-expression in the set is compared to that in general, i.e. among all links. A ratio higher than 1 (if significant!) is accepted as positive evidence. The opposite -- more frequent in general than among the functionally coupled -- becomes negative evidence and is employed as well. These ratios (log-transformed) are stored as likelihood estimates of functional coupling given data from each evidence source in each interval.
The actual evidence (mRNA expression profiles for both genes) is retrieved and the respective metric is calculated (e.g. r=0.81).
The stored likelihood estimate matching this value is assigned to the evaluated link.
The procedure is repeated for as many data sources as are available for these two proteins. The likelihoods are summed up into the Final Bayesian Score (FBS).
Orthologs in other available species are found, and the procedure is applied to their data as well. The FBS is incremented accordingly.
If an FBS above a certain threshold has been accumulated, the pair of proteins/genes is declared likely functionally coupled and stored in our database.
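The steps above can be sketched in a few lines of Python. This is a minimal illustration, not FunCoup's actual code: the source names, intervals and counts are made up, the natural logarithm is an assumption (the base is not stated here), and the significance test mentioned above is omitted.

```python
import math

def log_ratio(coupled_in_bin, coupled_total, all_in_bin, all_total):
    """Log of (bin frequency among known coupled pairs) / (bin frequency among all pairs)."""
    return math.log((coupled_in_bin / coupled_total) / (all_in_bin / all_total))

# Pre-evaluation: one stored score per (source, interval). All counts are made up.
SCORES = {
    ("mouse_tissue_atlas_pcc", 0.75, 0.90): log_ratio(120, 1000, 2000, 100000),    # enriched: positive
    ("mouse_tissue_atlas_pcc", -0.10, 0.10): log_ratio(200, 1000, 40000, 100000),  # depleted: negative
    ("ortholog_expression_pcc", 0.75, 0.90): log_ratio(90, 1000, 3000, 100000),
}

def stored_score(source, value):
    """Return the stored score of the interval that contains the observed value."""
    for (src, low, high), score in SCORES.items():
        if src == source and low <= value < high:
            return score
    return 0.0  # assumption: no matching interval means no evidence either way

def final_bayesian_score(observations):
    """Sum the stored scores over all available data sources for one gene pair."""
    return sum(stored_score(source, value) for source, value in observations)

THRESHOLD = 3.0  # the FBS cutoff quoted elsewhere in this FAQ

pair_observations = [
    ("mouse_tissue_atlas_pcc", 0.81),   # the r=0.81 example from the text
    ("ortholog_expression_pcc", 0.78),  # hypothetical data transferred via orthologs
]
fbs = final_bayesian_score(pair_observations)
print(f"FBS = {fbs:.2f}, stored: {fbs > THRESHOLD}")
```

Because the scores are logarithms of ratios, summing them corresponds to multiplying the ratios, which treats the evidence sources as independent.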
Additionally, we convert the likelihood sum FBS into a probabilistic confidence score with a function plotted here. The confidence score should -- ideally -- convey the probability pfc (between 0 and 1) that the link is true. However, it is biased for a number of practical reasons, and we do not recommend drawing any quantitative conclusions from either pfc or FBS. The only safe conclusion is that a higher score corresponds to a higher likelihood of functional coupling -- given the data.
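For orientation only: in an idealised Bayesian setting, a summed log-likelihood ratio becomes a probability by combining it with prior odds. The sketch below shows that textbook conversion. It is not the function FunCoup actually uses; the prior odds value and the use of the natural logarithm are assumptions made purely for illustration.

```python
import math

def ideal_confidence(fbs, prior_odds):
    """Textbook posterior probability from a natural-log likelihood-ratio sum."""
    posterior_odds = prior_odds * math.exp(fbs)
    return posterior_odds / (1.0 + posterior_odds)

PRIOR_ODDS = 1e-3  # hypothetical prior odds that a random pair is functionally coupled

for fbs in (3.0, 6.0, 9.0):
    print(fbs, round(ideal_confidence(fbs, PRIOR_ODDS), 3))
```

Whatever the exact function, the mapping is monotonic, so ranking links by FBS or by confidence gives the same order.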
- Q. How can I narrow down the output to a network environment I'm interested in?
- A. The best way is to query the gene of interest together with a gene or group of genes that share some functional properties. For example, querying about RAS3A protein and KEGG05020 pathway members may tell you about the RAS3A role in Alzheimer's disease. You can also use OMIM identifiers as queries: all the reported disease genes will be shown on the same screen. Note that you can use such additional IDs both in the yellow box (main query) and the magenta box (query context). See "Getting started" for more details.
- Q. Why don't the evidence components (protein interactions, mRNA co-expression etc.) sum up to 100% in some cases?
- A. First, it may be a rounding error. Second, some components are negative and cannot be shown as graph lines.
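Both effects can be reproduced with a toy example. The sketch assumes that each component's percentage is its contribution divided by the total score, and that negative components are simply not drawn; the numbers are made up and the exact way FunCoup computes the percentages may differ.

```python
# Hypothetical per-evidence contributions to one link's FBS (made-up values).
contributions = {
    "physical interaction": 4.0,
    "mRNA co-expression": 2.5,
    "sub-cellular localization": -0.5,  # negative evidence
}

fbs = sum(contributions.values())

# Share of each component in the total, rounded to whole percent.
percent = {name: round(100 * value / fbs) for name, value in contributions.items()}

# Only positive components are assumed to be drawn.
displayed = {name: p for name, p in percent.items() if p > 0}

print(percent)
print("sum of displayed components:", sum(displayed.values()))
print("sum including negative components:", sum(percent.values()))
```

Here the displayed components add up to more than 100% because the negative one is hidden, and even the full set misses 100% by one point because of rounding.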